Business Solutions: Cleaning Up the Data: Using Queries in Access

        by James D. Duncan, CPA     January 2004

 

First of all, Happy New Year!  Hope your year went well and that your resolutions list is not too long.  This month’s article is about cleaning up data that you have inherited or data that you pulled over from your legacy system.

            Let’s look at a sample of the data.  I have only selected one column to use in this example.  That column is the person’s social security number.

 

SSN

442-82-6738

441-623965

 

123456789

123-45-6789

 

Now, you and I know that the field should be a standard length.  That length should be 9 or 11 depending on whether it contains the separators. And we would want the field to be standard throughout the system. Our challenge is to print a list that does not match those criteria.

            Assume we have a table named “SampleData” and that is has one field named SSN that contains the information above.  Open up a new query and add the table SampleData.  Then select the SSN field and add it to the fields in the criteria section.

The screen should look similar to this:

 

 

 

Now enter in the second field “SSNLen: len(ssn)”  without the quotes.  What we are doing is adding a field of our choosing.  We are using the len command which will determine the length of the SSN field.  The screen now displays:

 

Click on the view button and you will get the list of all the social security numbers and the length of each field.  Looks like this.

 

SSN

SSNLen

442-82-6738

11

441-623965

10

 

 

123456789

9

123-45-6789

11

 

Notice we have one line with nothing displayed.  This is null data; we will look at this in a little bit.  Now what we want to see is all the data that does not have a length of 11 in this case.  The test of your data would be up to you.  Now, how do we test the length of the SSNLen field?  Simple, we just ask for the records that are other than a length of 11.  In the criteria field of the SSNLen column enter in “<>11” without the quotes.  We are thinking we will get 3 records.  However we only get two.  Here is the list we get.

 

SSN

SSNLen

441-623965

10

123456789

9

 

That tricky null data is not playing nice with us.  We have to test for null data in a different way.  The real question we want to ask is the length of the field other than 11 OR is the data null?  In the field below the “<>11” enter IsNull.  Look at the left of the screen and you will see “or:” shown.  Now run the query.  We get all three records listed.

 

SSN

SSNLen

441-623965

10

 

 

123456789

9

 

The query screens now looks like this.

 

 

Now, you can run a list to have the data corrected.  You would want to add other pertinent data like name, address, etc., in order to make the correction process easier.  Then download the data again and run this process until you get no exception records.

The actual data in my case had an extra space loaded on the end of each SSN.  I had to run the trim command to remove that.  You will no doubt have unique problems.  Look at your data and reconcile the number of known good records to your expected list.  Ask the data questions that you are sure of the answers.  Then figure out why you did not get the answer you were so sure you would get. 

Send me your questions, to bj4duncan@aol.com.  I will try to get you an answer.  It is always interesting seeing what your problems are compared to what I have tripped over in the past.

 


James is a member of the OKCPCUG and an expert in Access and Excel. James can be reached at  BJ4DUNCAN@aol.com