Access 2007 Tutorial 5.3. Delete Duplicate Records


Uploaded by djdates on 08.02.2010

Transcript:
In this video, I will very briefly talk about how you fix a problem if you have duplicate
data. In my previous video, I used this query to
append data to a table. If I ran this select query, I bring back 72 records. This is only
because, in design view, I selected my query, opened properties, and have set unique values
to yes. If I set unique values to no and ran the query,
notice that I'm bringing in 342 records and I have a large number of repeat records. Let's
say that I created my table and I inserted my records to my table, clicking on Append,
and selecting the rental table, and I accidentally inserted these records without making sure
that I was inserting unique values. I will run this query and notice that I am
appending 342 rows. So when I look at this destination table, I have all of these repeat
values which I do not want. So we will take a look at how we can fix this,
after the fact. The first thing I want to do, is create a
query that shows me the primary keys for the records that have duplicate information. I
will open query design, select my rental table, and pull down my customer ID, my movie ID,
and then I will pull down the ID field.
I want to see how many duplicates I have, so I will click on the totals button in the
ribbon and count on my duplicate IDs. Running this, I can see that I have duplicate
rows, but in some cases, I might not have any duplicate rows at all, so I want to make
sure I am not deleting single records. I am only going to delete records only where I
have duplicates. I will go back into query design and add a criteria, so I only see records
that are actually duplicates per group of Movie ID and Customer.
Clicking on run, notice that my group of single records has disappeared.
Back in design view, I will pull down my primary key again, and this time, instead of selecting
Count, I will select the option First. I will click on run and now it shows me the primary
key for the first record of a group of duplicate records.
Back in design, I will choose to not show my customer ID, movie ID, and rental count
that shows the number of duplicate records in the group. So I just have my query that
just shows me the primary key for the first record of a group of duplicate records.
Notice, that I am only seeing 67 records. I will return to design view. I want to take
this query and build on it. I will right click on it and go to SQL view
and look at the SQL code. I am selecting the first rental ID from rental,
grouping by customer ID and movie ID, having a rental ID count greater than 1.
I will select this SQL code and copy it. I will now create a new query, performing
this query against my rental table. I will select my ID and the criteria is going to
be records that are in my copied SQL query. I will type the keyword in and paste my SQL
code into a set of parenthesis. Notice that I am selecting records that are
only in my previous query. Again, I get 67 records.
Now, I can return to design view and select delete. Now I am deleting records where the
ID in the rental table was in my previous SQL statement.
I can run this query. Each time I run this query, I delete one record from each group
of duplicate records. Most of the duplicate records were duplicated five times, but some
were duplicated six times or four times. Each time I run this query, I delete some of the
duplicates. I can keep running this query until I run out of duplicate records.
Finally, I will run my query and the query will not be able to delete any duplicate records
because each record is unique. Now, if I look at my rental table, I can see
that I have gone from 342 record to 72 records, and I have deleted all of my duplicate records.
As you can see, a little SQL goes a long way. If I look at design view and look at my SQL
view, I see my delete statement deleting rows from the rental table where rental ID is in
my SQL select statement of rows that have counts greater than one. Knowing a little
SQL, such as knowing how queries can be nested in SQL, can be very helpful in Access.