Access 2007 Tutorial 2.1. Importing Data

Uploaded by djdates on 22.11.2009

In this video, we are going to import some data into Microsoft Access. In the US, because
of campaign finance laws, when individuals donate to a campaign for the house, the senate,
or the presidency, their information is recorded. I am going to open Firefox. I am going to
go to the website for the Federal Elections Commission in the United States. We are going
to take a look at one of these datasets because it is publicly available and it’s a great
opportunity to query the dataset. So I am going to go to the FEC website and
I am going to go to the section on the 2008 presidential elections. I am going to look
at New York state, since that is the state I live in. It looks like it is the second
largest state for campaign contributions so it is going to be a nice rich dataset.
I actually want to look at the entire state, so instead of clicking on the map, I am going
to come over and I am going to click on the campaign contributions for President Obama.
Again it is showing me a map, but this time the map just shows me the datasets for each
zip code for President Obama. I am going to click on the button that says export NY contributor
data. I am going to download this file to my desktop.
Now I am selecting President Obama, not for any particular political reasons, instead
he is the largest single destination for campaign contributions and by selecting a single candidate,
we have a one less variable in our analysis. My file is downloaded, so I am going to take
a look at my file. I am going to right click on my file and extract
it to the desktop. When I open up the folder, I see a single text file that is roughly 50
megabytes in size. And as you can see, the data is all comma delimited. That means that
each field in the file is separated by a comma. The text fields have quotation marks denoting
the text fields. This isn’t very useful for me, so I am going to close out of this
file. Now because it is a comma delimited text file, I can rename the file, and I am
going to change the extension from “.txt” to “.csv”.
Now it is a CSV document and it is associated with Microsoft Excel. I am going to double
click on this file to open it. Now I am not actually interested in bringing all of this
information into my database, so I am going to be selective and delete some of this information.
So what I have ended up with is fields for the contributor name, city, employer, occupation,
the receipt amount, and the receipt date. I am going to save my file as an Excel document.
Compare the file size of my original file in text format, which was roughly 50 megabytes,
to my Excel file, which is roughly 13 megabytes. That is because the Office 2007 formats are
actually compressed, so you end up with much smaller files than you did with previous versions
of Office. I am going to open Microsoft Access. I am
going to create a blank database. I am going to save it on my desktop and I am going to
call this database Obamabase. I am going to click on OK. I am going to click on Create
and it has opened a default blank table. I am not going to use this table, so I am going
to close the table, and the table disappears. I am going to go to the External Data tab
and I am going to import an Excel document. I am going to browse for my file, going to
my desktop, and selecting my Excel file. I am going to click on OK, importing the source
data into a new table in the current database. It starts the import spreadsheet wizard. I
am going to keep the checkbox checked saying “First row contains column headings”,
because my first row does contain column headings. If I uncheck this, it would bring the column
headings in as data, which I do not want. I am going to click on that checkbox and I
am going to go to the next screen. It is asking me for the data type for each
field. At this point, if I chose not to a import field, I could select that. Notice
that it has correctly identified this as a double numeric data type with a contribution
amount and the contribution receipt date has been correctly identified as a date/time.
So everything here looks good. If I wanted to at this point I could create indexes. Those
will also be created when I begin to analyze my tables. I am going to click on next. I
am going to let Access add my primary key for me.
I am going to click on next. And I am going to import it to a table and that name is fine,
so I will go ahead and finish. It is going to take a period of time to import my data.
Notice that I could have, at this point in time, checked to have the wizard analyze my
table after importing the data, but I can also do this by going to the appropriate icon
on the menu. My data has been imported. Notice that it
took several minutes to import my data. If I wanted to repeat this process again, I could
save the import steps. I am not interested in doing that. I am click on close. Here is
my new table containing the same data that was in my spreadsheet. I am going to close
out of this table. I am going to go to the database tools ribbon
and I am going to click on Analyze table. The table analyzer wizard discusses the benefits
of a relational database. For example, if you have a supplier listed multiple times
in your database, there is a potential that the information could be misspelled, the information
is repeated. This is why split data in multiple tables.
I am going to start the wizard and I want to decide what fields in go in what tables.
And I am going to click on next. So it shows me a table here and what I am going to do
is I am going to split the contributor table from the contribution table. I am going to
drag all the contributor information into its own table.
Notice as I do this, that the original table that is created has a primary key. As I create
a new table, it creates a new primary key in the new table I create, plus it creates
a one to many relationship to a foreign key that it creates in my original table.
I am going to click on next. It is going to analyze my table for typos. At this point,
if I wanted to, I would have the opportunity to apply corrections, but I am going to ignore
this, and click on next. It asks me if I want to create a query, but do this at a later
time and I will click on finish. It is now beginning to split my data in multiple tables,
generating the primary and foreign keys necessary. So now I have three tables in my database.
I have my contributor table, I have table 1 which I have to close first before I can
rename it, and then I will rename it as contribution, and I have my original table, which I no longer
need, so I am going to delete this table. If I go to the relationships view, I see my
primary keys in each table and I see the one to many relationship to the foreign key in
the contribution table. If I look at the contributor table, I have the name, city, employer, occupation,
and primary key for each contributor. I can click on the plus sign and expand it, and
see the individual contributions they have made. If I go to the contribution table, I
see the primary key, the contribution amount, and the date of the contribution, and I see
the foreign key to the contributor table. Access, when you perform the analyze table
function, creates the foreign keys as a lookup field. If I go into design view on the contribution
table, and look at the foreign key contributor ID, I can see the caption that says it is
a lookup field. And in the lookup field, I can see the SQL query which displays the lookup
information instead of the foreign key. Because I am used to dealing enterprise relational
databases such as Oracle, I generally dislike the lookup functionality and I will usually
delete the row source SQL information so that I actually see my foreign keys as numerals.
Now when I reopen the contribution table, I am going to see foreign keys instead of
seeing the name lookups. This is just a personal preference.
If you look at the number of records, I have 317,000 records in the contribution table
and I have 76,000 records in contributor table.