Scatterplots in SPSS/PASW

Uploaded by bartonpoulson on 21.05.2009

Hi, my name's Bart Poulson and I'm gonna take a few minutes to show how to do a bivariate
scatterplot in SPSS Version 17. Now SPSS used to stand for "Statistical Package for the
Social Sciences" but most recently they decided to change their name to PASW which is what
you'll see up in the corner over here which stands for "Predictive Analytics Software."
Go figure. Anyhow, we're gonna show you how to do this. I'm doing this in Version 17 on
my Macintosh, however, it is 99.9% the same in the Windows version, uh, and most of what
I'm gonna do is exactly the same in all previous versions of SPSS down to at least Version
12 that I've worked with. Anyhow, the first thing we need to do is open up a data set.
I'm gonna use one that already exists in SPSS so if you've got it you can follow along.
The first thing I do is come over here, under File, to the file folder
to open up a data document. Click on that. And SPSS has a bunch of data files already.
You might have to dig through the folders to find them a little bit, but here they are.
I'm gonna go to one that's near the end that is called "world95.sav." It's some global
statistics from 1995. The ".sav" is the suffix for an SPSS data file. So I'm gonna open that
one up. It does this little wooshy thing. And what I have is statistics for 109 different
countries from Afghanistan through Georgia through Peru down through Zambia. And the
United States is in there on line 102. A bunch of statistics on population, population density,
the percent who live in urban areas, the predominant religion in the area... This one is life expectancy
for women. Life expectancy for men... The, uh, literacy rate, that's the percentage of
people who can read; population increase; infant mortality; and so on and so forth. Daily calories...
Um, what I'm gonna do is I'm gonna pick two of these variables, um, and show you how to
do a bivariate scatterplot. So the first thing I'm gonna do is I'm gonna come over here to
"Graphs." Now, um, SPSS has a fancy new thing called "Chart Builder," um, and you can learn
to use it if you want. However, since most of the previous versions do not have that and,
truthfully, it's not what I'm accustomed to, I'm gonna come down here to "Legacy Dialogs."
And I'm gonna come down to this one right here near the bottom. It's called Scatter
and Dot plot. Alright, so "Graphs", "Legacy Dialogs", "Scatter/Dot." Click on that. And
I'm given the choice of several different kinds of scatterplots. This is the kind I'm
gonna use. It's just a regular x and y axis. This is if you have a whole bunch of variables
you wanna use. Maybe we'll do that some other time. And here's some other versions. This
one's kind of neat. It's a three dimensional scatterplot. But I'm gonna stay with a simple
one right here, the "Simple Scatter," which is already selected. You can tell that by
the thick black border around it. I'm gonna press "Define" and it's gonna ask me for the
variables. Now over here is a big list of all of the variables. The ones with the little
rulers next to them are scale variables, or measured variables, uh, indicating
more or less of something in certain units. Uh, this one, on the other hand, is a categorical variable
and so is this one with the dots. Uh, truthfully, you can stick just about
anything in a scatterplot though it's only with the ranked and the measured variables
that it's gonna make sense. Anyhow, what I'm gonna do is I'm gonna look at the relationship
between literacy, which I'm gonna put on the x-axis, that's across the bottom, 'cuz
I'm gonna use it as a predictor, and the average female life
expectancy on the y-axis. So I'm using literacy to predict women's life expectancy. Now, um,
what I'm gonna do before I, uh, move ahead is I'm gonna put a title on this. You should
always, always, always title your charts. And so I'm gonna put here [typing] "Literacy
& Women's Life Expectancy for 109 Countries." Good. I'm gonna press
"Continue." And then I'm just gonna come down here and press "OK." Alright. This is the
SPSS output window. If you're accustomed to, uh, Excel, it sticks everything
on the same spreadsheet. SPSS does it differently. The data is on one page, the output comes
on another page. There's also something called a syntax page which is somewhere else. Uh,
truthfully, I find this to be convenient, but uh, anyhow. What I have right here
gives us the code for what we just did, the syntax. Uh, at another point I'll show
why that's handy to have. But this is the scatterplot right here. This one right here.
And what you can see is that this is the percentage of people who can read across the bottom.
It goes from 0 right here up to 100 right here. And you can tell there's a lot of countries,
uh, clustered right around 100. Uh, most countries have very high literacy rates. Average female
life expectancy here, you know, the tragedy is some of these places are as low as, you
know, 40 something, um, but, uh, a larger number go up into the 80s. Now this is the
default, uh, scatterplot, however there's a few things I wanna do that I think make
it a much more informative and better chart. Um, to edit this chart you have to double
click on it and it brings up an editing window. So I'm gonna click twice on the chart and
then I'm in a, uh, chart editor window. You can see that right here. The first thing I'm
gonna do is I'm gonna stick a regression line through it, a straight line through the data
which I think should always be a first step. I can do that by just clicking
on this thing right here that says "Add Fit Line at Total." That just means add a straight
regression line. So I click that and there comes a line right through the data. This
is, uh, what's called the least squares, uh, regression line. It's the most common one.
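
For readers who want to see what that fit line is doing, here is a minimal Python sketch of an ordinary least squares line. It is not SPSS syntax and not part of the video. The function name `least_squares_line` and the five data points are hypothetical, made up for illustration and not taken from world95.sav.

```python
# Hypothetical toy values for illustration only, NOT from world95.sav.
literacy = [20.0, 40.0, 60.0, 80.0, 100.0]         # percent who can read (x-axis)
life_expectancy = [45.0, 55.0, 62.0, 70.0, 78.0]   # years (y-axis)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line y = intercept + slope * x
    that minimizes the sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(literacy, life_expectancy)
print(f"predicted life expectancy = {intercept:.2f} + {slope:.3f} * literacy")
```

A positive slope corresponds to the uphill line seen in the chart: each extra percentage point of literacy goes with a slightly higher predicted life expectancy.
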
And I'm gonna do just two more things here. Number one, I think that charts are to give
general impressions and not necessarily to give specifics and so something that I don't
want is this thing there that says "R² Linear." It says 0.749. By the way,
that's a very high association. You can tell by how closely things adhere to the regression
line. But what I'm gonna do is I'm gonna click on that little thing there once to select
it then I'm just gonna delete it because I don't want it there. The other one is I think
it's a little easier to look at the chart if you change the way that the dots appear.
These ones are black circles and you know, truthfully, the circles, I don't quite understand.
So what I'm gonna do is—strangely enough you can't choose to have them be just a dot.
So I'm gonna keep 'em as a circle, but what I'm gonna do then is I'm gonna make the circles
really small, size 3, that's as small as it goes. I'm gonna make the border a little bigger
and I'm gonna make it red. And when I click that, that tells me a little bit what it's
gonna look like. Um, you have to press "Apply" in the editor, uh, windows to make anything
happen. So I press "Apply" and tada! There I am with a bunch of small red dots. And with
a black regression line, it's much easier to read it all. So I'm gonna press "Close"
on this one over here. "Close." And this is the editing window. It's still open and I
need to close that one as well. So I'm just gonna come up here. On my Mac
it's the red dot; on Windows there'll be a little x box over here, I believe. But I click
on that one and there's my scatterplot. We see that it's a strong uphill trend: countries
with higher literacy rates tend to have substantially longer female life expectancies and there's
a number of reasons why that may be the case, uh, often just having to do with the level
of development in education and health available in the country, but this is a great
chart for showing a strong association between the two variables. A little bit later, we'll
go over how to calculate the regression, the actual statistical numerical description of
the relationship as well as look at some other perspectives on the data. Thank you.
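
As a rough preview of that numerical description, the R² figure shown in the chart editor can be sketched in plain Python. For a straight-line fit with a single predictor, R² equals the squared Pearson correlation between x and y. This is an illustration only, not SPSS output: the function name `r_squared` and the toy data are hypothetical, so the result will not match the value seen in the video.

```python
# Hypothetical toy values for illustration only, NOT from world95.sav.
literacy = [20.0, 40.0, 60.0, 80.0, 100.0]         # percent who can read (x)
life_expectancy = [45.0, 55.0, 62.0, 70.0, 78.0]   # years (y)


def r_squared(xs, ys):
    """Return R^2 for a simple (one-predictor) linear regression,
    computed as the squared Pearson correlation of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # r = sxy / sqrt(sxx * syy), so r^2 = sxy^2 / (sxx * syy).
    return (sxy ** 2) / (sxx * syy)


r2 = r_squared(literacy, life_expectancy)
print(f"R squared = {r2:.3f}")
```

R² runs from 0 to 1. Values near 1 mean the points sit close to the regression line, which is the visual impression the scatterplot gives.
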