Correlation Coefficients in SPSS/PASW

Uploaded by bartonpoulson on 22.05.2009

Hi, this is Bart Poulson and in this short tutorial, I'm gonna show how to calculate
a correlation, or several correlations in a matrix, using SPSS, a statistical program that's
now known as PASW for "Predictive Analytics Software." I'm using version 17, but everything
I'm gonna do is identical in previous versions. Also, I'm doing this on my Mac, but the Windows
version works essentially exactly the same. I'm gonna use a data set called "world95"
that exists in the, uh, SPSS sample data so you can fish that one up if you want. This
is a, uh, data set that has information on 109 countries from Afghanistan through Zambia.
And it includes statistics like the population density, the major religion, literacy, and
so on. What I'm going to do is I'm gonna look at the association between several variables.
Uh, beginning with female life expectancy. And I'm gonna use some variables that I will
later use in a tutorial on multiple regressions. We can see what they look like individually
and then collectively. So the first thing you need to do to get the correlations is
to come up here to "Analyze" down to this one called "Correlate." We're gonna be doing
bivariate correlations and so this is the one we wanna select. The first variable I'm
gonna use is "Average female life expectancy" 'cuz in my regression, I'm gonna use that
as the outcome variable. I can just double click that and it goes over. Oops. I'm gonna
also use, um, literacy, the "Population who can read," the "Gross
domestic product per capita," "Daily calorie intake," and "Birth rate per 1000." Uh, so
I have 5 variables here total and all I'm gonna do is leave all of the defaults (the
Pearson correlation, the two-tailed test of significance, and "Flag significant correlations")
the way they are and just press "OK." And what I have is an
output window. Unlike a spreadsheet, like Excel, when you perform a procedure in SPSS,
uh, the data exists in a window that looks like a spreadsheet, but the results of the
analyses come in a separate window called an output window. And when you're performing,
uh, many statistical analyses this is a much more sensible procedure. The first thing up
here is what's called a syntax statement. It's a written record of the command. The
nice thing about it is you can go back and use that again later if you're so inclined.
This tells us that we performed correlations. This is the name of the active data set, "world95.sav."
And this is the correlation matrix. Notice that we have the names of the five variables
here down the side from "Average female life expectancy" to "Birth rate per 1000 people."
And we have the same 5 variables listed across the top right here. Now, in each cell,
that's each box, we have the correlation of the variable in the row and
the variable in the column. So, what you have here, you see there's a bunch with a "1"?
That just is a variable correlated with itself and that's a perfect correlation. Correlations
go from 0, which indicates no linear relationship, to a 1, which indicates a perfect linear
relationship, meaning everything falls exactly on a regression line. Um, plus and minus are
simply indications of whether it's an uphill or downhill relationship, either a direct or an
inverse association, so the full range runs from -1 to +1.
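
To make the 0-to-1 scale and the plus and minus signs concrete, here is a minimal sketch of the Pearson correlation written out by hand in Python. It is not what SPSS runs internally, just the textbook formula, and the function name `pearson_r` and all the numbers are made up for illustration (none of them come from world95).

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration values (assumptions, not world95 data).
x = [1, 2, 3, 4, 5]
uphill = [2 * v + 1 for v in x]     # every point on a rising line
downhill = [10 - 3 * v for v in x]  # every point on a falling line
flat = [2, 1, 3, 1, 2]              # no linear trend with x

print("uphill:  ", pearson_r(x, uphill))    # perfect direct association
print("downhill:", pearson_r(x, downhill))  # perfect inverse association
print("flat:    ", pearson_r(x, flat))      # essentially no linear relationship
```

The first two values come out at the ends of the scale because every point sits exactly on a line; only the sign differs.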
Um, so the 1's down the diagonal are simply each variable correlated with itself and the
matrix is symmetrical about the diagonal. So you see, for instance, the .865 right here
is the same as the .865 right here. The -.862 right here is the same as the -.862 over
here. Um, that's because the correlation between life expectancy and birth rate
is the same as the correlation between birth rate and life expectancy.
The order doesn't matter. Now, what we have in each one of these cells is three numbers.
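
The 1's on the diagonal and the mirror-image halves of the matrix can be checked with a small sketch like the following. The column names and values are hypothetical stand-ins, not the actual world95 variables or data, and `pearson_r` is a hand-written helper rather than anything from SPSS.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for two equal-length lists (textbook formula)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical columns with made-up values (assumptions, not world95 data).
data = {
    "life_exp": [44.0, 58.0, 67.0, 75.0, 80.0],
    "literacy": [29.0, 52.0, 78.0, 93.0, 99.0],
    "birth_rate": [53.0, 40.0, 28.0, 16.0, 13.0],
}
names = list(data)

# Every variable paired with every variable, including itself.
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

# Print the matrix: rows and columns are in the same order.
for a in names:
    print(a.ljust(10), "  ".join(f"{matrix[a][b]:6.3f}" for b in names))
```

Reading across a row and down the matching column gives the same numbers, which is why only one triangle of the matrix really needs to be read.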
Let's take this one, the relationship between female life expectancy and the percent of
people in the country who can read. The first number is the Pearson Correlation,
that is, the Pearson product-moment correlation, better known to most people as r. In this
one, it's .865 which is a very high positive value. So for countries with higher levels
of literacy, women also have higher average life expectancies. The
second number is the significance level or the P-value, and generally if this number
is less than .05, and all of these are, they're actually less than .001, then the, uh, correlation
is considered statistically significant, or you can consider it reliably different from
0. It's not exactly what it is, but it's close enough. And then this last number is the n,
the number of countries that have data on both of these variables.
You can see it varies a little bit. Here it's 107, whereas here it's 109. Right here, only
75 countries had information on both of the variables. Um, but you're still able to get
the correct correlations that we need. Now, the important thing that I wanna point out
about this because I'm also gonna use this correlation matrix when I do multiple regression,
is that all of the associations are statistically significant. Every one of 'em has these two
asterisks next to it and what that means is that the correlation is significant at less
than .01. In fact, normally people would put three asterisks to indicate it's less than
.001. A single asterisk would indicate less than .05 which is the standard level of statistical
significance. So they are all statistically significant. All of these variables are highly
associated with each other, uh, not just in terms of P-values, but in terms of absolute
values. The smallest one is this correlation between literacy and GDP, and it's .552, which
by most standards is still a very large association. These ones down here with birth rate are negative
which means that, again, for each country, the higher the birth rate, the lower
the life expectancy. Also, the lower the literacy rate. Also, the lower the GDP. Also, the lower
the daily caloric intake. Uh, it probably has more to do with these countries
simply being less developed, having fewer resources and poorer health care. Anyhow,
this is a correlation matrix and it gives us the associations between each of these
variables, two variables at a time. In the next one I'm gonna look at multiple regression
which looks at the association between all of these variables collectively. Um, but that's
it for right now.
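
Two details from the matrix walkthrough, the n that changes from cell to cell and the asterisk flags, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: missing values are represented as `None`, the helper names `pairwise_complete` and `flag` are made up, the cutoffs follow the "less than .01" and "less than .05" wording used in this tutorial, and the numbers are invented rather than taken from world95.

```python
def pairwise_complete(xs, ys):
    """Keep only the positions where both values are present (not None)."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]

def flag(p_value):
    """Two asterisks below .01, one below .05, nothing otherwise."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""

# Made-up columns with gaps (assumptions, not world95 data).
literacy = [29.0, None, 78.0, 93.0, 99.0, 61.0]
calories = [None, 2100.0, 2600.0, None, 3300.0, 2400.0]

# Only rows with both values count toward n for this pair of variables.
pairs = pairwise_complete(literacy, calories)
print("n for this cell:", len(pairs))

# Hypothetical p-values, just to show how the flags fall out.
for p in (0.0004, 0.03, 0.20):
    print(p, repr(flag(p)))
```

Because each pair of variables is filtered separately, a variable with many gaps shrinks n only in its own row and column, which matches the way n dropped to 75 in some cells while others kept 107 or 109.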