Uploaded by bartonpoulson on 22.05.2009

Transcript:

Hi, this is Bart Poulson and in this short tutorial, I'm gonna show how to calculate

correlations or several correlations in a matrix using SPSS, a statistical program that's

now known as PASW for "Predictive Analytic Software." I'm using version 17, but everything

I'm gonna do is identical in previous versions. Also, I'm doing this on my Mac, but the Windows

version works essentially exactly the same. I'm gonna use a data set called "world95"

that exists in the, uh, SPSS sample data so you can fish that one up if you want. This

is a, uh, data set that has information on 109 countries from Afghanistan through Zambia.

And it includes statistics like the population density, the major religion, literacy, and

so on. What I'm going to do is I'm gonna look at the association between several variables.

Uh, beginning with female life expectancy. And I'm gonna use some variables that I will

later use in a tutorial on multiple regressions. We can see what they look like individually

and then collectively. So the first thing you need to do to get the correlations is

to come up here to "Analyze" down to this one called "Correlate." We're gonna be doing

bivariate correlations and so this is the one we wanna select. The first variable I'm

gonna use is "Average female life expectancy" 'cuz in my regression, I'm gonna use that

as the outcome variable. I can just double click that and it goes over. Oops. I'm gonna

also use, um, literacy, the "Population who can read," the gross domestic-- the "Gross

domestic product per capita," "Daily calorie intake," and "Birth rate per 1000." Uh, so

I have 5 variables here total and all I'm gonna do is I'm gonna leave all of the defaults -- the

Pearson correlation, the two-tail test of significance, and flag significant correlations -- I'm

gonna leave all of those the way they are and just press "OK." And what I have is an

output window. Unlike a spreadsheet, like Excel, when you perform a procedure in SPSS,

uh, the data exists in a window that looks like a spreadsheet, but the results of the

analyses come in a separate window called an output window. And when you're performing,

uh, many statistical analyses this is a much more sensible procedure. The first thing up

here is what's called a syntax statement. It's a written record of the command. The

nice thing about it is you can go back and use that again later if you're so inclined.

This tells us that we performed correlations. This is the name of the active data set, "world95.sav."

And this is the correlation matrix. Notice that we have the names of the five variables

here down the side from "Average female life expectancy" to "Birth rate per 1000 people."

And we have the same 5 variables listed across the top right here. Now, in each cell,

that's each box, we have the correlation of the-- of the variable in the row and

the variable in the column. So, what you have here, you see there's a bunch with a "1"?

That just is a variable correlated with itself and that's a perfect correlation. Correlations

go from 0, which indicates no linear relationship, to 1, which indicates a perfect linear relationship,

meaning everything falls exactly on a regression line. Um, plus and minus are simply indications

of whether it's an uphill or downhill relationship, either a direct or an inverse association.
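
As a rough illustration of the range and sign just described, here is a minimal Python sketch of the Pearson correlation computed by hand. The function name pearson_r and the small data lists are made up for this example; this is not what SPSS runs internally, just the textbook formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
uphill = [2 * v + 1 for v in x]     # every point sits exactly on a rising line
downhill = [10 - 3 * v for v in x]  # every point sits exactly on a falling line

r_up = pearson_r(x, uphill)      # perfect direct relationship
r_down = pearson_r(x, downhill)  # perfect inverse relationship
r_self = pearson_r(x, x)         # a variable correlated with itself

print(r_up, r_down, r_self)
```

The sign only tells you the direction; the size of the number, ignoring the sign, tells you how close the points are to a straight line.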

Um, so the 1's down the diagonal are simply each variable correlated with itself and the

matrix is symmetrical on the diagonal. So you see, for instance, the .865 right here

is the same as the .865 right here. The -.862 right here is the same as the -.862 over

here. Um, that's because life expectancy and birth rate is the s-- the correlation

between the two is the same as the correlation between birthrate and life expectancy. It's--

the order doesn't matter. Now, what we have in each one of these cells is three numbers.
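
To make the "1's on the diagonal" and "order doesn't matter" points concrete, here is a small Python sketch that builds a correlation matrix for three variables. The short variable names and every value in the data are invented for illustration and are not taken from world95.sav.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical names and made-up values, NOT the real SPSS sample data.
data = {
    "lifeexpf": [44, 52, 67, 75, 80],
    "literacy": [29, 45, 80, 93, 99],
    "birth_rt": [53, 42, 25, 14, 11],
}
names = list(data)

# One cell per (row variable, column variable) pair.
matrix = {(a, b): pearson_r(data[a], data[b]) for a in names for b in names}

# Print the matrix with row labels, three decimals per cell.
for a in names:
    print(a.ljust(10), "  ".join(f"{matrix[(a, b)]:6.3f}" for b in names))
```

Each row-by-column cell matches its mirror image across the diagonal, and the diagonal itself is all 1's.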

Let's take this one, the relationship between female life expectancy and the percent of

people in the country who can read. The first number is the Pearson Correlation --

that is, the Pearson product moment correlation -- better known to most people as r. In this

one, it's .865 which is a very high positive value. So for countries with higher levels

of literacy, women also have higher average life expectancies. As we go dow-- oh,

the second number is the significance level or the P-value, and generally if this number

is less than .05 and all of these are, they're actually less than .001, then the, uh, correlation

is considered statistically significant, or you can consider it reliably different from

0. It's not exactly what it is, but it's close enough. And then this last number is the n,

is the-- is the number of countries that have data on both of these variables.
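
The n in each cell counts only the cases that have a value on both variables, which is often called pairwise deletion. Here is a minimal Python sketch of that idea, assuming missing values are stored as None. The function name pairwise_r_and_n and all of the numbers are made up for illustration.

```python
import math

def pairwise_r_and_n(xs, ys):
    """Return (r, n) using only the cases where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Made-up values; None marks a country with no data on that variable.
a = [70, 65, None, 80, 55, 75]
b = [90, 70, 60, 99, None, 95]
c = [3000, None, None, 3400, 2100, None]

r_ab, n_ab = pairwise_r_and_n(a, b)  # uses the cases with both a and b
r_ac, n_ac = pairwise_r_and_n(a, c)  # uses the cases with both a and c

print(n_ab, n_ac)
```

Because each pair of variables drops a different set of incomplete cases, n can differ from one cell of the matrix to the next.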

You can see it varies a little bit. Here it's 107, whereas here it's 109. Right here, only

75 countries had information on both of the variables. Um, but you're still able to get

the correct correlations that we need. Now, the important thing that I wanna point out

about this because I'm also gonna use this correlation matrix when I do multiple regression,

is that all of the associations are statistically significant. Every one of 'em has these two

asterisks next to it and what that means is that the correlation is significant at less

than .01. In fact, normally people would put three asterisks to indicate it's less than

.001. A single asterisk would indicate less than .05 which is the standard level of statistical

significance. So they are all statistically significant. All of these variables are highly

associated with each other, uh, not just in terms of P-values, but in terms of absolute

values. The smallest one is this correlation between literacy and GDP, and it's .552, which

by most standards is still a very large association. These ones down here with birth rate are negative

which means that a, again, the more children a woman has, eh, for each country, the lower

the life expectancy. Also, the lower the literacy rate. Also, the lower the GDP. Also, the lower

daily caloric intake. Uh, it probably has more to do with these countries seeing--

simply being less developed, having fewer resources and poorer health care. Anyhow,

this is a correlation matrix and it gives us the associations between each of these

variables, two variables at a time. In the next one I'm gonna look at multiple regression

which looks at the association between all of these variables collectively. Um, but that's

it for right now.
