Uploaded by bozemanbiology on 30.11.2012

Transcript:

Hi. It's Mr. Andersen and in this video I'm going to talk about statistics

for science. And the reason why statistics is important in science is that's what statistics

is. It's basically collecting, organizing, analyzing, interpreting and then presenting

data so that people can use it. And so we don't usually do a lot of statistics in high

school science and that's too bad because it's very important we understand what to

do with data once we've collected it. And there's a big push right now to improve statistics

knowledge of high school students. And the reason why is that they become college students

and they're eventually going to have to start working with what's called Big Data. And so

what is Big Data? Well the days of a scientist just sitting alone by themselves collecting

data are gone. Most science now is done by huge teams or groups or centers. And a lot

of it is crowd sourced and we're generating so much data right now that we actually have

to go through that. So what's an example? Well meteorology or the study of the weather

and climate on our planet, we just get more and more data, better data, but we have to

pore through this to make models and make predictions. Or genomics, which is the sequencing of

the genome. So looking at the actual letters in the nucleotides in DNA and RNA. And we

sequenced the human genome but now we're sequencing all these different organisms and so all that

data is pouring in and we have to look through it. Or this is a term I hadn't even heard

of before, connectomics, which is basically using intense magnetic resonance imaging to

look at neurons. Looking at the brain. And then modeling that, using computers to model

individual neurons and then grow that into like a virtual brain. So you can see over

the next 20 or 30 years we're going to need scientists who understand what to do with

big data. And to understand big data, let's start by going through kind of the basics

of statistics. And so when you're dealing with statistics, one big thing when we're

talking broad is the idea of what a population is. And so a population is big. And so a population

is going to be everything. So it could be like all of the students in a class. So that

could be a population. But it can get much bigger than that. And so when we're studying

the population, not to be confused with like a population that we study in ecology, the

population, all of the characteristics of that are going to be called parameters. And

so an example of one that we'll actually use in science is N. That's the population size.

But I said that the population is everything. And so it could also be like all of the stars

in the universe. Or it could be all of the planets in the universe. Or it could be not

only one scientific experiment but an infinite number of scientific experiments that you

could do. And so it really is everything when we're talking about the population. And so

if we go back to an example of a population, well in science what we can do is take a sample

of that. So this is the population and then this is a sample of the population. And we

move from the population, where we study parameters, and when we get to the sample we have what are

called statistics. And so statistics are going to be characteristics of a sample. And hopefully

that's a random sample. And so a question, a really good question at this point might

be, which is more important? Is the population important or is the sample important? In other

words, which one do we use more? And I used to think, you know, the population has to

be the most important thing. We want to know everything. We want to know all the outcomes.

We want to know what the universe looks like and in fact it's the wrong answer. The right

answer and the most important thing is the sample, because you can never know everything,

but you can know a sample of that. And if you have a good understanding of the statistics

we can make predictions about everything. Predictions about the population. And so everything

I'm going to talk about, I'm talking about the sample because that's what scientists

do. We can't do every conceivable experiment. We can't gather every conceivable piece of

data. We just have to work with what's called the sample and make sense of that. So let

me give you an example of this. This right here is, I remember reading there was a survey

and they asked scientists like what's the greatest scientific discovery of the last

100 years. So from 1900-2000. And I thought maybe it was going to be Einstein, relativity,

or quantum physics or all of those things. Actually the right answer, or the winner we'll

say was this guy. His name is Edwin Hubble. And you've probably heard of his name because

they named the Hubble space telescope after him. But you might not know what he did. And

so he sat here at the Mt. Wilson observatory and he looked at galaxies in the universe.

And what he found is that no matter where he looked in the universe, they seemed to

be shifted towards the red. So they were more red in color. What does that tell us? Well,

as objects move away from us, they get red-shifted. And so it told him that all of these galaxies

were moving away from us. In other words, everything in the universe is moving away.

And you can see that he just plotted that in a nice little scatter plot and then we

have a line of fit. And so did he measure all of the galaxies in the universe? No. But

he sampled, or he had a sample set of those. And from that we can make predictions and

what's the prediction that we make based on this? It's the idea of an expanding universe.

And the idea that all, since everything is expanding that means everything was together

at one point. And so this is that big bang theory. That idea that all of the universe

began at one singularity. And so let's get to some statistics. Let's actually get to

some numbers of the sample. And so let's go through these. The first one is going to be

the sample size. That's going to be the number of observations that you make. So that could

be the number in your sample group. In your random sample. Next we have what's called

an X bar or the mean. The mean and the average are going to be the exact same thing. So if

you know what an average is and how to figure it out, that's going to be the mean. Next

is the median. The median is simply going to be the midpoint of all the values in our data set.

And then finally we have a range. And so this is a sample set over here. So let's say in

the science lab this is some data that you collect. And so could you figure out these

four things: sample size, median, mean and range? Well let me walk you through it. So

the first thing we could do is the sample size. And so sample size or n, get used to

that letter n right here, sample size is just going to be the number of samples that we

made. And so in this set we have 1, 2, 3, 4, 5, 6, 7. And so our n value is going to

be 7. Let's go to the next one. What's the mean or what's the average? Well to figure

that out all you do is add up all of these quantities and you're going to divide it by

the number of quantities. And so if I add all these up together I get 35. If I divide

that by 7 which is the total number in my sample size I'm going to get a mean or average

of 5. How do you do the median? Or how do you find the midpoint? Well, you have to line

them up in order. So when I line it up in order basically what I can do is I can cross

it out from the sides. So I'll cross one out from the sides. I'll cross another one out

from the side and then we have the midpoint which is right here. So the median and the

mean in this case is going to equal 5. But you might think to yourself, what do I do

if it's not odd, that is, if it is even? In other words, what do I do here? Well I could knock

off 2 from each side and let me knock off another one from each side and now I have

5 and 6. So if this is our sample set, then our median is going to be the average between

5 and 6, or 5.5. Let's get to the range then. What's the range? The

range is going to be the difference between the extremes. And so this is the number 2

and this is the number 13. So this is my low and this is my high, then my range is simply

going to be 11 in this case. And so what are these? These are all simple statistics that

we can gather from a sample set. And again, it's a random sample from everything from

this big population. Last thing I want to leave you with is an idea that is sometimes

confusing to students and that's called degrees of freedom. And we refer to that as

n minus 1. And so what is n? Well, n remember is going to be the sample size and where does

the freedom come from? Well, I drew a flag right here to help you remember that.

So what does it mean? What is a degree of freedom? Well think of it like this. This

is the best way to understand it. So imagine I have these three numbers and they are going

to add up to ten. And so this is A + B + C equals 10. And I say choose a random number.

And let's make it easy by just choosing a whole number. Well you might say that this

is 3. And so I'm going to choose this to be 3. And did you notice I had total freedom?

I had a freedom in my choice as to what number I was going to choose to represent A. I had

total freedom here. That was fun. Let me get a little more freedom. So let's say I've got

to choose the next number B. I want to go crazy. Maybe I want the next one to be 13.

I could choose any number in the world. In other words I have freedom to choose what

that is. And so now this is fun. I have a lot of freedom. So let's go to the last one

then. So we're going to say that this plus this plus this equals 10. So I've got a constraint

here. It's got to equal 10. Well now we've got to choose C. Well what can C be? Well

all of a sudden I've lost my freedom here. In other words if this is 3, this is 13, this

has to be negative 6 if I want this to be 10 because this is 16 minus 6. That's got

to be 10. And so all of a sudden I lost my freedom. And so when we're talking about degrees

of freedom how do you figure that out? Well you take the number in your data set, in this

case it's going to be 3 and you're going to subtract one from that. And so in this case

I have 2 degrees of freedom, or there were two numbers for which I had a choice as to

what I was going to choose. And so this will be important in a couple of different ways.
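
The A + B + C = 10 example above can be sketched in a few lines of Python. This is only an illustration, not something from the video: the function name last_value is made up here, and the free choices 3 and 13 are the ones used in the transcript. Once the first two numbers are chosen freely, the constraint leaves exactly one possible value for the third.

```python
def last_value(free_choices, total):
    """Return the single value forced by the constraint that everything sums to total.

    Hypothetical helper for illustration only.
    """
    return total - sum(free_choices)


# A and B are chosen freely (the values used in the transcript).
free_choices = [3, 13]

# C is not a free choice: the constraint A + B + C = 10 fixes it.
c = last_value(free_choices, total=10)

# All three numbers together, and the degrees of freedom as n - 1.
values = free_choices + [c]
degrees_of_freedom = len(values) - 1

print("C must be:", c)
print("Degrees of freedom:", degrees_of_freedom)
```

Changing either free choice changes C, but C itself can never be picked independently, which is why only two of the three numbers count as free.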

Number 1 when we're figuring out standard deviation using n minus 1 or degrees of freedom,

we're going to get a better estimate of the population's standard deviation. And so you'll see

this again when we calculate standard deviation. And then when we start comparing data sets,

when we do a Chi Squared test, it's important that you understand what a degree of freedom

is. So if we have two different groups then we'd only have 1 degree of freedom. Or if

we have eight different groups or eight different choices then we have seven degrees of freedom.
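
As a rough sketch of where n minus 1 shows up, the Python below computes a standard deviation twice, once dividing by n and once dividing by n minus 1. The standard deviation formula itself is not given in this video, so the usual textbook form is assumed here, and the data values are hypothetical. The helper chi_squared_df is also made up for illustration; it uses the "number of groups minus 1" rule described above, which is the rule for a simple goodness-of-fit style comparison.

```python
import math
from statistics import pstdev, stdev

# Hypothetical sample; these values are not from the video.
sample = [5, 2, 13, 3, 5, 2, 5]

n = len(sample)
x_bar = sum(sample) / n

# Sum of squared distances from the mean.
squared_deviations = sum((x - x_bar) ** 2 for x in sample)

# Dividing by n treats the data as the whole population.
sd_dividing_by_n = math.sqrt(squared_deviations / n)

# Dividing by n - 1 (the degrees of freedom) treats the data as a sample.
sd_dividing_by_n_minus_1 = math.sqrt(squared_deviations / (n - 1))


def chi_squared_df(number_of_groups):
    """Degrees of freedom as groups minus 1 (hypothetical helper)."""
    return number_of_groups - 1


print("Dividing by n:    ", sd_dividing_by_n)
print("Dividing by n - 1:", sd_dividing_by_n_minus_1)
print("2 groups ->", chi_squared_df(2), "degree of freedom")
print("8 groups ->", chi_squared_df(8), "degrees of freedom")
```

The standard library functions pstdev and stdev give the same two results, so they can be used to check the hand calculation. The n minus 1 version is always a little larger, and the gap shrinks as the sample size grows.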

And so those are all statistics. Again they're parts of the sample set which is part of everything

and it allows us to give meaning to math. And what I mean, I learned so much math in

high school especially in algebra two, but I didn't always know like when am I going

to use this? Statistics is something I promise you that you are going to use. If you move

on to college and hopefully get some kind of advanced degree or find an awesome job,

statistics will come back and it will find you at some point so you might as well learn

it now. So this is an intro on statistics and I hope that was helpful.
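
As a recap of the four sample statistics walked through earlier (sample size, mean, median and range), here is a minimal Python sketch. The transcript does not list the actual numbers shown on screen, so both data sets below are hypothetical: the first is chosen only so that it matches the results stated in the video (7 values that add up to 35, with a low of 2 and a high of 13), and the second is an even-sized set whose two middle values are 5 and 6.

```python
from statistics import mean, median

# Hypothetical data, chosen to match the results stated in the transcript.
sample = [5, 2, 13, 3, 5, 2, 5]

n = len(sample)                          # sample size: number of observations
x_bar = mean(sample)                     # mean: sum of the values divided by n
midpoint = median(sample)                # median: middle value once sorted
value_range = max(sample) - min(sample)  # range: high minus low

# Hypothetical even-sized set: there is no single middle value,
# so the median is the average of the two middle values (5 and 6).
even_sample = [2, 2, 3, 5, 6, 7, 9, 13]
even_midpoint = median(even_sample)

print("n =", n)
print("mean =", x_bar)
print("median =", midpoint)
print("range =", value_range)
print("median of even-sized set =", even_midpoint)
```

Sorting is handled inside median, which is the same "line them up in order and cross off from both sides" step described in the video.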