Statistics for Science


Uploaded by bozemanbiology on 30.11.2012

Transcript:



Hi. It's Mr. Andersen and in this video I'm going to talk about statistics
for science. And the reason why statistics is important in science comes down to what statistics
is: basically collecting, organizing, analyzing, interpreting and then presenting
data so that people can use it. And so we don't usually do a lot of statistics in high
school science and that's too bad because it's very important we understand what to
do with data once we've collected it. And there's a big push right now to improve statistics
knowledge of high school students. And the reason why is that they become college students
and they're eventually going to have to start working with what's called Big Data. And so
what is Big Data? Well, the days of a lonely scientist just sitting by themselves collecting
data are gone. Most science now is done by huge teams or groups or centers. And a lot
of it is crowd sourced and we're generating so much data right now that we actually have
to go through that. So what's an example? Well meteorology or the study of the weather
and climate on our planet, we just get more and more data, better data, but we have to
pore through this to make models and make predictions. Or genomics, which is sequencing
the genome. So looking at the actual letters, the nucleotides, in DNA and RNA. And we
sequenced the human genome but now we're sequencing all these different organisms and so all that
data is pouring in and we have to look through it. Or this is a term I hadn't even heard
of before, connectomics, which is basically using intense magnetic resonance imaging to
look at neurons. Looking at the brain. And then modeling that, using computers to model
individual neurons and then grow that into like a virtual brain. So you can see over
the next 20 or 30 years we're going to need scientists who understand what to do with
big data. And to understand big data, let's start by going through kind of the basics
of statistics. And so when you're dealing with statistics, one big idea, speaking
broadly, is what a population is. And so a population is big. And so a population
is going to be everything. So it could be like all of the students in a class. So that
could be a population. But it can get much bigger than that. And so when we're studying
the population, not to be confused with a population that we study in ecology, all of
the characteristics of that population are going to be called parameters. And
so an example of one that we'll actually use in science is N. That's the population size.
But I said that the population is everything. And so it could also be like all of the stars
in the universe. Or it could be all of the planets in the universe. Or it could be not
only one scientific experiment but an infinite number of scientific experiments that you
could do. And so it really is everything when we're talking about the population. And so
if we go back to an example of a population, well in science what we can do is take a sample
of that. So this is the population and then this is a sample of the population. And we
move from the population, where we study parameters, to the sample, where we have what are
called statistics. And so statistics are going to be characteristics of a sample. And hopefully
that's a random sample. And so a question, a really good question at this point might
be, which is more important? Is the population important or is the sample important? In other
words, which one do we use more? And I used to think, you know, the population has to
be the most important thing. We want to know everything. We want to know all the outcomes.
We want to know what the universe looks like. But in fact that's the wrong answer. The right
answer and the most important thing is the sample, because you can never know everything,
but you can know a sample of that. And if we have a good understanding of the statistics
we can make predictions about everything. Predictions about the population. And so everything
I'm going to talk about, I'm talking about the sample because that's what scientists
do. We can't do every conceivable experiment. We can't gather every conceivable piece of
data. We just have to work with what's called the sample and make sense of that. So let
me give you an example of this. I remember reading there was a survey
where they asked scientists what's the greatest scientific discovery of the last
100 years. So from 1900 to 2000. And I thought maybe it was going to be Einstein, relativity,
or quantum physics or one of those things. Actually the right answer, or the winner we'll
say, was this guy. His name is Edwin Hubble. And you've probably heard his name because
they named the Hubble space telescope after him. But you might not know what he did. And
so he sat here at the Mt. Wilson observatory and he looked at galaxies in the universe.
And what he found is that no matter where he looked in the universe, they seemed to
be shifted towards the red. So they were more red in color. What does that tell us? Well,
as objects move away from us, they get red-shifted. And so it told him that all of these galaxies
were moving away from us. In other words, everything in the universe is moving away.
And you can see that he just plotted that in a nice little scatter plot and then we
have a line of fit. And so did he measure all of the galaxies in the universe? No. But
he sampled, or he had a sample set of those. And from that we can make predictions and
what's the prediction that we make based on this? It's the idea of an expanding universe.
And the idea is that, since everything is expanding, everything must have been together
at one point. And so this is the big bang theory. That idea that all of the universe
began at one singularity. And so let's get to some statistics. Let's actually get to
some numbers of the sample. And so let's go through these. The first one is going to be
the sample size. That's going to be the number of observations that you make. So that could
be the number in your sample group. In your random sample. Next we have what's called
x bar, or the mean. The mean and the average are going to be the exact same thing. So if
you know what an average is and how to figure it out, that's going to be the mean. Next
is the median. The median is simply going to be the midpoint of our data once it's in order.
And then finally we have the range. And so this is a sample set over here. So let's say in
the science lab this is some data that you collect. And so could you figure out these
four things: sample size, median, mean and range? Well let me walk you through it. So
the first thing we could do is the sample size. And so sample size or n, get used to
that letter n right here, sample size is just going to be the number of observations that we
made. And so in this set we have 1, 2, 3, 4, 5, 6, 7. And so our n value is going to
be 7. Let's go to the next one. What's the mean or what's the average? Well to figure
that out all you do is add up all of these quantities and you're going to divide that by
the number of quantities. And so if I add all these up together I get 35. If I divide
that by 7, which is my sample size, I'm going to get a mean or average
of 5. How do you do the median? Or how do you find the midpoint? Well, you have to line
them up in order. So when I line it up in order basically what I can do is I can cross
it out from the sides. So I'll cross one out from the sides. I'll cross another one out
from the side and then we have the midpoint which is right here. So the median and the
mean in this case are both going to equal 5. But you might think to yourself, what do I do
if the number of values is even instead of odd? In other words, what do I do here? Well I could knock
off 2 from each side and let me knock off another one from each side and now I have
5 and 6. So if this is our sample set, then our median is going to be the average of
5 and 6, so the median is going to be 5.5. Let's get to the range then. What's the range? The
range is going to be the difference between the extremes. And so this is the number 2
and this is the number 13. So if this is my low and this is my high, then my range is simply
going to be 11 in this case. And so what are these? These are all simple statistics that
we can gather from a sample set. And again, it's a random sample from everything from
this big population. Last thing I want to leave you with is an idea that is sometimes
confusing to students and that's called degrees of freedom. And we refer to that as
n minus 1. And so what is n? Well, n, remember, is going to be the sample size. And where does
the freedom come from? Well, I drew a flag right here to help you remember that.
So what does it mean? What is a degree of freedom? Well think of it like this. This
is the best way to understand it. So imagine I have these three numbers and they are going
to add up to ten. And so this is A + B + C equals 10. And I say choose a random number.
And let's make it easy by just choosing a whole number. Well you might say that this
is 3. And so I'm going to choose this to be 3. And did you notice I had total freedom?
I had freedom in my choice as to what number I was going to choose to represent A. I had
total freedom here. That was fun. Let me get a little more freedom. So let's say I've got
to choose the next number B. I want to go crazy. Maybe I want the next one to be 13.
I could choose any number in the world. In other words I have freedom to choose what
that is. And so now this is fun. I have a lot of freedom. So let's go to the last one
then. So we're going to say that this plus this plus this equals 10. So I've got a constraint
here. It's got to equal 10. Well now we've got to choose C. Well what can C be? Well
all of a sudden I've lost my freedom here. In other words if this is 3, this is 13, this
has to be negative 6 if I want this to be 10 because this is 16 minus 6. That's got
to be 10. And so all of a sudden I lost my freedom. And so when we're talking about degrees
of freedom how do you figure that out? Well you take the number of values in your data set, in this
case it's going to be 3 and you're going to subtract one from that. And so in this case
I have 2 degrees of freedom: there were two numbers for which I had a choice as to
what I was going to choose. And so this will be important in a couple of different ways.
Number 1, when we're figuring out standard deviation, using n minus 1, or degrees of freedom,
instead of n gives us a better estimate of the population's standard deviation. And so you'll see
this again when we calculate standard deviation. And then, number 2, when we start comparing data sets,
when we do a chi-squared test, it's important that you understand what a degree of freedom
is. So if we have two different groups then we'd only have 1 degree of freedom. Or if
we have eight different groups or eight different choices then we have seven degrees of freedom.
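
The degrees-of-freedom idea above can be sketched in a few lines of Python. This is only an illustration, not something shown in the video: the function names `last_value` and `degrees_of_freedom` are made up for this sketch, and the numbers 3, 13 and 10 are the ones from the A + B + C example.

```python
def last_value(total, free_choices):
    """Return the one value that is forced once the others are chosen.

    Hypothetical helper: with a fixed total, the last number has no freedom.
    """
    return total - sum(free_choices)


def degrees_of_freedom(n):
    """Degrees of freedom as described in the video: n minus 1."""
    return n - 1


# A and B are free choices; C is forced by the constraint A + B + C = 10.
a, b = 3, 13
c = last_value(10, [a, b])
print(c)                      # -6
print(degrees_of_freedom(3))  # 2 (three numbers, two free choices)

# The same rule applied to the number of groups in a chi-squared test.
print(degrees_of_freedom(2))  # 1
print(degrees_of_freedom(8))  # 7
```

Whatever A and B are chosen to be, `last_value` leaves exactly one possible C, which is why only two of the three numbers count as free.
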
And so those are all statistics. Again, they're parts of the sample set, which is part of everything,
and they allow us to give meaning to math. What I mean is, I learned so much math in
high school, especially in Algebra 2, but I didn't always know when I was going
to use it. Statistics is something I promise you that you are going to use. If you move
on to college and hopefully get some kind of advanced degree or find an awesome job,
statistics will come back and it will find you at some point so you might as well learn
it now. So this is an intro on statistics and I hope that was helpful.
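
As a recap, the four sample statistics from the worked example (n, mean, median and range) can be computed with Python's standard `statistics` module. This is a minimal sketch: the transcript does not list the actual numbers shown on screen, so the values below are hypothetical, chosen only to reproduce the results stated in the video (n = 7, sum 35, mean 5, median 5, low 2, high 13). The function name `describe` is also made up for this example.

```python
import statistics


def describe(sample):
    """Return sample size, mean, median and range for a list of numbers."""
    return {
        "n": len(sample),                     # number of observations
        "mean": statistics.mean(sample),      # sum divided by n
        "median": statistics.median(sample),  # midpoint of the sorted values
        "range": max(sample) - min(sample),   # high minus low
    }


# Hypothetical odd-sized sample: values invented to match the video's results.
odd_sample = [5, 2, 13, 3, 5, 2, 5]
print(describe(odd_sample))

# Hypothetical even-sized sample: the two middle values are 5 and 6,
# so the median is their average.
even_sample = [2, 3, 4, 5, 6, 8, 10, 13]
print(describe(even_sample)["median"])  # 5.5
```

`statistics.median` sorts the values itself and averages the two middle ones when n is even, so the "cross off from both sides" step does not have to be done by hand.
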