Math Seminar: Why People Trust Statistics


Uploaded by GRCCtv on 08.12.2011

Transcript:
>> SO, I THOUGHT, "THIS IS COOL."
SO, THIS IS COOL.
I HOPE YOU HAVE A GREAT TIME WITH-- STEVE, IT'S ALL YOURS.
>> THANK YOU.
WELL, THANKS FOR COMING.
I'M INTERESTED IN WHOEVER THE SECOND SPEAKER IS.
HE'S GOT MY PICTURE PLUS SOMEONE ELSE UP HERE.
I DON'T KNOW... (laughing)
>> ON THE LEFT, WE HAVE (indistinct), ON THE RIGHT, WE HAVE LAPLACE.
>> OH, OKAY... ALL RIGHT.
UM...
SO, NOT VERY MANY PEOPLE LAUGHED.
THAT MUST MEAN IT'S GETTING LATE IN THE DAY.
OKAY, UH, LET'S SEE.
IT'S INTERESTING BECAUSE SOME OF YOU TOLD ME THAT--
AND MAYBE THE INSTRUCTOR'S HERE, I DON'T KNOW--
BUT ONE OF THE 215 INSTRUCTORS TOLD THEIR CLASS THAT IF THEY COME,
THEY GET EXTRA CREDIT.
AND I TALKED TO A COUPLE OF YOU ALREADY AND IT'S LIKE, "I WANNA SEE THIS."
AND IT'S NOT BECAUSE OF ME, IT'S BECAUSE OF THE TOPIC.
OKAY, I THINK A LOT OF PEOPLE STRUGGLE WITH WHAT THIS WHOLE THING
ABOUT CENTRAL LIMIT THEOREM IS.
I STARTED TEACHING STATS, UM, DIFFERENT TIMES AS AN ADJUNCT
AND IT DIDN'T TAKE VERY LONG BEFORE I FIGURED OUT
THAT A LOT OF STUDENTS REALLY, REALLY STRUGGLED WITH UNDERSTANDING--
COME IN-- THE BASICS OF "WHY IN THE WORLD STATISTICS WORKS,"
AND FOR SOME PEOPLE, THIS WHOLE THING ABOUT CENTRAL LIMIT THEOREM
IS JUST ANOTHER CHAPTER IN THE BOOK.
BUT IT REALLY IS CENTRAL TO THE WHOLE CONCEPT
OF WHAT IN THE WORLD IS GOING ON WITH STATISTICS.
UM...
I'M GONNA GIVE ABOUT TWO MINUTES' WORTH OF HISTORY ON STATISTICS,
IF I CAN FIND IT HERE.
TRIED TO PULL SOME THINGS UP HERE.
HOPE YOU CAN READ RUSSIAN.
(audience chuckling)
WENT-- I WENT OUT LOOKING, ON THE INTERNET,
FOR THE HISTORY OF THE CENTRAL LIMIT THEOREM.
I FOUND THIS DOCUMENT THAT SOMEBODY WROTE UP, PROBABLY FOR CLASS.
IT'S-- IF YOU LOOK AT THE LINK ON IT, IT'S GOT .FI,
SO IT'S SOMEBODY AT THE UNIVERSITY IN FINLAND, I WOULD PRESUME.
THIS GUY-- MAX METHER.
BUT WHAT I'M INTERESTED IN HERE-- A COUPLE OF QUICK THINGS.
ONE IS JUST SOME NAMES.
OKAY?
LOT OF NAMES THAT, IF YOU HAVE A HISTORY OF MATHEMATICS,
WILL BE REALLY FAMILIAR TO YOU.
MAYBE YOU DON'T KNOW WHAT THESE PEOPLE DID,
BUT THESE ARE NAMES THAT YOU WOULD HEAR-- LAPLACE.
UH, WHERE ELSE COULD WE GO?
POISSON.
UM, DIRICHLET, BESSEL, CAUCHY, CHEBYSHEV--
LOT OF THINGS GOING ON WITH HIM WITH STATISTICS.
MARKOV... UM...
LIONA-- I KNEW I WAS GONNA TRY TO SAY THIS-- LYAPUNOV.
THAT'S ONE I DON'T KNOW.
AND THEN, SOME MORE RECENT NAMES HERE.
LINDEBERG, FELLER, LÉVY.
UM... THE HISTORY OF STATISTICS IS ONLY, UH...
THIS GOES BACK TO THE 1600s, I THINK.
16, 1700s.
SO, STATISTICS, AS A FIELD, IS NOT VERY OLD.
BASED ON PROBABILITY, WHICH CAME ALONG RIGHT BEFORE THAT.
SO...
AND PEOPLE HAVE ONLY REALLY BEEN USING STATISTICS
SINCE EARLY IN THE 20th CENTURY.
WE'RE EARLY IN THE 21st CENTURY,
SO PEOPLE HAVE ONLY BEEN USING STATISTICS FOR ABOUT 100 YEARS,
AND THE BIGGEST ISSUE WAS "WHY IN THE WORLD COULD YOU TRUST IT?"
WHY WOULD YOU MAKE-- YOU KNOW, IN THE EQUIVALENT OF OUR DOLLARS TODAY,
WHY WOULD YOU MAKE BILLION DOLLAR DECISIONS
ON A BUNCH OF NUMBERS THAT--
YOU KNOW, THE REPUTATION FOR STATISTICS
IS YOU CAN MAKE IT SAY ANYTHING YOU WANT...
WHICH, IF YOU'RE GOOD, YOU CAN.
(laughing) THERE'S SOME TRUTH TO THAT.
BUT ANYWAY, THESE PEOPLE SPENT ALL-- QUITE A BIT OF TIME--
THIS-- ALL-- EVERYTHING THAT'S HERE OCCURRED
OVER A PERIOD OF ABOUT 250 YEARS.
AND SPECIFICALLY, WHAT HE'S LISTED HERE
ARE TOPICS THAT PEOPLE
HAVE ADDRESSED, FIRMING UP THE CENTRAL LIMIT THEOREM.
LAPLACE KINDA STUMBLED ACROSS IT...
AND THE COMMENT HE MAKES
IS THAT HE ALMOST HAD THE WHOLE THING FIGURED OUT.
FROM A, UM-- FROM A-- I'M LOSING THE WORD HERE--
FROM A REAL LIFE PERSPECTIVE, HE HAD IT FIGURED OUT.
BUT FROM A THEORETICAL PERSPECTIVE, HE DIDN'T.
AND IT TOOK PEOPLE QUITE A WHILE
TO COME ALONG BEHIND HIM AND FIRM PIECES UP.
BUT PEOPLE END UP TRUSTING STATISTICS BECAUSE OF WHAT THEY DID.
AND SO, WHAT I'M GOING TO DO
IS I'M GOING TO EXPLAIN ALL THESE FORMULAS.
IF I DO THAT, NOBODY'S GONNA STAY MORE THAN TWO MINUTES.
(laughing) SO...
NO, I JUST WANTED TO LET YOU SEE THERE'S SOME PRETTY SERIOUS MATH BEHIND ALL THIS.
OKAY?
BUT IF YOU ARE IN AN INTRODUCTORY, ELEMENTARY STATS CLASS,
YOU DON'T CARE ABOUT THAT STUFF.
OKAY?
YOU HEAR PEOPLE SAYING, "THIS THING'S REALLY IMPORTANT."
OKAY?
IN FACT, SOME OF THESE PEOPLE...
WE'LL SEE-- WELL, IF YOU'RE FAMILIAR WITH IT,
YOU UNDERSTAND WHAT CENTRAL LIMIT THEOREM DOES A LITTLE BIT, MAYBE.
MAYBE YOU DON'T-- THAT'S WHY YOU'RE HERE.
BUT, UM...
I WAS GONNA SHOW SOMETHING ELSE.
WHERE'D IT GO?
OH, MAYBE THAT'S IT.
UM...
THEY NAMED THIS "THE CENTRAL LIMIT THEOREM"
BECAUSE OF ITS IMPORTANCE, OKAY?
NOT FOR THE NORMAL DISTRIBUTION THAT YOU GET OF SAMPLE MEANS.
IT REALLY WAS CALLED "THE CENTRAL LIMIT THEOREM"
BECAUSE IT IS SO CENTRAL TO THE FIELD OF STATISTICS.
SO, THAT'S A BIT OF THE HISTORY OF WHAT'S GOING ON.
WHEN I STARTED TEACHING STATISTICS, OFF AND ON,
I PICKED UP REAL QUICKLY
THAT STUDENTS STRUGGLED WITH THIS IDEA,
NOT JUST OF THE CENTRAL LIMIT THEOREM,
BUT THIS WHOLE IDEA OF DISTRIBUTION OF SAMPLE MEANS.
AND, IN FACT, THAT'S WHAT I PUT IN THE ABSTRACT FOR THIS--
IS THAT PEOPLE GET LOST WHEN THEY HEAR THAT OR READ IT.
AND SO, I STARTED TRYING TO FIGURE OUT,
"HOW IN THE WORLD DO YOU EXPLAIN THIS THING?"
AND I WENT LOOKING ON THE WEB FOR WHAT SOMEBODY HAD DONE
THAT MADE CENTRAL LIMIT THEOREM REALLY CLEAR TO PEOPLE...
AND I DIDN'T FIND ANYTHING THAT I LIKED.
WHICH MEANS, I DIDN'T FIND ANYTHING I UNDERSTOOD.
SO, I STARTED PLAYING AROUND WITH SOME THINGS.
I HAD AN IDEA, INITIALLY,
OF DOING SOMETHING LIKE THE LOTTERY BALLS THAT POP UP.
I ACTUALLY WENT INTO MICROSOFT WORD, CREATED SOME CIRCLES,
PUT NUMBERS ON 'EM, AND THEN TRIED TO FIGURE OUT HOW TO ANIMATE THEM.
AND IT DIED RIGHT THERE, 'CAUSE WORD IS NOT THAT KIND OF A PRODUCT.
BUT I PLAYED-- I DON'T KNOW HOW I ENDED UP IN EXCEL,
BUT THROUGH A BUNCH OF TRIAL AND ERROR, SOMETHING HAPPENED--
AND YOU'LL SEE IT HERE IN A LITTLE BIT THAT--
THAT I THOUGHT, "AH-HA!
"THIS, I THINK, WILL DO THE JOB."
SO, ABOUT FOUR OR FIVE YEARS AGO, I FINISHED UP A SHELL OF THIS THING.
I SPENT ABOUT A YEAR, A YEAR AND A HALF, PLAYING WITH IT.
AND THEN, IT HAS SAT-- I'VE USED IT IN A FEW CLASSES, A LITTLE BIT--
BUT WHEN JOHN ASKED FOR SOMEBODY WHO MIGHT WANT TO PRESENT SOMETHING,
I THOUGHT ABOUT THIS.
AND SO, IN THE LAST THREE OR FOUR WEEKS,
I'VE PUT A LOT OF TIME INTO FIRMING THIS THING UP A LITTLE BIT.
THE BIGGEST DRAWBACK IS THAT IT'S IN EXCEL.
I'M AN OLD PROGRAMMER.
I CAN-- I CODED SOME OF THIS IN VISUAL BASIC, BUT IT'S NOT STABLE.
IT'S EASY TO BREAK.
AND SO, I'VE GOT MY RADAR OUT FOR SOMEBODY
WHO CAN PROGRAM THIS IN A LANGUAGE
THAT'LL PRESERVE THE CHARACTERISTICS HERE,
SO THAT I CAN PUT IT OUT ON THE WEB
AND PEOPLE COULD USE IT WITHOUT IT FALLING APART.
I MADE A LITTLE CHANGE A COUPLE OF DAYS AGO,
AND TWO HOURS AGO, IT WAS BROKEN.
SO... (chuckling)
I FOUND IT AND FIXED IT.
BUT IT'S NOT REAL STABLE, IN SOME SENSES.
BUT IT DOES A LOT OF STUFF.
OKAY?
AND WHAT I WANT TO DO IS SHOW YOU A LITTLE BIT
OF WHAT I SEE GOING ON HERE,
AND YOU MAY COME UP WITH SOME IDEAS
ON HOW YOU THINK SOMEBODY COULD USE IT, TOO.
SO, UM... I GUESS WE'LL JUST GO.
SO...
THE FIRST THING I DID IS...
THE FIRST THING I GOT HERE-- IF YOU LOOK AT THIS,
I'VE GOT SOME NUMBERS IN PINK HERE.
ANY OF THESE NUMBERS ARE NUMBERS THAT YOU CAN CHANGE.
AND YOU CAN USE THIS
TO ILLUSTRATE A LOT OF DIFFERENT THINGS, OKAY?
IT'S GOT ONE TECHNICAL--
I DON'T KNOW IF YOU CALL IT A LIMITATION--
BUT ONE TECHNICAL FEATURE THAT'S BUILT INTO WHAT MAKES THIS WORK,
AND THAT IS THAT THIS SAMPLE SIZE NUMBER IS PRETTY CRITICAL
TO HOW THIS THING WORKS, AND I'LL SHOW YOU A LITTLE OF THAT.
I'M NOT GONNA GET INTO THE PROGRAMMING.
I DON'T KNOW THAT VERY MANY OF US ARE INTERESTED IN PROGRAMMING,
BUT WE CAN LOOK AT THAT LATER, IF YOU WANT TO.
BUT I WANNA SHOW YOU WHAT THIS THING CAN DO.
THIS-- I JUST SAVED THIS SOMEWHERE ALONG THE LINE WHEN I ACTUALLY USED IT HERE.
SO, I HAVE A TABLE HERE
THAT HAS 5,000 VALUES IN IT BETWEEN 20 AND 245.
AND THIS IS MY POPULATION.
OKAY?
AND IF YOU LOOK, WITH A SAMPLE SIZE OF 12,
EACH OF THESE ROWS HAS 12 NUMBERS IN IT.
THAT'S THE ONLY THING THAT'S NOT, UM...
EXPECTED, MAYBE, AS YOU LOOK THROUGH THIS,
IS THAT THIS WHOLE APPLICATION IS BUILT ON THE SAMPLE SIZE.
SO, IF I'M GONNA CHANGE THE SAMPLE SIZE-- WHICH I'M GONNA SHOW YOU--
IT'LL CHANGE THIS TABLE.
BUT, UH, WHAT DO I HAVE HERE?
I'VE GOT 5,000 ENTRIES IN THE POPULATION OF THE--
OR THE, UH...
SAMPLE SIZE OF 12,
WHICH MEANS SOMEWHERE DOWN HERE ABOUT--
ONE THING THAT'S REALLY NICE IS COMPUTERS HAVE GOTTEN A LOT FASTER
SINCE I STARTED WORKING ON THIS.
COMPUTERS AND MEMORY, OKAY, HAVE REALLY INCREASED
SINCE I PUT THIS THING TOGETHER.
SO, NOW IT WORKS THE WAY I'VE ALWAYS WANTED IT TO.
FIVE, SIX YEARS AGO, IT WAS SLOW
AND THERE WERE A LOT OF LIMITATIONS ON THE NUMBERS
THAT YOU COULD PUT IN THERE.
BUT WITH IMPROVEMENTS IN SOFTWARE NOW, IT'S PRETTY NICE.
SO, WE GO DOWN HERE, WHAT, 416, 417 ROWS,
AND YOU CAN EVEN USE IT
TO SHOW PEOPLE 12 IS NOT AN EVEN DIVISOR OF 5,000.
AND THE REASON I POINT THAT OUT IS IT TOOK A BIT OF DOING
TO GET THAT RESIDUE BUILT INTO THE PROGRAM RIGHT, SO...
(laughing) BUT IT'S RIGHT.
I'VE COUNTED SOME OF THESE NUMBERS ON SMALLER POPULATIONS.
THEY'RE RIGHT.
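The row count read off the sheet (416, then a 417th row) is integer division with a remainder. A minimal Python sketch, assuming the table lays the 5,000 population values out 12 per row and puts the leftovers in one final partial row:

```python
# Lay a population of 5,000 values out in rows of 12, as the spreadsheet table does.
POPULATION_SIZE = 5_000
SAMPLE_SIZE = 12

full_rows, leftover = divmod(POPULATION_SIZE, SAMPLE_SIZE)
# A partial last row is needed whenever the division leaves a remainder.
total_rows = full_rows + (1 if leftover else 0)

print(full_rows, leftover, total_rows)  # 416 8 417
```

So the last row holds only 8 values, which is the "residue" the program has to handle.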
SO, I'VE GOT THIS POPULATION, AND IT'S LIKE, "SO?
"WHO CARES?"
UM, LET'S DO THIS.
THIS IS KINDA THE COMMAND CENTER OVER HERE.
WHAT I'M SAYING...
WHAT I'VE GOT SET UP HERE IS, I'VE GOT A SAMPLE SIZE OF 12, OKAY?
AND LET'S GO OUT AND USE THIS THE WAY SOMEBODY WOULD TO INTRODUCE THIS THING.
THIS MAXIMUM NUMBER OF SAMPLES--
WHEN I PUT 10,000 IN HERE, ORIGINALLY,
THE PROGRAM WOULD BLOW UP SOMEWHERE AROUND THERE,
BECAUSE OF THE MEMORY LIMITS ON MY COMPUTER
AND IN EXCEL AND THAT KIND OF STUFF.
I DON'T KNOW HOW BIG YOU CAN MAKE IT NOW.
I DID SOMETHING THE OTHER DAY WITH--
I DON'T REMEMBER WHAT I PUT-- 20,000 OR SOMETHING,
AND IT WORKED FINE, SO...
YOU CAN PLAY AROUND WITH THAT.
BUT I'M GONNA TAKE THIS WHOLE THING
AND I'M GONNA GO OUT AND GET ONE SAMPLE OF--
ONE RANDOM SAMPLE-- AND THAT'S KEY, OKAY?
IF-- WHEN I TEACH MY STATISTICS CLASS,
I TELL THEM THE FIRST REALLY IMPORTANT WORD IN STATISTICS IS "RANDOM."
OKAY, IF IT'S NOT RANDOM, IT'S GARBAGE.
AND SO, EXCEL HAS A RANDOM NUMBER GENERATOR, WHICH IS WHAT--
I GUESS THAT'S WHAT DREW ME TO EXCEL IN THE FIRST PLACE.
THAT IT HAS A RANDOM NUMBER GENERATOR.
SO, I'M GONNA GO OUT AND I'M GONNA GET A SAMPLE
OF 12 NUMBERS OUT OF THESE 5,000.
OKAY?
AND...
WE'LL COME BACK AND TALK ABOUT SOME OF THE NUMBERS THAT ARE--
WELL, NO, LET ME DO THIS FIRST,
BEFORE WE START FILLING IT UP WITH OTHER NUMBERS.
ONCE I BUILD THIS POPULATION, THERE ARE SOME NUMBERS THAT ARE LOCKED IN.
I'M GONNA PLAY WITH THIS POPULATION QUITE A BIT,
BUT THESE NUMBERS ARE NOT GONNA CHANGE BECAUSE IT'S A FIXED POPULATION.
IT HAS A MEAN, OKAY?
AND AS YOU GET INTO STATISTICS,
YOU FIND OUT THAT MOST OF THE TIME, IN REAL LIFE,
YOU DON'T KNOW WHAT THAT IS.
OKAY?
AND BECAUSE IN A REAL LIFE SITUATION,
A LOT OF TIMES YOU DON'T KNOW THE MEAN,
PEOPLE THREW OUT THE WHOLE IDEA OF EVEN TRUSTING STATISTICS,
WHEN IT FIRST CAME ABOUT.
OKAY?
THIS IS A POPULATION WHERE WE DO KNOW THE MEAN.
AND IT'S NICE, BECAUSE EXCEL CALCULATES ALL THIS STUFF, OKAY?
I DON'T HAVE TO GO OUT AND CALCULATE IT.
BUT HERE'S THE MEAN, HERE'S THE STANDARD DEVIATION.
AND THEN, THERE'S THIS LITTLE THING CALLED "STANDARD ERROR" OVER HERE...
WHICH TURNS OUT TO BE THE MOST POWERFUL NUMBER ON THE SHEET.
OKAY?
AND WE'LL TALK ABOUT THAT.
UM, SO, YOU SEE THESE NUMBERS.
THEY'RE NOT GONNA CHANGE FOR A WHILE, UNTIL I CHANGE THE POPULATION.
SO, I'M GONNA GO OUT AND I'M GONNA GET THIS ONE SAMPLE.
OH, IT DIDN'T-- WE DIDN'T SEE IT HAPPEN.
WHAT-- WHAT-- WE WILL, AFTER A WHILE,
WHEN I GET SOME MORE.
UM... AND OKAY.
SO, THE MEAN OF THE POPULATION IS 132.
THIS MEAN DOESN'T LOOK LIKE IT'S CLOSE TO THAT... OKAY?
SO, THE QUESTION AUTOMATICALLY CAME TO EVERYBODY
WHO FIRST LOOKED AT STATISTICS... "WHAT'S THIS TELL ME?"
YOU KNOW, IS THIS ANYTHING THAT'S GONNA BE USEFUL?
AND, FOR A LONG TIME, EVERYBODY'S ANSWER WAS "NO."
WELL, SO IF I GET ONE MORE SAMPLE...
NOW, I'VE GOT ENOUGH FOR EXCEL TO BUILD A HISTOGRAM.
STILL NOT REAL PRODUCTIVE.
BUT WE'RE GONNA SEE SOME THINGS HAPPEN HERE.
I COULD-- I COULD GET A LOT OF SAMPLES ONE AT A TIME, OKAY?
BUT YOU DIDN'T COME TO SEE THAT, SO LET'S BUMP THIS UP HERE.
LET ME GET EIGHT SAMPLES AT ONCE, AND BUMP THIS UP TO--
THIS COUNT UP TO TEN.
THERE WE GO.
OKAY?
THE FIRST THING THAT REALLY CAUGHT MY ATTENTION
WHEN I PLAYED AROUND IN EXCEL WITH THIS
IS THAT, AS IT TAKES LOTS OF SAMPLES,
THEY ALL APPEAR ON THAT POPULATION.
AND YOU CAN SEE IT FLASH.
AND THAT CAPTURED MY ATTENTION,
BECAUSE I THOUGHT IT WOULD BE HELPFUL FOR MY STUDENTS
TO SEE THAT, "OOO, SOMETHING IS REALLY GOING ON,"
AND SEE WHAT THE IDEA OF A SAMPLE IS.
THE OTHER THING IS THIS.
WE'VE GOT A TOTAL OF TEN SAMPLES RIGHT NOW AND...
I KNOW WHAT'S GOING ON.
BUT A LOT OF FIRST-TIME STUDENTS DON'T.
OKAY?
SO, LET'S BUMP THIS UP A LITTLE BIT.
I'VE GOT TEN-- LET'S MAKE IT 100.
WHOOPS.
SO, I'M GONNA GO OUT AND GET 90 SAMPLES.
AND YOU CAN WATCH IT TAKE SOME SAMPLES THIS--
I'VE GOT A POPULATION OF 1,000, SO IT'S WAY DOWN THERE, SO...
BUT THESE ARE RANDOM.
AND IT REBUILDS THE THING.
AND ALL OF A SUDDEN, WE SEE SOMETHING STARTING TO HAPPEN HERE.
AND IF I...
DO YOU KNOW WHAT CENTRAL LIMIT THEOREM SAYS?
IN LAYMAN'S TERMS?
IT BASICALLY SAYS NO MATTER WHAT YOUR POPULATION IS LIKE,
IF YOU TAKE A BUNCH OF SAMPLES--
IF YOU TAKE LOTS AND LOTS AND LOTS OF SAMPLES--
THE MEANS OF THOSE SAMPLES ARE GONNA CLUMP AROUND THE MIDDLE.
AND THEY'RE ACTUALLY GONNA CLUMP AROUND THE MEAN OF THE ENTIRE POPULATION.
AND WE'LL TALK A LITTLE BIT ABOUT WHY THAT HAPPENS,
BUT THAT TURNS OUT TO BE INCREDIBLY IMPORTANT, OKAY?
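The same experiment can be run outside Excel. A minimal Python sketch, assuming a population like the one in the demo (5,000 whole numbers spread uniformly between 20 and 245; the values themselves are invented here, and each sample is drawn without replacement, which may differ from the spreadsheet), takes 1,000 random samples of size 12 and compares the mean of the sample means with the population mean:

```python
import random
import statistics

random.seed(2011)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the spreadsheet population:
# 5,000 whole numbers drawn uniformly between 20 and 245.
population = [random.randint(20, 245) for _ in range(5_000)]
pop_mean = statistics.mean(population)

SAMPLE_SIZE = 12
NUM_SAMPLES = 1_000

# "Column A": the mean of each random sample of 12 values.
sample_means = [
    statistics.mean(random.sample(population, SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

print(f"population mean:      {pop_mean:.1f}")
print(f"mean of sample means: {statistics.mean(sample_means):.1f}")
print(f"smallest / largest sample mean: "
      f"{min(sample_means):.1f} / {max(sample_means):.1f}")
```

Individual sample means land far from the population mean, but their average sits very close to it.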
SO, LET'S BUMP THIS UP SOME MORE.
WE'VE GOT 100 OVER HERE, SO LET'S MAKE IT 1,000.
SO, I'LL GO GET 900 MORE SAMPLES.
THIS IS ONE REASON IT'S REALLY NICE TO HAVE MORE COMPUTER POWER
THAN WHEN I FIRST STARTED ON THIS,
BECAUSE 900 SAMPLES TOOK ABOUT HALF AN HOUR
THE FIRST FEW TIMES I TRIED IT.
(laughing) IT'S LIKE, "NO!
"DON'T DO THAT!"
BUT NOW, WE CAN DO IT--
IT DOESN'T TAKE TOO LONG TO PULL OFF 900 RANDOM SAMPLES.
AND...
WE'LL LOOK A LITTLE BIT MORE HERE AT WHAT'S GOING ON.
AND YOU CAN WATCH-- THERE'S A COUNTER UP THERE.
YOU CAN SEE WHAT'S GOING ON.
ANOTHER NICE THING ABOUT COMPUTERS IS, WHILE THIS IS WORKING,
I CAN GO PULL UP SOMETHING ELSE I WANTED TO PULL UP.
DU-DU-DU-DUH... HERE.
HERE WE GO.
I'LL COME BACK TO THAT IN A BIT.
WE'LL GO BACK TO THIS.
IT'S STILL CHUGGING ALONG.
WHERE ARE WE?
>> I HAVE A QUESTION ABOUT YOUR CHART. >> OKAY.
>> ON THE GRAPH, WHAT ARE THE AXISES-- ER, THE AXES?
>> WHAT ARE THE AXES HERE?
UM...
WHEN THIS STOPS, I'LL-- WE'LL TALK THROUGH THAT.
IT'S A GOOD QUESTION.
I'M TRYING TO CONDENSE THIS INTO A SHORT TIME,
SO I'M NOT COVERING EVERYTHING, BUT THAT'S A VERY GOOD QUESTION HERE.
YEAH, I'M ASSUMING YOU UNDERSTAND MAYBE MORE THAN...
THAN WHAT YOU DO, AS THIS IS GOING HERE.
IF YOU LOOK AT THIS, OKAY?
THIS IS A FREQUENCY DISTRIBUTION...
OH!
I HAVE NOTES, BUT I DIDN'T LOOK AT THEM.
IF YOU LOOK AT THIS SHEET OVER HERE,
HERE ARE ALL THE SAMPLES WE'VE BEEN TAKING.
AND THIS HAS TO DO WITH THE ANSWER TO HIS QUESTION.
SO, THE VERY FIRST ONE WE TOOK...
THE VERY FIRST SAMPLE WE TOOK-- COUNT ONE--
HAD A MEAN OF 158.3, OKAY?
AND IT'S A SAMPLE OF SIZE 12, SO THEY GO ACROSS THERE.
AND AS WE GO THROUGH HERE NOW,
EVERY ONE OF THESE SAMPLES HAS A MEAN
AND A STANDARD DEVIATION.
AND THIS COLUMN "A" RIGHT HERE IS WHAT WE MEAN
WHEN WE TALK ABOUT A DISTRIBUTION OF SAMPLE MEANS.
AND A LOT OF TIMES, THAT IS WHAT--
THAT'S KIND OF THE IDEA THAT PEOPLE MISS...
GET CONFUSED ABOUT, OKAY?
THIS COLUMN "A" HERE
BECOMES A POPULATION ON ITS OWN.
AND THE CENTRAL LIMIT THEOREM--
WELL, WE'LL COME BACK AND TALK ABOUT WHAT THAT MEANS A LITTLE BIT.
IF I GO BACK OVER HERE, NOTICE SOMETHING.
THE SMALLEST NUMBER IN THAT COLUMN "A" THERE ON THE SAMPLES
IS 75.4...
WHICH IS A WAYS AWAY FROM 132.4.
RIGHT?
AND THE LARGEST NUMBER'S 181.5, AND THAT'S A WAYS AWAY.
BUT THIS VALUE RIGHT HERE, OKAY?
IS THE MEAN OF COLUMN "A,"
AS WE'VE ACCUMULATED 1,000 SAMPLES.
AND THAT'S WHERE THE POWER OF THE CENTRAL LIMIT THEOREM SHOWS UP.
BECAUSE LOOK AT THE MEAN OF 1,000 OF THESE SAMPLES,
COMPARED TO THE MEAN OF THE POPULATION.
LOOK AT WHAT'S HAPPENING.
SEE THAT?
I'LL GO BACK THROUGH ANOTHER ROUND OF THESE THINGS IN A LITTLE BIT, UM...
AND LET US WATCH WHAT'S GOING ON.
BUT THERE'S A-- THIS IS PRETTY POWERFUL.
THE FACT THAT YOU CAN TAKE A LOT OF SAMPLES,
AND THE MEANS OF THOSE SAMPLES
CLUMP AROUND THE MEAN OF THE POPULATION.
THAT'S HALF OF THE POWER OF THE CENTRAL LIMIT THEOREM.
THE OTHER HALF IS WHAT MAKES THAT HAPPEN.
OKAY, AND WE'LL TALK ABOUT THAT HERE IN JUST A MINUTE.
BUT THIS IS THE DISTRIBUTION OF MEANS.
THIS IS THE ACTUAL FREQUENCY DISTRIBUTION OF THOSE MEANS.
AND IF WE KEPT GOING-- YOU KNOW, IF WE TOOK--
WELL, I'VE GOT AN ARTIFICIAL LIMIT
OF 10,000 SAMPLES ON HERE.
WE COULD BUMP THAT UP.
WE COULD TAKE LOTS OF THEM, OKAY?
THE MORE WE TOOK...
THESE TWO NUMBERS ARE NOT GOING TO CHANGE--
ER, THE BLUE ONES AREN'T GONNA CHANGE.
THE GREEN ONES HERE-- THIS IS NOT GOING TO CHANGE VERY MUCH NOW,
BECAUSE IT'S ALMOST IDENTICAL TO THE POPULATION MEAN
BY THE TIME WE TOOK 1,000 SAMPLES.
OKAY?
AND THE THING THAT MAKES THAT HAPPEN, OKAY?
IS THIS STANDARD ERROR.
BECAUSE THE STANDARD ERROR-- DOES THAT SHOW UP?
I JUST ADDED THIS FORMULA TODAY HERE.
EITHER YOU KNOW THE STANDARD DEVIATION OF THE SAMPLE, WHICH IS "S,"
OR THE POPULATION, WHICH IS "SIGMA."
AND IF YOU TAKE THE STANDARD DEVIATION OF EITHER ONE OF THOSE
AND DIVIDE BY THE SAMPLE SIZE, WHICH IS "N"--
SO, IN THIS CASE, WE'RE-- ER, DIVIDE BY THE SQUARE ROOT OF THAT.
IN THIS CASE, WE'RE DIVIDING BY THE SQUARE ROOT OF "N," OKAY?
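Written out as a formula, with the rounded population standard deviation of about 65 that comes up later in the talk plugged in (the rounding is why this lands slightly above the roughly 18.5 on the sheet):

```latex
\mathrm{SE} = \frac{\sigma}{\sqrt{n}}
\quad\text{(or } \frac{s}{\sqrt{n}} \text{ when only the sample standard deviation } s \text{ is known)}

\mathrm{SE}_{n=12} \approx \frac{65}{\sqrt{12}} \approx \frac{65}{3.46} \approx 18.8
```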
WHAT THAT DOES IS IT SAYS,
DOWN HERE ON THIS FREQUENCY DISTRIBUTION,
WITH A STANDARD ERROR OF SOMEWHERE AROUND 18 AND A HALF.
IF WE GO 18 AND A HALF ON EITHER SIDE OF 132.4--
SO, DOWN TO ABOUT 114 ON ONE END
AND UP TO 150 ON THE OTHER END--
THE EMPIRICAL RULE SAYS WE'RE GONNA FIND ABOUT TWO-THIRDS OF THE POPULATION
OF THOSE SAMPLE MEANS IN THAT RANGE.
DOES THAT MAKE SENSE?
AND FURTHERMORE, 95 PERCENT OF 'EM
ARE GONNA BE WITHIN TWO STANDARD ERRORS OF THE MEAN OF THE--
THE THING THAT HAPPENS IS, IS THAT THIS STANDARD ERROR OF THE POPULATION
BECOMES THIS STANDARD DEVIATION OF THE DISTRIBUTION OF THE MEANS.
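The "two-thirds" and "95 percent" figures come from the empirical (68-95-99.7) rule, which applies because the sample means are approximately normal. Chebyshev's inequality is a much weaker guarantee that holds for any distribution. A small Python sketch comparing the two around the sheet's mean of 132.4 and standard error of about 18.5 (values read off the demo):

```python
import math

def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

def chebyshev_bound(k: float) -> float:
    """Chebyshev's guaranteed minimum fraction within k standard deviations (any distribution)."""
    return max(0.0, 1 - 1 / k**2)

MEAN, SE = 132.4, 18.5  # values read off the demo sheet
for k in (1, 2, 3):
    low, high = MEAN - k * SE, MEAN + k * SE
    print(f"{k} SE: {low:6.1f} to {high:6.1f}  "
          f"normal {normal_coverage(k):.1%}  Chebyshev >= {chebyshev_bound(k):.1%}")
```

Within one standard error, the normal curve gives about 68 percent; Chebyshev alone promises nothing there, and only 75 percent within two.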
AND IN FACT... WE'LL SEE THIS AGAIN.
BUT MY TAKE-- IF I CAN FIND IT HERE--
MY TAKE ON THE CENTRAL LIMIT THEOREM IS THIS...
THIS.
"THERE ISN'T MUCH WIGGLE ROOM FOR THE MEAN OF A SAMPLE,
"MOST OF THE TIME.
"AND THE SQUARE ROOT OF 'N' ENFORCES THAT LIMIT ON THE WIGGLE."
THAT'S MY VERSION OF THE CENTRAL LIMIT THEOREM.
NOW, THAT VASTLY UNDERSTATES THE CENTRAL LIMIT THEOREM, OKAY?
MY NAME DOES NOT BELONG WITH THOSE OTHER GUYS--
I JUST WANTED TO HAVE 'EM ALL COME UP IN THE SAME SEMINAR...
WITH MINE.
(laughing) BUT THAT'S WHAT--
THIS IS WHAT'S GOING ON.
AND THE IMPLICATION OF THAT IS THIS...
IF YOU'RE DOING--
WELL, LET'S GO BACK HERE.
IF YOU'RE DOING ACTUAL RESEARCH
ON A NEW DRUG OR A MODEL OF A CAR OR...
WHATEVER, OKAY?
IF YOU'RE DOING RESEARCH
AND YOU DON'T LIKE HOW MUCH VARIATION--
"VARIANCE," YOU'VE HEARD THAT WORD, RIGHT?
IF YOU DON'T LIKE HOW MUCH VARIATION THERE IS AMONG THE MEAN OF--
YOU KNOW, HOW MUCH YOUR SAMPLES COULD VARY.
THERE'S A REAL EASY FIX TO THAT.
OKAY?
YOU CAN COME UP HERE AND SAY,
"IF YOU LOOK AT STANDARD ERROR, SQUARE ROOT OF 'N'"--
IF YOU LOOK ON THERE--
SQUARE ROOT OF "N"
IS WHAT CONTROLS THE SIZE OF THE WIGGLE.
SO, IF THERE'S TOO MUCH WIGGLE WITH A SAMPLE SIZE OF 12,
LET'S MAKE IT 25,
BECAUSE I KNOW THE SQUARE ROOT OF 25.
AND WHAT THAT'S GONNA DO
IS IT'S GONNA TAKE THIS STANDARD DEVIATION OF OUR ORIGINAL POPULATION,
WHICH IS ALMOST 65,
DIVIDE IT BY FIVE, WHICH IS THE SQUARE ROOT OF 25,
AND WHEN I COME OUT HERE NOW AND RESIZE THIS SAMPLE,
TWO THINGS ARE GOING TO HAPPEN.
ONE IS THESE ROWS ARE GONNA CHANGE
FROM 12 TO 25 NUMBERS WIDE.
BUT IT'S STILL GONNA BE THE SAME POPULATION.
THAT WAS ONE OF THE FEATURES I BUILT IN, IN THE LAST MONTH.
SO, I'M GONNA GO OUT AND RESIZE THIS SAMPLE.
AS SOON AS I DO THAT, ONE OF THE THINGS THAT MEANS
IS THAT ALL MY ORIGINAL RESULTS ARE NO GOOD ANYMORE.
BECAUSE I'VE GOT A DIFFERENT SAMPLE SIZE.
AND THE CENTRAL LIMIT THEOREM REQUIRES
THAT I TALK ABOUT THE SAME SAMPLE SIZE EVERY TIME.
SO, I HAVE TO SCRAP ALL THOSE ORIGINAL ONES WE DID,
AND WE'LL GO OUT AND GET SOME MORE.
WE DON'T HAVE TO GET... 900 OF 'EM.
LET ME RUN ABOUT 200.
THAT'LL GO FAIRLY QUICKLY.
AND YOU'LL SEE NOW...
WE HAVE A NEW--
OH, I SHOULD'VE HAD US NOTE WHAT THESE NUMBERS WERE.
THE ORIGINAL-- DID ANYBODY CAPTURE THAT?
WHAT THAT ORIGINAL STANDARD ERROR WAS?
EIGHTEEN POINT SOMETHING?
NOTICE WHAT IT IS NOW.
I INCREASED THE SIZE OF MY DENOMINATOR ON THE CALCULATION
FOR STANDARD ERROR.
AND EVERYBODY KNOWS THAT IF YOU INCREASE THE SIZE OF THE DENOMINATOR,
THE WHOLE NUMBER GETS SMALLER.
SO NOW-- I SAID IT WAS GONNA BE ABOUT 13-SOMETHING,
SOMETHING LIKE THAT-- AND THERE IT IS.
NOW, I HAVE LESS WIGGLE ROOM.
AND IF I TAKE SAMPLES OF SIZE 25 OUT OF THIS POPULATION,
I CAN EXPECT THAT, ABOUT TWO-THIRDS OF THE TIME,
THE MEAN OF THAT SAMPLE-- EVEN IF I JUST TAKE ONE--
IS GONNA BE SOMEWHERE
WITHIN ABOUT 13 UNITS EITHER SIDE OF 132.4.
AND IF THAT'S NOT BIG ENOUGH, TAKE A BIGGER SAMPLE.
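"Take a bigger sample" can be made quantitative by running the standard error formula backwards: to reach a target standard error, you need n = (σ / target)². A minimal Python sketch, assuming the rounded population standard deviation of 65 from the demo:

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def sample_size_for(sigma: float, target_se: float) -> int:
    """Smallest n whose standard error is at most target_se."""
    return math.ceil((sigma / target_se) ** 2)

SIGMA = 65.0  # "almost 65", the demo population's standard deviation (rounded)
for n in (12, 25, 100):
    print(f"n = {n:3d}: SE = {standard_error(SIGMA, n):.2f}")

# Halving the wiggle (13 down to 6.5) takes four times the sample size.
print(sample_size_for(SIGMA, 6.5))  # 100
```

Because of the square root, each halving of the wiggle costs a fourfold larger sample.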
SO, ONE OF THE THINGS--
IF WE HAD TIME, WE'D LOOK AT SOME OF THE THINGS,
BUT ONE OF THE THINGS THAT MEANS
IS IF YOU WANT TO SAMPLE PEOPLE'S PREFERENCE IN A PRESIDENTIAL ELECTION,
IF YOU'VE LISTENED TO ANY OF THE NEWS REPORTS THAT COME OUT--
REPUTABLE NEWS ORGANIZATIONS ARE GETTING TO WHERE THEY TELL YOU
A LOT OF THE BASIC STATISTICAL INFO TO VALIDATE--
FOR PEOPLE LIKE US THAT PICK THEM APART-- WHAT THEY'RE DOING.
AND IT TURNS OUT THAT IF YOU WANT TO BE WITHIN, I THINK--
YOU CAN BACK-CHECK-- I THINK IF YOU WANT TO BE WITHIN FOUR POINTS, PLUS OR MINUS,
OF WHAT THE POPULATION REALLY THINKS, YOU NEED A SAMPLE SIZE OF 601.
AND THAT SAMPLE SIZE OF 601
IS DIRECTLY RELATED TO THIS IDEA OF STANDARD ERROR,
AND IT'S GONNA CUT DOWN THE WIGGLE IN THE MEAN OF YOUR SAMPLE
AND HOLD IT WITHIN FOUR POINTS, PLUS OR MINUS,
OF WHAT THE REAL POPULATION MEAN IS... AS LONG AS PEOPLE ARE HONEST.
AND AS LONG AS THE SAMPLE IS RANDOM.
EXTREMELY CRITICAL.
IN FACT, THE MONEY IN STATISTICAL SAMPLING GOES INTO MAKING SURE
THE SAMPLES ARE RANDOM.
THAT'S WHERE THE MONEY IS, OKAY?
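The 601 figure can be back-checked as suggested. Using the standard worst-case formula for a polled proportion at 95 percent confidence (an assumption; real polls also adjust for weighting and design effects), 601 respondents give a margin of error of about plus or minus 4 points, and plus or minus 3 points takes about 1,068. A minimal Python sketch:

```python
import math

Z_95 = 1.96  # z-value for 95% confidence

def margin_of_error(n: int, p: float = 0.5) -> float:
    """95% margin of error for a sample proportion (worst case at p = 0.5)."""
    return Z_95 * math.sqrt(p * (1 - p) / n)

def sample_size_for_margin(margin: float, p: float = 0.5) -> int:
    """Smallest n whose 95% margin of error is at most `margin`."""
    return math.ceil(Z_95**2 * p * (1 - p) / margin**2)

print(sample_size_for_margin(0.04))   # 601  (plus or minus 4 points)
print(sample_size_for_margin(0.03))   # 1068 (plus or minus 3 points)
print(f"{margin_of_error(601):.2%}")  # 4.00%
```

The square root of n is doing the same job here as in the spreadsheet: shrinking the margin by a quarter takes nearly twice as many respondents.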
UM, THIS MAKING SENSE?
I'M...
I COULD GO ON FOR A WHILE HERE.
(laughing) THIS HAS BEEN FUN TO DO THIS.
UM, IF WE LOOK AT WHAT HAPPEN--
WELL, YOU TELL ME NOW, WHAT'S GONNA HAPPEN IF I TAKE MORE SAMPLES?
>> YOUR STANDARD ERROR IS GONNA GET SMALLER.
>> WHOA, WHOA, WHOA-- IS HE RIGHT? >> NO.
(audience laughing)
>> THE STANDARD ERROR'S NOT GONNA CHANGE.
>> THAT'S IF YOU INCREASE THE SAMPLE SIZE, IT WILL.
>> RIGHT-- IF WE CHANGE--
IF WE INCREASE THE SAMPLE SIZE, IT'LL CHANGE.
BUT WHAT'S GONNA HAPPEN IS I TAKE MORE AND MORE SAMPLES.
>> (indistinct) ALREADY FOUND AND GET CLOSER TO THE MEAN?
>> IT'S GONNA FILL IN THIS CHART, OKAY?
WE'RE STILL GONNA HAVE THESE OUTLIERS OUT HERE.
REMEMBER, MY TAKE ON THIS SAID "MOST OF THE TIME."
OKAY, THAT'S THERE BY DESIGN,
BECAUSE THE EMPIRICAL RULE SAYS "99.7" PERCENT FALL WITHIN THREE STANDARD DEVIATIONS ON A NORMAL POPULATION AND...
THESE SAMPLE MEANS WILL BE APPROXIMATELY NORMAL, OKAY?
BUT THERE WILL STILL BE SOME OUTLIERS.
BUT...
ONE OF THE IMPLICATIONS OF THAT IS, IF YOU TAKE ONE SAMPLE
WHERE YOU KNOW SOMETHING ABOUT THE POPULATION,
AND YOU GET A BIZARRE RESULT THAT'S OUT HERE,
IT COULD JUST BE THAT OUTLIER...
BUT ISN'T IT MORE LIKELY THERE'S SOMETHING ELSE GOING ON?
WHICH IS CALLED "HYPOTHESIS TESTING."
THAT'S EXACTLY WHAT HYPOTHESIS TESTING IS TRYING TO ADDRESS:
"OH, I GOT SOMETHING WAY OUT HERE.
"I WONDER WHY?"
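The "is it just an outlier?" question is usually answered with a z-score and a p-value. A minimal Python sketch, assuming the sample means are normally distributed and using a hypothetical single sample mean of 175 against the demo's population mean of 132.4 and standard error of about 13 (n = 25):

```python
import math

def two_sided_p_value(sample_mean: float, pop_mean: float, se: float) -> float:
    """Chance of a sample mean at least this far from pop_mean, assuming a normal sampling distribution."""
    z = (sample_mean - pop_mean) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical single sample: mean 175 when the population mean is 132.4
# and the standard error (n = 25) is about 13.
p = two_sided_p_value(175.0, 132.4, 13.0)
print(f"z = {(175.0 - 132.4) / 13.0:.2f}, p = {p:.4f}")
```

A result more than three standard errors out would happen by pure chance only about once in a thousand samples, which is why "something else is going on" becomes the more likely explanation.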
WE COULD GO OFF IN THAT DIRECTION, BUT WE WON'T.
OKAY, SO LET'S--
ONE OF THE QUESTIONS PEOPLE HAD IS, "WELL, OKAY.
"YOU'RE EXERCISING A LOT OF CONTROL OVER THIS.
"EXCEL'S RANDOM NUMBER GENERATOR, BASE ONE,
"GIVES YOU A UNIFORM DISTRIBUTION."
WHICH MEANS THIS-- IF I GO OVER HERE
AND I HIT F9 TO RECALCULATE...
UM...
NOW...
YOU HAVE TO UNDERSTAND SOME THINGS ABOUT GRAPHING WITH STATISTICS, TOO.
THIS LOOKS LIKE IT'S WILDLY VARYING...
BUT IT'S NOT.
AND IT'S BACK TO YOUR QUESTION ABOUT WHAT ARE THE AXES.
BECAUSE LOOK AT THIS.
THIS IS JUST GOING FROM 440 TO 550.
SO, IT'S EXAGGERATING THE GAPS HERE.
IF WE WENT ALL THE WAY DOWN TO ZERO, THIS WOULD BE ALMOST FLAT.
OKAY?
SO, YOU GOTTA-- AND THAT'S ONE THING YOU GOTTA PAY ATTENTION TO,
BECAUSE WHEN YOU READ STATISTICS IN A NEWSPAPER OR ONLINE OR SOMETHING,
IF SOMEBODY HAS AN AGENDA,
THEY'LL CHOP OFF THE BASE LINE FOR A GRAPH IN A HEARTBEAT...
TO FIRE UP PEOPLE'S EMOTIONS.
IT WORKS, OKAY?
SO, YOU GOTTA WATCH FOR THAT.
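How much a cropped axis exaggerates a difference is simple arithmetic: what the eye compares is the bar heights above the baseline. A small Python sketch with two made-up bin counts on an axis running from 440 to 550, as in the demo chart:

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """How many times taller bar b looks than bar a when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)

low_bar, high_bar = 450.0, 540.0  # hypothetical bin counts on a 440-550 axis
print(f"true ratio:      {apparent_ratio(low_bar, high_bar):.2f}")        # 1.20
print(f"ratio on screen: {apparent_ratio(low_bar, high_bar, 440.0):.2f}")  # 10.00
```

A 20 percent difference is drawn as a tenfold one, which is exactly the effect someone with an agenda is counting on.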
BUT... YOU COME ALONG, TAKE A LOOK AT THIS THING.
IT'S LIKE, OKAY-- IF--
AND THAT'S ONE THING I HAVE NOT BUILT ANY CONTROLS FOR IN THIS:
THE KIND OF CHART THAT EXCEL PUTS OUT.
IT JUST-- EXCEL DOES 'EM.
IT'S LIKE, "OH, THAT'S NICE!
"I'LL LET THEM DO IT."
SO, I HAVEN'T DONE ANYTHING WITH THAT.
THERE ARE A LOT OF IMPROVEMENTS THAT COULD BE MADE TO THIS THING.
BUT IF WE GO BACK, ONE OF THE EARLY ARGUMENTS
AGAINST STATISTICS
WAS IT WORKS IF YOU HAVE NICE, CLEAN POPULATIONS, OKAY?
OKAY, SO LET'S WATCH WHAT HAPPENS HERE.
SUPPOSE I SCRAP THIS POPULATION NOW, OKAY?
AND I SAY-- I'LL SKIP A NORMAL ONE, JUST FOR THE SAKE OF TIME,
BUT LET'S SAY I BUILD A REALLY SKEWED POPULATION.
AND THERE'S A COOL EXAMPLE, I'LL SHOW YOU IN A FEW MINUTES,
OF WHAT YOU CAN DO WITH THIS.
SO, THIS IS A BRAND NEW POPULATION STILL WITHIN THESE LIMITS.
OH, LET ME SHOW YOU SOMETH--
ANOTHER FEATURE OF THIS, WHILE WE'RE AT IT.
LET ME CHANGE THIS TO 20.3
AND 245.8,
AND GO OUT AND GET A NEW RIGHT-SKEWED POPULATION.
WATCH WHAT HAPPENS HERE.
NOW, IT'S GOT THE DECIMAL PLACES IN THERE,
WITH THE SAME DEGREE OF ACCURACY THAT WE LOOKED AT HERE,
AND THESE GO OUT ONE ADDITIONAL DECIMAL PLACE,
WHICH IS THE USUAL ROUNDING CONVENTION IN STATISTICS
WHEN YOU'RE CALCULATING MEANS AND THINGS LIKE THAT.
YOU GO OUT ONE MORE DECIMAL PLACE THAN IN YOUR RAW DATA.
SO, WE'VE GOT THIS POPULATION NOW,
AND IF YOU GO OUT AND LOOK AT WHAT THIS POPULATION LOOKS LIKE,
IT REALLY IS SKEWED.
OKAY?
SO, YOU KNOW, ONE OF THE FIRST QUESTIONS PEOPLE STARTED ASKING,
"WELL, SURELY THIS THING ISN'T GONNA HAPPEN."
IF I STILL TAKE SAMPLES OF SIZE 25...
THE DISTRIBUTION OF MEANS IS GONNA BE SKEWED,
JUST LIKE THE POPULATION,
BECAUSE IT'S SUPPOSED TO REFLECT THE POPULATION, RIGHT?
HOPE I DIDN'T FIRE OFF 10,000 HERE.
NOPE, JUST 200.
DUH-DUH-DUH-DUH-DUH...
LOOK WHAT'S HAPPENED.
AND, YOU KNOW, FOR THE MATHEMATICIANS
WHO WERE STUDYING THE CENTRAL LIMIT THEOREM
AND THEY WERE PROVING THINGS, THEY KNEW WHAT WAS GONNA HAPPEN.
BUT GENERAL BUSINESSPEOPLE,
WHO WERE THE FIRST ONES TO REALLY GET A HOLD
OF STATISTICS BACK IN THE EARLY 1900s, LATE 1800s,
THIS KIND OF RESULT...
TENDED TO SELL THEM ON USING STATISTICS.
NOT VERY MANY OF 'EM...
BUT A FEW COMPANIES OUT THERE SAW SOME COMPETITIVE ADVANTAGE
TO DOING SOME ANALYSIS ON SOME OF THEIR DATA.
AND WHEN THEY SAW THIS,
THAT YOU CAN START WITH...
A REALLY SKEWED POPULATION.
AND THE CENTRAL LIMIT THEOREM SAYS,
"I DON'T CARE WHAT YOUR ORIGINAL POPULATION IS,
"AS LONG AS THE SAMPLING IS RANDOM, AND THERE ARE A FEW OTHER CONDITIONS."
BUT AS LONG AS THE SAMPLING IS RANDOM, PRIMARILY,
THE DISTRIBUTION OF SAMPLE MEANS IS GONNA BE APPROXIMATELY NORMAL.
PERIOD.
AND THAT WAS WHAT LAID THE FOUNDATION FOR THE FIELD OF STATISTICS.
BECAUSE PEOPLE COULD COUNT ON--
'CAUSE WHAT THIS MEANS IS...
YOU DON'T HAVE TO TAKE ALL THE SAMPLES.
JUST A FEW WILL DO.
AND IF YOU MEET CERTAIN CONDITIONS,
ONE SAMPLE OF A POPULATION IS ENOUGH.
AND "N" DOESN'T HAVE TO BE VERY LARGE.
OKAY?
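The skewed-population demo can be sketched the same way. For a mean of n independent draws, skewness shrinks by a factor of √n, so with n = 25 the sample means should carry roughly a fifth of the population's skew. A minimal Python sketch, assuming an exponential-shaped population starting at 20.3 and redrawn whenever a value overshoots 245.8 (the demo's actual generator is not shown, so this population is hypothetical):

```python
import random
import statistics

random.seed(42)

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical right-skewed population between 20.3 and 245.8:
# an exponential tail starting at 20.3, redrawn whenever it overshoots 245.8.
population = []
while len(population) < 5_000:
    x = 20.3 + random.expovariate(1 / 40)
    if x <= 245.8:
        population.append(round(x, 1))

SAMPLE_SIZE = 25
sample_means = [
    statistics.fmean(random.sample(population, SAMPLE_SIZE)) for _ in range(2_000)
]

print(f"population skewness:  {skewness(population):.2f}")
print(f"sample-mean skewness: {skewness(sample_means):.2f}")
```

The population is strongly lopsided; the histogram of its sample means is close to symmetric.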
THERE'S A LOT OF THINGS WE COULD GET INTO THERE, BUT...
BUT THAT'S WHAT'S GOING ON.
AND THAT'S A REASON
WHY PEOPLE TRUST STATISTICS.
AT LEAST PEOPLE WHO UNDERSTAND WHAT'S GOING ON...
IS THAT YOU CAN HAVE ALL THESE VARIED POPULATIONS,
YOU CAN MONKEY WITH 'EM A LITTLE BIT,
BUT WHEN YOU TAKE A LOT OF SAMPLES
AND YOU TRACK THE SAMPLE MEAN,
YOU'RE STILL GONNA GET THIS WIDE-- IF YOU LOOK--
IF WE HAD TIME, IF YOU GO LOOK AT THESE SAMPLES HERE,
THERE'S A LOT OF VARIATION IN HERE.
LOOK AT THESE STANDARD DEVIATIONS.
THERE'S STILL A LOT OF STANDARD-- LOT OF VARIETY INSIDE THOSE SAMPLES.
BUT THIS COLUMN "A" IS CONTROLLED BY THE SQUARE ROOT OF "N"
AND... THE ORIGINAL...
STANDARD DEVIATION OF THE POPULATION.
THOSE TWO NUMBERS LOCK IN WHAT CAN HAPPEN TO THE MIDDLE OF THIS GRAPH.
SO, WHEN HE ASKED ABOUT THE NUMBERS ON HERE,
THERE'S A LOT OF VARIETY, BUT LOOK-- IT DOESN'T GO VERY FAR.
LOOK AT MY RANGE-- 225-- FROM 20 TO 245--
LOOK WHERE MOST OF THE DATA IS.
FROM 65 TO 90-- ER, 95.
MOST OF THE DATA IS IN THERE.
AND THAT'S REALLY WHAT THE...
CENTRAL LIMIT THEOREM'S ALL ABOUT.
NOW, LET ME TAKE A COUPLE MINUTES.
I'LL SHOW YOU ONE OR TWO THINGS YOU CAN DO WITH THIS.
YOU CAN PLAY WITH THIS.
LET'S-- LET ME CUT THIS DOWN, SO IT DOESN'T TAKE SO LONG.
BUT SUPPOSE YOU WANNA DO THIS.
FOR RIGHT NOW, I'LL DO THIS.
WE DON'T NEED TO WORRY ABOUT THIS RIGHT NOW.
OH, I'LL LEAVE THAT 200.
ANYBODY TELL ME WHAT I'M GONNA DO RIGHT HERE?
CAN YOU SEE WHERE I'M GOING WITH THIS?
>> SIX-SIDED DICE?
>> THIS IS-- I'M GONNA SIMULATE ROLLING A SIX-SIDED DIE-- DICE--
DIE, 1,000 TIMES.
YOU SEE THAT?
SO...
IT MEANS I GOTTA HIT "NEW POPULATION."
AND THERE WE ARE.
THERE IS...
THERE'S A SET OF 1,000 ROLLS OF A FAIR DIE.
AND WE KNOW IT'S-- OOO!
I DIDN'T CHANGE-- DID ANYBODY CATCH-- I MADE A MISTAKE.
GUESS WHAT?
I'VE GOT "CALCULATE" TURNED OFF,
BUT IF WE GO OUT AND LOOK AT THESE, GUESS WHAT?
IF YOU CAN PULL THAT OFF, YOU CAN WIN A LOT OF MONEY.
(audience laughing) RIGHT?
THIS AIN'T A FAIR DIE.
THAT MAKE SENSE?
THAT'S CENTRAL LIMIT THEOREM.
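The dice demo can be sketched in a few lines too. A minimal Python version, assuming a fair die simulated with Python's `random` module rather than Excel's generator, tallies 1,000 rolls; a value outside 1 to 6, or a tally far from 1,000 / 6 per face, would be the sign of a die that "ain't fair":

```python
import random
from collections import Counter

random.seed(7)

# 1,000 rolls of a fair six-sided die (a stand-in for the spreadsheet's "new population").
rolls = [random.randint(1, 6) for _ in range(1_000)]
counts = Counter(rolls)

for face in range(1, 7):
    print(face, counts[face])  # each face should land near 1000 / 6, about 167

print(f"mean roll: {sum(rolls) / len(rolls):.2f} (a fair die averages 3.5)")
```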
SO, FLOOR'S OPEN TO QUESTIONS, IF YOU HAVE THEM, OR...
I DON'T KNOW WHAT I LEFT OUT.
HOPEFULLY-- THAT'S PROBABLY ENOUGH.
SO, QUESTIONS?
>> HOW DO YOU REALLY KNOW, WHEN YOU SEE, LIKE, A POLL--
HOW DO YOU REALLY KNOW IF THEY GOT A TRUE RANDOM SAMPLING OR NOT?
>> MAN, HOW'S THAT FOR A QUESTION?
(audience chuckling) OKAY.
>> REVERSE SOCIAL ENGINEERING? >> PARDON?
>> REVERSE SOCIAL ENGINEERING? >> REVERSE SOCIAL ENGINEERING?
I THINK THE BIGGEST INDICATOR IS WHO'S THEIR CLIENT BASE?
YOU CAN'T TELL.
DOES THAT MAKE SENSE?
YOU CANNOT TELL BY LOOKING AT THE RESULTS HOW GOOD THEIR SAMPLES ARE.
YOU GOTTA KNOW SOMETHING ABOUT THEIR CLIENT BASE.
UM...
MY BROTHER WAS A STATE LEGISLATOR IN INDIANA FOR A WHILE,
AND MAN, DID WE LEARN A LOT...
ABOUT SURVEYING AND SAMPLING.
YOU CAN MAKE THINGS HAPPEN, YOU KNOW?
UH, CREATE A TALK-SHOW AND HAVE PEOPLE CALL IN.
WELL, WHO CALLS IN?
WHO CALLS IN ON THOSE TALK-SHOWS WHEN THEY TAKE POLLS?
>> PEOPLE WATCHING. >> AND WHO WATCHES?
PEOPLE THAT CARE. >> (indistinct speaking).
>> THEY EITHER AGREE WITH THAT SHOW OR THEY HATE THAT GUY OR GAL, RIGHT?
VERY STRONG FEELINGS DRIVE THAT.
THOSE ARE NOT RANDOM SAMPLES.
THAT'S GARBAGE STATISTICS.
AND IT GENERATES A HUGE AMOUNT OF MONEY.
OKAY?
SO, WHEN YOU FOLKS GO OUT AND YOU READ THESE THINGS
OR YOU GET INVOLVED IN THE BUSINESS
AND YOU'RE DOING MARKET SURVEY OR SOMETHING,
YOU GOTTA WATCH, OKAY?
WHEN I WORKED AT G.E., WE DID A MARKET SURVEY
AND, AT THE LAST MINUTE, SOMEBODY CHANGED A COUPLE OF THE QUESTIONS
ON THAT SURVEY.
AND NOBODY CAUGHT IT.
AND WE SPENT A LOT OF MONEY, BUILT A PROGRAM, AND IT FLOPPED.
OUR DATA WASN'T ANY GOOD.
THE MATH WAS GOOD... BUT IT WAS GARBAGE.
SO, YOU GOTTA WATCH THAT STUFF.
I MEAN, IT'S-- AND THOSE ARE THE KINDS OF-- THAT'S ONE--
THAT'S A PRIMARY REASON WHY IT TOOK A LONG TIME
FOR PEOPLE TO TRUST STATISTICS,
AND IT'S WHY MY FATHER-IN-LAW AND MOTHER-IN-LAW
DON'T TRUST STATISTICS TO THIS DAY
BESIDES THE FACT THAT I'M INVOLVED WITH IT.
(audience chuckling) GOOD QUESTION.
>> IS THERE A MEASURE TO CHECK TO SEE HOW RANDOM
YOUR RANDOM NUMBER GENERATOR IS?
>> THERE ARE MEASURES.
DON'T ASK THE NEXT QUESTION, 'CAUSE I CAN'T ANSWER IT.
(laughing) BUT THERE ARE MEASURES.
AND THAT'S-- IF YOU GO ON AND GET INVOLVED
IN OPERATIONAL RESEARCH AND THINGS LIKE THAT--
IT DEPENDS ON THE CRITICALITY OF THE DECISIONS THAT YOU'VE GOTTA MAKE,
BUT THERE ARE LOTS OF CHECKS ON "HOW RANDOM IS YOUR DATA?"
THAT'S WHAT YOU'RE ASKING, RIGHT?
UM...
THIS IS NOT MY FIELD.
SO, I KNOW THAT STUFF IS OUT THERE, BUT I DON'T KNOW A LOT ABOUT IT.
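One of the simplest of those measures is Pearson's chi-square goodness-of-fit test. A minimal sketch for die tallies, using the standard 5 percent critical value for five degrees of freedom (11.07); the tallies are invented for illustration, and serious random-number-generator testing uses whole batteries of tests beyond this one:

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic for observed counts against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)

# Critical value for 5 degrees of freedom (six faces) at the 5% level.
CRITICAL_5DF_05 = 11.07

fair_looking = [170, 160, 168, 165, 171, 166]  # hypothetical tallies of 1,000 rolls
loaded = [120, 130, 125, 140, 135, 350]        # hypothetical die favoring six

for name, tally in (("fair-looking", fair_looking), ("loaded", loaded)):
    stat = chi_square_uniform(tally)
    verdict = "suspicious" if stat > CRITICAL_5DF_05 else "consistent with uniform"
    print(f"{name}: chi-square = {stat:.1f} -> {verdict}")
```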
>> (indistinct) IS A VERY TRICKY THING. >> IT'S VERY TRICKY.
>> YOU CATCH PEOPLE WHO ARE TRYING TO FALSIFY DOCUMENTS
BY CHECKING TO SEE WHAT THEY'VE ACTUALLY WRITTEN DOWN FOR DIGITS,
BECAUSE IF YOU PICK RANDOMLY, YOU CAN'T HAVE, LIKE, SIX "5"s--
>> RIGHT, RIGHT.
IF WE LOOKED DOWN THROUGH THIS LIST OF--
WELL, LET ME CHANGE THIS TO A FAIR DIE,
AND TAKE A LOOK AT WHAT HE'S TALKING ABOUT HERE.
CONTROL HOME IS NICE.
LET'S MAKE-- DICE HAVE A UNIFORM DISTRIBUTION, RIGHT?
ALL SIX OF 'EM ARE SUPPOSED TO BE THE SAME WEIGHT?
SO, THIS GENERATED A--
NO, IT DIDN'T, 'CAUSE I HIT THE WRONG BUTTON.
(audience chuckling)
COMPUTERS STILL, FOR THE MOST PART, DO WHAT YOU TELL 'EM.
OKAY.
BUT IF WE LOOK DOWN THROUGH HERE, WE SHOULD SEE--
THERE'S THREE "4"s IN A ROW, AND FOUR OUT OF FIVE--
YEAH, FOUR OUT OF FIVE ARE "4"s.
HOPEFULLY, WE DON'T SEE ANY "0"s OR "7"s IN HERE.
BUT YEAH.
I MEAN, WHAT HE'S SAYING IS THAT...
IF IT'S RANDOM,
THERE'S A STRETCH IN HERE WHERE YOU'RE GONNA GET A BUNCH OF "4"s.
BUT THE OCCURRENCE OF THAT IS GONNA BE ONE OF THESE OUTLIERS.
NOT GONNA HAPPEN VERY OFTEN.
IF IT DOES, SOMETHING'S UP.
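How unusual is a long streak of one face? For a fair die with independent rolls, this can be computed exactly. Track, roll by roll, the probability of each possible current streak length of the chosen face, and stop counting once the streak reaches k. The sketch below makes exactly those assumptions (fair die, independent rolls). The function name `prob_streak_of_face` is hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def prob_streak_of_face(n_rolls: int, k: int, faces: int = 6) -> Fraction:
    """Exact probability that n_rolls of a fair die contain at least one
    streak of k or more consecutive rolls showing one particular face."""
    if k < 1:
        raise ValueError("k must be at least 1")
    hit = Fraction(1, faces)   # this roll shows the chosen face
    miss = 1 - hit             # any other face resets the streak to 0
    # state[j] = P(no k-streak yet, and the current streak length is j)
    state = [Fraction(1)] + [Fraction(0)] * (k - 1)
    for _ in range(n_rolls):
        nxt = [Fraction(0)] * k
        for j, p in enumerate(state):
            nxt[0] += p * miss
            if j + 1 < k:          # streak grows but is still shorter than k
                nxt[j + 1] += p * hit
            # j + 1 == k completes a streak; that probability leaves the table
        state = nxt
    return 1 - sum(state)

# Chance of at least k "4"s in a row somewhere in 30 rolls of a fair die.
for k in (2, 3, 4, 6):
    p = prob_streak_of_face(30, k)
    print(f"P(at least {k} fours in a row in 30 rolls) = {float(p):.4f}")
```

Longer streaks become rare very quickly. However, the probability of seeing a given streak somewhere also grows with the number of rolls scanned. A streak that is unusual in 30 rolls can be expected somewhere in thousands. So whether a streak is suspicious depends on how much data was searched.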
A LOT OF AUDITING IS BASED ON SAMPLING.
WHAT ELSE?
THESE ARE GOOD QUESTIONS.
WHAT ELSE?
>> SO, ARE YOU SAYING THAT WHEN I'M PLAYING ROULETTE,
AND I SEE THAT THREE BLACKS IN A ROW, THAT I SHOULD--
>> THE NEXT ONE-- NEXT ONE!
>> THAT I SHOULD PUT MY MONEY ON RED,
BECAUSE, I MEAN, THEY HAVE TO BE BALANCED OUT?
LIKE, THAT'S NOT TRUE.
(audience laughing) >> ONE OF THE ORIGINAL--
THE ORIGINAL PUSH-- THAT'S A HUGE QUESTION.
OKAY, THE ORIGINAL PUSH FOR THE DEVELOPMENT OF PROBABILITY
WAS THAT QUESTION OR ONE VERY SIMILAR TO IT, BY--
I FORGOT THE GUY'S NAME, BUT HE WAS A PRINCE IN CENTRAL EUROPE,
AND HE HAD A FRIEND BY THE NAME OF FERMAT, WHO IT'S LIKE, "OOO!"
THIS GUY FIGURED OUT THERE MUST BE SOME KIND OF PATTERN TO GAMBLING
AND "IF I KNEW IT, I COULD BEAT EVERYBODY."
SO, HIS FRIEND SPENT THE REST OF HIS CAREER...
DELVING INTO PROBABILITY, AND THAT'S WHERE ALL THIS CAME FROM.
THINK ABOUT IT THIS WAY-- THOSE CARDS HAVE NO MEMORY.
IF THE CHANCE-- OR THAT DIE, AS LONG AS IT'S A FAIR ONE,
THAT DIE DOES NOT HAVE A MEMORY.
AND EVEN IF IT JUST ROLLED A STRING OF "4"s,
ONE OUT OF SIX TIMES, ON AVERAGE, THE NEXT ONE WILL ALSO BE A "4".
SO, EVERY TIME YOU SEE THREE "4"s IN A ROW,
ABOUT ONE-SIXTH OF THE TIME YOU SHOULD SEE A FOURTH.
DOES THAT MAKE SENSE?
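The "no memory" argument can be checked empirically. Roll a simulated fair die many times, find every spot right after three "4"s in a row, and count how often the next roll is also a "4". If the die has no memory, that share should sit near 1/6, not below it. This is a minimal sketch assuming Python's `random` module as a stand-in for a fair die. The roll count and seed are arbitrary choices.

```python
import random

random.seed(7)
rolls = [random.randint(1, 6) for _ in range(600_000)]

# Every roll that comes immediately after three "4"s in a row.
after_three_fours = [rolls[i] for i in range(3, len(rolls))
                     if rolls[i - 3] == rolls[i - 2] == rolls[i - 1] == 4]

# Share of those rolls that are also a "4".
share_fourth_four = after_three_fours.count(4) / len(after_three_fours)

print(f"times we saw three 4s in a row: {len(after_three_fours)}")
print(f"share followed by a fourth 4:   {share_fourth_four:.3f} (1/6 = {1/6:.3f})")
```

The same reasoning applies to the roulette question. Betting on red after a run of blacks does not improve your odds, because past spins do not change the next one.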
NOW, ONE OF THE NICE THINGS ABOUT BEING A THEORETICAL MATHEMATICIAN,
WHICH IS WHERE MY INTERESTS LIE,
IS THAT I DON'T HAVE TO WORRY WITH REAL WORLD STUFF
AND ALL THAT, "IS IT RANDOM?
"IS IT REALLY"-- THOSE KINDS OF THINGS.
"WE'LL PAY SOMEBODY TO GO DO THAT."
(laughing) NO, THAT-- WHAT ELSE?
>> (indistinct speaking)...
BUT I HAVE GOTTEN MORE THAN MY FAIR SHARE OF PHONE CALLS
ASKING MY OPINION ON VARIOUS THINGS,
BUT I HAVE CALLER I.D., SO I IGNORE 'EM.
WHAT ABOUT CELL PHONES?
DO THESE PEOPLE WHO CALL TO SEE WHAT PEOPLE'S OPINIONS ARE,
DO THEY CALL CELL PHONES OR JUST LANDLINES?
DO THEY CALL CELL PHONES? >> THINK ABOUT IT THIS WAY, JOHN.
IF YOU WERE THE CANDIDATE, WHAT WOULD YOU PAY FOR?
>> WELL, I UNDERSTAND THAT, BUT IT WASN'T TOO MANY YEARS AGO THAT--
>> IT'S BEEN A SEA CHANGE.
>> WELL, MY QUESTION NOW IS ARE THEY NOW CALLING CELL PHONES?
>> YES, THEY ARE. >> I THINK THEY CAN,
BUT YOU CAN BE PUT ON A "NO CALL" LIST, SO THAT CAN KEEP THEM FROM THAT DATA.
>> RIGHT, RIGHT.
>> WELL, IF WE'RE TALKING ABOUT A RANDOM SAMPLING,
IF YOU'RE ONLY CALLING LANDLINES...
>> WELL, THAT'S A SHRINKING POPULATION.
>> THAT'S A SHRINKING POPULATION, (indistinct speaking).
>> RIGHT.
THE ONLY REASON WE HAVE A LANDLINE RIGHT NOW
IS BECAUSE MY WIFE HAS TO HAVE A FAX FOR HER JOB,
SO WE STILL HAVE A LANDLINE.
BUT WE NEVER ANSWER IT.
(audience laughing) PARDON?
>> YES, AND YOU'RE OLD. >> YEAH.
(laughing) (audience laughing)
>> SO, AS FAR AS YOU KNOW, THEY ARE HITTING THE CELL PHONE POPULATION?
>> I WOULD EXPECT MORE AND MORE.
AND... TO YOUR QUESTION...
POLLING ORGANIZATIONS-- THE GOOD ONES
HAVE ANTICIPATED THESE QUESTIONS, OKAY?
AND ONE OF THE THINGS THAT'S COME ABOUT IS,
"HOW DO YOU ACCOUNT FOR THE FACT THAT JOHN
"DOESN'T RESPOND TO POLLING CALLS?"
>> (indistinct speaking). >> YOU--
(audience laughing)
YOU SEND HIS NEIGHBOR OVER TO ASK, RIGHT?
NO!
THERE ARE "FUDGE" FACTORS BUILT IN TO ALL THESE THINGS.
AND THESE ARE VERY EMPIRICAL, OKAY?
THEY'RE BUILT ON HISTORY.
IF WE KNOW THAT A CERTAIN PERCENTAGE DON'T RESPOND TO THESE QUESTIONS,
MAYBE THERE ARE SOME SIGNIFICANT CHARACTERISTICS
ABOUT THOSE PEOPLE WHO DON'T RESPOND--
THEY'RE ALL GONNA VOTE FOR THE SAME PERSON MAYBE...
OR LEANING THE SAME WAY.
SO-- AND I'M MAKING THAT PART OF IT UP,
BUT THAT'S THE IDEA, IS THAT THESE FUDGE FACTORS--
THE REALLY, REALLY GOOD POLLING ORGANIZATIONS,
THE ONES THAT PEOPLE PAY A LOT OF MONEY TO,
ACCOUNT FOR THOSE KINDS OF ANOMALIES.
THEY AREN'T ANOMALIES.
THEY ARE PART OF REAL LIFE.
SO, YEAH, YOU GOTTA...
YOU GOTTA ADDRESS THOSE KINDS OF THINGS.
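The talk doesn't say which "fudge factors" pollsters use. One standard technique is post-stratification weighting. Each respondent is weighted so that every group counts in proportion to its known share of the population, rather than its share of the people who happened to answer. The minimal sketch below uses invented groups and numbers (a 70/30 split between people who answer calls and people who screen them). It assumes the population shares are known from some outside source.

```python
# Invented figures for illustration only.
population_share = {"answers calls": 0.70, "screens calls": 0.30}
respondents      = {"answers calls": 900,  "screens calls": 100}
support_in_group = {"answers calls": 0.45, "screens calls": 0.65}

n = sum(respondents.values())

# Unweighted estimate: every respondent counts once.
unweighted = sum(respondents[g] * support_in_group[g] for g in respondents) / n

# Post-stratification weight per group: population share / sample share.
weights = {g: population_share[g] / (respondents[g] / n) for g in respondents}

weighted = (sum(respondents[g] * weights[g] * support_in_group[g] for g in respondents)
            / sum(respondents[g] * weights[g] for g in respondents))

print(f"unweighted support: {unweighted:.3f}")
print(f"weighted support:   {weighted:.3f}")
```

With these invented numbers, the unweighted estimate is 0.470 and the weighted one is 0.510. The correction only works if the call screeners who did respond resemble the ones who didn't. That is the kind of empirical, history-based judgment the speaker is describing.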
>> (indistinct speaking).
>> SURE. >> (indistinct speaking).
>> OKAY, AND-- DID-- I'M NOT AWARE OF WHAT HAPPENED.
DO YOU KNOW IF ANYBODY DUG INTO WHY--
>> PEOPLE LIE. >> THAT'S WHAT I WAS GONNA SAY.
IT'S LIKE, "OH!
"MY OPINION IS VALUABLE.
"I CAN MESS YOUR POLL UP.
"I DON'T LIKE POLLSTERS."
>> EVEN MORE THAN THAT, IT WAS THE EXPECTATION
THAT THERE WAS A RIGHT ANSWER AND (indistinct speaking).
>> AH, AH, OKAY. >> (indistinct speaking).
>> OH, YEAH, SURE.
YEAH, PUBLICITY REALLY CHANGES, YOU KNOW?
I'M SAYING ALL THE RIGHT STUFF,
'CAUSE THERE ARE A LOT OF MATHEMATICIANS IN THE ROOM HERE.
IF THAT MAKES SENSE.
WHAT ELSE?
WELL, I HAVE TWO QUESTIONS.
ONE IS...
ESPECIALLY-- I'D LIKE TO LIMIT MY AUDIENCE HERE, FOR JUST A SECOND.
THOSE OF YOU WHO ARE GETTING EXTRA CREDIT FOR BEING HERE
FROM THAT STATS CLASS-- WAS THIS HELPFUL?
DID THIS HELP YOU UNDERSTAND...
CENTRAL LIMIT THEOREM A LITTLE BIT?
OKAY.
ALL RIGHT, GOOD.
I'M GLAD IT DID.
NOW, FOR EVERYBODY-- IF YOU KNOW SOMEBODY
WHO CAN PROGRAM IN FLASH
OR WHATEVER THE CURRENT LANGUAGE IS, AND PRESERVE--
I ACTUALLY HAD SOMEBODY AT ANOTHER SCHOOL PROGRAM THIS THING FOR ME,
BUT WHEN HE DID, ALL THE NUMBERS WORKED
BUT IT LOST ALL THE FEATURES THAT HELPED PEOPLE UNDERSTAND CENTRAL LIMIT THEOREM.
SO, I SCRAPPED THAT.
I'M LOOKING FOR SOMEBODY WHO CAN PROGRAM THIS
IN SOMETHING OTHER THAN VISUAL BASIC, BECAUSE EXCEL--
ER, MICROSOFT TRIED TO GET RID OF VISUAL BASIC
IN THIS VERSION OF OFFICE,
BUT THEY COULDN'T PULL IT OFF BECAUSE SO MUCH IS BASED ON THAT.
BUT EVENTUALLY, VISUAL BASIC'S GONNA GO AWAY.
AND I'D LIKE SOMETHING THAT WOULD BE STABLE ENOUGH
THAT WE COULD PUT OUT ON THE WEB... PEOPLE COULD USE THIS, SO...
AND IF YOU HAVE SUGGESTIONS, I'M OPEN.
>> (indistinct speaking). >> PARDON?
>> I'D RECOMMEND FLASH. >> YOU'D RECOMMEND FLASH?
SOMEBODY-- >> I CAN'T DO IT MYSELF, PERSONALLY.
I'VE PLAYED AROUND WITH IT A LITTLE BIT, BUT I'D RECOMMEND FLASH.
>> OKAY... THAT'S WHAT THIS GUY DID IT IN.
BUT HE DIDN'T UNDERSTAND THE MATH THAT I WAS TRYING TO GET ACROSS.
>> (indistinct speaking). >> JAVA?
THAT WOULD MAKE SENSE TO ME.
JAVA'S GONNA BE AROUND A LONG TIME.
AND IT'S-- IT'S EMBEDDED IN--
I MEAN, SEVERAL LANGUAGES CAN SUCK JAVA IN.
ANY OTHER QUESTIONS?
WELL, THERE'S COOKIES AND POP THERE, SO...