NIST Colloquium Series: Why (and how) Bayes' Rule Rules-Author Sharon Bertsch McGrayne




Uploaded by usnistgov on 17.05.2012

Transcript:
[ Music ]
[Noise]
>> Good morning.
Welcome to this morning's staff colloquium.
I just want to start out by asking by a show of hands,
how many of you out there are-- consider yourselves Bayesian?
[Laughter] Bayesian.
[Laughter] Yeah, about 4 or 5 of you.
[Laughter] Well, I met someone.
He's in the first row here, Tom Herzog,
just before we started here, he used to work at the NSA.
And he's a true Bayesian.
Now, what's a true Bayesian?
A true Bayesian is one who has "Bayes" on his license plate.
[Laughter] I mean that, that's going some.
So, what is all the fuss about Bayes statistics?
I've been hearing it myself now for the last 2 or 3 years
from some of the scientists in the Physics Laboratory,
especially those who are involved
in international intercomparisons.
And they kept coming to me, so like I'm supposed
to make the decision or something.
"We got to, we've got to use Bayes statistics."
I said, "What's, what's Bayes statistics?"
"Oh, it's the only way to go, the only way to go."
Anyway, there's a lot of talk about using it and we go
to the Statistical Engineering Division here and asked them
about it, and there's 1 or 2 people there
who are really into Bayes.
They don't have it on their license plate yet,
I don't think but-- anyway, what's it all about?
Sharon Bertsch McGrayne is an author of books
about scientists, the discoveries they make
and their impact on science.
And the-- what she's going to talk about today is this book
that she just wrote called Bayes Rule,
The Theory That Wouldn't Die.
And when I saw that book-- I don't know how I saw it,
on the Internet somewhere, I thought this is a natural,
because people are still asking me to make a decision
about Bayes statistics.
Don't know what it is.
But I know Sharon Bertsch McGrayne.
She writes great books, and she gives good talks as well.
She's already given 2 other talks
in the NIST Staff Colloquium series on 2
of her previous books.
The one you may remember, "Nobel Prize Women in Science,"
was about 10 years ago.
She told me now she wrote it 20 years ago.
But it's timeless, "Nobel Prize Women in Science."
And the second one was about chemistry and the making
of the modern world, and this was called
"Prometheans in the Lab."
And it's a series of article--
of chapters about some of the great discoveries in chemistry,
how they were made, and then ultimately the impact they had
on society.
This-- so this is her third talk at NIST, and I couldn't resist,
because like I say, everybody's been kind of curious
about what Bayes statistics are.
It's a real challenge.
You're going to have to explain a whole theory, right?
Sharon is a graduate of Swarthmore College,
which is outside Philadelphia.
She was a prize-winning journalist
for Scripps-Howard and other newspapers.
And she was also a former editor of the Encyclopedia Britannica
and a co-author there of articles about science.
She's written really 5 books by now.
Her other 2 books are-- this is another good one,
actually I haven't read this one, it's 365, one a day,
"Surprising Scientific Facts, Breakthroughs, and Discoveries,"
and another book called "Iron,"
which is nature's Universal Element, which she co-authored
with Genie Mielczarek, who some of you know
from George Mason University.
She's written in Science, Discover magazine,
Scientific American, and 4 or 5 other popular magazines
where she writes popular scientific articles for them.
She's been interviewed on TV,
and I can see why she'd make a great interview,
because just talking to her for about an hour,
I felt like Charlie Rose.
Ellie Heather Evans [assumed spelling] was there
to hear the interview.
But yeah, she's been interviewed by Charlie Rose.
She's been on PBS and on public radio as well talking about,
probably her books but also some of these discoveries
in chemistry for example.
Her books have received excellent reviews in Nature,
Scientific American, Physics Today,
and this particular book was recently--
got a full page review in the New York Times.
It was an excellent review.
In fact, one of the quotes is-- oops, the abstract is gone.
[Laughter] One of the quotes was,
"In case of emergency, read this book."
[Laughter] But it said, "If you're not a Bayesian,
maybe it's time you became one."
That was in the process
of the New York Times making it an editor's choice.
So, would you join me
in welcoming Sharon Bertsch McGrayne.
[ Applause ]
>> Okay.
I'm going to push something.
>> Number 2, actually I should have done this for you.
>> Table 2?
>> Right there.
[Inaudible Remark] [Laughter]
>> I often--
>> I pushed the wrong button.
>> -- begin my talks by saying that I can mess
up almost any mechanical system and have done so.
>> Okay, PC number 2 doesn't seem
to be flashing-- there it is.
>> I think it was my fault.
I begin all my talks with some truth in advertising
that I am not a scientist.
I'm not a mathematician or a statistician.
I write books about the history of science, so I'm not going
to tell you how to calculate a Bayesian problem.
You will have to use your far greater resources
and backgrounds to do that.
I will not be doing that.
However, when I began writing "The Theory That Wouldn't Die" 8
or 9 years ago, I was thrilled when I googled Bayesian one day
and got a hundred thousand hits.
When I googled last week, I got 12 million hits, okay?
So there's been an explosion
of interest in Bayes just quite recently.
Exhibit A, Air France Jet Flight 447 took off
from Rio de Janeiro bound overnight
for Paris two years ago last April.
It hit a high altitude, very high-intensity storm
and disappeared without a trace.
A few weeks ago in Paris, I spent the afternoon
with Olivier Ferrante,
who is the French Civil Aviation Engineer in charge
of finding the wreckage of Air France 447.
They were looking for 2 black boxes,
which as you can see are actually red and white.
They are the size of shoe boxes--
[ Pause ]
-- and they had to search in what Ferrante calls a vast area.
I said that a lot of the newspaper and magazine articles say
that it's the size of Belgium.
He said, "No, Belgium is flat.
I put the overlay on top of the map of Switzerland,
because we were looking in an area the size of Switzerland
with the mountainous topography
of Switzerland, 4,000 meters deep."
After almost 2 years of fruitless searching by some
of the world's greatest oceanographers,
Ferrante hires a local firm in Reston, where I went yesterday,
that has many of the same people
who developed Bayesian naval search theory and are talked
about in "The Theory That Wouldn't Die."
Today they're in a firm called Metron.
And their Bayesian search software said,
"Look at this particular area."
And Air France 447, the wreckage was found
after an undersea search of 1 week.
A 2-year fruitless search ended after Bayes pointed to an area
and they did 1 week of undersea searching, okay?
When I asked Ferrante what Bayes had done for the search
and for him, he said, "It was an external eye.
It was neutral, rational, and methodical.
It could assemble and assess all the data
that had been gathered for 2 years."
They had not only the oceanographers' undersea search
for 1 whole summer, north of the site, they had,
the Russians had analyzed 8 or 9 crashes,
there was a South African Boeing crash, and then there were all
of the assessments of the equipment that was used.
And then after making all of this assessment, combining all
of the data, they calculated the most probable region to look
in that Switzerland-sized area and then made a day-to-day plan
for Ferrante to allocate his assets, as he calls them,
hour by hour, until the wreckage was found.
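The search logic described here can be sketched in a few lines. This is only an illustration, not Metron's software: the four-cell grid, the prior probabilities, and the 0.8 detection probability are all made-up assumptions, and `update_after_failed_search` is a hypothetical helper name. It shows how an unsuccessful look in one cell lowers that cell's probability and raises the others, which is what drives the day-to-day search plan.

```python
def update_after_failed_search(prior, searched, p_detect):
    """Posterior cell probabilities after searching one cell and finding nothing.

    prior     -- list of probabilities that the wreck is in each cell (sums to 1)
    searched  -- index of the cell that was searched
    p_detect  -- assumed chance of spotting the wreck if it is in that cell
    """
    # Probability of coming up empty-handed, whichever cell holds the wreck.
    p_nothing_found = 1.0 - prior[searched] * p_detect
    posterior = []
    for i, p in enumerate(prior):
        if i == searched:
            # The wreck could still be here, but only if the search missed it.
            posterior.append(p * (1.0 - p_detect) / p_nothing_found)
        else:
            # Every other cell becomes relatively more likely.
            posterior.append(p / p_nothing_found)
    return posterior


# Hypothetical prior over four search cells (illustrative numbers only).
prior = [0.1, 0.2, 0.4, 0.3]

# Search the most probable cell first.
first_cell = max(range(len(prior)), key=lambda i: prior[i])
posterior = update_after_failed_search(prior, first_cell, p_detect=0.8)

# The next cell to search is whichever is now most probable.
next_cell = max(range(len(posterior)), key=lambda i: posterior[i])
print(first_cell, next_cell, [round(p, 3) for p in posterior])
```

Repeating the update after each unsuccessful leg gives a running map of where to send the equipment next.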
Now for me, one of the revolutionary things
about this is that the authorities publicly
credit Bayes.
And we're going to see that for decades of the 20th century,
there were many people who were afraid
to even mention the word Bayes, okay?
So, I would like to start with Google's car,
Exhibit B. There's been an explosion of interest just
in the past few months, as a matter of fact.
This is a Scientific American article,
but it's about the deeply Bayesian driverless car.
It starts with Bayes' theorem, which you'll see says,
you start with your original assessment,
and that's Google's maps that we all use, and then you add to it,
update that information from the sensors on top of the cars,
about traffic conditions, about new detours and potholes
and construction sites and so on.
And they calculate what is probably the safest way to drive
at that particular moment.
And if any of you know the name of Persi Diaconis,
he's a Stanford theoretician.
He says, "Every nut and bolt of that car is Bayesian."
[Laughter] New York Times, Sunday a week ago,
2 Bayesian stories on the front page
of the Sunday New York Times.
If I can get this one to work-- here we go.
Neither story mentions the word Bayes.
But once you understand what the theory does,
you'll spot it everywhere.
This is the story about a deeply Bayesian software
that teaches children mathematics.
And there are questions now about the statistics
that were used to prove its effectiveness.
And this story up here "Clamping Down On Rapid Trades
on Wall Street," that's highly Bayesian,
lot of Bayes used on Wall Street.
In addition to this, if Monday's--
Sunday's New York Times was not enough.
Tuesday, they ran a story, again,
no mention of Bayes, down here.
Two professors named Nobel Prize winning economists for work
about cause and effect, they use Bayes.
So Bayes is all around us.
There's also a story that's circling the Internet
like crazy, a Guardian newspaper reporter, 2,
3 weeks ago broke a story that at the same time
that Bayes was finding Air France 447,
a British appeals judge was banning Bayesian statistics
from British courtrooms.
It involved a case-- a Murderer T, he is referred to as.
Murderer T had been convicted--
one of the pieces of evidence was a print from a Nike shoe,
and a footwear expert witness appears and--
about the probability that that print came from a pair
of Nike shoes found in Murderer T's home.
The judge said, "You do not know the specific precise number
of Nike shoes in the UK at the time, I want firm numbers,
and until there are firm numbers,
Bayesian statistics is banned."
There is now an international committee of lawyers
and statisticians working on the problem, but they think
that this ban will affect every case in the UK
that involved circumstantial, that is uncertain evidence.
So Bayes is all around us.
It's in our spam filters.
It's embedded in Microsoft and Google.
It searches the internet for the webpages we want,
clarifies-- we go to the doctor,
it clarifies our MRI and PET scan images.
The military uses it for robotic vehicles
to supply troops in combat.
They hope that it will help build better prostheses
for amputees.
And it sharpens the images, for example,
that the drones took of Bin Laden's compound.
It's used in astronomy and physics, genetics,
machine translation of foreign languages,
the list goes on and on.
But I'm afraid that to understand this real explosion
of interest in Bayes and use of Bayes and why some of you here
in this room are real revolutionaries,
we have to go back to the beginning,
and that's Thomas Bayes.
And excuse me, but I'm not going to show his picture
because it's-- we know very little about Thomas Bayes.
He was a reverend, a minister, wealthy, Presbyterian minister
and an amateur mathematician who lived
in an elegant spa resort near London in the 1740s.
We know very little about him.
The picture that I'm not going to show you that's
on the poster actually, it's everywhere,
it's in the New York Times, it's everywhere, is indubitably
of someone named Bayes who lived much later.
[Laughter] In addition, we don't know his birth date
and Wikipedia just corrected his death date.
So-- but, given the time constraints,
I'm going to race a bit from--
starting with Thomas Bayes up until the Second World War
and then I'll slow down at that point.
But I hope we'll see 2 big patterns emerging.
First, that Bayes becomes an extreme example of a gap
between academia and the real world.
And second, that military super secrecy during the Second World
War and during the Cold War had a profound effect
on the development of Bayes.
Now, one thing we do know about the Reverend Bayes is
that he discovered his theorem, super simple theorem,
during the 1740s, during the midst
of an incendiary religious controversy
in the western world.
The issue is not unfamiliar to us today.
It was whether or not we can take evidence
about the natural world and make rational conclusions about God--
we would say God, the Creator, Bayes' generation said God,
the Cause, or God, the Primary Cause, First Cause.
We don't know whether Thomas Bayes was interested
in proving the existence of God, but we do know
that during the 1740s,
he explores the issues mathematically
of cause and effect.
So his really simple theorem--
there's no argument about the theorem, okay?
The problem is that Thomas Bayes said, "We start with P(A)
and that can be a guess about a situation."
And he said, "If you"-- he uses the word guess.
"Then you're going to update it with the probability of evidence
and you're going to wind up with a much more realistic guess,
and then you're going to iterate over and over again.
It commits you to redoing the calculation each time you get a
new piece of information."
But when he said that you start with a guess
and then he compounds the thing, the controversy by saying,
"If you don't even know enough to make a real guess,
just start out with 50-50,"
that inflamed people for many, many years.
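The start-with-a-guess-and-keep-updating idea can be sketched as follows. This is a minimal illustration, not anything from Bayes' essay: the function name `bayes_update` and the two likelihoods (0.8 and 0.3) are assumptions chosen only to show the mechanics, and the starting belief is the 50-50 guess described above.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(A | evidence) given P(A) and the two likelihoods."""
    numerator = p_evidence_if_true * prior
    # Total probability of seeing this evidence at all.
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


# Start with the 50-50 guess when nothing better is known.
belief = 0.5
history = [belief]

# Assume three independent pieces of evidence arrive, each one more
# likely if the hypothesis is true (0.8) than if it is false (0.3).
for _ in range(3):
    belief = bayes_update(belief, 0.8, 0.3)
    history.append(belief)

print([round(b, 4) for b in history])
```

Each pass uses the previous answer as the new starting point, so the initial guess matters less and less as evidence accumulates.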
The English economist John Maynard Keynes thought
that this was a rational way of learning by experience,
and he had a quote that has a little bit of-- the knife in it.
He said, "When the facts change, I change my opinion.
What do you do, sir?"
But this fact that you can start with a guess--
a 50-50 guess was very difficult.
Bayes himself did not believe enough
in his theorem to publish it.
He files it away in a notebook
and he dies 10 or 15 years later.
And going through Bayes' papers, his younger friend,
Richard Price, who was such a hero of the American Revolution
that our founding fathers thought the name
of Richard Price would live forever,
he goes through at the family's request and look at--
looks at Bayes' mathematics papers and decides
that this will help prove the existence of God.
He spends 2 years off and on editing it, throws out a--
Bayes' original essay and gets it published
in a journal that's read primarily by the British gentry
and not by professional mathematicians.
And so, it sinks from view.
And by rights, we should be calling it,
as they did until about 50 years ago,
we should be calling it Laplace's Achievement.
This is Pierre Simon Laplace.
You all know the Laplace transform.
He was, unlike Thomas Bayes,
the quintessential professional scientist.
He mathematizes every known field
of science during his times.
As a young man in Paris in 1774, he discovers
on his own Bayes rule, and he calls it the
"probability of causes."
He spends the next 40 years of his career off and on,
in between other projects,
developing Bayes into its modern form.
And then he actually uses it.
He speaks at the end of his life very fondly
of what we now call Bayes rule,
because it produced big numbers for him.
And he used the big numbers to develop the calculational tools,
the shortcuts, the approximations that scientists
and mathematicians used until the age of computers.
Of course, they weren't big numbers like the ones that you all use,
but he was using a goose quill and a pot of ink, so for him,
they were very big numbers and he talks
about how very difficult it is to calculate with and assess it.
Until about 50 years ago, Bayes rule was known
as Laplace's Accomplishment.
Now, over the course of Thomas Bayes' lifetime
and Laplace's lifetime, scientists
and governments work very hard
at accumulating more trustworthy data.
And by the time Laplace dies in 1827,
the western world has really accumulated, for the first time,
a large data set of precise and trustworthy data.
And it becomes-- it becomes a mania, a fad, there are clubs
that go out looking for precise
and objective numbers, even women do it.
And some of the famous ones are the chest sizes
of Scottish soldiers, the number of Prussian officers killed
by kicking horses, and the incidence of cholera.
The clubs tended to go into lurid details, like, you know,
the number of murderers, the number of murders by night,
the number of suicides, this kind of thing,
but it was a veritable fad.
And with lots of precise and objective numbers,
any sophisticated statistician preferred
to judge the probability of an event or a situation
by how frequently it occurred,
something that they had never been able to do before.
And eventually, they become known as frequentists,
and they will become the chief opponents of Bayes rule
up until very recently.
For them, modern science requires both objectivity
and precision.
And Bayes, of course, starts with a measure of your belief
in a situation, makes approximations,
and the frequentists called this "subjectivity run amok,"
"ignorance coined into science."
By the 1920s, scientists generally thought of Bayes
as smacking of astrology, of alchemy.
One of them said, "We used Bayes' formula with a sigh.
That's the only thing available under the circumstances."
But the surprising thing is, that you find that all
of this time that the sophisticated statisticians
and the philosophers were denouncing Bayes rule
as impossibly subjective, they refer to it
as the subjective prior, that P(A), the people who had to deal
with real-world emergencies, who had to make 1-time decisions,
who couldn't wait for a full and complete data set,
they kept right on using Bayes rule because for them,
Bayes was the thing that they could use with what they had.
So for example, Poincaré uses Bayes to help free Dreyfus
from prison for treason in the 1890s in France.
Artillery officers in France and Russia
and the United States used Bayes
to aim their artillery in both World Wars.
They used Bayes to test their ammunition and their cannons.
The Bell telephone system almost doesn't survive a financial
panic in 1907, but it uses Bayes to automate and survive.
And the U.S. insurance industry was under orders
to start our very first social insurance program,
Worker's Compensation Insurance, almost overnight,
and they were able to do
so without very much claims information at all, safety,
injury evidence at all about American industry, using Bayes,
because it helped them make decisions with what they had.
Now, fortunately, every good book needs a villain,
and we have one.
[Laughter] And that is Ronald Aylmer Fisher.
They're both photos of Fisher.
He was a giant in statistics.
He founded modern statistics for scientific work.
He's a superb geneticist, we-- randomization methods,
sampling theory, experimental design methods,
all great achievements by Ronald Aylmer Fisher.
But despite Bayes' usefulness, he starts attacking Bayes
in the 1920s and 1930s.
And theoreticians' attitudes, in large part because Fisher is
such a giant, will change from tepid toleration
to outright hostility.
Unfortunately for a rational discussion about Bayes,
Fisher had an explosive temper.
He called it the bane of his existence.
He-- his colleagues said that he interpreted any scientific
question that you might ask him as a personal attack.
And his life becomes a sequence--
"a sequence of arguments of scientific fights, often several
at a time, at scientific meetings
and in scientific papers."
And the thing that Fisher hated most was Bayes rule.
He didn't need Bayes.
He didn't work with great amounts of, of uncertainty.
His first job was in a research--
an agricultural research station,
and he knew the precise amount of fertilizer added
to every single tiny plot
in that research station back for decades.
When he's working in genetics, he fills his house with cats
and dogs and thousands of mice for a cross fertiliz-- a cross--
fertilization experiments--
cross-breeding experiments, excuse me.
And he's a fervent, fervent, fervent eugenicist
and geneticist, and he can document the genealogy
of each animal back for generations.
So, he could design his experiments,
they were repeatable, they produced precise answers,
and he called Bayes' approximation and measures
of belief an impenetrable jungle.
He wrote, "It is founded on an error
and must be wholly rejected."
And he kept up a very personalized fight against Bayes
into the 1950s when an NIH biostatistician is using Bayes
to show that cigarette smoking was not just associated
with lung cancer but actually caused it.
This was, uh, Jerome Cornfield, first at the Department of Labor
and then at NIH and then goes to George Washington University.
Fisher was a chain smoker.
That's why the left picture is there.
He even went swimming with his pipe in his mouth.
[Laughter] He becomes a paid consultant
to the tobacco industry.
And backed into a corner by the NIH's Jerome Cornfield,
during the '50s in a long series of debates,
he comes up with a proposition that, believe it or not,
not that smoking causes lung cancer
but that lung cancer probably causes smoking.
[Laughter] So as a result, by 1939,
when the Second World War breaks out, Bayes was virtually taboo
as far as sophisticated statisticians were concerned.
Fortunately, Alan Turing was not a statistician.
He was a mathematician.
And besides fathering the modern computer
and modern computer science, software,
artificial intelligence, the Turing machine, the Turing test,
he also fathers the modern Bayesian revival.
So I want to switch gears a bit and dwell
on Alan Turing's story.
First, the anniversary of his birth is next year.
Second, he's a hero of mine.
And third, his story illustrates how Bayes worked as a paper
and pencil method, as embedded in one
of the first computer techniques,
and as an illustration of the effect of military secrecy.
Now, when the World War-- when France falls during the war,
it's important to remember that Britain can only feed 1
in 3 of its residents.
Britain had depended on the continent, and particularly
on France, for food and for strategic supplies.
So Britain would be totally dependent on convoys
of unarmed merchant seamen making their way up the coast
of South and North America,
meeting at the Saint Lawrence Seaway, and making their way
across the Atlantic Ocean.
They were attacked by U-boats along the way.
In fact, U-- German U-boats would sink almost 3,000
of these merchant marine ships and kill more
than 50,000 merchant seamen.
Hitler thought that the U-boats will win the war,
it's what he said, because they would starve Britain
into submission.
And Churchill writes later, that the only thing
that really worried him during the war were those
U-boat attacks.
Now, the German Navy ordered those U-boats
around the Atlantic via radio messages that were encrypted
with word-scrambling machines called Enigmas.
This is an Enigma machine.
To standardize their communications,
the German military buys 40,000 Enigma machines
and distributes them to all of the services.
So the Air Force got some, the Army, the foreign service,
the German railways, their allies in France and-- in--
I'm sorry, in Italy and Spain got them.
And the German Navy develops the most complex set
of security standards and the most complex
and difficult cryptography setups of all of them.
And this comes from Frode Weierud's CryptoCellar website
out of CERN, and it is actually a naval Enigma machine
and that's why I like it so much,
even though it's a dark slide, I apologize, but it's actually one
of the machines that Turing will use Bayes to attack.
Now, an Enigma machine looks much
like an overgrown typewriter.
But it had wires coming out of here that could be changed,
you could change these wheels up here, it had code books,
it had tables, it had an enormous number of complexities
that could be changed within hours or days.
As a result, it could produce millions upon millions
of permutations, and no one in Germany
or in Britain ever dreamt that the British would be able--
or that the allies would be able to read the orders
that they were sending out to those U-boats.
Now, Turing had been a postdoc the summer
of 1939 in New Jersey.
But he returns during the summer to Britain and he spends
it working alone by himself
on cryptography on the Enigma codes.
He goes up occasionally to confer with decoders
at the super secret decoding center north
of London called Bletchley Park.
And he had orders that the day after war is declared,
you must report to Bletchley Park.
So on September 4, 1939, the day after war is declared,
Turing goes, follows orders and goes to Bletchley Park,
where he will spend the next 6 years on decoding
and coding issues and the machines
that will be used for decoding.
And excuse me, not all of those 6 years are spent
on Bletchley Park, but the decoding issues
and the computer issues will occupy him.
When he arrives, he was 27 but looked 16, just a postdoc.
He was shy and nervous.
His mother sent him proper business suits to wear to work.
He preferred a shabby sports coat.
He had lived openly as a homosexual at Cambridge,
and he arrives, and no one is working
on the all-important naval codes that are sending the U-boats
against these unarmed merchant ships.
Turing liked working alone though,
and he says after a few weeks, no one else was working on it,
anything about it, and I could have the project to myself,
and he starts to work.
The English TV channel 4 is doing a biography
of Turing that's supposed to show next month,
and I went to Bletchley Park to be interviewed, and there I saw
in the stable a little turret, 2 or 3 stories high,
a little tower, sort of like a Rapunzel tower, and Turing went
up to the top, and that's where he worked sometimes
to get some peace and quiet.
And the women who were working for him rig a pulley
up to the top and send up baskets of food and drink
so that he doesn't have to take any breaks.
Now, the first thing he does when he gets
to Bletchley Park is that he redesigns a machine
to eliminate the wheel arrangements--
up here, to eliminate any wheel arrangements
that do not produce the words he thinks are going to be
in those German codes in the original German message, okay.
Then he develops a very Bayesian system that let him guess,
Bayes' word, let him guess a structure of letters
in the original message, hedge his bets, measure his belief
in their validity by assessing their probabilities,
and then add more clues as they filtered into Bletchley Park.
Now, Frode Weierud is involved as an avocation
with a group that's using modern computers to try
to break remaining Enigma codes, and he says that even today,
a modern computer can take weeks or months
to solve a naval Enigma machine by brute force.
That is, if all you know is the original language
that the original message was written in.
But, if you have a machine like the one that Turing invented
to test the possible wheel combinations
and if you can guess some of the words
in the original German message,
then a modern computer can break a naval Enigma machine
in seconds or even less than 1 second.
But of course, Turing didn't have a modern computer.
But the principle remains the same.
He had his machine and next, he needed to guess some
of the most probable words that would appear in those messages.
So Bletchley Park begins collecting clues to the words
in the German messages.
And among the most fertile areas for them were
the weather-reporting ships that the Germans had stationed
across the North Atlantic.
Fortunately for Turing,
weather has a rather limited vocabulary
and it's often repeated.
So they had messages like weather for the night,
beacons lit as ordered, this kind of thing.
They could refine the probabilities of some
of those messages by the weather reports that they got
from British weather stations
in the northern part of the channel.
A German POW tells them that the Enigma operators spelled
out the words for numbers.
So Turing realized that the Enigma messages,
90 percent of them, have the word EIN in them,
for "one" or for "a" or "an."
They knew the most probable letter combinations, of course,
in German, and then they figured that at least some
of those German Enigma machine operators sometimes were going
to be tired or lazy and turned the wheels only a few notches
instead of a lot when they changed their codes,
their wheel arrangements every day.
But in the fundamental breakthrough, Turing realizes
that he can't systematize his hunches or compare their high--
their probabilities without a unit of measurement.
He names his unit a "ban" for the town
of Banbury that's nearby, and he defines it
as "about the smallest change in weight of evidence
that is directly perceptible to human intuition."
And when the odds of a hypothesis reached 50 to 1,
he and his staff figured they'd gotten the message right,
or the words in the message right.
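The scoring idea can be sketched like this. It is an illustration under stated assumptions, not a reconstruction of Bletchley Park's procedure: it assumes each clue is scored as the base-10 logarithm of a likelihood ratio, counted in tenths of a ban (decibans), that scores are added up starting from even odds, and that a guess is accepted once the running total passes the equivalent of 50 to 1. The likelihood ratios in `clue_ratios` and the helper names are invented for the example.

```python
import math


def decibans(likelihood_ratio):
    """Weight of evidence for one clue, in tenths of a ban (base-10 log units)."""
    return 10.0 * math.log10(likelihood_ratio)


# Accept a guess once the odds in its favour reach 50 to 1 (about 17 decibans).
THRESHOLD_ODDS = 50.0
threshold_db = decibans(THRESHOLD_ODDS)


def clues_needed(ratios, prior_odds=1.0):
    """Add up evidence clue by clue; return (clues used, total score).

    Returns (None, total) if the threshold is never reached.
    """
    total = decibans(prior_odds)  # even odds contribute a score of zero
    for count, ratio in enumerate(ratios, start=1):
        total += decibans(ratio)  # multiplying odds becomes adding scores
        if total >= threshold_db:
            return count, total
    return None, total


# Hypothetical clues: a ratio above 1 supports the guess, below 1 counts against it.
clue_ratios = [2.0, 5.0, 0.5, 4.0, 3.0]
used, score = clues_needed(clue_ratios)
print(used, round(score, 2), round(threshold_db, 2))
```

Working in logarithms turns the repeated multiplication of odds into simple addition, which is what makes such a tally practical by hand.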
This was, of course, basically the same as the bit
that Claude Shannon discovers by using Bayes
at roughly the same time at Bell Telephone Laboratories.
Claude Shannon tells David Kahn, who's the author
of that classic history of cryptography published in 1967,
he said, "Bell Labs were working on secrecy systems.
I had worked on communication systems, and I was appointed
to some of the committees studying cryptanalytic
technol-- techniques.
The work on both the mathematical theory of communications
and cryptography went forward concurrently from about 1941.
I worked on both of them together, and I had some
of the ideas while working on the other.
I wouldn't say that one came before the other.
They were so close together you couldn't separate them."
And now, one thing we-- another thing we really don't know
about Turing is where he got his Bayesian system.
Did he get it all on his own?
The lone defender of Bayes
at Cambridge during the 1930s was a man named Harold Jeffreys,
who used it for-- to find the epicenters of earthquakes
and the origins of tsunamis.
And Turing might have heard about that
from Jeffreys' lectures,
or he might have devised it on his own.
But his assistant, Jack Good, asks him at the--
at one point, "Aren't you really using Bayes?"
And Turing says, "I guess so."
So he was aware of Bayes at some level.
But by June of 1941, a year and a half after the war starts,
Turing and Bletchley Park could read those U-boat messages
within an hour of their arrival at Bletchley Park,
and the British could reroute the convoys safely
around the U-boats, and for most of June of 1941,
a time when Britain was still fighting alone,
no convoy was attacked.
Now, by the autumn of that year, 1941, the--
his Bayesian system was running critically short of typists
and junior clerks, which they called Girl Power.
[Laughter] And Turing and 3 others
of the decoders write a personal letter to Churchill,
and one of them delivers it to Downing Street
and convinces the general in charge to give it to Churchill
and Churchill reacts immediately
and provides them with more resources.
Ian Fleming of James Bond fame even gets into the act
and plans a super elaborate raid
to capture code books for Turing.
I had to read the plan several times before I understood it,
so I think it was probably fortunate it was called off.
[Laughter] The navy--
the British Navy collected code books for Turing
from sinking German ships,
and 2 young men lose their lives trying to get them out in time.
Now, the system doesn't always work.
The German Navy adds a fourth wheel, and if you'll look
up here, there are actually 4 wheels in this one.
And at that point, Bletchley Park couldn't read the
U-boat orders.
But eventually when the Americans begin making enough
of wheel-- Turing wheel testing machines,
breaking Enigma codes becomes routine, it's like a factory.
But shortly after the Germans attack Russia in June of 1941,
the German Army starts using a super-sophisticated cryptography
system coding, code called the Lorenz Codes.
And they are used for--
to communicate among the top-level Army commanders
in Europe, and some of them are so important
that Hitler actually personally signs them.
A team of British mathematicians resorts
to every technique they can think of, including Bayes rule,
priors, Turing's Bayesian scoring system,
these fundamental units of bans,
and then they incorporate the Bayesian methods
into the computers they built to decrypt the Lorenz Codes,
the computers called the colossi.
And these, of course, are the first large-scale electronic
digital computers.
They were built for the special purpose of decoding.
But by the end of the war, by the 11th model, they are capable
of doing more than that, and they were far ahead of anything
that we had in the United States at the time.
Now, the engineer who built the colossi, was in charge
of building it, was called Thomas Flowers,
and he was given strict orders to have model number 2
of the colossi available and operational by June 1 of 1944,
and he was given no reasons why.
And he and his team worked-- he describes it:
"We worked until we thought our eyeballs would drop out."
But they get the model ready by June 1 and on June 5,
a message from Hitler to Erwin Rommel, the--
his commander in Normandy, is decoded and raced by courier
to the-- General Eisenhower, who is having a staff meeting
at the time, about when to launch the invasion of Normandy.
The courier gives Eisenhower the sheet of paper
with the decoded message on it.
In it, Hitler says, "To Rommel: If there is an invasion,
do nothing for 5 days, because it will be a diversionary feint,
and the real invasion will happen elsewhere 5 days later."
Eisenhower reads this, he can't tell his staff
about Bletchley Park, about the messages being decoded.
He gives the sheet back to the courier.
We get this story from Thomas Flowers who's told this much.
And he turns to his staff and says, "We leave in the morning,"
June 6th, 1944, and Eisenhower later says
that the decoding efforts shortened the war
in Europe by 2 years.
Now, a few days after Germany's surrender in May of 1945,
Bletchley Park gets a very surprising order
from the British Government,
and that is that the entire decoding effort from the war
and the colossi are super secret,
they're not to be mentioned, and the colossi,
except for the last 2 models, are to be destroyed.
And I think one has to wonder today
if those orders didn't prevent Britain from being the center
of the computer revolution later.
Now after the war, Turing of course,
no one knew what he had accomplished,
that he had kept Britain going-- fed and supplied--
during the period they went-- were going it alone.
So he's working on computers and other projects
when 2 English spies for the Soviet Union flee to Moscow
to escape-- evade arrest.
And one of them was Guy Burgess who had been a diplomat here
in Washington, D.C., an openly homosexual graduate
of Cambridge University, and the U.S. tells the British
that the 2 spies were tipped off
by another homosexual spy graduate of Cambridge University
who was Anthony Blunt.
And the British Government panics at the thought
of a circle of homosexual spies coming out of Cambridge.
The number of arrests for homosexual activity spikes
in Britain, and the first day of Queen Elizabeth II's reign
on February 7, 1952, Turing is arrested for homosexual activity
in the privacy of his home with a consenting adult.
No one knew, of course, that he had helped save his country.
So less than a decade after Britain has fought a war
against Nazis who had conducted medical experiments
on their prisoners, Turing is found guilty and sentenced
to chemical castration.
He, too, takes the estrogen injections.
Over the next year, he grows breasts, and on June 7, 1954,
the day after the 10th anniversary
of the Normandy invasion that he had helped make possible,
Alan Turing commits suicide.
Blunt, of course, is later knighted, and a couple
of years ago, 55 years after Turing's death,
the British government apologizes for its treatment
of Turing, one of his country's great heroes.
Well, where did that come from?
[Laughter] Some of you may have heard how I can jinx any system
I walk near.
So Bayes rule also leaves--
comes out of the Second World War, more suspect--
even more suspect than it had gone into the war.
And as a result, for the next 30 or 40 years during the Cold War,
a small group of maybe a hundred or more believers,
Bayesian believers will struggle for acceptance and recognition.
It's a group so small that one
of them could finance their annual conven--
conferences every-- not annual, biannual, every 2 years
in Valencia, Spain, because he uses Bayes
to make election night predictions
for his local political party in Valencia
and uses the money he earns to finance their meetings.
Now, without any public proof that their method worked,
the Bayesians, of course, were stymied.
When Jack Good, for example,
who had been Turing's war-time assistant, knew Bayes worked
from Bletchley Park but couldn't say so, he gives a talk
on the theory at the Royal Statistical Society,
and the next speaker stands up and begins his talk,
the opening words, "After that nonsense."
And when I talked to Jack Good years later,
I can tell you he was still hot under the collar.
[Laughter] Now during Senator McCarthy's witch-hunt
against communists in the federal government,
a Bayesian at the National Bureau of Standards was called,
only half jokingly, an American undermining the United
States Government.
And the National Bureau
of Standards will actually suppress a report
to the U.S. Army's Aberdeen Proving Grounds during the 1950s
because the study used subjective Bayesian methods.
Now, I have to apologize,
the endnote on the quotation about that is wrong.
It's actually from a conversation
with Churchill Eisenhart who was an important statistician here
for many years, both at the Bureau and later at NIST.
And he said-- he said, the reason being--
that he suppresses this report, he brings it up himself
at the very end of the interview.
He says, "The reason being
that this particular Bayesian paper was not empirical Bayes
or anything like that.
It wasn't based on past experience,
it was subjective Bayes.
I was terribly afraid that this fellow's paper would result
in some colonel somewhere telling people who were testing
that he knew where the answer lies
with such and such probability.
Now, build that into your analysis.
I just didn't want the results of munitions testing
to be subject to the personal opinion of a colonel."
Harvard Business School professors develop the Bayesian
decision trees for MBAs.
Howard Raiffa and Robert Schlaifer are Bayesians,
and the decision trees are deeply Bayesian,
and they are called-- Howard Raiffa is called--
they're called socialists and so-called scientists at Harvard.
And Harvard Business School at-- during this period,
is known as a Bayesian hothouse.
A Swiss visitor to Berkeley's statistics department
in the 1950s, which was very anti-Bayesian at the time,
realizes that it was "kind of dangerous to defend Bayes."
During this period, of course, the military continues
to use and develop Bayes.
The military knows Bayes works;
parts of the military know it works and keep it secret.
So for example, the 1950s wrestles with the problem
of how do you deal with something that's never happened.
There's never been-- it's never occurred enough
to have a frequency, to have a sequence to it.
There had never been an accidental H-bomb explosion.
There had, of course, been deliberate testing, but not,
not accidental explosions of the conventional--
of either the H-bomb itself, the nuclear weaponry itself,
or the conventional explosives around it.
So you couldn't predict its probability,
and those who remember Dr. Strangelove,
the movie that spoofs General Curtis LeMay
and his Strategic Air Command, you can appreciate the sort
of David and Goliath sense to a young postdoc named Al Madansky
at RAND who uses Bayes to show
that expanding Curtis LeMay's program would--
could well cause 19 accidental H-bomb explosions a year,
and the Kennedy administration eventually adds safeguards.
That study was classified for many years.
People-- Madansky says people would come up and whisper to him
that you really did something that's really famous.
And he didn't know what had happened at all to his report.
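
[Editor's illustration] The talk does not describe Madansky's actual calculation, so the following is only a generic sketch of the underlying idea: with zero observed events, a pure frequency estimate is exactly zero, while a Bayesian estimate that starts from a prior stays above zero. It assumes a uniform prior (Laplace's rule of succession), and the count of 1,000 accident-free opportunities is a hypothetical number, not a figure from the RAND study.

```python
def frequency_estimate(events, opportunities):
    """Observed relative frequency: events divided by opportunities."""
    return events / opportunities


def rule_of_succession(events, opportunities):
    """Posterior mean of the event probability under a uniform Beta(1, 1) prior.

    This is Laplace's rule of succession: (events + 1) / (opportunities + 2).
    The uniform prior is an assumption made for illustration only.
    """
    return (events + 1) / (opportunities + 2)


# Hypothetical record: 1,000 opportunities and no accident so far.
opportunities = 1000
naive = frequency_estimate(0, opportunities)
bayes = rule_of_succession(0, opportunities)

print(f"frequency estimate: {naive}")
print(f"Bayesian estimate:  {bayes:.6f}")
```

The frequency estimate says the event cannot happen; the Bayesian estimate is small but not zero, so it can still be multiplied by a planned number of future opportunities to get an expected count.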
There were other secret Cold War projects, of course.
The National Security Agency cryptographers used Bayes
and cracked the Soviet codes.
And there was an immensely powerful adviser
to the White House and to the National Security Agency named
John Tukey who was a professor of statistics at Princeton
and had a joint appointment with Bell Labs.
And he uses Bayes and his team uses Bayes for 20 years
to predict the winners of congressional
and presidential elections for the Huntley-Brinkley news hour.
That was the most popular news hour during that period.
But he and Tukey insist on keeping Bayes rule secret.
No one can-- write a paper about it, no one can speak about it.
And it's apparently to keep his--
the role of Bayesian cryptography
at the National Security Agency and Institute
for Defense Analyses also secret.
And then the third thing, of course,
is that the U.S. Navy was using it--
developing Bayes to search-- do underwater searching for first,
the hydrogen bomb that's lost in Palomares, Spain,
and then to find the nuclear submarine, the Scorpion,
that disappears without a trace while crossing the Atlantic
and coming home, and then
to catch Russian submarines in the Mediterranean.
And I won't read you that part of the story,
but I didn't even realize it, but it was arranged
that Admiral Nicholson would tell me the story
about catching-- using Bayes to catch the Russian submarines.
And all through the conversation with him, he kept saying,
"You know, I'm not sure I should mention these wires
that were coming out of a sled, you know,
we were dragging around.
I don't know if I should mention this.
Should I mention those wires?"
And I didn't realize that he was telling the story
of capturing Russian submarines for the very first time.
He was so worried about the wires.
But later, I would have written it a little bit differently
in there, but that's the first time that story came out.
So for many years, during this Cold War,
the Bayesians concentrate on building a logical theory
to make Bayes a respectable branch of mathematics.
And many Bayesians of that generation remember the moment
when Bayes' overarching logic descends on them.
They talk about the epiphany. Howard Raiffa,
the decision tree man at the Harvard Business School, talks
about first, his intellectual conversion to Bayes
and then his emotional conversion to Bayes.
He uses the word conversion.
To them, frequentism begins to look like just a series
of ad hoc techniques, whereas Bayes' theorem had what Einstein
had called the cosmic religious feeling.
Now, during this period, both sides, the Bayesians
and the frequentists are proselytizing their methods
as the one and only way to do statistics.
Both sides used religious terms.
When a Bayesian Dennis Lindley was appointed chair
of a British statistics department
that had been frequentist,
the frequentists there called him a Jehovah's Witness
elected Pope.
[Laughter] Lindley still fumes about that.
So he, in turn, when asked how
to encourage Bayes, says, "Attend funerals."
The frequentists reply in kind.
They say if the Bayesians would only do as Thomas Bayes had done
and publish after they are dead-- [Laughter] --
we should all be saved a lot of trouble.
[Laughter]
So the extraordinary fact about Bayes during the Cold War is
that with the military using Bayes and the civilian Bayesians
under attack, there were very few visible civilian
applications of Bayes in the mainstream.
For example, when an MIT physicist named Norman Rasmussen
is asked by Congress to do the first study
of nuclear power plant safety in 1973, the industry's
by then 20 years old, he does a massive study using Bayes.
He predicts what actually happens at Three Mile Island,
he uses Bayes because there's never been a nuclear power plant
big accident before, so he has not much data.
He uses the failure rate of valves and the failure rate
of pipes and this kind of thing, and he used--
resorts then to expert opinion, which Bayes allows you
to combine with more objective information.
And that's so incendiary, to 1, use Bayes and 2,
during the Vietnam War era to use expert opinion,
he winds up hiding the word Bayes in the appendix
to volume 3 of his multi, multi-volume Rasmussen report.
The only big Bayesian application
in the civilian mainstream is a project that uses the words
in the Federalist papers as data.
The Federalist papers were a series of essays written
by our Founding Fathers to convince the voters
of the U.S.-- New York State to ratify the--
to vote to ratify the U.S. Constitution.
Twelve of them were anonymous and Frederick Mosteller
of Harvard, the one who calls the business school a Bayesian
hothouse, and Frederick Mosteller and David Wallace
from the University of Chicago do a massive Bayesian study,
classification study, using the words
from the Federalist papers as data and conclude 2 things: 1,
that the anonymous papers were almost certainly author--
written by James Madison.
That's a decision that's, that's stood the test of time.
And then they discovered what they called an "awesome result,"
that the century-long argument
over Thomas Bayes' beginning guess,
this hated subjective prior, was really quite irrelevant
if you had large amounts of data to update it with.
And Mosteller and Wallace said,
"You really should be spending your time building models
and learning how to do that instead
of fussing over the priors."
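
[Editor's illustration] The claim that the beginning guess matters little once there is a lot of data can be sketched with a simple Beta-Binomial update. This is not Mosteller and Wallace's model, which the talk does not describe; the two priors and the data counts below are hypothetical numbers chosen only to show two opposed starting guesses converging.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a Beta(prior_a, prior_b) prior after binomial data."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Two deliberately opposed starting guesses (hypothetical).
skeptic = (1.0, 9.0)    # prior mean 0.1
believer = (9.0, 1.0)   # prior mean 0.9


def prior_gap(successes, trials):
    """How far apart the two posterior means are after seeing the same data."""
    return abs(
        posterior_mean(*believer, successes, trials)
        - posterior_mean(*skeptic, successes, trials)
    )


small_gap = prior_gap(6, 10)         # a little data
large_gap = prior_gap(6000, 10000)   # a lot of data, same observed rate

print(f"gap after 10 observations:     {small_gap:.4f}")
print(f"gap after 10,000 observations: {large_gap:.4f}")
```

With 10 observations the two analysts still disagree by 0.4; with 10,000 observations at the same observed rate the disagreement falls below 0.001, which is the sense in which the argument over the prior becomes "quite irrelevant."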
The practical problem was that in order to do this, Mosteller,
who was a super manager, had had to organize a veritable army
of a hundred Harvard students to input data.
They start using adding machine paper,
rolls of adding machine paper,
some of you probably don't even know what they are, but narrow,
little bit like toilet paper.
And then they wind up punching, inputting data
and traipsing it across Boston and Cambridge to MIT,
which had the computer center.
Harvard at that time did not have a computer center.
And the sheer organization of this was just too complicated
for anyone else to consider duplicating.
Now during the late 1980s, however, things are changing,
mainly because of imaging.
Medical diagnostics, the military, industrial automation,
they're all producing blurry imaging--
images, and to understand what the original thing looked like,
they needed to use the probability of causes.
What's the cause of this image?
And the first to suggest using Bayes
for image restoration was a man named Bobby Hunt in 1977.
Well, he worked for Sandia and Los Alamos and had used it there
for a strategic weapons problem and it took him several years
to get clearance, but-- so, it was published in 1977.
But by 1984, there was a host of techniques floating
around that was Bayes, Gibbs sampling, Monte Carlo,
Markov chains, iterations,
and 2 men suddenly realize how they all fit together.
They were Alan Gelfand, who was spending his sabbatical
from the University of Connecticut in the UK
with Adrian Smith, a student of Dennis Lindley,
the Jehovah's Witness Elected Pope.
And the 2 of them suddenly realized how it all works
together, and they write their breakthrough synthesis paper
in 1984.
And they're so afraid
that everyone else will see how all the pieces fit together
that they race through writing their paper.
But they also wrote it very carefully.
Twelve-page paper and they mention the word Bayes 5 times.
So I asked Gelfand, "Why, why not more,
why don't you come out, talk about Bayes more?"
He said, "There was obviously some concern
about using the 'B' word."
[Laughter] "A natural defensiveness on the part
of Bayesians in terms of rocking the boat.
We were always an oppressed minority trying
to get some recognition.
And even if we thought we were doing the right thing,
we were only a small component of the statistical community,
and we didn't have much outreach into the scientific community
where more people were, indeed, using Bayes."
Bayesians thought this paper was an epiphany.
It comes at the same time that powerful lap--
desktop workstations become available,
at not-too-astronomical prices, and a couple of years later,
there is off-the-shelf software, called "BUGS,"
that becomes available for doing Bayesian problems,
and that comes from Dennis Lindley's academic grandson,
David Spiegelhalter.
That all fits together, and the Bayesians talk about 10
or 20 years of a frenzy of Bayesian computation,
because finally, after 240 years,
they could do really complex realistic problems.
The-- this revolution brings in computer scientists,
artificial intelligence people, physicists,
they all refresh and broaden Bayes.
They depoliticize it and secularize it,
and it's adopted almost overnight.
It's a very pragmatic revolution;
it doesn't change people's philosophies of science so much
as it works and we're going to use it.
The battle between Bayesians and frequentists subsides.
Researchers could finally adopt whatever method best fit the
problems they were working on,
and even prominent frequentists moderated their positions.
Bradley Efron, the National Medal of Science recipient
who wrote that classic defense of frequentism, recently said,
"I've always been a Bayesian."
[Laughter] Thank you.
[ Applause ]