Authors@Google: Albert László Barabási

Uploaded by AtGoogleTalks on 03.09.2010

>>presenter: Good afternoon, everyone. Welcome to the Authors@Google New York presentation. This
is Albert-Laszlo Barabasi, a Hungarian scientist. He is the former Emil T. Hofmann professor
at the University of Notre Dame and dis-, current Distinguished Professor and Director
of Northeastern University Center for Complex Network Research and associate member of the
Center for Cancer Systems Biology at the Dana Farber Cancer Institute at Harvard University.
He's here today to talk about his current book, "Bursts", which probably has direct
applications for all of us here at Google. So, without further ado, Dr. Barabasi.
>>Albert: Thank you and thanks for the opportunity of being here. So, well, let's just start
easy in that respect and I wanna show you-- if I can get technological-- a video that
I just discovered, essentially, last week by somebody giving a talk. So, I stole it.
And what you see here, this is what I call, what the original title is "Good Morning",
and it's a visualization of when people say "good morning" around the globe. So, essentially,
it's coming from Twitter and what they have done is that every time somebody, somebody
tweets "good morning," they color code it in real time and what you end up, like the
green is like an early good morning, the yellow is like 9am good morning at the local time
and then black are those people who say good morning at the really odd time.
So, and if you just kind of look at it, you can see how the morning comes around the globe
and how it really spreads and as a wave goes through the US, like the, the orange line
goes through US as, as 9am passes by and, and on one hand it is beautiful and the other
one it's, it appears to be completely useless because who really cares about when people
say "good morning?"
But, on the other hand, it actually powers, shows the power of what you can do with data
today. What you can do with this massive amount of data that is emerging about human behavior.
And, of course, being at Google here today I don't have to preach the power of data because
Google is really about data in that respect. But it also shows how much data we can actually
cover about what we do essentially, on a daily basis. And I'm a physicist and which means
I'm a natural scientist and, and -- which means that I believe that natural phenomena
can be understood, described, quantified, predicted and controlled. And no one would
find anything questionable about that statement. That's what scientists are supposed to do
and that's how scientists are supposed to believe to drive their work. What, however,
if we replace natural phenomena with humans? And if we read the sentence together, then
it says humans are supposed to be understood, described, quantified, predicted and controlled
and that's obviously a very scary statement.
So, I mean, we don't have to panic, of course. That's the good news because the, the, --
there's little secret that we don't really talk about in science is, is that the fact
that the scientific revolution that really determines our life has really stopped at
the border of natural sciences. It never really reached us humans. And what I mean by that?
I mean that we have no problem predicting where an electron will go, but we have --
but we cannot foresee economic crises or, kind of, emerging financial crises and so on.
We have no problem turning on and off a gene, essentially, in a cell, but we have serious
difficulties foreseeing wars and battles and major crises and so on. And the reason for
that is very simple. There is a fundamental difference between humans and bacteria, or,
or other kind of natural organisms because bacteria don't get really annoyed at you when
you put them under the microscope. And the moon will never sue you for landing a spacecraft
on its face, but--
on the other hand, none of us really would like to submit ourselves to that detailed
data collection that we really submit objects of na-, of the study of natural sciences;
trying to know about them everything all the time. And what I'm trying to do today is to
tell you a bit that this is about to change. And it's gonna, and it's gonna change essentially
in a very profound way. Now, I mentioned prediction. In order to make predictions, essentially,
you need data; lots of data. Anyone who tells you that they can make any prediction without
data are either palm readers or business consultants.
So, when it comes to humans essentially, what is really changing is that we started to have
data. And you know very well the kind of data I'm talking about. Every time we send an email,
we leave some digital breadcrumb about our social links, about interests and so on. Our
bank knows our ability to pay for things, our tastes, our willingness to pay for certain
things and where we shop for. And, and our mobile phone company knows where we are and
how we move around and, and so on. And so we often don't choose to think about it. But
at the end of the day that we are under multiple microscopes that will record, in real time,
what we do and from these digital breadcrumbs, we actually, one can re-, recover a person's
life with almost minute resolution. So, when it comes to the book "Bursts", what I tried to
achieve here is to, to kind of talk about the fundamental changes that are happening
thanks to this huge amount of data that is emerging about human behavior. There are lots
of aspects of this data. One of them is privacy aspects and I'm sure being at Google you know
exactly what I'm talking about. But the book, even though it mentions privacy, is not about
privacy. It's about the laboratory that really our society is becoming, thanks to this huge
amount of data that is being automatically collected about all of us and about its revealing
power to understand human behavior. Now, when it comes to human behavior, we have to realize
that why -- I have a fundamental question to ask is that why would a physicist care
about human behavior? And let me kind of rephrase that question. Why would you listen to a physicist--
talking about human behavior or brain science, instead of listening to sociologists or to
a brain scientist or somebody like that? And the truth is that, yes, we in physics started
to care about human behavior, but our real interest and my real interest really is in
understanding complex systems. And there are a bunch of complex systems out there that
are worthy of scientific interest. The brain is one of them. The economy is another one.
The human cell is another one and computer systems is yet another one. But what is happening
the last years is that really, the society's becoming the one where we can get the most
data about the behavior of the individual components. The type of data that is really
emerging about what we do--all of us on a daily basis--would be equivalent of knowing
every, what every single neuron of mine does in real time. Or, would be equivalent of knowing
what every single gene of mine does, essentially, in real time because there's individual data
collected about each individual; about their activity patterns, mobility patterns and so
on. So, so when, what, if you wanna be pragmatic and you believe that complex systems behave
in a similar fashion, then you wanna go in a direction where you have the most data and
where you can make the most advances. And in the last five, ten years, human society
is becoming the archetypical complex system that we can really study in real time. So,
I mean, this is a long pursuit for, at least for me. It started with our studies on networks
and some of you may have actually seen my earlier book, "Linked", where I talked about
how people are connected and how components of a complex system, generally, are connected.
How genes connect to each other, how computers connect to each other and how you characterize
this large network that emerges from that. And then, of course, the second step of this
pursuit is to go from the structure, the network to the, the dynamics of when things happen
and how, how do we understand in time, the type of events that take place in a complex
system and that's really the, the pursuit and the goal of "Bursts" in that respect. Now,
when I wrote "Bursts" actually, I had a couple of goals in mind. And one of them had to do
with how the book should look like. If you write to a general audience, particularly
about science, there are a couple of clichés that you are supposed to respect. First of
all is that you are supposed to spill all the big ideas in the first chapter, so just
in case you never had the patient, the patience to go all the way to the end, you haven't
really missed anything. So, you can stop at the end of chapter 1. And, and also kind of
the chapters have to be a standalone so that, once again, if you, if you feel that you stop
in the middle of the book you can still go to a cocktail party and chat about reading
the book, itself. So, I kind of, when I wrote "Bursts", I said, "Let's, let's try to do something
different in a sense that if you don't have any of these expectations that I described
from the novel, to the contrary, if you get to the end of the novel and there's nothing
surprising that really builds on what happened before, then you are somewhat disappointed."
So, in a way, I built essentially the book to read as a narrative that evolves in time
and, and builds on each other and I can tell you that much that, unless you read it to
the end, you will really not find out along the way what hap-, what happens because some
of the big questions that I'm asking are really kind of addressed fully only to the end and,
and they are kind of big turns in the, in the story as we move along. But let me give
you also kind of an insider connect to "Bursts", which is like because it has a couple of different
lines of narratives that are moving together and the first narrative has to do with what
I call "Hasan." This is a picture of Hasan and Hasan is an artist. He's a, he's a media
installation artist, depending on how you wanna define him, but one of the reasons that
he actually became interesting to us is because something interesting happened to him after
September 11th, that is, after travelling around Africa--and he travels a lot, I mean,
he travels so much essentially that he has a 96 pages thick passport because he needs
that many pages to fit all the pass-, all the stamps that, to the countries he travels
to--and so, he was actually, pulled off the plane once he returned to US after September
11th and was questioned by the FBI in Detroit and that started essentially a hiatus step.
Kind of lasted about six months that eventually ended with lie detector tests and he has spent
many, many hours in the FBI offices; never being really arrested, never being really
officially processed, but nevertheless, constantly harassed about what are his activities. It
turned out that the reason why they arrested him was very simple--because he happened to
have a, a storage box in Florida that he closed down just right after September 11. So, the
local owners of the storage facility looked at him and said, "You know, you know, September
11th, we know that Florid-, people from Florida are [ ] the training happened there. This
guy's called Hasan Elahi. Hasan Elahi, he has brownish skin, so what could he be?"
[Albert laughs]
So and that really kind of put him underwater for about six months and much longer, actually,
so certain activity continues today. But what is interesting about him is not what happened
to him, because he's not alone; it happened to many people. What is interesting about
him is that how he reacted to these events. And the, after some of the, some of the visits
to the FBI, he was asked essentially to every time to reveal where he goes internationally,
partly for his own safety so he can come back to the US and so on, and eventually he decided
that he will start revealing this information publicly. So, if you, he has a website that
I haven't yet connected to Internet here, but the point is that if you go to his website, you can find out in real time where he is precisely. You can get
pictures of the environment where he is. You can get, essentially, not only for where he
is now, but all his past for the last six, seven years. You can get pictures of all the
meals he ate, all the bathrooms he's visited, all the expenses he ever made, all the flight
numbers he ever took, pretty much all his life, essentially, spelled out there. So,
now you parti-, perhaps can understand why it's interesting for us because he's a person
whose activity is publicly known. It's available to everybody because he chose to disclose
his life, the way he thinks about it, by disclosing all his activities and putting it out to everybody,
it becomes worthless because the value of the information is based that some people
have it and others don't and that's where the value is derived from and when everybody
has access to the information the value really drops. And, and in his case, he thought that
the value for the FBI for information about him becomes worthless if it's available to
everybody. And he also kind of tracks himself on behalf of the FBI, the way he thinks
about it. But, as a result, we know a lot about him and one of the things I do in the
book is that we actually analyze his trajectory, his, his habits and we compare it to the habits
of millions of other individuals to try to understand whether is he really unusual or
is he normal? Was the FBI actually right arresting him for, for because he was, the reason he
was arrested was for irregular travel patterns as well. And, and was, was the FBI right doing
so or he's relatively ordinary and there are some actually interesting surprises as we
analyzed him as they come along in the book.
So, that's the first story line that we follow across the book and only at the end of the
book we actually get to understand what's happened to him all the way here today. The
second story line I call it 1514. And it really refers to an event that started in Rome in
1513 at the Papal election and then it went into a big crusade that started, that started
from Hungary and spread to Transylvania and, of course, one of the reasons this was somewhat
interesting for me--and I knew a bit more about it because I happened to be born in
Transylvania so I'm somewhat more familiar with, with the events over there--but why
is this interesting? The reason why this event is interesting, from the perspective of the
book, is because at the certain moment at the beginning of the events, some person gets
up and says, "I know exactly what's going to happen." And he gives, essentially, a prophecy
saying that if you guys do with these decisions and the decision is about starting a crusade
or not starting a crusade, then, then this will be the outcome of the events. It will
lead some really blott-, bloody battles, in which you're gonna be ending up, ending up
on the shorter end of the table. And, and the question that I ask through this particular
story is that was he right making that prediction? And how could somebody in 1513 make a prediction
about the events that will actually happen during the next six months. And, in general,
using this one as a, as a starting point, can we really make predictions about events
that are of such magnitude as, as they appear in history books and so on? And the beauty
about events happening in 1514 that we know exactly what the outcome is, of course. But
I think that the story itself is very fascinating and very much ties into the, the overall pursuit
of the book about predictability of human behavior. And the third aspect of the book,
of course, is the science. And I, the book is called "Bursts" because the bursts certainly
refers to an activity pattern that we all beha- that we all follow. And if you look
at anybody's activity pattern in real time--when do we send emails, when do we make phone calls,
so when do we browse the Web--and I challenge you for those of you have access to data like
that to look at it whether it's the case or not. I'm sure it's going to be true because
we looked at lots of data in that respect. These events are not random in time, but they
are conglomerated into bursts. That is, I have short periods of time when I send lots
of emails out and then long periods where I don't have, get a chance to do any. And
then follows another burst of activity when it comes to my email. Same for phone calls,
same for, even essentially, sex, same for, same for visiting the libraries or certain
locations and so on. So one of the things we notice in the last kind of decade is that
really human activity is not entirely random, but there -- it has these conglomerations
into these short bursts. And most important the timing of these events actually follows
very precise laws that we call "power laws". And that is, and it's not kind of following
the random, random patterns that one would expect if really our activity pattern would
be derived by randomness. Now, of course, no one expects that their activity pattern
is random so that's never a question. I don't think that I'm acting randomly and I hope
none of you acts randomly, either. But the question from the scientific perspective is that what
are the signatures of departing from the norm-, from normal randomness? And burstiness is one of
the signatures. And burstiness actually takes us to the next question, which I already outlined
here. I said, "If we were to understand humans the way we understand natural phenomena, we
should be able to predict their activity." Now, prediction is really a very scary word
by itself because the question is what are we going to predict? Am I going to try to
predict what you're going to dream about tonight? Or, will I predict when will you get the next
promotion? Or a chance encounter with somebody and so on. And as I mentioned earlier, in
order to predict anything we need data--lots of data. So, a litmus test of our ability
to predict human behavior really rests on our ability to collect data about some aspects
of human behavior; then we can use later to say how predictable it is. So, when we started
thinking about this problem several years ago, I thought, "Let's collect data about human
mobility; where we are, where we go next." So, I actually, not having access to any other
data, I was so curious whether we are predictable or not that I decided to collect data about
myself. So, I started to wear this tiny, sexy watch that was really a conversation piece
at any single dinner that I went to because it was standing out as a big brick on my hand.
But what it did, essentially, is that it's really designed for runners and anybody else
who's involved in other sports that needs to track their location in time, and what
it does is it tracks in real time where you are and at the end of the day you can put
it into the computer and you get, essentially, your trajectory. So, in my case, I got a map
like that. So, here's how I'm moving around Boston so I can disclose my location because
this is mine. And, and the question that I wanted to ask with this data that if I have
access to my past whereabouts, could I write an algorithm that would tell me where I'm
going to be tomorrow at 3pm? Not that I really care about where I'm gonna be tomorrow at
3pm. I should know better than that, but that's really a test of how well we understand, essentially,
some aspects of human behavior, like human mobility. But as we're, as we're kind of collecting
data I realize that there is already lots of location data collected by individuals
because -- is there anybody in this room who does not have a mobile phone? Ok, nobody's
raising their hand. So, and of course, nope, you're not kidding yourself in this particular
institution about the fact that your mobile phone company knows exactly where you are
at any moment. So, and, and not only do they know where you are and that's normally not
collected, but for billing purposes, every time you make a phone call that data is actually
being collected. So, so the mobile phone company knows not only where you are, but it knows
where millions of other consumers are. So, this is much richer data than what I could ever
collect about myself because it allows us to compare many, many individuals and, of
course, mobile phone companies are terribly worried about releasing some of this data
as you very well know here because they wanna, they wanna protect the data for keeping the
trust of their consumers and also for legal reasons and so on. But in the last few years
they realized that there's lots of value in this data, so they started to release it for
researchers, for other companies and my group happened to be one of those that we were given
quite large scale data about human mobility and calling patterns--of course, anonymized
we don't know who the users are, we don't know their phone numbers, we just know, they
are for us, they are just particles wandering in space like a Brownian particle in a gas.
And this really kind of allowed us to ask the question: How predictable we are? So,
lemme give you an idea what is the amount of data that one can actually get from that.
And what you see, of course, is a visualization of the wonderful city of Paris, of how people
move around Paris, thanks to their mobile phone data. And when you see this firework-like
type of jumps out there it doesn't really mean that lots of mobile phone consumers jump
up in the sky and came back, but it just really meant that in order to visualize the high
number of people going from one location to another one, the next street, they really
had to move them out of space, essentially, out of the flat space because otherwise you
wouldn't see the huge amount of activity that is taking place over there. So, this kind
of gives you an idea of, of how much we know about these individuals and once you have
access to the data, you can really start asking the legitimate question: Is this behavior
predictable? Now, one of the questions we first asked in a series of papers is that
what are distances that people travel on a daily basis; how far you move around? And
the answer was relatively simple. If you look at how many individuals move a certain distance,
this is the, this is the typical distance car-, traveled by, let's say, a person in a
given period versus the, how many people travel that much, you find that most individuals
tend to stay local; that they just move two to five kilometers on a daily basis. So, they
have a radius of movement that is relatively small, but of course, there are a few people
out there at the tail of the distribution that move a lot. And the number of individuals
who move a little and versus those who move a lot follows a very precise power law. So,
again, it's, it's, if you have a large population you can predict how many people will be big
travelers, world travelers and how many will be just sticking around in the neighborhood,
moving next door, essentially, to work from their home and so on. So, this was kind of
the first step. And it already indicated there was a lot of heterogeneity in the population;
people behave quite differently if you look at the large population. The next thing we
did is that we used this trajectory to measure the entropy of each individual. That is, for
each person we measure the entropy of the trajectory. Now, what is entropy? The entropy
of this particular system is zero. It says that this is a completely ordered system. You
know exactly where each ball is; there is no ambiguity about each ball's position so
that's why we say "from our perspective it has zero entropy." Now, this is a system how
it looks like if it is a disordered system and that would have some kind of larger entropy
clearly non-zero because it's random. So, the entropy is a measure of randomness in
that respect, but the reason why we, we measured entropy is because entropy is connected to
predictability. Lemme show you what I mean. If I remove a ball--sorry--and, and you look
at this, you say, "Because the entropy of the system is zero, you have absolutely no
ambiguity about where the use-, about where the ball is missing from." You know exactly
the position of the ball that is missing from there. What, so if I put it back I give you
no new information; very predictable. However, if I show you this configuration then you
have no way of really knowing where I removed that particle from. So, if the entropy of
the system is not zero, then it's much harder to predict. And the same to our, to our trajectory.
If I measure your trajectory, my trajectory, if I get a very, very low entropy it means
that we're very predictable. If I get a very high entropy I have lots of problems predicting
where you're gonna be next time. So, what we, what was our expectation? So, first of
all, we, we're able to derive from entropy a measure of predictability and the predictability
in this case goes between number zero and one. So, this is horizontally is the
predictability and zero means is that the user is completely, it's impossible to predict.
It's completely random, his or her motion. And the predictability would be one if I could,
if one could, in principle, write a data mining algorithm that, based on the past locations,
could with a hundred percent accuracy predict their next location. So, there will be no randomness
whatsoever in the person's motion. He or she goes always exactly the same time back and
forth between home and work, for example. So, our expectation was that because there
is this big heterogeneity in the way we behave and we move around, as we showed earlier in
the distances, we would find that most people are difficult to predict because they happen
to have a life and they are somewhat spontaneous and so on. And there may be a few people that
are somewhat easier to predict. So, that was kind of where our starting point, that there
will be this kind of spectrum of behavior. So then we measured the entropy for each user
and then we measured from that the predictability. And on this plot, the predictability looked
like that. So, what is it that we see here? Essentially, this is the predictability across
a large number of users who use mobile phones. And the first thing you notice is that it's,
it's not a small number, but it's a very large number. It's peaked around .93, which means
that, for a typical person, if you know their past locations in principle with a 93 percent
accuracy, you can predict their next, their next location, which is a huge amount of predictability
across the population and when it comes to their next location. What is also interesting
about this curve is that there is nobody under 80 percent. That is, we always think that
there are, so in reality we are all very predictable, we're missing essentially the people who don't
have really a high predictability and really what is surprising is we tend to think that--ok,
fine. I happen to be a regular fellow. I go always to work in the morning or, in my case,
at noon and I come back in the evening and I have my favorite cafeterias and I follow
that path every day and so on. But there are these other cool people around me.
The students and the rockers and so on who are really cool and they are really unpredictable.
They are spontaneous. They're really living out there. Well, what this curve tells you
is that that person may seem cool to you, but is probably only five percent less predictable
than you are.
[everyone laughs]
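
The talk does not spell out how a predictability score is obtained from entropy. One standard route, assumed here rather than taken from the talk, is Fano's inequality, which turns an entropy S over N distinct locations into an upper bound on how often any algorithm could guess the next location correctly. The Python below is a minimal sketch under that assumption. The function names `visit_entropy` and `max_predictability` and the tower labels are hypothetical, and the entropy used here looks only at how often each location is visited, not at the order of visits.

```python
import math
from collections import Counter


def visit_entropy(locations):
    """Shannon entropy in bits of the visit frequencies (visit order is ignored)."""
    counts = Counter(locations)
    total = len(locations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def max_predictability(entropy_bits, n_locations, tol=1e-9):
    """Solve Fano's bound S = H(p) + (1 - p) * log2(N - 1) for p by bisection."""
    if n_locations <= 1:
        return 1.0  # only one possible location, nothing to guess

    def fano(p):
        # Binary entropy of p, with the 0 * log(0) limit taken as 0.
        h = 0.0
        if 0.0 < p < 1.0:
            h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h + (1 - p) * math.log2(n_locations - 1)

    # fano() falls from log2(N) at p = 1/N to 0 at p = 1, so bisection applies.
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical trace: 90 observations at one tower, 10 at another.
trace = ["tower_A"] * 90 + ["tower_B"] * 10
entropy = visit_entropy(trace)
bound = max_predictability(entropy, n_locations=len(set(trace)))
```

For this hypothetical trace the entropy is roughly 0.47 bits and the bound is 0.9. Because this entropy only counts visit frequencies, a perfectly regular home-to-work commuter would look unpredictable here; an entropy estimate that accounts for the order of visits would be needed to capture that kind of regularity.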
So, I don't want to put it in terms that we are all boring, but when it comes to our predictability
or at least our locations, it is, it is actually the case. At least that's what the data is
showing to us.
[audience member laughs]
So, why am I bringing this example up and why did I put this one in the book? Is because
I was really trying to show that once you can collect enough data about people you can
really ask this fundamental question how predictable we are and you get really surprising answers,
which kind of if one interpolated into the future, could say, "Is it possible that just
about any aspects of human behavior could be predicted if we collect enough data?" And
that's really kind of a question, that is a question of our times and I'm sure many
of you are thinking about it in this context and, and the answer is very complex. And I
kind of try to advance this argument as I move along the book is that where will this
essentially lead in the long term? So, kind of closing the, the science part for a second
in the interest of time, the last aspect of the book -- it's kind of the last vehicle
to getting the message through -- is a very nonconventional one because it has to do with
images. So, if you flip through the book you will see images like that and these are the,
these are drawings and kind of like, it's kind of using different media. It was done
by Botond Reszegh and Botond Reszegh is an artist from Transylvania, from the village
where I'm from, essentially, from the city where I'm from in Transylvania. And what he
and I had been working on for the last few years while writing this book is that to,
I told you there are different layers in the book: the history, the science, Hasan and
so on. So, every second chapter, essentially, has an image like that, which tries to put
these different layers on top of each other. It tries to marry the science with the history.
And this is a good example of that. This comes before the chapter where we start to talk
about the historical, the historical moment when the Pope election takes place in Rome
that starts the history aspects of the book and, and hence, since that particular event
took place in the Sistine Chapel, we see, we see kind of the image of God in front of
the Sistine Chapel. What is interesting is that this particular event takes place a few
months after the painting of Michelangelo was finished, so those electors who were actually
electing the next Pope probably could smell the fresh paint on the wall because it was
just brand new; a few months old. And on top of it you see a Brownian motion because I
talk in that particular chapter about the possibility that we are random. And one aspect
of random mobility is that we move completely randomly. That is, that we move like a random
particle and you kind of see a Brownian trajectory over there. So, as I mentioned, every second
chapter has--and I think these are beautiful images that Botond Reszegh has done--and kind
of the concept of bursts in essence marry-, married with, with some of the battles that
took place in the event, the concept of chance and weather predictability on the top
and at the same time showing the historical location of some of the events that took place
in the history line. And, and the last reason why, and this one actually is one of my favorites,
once again, on the top those criss-crosses essentially show the trajectory of mobile
phone users, as is recorded by mobile phone companies where they are based on their phone
calls. And what you see in the letter, essentially, is a key letter in the historical line because
it reveals who is the character--who is the main character--of the historical story. And
this letter was written about seven years before the events took place in Transylvania
and one of, and, and I actually travelled quite a bit to find that particular letter
and the reason why this was particularly interesting for me is it was written actually by Leonard
Barabasi who happens to be one of my ancestors who was the viceroy of Transylvania, so, the
second person after the king at that time. So, he himself was very personally involved
and through him, essentially, our family was very involved in much of the events of 1514.
So, it's not a completely independent story. The reason why I picked it is because it has
lots of personal relevance, just like the science in the book because if I wanna conclude
I wanna say in many ways this was a very personal book because I don't -- I try to write about
things that I care about that involves the science part, the type of research that we
do in my group about predictability, about human behavior, about computational social
science and the history line also follows my personal interest in that respect. And
my goal was in this book is to kind of combine these different narratives together and it
will be up to you to decide how successful that attempt has been. Thank you very much
for attention.
And I believe we have some time to take some questions? Am I correct? I guess we have to
use that microphone so that it will be recorded for posterity, or whatever you say, in the
spirit of this talk.
[Albert laughs]
And the spirit of the book. Yes.
>>audience member #1: Hi. You mentioned predicting people's location. I was wondering what granularity
that was at?
>>Albert: Oh. So, the work we have done, essentially, we entirely use only the tower level data.
So, so, so, we know exactly where users are when they make a phone call and, and only
what is the tower they are closest at. So, what we--
>>audience member #1: So, is that typically like a half mile or?
>>Albert: Well, it depends on the, on the region, obviously. In big cities, it could
be a few hundred meters, essentially, a quarter of a mile, even shorter. In, so the, so the
density is very, very high in big cities like New York City and becomes much lower when
you go out of that. Yes, that's right, but the majority of the users that we looked at
happened to be in the big cities because that's where people are. So, in that respect we have
quite a bit of resolution, but yes, indeed. In principle, we also chunked the time into
hour long intervals because there's really not much of a [ ] where you'll be maybe five
minutes from now. It's more like in hourly intervals so, so at the end, a temporal regularity,
temporal chunks are an hour and the location resolution is the tower distance.
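The binning described in this answer (tower-level location, hour-long time chunks) can be sketched roughly as follows. This is an illustrative sketch only, not the group's actual pipeline: the record layout `(user_id, timestamp, tower_id)` and the function name `hourly_tower_bins` are assumptions, and picking the most frequent tower within an hour is one possible choice.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_tower_bins(records):
    """Map (user, hour start) to the tower seen most often in that hour.

    `records` is an iterable of (user_id, timestamp, tower_id) tuples.
    This layout is a hypothetical stand-in for call records.
    """
    counts = defaultdict(Counter)
    for user, timestamp, tower in records:
        # Truncate the timestamp to the start of its hour-long chunk.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(user, hour)][tower] += 1
    # Keep one tower per (user, hour): the most frequently observed one.
    return {key: towers.most_common(1)[0][0] for key, towers in counts.items()}


if __name__ == "__main__":
    calls = [
        ("u1", datetime(2010, 1, 1, 9, 5), "A"),
        ("u1", datetime(2010, 1, 1, 9, 40), "A"),
        ("u1", datetime(2010, 1, 1, 9, 50), "B"),
        ("u1", datetime(2010, 1, 1, 10, 1), "B"),
    ]
    for key, tower in sorted(hourly_tower_bins(calls).items()):
        print(key, tower)
```

Hours in which a user makes no call simply have no entry, which mirrors the sparsity of call-based location data mentioned in the talk.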
>>audience member #1: Thanks.
>>Albert: Mm-hmm.
>>audience member #2: Hi. You talked about trying to understand people. There's this
stupid flatworm called C. elegans where we know the complete circuit diagram; every cell
and every neuron and yet people have not been able to build a complete simulator of this
worm. So, I look at that and I say, "We're very far away from being able to predict individual
people." We can predict statistical things and that's quite different. Have you ever,
I don't know if you've seen the book, "Human Behavior and the Principle of Least Effort",
by Zipf in 1949 where he tried to predict mass behavior, but I look at this specific
simulations and I say, "We can't even do a flatworm." People have also tried to do the
garden slug and again, they know the wiring diagram and they can't tell you what the slug
will do.
>>Albert: It's, it's a lovely question. Essentially, is that, I mean I think that your question
has many, many -- raises many, many issues. The flatworm, C. elegans, is one of my favorites,
essentially. I have published quite a number of papers on that. I know, I don't know if
you know, that you can knock out a particular gene and then you would make it move
only in circles and circles and circles. So, so, we understand lots about that, but--
>>audience member #2: You mean the Indianapolis Race Course?
[Albert laughs]
>>Albert: That's right, that's right. It becomes a car racer, that's right. But, on the other
hand, the, the question you are really raising is that what is the type of data? Can we really
understand something from fundamentals? Physicists have been very kind of arrogant in that respect
to say if we understand elementary particles we can tell you everything what you're gonna
do next. And that's not the case. So, in the sense that really, I mean, there are these
different resolutions of how we understand, essentially, in the world and one of the,
and I think most of us in sciences have given away that particular possibility that by
understanding quantum mechanics, elementary particles, we're gonna understand how our brain
works. But what is really happening is that at different levels of resolution we can really
measure certain aspects of behavior and they, by themselves, can be predictable. So, I may
not understand fully the full quantum mechanics of how an electron moves exactly in let's say,
one particular metal, but I can make a chip, a pretty good chip about it and I can predict
very well the chip's behavior when it comes to computation. And in the same way, I think
what happens about human behavior is that we do not understand really from, let's say,
from neuron level why we do things and so on, but certain behavior aspects of ours are
really fundamentally predictable and the data is really telling us that is possible –
the possibility. And, I think, if anything I can hope from this research is that we will
start asking systematically what aspects of human behavior are part of that category that
can be predicted and what aspects of that would never be predictable, essentially, given
the amount of data we can collect of them, partly because they happen too rarely to find
any patterns about them and partly because they inherently have such a large entropy,
for example, that, that their predictability will be zero. And I think it's all up
to us all in this environment is really ripe for that to ask these questions across many
behavior patterns. Thank you.
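The link between entropy and predictability mentioned in this answer can be made concrete with Fano's inequality, a standard information-theoretic relation. The talk does not spell out a formula, so the sketch below is an assumption about how such a bound might be computed: for a user with entropy S (in bits) over N distinct locations, it solves S = H(p) + (1 - p) log2(N - 1) for the largest achievable prediction accuracy p, where H is the binary entropy. The function names are hypothetical.

```python
import math


def fano_entropy(p, n_locations):
    """Entropy (bits) that corresponds to best-case accuracy p under Fano's relation."""
    if p <= 0.0 or p >= 1.0:
        binary_entropy = 0.0
    else:
        binary_entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return binary_entropy + (1 - p) * math.log2(n_locations - 1)


def max_predictability(entropy_bits, n_locations, iterations=100):
    """Solve fano_entropy(p) = entropy_bits for p in [1/N, 1] by bisection."""
    if n_locations < 2:
        raise ValueError("need at least two locations")
    if not 0.0 <= entropy_bits <= math.log2(n_locations):
        raise ValueError("entropy must lie between 0 and log2(N)")
    # fano_entropy decreases from log2(N) at p = 1/N down to 0 at p = 1.
    lo, hi = 1.0 / n_locations, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if fano_entropy(mid, n_locations) > entropy_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Illustrative inputs only, not figures from the study.
    print(max_predictability(0.8, 50))
```

Zero entropy gives a bound of 1 (fully predictable), while the maximum entropy log2(N) gives 1/N, which is no better than guessing among the N locations.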
>>audience member #3: Thank you, Dr. Albert, for your presence. It's fantastic, fascinating
chart indeed. My question is like, the minute that someone says "predict," the notion that
I get in my mind is predicting some form of an event like a 9/11 happening, some kind
of event, right? And what I've really kind of gathered from your research is predicting
human location--
>>Albert: Sure.
>>audience member #3: like, where they are, where they're going to be. So how will you
kind of connecting that locational prediction to some kind of event that might take place.
Have you made that, that bridge? Do you know what I mean?
>>Albert: The answer simply is no. And I, I think we're very, very far from that, but
we're at the moment we can actually ask that question whether that's possible or not. You
are very right. Everything that I refer to here, refers to habitual events; things we
do, things we do with certain amount of regularity that we can actually collect data about that.
But I think that's the only way to proceed at this stage. Let's first understand habitual
events. Let's understand the type of things that we do and occupies 93 percent of our
time because the reason why we can predict 93 percent accuracy is because we spend a
huge bulk of our time at home and at work and travelling in-between and there is very
little time in-between to do something crazy, in that respect, in something different. And
I think once we understand the habitual events, which I call the normal behavior patterns,
then we can start asking systematically the question how we can predict, essentially,
the events that differ from normal. And 9/11 is an event that differs from normal. Could
we get there? Right now, I say, "It's hopeless." We cannot collect enough data to predict 9/11
in that respect, another 9/11. But, on the other hand, I think that we have to walk systematically
this path through the small details all the way to the big picture in order to get there
and give a precise answer.
>>audience member #3: Great. That's very interesting. So, am I to, am I right in understanding that
your thought is, rather your research, is on the premise that most of the things that
happen in the future has a very tangible link to some data point that happened in the past?
It could be location; it could be something else which you have not yet researched as--
>>Albert: Oh, I believe that and I think, I think, think about the weather patterns,
for example. In 1800 and this is one of the things I talk about in the book, the way,
you know what? England was actually pretty good at predicting about the weather. And
the way they did it, and up to like 1960s, was that every day the weather map would come
in, what are the weather in the morning and around UK and then the weatherman would look
around, essentially, and say, "I remember. Four years ago in April was we had exactly
the same pattern. It was very, very similar. Let's pull out that map." So they said, "Oh,
yeah. This is just like that." So let's pull out the next day's map from four years ago
and we say if the weather is today like four years ago, then it's gonna continue next day
that way. And it was relatively predictive in that respect. How did we get today to the
very precise prediction that a few hours ahead we are very accurate about the weather? By
collecting a huge amount of data about the weather patterns, essentially. So, once that
data converged, that is, that we had enough resolution in the data collection and in the
computational tools, then weather prediction became rather accurate and we also understood
the limits of how we can go. It's about 19 days how we can advance, how we can predict
in the future because there's a chaotic limit that we are hitting that the instabilities
are too, the nonlinearities are too strong and you cannot go beyond that even if you
have enough data. I think that's the type of understanding we seek about human behavior.
How far we can go in advance and what is the limit of predictability? And I believe that
there will be a limit of predictability. For human mobility, it's quite long because we
are very regular. For some other aspects, like 9/11, maybe very, very short.
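The map-matching procedure described in this answer (find the most similar past day, then read off what followed it) is essentially a nearest-neighbour lookup, often called analog forecasting. A minimal sketch, assuming each day is summarized as a tuple of numbers (here temperature and pressure, chosen only for illustration) and that similarity is plain Euclidean distance; the function name is hypothetical:

```python
import math


def analog_forecast(history, today):
    """Return the observation that followed the past day most similar to `today`.

    `history` is a chronological list of daily observation tuples.
    """
    if len(history) < 2:
        raise ValueError("need at least two past days")
    # Only days that have a recorded next day can serve as analogs.
    candidates = range(len(history) - 1)
    best = min(candidates, key=lambda i: math.dist(history[i], today))
    return history[best + 1]


if __name__ == "__main__":
    past_days = [(10.0, 1000.0), (12.0, 1005.0), (20.0, 990.0), (22.0, 985.0)]
    print(analog_forecast(past_days, (19.5, 991.0)))
```

The lookup needs no model of the weather at all, only a large enough archive of past days, which is the point made in the answer about data volume.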
>>audience member #3: Thank you.
>>Albert: Thank you.
>>audience member #4: Hi. My question is about also to predicting locations and you're, you're
talking about habits and you're saying most people are regular, but did you see any events
or what about the number of people who like, just stuff that happens out of habit like,
you get arrested, they go to jail, you spend time in jail, you get in an accident, they
go to the hospital or they get deployed to Iraq.
>>Albert: Yes.
>>audience member #4: These kind of things. Do you see these people and do they fall within
that 20 percent that you talked about?
>>Albert: It's a very good question and the question is where is the seven percent coming
from that we are failing to predict, that's also part of it. And it's undeniable that
we have major life changes and when you get hired in Google, for example, you change fundamentally
your behavior of what you did before, if you were a student or something before. We marry,
we divorce, we move jobs, we move houses and so on and that changes, essentially, our,
our mobility pattern. The question is what percentage of your time at any moment is really
reflecting that change? And the, the fact is that after the change you enter in another
habitual pattern that is very predictable. So, for example, you mention someone in jail.
Hey. Somebody's in jail is very predictable. It's gonna be in jail for a couple of years.
So, so, in that respect, I think part of the unpredictability comes from the people who
we see, this is the kind of on average end and those who are on the lower end in predictability
are people who may have actually had a change of lifestyle during the period that we were
looking at. They may have moved location and so on. So, this is part of the picture, absolutely.
But I think we tend to underappreciate the time that we actually spend on habitual patterns
and how little is actually the transition time from one to another one. And once you
have the transition, then you switch to another one. So, I think that's where, that's why
we don't get a hundred percent.
>>audience member #4: Thanks.
>>audience member #5: So, the fifth part of your natural science or human diagram is control.
So, what do you think the implications of your work are for controlling people's behavior,
location, or other aspects that you think you can measure and get enough data about?
>>Albert: I mean, that's a lovely question and it's a very pertinent question and let
me tell you again an example from the weather patterns. So, everybody has heard of John
von Neumann, essentially and he was really big in developing the first computer, not only
in architecture, but in building the first computer. Few of you may know the fact of
why he actually got the money to do that--from the military. Because he proposed that he
will predict weather and he will control the weather. And they had like this crazy images
of how they are gonna control. They gonna spray chemicals around the weather and disperse
clouds and so on and, of course, they never got to that, but they got to the, it was thanks
to John von Neumann that first time the Meteorological Agency of US has purchased a computer to start
predicting weather and that's what, that's why we have it. So, luckily they never got
to control and I'm just hoping that this will be the same case as well, here. And I think
one of the goals I had with this book is let's just talk openly about these issues. Because
I think we, as a society, have to come together and decide how far we wanna push it. And I
think Google is struggling with this question very, very much both internally, I'm sure externally,
which we see very much of how, how we find this balance essentially between the usefulness
for me and the, and the, at least the perception of control about, about my, about what I do.
And I think that the only way we can actually arrive to, to where it's situational, to where
we don't kill innovation, but we also maintain privacy and, and kind of lower the fear level
is by talking openly about that. So, to be shorter, essentially, I think we're everybody's
very far away from thinking seriously about control. But we have to realize that's the
natural progress of understanding processes, but, but control is a question of will. And
will can be controlled by laws, can be controlled by consensus and we can control that control,
essentially, we can limit control only by understanding what it could lead to. And I think
that's why I put it there, that let's just get it open out there. Let's, let's think
about it and let's come together and decide where the limits of that one are.
>>audience member #5: Thank you.
>>Albert: Sure.
>>audience member #6: You talked about having like a huge amount of data and then from that,
extrapolating with 93 percent accuracy a person's location in the future and I was wondering
if having all of that data, was the reason you were able to do that? Or if you had data,
just solely one user, would your set of prediction algorithms work just as well, just for that
single person?
>>Albert: Well, I think it would have worked just as well for one person except nobody
would have really believed we could predict and the referees would have never believed
that I got something generic here. They just have thought that I just got a very peculiar
user who's very boring and regular. So, I think from my perspective as a physicist, I always
ask the question, "The result that I get, how generic it is?" If I can only predict
you, everybody will kind of jump at me and say, "You are a very special case." And of
course you are a very special case, but if I want to show the generic results, I have
to do that for everybody in this room and not only in this room, but I have to look
at non-Google employees as well and people who go to school and people who live under
the bridge and so on. And, and at the end of the day, that was the reason why we have
to have this different data is to really understand how limited is the understanding we get from
this data. How particular it is for one person? And even in this case you may ask, "Is it
really just referring to the country where the data is from?" And I think that we believe
that this is rather generic, it's not particular to one country. You may ask other questions.
What is the demographic spectrum of the people we use? And of course, that's limited by the
people who use cell phones as well, but in this case actually, the country has 110 percent
penetration of cell phones. That is, there are ten percent more subscriptions than how many
people are in the country; so therefore, we think that we get a pretty good demographic.
So, I think the main reason we look for lot, many people is to convince ourselves and the
community that this is a generic result and not particular to you or I.
>>audience member #6: Thanks.
>>Albert: Ok.
>>audience member #7: In the location data that you got from the cell phone companies,
were you able to actually correlate like street closures and other events that actually caused
habitual travel to, to change? The reason I bring this up is because there's enough
traffic analysis as a civil engineer where we try to control how populations move by
building roads, destroying roads, things like that.
>>Albert: Sure. Now, let me, you're, you're right on the money in the sense that one of
the reasons I, I personally believe we need to understand this, the type of travel patterns
is to be able to build better cities and that's part of it, building better roads and so on.
We have a project now that we're trying to do that, but the data we get, essentially,
is inherently limiting in that respect because we know only when the person makes a phone
call where they are only in the tower resolution. Now, on the other hand, I believe that, that
we could actually extract some of the anomalies in the mobility that you're talking about.
And we have a project now where we're trying to do that by first building up a map of the
regular mobility pattern of what is, what is the predicted one and trying to see if
there are any significant deviations from that and we hope to look into the, into the
media as well, to detect some of these events that we know that took place and correlate
with that and see how detectable we are. The results so far were not encouraging. I think
the data collection is very sparse. In order to do that I think you need to get the type
of data that, that resulted in the Paris visualization that really recorded every time a user passed
from one tower to another type of region where either he or she made a phone call or not.
So, that's kind of the, the pass through data and we don't have regular access to the
data because normally phone companies do not collect that, but they could and they
did for, for that particular visualization project. If we would have that, then I think
the resolution would be possible.
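The approach outlined in this answer (build a map of the regular pattern, then look for significant deviations) might be sketched as a simple z-score check per tower. This is a hedged illustration, not the project's method: the input layout (historical activity counts per tower for one hour slot), the function name, and the threshold of three standard deviations are all assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(baseline_counts, observed, threshold=3.0):
    """Return (tower, z_score) pairs whose observed count is far from the baseline.

    `baseline_counts` maps tower -> list of historical counts for one hour slot.
    `observed` maps tower -> count in the same slot on the day being checked.
    """
    flagged = []
    for tower, history in baseline_counts.items():
        if len(history) < 2:
            continue  # not enough history to estimate a spread
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # constant history: z-score is undefined
        z = (observed.get(tower, 0) - mu) / sigma
        if abs(z) > threshold:
            flagged.append((tower, z))
    return flagged


if __name__ == "__main__":
    baseline = {"A": [100, 102, 98, 101, 99], "B": [50, 52, 48, 51, 49]}
    today = {"A": 100, "B": 10}
    print(flag_anomalies(baseline, today))
```

With call-only data the baseline counts are sparse and noisy, so a check like this would flag little reliably, consistent with the discouraging results reported in the answer.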
>>audience member #7: Thanks.
>>Albert: Thanks. So, it's a matter of data.
[Albert laughs]
>>audience member #7: I'm wondering if you've ever read Isaac Asimov's Foundation series?
>>Albert: Why do you think I work on this project?
>>audience member #7: Yeah, I mean, ok. So, did you take some inspiration from that. There's
this science called "psychohistory" and--
>>Albert: Sure, and Hari Seldon, that's right.
Of course just to tell you, I don't think we're any close to that and you remember why
he was successful at predicting the future? What, what was the price for that?
>>audience member #7: I think he got to build an encyclopedia, or funding for his encyclopedia.
>>Albert: Sure, that was noble, but it was actually something that had to happen
in order to be predictive. They had to stop every innovation.
>>audience member #7: All right.
>>Albert: So, absolutely science was outlawed and there could be no innovations. So, essentially,
they had to maintain this stationary state of the society where no new Google would ever
come along. So, so in that respect, I, I really think that that's where I say that there may
be a certain amount of days, of months, of years of limitation of how far you can go
and that's one of the goals of this is to say what type of events? How far into the future
could we predict it? And, and the -- Asimov was actually a smart person and he realized
that if you allow for innovation, predictability will pretty much go to zero on long time because
how well could you have predicted twenty years ago the impact of Google on the society? Hopeless.
Am I correct? I mean, look at the 1960s visions of how the 2000 is gonna look like and you're
laughing about it. It's like a fairy tale, isn't it? We are all flying in personal rockets
around the earth and we have no Internet or concept of it; we don't have a personal computer
or a concept of one. So, how different is our reality from what we imagined back then? And that's
thanks to innovation. And thanks to innovation, taking a different path from what could be
predicted based on what people cared about back then, which they cared about space travel,
so, so therefore, they projected their future based on space travel and not on what they
could not have imagined, which is the Information Revolution. So, so that's why I, I think that
Hari Seldon will never become a reality, only at the price where we stop essentially.
So, maybe you could do that in North Korea.
So, perhaps this is a good moment for us to end? Thanks for your attention. Thank you
for attention.