Google Brussels TechTalk "Internet Privacy"


Uploaded by googleprivacy on 02.02.2010

Transcript:

So as we were setting up for this talk, Sebastian asked me
is this title OK?
"An Engineer's Vision for Internet
Privacy," does that fit?
And I said, well, Sebastian, you've basically given me a
title to talk about whatever I want because I'm the engineer
and it's my vision.
So what I think is going to be most interesting to talk to
you about is the fact that during my seven years at
Google, first as the security person, as Sebastian
mentioned, and now more recently as Google's privacy
lead, consistently through all that time, my work has been on
the privacy sensitive data that Google holds, the proper
protection of it, and making sure that the privacy of our
users is appropriately protected.
So coming to you today to speak from that experience and
having done that work, I think that I can discuss two parts
that are going to be useful to you.
The first one is to tackle this question of what is this
data that we're talking about?
We often hear that Google has so much data,
has all of this data.
But often, I think it's not necessarily clear in people's
minds what that data is.
What exactly is it that they are referring to?
So I'm going to spend some time first attempting to do as
clear and complete a job as I can of answering that question
of what that data is, what the different kinds of data we're
talking about are, how that data is used.
And the ways in which different privacy challenges
are presented by those different kinds of data and what we
are doing to meet them.
And then I'm going to transition into, as engineers,
the things that we have built in the past year in particular
to provide--
especially our focus has been on providing transparency and
control to the individual user whose data this is, the things
that we have built in order to do that on various fronts.
So it's partly a matter of discussing what is all of this
data because that, I think, highlights what the challenges
are, and then moving to describe some of the ways,
some of the solutions, that we are finding for the
challenges.

I will begin briefly by also acknowledging that today we
have just launched publicly, as part of the update of our
privacy center, these privacy principles, which are the
outcome of actually several months of internal discussion
at Google, among all of us who work at Google, work with the
data, work with these questions on a regular basis
to say, OK, we already have the privacy policies which
are, to some extent, legal documents.
But people also want us to say, look, what is your
philosophical position toward this data?
What is your philosophical position toward privacy and
the questions that people have and products that you build?
You have never said that in plain language.
So we had a great deal of discussion internally and a
great deal of saying, well, what if we say this?
And somebody going, no, that's not quite exactly right.
It should be like this to describe what it is that we do
and how we approach this.
And I won't read these out loud to you today.
I will say that we have launched them publicly today,
and there may be some things in the news about this.
But this is part of our effort to bring more transparency and
control, not only to the individual aspects of the
data, but also to our own attitudes.

So I think that for us, and I know we are always repeating
this mission statement over and over again, but when the
question is privacy, I think it's useful to return to it
because, to some extent, this is where all the trouble
begins, right?
This is our vision, this is what we aspire to.
We want to organize all of the world's information and make
it universally accessible and useful in a
privacy-sensitive manner.
But nonetheless, the pursuit, the ambitious pursuit of this
goal, which we hope is an idealistic and laudable goal,
is necessarily one that is going to raise many privacy
questions along the way.
And that's part of what I will be discussing as I go over the
different kinds of data that we are talking about.
So when we say Google has all of this
data, what is the range?
What's in that big bucket of data?
What are all the different kinds of things?
And as I work on this problem and work on the questions of
privacy that arise from all of this data, what I find is very
useful is to divide that data into these three buckets.
There are possibly other ways to look at it, but I have
found this practically useful.
And I'm going to focus first on this bucket here, the
Google account data because, to some extent, that's one of
the easy buckets.
So Google account data here.
This is all of the data that is associated with someone who
comes to Google and sets up a username and a password.
So I go to Google.
I have a username and a password for my personal use
of Google, for Gmail, for Docs, for Picasa, for Blogger,
for all of these services in which my relationship with
Google is one of I go and I log in.
And now that I have logged in, I see all of the information
that I am using those services to hold, and process, and
allow me to edit, and change, and delete, and do all of
these things.
And this is relatively straightforward in terms of
providing transparency and control.
It is straightforward for Google to say, you have logged
in, here is all of your stuff.
Here is all of the information associated with your account.
It's yours.
We know it's you because you have the password, and if you
wanted to delete it, if you want to edit it, if you want
to see it, that's very straightforward.
Now let me turn my attention to this box.
This box is harder.
This is the box of information where a Google server has an
interaction with the outside world.
Someone in the outside world on some machine, usually a
human being but not always, sometimes a computer script,
sometimes a bot, connects to a Google server and does
something, requests some information and receives some
information back.
There's no username and password here.
This is a simple transaction.
Now, web search--
web search for someone not logged in is like this.
It's a connection from a machine in the outside world
to a Google server.
A search query is sent, and a set of
search results are received.
And the fact that that interaction took place is
recorded in the log line.
Now, this log line has the potential to be quite privacy
sensitive, and I will be going into more detail about those
log lines and what is in them.
Because I think that this box is the box that has been the
most mysterious to people and where there's the most need
for additional explaining on our part.
Because this one is pretty straightforward.
I think this one is easy to understand.
People can see what's going on.
This is the mysterious and troubling box
where we need to explain.
We need to be clear about what is happening.
So in this box, because there is no username and password,
there are challenges for an engineer who is saying, how
will we provide transparency about this data?
How will we provide control about this data to people?
There's a need to be more creative because we do not
have a clear way if someone comes and says, show me the
data that you have for me.
We don't have a clear way to know which-- if I come and say
that, we don't have a clear way to know which is the data
that was really for me.
And I'll describe that in more detail.
And then, the third box, this is the box that does not come
from these one-to-one interactions.
So this picture is supposed to be a picture of the search
index, all of those pages out there on the web that Google
goes out and crawls and brings back the information to build
it into the search engine to provide search.
So information like that or the picture on that side,
Google Sky, for example.
So because we're trying to answer the question when
people say, what is all the data that Google has?
We're trying to be very complete here, and I think
pretty much all of the data that Google has would be in
one of those boxes.
So I'm now going to go on and talk about the difficult box.
So I think it is often very useful to people to see
exactly what is in one of these unauthenticated log
lines that Google writes.
And so here we have an example.
And I apologize that the slide is a little bit squished.
Someone comes to Google, connects to Google from their
own machines to a Google server and types in the query
term "flowers" and presses Google search.
And the query term "flowers" is sent to the Google server,
and a log line is written that looks like this.
So in this log line we have, from the left to the right,
the IP address of the computer that connected to the Google
server, the date and the time that this took place, the URL,
including the query term "flowers," the browser type
and the operating system of the computer that connected as
reported by the web browser.
This does not necessarily need to be accurate.
Sometimes people set their web browser to
report other things.
That's fine.
Let's see, and then the cookie ID.
And I will, on the next page, explain more about the cookie
ID and take that apart so that we can all see
exactly what that is.
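To make the fields just listed concrete, here is a minimal sketch in Python. The field names and the sample values are illustrative only; the actual layout of Google's internal logs is not public beyond what the slide shows.

```python
from dataclasses import dataclass

@dataclass
class SearchLogLine:
    """One unauthenticated search log line, modeled on the slide.

    Field names here are invented for illustration; the real log
    format is internal to Google.
    """
    ip_address: str   # IP address of the connecting machine
    timestamp: str    # date and time the interaction took place
    url: str          # requested URL, including the query term
    user_agent: str   # browser and OS, as self-reported (may be spoofed)
    cookie_id: str    # ID from the pref cookie

# A hypothetical "flowers" search, echoing the slide's example
line = SearchLogLine(
    ip_address="123.45.67.89",
    timestamp="25/Mar/2010 10:15:32",
    url="http://www.google.com/search?q=flowers",
    user_agent="Firefox 3.5; Windows NT 5.1",
    cookie_id="740674ce2123e969",
)
print(line.url)  # the query term travels inside the URL
```

The point of the model is simply that every field on the slide is one attribute of one record, written once per interaction.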
So the cookie that we are talking about when we are
talking about search logs is what Google refers to as the
pref cookie.
Because, from the very beginning of Google, this was
the cookie that, if you go and you use Google web search and
you set the preferences without a username and
password, just click the preferences link, and you say,
my preferred language is Swedish.
I prefer to see 15 search results per page, and I want
moderate safe search turned on.
Those preferences would be stored in this cookie, the
pref cookie, which is a little text file in the browser on
your machine.
And if you were to go on your own machine and find this text
file and open it up, this is what you would see.
The name of the cookie is pref, for preferences.
Those first boxes there, the randomly assigned ID number,
which is loosely unique, is supposed to be different from
the cookies on everybody else's machine.
But this is not a secured cookie, so that
is not always true.
Sometimes these cookies are copied,
sometimes they're stolen.
There's not a presumption that this is secure.
And then we have some encoded preferences.
In this case, LD=en is encoding "language display
preference equals English" for this particular cookie.
Results should be shown in English.
There are some numeric encodings of the date and time
the cookie was created and the last time that the cookie was
seen or modified by a Google web server.
And then, the domain, google.com.
This is what causes the browser to enforce that this
cookie can only be returned to, can only be shown to or
edited by a google.com server.
The browser will enforce that.
The "send for any type of connection" is reflecting the
fact that this is not a secured cookie.
There is no requirement that the browser require encryption
before it will send it.
So again, this goes back to there is no strong presumption
that this cookie has not been stolen or copied.
This is not an authentication cookie.
And then, the expiration date.
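The cookie contents just described can be pulled apart mechanically. Below is a toy parser for a value laid out the way the slide shows it; the `ID=...:LD=...:TM=...:LM=...` layout and the sample values are assumptions for illustration, since the exact encoding is Google-internal.

```python
def parse_pref_cookie(value: str) -> dict:
    """Split a pref-cookie value of the form 'ID=...:LD=en:TM=...:LM=...'
    into its key/value pairs.

    LD encodes the display-language preference; TM and LM are numeric
    encodings of the creation time and last-seen/modified time, as
    described in the talk. The layout is illustrative only.
    """
    return dict(field.split("=", 1) for field in value.split(":"))

fields = parse_pref_cookie("ID=740674ce2123e969:LD=en:TM=1114242234:LM=1114242234")
print(fields["LD"])  # 'en' -> results should be shown in English
```

Nothing in the value authenticates the user; as the talk stresses, it is a loosely unique, unsecured identifier plus some encoded preferences.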
And now, we will note, because often this is on people's
minds, that Google's policy is to anonymize the IP address in
the log line at nine months by removing the last octet.
And I will page back to say specifically what that means.
So in this case, and you can see the IP address is a little
badly printed, we would wipe out the .89 at the end, and
that would create uncertainty among 256 possible numbers
that that could originally have been.
So anyone who was using that IP address would now
effectively be in a crowd of 256 people and would not be
distinguishable among them.
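The nine-month IP anonymization step is simple enough to sketch directly. This toy version zeroes the final octet (the talk says the last octet is wiped; representing the wiped octet as `0` is my choice for illustration), which leaves 256 possible original addresses.

```python
def anonymize_ip(ip: str) -> str:
    """Drop the last octet of an IPv4 address, per the nine-month
    log-anonymization policy described in the talk.

    Writing the wiped octet as '.0' is an illustrative convention;
    what matters is that 256 original addresses become
    indistinguishable.
    """
    first_three_octets = ip.rsplit(".", 1)[0]
    return first_three_octets + ".0"

print(anonymize_ip("123.45.67.89"))  # -> '123.45.67.0'
```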

And for cookies, we anonymize at 18 months.
And I will go into significantly more detail now in explaining
what the use of this log is, what the important use is that
requires us to wait for the 18 months and nine months that we
are preserving here by waiting that long.

Now, as an immigrant, I apologize slightly for the
intense American gangster movie tone of this slide, but
it's a good, short way to say it that
fits well on the slide.
So I'm going to describe as a very, I think, useful
overarching theme how Google, as a search engine and to a
certain extent all search engines, depend on the ability
to learn from the good guys.
And when I say learn from the good guys, what I mean by that
is to learn from observing those interactions with all of
the intelligent humans who use the service in good faith.
And by that, I mean a human who comes to a Google server,
searches, enters a query term, presses that button
because they really want to see the most useful results
for that query.
And they will click on the results that they judge,
again, with their human intelligence, to be the most
useful results.
And that the ability, as a general theme, to harvest and
learn from that signal coming in all the time and feed it
back into making the search engine work well, is
fundamental to the way in which search
engines work today.
The other side of that coin is that when you have a system
where the effective functioning depends on this
kind of continual learning from these interactions with
good guys, there is always and immediately this economic
incentive for another set of people to attempt to affect
the functioning of the system in ways that
will benefit them.
Not to the benefit of the users.
And the most straightforward example of this is what we
would tend to refer to as web spam and link farms in which
someone wants their website to come very, very high in the
search results, even though it is not
actually a useful website.
No one would click on it on purpose.
They're not doing the work to make it a website that would,
as we would say, organically rise into the search results.
Instead, they are attempting to scam the system by making
it only appear to be such a website.
And so, what I will be describing is how there is
this constant almost game of chess in terms of trying to do
a very good job of learning from the good guys, including
learning how to tell the good guys from the bad guys so that
you can only learn from the good guys and damp the signal,
or ignore, or shunt to the side what is happening with
the bad guys so that the quality of the search service
remains good for everyone.
And then I will talk also quickly a little bit about
we're calling it, as a catch phrase, inventing the future.
And this is the fact that at Google, again going back to
that mission statement, we have great faith in the power
of abundant data and abundant computing resources to do some
amazing things for humanity.
To build some tools that will help us to tackle problems
better than we have ever been able to tackle them before.
And so, we have a couple of examples of what we are
talking about in that space to illustrate and give a feeling
for when we say, it is important to recognize that
abundant data can be a powerful force for good as
well as a risk.
OK, so quickly, if I ask people when I talk to them,
how do you think Google works?
How do you think Google provides good search results?
Often if they're people who have done some reading and
they're familiar with the tech space, often
their answer is PageRank.
People have some understanding of PageRank.
Remember, PageRank is what Larry and Sergey invented.
And the point that I would like to make about PageRank,
two points.
First of all, it started with PageRank.
PageRank was the opening move of that chess game.
But it didn't end there.
Many, many things have happened since PageRank.
And second, that even from the very beginning, this theme,
the central insight of PageRank, was about learning
from the good guys.
Because the insight of PageRank was to say that the
worldwide web is created by intelligent humans.
Each of those humans is writing web pages, and in
those web pages, they are choosing to link to other web
pages from certain words.
And every time they do that, they are providing a signal
that they believe, with their human intelligence, that this
page that they have linked to is useful in the context of
the word that they have linked from.
And the insight of PageRank was that a computer algorithm,
computer code which has no human intelligence, could be
written to go out and observe all of that signal across the
web, and harvest it, and bring it back in to creating a
search index that could then be beneficial
for everyone to use.
And that that theme has continued to be a very
important one in the science of making search engines good
as the chess game proceeds.
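The PageRank insight can be sketched as a small power iteration: each page's score is repeatedly redistributed along its outgoing links, treating each link as a vote by the page's human author. This is a toy version under textbook assumptions; the production algorithm has many refinements the talk alludes to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power iteration over a link graph {page: [pages it links to]}.

    Each outgoing link is treated as a human author's 'vote' that the
    target page is useful -- the signal PageRank harvests. Illustrative
    only; not the production algorithm.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Page "a" is linked to by both "b" and "c", so it ends up ranked highest
ranks = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
```

The follow-on point in the talk is exactly the weakness of this sketch: nothing in the code can tell an honest link from a link-farm link, which is why the chess game continued.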
So as soon as PageRank was out there, as soon as this
technique went into play, the second move of the chess game
was for the bad guys to come and say, we will create link
farms. We will create things that are not really the work
of intelligent humans in good faith, but which attempt to
pose as if they were to Google's code, which has no
human intelligence.
And they were able to do this rather effectively.
So now, what does Google do?
What do the search engines do?
And you say, well, where else can you look for that signal?
Where else can you look for that signal of what the people
in good faith are telling you by the way in which they
interact with the system and the things they build.
Where can that be harvested and brought back in for
everyone's benefit when they use the search engine?
And so the next place to look was the log data.
And I'm going to page back again to that picture of the
log line to describe how this is.

OK.
Suppose we have, again, the query for flowers.
Suppose Sebastian has come to Google, and he searched on
flowers because he's going to get some for his wife to
apologize for how very busy he's been this week.
And he searches on flowers, and this log line is created.
So what happens next?
That depends on what Sebastian does.
And what Sebastian does depends on how good a job
Google did.
So if Google did a very good job, if we got it right in the
search results that we returned, Sebastian will
probably click on the very first search result.
And that will create another log line.
It will be some point later in the stream of log data that is
coming in as all these different people around the
world interact with the server and do searches.
So some point later in time, there will be another log line
in the log.
The same IP address and cookie ID, slightly later time,
showing that there was then a click on the
first search result.
Or, alternatively, suppose we didn't do such a good job, and
perhaps instead Sebastian clicks on the
eighth search result.
Or we really didn't do a good job, and he doesn't even like
anything on the first page of search results, and instead,
he clicks through to the next page.
Each one of those things would create another log line a
little bit later.
So in the process of writing the system now to learn from
these signals that the people interacting in good faith are
providing, these log lines, separated slightly in time,
and linkable together by the IP address and the cookie ID,
are little stories.
They're little stories of us getting it right, not quite
getting it right, getting it very wrong, and they're little
stories that have some information about what getting
it right might be.
Because maybe if he clicks on the eighth search result, that
should go up.
That's a signal that it should go up.
Now, these are very little stories.
Often, the important stories are bigger and longer in time.
I told you an example story that happens in a few seconds.
Let me bring in again the bad guys, right?
Because now that we are learning from someone clicking
on the eighth result as a sign that the eighth result was
better than the first result, this is something easy for the
bad guys to do if they want to push their result higher.
So how do we combat that?
Well, we apply our science to being able to recognize the
larger patterns that say this cookie and IP address appears
to be a human operating in good faith.
This cookie and IP address seems to keep doing this query
over and over again, and always clicking on the same
search result.
That's probably not a human.
But that requires being able to link the longer story
together to see the pattern.
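Linking those "longer stories" together can be sketched very simply: group click log lines by the (IP address, cookie ID) pair and flag any identity that repeats the same query-and-click pair over and over, the pattern the talk calls "probably not a human." This is purely illustrative; real click-fraud detection is far more sophisticated, and the field names and threshold here are invented.

```python
from collections import defaultdict

def flag_suspicious(log_lines, threshold=10):
    """Flag (ip, cookie_id) identities that repeat the same
    (query, clicked_result) pair at least `threshold` times.

    Illustrative sketch of the 'tell the good guys from the bad guys'
    filtering described in the talk; not a real detection system.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for line in log_lines:
        identity = (line["ip"], line["cookie_id"])
        counts[identity][(line["query"], line["clicked_result"])] += 1
    return {identity
            for identity, pairs in counts.items()
            if max(pairs.values()) >= threshold}

# A bot-like story: the same identity clicking result 8 for
# "flowers" twenty times in a row
logs = [{"ip": "1.2.3.4", "cookie_id": "abc",
         "query": "flowers", "clicked_result": 8} for _ in range(20)]
print(flag_suspicious(logs))  # the repetitive identity is flagged
```

The key dependency is visible in the code: without the IP address and cookie ID to group on, the repeated pattern cannot be seen at all, which is the talk's argument for retaining those fields for a time.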
So I hope that this has sort of illuminated, at a very
basic level, what the use of this log data is.
And then, I want to highlight that nothing I
have described here--
it does not matter at all who the person behind that IP
address and cookie is.
That's not fundamental to this use of data at all.
That's not relevant.
What's important is being able to link together those stories
because that's where the information that is useful to
have the system learn from is.
And what is critically important is being able to
filter, to say, this appears to be a good guy,
we'll learn from it.
This appears to be a bad guy, we will not.
Recognize those differences.
OK, I know I am running somewhat short on time, so I'm
going to jump ahead.
And I expect that this has probably made people think of
all sorts of very in-depth and interesting questions, and we
can go into this in more detail in the question period.

So I'm going to jump ahead quickly to the abundant data
innovation, the things that become possible to do if you
have a large amount of data and a richness of computing
cycles that we think are potentially very beneficial,
and powerful, and interesting.
So one of them is Google Translate, and we've been
working on Google Translate for many years.
And we do this in an extremely data-driven kind of way.
Because we have all of this data from various sources and
we have so much computing power, we are able to do very
large-scale statistically-driven work in
this area that does not necessarily have to have
preconceptions about grammatical structure.
Which can be useful because often the way in which people
use language on the internet does not necessarily follow
grammatical structure as it is described in the books.
And new slang terms arise.
New ways of using language are constantly arising.
And our approach allows us to immediately learn about those
things, as well, and incorporate
them into the system.
And with Google Translate now, you can see how many
different languages this covers, because the system is
completely open to getting better from all of the people who
are using it and all of the different languages that we see
web pages written in.
I think this is now at the point where I, as a user, am
very excited to be able to go, for example, to the front page
of an Italian newspaper online and press translate.
And there will be some humorous mistakes.
But it's enough that I can read it.
This world of information is now open to me in a way that
it really wasn't before.
It would have been very slow and painful for me to look
things up one by one and translate them.
And we're beginning to build this into some of our services
in real-time, so that someone having a conversation, for
example, in Google Wave in a chat context with someone in
another language can even have the benefits of real-time
translation.
So this is an application that we're very idealistic about
and excited about.

And then let me talk quickly about flu trends.
Sebastian, if I'm going five, 10 minutes over, is that--
OK.
So some of you may have heard of flu trends, which began
first in the United States when a group of Google
engineers who were familiar with our log data and what it
is like had the idea, or the insight, to say, the Centers
for Disease Control in the United States has, going back
in time historically, data about outbreaks of the flu in
geographical areas at particular times.
And Google has this log data going back in time that is a
source of information about what query terms were being
entered into Google in geographical areas, as derived
from the IP address as a signal of what geographical
area it is over time.
So suppose we take all of this computer power that we have,
and we take those data sets and we put them out, one next
to the other, aligned by time, and we run a big statistical
analysis to see with no preconceptions whether there
are particular queries or patterns of queries that have
a very high correlation with flu outbreaks.
And it turned out that yes, there was.
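The align-and-correlate idea can be sketched with a toy Pearson correlation over two time-aligned weekly series. The numbers below are invented, and the actual Flu Trends models (and the queries themselves, as the talk notes) are not public.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length, time-aligned series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented weekly data: volume of some candidate query in a region,
# next to CDC case counts for the same region, aligned by week
query_volume = [12, 15, 30, 80, 120, 90, 40]
flu_cases    = [10, 14, 28, 75, 118, 95, 38]
print(pearson(query_volume, flu_cases))  # close to 1.0: strong correlation
```

The search, as described, was to run this kind of test with no preconceptions across many candidate queries and keep the ones that correlate strongly.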
And the team that has worked on this is actually very
secretive and will not tell anyone what those queries are
because they don't want to poison the data.
They fear that if it was widely reported, then people
would search on it more out of curiosity and
that would break things.
So it's a secret.
But they tell me that they are not what you would expect,
which is very intriguing, leaves you wondering.
My ear itches, am I getting the flu?
So as you can imagine, the Centers for Disease Control was
quite excited about this.
Now, what they get from this is not information that they
would not already have, but they get it
about two weeks faster.
Because the traditional route for them to get this
information is someone must be sick enough to go to their
doctor, and then there's a certain amount of reporting
that comes from the individual doctors
through the state level.
And so there's a time lag.
There's about two weeks of time lag before they would be
able to reach similar conclusions from the data
following the traditional path.
But the query patterns are almost instant.
So we can give them the information that this signal
is occurring in a particular geographical area so that they
are prepared two weeks earlier than they
would otherwise know.
And for them, this is very valuable.
And so this seems to be a generalizable phenomenon
wherever there is this kind of historical data available and
there is this time lag.
Economic data, for example, is another area where
this seems to apply.
And the name that our chief economist, Hal Varian, has
coined for this phenomenon is predicting the present.
Because it is closing that time gap.
It's not predicting the future.
That would be very exciting maybe at some point later.
It's predicting the present.
OK, now let me switch.
I have attempted to illuminate for you what this data is that
we are talking about and worrying about why we need to
hold onto it.
What's going on there.
Now, let me switch gears and talk quickly about some of the
things that we have built so far in looking at all of this
data and saying, what can we build, as engineers, to
increase the transparency for all of our users?
Because you are here in this room with me today and I'm
explaining these things.
But I'm not going to be able-- even with the power of
YouTube, I will probably not be able to explain this to
everyone in the world.
So what can we build that will make these
things clearer to people?
What can we build that will show people what kind of data
this is, what kind of control they have over it, how it is
being used?
What can we build that will make that information as
accessible to people as possible?
So I'm going to talk quickly about several things that we
have built each for a different
piece of that problem.
Now, the first one is in the context of interest-based
advertising.
Some of you may know that up until the spring
of last year, Google--
contrary to many people's impressions--
did not personalize advertising at all.
When we selected what ad to show, we were always selecting
it based on the context of what was being done at the
time, based on the contents of the web page that the ad was
being shown on, the contents of the Gmail message that was
being read, the search query that someone had just entered.
But not on any sort of memory associated with a particular
cookie as being a particular individual.
In the spring of last year, we finally said, OK, everyone
else is doing it.
Maybe this is something where we can provide useful value
because there are situations in which there just is not
very good contextual information to use to try to
decide what would be a good ad to show.
And so maybe there is a way that we can do this that will
be valuable and that people will find useful.
Try to see if this is something where we can be
innovative and do a good job.
However, we recognized that people rightfully have a
strong feeling about anything that
feels like being profiled.
Anything that says you are building up a profile of me
is something where immediately people are like,
what's going on?
Do I have control over this?
Do I understand what's happening?
Is this something I wouldn't like?
So what we decided to do was we said, you know, this
profiling happens all over the place, right?
All through the industry, all through all of these
information systems that we all interact
with all the time.
But mostly, it happens in a way no one can see, right?
The advertisers can see it because that's what
they're buying on.
And the people building the systems can see it.
But it's under the surface.
It's not visible to me as someone who's going around on
the internet.
But there's nothing about the system that means it
has to be that way.
There's nothing about this that is fundamentally secret
to fend off the bad guys.
So why don't we show people?
Why don't we take that step?
And so what we did, what we are highlighting here is this
is a picture of one of these ads, these interest-based ads
being shown.
And in addition to the ad itself, there is a link
providing an ads notice to the fact that
this is an ad by Google.
And if one clicks on that link, one is taken to our ads
preferences manager.
Now, that's not the only way to get here.
There are many, many ways to get here.
Searching on Google for Google ads privacy will get you here.
Clicking through our privacy policies or going to our
privacy center will get you here.
But on this page, what we do is we say, OK, there is a
cookie that our ad server sets in people's browsers when it
serves an ad, if this has not been blocked.
This is a different cookie from the pref
cookie.
This is an entirely different context.
This cookie is used only when the ad server serves an ad on
a third party website.
And what we began to do with this interest-based
advertising is to allow a memory associated with that
cookie of what websites that cookie had been served an ad
on before in terms of what category those
websites fell into.
So if Google's ad server had provided an ad for this cookie
on a travel website--
Google has a certain set internally of, this is a
famous travel magazine's website.
We know that's in a travel category, a sort
of reference set.
Then we make a little note associated with that cookie,
served ad to this cookie on a travel site, perhaps
interested in travel.
And that builds up into a set of categories like that.
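The mechanism just described amounts to a small category memory keyed by cookie ID. Here is a hedged sketch: the site-to-category reference set, the site names, and the data shapes are all invented for illustration; only the overall flow (categorized site, ad served, category noted against the cookie) comes from the talk.

```python
# Toy stand-in for the internal reference set the talk mentions
# ("this is a famous travel magazine's website -> Travel category")
SITE_CATEGORIES = {
    "famous-travel-magazine.example": "Travel",
    "recipe-site.example": "Cooking",
}

def record_ad_served(profiles: dict, cookie_id: str, site: str) -> None:
    """When an ad is served for this cookie on a categorized
    third-party site, note that category against the cookie.

    Illustrative sketch of interest-based advertising's category
    memory, as described in the talk; not Google's implementation.
    """
    category = SITE_CATEGORIES.get(site)
    if category is not None:
        profiles.setdefault(cookie_id, set()).add(category)

profiles = {}
record_ad_served(profiles, "cookie123", "famous-travel-magazine.example")
print(profiles["cookie123"])  # {'Travel'} -> perhaps interested in travel
```

Because the memory is just a set of broad category labels, it is also exactly what the ads preferences manager can show back to the user, and let them edit or clear.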
And so, what we'll do on this web page if you visit it is
show you exactly the categories that are currently
associated with the cookie in your browser, allow you to
remove them, edit them, add new ones if you would like to.
Because after all, why would we not do that?
As a company that is involved in trying to choose what would
be the most useful ad to show, if you would like to give us
better information about what you would be interested in
seeing, of course we should encourage that.
It seems obvious to us in hindsight.
And if you come here, and you do not like this, and you say,
even though you're showing me all the categories, and the
categories are all boring, there's nothing embarrassing
here, and you're letting me edit them, I
still don't like it.
I philosophically disagree with it.
Well then, there's the opt-out button.
And if you want to be very, very careful and make sure
that that opt-out always persists even if you clear the
cookies in your browser, there's a browser plug-in that
you can download that will always put your opt-out back
when you clear your cookies to make sure
that you stay opted-out.
I want to highlight here that these categories are
purposefully, I would say, very boring.
We do not use any categories that the European Union
considers protected, sensitive categories.
We do not use any categories even beyond that that we think
anyone would have worries or concerns or sensitivity about.
And, I think, for me as an engineer, one of the very
interesting things about doing that, about that choice is
that it means that even though there is no username and
password here, we can show you this.
We can take the risk that we accidentally show this to the
wrong person because it's too boring to be embarrassing,
which is very convenient.
I also have an interesting little bit of information to
share about this because now that we have had this up for
about a year, we have been able to see how
people respond to it.
And what we have seen is that tens of thousands of
individual people, or individual cookies, have come
to view this web page.
And for every one that has looked at this and then chosen
to opt-out, four have instead chosen to edit the interest
categories, and 10 have chosen to do nothing.
So this, I think, is very useful and interesting for us--
it shows the usefulness of having data.
To have data about how people react to this, and respond to
this, and make use of this is very helpful to those of us
working on privacy, trying to do things that will really
work for users.
I will talk quickly about the Data Liberation Front, which
is-- this is more about control than transparency.
But this is a Google engineering project to ensure
that whenever someone comes and uses Google's services,
puts information into Google's services-- this is, again,
more in the context of the first box with the accounts,
with a username and password.
If they later wish to leave, if they wish to stop using the
service and take all of their data out again and move it
somewhere else, it should be easy for them to do so.
There's a sort of discussion that tends to happen in
companies, and Google is no exception, about whether to
pursue a strategy of making it difficult for people to leave
or making it easy for people to leave. Google had this
discussion around the time that we launched Gmail, and we
decided definitively that we believed it should be easy for
people to leave. At such an engineering-driven company,
engineers are often offended when systems make it difficult
for them to do what they want.
So there was a strong feeling, from the very beginning of
Gmail, that IMAP and POP should be supported as mail
protocols so that it would be easy to take all of
your mail out.
And so, the Data Liberation Front is sort of refreshing
and reminding us of that sentiment internally, that for
all of our services, it should be easy to--
as close as possible to one button that says, I want to
take all of my data out.
Give it all to me in an easily packaged format so that I can
take it somewhere else.
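As a rough illustration of that one-button export idea, here is a minimal sketch that packages each service's data into a single portable archive. The service names and data are invented for illustration:

```python
# Minimal sketch of "give me all of my data in an easily packaged format":
# gather each service's data for a user and bundle it into one zip archive
# of JSON files. Service names and contents are placeholders.
import io
import json
import zipfile

def export_all(user_data: dict[str, dict]) -> bytes:
    """Package every service's data as JSON files in one zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for service, data in user_data.items():
            archive.writestr(f"{service}.json", json.dumps(data, indent=2))
    return buf.getvalue()

archive_bytes = export_all({
    "mail": {"messages": ["hello", "world"]},
    "blog": {"posts": ["my first post"]},
})
# The resulting archive is in open formats the user can take anywhere.
```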

And for us internally, we think this is important not
only from a sense of engineering aesthetics, which
is a big part of it, but also because we frequently say that
our business is built on the trust of our users.
That our greatest asset as a company is the trust of our
users because we are so dependent on all of those good
guys who interact with us all the time.
And we feel that making this decision to make it easy to
leave helps us ensure, helps us have faith that when people
stay, it's because they still feel good about us.
And it's not that they're staying because it was too
difficult to go.
We think that will be the best for us in making sure that our
users still trust us 10 years from now.
And then, the Google Dashboard--
this is something that my team launched just this last fall.
And this was our attempt, in the context of Google
accounts, to answer that question in a straightforward
way, what is all this information
Google has about me?
Because we have a lot of services, and a lot of people,
I think, at some point maybe they used Blogger, and then
they lost interest. And that was a year and a half ago, and
they've forgotten about it.
They don't even remember they have that little bit of
Blogger information.
So what we tried to do here was to build a single place
for people to go that would provide, down the left-hand
side, transparency.
It would show you all of the Google services you have
interacted with, every place you have any information stored.
And some useful summary information, so there would be
something that gives people a kind of high-level picture
of what's there.
And then, down the right-hand side, control.
Direct links to all of the things that you can do to say,
I want to delete that data.
I want to change the settings about who can access it.
Anything that you want to do.
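The two-column idea, summaries on one side and controls on the other, might be sketched like this. All service names, summaries, and control links are placeholders, not the Dashboard's actual contents:

```python
# Rough sketch of the Dashboard layout described above: for each service
# the user has data in, show a high-level summary (transparency) beside
# direct links to delete data or change settings (control).
# Every name, summary, and control here is a placeholder.

services = {
    "Blogger": {"summary": "1 blog, last post 18 months ago",
                "controls": ["delete blog", "change sharing settings"]},
    "Mail":    {"summary": "2,431 conversations",
                "controls": ["delete account", "export mail"]},
}

def render_dashboard(services: dict) -> str:
    """Render one row per service: name | summary | control links."""
    rows = []
    for name, info in sorted(services.items()):
        controls = ", ".join(info["controls"])
        rows.append(f"{name:<10} | {info['summary']:<35} | {controls}")
    return "\n".join(rows)

print(render_dashboard(services))
```

The design point is that transparency (the summary) and control (the links) sit side by side in one place, rather than scattered across each product's settings.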
There are two things that I want to point out about how we
chose to do this.
Number one is that we did not want to make this specifically
a privacy tool.
And the reason for this is we didn't want it to be something
that people would only go to if they were already thinking
about privacy.
We wanted it to be something that felt useful to people in
a more general "I want to look at this all the time because
this is fundamental and basic to how I'm using Google to get
things done" kind of way.
So that their awareness of what the data was and what
kind of control they had over it and what was there would
just be integral to their experience of using Google,
rather than being something where you have to go dig under
your privacy settings to see it.
And that's one of the reasons why we called it Dashboard as
opposed to My Privacy or something like that.
We also discussed whether we should call it something like
My Google, right?
Because this is all your Google stuff, or kind of who
you are to Google, or something like that.
And we decided against that because we didn't want it to
be too Google-specific.
We wanted to do this in a way that encouraged it becoming
something that the industry as a whole would do.
And so we chose Dashboard as a term because that seemed like
everyone could build a dashboard.
We could begin to have a discussion where people would
expect that a dashboard is part of the infrastructure of
this information, and they'll look for it, and that it could
be standard.
So we hoped to support that.
And then I will just finish by saying that part of the
attempt to provide transparency and control about
all of this is to build it directly into our products.
And as an engineer, I often think that's the best way
because it's showing rather than telling.
It involves building stuff.
But we really want to cover all the bases.
We want to find as many different useful ways as
possible to say the same thing in our quest to be clear and
provide good information to everyone about
what's going on.
So another important part of that, I think, is our Google
privacy center, where we attempt to find other ways to
say the things that our privacy policies say in
language that is perhaps more accessible.
And so one of the things that we do there is make videos,
like this one that we're making right now.
But we have also done things like commission cartoons and
write answers to frequently asked questions.
Basically, any format in which it might feel more natural to
someone who is seeking information about what's
happening, or has a question about their privacy.
Making that information accessible to them so that the
privacy policy is not the only choice.
And I will stop there, and we can turn off the camera, and
have a free-for-all.