TEDxRíodelaPlata - Luis von Ahn - Using the power of millions of human minds


Uploaded by TEDxTalks on 09.12.2011

Transcript:
Ideas worth transforming you
Luis von Ahn: Using the power of millions of minds
Hello. Well, let me start by asking you a question:
How many of you have had to fill out some sort of web form
where you've been asked to read a distorted sequence of characters like this?
How many of you found it really annoying?
Okay, outstanding. So I invented that.
(Laughter)
(Applause)
That thing is called a CAPTCHA.
And it is there to make sure the entity filling out the form
is actually a human and not some sort of computer program
that was written to submit the form millions and millions of times.
The reason it works is that humans
have no trouble reading these squiggly characters,
whereas computer programs simply can't do it as well yet.
For example, when you're buying tickets online to attend a concert,
the reason you have to type
these distorted characters is to prevent
scalpers from writing a program
that can buy millions of tickets, two at a time.
CAPTCHAs are used all over the Internet.
And since they're used so often,
a lot of times the precise sequence of random characters that is shown to the user
is not so fortunate.
So this is an example from Yahoo.
The random characters that happened
to be shown to the user were W, A, I, T
which spells a word.
But the best part is the message that the Yahoo help desk
got about 20 minutes later.
(Text: "Help! I've been waiting for over 20 minutes, and nothing happens.") (Laughter)
This, of course, is not as bad as what happened to this poor person.
(Text: REBOOT) (Laughter)
Well, I could tell funny stories about CAPTCHAs for hours,
but since I cannot do that
let me tell you about a project that we did afterwards
which is sort of the next evolution of CAPTCHA.
This is a project that we call reCAPTCHA,
which is something that we started at the University,
and then we turned it into a startup company.
And then Google acquired this company.
So everything I'm going to say for the next 5 minutes
is owned by Google. So, please, do not spread the word.
So let me tell you how this project started.
It turns out that about 200 million CAPTCHAs are typed every day.
When I first heard this, I was quite proud of myself.
I thought, "look at the impact that my research has had."
But then I started feeling bad.
They are not only obnoxious, but also
each time you type a CAPTCHA
essentially you waste 10 seconds of your time.
And if you multiply that by 200 million you get that
humanity as a whole is wasting about 500,000 hours every day
typing these annoying CAPTCHAs.
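The arithmetic behind that figure can be checked with a short sketch. The two inputs are the numbers quoted in the talk; the function name is only an illustrative choice.

```python
# Back-of-the-envelope check of the figures quoted in the talk.
CAPTCHAS_PER_DAY = 200_000_000   # "about 200 million CAPTCHAs are typed every day"
SECONDS_PER_CAPTCHA = 10         # "you waste 10 seconds of your time"
SECONDS_PER_HOUR = 3600


def wasted_hours_per_day(captchas, seconds_each):
    """Total human time spent per day, converted from seconds to hours."""
    return captchas * seconds_each / SECONDS_PER_HOUR


hours = wasted_hours_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # 555,556 hours per day
```

The result is a little above the "about 500,000 hours" mentioned in the talk, which is consistent with a rounded figure.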
So then I started feeling bad.
And then I started thinking, is there any way
we can use this effort for something that is good for humanity?
While you're typing a CAPTCHA, during those 10 seconds,
your brain is doing something amazing.
Your brain is doing something that computers cannot yet do.
So can we get you to do some
useful work for mankind?
Putting it differently,
is there some humongous problem that we cannot yet get
computers to solve,
yet we can split into tiny chunks
such that each time somebody solves a CAPTCHA
they solve a little bit of this problem?
And the answer to that is "yes," and this is what we're doing now.
So what you may not know is that nowadays while you're typing a CAPTCHA,
not only are you authenticating yourself as a human,
but in addition you're actually helping us to digitize books.
So let me explain how this works.
So there's a lot of projects out there
trying to digitize the existing books.
Google is digitizing books.
Amazon, with the Kindle, is digitizing books.
Basically the way this works is you start with an old book.
You've seen those things, right?
Like a book?
(Laughter)
So you start with a book, and then you scan it.
Now scanning a book is like taking a digital photograph of every page.
The next step in the process is that the computer
needs to be able to decipher all of the words in this image.
Now the problem is that for older books that were written many years ago,
the computer cannot recognize a lot of the words
because the ink has faded and the pages have turned yellow.
Thus the words look a bit different
and the computer cannot recognize them.
So, for books that were written more than 50 years ago,
the computer cannot recognize about 30 percent of the words.
So what we're doing now
is we're taking all of the words that the computer cannot recognize
and we're getting people to read them for us while they're typing
a CAPTCHA on the Internet.
So, the next time you type a CAPTCHA
(Applause)
these words that you're typing
are actually words that are coming from books that are being digitized
that the computer could not recognize.
Now, the reason we have two words nowadays instead of one
is that we need to verify that the answer is correct.
One of the words is one the system already knows,
and the other is a word that the system just got out of a book;
it doesn't know what it is, and it's going to present it to you.
We're going to ask you to type both words.
And we won't tell you which one's which.
And if you type the correct word
for the one for which the system already knows the answer,
it assumes you are human,
and it also gets some confidence that you typed the other word correctly.
And if we repeat this process with about 10 different people
and all of them agree on what the new word is,
we are very confident that this new word
was accurately digitized.
So this is how the system works.
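The two-word check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual reCAPTCHA implementation: the class name, method names, and word identifiers are hypothetical, and the rule "accept once enough verified people agree unanimously" is a simplification of what the talk describes.

```python
from collections import defaultdict

REQUIRED_AGREEMENT = 10  # the talk mentions "about 10 different people"


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()


class UnknownWordPool:
    """Hypothetical helper that collects guesses for words the OCR could not read."""

    def __init__(self, required=REQUIRED_AGREEMENT):
        self.required = required
        self.guesses = defaultdict(list)  # word id -> guesses from verified humans

    def submit(self, control_answer, typed_control, unknown_id, typed_unknown):
        """Return True if the user passes, i.e. typed the known word correctly."""
        if normalize(typed_control) != normalize(control_answer):
            return False  # failed the known word: ignore the other answer too
        self.guesses[unknown_id].append(normalize(typed_unknown))
        return True

    def accepted_text(self, unknown_id):
        """Return the transcription once enough people agree unanimously, else None."""
        votes = self.guesses.get(unknown_id, [])
        if len(votes) >= self.required and len(set(votes)) == 1:
            return votes[0]
        return None


# Usage with a lower threshold so the example stays short.
pool = UnknownWordPool(required=3)
for _ in range(3):
    pool.submit("morning", "Morning", "scan-17", "overlooks")
print(pool.accepted_text("scan-17"))  # overlooks
```

A real system would also need a policy for disagreements; this sketch simply never accepts a word once any verified guess differs.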
And the good thing is that it has been very successful.
We're digitizing about 100 million words a day,
which is the equivalent of about two million books a year.
And this is all being done one word at a time
by just people typing CAPTCHAs on the Internet.
Now, since we're doing so many words per day,
funny things can happen.
And this is especially true because now we're giving people
two randomly chosen English words next to each other.
So funny things can happen.
For example, we presented this word.
It's the word "Christians"; there's nothing wrong with it.
But if you present it along with another randomly chosen word,
bad things can happen.
So we get this. (Text: bad Christians)
(Laughter)
It's quite funny.
But it's even worse, because the particular website
where we showed this actually happened
to be called The Embassy of the Kingdom of God.
(Laughter)
Oops!
Here's another really bad one.
American politician, JohnEdwards.com (Text: Damn liberal)
(Laughter)
So we keep on insulting people every day.
Now, we're not just insulting people.
Quite often, interesting things can happen.
So this actually has given rise to an Internet meme
that thousands and thousands of people have participated in,
which is called CAPTCHA art.
Here's how it works.
Imagine you're using the Internet and you see a CAPTCHA
that you think is somewhat peculiar,
like this CAPTCHA.
Then what you're supposed to do is take a screenshot of it.
Then, of course, you fill out the CAPTCHA,
because that helps us digitize a book, please.
But first you take a screenshot,
and then you draw something that is related to it, like this.
(Text: invisible toaster)
(Laughter)
It's just an example of CAPTCHA art.
There are tens of thousands of these.
Some of them are interesting.
Some of them are very cute. (Text: clenched it!)
Some of them are funnier.
(Text: stoned founders) (Laughter)
This is my favorite number of this whole project: 900 million.
This is the number of distinct people
that have helped us digitize at least one word
out of a book through reCAPTCHA.
A little over 10 percent of the world's population
has helped us digitize human knowledge.
And it is numbers like these that motivate my research agenda.
So the question that motivates my research is the following:
If you look at humanity's large-scale achievements,
these really big things that humanity
has gotten together to do,
like for example, building the pyramids of Egypt
or the Panama Canal
or putting a man on the Moon --
there is a curious fact about them,
and it is that they were all done with about the same number of people.
They were all done with about 100,000 people.
And we can ask ourselves why it is that all of them used
about the same number of people.
And the reason for that is because, before the Internet,
coordinating more than 100,000 people was impossible.
But now with the Internet, I've just shown you a project
where we've coordinated 900 million people.
So the question that motivates my research is,
if we can put a man on the Moon with 100,000 people,
what can we do with 100 million people?
So based on this question,
we've had a lot of projects that we've been working on.
I will not tell you about all we have done.
But let me tell you about one that we are working on now.
This is something that we've been working on for about two years now.
And we're going to launch it in about 30 days.
It's called Duolingo.
This project started by asking the following question:
How can we get 100 million people
translating the Web into every major language for free?
Okay, so there's a lot of things to say about this question.
First of all, translating the Web.
So right now the Web is partitioned into multiple languages.
A large fraction of it is in English.
If you don't know any English, you can't access it.
But there're large fractions in other languages,
and if you don't know those languages, you can't access them.
So I would like to translate all of the Web into every major language.
So that's what I would like to do.
Now some of you may say,
why can't we use computers to translate?
Machine translation nowadays is starting to translate some sentences here and there.
Well the problem with that is that
it's not yet good enough,
and it probably won't be for the next 20 to 30 years.
So let me show you an example of something
that was translated with a machine.
Actually, it was a forum post ...
it's a forum for programming questions.
It was a programming question translated from Japanese
into English, and from there into Spanish, though my translation is good.
The other one is bad. You'll see.
So I'll just let you read.
This person starts apologizing for the fact that it's a machine translation.
Indeed, this was done with the best translation program
from Japanese into English.
Remember, it's a question about computer programming.
So here is the preamble to the question.
(Text: At often, the goat-time install a error is vomit.) (Laughter)
Then comes the first part of the question.
(Text: How many times like the wind, a pole, and the dragon?) (Laughter)
Then comes my favorite part of the question.
(Text: This insult to father's stones?) (Laughter)
And then comes my favorite part of the whole thing.
(Text: Please apologize for your stupidity. There are a many thank you.) (Laughter)
Okay, so computer translation isn't yet good enough.
So we need people to translate.
So what I want is to get 100 million people
translating the Web into every major language for free.
But I don't think I could afford paying 100 million people for the job,
so I want them to do it for free.
Now if this is what you want to do,
you pretty quickly realize you're going to run into
two pretty big obstacles that need to be overcome.
The first one is a lack of bilinguals.
So I don't even know if there exist 100 million people out there
using the Web who are bilingual enough to help us translate.
That's a big problem.
The other problem is a lack of motivation.
How are we going to motivate people to actually translate the Web for free?
After thinking about this for months,
we realized there's actually a way
to solve both these problems with the same solution.
We realized that there's a way to kill two birds with one stone.
And that is to transform language translation
into something that millions of people want to do,
and that also helps with the problem of lack of bilinguals,
and that is language education.
So it turns out that there are millions of people wanting to learn other languages.
Today there are over 1.2 billion people learning a foreign language.
And it's not just because they're being forced to do so in school.
For example, in the United States alone, there are over
5 million people who have paid over $500 for software
to learn a new language.
Many people want to learn a new language.
So what we've been working on for the last two years
is a new website called Duolingo,
where the basic idea is people learn a new language
for free, while simultaneously translating the Web.
And so they're learning by doing.
So this is how it works.
So the way this works is that when you're just a beginner,
we give you very, very simple sentences on the Web.
And if you don't know a word, we'll tell you what each word means
while you are asked to "translate this sentence".
And it turns out that it really works.
Even when people know nothing of the language, if we explain
what each word means, they are able to translate it.
And as you see how other people translate
the same sentence, you start learning the language.
And as you get more and more advanced,
we give you more and more complex sentences to translate.
This is how you are going to help us translate. This is how the site works.
We're mostly done building it,
and now we're testing it.
When we started working on this
I didn't think it could work, really.
But it turns out that it works, indeed. It's amazing.
First, people really can learn a language with it.
In this case we are testing it with people
who know English and want to learn Spanish,
and with people who know Spanish and are learning English.
So people really do learn a language.
And they learn it about as well as with the leading
language-learning software,
which is very good, but perhaps more surprisingly,
the translations that we get from people using the site are very good.
They are as accurate as those
of professional language translators.
Now of course, we play a trick here and it is that
we combine the translations of multiple beginners, several students,
and choose the best. But it turns out that that best translation
is as good as those of professional language translators.
Now even though we're combining multiple translations,
another good thing about Duolingo is that
the site actually can translate pretty fast.
So let me show you an estimate of how fast we could translate.
If we wanted to translate Wikipedia from English into Spanish --
of course, Wikipedia exists in Spanish but is much smaller
than its English counterpart, about 20 percent of its size --
If we wanted to translate Wikipedia from English into Spanish by using Duolingo
we could do it in five weeks with 100,000 active users
learning English with Duolingo.
And we could do it in about 80 hours with a million active users.
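The two estimates are roughly consistent with each other if one assumes that translation throughput grows in proportion to the number of active users. That proportionality is an assumption made here for illustration; the talk does not say how the estimates were derived. A small sketch of the check:

```python
HOURS_PER_WEEK = 7 * 24


def scaled_hours(base_hours, base_users, new_users):
    """Time needed if total throughput grows in proportion to active users."""
    return base_hours * base_users / new_users


# From the talk: "five weeks with 100,000 active users".
base_hours = 5 * HOURS_PER_WEEK

# Extrapolate to a million active users under the proportionality assumption.
estimate = scaled_hours(base_hours, 100_000, 1_000_000)
print(estimate)  # 84.0
```

Five weeks is 840 hours, so ten times as many users gives 84 hours, close to the "about 80 hours" quoted in the talk.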
Since all the projects that my group has worked on so far
have gotten millions of users,
we're hopeful that we'll be able to translate the Web for free.
We haven't yet launched Duolingo,
(Applause)
I'd like to leave you with ...
We haven't yet launched Duolingo, and we're planning to do so in 30 days,
but if you go there, you can sign up
to be part of our private beta in about 30 days.
Help us.
Thank you.
(Applause)