Google I/O 2011: App Engine MapReduce


Uploaded by GoogleDevelopers on 11.05.2011

Transcript:

MIKE AIZATSKYI: Good
afternoon, ladies and gentlemen.
I'm really excited to see you here today.
I'm Mike Aizatskyi from the App Engine Engineering Team.
And today we would like to talk to you about App Engine
and MapReduce.
So the agenda of today's talk is to go briefly through the MapReduce Computational Model, because I assume some of you would like a reminder of what it is.
Then we'll revisit the Mapper library, which we released about a year ago at the previous Google I/O.
Then I'll make a couple announcements.
I'm going to talk about some technical stuff to get you interested in how we did some things on the App Engine team.
And throughout the whole presentation, I'm going to
show you several examples and demos just to keep you
entertained.
So let's start with MapReduce Computational Model.
You might know that MapReduce was created at Google many
years ago to solve problems with
efficient distributed computing.
And really, absolutely every team uses
MapReduce for something.
I'm absolutely sure that even Android team uses MapReduce
for doing some stuff, for doing some population
statistics, whatever.
And it's been a real success at Google, and we use it for
everything.
And this is a diagram of how it looks.
It might be a little bit confusing, so we'll go through
it step by step.
But it basically has three major steps, and those are
called map, shuffle, and reduce.
So the map step is user code which runs over the input data.
That input data might be a database, some files, whatever.
And the goal of the code is to output key and value pairs.
And the MapReduce framework should take those key value
pairs and should store them somewhere for
other steps to consume.
And as you see, the input is usually sharded, like split into parts, so that all of them can be processed in parallel really, really fast. And as I said, this is user code.
The second step of MapReduce is a shuffle step.
And the shuffle phase has to collate values with the same
key together.
It has to go through all the values which are
generated by map phase.
It should find similar, same keys.
It should bundle those values together, and it should output
key and list of value pairs.
And this step has no user code whatsoever.
This is a generic step of the MapReduce framework, usually provided as some kind of service, as some kind of magic black box which does the shuffle.
And the third part is reduce.
And the reduce function will actually take the shuffle
output, which is the key, and list of value pairs, and it
does something with that.
It might compute something more
interesting from that data.
It might save it to database.
It might do something.
And this is again the user code.
And again, its goal is to process the shuffle output as
fast as possible, and it's good if it's parallel.
So once again, this is the diagram of MapReduce
Computational Model, not the implementation
but just the model.
So we have the map step outputting key-value pairs, the shuffle bundling the same keys together, and the reduce processing the shuffle output.
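To make those three steps concrete, here is a tiny, single-machine sketch of the model in Python. It is only an illustration of the model, not App Engine code; the real framework runs map and reduce sharded across many machines and provides the shuffle as a service.

    from collections import defaultdict

    def run_mapreduce(inputs, map_fn, reduce_fn):
        # Map: user code turns each input record into (key, value) pairs.
        grouped = defaultdict(list)
        for record in inputs:
            for key, value in map_fn(record):
                grouped[key].append(value)   # Shuffle: collate values per key.
        # Reduce: user code consumes (key, [values]) pairs and emits results.
        results = []
        for key, values in grouped.items():
            results.extend(reduce_fn(key, values))
        return results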

So the way we work at App Engine, we usually take
something which we already have at Google and which works
for Google, which works for us, and we give it to you.
We give it to our developers.
We bundle it somehow, and we give you access to that same
set of technologies.
And this is exactly what we wanted to do with MapReduce.
But when we started working on that, we realized that there are lots of problems with this approach, unique to App Engine.
First of all, App Engine has scaling dimensions which we
usually don't have at Google.
App Engine has lots and lots of applications, many more than Google has internal applications.
And we expect that many of them will run MapReduce at the same time, and that might cause additional scalability problems.
The other problem is isolation.
We do not want someone running something to kill the royal wedding website.
If you run a MapReduce, we still want the royal wedding site to be up.
So we want to completely isolate all applications from
each other, but still we want them to have good performance.
It might be surprising, but we also want to rate limit.
Originally, MapReduce at Google was designed to go as
fast as possible.
It wants to process all the data in the shortest amount of time possible.
And that means it can consume thousands of dollars' worth of resources per minute.
And not everyone is happy about that, because you might run a MapReduce and it will consume all your daily resources in 15 minutes.
It will kill your online traffic, so you might want to rate limit it and say, OK, don't run it in five minutes.
Please run it over two hours, because I have short-term quotas which I don't want to consume.
And taken to the extreme, there are even free apps which have a very, very limited amount of quota, and they still want to run MapReduces.
And they might say, OK, run this over the whole week, but just get it done.
But please, please, please do not kill my application -- and running as slowly as possible was never a goal for Google's MapReduce.
We also have a protection issue, because there are lots of users out there.
Some of them are malicious.
Some of them are just not doing the right thing.
And we want everyone to be protected from each other.
We want to protect Google from you.
We want to protect you from each other, so that nobody steals anything from anyone else.
So when we analyzed all this, we realized that it's not a really good idea to take Google's MapReduce and give it to you as is.
We realized that we need to do lots and lots
of additional work.
And that's why, a year ago, we released the Mapper Library,
which implemented the first step of MapReduce,
which is map phase.
We released it at Google I/O 2010, in this building.
Since then, it's been heavily used by developers everywhere, outside and inside Google.
We even use it in the App Engine team.
We use it in the Admin Console.
We generate reports with it.
We have a new index pipeline, which is in the works, which uses the Mapper Library.
And lots of people use it -- there was recently a blog post about people using the Mapper Library, and it's already useful.
And during the whole year, we've been working on improving it.
And we have quite a large set of improvements to the library.
Let me go through some of them.
We have the Control API, so that you can programmatically control your jobs.
So you can, for example, create a cron job which will run every day and start a Mapper.
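For example, a cron-triggered handler could kick off a mapper roughly like this. This is a sketch: the call follows the library's control.start_map, but check the source for the exact parameter names, and the handler and entity names here are made up.

    from mapreduce import control

    def start_nightly_mapper():
        # Wired up to a URL listed in cron.yaml; kicks off one mapper run.
        control.start_map(
            name="nightly_report",
            handler_spec="main.process_entity",  # your map function (hypothetical)
            reader_spec="mapreduce.input_readers.DatastoreInputReader",
            mapper_parameters={"entity_kind": "main.Report"},  # hypothetical kind
            shard_count=8)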
We have custom mutation pools, which is the way to batch some
work within map function calls and process it more
efficiently.
We have namespaces support, so you can run your Mapper over some fixed namespace, if you use namespaces, or over several namespaces.
You can even run Mappers over the namespaces themselves, not the data inside the namespaces, so that you can analyze your namespaces.
You can do some kind of migration, statistics, whatever.
We have implemented better sharding with scatter indices.
So we think sharding is now pretty good.
And there are many more small improvements to the library.
So if we take a look at this and ask the question, what should we do to take the Mapper Library to a MapReduce Library?
The first thing is we need some kind of storage for intermediate data, because map jobs tend to output lots and lots of data in MapReduces.
And for that one, we designed the Files API, which we'll go
through in more detail later.
And we released it in March 2011, just two months ago.
We also have to implement the shuffler service.
And the reducer is not a problem, because the reducer is basically a mapper: it just goes over the shuffle output.
And of course, we have to write lots and
lots of glue code.
So today, I'm happy to say that we're launching the
shuffler functionality.
And this actually means two different things.
First of all, we have a full in-memory, user-space, task-driven, open source shuffler for kind of small data sets.
And by small, I mean like hundreds of megabytes, maybe gigabytes, maybe even more.
Who knows?
This is completely open source for Python.
Java comes soon.
It will be open source, too.
We're also announcing that we want to start trusted tester access to the big shuffler.
So if you have some kind of data set which you need to run MapReduce on -- you have hundreds of gigabytes of data and you want to run a MapReduce -- just get in touch with us and we will see how we can make it work for you.
And we have also, of course, written all the integration pieces which are needed to run your MapReduce jobs.
And they are part of the Mapper Library, which we have now decided to call the MapReduce Library.
It's not a Mapper library anymore.
So without getting ahead of ourselves -- we'll get to the technical details later.
But first, let's go through some simple examples of what MapReduces look like.
And the first really simple example, which is the "hello world" of MapReduce, is a word count MapReduce.
And the goal of this MapReduce is to take some text and calculate how many times each word appears in the text -- basically a distribution statistic.
And the map function is really simple.
It's almost like two lines, even if you throw in some additional declarations.
Basically you split the text into words, and you yield the word as the key and an empty string as the value.
We're not using the values here, per se.
We're just going to use them for counting.
So if you take the famous quote from Pulp Fiction, then for five words, we output five key-value pairs.
Then the shuffler will take this data, will process it,
and then will bundle all values for
the same keys together.
So we'll have three pairs: "Zed's" and "dead" each get the two values from the input, which are empty strings, and "baby" gets only one.
And the reduce phase just ignores the values.
It just counts how many of them there are.
So it's going to output "Zed's": two times, "dead": two times, and "baby": one time.
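In Python, the map and reduce functions for word count are roughly this. It's a sketch along the lines of the library's demo; the real input reader hands the map function its data in a slightly different shape.

    import re

    def word_count_map(text):
        # Map: emit (word, "") for every word; the value is just a placeholder.
        for word in re.findall(r"\w+", text.lower()):
            yield word, ""

    def word_count_reduce(word, values):
        # Reduce: ignore the values themselves; only their count matters.
        yield "%s: %d\n" % (word, len(values))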
So let's see how this works.
Is it large enough?
So unfortunately, this stuff runs for several minutes on the data set that I chose, so I decided not to run it live.
And I took 90 books, 90-something books from the Gutenberg Project, which amounts to tens of megabytes of text.
And I ran this MapReduce.
It took six minutes.
And you can see that this is our new UI
for multi-stage processes.
We're going to talk about this a bit later.
But you can see that the whole MapReduce count took, like,
six minutes, 40 seconds.
Map phase took one minute, 40 seconds.
Shuffle took three minutes.
And reduce took one minute, 50 seconds.
And there were some additional steps, like cleanup and some
statistics steps.
And if you take the reduce job, which finished in under two minutes -- here it is, one minute, 50 seconds -- we can see that it read 78 gigabytes of intermediate data, which is quite good.
I said that the shuffler is supposed to handle hundreds of megabytes, but we've actually been able to run it over 78 gigabytes without big problems.
The output -- how much?
One gigabyte of data.
And the total input size was just -- the input, we have, like, 50 megabytes of text, of books.
This is how the output looks.
So we have all the words, in some order.
They're kind of sorted, but we'll talk about this later.
But we can see that "captivity" -- or, like, I don't know what this word is -- is actually present only once.
So if we do some kind of search for this word, we can see that it actually occurs only once, in this one book.
And yeah, it works.
So this is the word count MapReduce.
And now let's improve our word count MapReduce to build the simplest possible really useful MapReduce, which is a building block even for google.com.
And this MapReduce builds an inverted index, which means that we want to build a huge map which allows you to look up which documents contain a given word.
And you could say that this is roughly how Google search works.
So if you have the internet on your CD-ROM and you upload it to App Engine, then you can build your own Google-like search index.
So we use the same stuff, but instead of ignoring the file name and outputting empty strings, we'll output the file name, or document name, or something which identifies where this word comes from.
And the only thing we do in reduce is remove duplicate values.
We just build the list: this word occurs in this document, this document, and this document.
Of course, instead of just deduplicating documents, you could count how many times the word occurs in each document, so later you could compute some kind of ranking, whatever.
But this is the simplest one.
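Sketched in Python, the only changes from word count are the value we emit and the deduplication in reduce (again, an illustration, not the exact demo code):

    import re

    def index_map(document_name, text):
        # Map: emit (word, document_name) so the shuffler collates, per word,
        # every document it appears in.
        for word in re.findall(r"\w+", text.lower()):
            yield word, document_name

    def index_reduce(word, document_names):
        # Reduce: drop duplicates so each document is listed once per word.
        yield "%s: %s\n" % (word, " ".join(sorted(set(document_names))))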
Again, I'm going to show you the run -- it actually ran a little bit slower for some reason.
But yeah, it took six minutes, 50 seconds.
If you take a look at the size of intermediate data--
here it is.
We can see that we actually processed 170 gigabytes of
intermediate data.
And the whole MapReduce took six minutes, which we think is
actually quite good.
So here is how the output looks for the whole 60 megabytes of books from the Gutenberg Project.
So you can pick -- now you can get this output.
And you could actually build a web page which has some kind of search here.
So let's verify that it works.
Let's take a look at aviators.
Here it is -- we want to see "aviators", everything.
You can actually see that there are only two documents
which have this word, 1059 and 3061, which is true.
So I'm quite confident that it works.
So these were the two most common and simplest MapReduces you could write.
Later in this talk, I'll show you a little bit more
complicated and interesting MapReduce.
But now I would like to dive into
technical bits of our solutions.
I'd like to talk about how we did what we did.
And actually I would like to show you that you can build something similar, or something modified to your needs, now that you know how it works.
And I'm going to talk about two things.
I'm going to talk about Files API, which was our solution to
MapReduce storage problem.
And I'm going to talk about the User-Space Shuffler, about how we built it, which algorithm we used, and so on and so forth.
So Files API.
As I told you, and as you have seen, MapReduce jobs generate
lots of intermediate data.
This was a tiny, simple MapReduce, which is just an
example, and processes just 60 megabytes of input data.
And it generated 170 gigabytes of intermediate data.
And of course, you have to store it somewhere.
So before the Files API, we had three storage systems. We had the Data Store, which is kind of expensive because it's replicated over multiple data centers, lots of copies.
And it also has a one-megabyte entity limit, which makes storing 170 gigabytes of data kind of hard.
You have to split it.
You have to remember where you put each chunk.
You have to read it in order.
It's not a pleasant task.
We also had the blobstore, which is really good, but it was read-only.
The only way you could get data in was to upload files through an HTTP form, which does not work for MapReduce.
And you had Memcache.
Memcache is kind of good.
It's really fast, but it's small and it's not reliable.
You can never be sure that the data you put in is still
there, because there might be memory pressure
or something else.
So that's why we designed the Files API.
We wanted to give you a familiar, file-like interface to some kind of virtual file systems. These are not local files.
These are virtual file systems; they just look like local files.
If you have ever accessed local files, you will find it familiar.
We released it in 1.4.3.
We integrated it with the Mapper Library in the way I'm going to show you.
And we kind of consider it to be a low-level API.
It's not something that every user, we
think, is going to need.
Most users will probably just use MapReduce Library.
But you might find some interesting needs
for this one, too.
And it's also still an experimental library,
experimental API.
We're still looking at how it works.
We're still trying to understand what you
need, what we need.
And these files --
I said that the API is really similar to local file systems. But it has some huge differences.
Basically, the biggest difference here is that files have two states: writable and readable.
In writable state, you can write the file but
you cannot read it.
In readable state, you can read but you
cannot write to it.
And all files start in writable.
Later, when you're done writing, you can finalize the
file, and you can transfer it to readable state.
There is no way out of readable state.
Once it's finalized, it's forever.
All writes are append-only, so there is no random access.
All writes are atomic, meaning that if you output 100 bytes, then nothing is going to get in between them.
And they are fully serializable between concurrent clients, meaning that you might have lots of background tasks writing into the same file at the same time, and we're going to make sure that their data does not interleave.
And I should also say that concrete file systems might have some additional APIs for dealing with their own properties.
And every file system is going to have its own reliability or durability constraints or guarantees.
The Files API itself doesn't know about that.
In March, we released the first file system.
This is the blobstore file system.
So we actually took the read-only blobstore facility
and we made it writable.
So you can now write directly to blobstore.
You can create really huge files.
There is no limitation aside from 64-bit lengths, which I think is going to be enough for the next thousand years.
Once you finalize the file, those files are fully durable.
They're going to be replicated across multiple data centers.
And they're going to have the same reliability guarantees as
high-replication data store.
But the important point here is that writable files are not durable, meaning that if something bad happens -- for example, a data store failure, or a maintenance period -- you might lose your files while they're in the writable state.
It is designed so that while you write, we don't replicate it, so that [UNINTELLIGIBLE]
is good.
Because we think that MapReduce is going to be the primary consumer of this API.
OK.
If something bad has happened, you're going to restart
MapReduce jobs.
And since this is blobstore, you can fetch a blob key for a finalized file and use the familiar blobstore API.
So if you want to download that file or serve it to the user, you can get the blob key and serve it directly to the user without writing too much code.
This way you can, I don't know, write code which generates some kind of movie.
You just write to blobstore frame by frame, then you finalize it, and you can serve it.
This is a simple Python example of how you could access it; Java has its own API.
And yeah, we don't have much time to go through all of it, but basically it looks natural in the language.
It's mostly like the file objects you have in Python.
So first of all, you create the files in blobstore.
And you specifically say, OK, I want a file from blobstore.
Then you open it to write.
You write data.
And once you're done, you say, OK, finalize the file.
To read it, it's the same stuff.
You can open file.
You can read from it.
You can seek in the file.
Write is append-only, and read can have random access.
So you can do lots of crazy, interesting things.
And if you want to get the blob key, you just call the get-blob-key function, and you're going to get a blob key for the file.
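Put together, the Python flow looks roughly like this. It follows the documented Files API usage of the time, but treat it as a sketch and check the docs for the exact calls.

    from google.appengine.api import files

    # Create a writable blobstore file, write to it, and finalize it.
    file_name = files.blobstore.create(mime_type="text/plain")
    with files.open(file_name, "a") as f:    # writable state: append-only
        f.write("hello, blobstore\n")
    files.finalize(file_name)                 # now readable, forever

    # Fetch the blob key for the finalized file and read it back.
    blob_key = files.blobstore.get_blob_key(file_name)
    read_path = files.blobstore.get_file_name(blob_key)
    with files.open(read_path, "r") as f:     # readable state: random access
        data = f.read(32)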
As I said, we integrated it with the Mapper Library.
We added output writer support.
So there's one line in the MapReduce spec.
It's [UNINTELLIGIBLE].
It just specifies, OK, I want the output of this mapper function to go directly to blobstore.
This way, by having a single-line map function which just converts an entity to a CSV line, or XML, or whatever you want, you can easily write a mapper which will export your data to blobstore.
And since you have the Control API, you can actually create a cron job and make a copy of your data every day, or every week, or run it whenever you need.
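The mapper itself really can be a single line -- something like this hypothetical CSV export, with the blobstore output writer declared next to it in the MapReduce configuration:

    def export_to_csv(entity):
        # Hypothetical one-line export mapper: turn each datastore entity into
        # a CSV row; the configured output writer streams the lines to blobstore.
        yield "%s,%s\n" % (entity.key().name(), entity.value)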
We also plan to migrate both the data download and the data upload tools to this functionality.
We're not there yet, but it will eventually come.
But meanwhile, it's now easy to get your data into some sort of downloadable format.
And we also have some low-level features, which we're not going to spend a lot of time on, but basically we have exclusive locks.
You can lock files.
You can make sure that no one else accesses them.
And we have sequence keys, so if you need some kind of guarantee that you don't write the same data twice -- because a request might be retried, or might come a second time from the user -- there is a notion of a sequence key, and the API will take care of it for you.
If you need this, read the API documentation and source code.
And in the future, we're going to introduce more file
systems. The first one, which we really, really want to get out, we call the Tempfile system.
This is going to be much faster, much cheaper, because it's not going to be replicated anywhere.
But it's not going to be durable at all, even for finalized files.
It might lose them if something bad happens.
And each file is going to be stored only for several days, because we want this to be a backing file system for MapReduce.
We want it to be really fast. We want you to be able to store terabytes there without selling your house.

Another thing is that we want to integrate with other storage technologies which we have at Google.
Yeah, just keep watching what we're doing -- we're going to have lots of exciting file systems coming.
So this was the Files API.
And now let's talk about the bread and butter of our MapReduce implementation, which is the User-Space Shuffler.
So as I told you, we're going to have two shufflers: the
User-Space, which is kind of simple, open source, and it
has its limits on the data it can process; and we're going
to have a huge shuffler service, which is hidden, which is just an API call which says, OK, shuffle this for me, and which is going to process lots and lots of data really, really quickly.
But we wanted to have User-Space Shuffler for small
and medium jobs.
And as you remember, the job of the shuffler is to
consolidate values for the same key together.
So it looks through all the data, finds the same keys, and brings their values together.
And even though we want this User-Space Shuffler to be fully open source, with no new App Engine components -- anyone could have built something like it -- we still want it to be reasonably fast, reasonably scalable, and reasonably efficient.
Because we just saw that, yeah, 170 gigabytes is not a big deal.
It's just a couple of minutes, which is good.
So the way we're going to arrive at the shuffler implementation is we're going to start with a really, really stupid shuffler.
We're going to improve it step by step until we get to the final algorithm.
And the first shuffler just loads everything into memory, sorts it, and reads the sorted array.
This is going to look like this.
Once you have your values sorted by key, you can just
read and see, OK, this is the same key, grab this value.
Same key, grab this value.
OK, this is another key.
Here are your values.
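In Python, a minimal sketch of this naive shuffler looks like this (an illustration of the idea, not the library's code):

    import itertools

    def naive_shuffle(pairs):
        # pairs: every (key, value) the map phase produced, all held in memory.
        pairs.sort(key=lambda kv: kv[0])
        for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
            yield key, [value for _, value in group]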
It's really just a couple of lines of code, but unfortunately, it has lots of problems.
First of all, it's a memory-bound algorithm, bounded by how much data you can fit in memory.
Like, if you have a dedicated workstation or server, you might be lucky to have 64 gigabytes.
So it's memory-bound, and there is no way to increase performance by throwing more resources at it, because it's single-threaded.
It's just one sort, and there's no way out of it.
So we're going to improve this by chunking our input data into chunks and storing the data with the Files API.
And then each chunk will be sorted on its own.
Those chunks should be of a certain size -- because App Engine's Python runtime is not really memory-efficient, our chunk size is something like five megs.
So we're going to sort each one of those, store it, and then we're going to merge all the stored chunks together using an external merge.
Or, as the MapReduce Library actually does, we're not even going to write out a merged result -- we're just going to merge-read from all of them.
This is really simple.
Your upper two blobs go to the first chunk, the lower three to the second chunk.
And then you're going to merge-read by maintaining pointers into each one of them.
You read from all of them.
You see, OK, blue, blue, blue, blue.
Then you move each pointer.
And yeah, it's going to work.
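A merge-read over already-sorted chunks is also easy to sketch (again, just the idea, not the shipped code):

    import heapq, itertools

    def merge_read(sorted_chunks):
        # sorted_chunks: iterators of (key, value) pairs, each chunk already
        # sorted by key on its own; heapq.merge keeps one pointer per chunk.
        merged = heapq.merge(*sorted_chunks)
        for key, group in itertools.groupby(merged, key=lambda kv: kv[0]):
            yield key, [value for _, value in group]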
And the properties of this approach: first of all, it's no longer memory-bound.
You can sort as many chunks as you want, as long as each one of them fits in memory.
Another good thing is that the sorting is parallel.
You can throw in a thousand computers, give each one a chunk, and they're going to finish in parallel.
But unfortunately, the merge phase is not parallel.
It's just one phase on a single computer, which has to merge-read all of them.
And it's kind of difficult and slow to merge-read from too many files -- you could have tens of thousands of files, and you'd need some kind of heap structure to maintain pointers into tens of thousands of files.
Yeah, it gets really, really slow once you crank up the number of files.
So we're going to improve this, too.
And the way we're going to improve it is we're going to
use the hash code of the key.
The idea is that if we split keys into those whose hash code is divisible by two and those whose hash code is not, then we can process those two groups of keys separately.
Like, all even-hash keys go into one chunk.
All odd-hash keys go into another chunk.
And we can run the same algorithm from the previous step on each group independently.

We don't have to worry about merge-reading all of them,
because each one of them has different hash keys.
This is how it's going to look.
It looks quite nice, you know.
And then I needed a fourth color.
But basically, once again, we take a look at each block of output from the mapper.
We separate it into chunks by hash code.
Like, blue and red go to the top, green and yellow go to the bottom.
Then we sort the data in each hash chunk.
So we sort some greens and yellows.
We sort some blues and reds.
And then we're going to do a merge-read from only those chunks.
And this approach is actually quite good.
Because by using the hash code, you can split your output data into as many hash-based chunks as you want.
You want thousands?
You got it.
You want 10,000?
So you can make the merge as parallel as possible.
You can run thousands of merge-reads in parallel.
All the sorting is in parallel.
There is nothing memory-bound here.
If you have too much data to read, just increase the number of hash chunks and process them separately.
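The core trick is just bucketing by hash, something like this sketch:

    def partition_by_hash(pairs, num_buckets):
        # Every occurrence of a key lands in the same bucket, so each bucket
        # can be sorted and merge-read independently of the others, in parallel.
        buckets = [[] for _ in range(num_buckets)]
        for key, value in pairs:
            buckets[hash(key) % num_buckets].append((key, value))
        return buckets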
And this is actually the shuffler, which we released
today for Python.
And the Java version is going to come really, really soon.
As far as I know, someone is even working on that already.

Yeah, this was the shuffler.
This is the way it works.
Of course, the real thing is more complicated than that, because we have to deal with lots of tiny details.
But this is the idea.
And we soon realized that our MapReduce consists of several dependent stages.
You have to finish map before you start shuffle.
You have to finish shuffle before you start reduce.
So we designed a new API to chain complex work together, and we called it the Pipeline API.
And this is actually the glue which holds the Mapper, Shuffler, and Reducer together and which makes them a single MapReduce job.
And our MapReduce Library is now fully integrated with Pipeline.
And that UI which you saw, with multiple stages, actually comes from the Pipeline API.
It's an API which takes care of all those details -- that, OK, after you're done with reduce, you have to run cleanup jobs to clean up all those intermediate files.
This is the API which enables us to do that.
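So a whole MapReduce job ends up expressed as a pipeline. Wiring up the word count from earlier looks roughly like this -- a sketch based on the library's demo code, so treat the exact names and argument order as approximate:

    from mapreduce import base_handler, mapreduce_pipeline

    class WordCountJob(base_handler.PipelineBase):
        # Chains map, shuffle, and reduce into one job; the Pipeline API runs
        # each stage after the previous one finishes and cleans up afterwards.
        def run(self, blobkey):
            yield mapreduce_pipeline.MapreducePipeline(
                "word_count",                                       # job name
                "main.word_count_map",                              # mapper
                "main.word_count_reduce",                           # reducer
                "mapreduce.input_readers.BlobstoreZipInputReader",  # input
                "mapreduce.output_writers.BlobstoreOutputWriter",   # output
                mapper_params={"blob_key": blobkey},
                reducer_params={"mime_type": "text/plain"},
                shards=16)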
And if you're interested in this API, I want to recommend the talk by my colleague, Brett Slatkin.
It's called "Large Scale Data Analysis Using the App Engine Pipeline API." I think it's at 4:00 something.
So yeah, please come if you're interested in running your own complex processes.

And now, I promised you a more complicated, more involved example of a MapReduce.
And the example we're going to write is what I call distinguishing phrases.
So the idea is to take a look at multiple books and figure out if some phrase has been used a lot in one particular book but not in the other books -- like, whether some authors have their own signature phrases, or some books have signature phrases.
The way we're going to do it is to split the text into n-grams. An n-gram is a sequence of n consecutive words.
And you can pick whatever n you like; I ran the example with n equal to 4.
The problem here is, if you increase n a lot -- like if you use 10-grams -- then 10-grams are probably quite unique, because these are complete sentences already.
And if you use n equals 2, that's not even a phrase.
That's just two words.
So I picked 4.
And for each phrase, the first thing we're going to do is check if this phrase is actually frequent.
We're not interested in something the author used only once.
So we check if it occurred at least 10 times.
Then we're going to check if a single author or a single book uses this phrase more than every other book combined, which means it's really unique to this book.
And we're going to run this on the same data set, which is 90 books.
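Stripped down to the map and reduce functions, a sketch of this job looks like this (an illustration of the logic; the real code has a few more details):

    import re

    N = 4           # n-gram length used in the demo
    MIN_COUNT = 10  # ignore phrases that occur fewer than 10 times overall

    def phrases_map(book_name, text):
        # Emit every 4-word phrase together with the book it came from.
        words = re.findall(r"\w+", text.lower())
        for i in range(len(words) - N + 1):
            yield " ".join(words[i:i + N]), book_name

    def phrases_reduce(phrase, book_names):
        # Keep the phrase only if one book uses it more than all others combined.
        if len(book_names) < MIN_COUNT:
            return
        counts = {}
        for name in book_names:
            counts[name] = counts.get(name, 0) + 1
        top_book, top_count = max(counts.items(), key=lambda kv: kv[1])
        if top_count > len(book_names) - top_count:
            yield "%s: %s\n" % (phrase, top_book)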
Let me show you how it went.
So it took like seven minutes or something.
This is, as I told you, the UI which comes from the Pipeline library.
So we have map, shuffle, reduce.
If you actually open the shuffle, you can see that there are lots of sort pipelines.
And each sort pipeline sorts one hash-based bucket, the way I explained to you.
And yeah, so it finished in like seven minutes, 25 seconds.
Just to have an idea of whether these seven minutes are good or not, I wrote a simple Python program which uses the same MapReduce Computational Model, but completely in memory on a single computer, in one thread, because it's Python.
And on my computer, it took almost 20 minutes to complete.
So this is already faster than running it on my own computer.
And this actually scales, so if you increase the data set, you can just increase the number of shards, and you're not supposed to see a huge time increase.
And when I was creating this demo, I wasn't even sure that
this was going to work, but it turns out there are lots and
lots of unique phrases in books.
And this is the output.
There are lots and lots of phrases here
which you could use.
Here it is.
Money price of corn.
Let's just check.

Oh, this is not fair.
This is just one book which always talks about
money price of corn.
Let's pick something else.
This is Spanish.
Yeah, these are the same.
It's surprising that this is unique, right?

Yeah, so you can see that there were three books which
use the phrase only once.
"These are the same" something.
And there is one book which used it seven times.
And I'm just curious, this is the--
what is this?
This is an old book by Leonardo da Vinci.
So you can say that in this book set, the use of this phrase kind of uniquely identifies Leonardo da Vinci.
If you have a huge corpus of text, I guess it might become
really interesting.
And this was all, like -- even if you throw in all the details which I omitted here, like counting occurrences inside the functions, it's like 50 lines of code.
It's not a big deal, and it's really something interesting.
So this was all I wanted to talk to you about today, so let me summarize what I was talking about.
Starting today -- I did a push a couple of hours ago into [UNINTELLIGIBLE] -- you can grab the source code, and you can take a look at the examples.
The documentation is coming soon.
And you can run small, medium, and who knows how big, jobs with these User-Space Shufflers.
We haven't actually pushed the limit yet.
You can run those MapReduces today in Python.
Java is going to come really soon.
If you're interested in helping us get the Java implementation out, just write us an email and we'll tell you where we need help.
And I know there are already people helping us.
Right?
Yeah.
And if you're interested in large MapReduces -- maybe we should call them huge MapReduces -- get in touch with us.
We'll give you the API to access the full-blown shuffler for terabytes of data.
I don't know.
So now I'm ready to answer any questions
you might have. Thanks.
AUDIENCE: It looks like magic, but you know, I didn't quite understand the shuffle part and the merge portion.
You know, as a framework, it looks very complete.
But when you run it on Google App Engine, you have the task queues, which allow you to do the map.
But as far as the merge and the shuffling are concerned, if you run it in parallel as different execution threads, you need to have a common place where you can do the merge.
It can even be the data store, or it can be memory.
If it's the data store, the current BigTable doesn't allow you to do a Select For Update kind of thing.
Which means if you try to update the same entity from two different threads--
right.
So when you do an update, and when you read something else, it gives you dirty data, right?
MIKE AIZATSKYI: So basically, yes.
The Pipeline API gives you some kind of
synchronization point.
It takes care of all of that by itself.
It knows all the limitations of the App Engine platform. It knows that it stores data in the data store.
It knows that it's bad to access the same entity from multiple requests at the same time.
And the Pipeline API hides all that complexity from you.
You just define a job as a simple Python function.
And you say, OK, run it after this one completes, and pass all the output here.
And it's going to take care of that for you.
AUDIENCE: OK.
Right.
So where do we get-- because we are trying to do this for aggregate queries, and we're not able to figure out a way within the--
MIKE AIZATSKYI: It doesn't use any kind of aggregate queries.
It uses key names in a really interesting
fashion to achieve that.
AUDIENCE: Right.
But key names, you know -- aggregating information is the objective of merging into a key name, right?
MIKE AIZATSKYI: Yeah.
AUDIENCE: For a use case like aggregate queries.
MIKE AIZATSKYI: Yeah, but just take a look at the source code.
It's all open source.

Anyone else want to be on tape?

So then we're done.
Thank you a lot for coming.
If you have any questions which you want to ask me off
the record, feel free to come to me.
Thanks a lot.