Google I/O 2011: Scaling App Engine Applications

Uploaded by GoogleDevelopers on 11.05.2011


GUIDO VAN ROSSUM: Good morning, everyone.
JUSTIN HAUGH: Hi guys, so I'm Justin Haugh.
GUIDO VAN ROSSUM: I'm Guido van Rossum.
JUSTIN HAUGH: And we're here to talk to you guys about
scaling AppEngine applications.

There's the right button.
So I'm a software engineer on the AppEngine team, I work on
system infrastructure.
GUIDO VAN ROSSUM: I'm also a software engineer on the
AppEngine team.
I work on tools and runtimes and everything Python related.
And here's a bunch of links that you probably won't have
time to type in to your computer.
JUSTIN HAUGH: But yeah, there's a feedback link if
you'd like to give us some feedback throughout the talk.
GUIDO VAN ROSSUM: And it's an AppEngine app.
JUSTIN HAUGH: Which means it scales.
OK so here's the agenda for today.
We're going to be talking about scaling on AppEngine.
I'll start off by telling you a little bit about the
AppEngine platform, what is scaling?
Is scaling actually a hard problem?
I'm guessing because of this audience that it probably is.
And then talk about the scaling formula
that AppEngine uses.
And then Guido will be talking about how to build a scalable
application on AppEngine, going into some tools, advice,
some pitfalls to avoid and then we'll wrap up with Q&A.
So why do we care about scaling?
I think it's interesting to go back and look at the first
AppEngine blog post back in 2008.
"The goal is to make it easy to get started with
a new web app, and then make it easy to scale when that app
reaches the point where it's receiving significant traffic
and has millions of users."
And that's still our goal today.
We want AppEngine to be a really high performance and
scalable web framework for you guys to use.
So what is scaling?
It means handling very high levels of QPS with low latency
and few or no errors.
This leads to happy users, happy engineers, happy
investors and that's what we're all looking for here.
So QPS, that's queries per second, we use that term a
lot, it's in the admin console when you go
and work with AppEngine.
Latency, we're referring to the time it takes to handle a
request. With errors we're talking about HTTP response
codes, so 500 and above we consider a scaling error.
And then if you think about numbers, what constitutes a
very high traffic app that needs to scale well?
A query per second, I mean if you actually just upload an
app to AppEngine and you are getting a query per second
that's actually pretty good traffic, but that's not really
that hard to achieve in terms of scalability.
When you're starting to get to 100 QPS, 1000 QPS, that's an
app that needs to scale.
When you're talking about 10,000 or more, that's a very
impressive application that's getting a lot of traffic.
It needs to be really well architected and needs to use a
lot of the techniques that Guido will be talking about.
So an example of a scalable app that's currently running
on AppEngine is Panoramio.
This is an application that allows you to kind of take
your lens, it's like a way to go look at photos that people
have shared around the world.
It's actually a Google application
that runs on AppEngine.
So they get about 2000 QPS at their peak.
Nice traffic curve there, that's four
days' worth of traffic.
If you look at the breakdown of their traffic, most of that
is dynamic requests being served by AppEngine, some
of it is cached, and their latency is very low and
constant, so this is an example of an app that's doing
a really great job using AppEngine.
This is their errors, so this is about between 8 and 30
errors per second.
This is actually really low, this is
like .01% of the traffic.
So this is a really high performance app that is really
showing what you can do with AppEngine.
So is scaling hard?
There's two schools of thought about this.
I would say, to some extent, the answer's no.
Most programs are actually really scalable.
If you just have a simple website, if you're just
serving static content, if you're reading and writing a
few form fields on a request, that's actually
really easy to do.
If you just upload that to AppEngine it'll most likely
scale really well out of the box.
But when you get to more interesting programs that are
doing aggregations, data joins, fan-ins, fan-outs, if
you're working with complex data structures, if you are
working with large amounts of data, audio, video, images
these things are potential bottlenecks in your
application, and you need to be careful about how you are
using these things to scale well.
And so the truth is that it depends.
It depends on the problem you're solving.
It depends on the infrastructure you're using.
If you're using your laptop, obviously you're going to have
some limits even if it's a very well designed application.
If you have a rack of machines that you're managing, if you
want to scale up you have to go buy a computer, plug it
into the rack.
If you're on a scalable cloud like AppEngine, things get a
lot easier.
We'll talk about how that works.
And of course it depends on how much time and money you
want to throw at the problem.
So efficient scaling is what we're talking about.
But programs are born scalable.
Hello World is a really scalable app on AppEngine.
You can get extremely high QPS levels out of Hello World
without a lot of work.
But adding things makes them less scalable.
So reading and writing data, making HTTP requests, large
requests and responses, again large amounts of data, can
slow you down.
Really this is the interesting work that most applications
are doing, so we'll be talking about strategies to avoid
these potential hot spots.
So caching, doing things in parallel using async APIs
or threading, can you do things offline?
These are things that you should be thinking about as
your application starts to get more complex.
So let's talk about the AppEngine
platform and how it works.
So AppEngine's all HTTP based, everything is
based around requests.
We have the concept of app instances, so you upload your
app to AppEngine, you just write some code, hit upload
and it's there, it's deployed.
And each instance of your application is a process that
can handle any number of requests.
So Python processes are single threaded, and handle requests
one after the other in serial.
Java just recently added multithreaded support as of
1.4.3 and this allows Java to handle requests in parallel if
you enable it.
AppEngine performs dynamic scaling.
So you actually don't use any instances that would take up
any space in AppEngine if you're not getting traffic.
And as your traffic starts to rise we just automatically are
creating instances for you based on a scaling protocol.
So we look at a lot of variables, perform a
calculation, and then if your traffic levels are starting
to increase, we'll add some more instances.
So we'll go into a little more detail about that.
And our platform's goals are to minimize the latency of the
requests that are coming in before they get handled by an
instance, minimize the number of instances, and then have
very high utilization of those.

So the scaling formula.
Whenever you get a request, whenever a request arrives at
AppEngine, it actually doesn't immediately get
handled by an instance.
It waits for a short period of time in a pending queue.
And this is in contrast to some more traditional
architectures, where requests are immediately routed to a
machine or a server.
So AppEngine has to make a decision.
It has to decide how long to wait, and what to do with that
request. Should it give it to any existing instance?
Or should it create a new instance?
So the inputs to this formula are: the number of instances
that you currently have, the throughput of those instances,
and the number of requests that are waiting in the
pending queue in order to be served.
And when I say waiting, we're talking on the order of less
than a second.
Less than 100 milliseconds.
And so AppEngine will make a prediction based on these
variables of how long requests are going to have to wait in
order to be handled by one of these existing instances, and
if that prediction is too long we'll create a new instance.
So there's a comparison to the loading time, and you can see
some sample code that gives you the
idea of how that works.
So let's walk you through a quick example.
So let's say you have an application that has about 100
milliseconds of latency once it handles a request. But
creating a new instance takes a second.
So AppEngine is tracking these variables for this application.
And let's say there's five instances already.
So, in the first case, let's say there's ten requests that
are sitting in the pending queue that have just arrived.
The wait time to handle those requests for the next request
that comes in that queue, it'll have to wait about 200 milliseconds.
AppEngine knows this because it's looking at your latency.
So it's going to wait.
It's going to not load a new instance because it would take
a second to load up that instance.
So it's faster just to wait.
And so that's the result.
In the other case, let's take another example where 100
requests are sitting in the pending queue.
The next request that comes in would have to wait two seconds
for those five instances to churn through the request
queue in order to handle that request. So in this case the
result of the algorithm is to create a new instance.
So you can see how this works.
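A minimal sketch of the decision in this example, assuming each instance handles one request at a time and the pending queue is drained evenly by the existing instances. The function and parameter names are made up for illustration; the real formula looks at more variables than this.

```python
def should_create_instance(pending_requests, instances,
                           warm_latency_ms, loading_time_ms):
    """Return (predicted_wait_ms, create_new_instance).

    Hypothetical helper: compares the predicted wait in the pending
    queue with the time it would take to load a new instance.
    """
    predicted_wait_ms = pending_requests * warm_latency_ms / float(instances)
    return predicted_wait_ms, predicted_wait_ms > loading_time_ms


# The two cases from the talk: 5 instances, 100 ms latency, 1 s loading time.
few_pending = should_create_instance(10, 5, 100, 1000)    # 200 ms wait, keep waiting
many_pending = should_create_instance(100, 5, 100, 1000)  # 2 s wait, new instance
print(few_pending, many_pending)
```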

But there's actually a little more detail to it.
It's not enough just to compare the loading time and
the waiting time.
We also need to calculate and to take into account the warm
latency, the latency of how long it takes to handle a
request once your instance is ready to go.
And in this case it was about 100 milliseconds.
We actually would more aggressively create new
instances for this application if the pending time started to
get large in comparison to the warm request latency.
AppEngine doesn't want to be the cause of your latency, so
we do what we can to increase your instance count
and scale you up.
So in steady state, there's going to be a certain
percentage of your instances that are idle, as they finish
a request, wait for a short period of time for a new
request to come off the pending queue.
But the waiting time is going to be very small in comparison
to your latency.
But this is a dynamic process and new requests are coming in
all the time.
Instances may have to be turned down if they have an
error or they exceed a certain number of requests to be
handled over time.
So things are always in flux.
So AppEngine is doing what it can to really optimize your
performance and also your utilization.
Warmup requests are something I'd also highlight here.
Warmup requests allow AppEngine to create new
instances in the background without actually handling a
user request. And so users will never see the latency
that would result from loading up a new instance due to a
warmup request. So you can enable this in your app.yaml
in Python, there's also a Java version equivalent, by adding
warmup under inbound_services.
So how quickly can your app scale up?
You know you've got a big mention on Tech Crunch, you're
getting a ton of traffic.
It depends on your latency.
And these are some guidelines I would give you guys.
If your latency is about 100 milliseconds this is
excellent, you're doing really well.
If your latency is 250 milliseconds, that's OK.
When you start to get a second or higher, AppEngine is going
to be a little more conservative about giving you
new instances, because your application is not performing
quite quickly enough.
And it also depends on your loading time.
So if you're less than a second that's a pretty good
loading time.
AppEngine will be willing to load up new instances.
Anything longer than that starts to slow things down.
And so what we're talking about in this section of the
talk is the velocity of adding and handling more queries per
second, so let's say your traffic goes from 0 to 10 QPS
for let's say a 200 millisecond application, you
you can scale very quickly in about a second.
When you get to 100 QPS you're talking just
maybe about ten seconds.
When you talk about 1000 QPS, that's quite a lot of traffic,
AppEngine needs a little bit of time to react to that.
And because throughput is so important, I just wanted to
contrast single threaded and the multi-threaded case.
So in the single threaded case, your runtimes are only
handling one request at a time.
So the QPS really is almost entirely determined by the
latency of your application.
You can see some examples here.
In the multithreaded case, and again this is new in Java,
that as of 1.4.3 you can mark your app thread safe and it
can handle multiple requests at the same time.
So this is a really great way to make your Java applications
highly scalable and I recommend you guys
go check this out.
And really at this point your throughput per instance is
primarily determined by the amount of CPU usage that
you're doing to handle a request. AppEngine will
continue to give your multithreaded instances more
and more requests, as long as the CPU rate is reasonable.
And I've given some examples with 2.4 gigahertz processors
which is about what we're using right now.
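A rough back-of-the-envelope version of both cases. The numbers are illustrative, not the ones from the slide, and the multithreaded model is an assumption: requests overlap while they wait on I/O, so an instance is limited by the CPU time each request burns on one core.

```python
import math


def single_threaded_qps(latency_ms):
    # One request at a time: throughput is the inverse of latency.
    return 1000.0 / latency_ms


def multithreaded_qps(cpu_ms_per_request):
    # Assumed model: one core is saturated when the CPU time of the
    # requests handled in a second adds up to that full second.
    return 1000.0 / cpu_ms_per_request


def instances_needed(target_qps, qps_per_instance):
    # Round up: a partially used instance still has to exist.
    return int(math.ceil(target_qps / qps_per_instance))


print(single_threaded_qps(100))   # 100 ms latency
print(single_threaded_qps(250))   # 250 ms latency
print(multithreaded_qps(20))      # 20 ms of CPU per request
print(instances_needed(100, single_threaded_qps(100)))
```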
And so you can take a look at the instances console.
This is what a typical app might look like.
You can see the QPS per instance, the latency, and you
should go check this console out pretty regularly to just
see how your instances are performing, and see if you can
do something to make them more scalable.
In this example, this app has enabled always on, and always
on is another thing that you guys could enable to
help your apps scale.
Because you'll have three instances that are always
ready to go at any time.
And I'd recommend you do this if you are thinking you're
going to expect some load spikes or
have erratic traffic.
But back to reality, you know we've gone
through some math here.
How do things work in real life?
These theoretical limits we've been talking about, they're
not exactly achievable.
There's routing overhead, there's load fluctuations,
there's some safety limits that AppEngine has in place
to prevent us from just creating hundreds and hundreds
of new instances in a very short period of time.
So AppEngine is a very scalable platform, it does
need a few seconds, a few minutes if you're talking
about thousands of QPS.
And we do tweak the formula regularly, to make
trade-offs between performance and utilization.
Of course we could just throw more instances at your
application, but that wouldn't provide very good utilization.
And there's machine upgrades, infrastructure changes that
are taking place, we introduce new features like
pre-compilation, warmup requests, and always on that
make things scale better.
So just some take aways I would mention.
AppEngine does a lot for you.
It tracks your latency, your CPU, it's a scalable cloud
platform that's always adding and removing instances.
It's trying to optimize both performance and utilization.
And it responds very quickly to traffic spikes.
But it doesn't understand where your latency comes
from, it doesn't understand where your CPU usage is coming
from, it can't make your app more performant.
So Guido will be talking a little bit about that in a
second, but you know it's really a partnership between
you and AppEngine.
And just as an example, here's a successful partnership.
This is Matt Mastracci of and he just sent us an
e-mail last November, saying just wanted to say thanks for
making a great product, they got mentioned on "The View",
resulted in a lot of traffic, it looks like they got up to
about 300 QPS.
AppEngine scaled wonderfully, we got about 900 errors on the
front page while it scaled up, but compared to the overall
traffic that was nothing.
Our app has seen a ton of traffic but it's amazingly
fast, you wouldn't even know.
And I think this really shows like how
AppEngine typically works.
We didn't do anything special for Matt.
He's just a typical user.
So with that I'll turn it over to Guido, to talk about how to
build a scalable app.
GUIDO VAN ROSSUM: Thank you Justin.

So now that Justin has discussed some of the theory
behind AppEngine scaling, I'm going to discuss what you
actually have to do to make your apps scale as well as
that successful application.
So there are some techniques and tools that I want to
discuss, beginning with load testing and then
focusing a little bit on something called Appstats.
But before I even start this, I want to mention that always
you have to treat this as sort of experimental science.
You're looking at a very complicated system, it behaves
a certain way, you have to really poke and prod it in
various ways to find out how it really behaves.
Which is often sort of unintuitive.
You might not actually understand the performance
characteristics of your own application, even if you wrote
the code yourself.
At least that is my personal experience.

Even so, whatever you learn today might not be valid
tomorrow, or next week or next year.
Because your users change.
Maybe they become more sophisticated, maybe you
attract different types of users, also of course your
application changes.
And it's very likely that when you add the new feature in
this part of your application, somehow it affects the
performance of some other part of your application that you
didn't think was affected by that feature at all.
Also as data accumulates in the AppEngine data store, the
performance characteristics of the data
store may change somewhat.
And of course AppEngine itself changes.
I mean it's a very complicated production environment, the
sort of the network weather varies by day and by week.
Also the AppEngine team always make improvements, as Justin
mentioned, that complex scaling formula is regularly
tweaked and tuned and sort of adjusted.
And in general we do that to make sure
your apps scale better.
But sometimes we make changes that like work out one way for
90% of the apps and work out a slightly less positive way for
some other apps.
So, sort of be aware, even if you think you know exactly,
you understand the performance of your app, it may change.
So keep an eye on it.
So the most important technique that I recommend
that anybody who is expecting or even hoping to reach a
significant number of QPS, you have to do this basically.
Run a synthetic load test. There are many tools to sort
of aid you in load testing that can send synthetic
traffic to your site from some other system that is outside
AppEngine or sometimes some tools
actually run on AppEngine.
And the important reason to actually try your app with
synthetic traffic is that if you didn't do this,
and you hit a sort of sudden success with live traffic,
it's too late to fix things or it's a mad scramble to figure
out what went wrong.
And the thing is, in practice you almost always hit a bump
if you didn't actually plan and test.
So the basic approach for load testing is
actually fairly simple.
You, sort of using one of these tools to generate
synthetic traffic, gradually increase the
traffic that your application receives.
And I would recommend using a test instance or at least a
test version of the application if you've already
gone live for a smaller number of users.
So you increase your traffic basically
until you hit a bump.
That bump can be that your application suddenly starts
returning much higher error rates.
Or it just sort of starts responding slower and slower.
Those two are actually pretty related.
Or your load testing tool finds out that it cannot drive
traffic at the site faster than a certain QPS.
That is the point where you're going to, unless you say OK
I'm really happy with 20 QPS, that's all I want to shoot
for, but in general you probably hit the bottleneck
and you want to get beyond that point.
So now you use some other tools, and there's the
AppEngine dashboard, the whole admin console has a number of
different tools. Justin already showed
the instances console, there's a whole bunch of different
charts of performance over time that you can use.
The logs are a very important tool, also the quota details
help you understand what is going on.
How much of each type of resource
your application uses.
And then there's Appstats which we will get to it in a
few slides.
Using all those tools together, you have to actually
sort of perform science and reasoning and logic about your
application as if it's a black box and you don't understand
exactly how it works, until you understand why it is
hitting this bump.
Is it a data store problem?
Is your cache not operating correctly?
Are you spending too much time expanding the template?
Are you hanging on some external URL?
There are many possible reasons.
So once you understand what's going on in your application, you
produce a fix.
You rewrite some small section of code-- hopefully it's a
small section--
fine tune something, change some configuration, redeploy
and then you go at it again.
It's just a rinse and repeat approach and hopefully now
you'll get a little higher.
And then probably, if you keep driving more and more traffic
at your application, it is very likely that you'll hit
another bump in a different part of your application or a
different part of the infrastructure.
You learn a lot about AppEngine this way, and you
learn a lot about sort of performance tuning in general
and about your particular app in specific.
And hopefully you will eventually reach sort of a
smooth trajectory where you can handle many 100s of QPS or
whatever your traffic projections are.
So there are a number of things that I want to sort of
remind you of doing.
Justin already explained that the complex scaling formula is
very nice but it sort of generates instances gradually.
So don't run a load test where you start driving 200 QPS at
an app that was dormant before that, and run that for half a
minute and see how it performs. Because during that
half minute all you will see is errors while some instances
are being created but not enough to actually
deal with the queue.
Gradually increase your traffic, so take like three or
five minutes or so to reach 100 or 200 QPS.
Then let it run for a little while.
So that you can actually observe the application in its
ideal steady state.
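One way to plan such a ramp is sketched below. It only computes the schedule; feeding it to a load testing tool is left out, and the function name and step size are made up for illustration. It assumes the ramp time divides evenly into steps.

```python
def ramp_schedule(peak_qps, ramp_seconds, hold_seconds, step_seconds=30):
    """Return a list of (start_second, duration_seconds, target_qps) steps.

    The load rises in equal steps over ramp_seconds, then stays at
    peak_qps for hold_seconds so the steady state can be observed.
    """
    steps = ramp_seconds // step_seconds
    schedule = []
    for i in range(steps):
        qps = peak_qps * (i + 1) / float(steps)
        schedule.append((i * step_seconds, step_seconds, qps))
    schedule.append((ramp_seconds, hold_seconds, float(peak_qps)))
    return schedule


# Five minutes up to 200 QPS, then two minutes of steady load.
for start, duration, qps in ramp_schedule(200, 300, 120):
    print("t=%3ds for %3ds: %5.1f QPS" % (start, duration, qps))
```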
Another thing I want to emphasize is, it's very easy
to sort of hit one URL with any of the load testing tools
or performance testing tools.
That may not actually be how your users are going to use
that application.
One technique that I like is invite a bunch of friends to
use the application, and sort of just record what they do.
Keep track of exactly which URLs they hit, what their
behavior is.
You can use the logs, it's pretty easy to sort of figure
out on the logs which part belongs to which user.

And so if you do this at a very small scale but enough to
have some idea of what users do, and those sessions you can
then use to sort of tell your load testing what kind of
traffic to drive at your application.
Like so many hits of the home page, so many hits of the
preferences page, this and that.
Make sure to also include static resources like style
sheets or images, what have you.
Also take into account that there's a certain amount of
client caching.
Although if you expect thousands of different users,
of course each of those users is going to request every
static resource that you have on your home page at least
once, so sort of think about all those things to create a
realistic synthetic traffic pattern.
Because I've seen some people very disappointed because they
had a load test that proved that they could handle 500
QPS, and then when the actual users came along they had a
bump at 100 QPS.
Because the users were doing very different things than the
load test was testing.
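A small sketch of turning recorded sessions into a traffic mix for a load test. The paths in recorded are invented sample data; in practice they would come out of your request logs. It uses random.choices, so it needs Python 3.6 or later on the machine that drives the load.

```python
import random
from collections import Counter


def traffic_mix(recorded_paths):
    """Turn a list of requested paths into {path: fraction of traffic}."""
    counts = Counter(recorded_paths)
    total = float(sum(counts.values()))
    return dict((path, n / total) for path, n in counts.items())


def sample_requests(mix, n, seed=0):
    """Draw n paths at random, weighted by their share of recorded traffic."""
    rng = random.Random(seed)
    paths = sorted(mix)
    weights = [mix[p] for p in paths]
    return rng.choices(paths, weights=weights, k=n)


# Hypothetical recording of a few friends using the app.
recorded = ["/", "/", "/static/style.css", "/prefs", "/", "/static/logo.png"]
mix = traffic_mix(recorded)
synthetic = sample_requests(mix, 1000)
print(mix["/"], len(synthetic))
```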
Also another thing is sometimes load testing tools
themselves have an obvious sort of limitation.
One obvious limitation of course is if the load test,
the load generating tool runs on a machine that has
relatively low network bandwidth, because then it
can't actually drive a very significant amount of traffic.
Anyway load testing can be a lot of fun.
Do it.
Get good at load testing before you sort of announce
your first app to the wide public.
It's much better than having to scramble
once your load comes.
So one important tool that I want to call out at least a
little bit at this point, because it's so incredibly
useful both for debugging apps in general and for developing
performance like doing a load test in
particular, is Appstats.
Appstats is a tool that-- actually I gave a talk about
Appstats about a year ago--
so I'm not going to say a whole lot about it because you
can still find that talk on the web, search for Appstats
or Appstats AppEngine or something and you'll find it.
So Appstats is based on the theory that most likely if you
have an app that doesn't perform adequately or that
hits some kind of performance bump, as you try to scale it,
the most likely reason is something to do with remote
procedure calls.
Now in AppEngine everything you do with the data store,
with Memcache, fetching external URLs, everything like
that is actually an RPC.
Traditional CPU based profiling just counts function
calls and times them and doesn't give you very good
information about what those RPCs are.
Appstats focuses entirely on the RPCs, collects information
about every RPC made during a request, and then
visualizes that in very sort of in your face ways.
So it's completely obvious what your request is doing,
and often then it's also completely obvious why it's
not performing the way you thought it should be.
And this tool exists both for Python and Java.
So the only other thing I want to say about Appstats, this is
the kind of chart you get out of Appstats.
This is a very simple timeline, each box here
represents one RPC, so you can see that here was a data store
call that took for some reason 247 milliseconds, so that's
interesting I would say.
So what you can do is you can click on this box, now this is
not a live demo so I'm not going to show that, but you
can click on that box and it will, somewhere here below, it
will expand to a stack trace where you can see exactly
which sequence of function calls led up to that
particular RPC call.
And at least if you're using the Python version, you can
also inspect the contents of each stack frame so you'll see
the local variables and parameters that were passed into
these functions.
So if Appstats shows that you're making three queries
like in this case, three queries in a row, and, sort
of thinking about what that request did, you were only
expecting it would make one query, you can immediately
find out where those other two queries come from.
And maybe they're totally expected once
you understand that.
This is an incredible eye opener, so go download--
you don't actually have to download it, it's included in
the standard SDK and also in the runtime.
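For the Python runtime, a sketch of how the recording is usually switched on for a webapp-based app: a hook in appengine_config.py wraps the WSGI app in the Appstats middleware. It only records anything with the AppEngine SDK available, and the viewing console needs its own handler configuration, which is not shown here.

```python
# appengine_config.py
def webapp_add_wsgi_middleware(app):
    # Imported inside the hook, so the import only happens when the
    # webapp framework actually calls it.
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)
```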
So now that we've sort of discussed the science of
measuring what's going on in your app, by sort of using
either a telescope or a microscope, look at your app,
what things can you do to make your application work better?
What sort of typical techniques can you use to
solve the problems that you found using or
doing a load test?
Now there are lots of strategies.

One strategy I would sort of summarize as stupid schema
tricks, the AppEngine data store is not a relational
database and there are a bunch of things you can do.
Given the focus on RPCs, the fact that too many RPCs is so
often a cause of slowness in your application, batching
RPCs together is an important technique to save on latency.
There's also something called parallel RPCs where you don't
reduce the number of RPCs, but you reduce the time that you
wait for them.
And finally of course there's lots of different places in
applications and on the Internet where you can tune
your caching.
So let's look at each of these in a little more detail.
So stupid schema tricks.
Basically and there have been many talks and blog posts and
articles about this, so I don't have to explain much of
this, but AppEngine's data store is not a relational
database, it is not SQL.
And some of the things that are sort of dogma with SQL
are actually counterproductive in AppEngine.
One example is, in SQL if you have a table with lots of
columns, and you query against some of those columns, the
data that is in columns that you're not looking at is
never actually fetched.
Well in AppEngine it works slightly differently.
All the data that is in a particular row--
if I can sort of carry over that SQL terminology--
is actually fetched into your application.
So if you search for certain columns, all the data that is
in other columns for the same record will still be fetched
to your application.
If you have something like a photo application that
contains a large blob of jpeg data that you don't
use except when you want to display the image of course,
you would still be paying for fetching that as part of your
query results.
So a very simple solution here is to break up that particular
table or entity kind as we call it in
AppEngine, into two parts.
For example the photo metadata, which would be a
small entity with just the columns that describe things
like date the picture was taken, title,
file names, so on.
And a separate entity containing the large bulky
data, like the image, there may be a thumbnail and some
other things.
So now your query results come in much faster because that
bulk data doesn't have to be pushed through to the client.
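To make the effect concrete, here is a toy comparison. FakeDatastore is an in-memory stand-in, not the AppEngine API; like the real data store it returns whole entities from a query, and it counts how many blob bytes come back. The kind names and the 500 KB image size are invented.

```python
class FakeDatastore(object):
    """In-memory stand-in for the datastore that counts blob bytes returned."""

    def __init__(self):
        self.kinds = {}
        self.blob_bytes_fetched = 0

    def put(self, kind, key, entity):
        self.kinds.setdefault(kind, {})[key] = entity

    def query(self, kind):
        # Whole entities come back, never just the properties you wanted.
        results = list(self.kinds.get(kind, {}).values())
        self.blob_bytes_fetched += sum(len(e.get("jpeg", b"")) for e in results)
        return results


store = FakeDatastore()
jpeg = b"\xff" * 500000  # pretend 500 KB image
for i in range(10):
    key = "photo%d" % i
    # Single kind: metadata and image data in one entity.
    store.put("Photo", key, {"title": "Photo %d" % i, "jpeg": jpeg})
    # Split schema: small metadata entity plus a separate bulky entity.
    store.put("PhotoMeta", key, {"title": "Photo %d" % i})
    store.put("PhotoData", key, {"jpeg": jpeg})

store.query("Photo")
combined_bytes = store.blob_bytes_fetched
store.blob_bytes_fetched = 0
store.query("PhotoMeta")
split_bytes = store.blob_bytes_fetched
print(combined_bytes, split_bytes)
```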
Pretty much the opposite is sort of duplicating data in
multiple entities.
This is basically sort of going against the SQL dogma of
normalize your schema.
There are many situations in AppEngine where it's actually
useful to actually make redundant copies of a certain
piece of information in a couple of different entities
that you then all have to update at once, in order to
make reading those entities more efficient, in the sense
that you only need one of those entities in order to be
able to display the information.
You don't have to sort of do pointer chasing.
Because pointer chasing, which in the AppEngine data store
turns into reference chasing, every time you chase one of
those references is another RPC that gets very costly.
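A toy count of what reference chasing costs. CountingStore is an in-memory stand-in that counts one RPC per get or query; the users and comments are invented. Each comment keeps both a reference to its author and a redundant copy of the author's name.

```python
class CountingStore(object):
    """In-memory stand-in for the datastore that counts RPCs."""

    def __init__(self):
        self.entities = {}
        self.rpcs = 0

    def put(self, key, entity):
        self.entities[key] = entity  # setup only, not counted

    def get(self, key):
        self.rpcs += 1
        return self.entities[key]

    def query(self, prefix):
        self.rpcs += 1
        return [e for k, e in sorted(self.entities.items())
                if k.startswith(prefix)]


store = CountingStore()
store.put("user:1", {"name": "Ann"})
store.put("user:2", {"name": "Bob"})
for i in range(10):
    author = "user:%d" % (i % 2 + 1)
    store.put("comment:%02d" % i, {
        "text": "comment %d" % i,
        "author_key": author,                           # reference
        "author_name": store.entities[author]["name"],  # redundant copy
    })

# Normalized read: one query plus one get per reference chased.
comments = store.query("comment:")
names_by_reference = [store.get(c["author_key"])["name"] for c in comments]
rpcs_with_references = store.rpcs

# Denormalized read: the copy is already in each comment.
store.rpcs = 0
comments = store.query("comment:")
names_from_copy = [c["author_name"] for c in comments]
rpcs_with_copy = store.rpcs
print(rpcs_with_references, rpcs_with_copy)
```

The price is on the write side: when Ann changes her name, every comment carrying the copy has to be updated too.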
So batching, it's a very simple concept.

If you're somehow in a loop, or maybe that loop is sort of
hidden in your code, if you are fetching a whole bunch of
different entities or maybe you're writing a bunch of
different entities, one entity at a time, you pay RPC
overhead for each entity you read and write.
And given that all this goes across various networking
nodes, and there's some access control checking going on, and
various bits of overhead, the cost of an RPC is sort of the
cost that it takes to do the actual work, like find the
entity in the data store.
Plus the cost for the network traffic going back and forth.
And there's a certain amount of constant cost per RPC that
by combining a whole bunch of get requests or put requests
in a single call, you can shave off lots.
So using Appstats I took two snapshots of a very simple
application, it just fetches 20 different keys using 20
RPCs, and in this case it took nearly 400 milliseconds
realtime to fetch those entities.
And then I compared that to a slight modification of the
same app where it fetches those same 20 keys using a
single batch RPC.
And here it only takes 200 milliseconds.
So we sort of halved the latency of the request by
using a batch request. So both the data store and Memcache
are fairly latency sensitive in this way.
So you can get a lot of effect by using the batch APIs.
The data store has a slightly different style of API: you
pass get or put a list of keys or entities
instead of a single key or entity.
Memcache has separate get and get_multi calls.
But in both cases the effect is the same, the more you
batch, the faster it goes.
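The two Appstats measurements above are enough for a rough estimate of the fixed cost per RPC, assuming a simple linear model: total time is a constant overhead per RPC plus a constant amount of work per entity. The model and the function names are assumptions for illustration, and real timings will vary.

```python
def fit_rpc_costs(n, one_by_one_ms, batched_ms):
    """Solve for (overhead_ms, work_ms) in the linear model

        n * overhead + n * work = one_by_one_ms   (n single RPCs)
        1 * overhead + n * work = batched_ms      (one batch RPC)
    """
    overhead = (one_by_one_ms - batched_ms) / float(n - 1)
    work = (batched_ms - overhead) / float(n)
    return overhead, work


def predicted_latency_ms(n_entities, n_rpcs, overhead, work):
    return n_rpcs * overhead + n_entities * work


# 20 keys: about 400 ms one at a time, about 200 ms as a single batch.
overhead_ms, work_ms = fit_rpc_costs(20, 400, 200)
print(round(overhead_ms, 1), round(work_ms, 1))
# Same model, same 20 keys fetched as two batches of 10.
print(round(predicted_latency_ms(20, 2, overhead_ms, work_ms), 1))
```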
So parallel RPCs is a different approach.

Until the AppEngine release 1.5.0 that was released this
very morning, you can't look at it because the wifi is
pretty much down.
But I'll tell you, it is out.
Until now the main application of parallel RPCs was for URL
fetch, which is the AppEngine specific interface for going
out on the web and fetching something from another server
that is not in a Google data center.
Or maybe it is in a Google data center.
And since an external server easily takes 200, 300
milliseconds or a second or so for a round trip, if you have
to fetch more than one external RPC, that could
really slow you down.
So I think about two years ago, we had a separate
asynchronous API to URL fetch where you can sort of say, go
fetch this one, go fetch this one, go fetch this one, and
then wait for all of them.
Again here is a little example chart taken with Appstats
where you see the dramatic difference between doing those
URL fetches in series or in parallel.
So the new thing in 1.5.0 is there are separate
asynchronous calls for data store get, put, and delete.
Which are the most important ones for
which this makes sense.
So you can start a series of gets and a series of puts
simultaneously. Those get_async or put_async calls
return something called an RPC handle.
So they return immediately, they, in the background, start
sending the stuff to the data store and the data store
starts doing the work.
And then when your application is ready to consume the
results it can call get_result on the handle and then it will
get the same results as the corresponding synchronous call
would have returned.
The advantage here again is that you can make a bunch of
these calls in parallel, set up a bunch of gets and puts.
In the future you will even be able to set up multiple
independent transactions.
And your total wait time is equal to sort of the slowest
of all the RPCs rather than the sum of all the RPCs.
So the more you parallelize the more you win.
And you can do this in Java too. Google for
getAsyncDatastoreService to find the proper documentation for that.
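
The "start everything, then wait" pattern can be sketched without the App Engine SDK. In this sketch FakeRpc, fetch_async and slow_fetch are hypothetical names, a thread pool stands in for the background RPC machinery, and time.sleep stands in for network latency. Only the shape (an immediate return of a handle, then get_result when the value is needed) mirrors what the talk describes.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=8)


class FakeRpc:
    """Hypothetical handle wrapping a Future; mimics a get_result() call."""

    def __init__(self, future):
        self._future = future

    def get_result(self):
        return self._future.result()   # blocks until the work is done


def slow_fetch(name, seconds):
    time.sleep(seconds)                # stands in for network latency
    return "result for %s" % name


def fetch_async(name, seconds):
    """Start the work in the background and return a handle immediately."""
    return FakeRpc(_pool.submit(slow_fetch, name, seconds))


def fetch_all_parallel(requests):
    rpcs = [fetch_async(name, secs) for name, secs in requests]  # start all
    return [rpc.get_result() for rpc in rpcs]                    # then wait


def fetch_all_serial(requests):
    return [slow_fetch(name, secs) for name, secs in requests]


requests = [("a", 0.05), ("b", 0.10), ("c", 0.02)]

start = time.perf_counter()
serial_results = fetch_all_serial(requests)
serial_elapsed = time.perf_counter() - start     # about the sum of delays

start = time.perf_counter()
parallel_results = fetch_all_parallel(requests)
parallel_elapsed = time.perf_counter() - start   # about the slowest delay
```

Both versions return the same results in the same order; only the wall-clock time differs, which is the point made above about waiting for the slowest RPC rather than the sum.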
So the last sort of standard technique that everybody needs
to know about and hopefully you have heard
about it, is caching.
So there are actually lots of different places
where you can cache.
This is a tiny little diagram of how requests travel from
the browser to your application to the data store
and then the data comes back.
So the question is where can you put in caching?
Well it turns out there are actually caching
opportunities in every one of these boxes.
So let's look at the first one.
In the browser, closest to the user, the browser caches
stuff. How much the browser caches depends on various HTTP
headers, and the most recommended header at this
point is Cache-Control.
There's also something called ETags which is sort of an
additional thing.
And then there's Expires but that's pretty much sort of
replaced by Cache-Control.
So if you only want stuff to be cached in the browser, for
various privacy reasons, you can set Cache-Control: private
and then you set a max-age.
Now the problem with the max age is, if you set it too low,
the cache is not very effective.
If you set it too high, the cache is more effective but
there's also a probability that the users
will see stale data.
Because the browser really sort of holds on to that data
in its cache very effectively.
So what's a good number for max age?
Of course that depends on your application.
But basically this is a problem.
So there's a different place where you can ask for caching
and in many cases it will actually work, and that is
sort of on the Internet.
And now you can--
companies run proxy servers and ISPs often also run
Internet caches.
So that popular content, this is most effective if you're
doing something really high visibility, really high QPS.
Like say you're planning some kind of royal wedding, the
nice thing here is even with a small cache time out, like
setting it to a minute or even 10 seconds sometimes, if there
are enough users that are all talking to the same cache--
so if the ISP has positioned their cache
in the right spot--
even such a short caching time can be very effective.
Because during that minute that the data is valid in the
cache, hundreds or thousands of users might actually be
hitting that cache instead of going all the way to your app.
So your app sees less traffic and users are happier.
Of course at this point you have to put public in your
cache control header, because that cache is shared.
So make sure that you don't accidentally use this for per
user, or otherwise sensitive data.
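
As a small illustration of the two header styles, here is a sketch of a helper that builds the header value. The function names and the example max-age values are assumptions, not part of any App Engine API; the header syntax itself (private or public, followed by max-age in seconds) is standard HTTP.

```python
def cache_control(max_age_seconds, shared=False):
    """Build a Cache-Control header value.

    shared=False -> "private": only the user's browser may cache it.
    shared=True  -> "public": proxies and other shared caches may too.
    """
    if max_age_seconds < 0:
        raise ValueError("max-age must not be negative")
    scope = "public" if shared else "private"
    return "%s, max-age=%d" % (scope, max_age_seconds)


def origin_requests_per_hour(max_age_seconds):
    """Rough ceiling on refetches of one URL by ONE shared cache."""
    if max_age_seconds <= 0:
        raise ValueError("max-age must be positive for this estimate")
    return 3600 // max_age_seconds


# Per-user page: browser cache only, five minutes (value is illustrative).
per_user_headers = {"Cache-Control": cache_control(300)}

# Very popular public page: shared caches allowed, ten seconds.
popular_page_headers = {"Cache-Control": cache_control(10, shared=True)}
```

The second helper shows why a short timeout can still pay off: with a 10 second max-age, a single shared cache asks the app for a given URL at most about 360 times an hour, no matter how many users sit behind that cache.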
In some cases, Google itself actually has
a cache like this.
Now I have to emphasize that this is not
a guaranteed service.
And it only works for paid AppEngine apps, which is good
because if you're a free AppEngine app you probably
don't have enough traffic to benefit from this style of
caching anyway.
But there is some caching in Google's Front End servers.
It's controlled exactly the same way, with just the same
Cache-Control public header, and it has the same nice property:
even with a fairly small timeout on the cache,
it's still sufficient to get the high benefits.

The nice thing is if it's being cached in Google, you
will still see this traffic on the dashboard in the AppEngine
console, and you'll see it in the logs.
You can recognize it by the 204 response code.
So that's caching sort of outside your application.
Of course you can also put caches in your application.
And perhaps the simplest place to put some caching is just in
main memory of your application.
In Python you just store stuff in global variables.
Module global variables.
In Java you use static variables
for the same purpose.
The upside is that it's really fast because it works at
memory speeds, like 100 nanoseconds or so.
The downside is of course that each instance, and we're
assuming here that your application's getting enough
traffic that there are multiple instances, has its
own copy of the data.
So if it takes a lot of effort to compute that data, each
instance recomputes that data when it starts up, or when it
first needs it.
And you'd better only use this for data that essentially
never changes, because if it changes in one instance,
there's no effective way for that instance to communicate
to the other instances that what it is
caching is now invalid.
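
A minimal sketch of the module-global approach in Python follows. The names expensive_lookup and get_static_data are hypothetical, and the call counter exists only so the example can show that the second request does no work.

```python
# Module-level (global) dict: lives as long as this instance's process.
_instance_cache = {}
compute_calls = 0   # only here so the example can show the saving


def expensive_lookup(key):
    """Stand-in for something slow, such as rendering or a datastore read."""
    global compute_calls
    compute_calls += 1
    return key.upper()


def get_static_data(key):
    """Return cached data, computing it the first time this instance needs it."""
    if key not in _instance_cache:
        _instance_cache[key] = expensive_lookup(key)
    return _instance_cache[key]


first = get_static_data("site-config")
second = get_static_data("site-config")   # served from memory, no recompute
```

Every instance holds its own copy of this dict, so the sketch is only appropriate for data that effectively never changes, as described above.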
So that's a natural segue to using Memcache.
Which is a separate service, that even though it's called
Memcache, it doesn't use your application's memory, it uses
a separate server's memory.
However because it just stores the data in
that server's memory--
that server has like, oh I don't know, 16GB of RAM or
something, or 32 or 64--
that's shared between different AppEngine apps.
But basically with like a latency of one millisecond or
a few milliseconds, Memcache stores one consistent copy
that can be accessed by every instance of your app very quickly.
Of course this is not persistent, so don't use this
as a sort of substitute for the datastore.
I also want to mention something
called the dog pile effect.
There are situations in production where temporarily
the memcache service is unavailable.
I mean there are other situations where it just sort
of expires data before you say it should be expired, but
sometimes it's unavailable.
And then if all your instances go to the data store instead,
they might actually overwhelm the data store.
So there's some tricks with time outs and keep trying not
to overwhelm the data store.
The best thing to do if you want to learn more about that
is just Google for dog pile effect.
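
The talk does not say which tricks it has in mind, so the following is only one possible mitigation, sketched under assumptions: a read-through cache where a caller must win an atomic add on a lock key before it may refill a missing entry. FakeMemcache is a dict-backed stand-in written for this example; the real memcache API does offer get, set, add and delete, and its add likewise succeeds only when the key is absent.

```python
class FakeMemcache:
    """Dict-backed stand-in with get/set/add/delete; no expiry, for brevity."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True

    def add(self, key, value):
        """Store only if the key is absent; report whether we stored it."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


datastore_reads = 0


def read_from_datastore(key):
    """Stand-in for the slow, authoritative lookup."""
    global datastore_reads
    datastore_reads += 1
    return "value-for-" + key


def cached_get(cache, key):
    """Read-through cache that lets only one caller refill a missing key."""
    value = cache.get(key)
    if value is not None:
        return value
    lock_key = "lock:" + key
    if cache.add(lock_key, 1):          # we won the right to recompute
        try:
            value = read_from_datastore(key)
            cache.set(key, value)
        finally:
            cache.delete(lock_key)
        return value
    return None   # someone else is refilling; caller should retry or degrade


cache = FakeMemcache()
a = cached_get(cache, "greeting")       # miss: one datastore read
b = cached_get(cache, "greeting")       # hit: no datastore read

busy = FakeMemcache()
busy.add("lock:report", 1)              # pretend another request holds the lock
c = cached_get(busy, "report")          # backs off instead of piling on
```

In a real deployment the lock entry would need an expiry time so that a crashed request cannot block refills forever, and callers that get None would need a retry or fallback policy; both are left out here.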
So there's one final spot where it actually sometimes
makes sense to cache stuff, and this might be a bit sort of--
You can actually use the data store as a cache.
A nice example is Nick Johnson's Bloggart application which is
a very simple blog management app.
And basically what he does is, whenever you create a new blog
page or when you update an existing blog page, he
pre-renders the sort of resulting output, which is
probably composed from the blog entry and info about the
author and info about other blog entries.
He sort of pregenerates all that and stores the fully
generated page in the data store.
So when a page is requested, when users are viewing the
blog pages, it's a single datastore request. It
just fetches the pre-rendered page and returns it without
any sort of processing.
And that way he claims that they can do sort of 50
millisecond latency which is a very nice number I would say.
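
The render-on-write idea can be sketched as follows. This is not Bloggart's actual code: the two dicts stand in for entity kinds in the data store, and every name here is hypothetical.

```python
# Dict-backed stand-ins for two entity kinds; names are hypothetical.
posts = {}            # slug -> {"title": ..., "body": ..., "author": ...}
rendered_pages = {}   # slug -> fully rendered HTML
render_count = 0      # only here so the example can show the saving


def render(post):
    """The expensive part: combine the post with everything it needs."""
    global render_count
    render_count += 1
    return "<h1>%s</h1><p>by %s</p><div>%s</div>" % (
        post["title"], post["author"], post["body"])


def save_post(slug, title, body, author):
    """Write path: store the post AND its rendered page."""
    post = {"title": title, "body": body, "author": author}
    posts[slug] = post
    rendered_pages[slug] = render(post)


def serve_post(slug):
    """Read path: a single lookup, no rendering."""
    return rendered_pages.get(slug)


save_post("hello", "Hello", "First post.", "nick")
page = serve_post("hello")
serve_post("hello")   # still no rendering on the read path
```

The trade-off is that writes become more expensive and anything a page depends on (author info, links to other entries) must trigger a re-render when it changes, which suits a blog where reads vastly outnumber writes.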
So, we're going to cut the Q&A time a little short
unfortunately, fortunately there's a lunch break right
after this.
So there are certain things that I've seen a lot of people
make mistakes with, and in general the AppEngine team has
some experience with what typical user mistakes are.
There's a bunch of programming bugs that I actually discussed
in my Appstats talk last year.
So search for the Appstats video from last year.
I want to call out two specific things, and there are
also blogs about these so I don't have to explain in too
much detail.
The first one is entity contention.
When you're writing the same entity or the same entity
group if you're using transactions, the data store
is actually limited to, let's say one write per second.
I think the data store team will say, well in practice, we
sometimes support a little more than one write per second
but not a whole lot.
So clearly if you're keeping sort of a global counter of
how many requests have been made to your service,
it's not a good idea to store that counter in data store in
a simplistic way, because you would be limiting your QPS to
about one request per second.
This problem has been known for a long time and there's a
fairly straightforward solution for it.
Just search for sharded counters.
And if you actually do the search right you'll also see
some alternatives that actually don't shard but
use various useful approximations.
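
For readers who have not seen the technique, here is a minimal sketch of a sharded counter. A plain dict stands in for the data store, and the class name, the shard count of 20 and the key format are assumptions for illustration. In a real app each increment would be a transactional update of one shard entity.

```python
import random

NUM_SHARDS = 20   # assumed; more shards allow more writes per second


class ShardedCounter:
    """Counter split over several independently written shards."""

    def __init__(self, name, store, num_shards=NUM_SHARDS, rng=None):
        self.name = name
        self.store = store              # dict standing in for the datastore
        self.num_shards = num_shards
        self.rng = rng or random.Random()

    def _shard_key(self, index):
        return "%s-shard-%d" % (self.name, index)

    def increment(self, amount=1):
        """Pick a random shard so writes rarely hit the same entity."""
        key = self._shard_key(self.rng.randrange(self.num_shards))
        self.store[key] = self.store.get(key, 0) + amount

    def value(self):
        """Sum all shards to get the total count."""
        return sum(self.store.get(self._shard_key(i), 0)
                   for i in range(self.num_shards))


store = {}
counter = ShardedCounter("requests", store, rng=random.Random(42))
for _ in range(1000):
    counter.increment()
total = counter.value()
```

If each shard can take roughly one write per second, spreading writes over N shards raises the ceiling to roughly N writes per second, at the cost of reading N entities (ideally in one batch get) to compute the total.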

Finally, another thing that can happen in the data store,
is if you append data with a sequentially increasing key, or
actually also same problem if you're using an indexed property
value, if you sort of append to the end of the sequence of
data, all that data goes to the same thing
called a tablet server.
And if you want to understand more about tablet servers, go
to the talk called "More Nines Please." I believe it's
tomorrow morning.
It explains a lot more about how the AppEngine data store
works and what kind of things we've done in the last six
months or so to make it work better.
So anyway, the problem with sequentially appending data
that has predictable keys like this, is that all your data
always goes to the last tablet that is assigned to a
particular entity kind or table.
And normally tablets when they get too full or too busy they
split up and they sort of distribute the range of keys
that they handle in two.
But because of the append behavior of the application,
that doesn't actually help.
And no matter how often you split that last tablet, all
the data will always go to the last tablet in your list of
tablets for your data.
And now tablet servers are also limited to n writes per second.
I do not know exactly what the right value for n is and it
probably varies a bit on the exact properties of the data,
but let's say 10 or 20 writes per second.
So again that would sort of limit your QPS
to a sadly low number.
Or what it will do is you'll get lots of data store time
outs and application errors.
So what you have to do is randomize your data.
And sometimes you don't have to do something very--
Just using the user name or something is often enough to
avoid this hot tablet issue.
If you want to learn more about this, go to Ikai Lan's
blog, he's one of our developer
relations people.
He has a blog called Ikai Lan Says and he actually has
drawing skills that greatly surpass mine, so he has a
bunch of very entertaining cartoons that explain how sort
of the tablet splitting works under good circumstances and
under not so good circumstances.
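
One simple way to randomize keys, sketched here as an assumption rather than as the recommended recipe, is to prefix each sequential id with a few characters of its own hash. The function name and the choice of MD5 with a four-character prefix are illustrative only.

```python
import hashlib


def scattered_key(sequence_number):
    """Prefix a sequential id with a few hash characters.

    The prefix is derived from the id itself, so the key can be
    recomputed later, but consecutive ids no longer sort together.
    """
    raw = str(sequence_number)
    prefix = hashlib.md5(raw.encode("utf-8")).hexdigest()[:4]
    return "%s-%s" % (prefix, raw)


# Ten consecutive ids: sequential keys would all land at the end of the
# key range, while the scattered keys spread across it.
sequential = [str(n) for n in range(1000, 1010)]
scattered = [scattered_key(n) for n in range(1000, 1010)]
```

The cost of this design is that range scans in id order no longer work on the key, so it fits data that is looked up by id or queried through other properties.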
So all I have left is a little summary of what we've learned
today. AppEngine scales, it can support many thousands of
QPS, but to get to that point you still
have to do some work.
You have to understand your application and tune it and
load test it.
And one of the reasons is that sort of new instances are
created gradually, and sort of in order to optimally create
new instances, the request latency, how long it takes to
execute one request, is the key factor.
So treat your performance tuning as a science.
We discussed some tools, some techniques, a bunch of
approaches for speeding up applications, a whole bunch of
different places where you can do caching.
And we discussed some common bottlenecks.
And well here are some links, and now maybe we do have
enough time, and if people want to ask questions, please
use the microphone.
And we'll both be available for answering questions.


MATT MASTRACCI: Hi I'm Matt Mastracci
from, actually.

We do a lot of continuous integration, and I was just
wondering how instance scaling works when you push a new
instance of your app?
Because we push probably 20 to 30 times a day.
I guess how are new instances scaled around the
times when we push?
JUSTIN HAUGH: Is this mike still on?

We can hear you.
JUSTIN HAUGH: When you deploy a new version of your
application or you do an update, AppEngine remembers
quite a lot about--

okay, is the mike not working?
So AppEngine remembers-- hello, is this working?
So AppEngine remembers when you deploy a new version,
quite a lot about the previous version's scale and its
performance characteristics, and so generally speaking
that's usually pretty seamless.
It doesn't suffer the same sort of gradual ramp up effect
that you would normally expect.
JUSTIN HAUGH: Thanks for your question.
ALBERT WONG: Hello, my name's Albert Wong.
I saw that Go was offered as a language for
AppEngine on the page.
Would Go allow you to have multithreaded
applications also?
JUSTIN HAUGH: That's a question for the Go team, I
think I'll probably defer it to them.
ALBERT WONG: Is there going to be a talk on that?
GUIDO VAN ROSSUM: Yeah there is today, it was like at this
very same time slot in a different room.
JUSTIN HAUGH: But if I could just speak for them, I would
guess with high probability the answer is yes.
GUIDO VAN ROSSUM: Yeah, Go has sort of built in concurrency
support and they would be crazy not to turn that on.
It's new for us too, so we don't have much experience
with it yet.
JUSTIN HAUGH: Question over here?
Uh yeah, you mentioned for concurrency support on Python,
is that something that's going to come in the future?
Because I guess for someone who's starting off with a
brand new app, would you say in terms of scalability would
Java in fact give you more power in the sense that it has
that threading support on AppEngine?
JUSTIN HAUGH: I can speak to that for just a sec.
We do have some plans around Python concurrency, they're
not ready to be announced.
But Python is quite scalable, we see some apps that get a
lot of traffic with Python.
GUIDO VAN ROSSUM: I would say that other things to
consider besides the concurrency is that often a
Python app uses less memory than a similar Java app and it
also starts up faster, so instance creation latency is
much lower for typical Python apps.
And again that is sort of data that is valid today, I don't
know how valid that will be six months from now.
JUSTIN HAUGH: One other thing I would just mention is
that the load time of your application really does matter
quite a lot.
And Java does have a little bit of a longer start up time.
Python starts very quickly.

I guess one thing I'd mention on the concurrency thing is
Node.js is single threaded, and it's pretty good at handling a
lot of requests per server.
My question is that you mentioned a test instance.
I was wondering is there a way to create a test instance,
with a test copy of data that you can write to but not
affect the production data?
GUIDO VAN ROSSUM: I would recommend just copying
your production data using the bulk loader, so you just
download a copy of your data using the bulk loader, to your
own workstation and then you upload that to a
different app instance.
So you'd have to create two app instances, and you can't
do that with a test version, but if you create a separate
test app instance with just a different app ID, you can just
upload your production data there.
Great, thank you.
I was wondering, can you share a little bit more detail of
your formula?
So are you using some kind of dynamic programming approach
or are you doing some kind of prediction under QPS?
And specifically what I want to know is whether this
formula, whether it's a [? VP ?]
or not?
Does it apply per application or for all applications?
JUSTIN HAUGH: It's per application.
It's actually per version of your application.
So you can actually have multiple versions live at the
same time that are all receiving traffic and each one
is evaluated independently for its performance characteristics.
And then the last question.
Did you consider having some kind of an API where the
application can give you the hints?
JUSTIN HAUGH: We do have some thoughts around that.
Nothing right now.

Introducing some more controls around certain scaling
variables or the way the formula works is something
that we've been thinking about.
For instance, when an instance has been
sitting idle for a while when
traffic comes down, the determination of when to turn
that instance down, that's an example of something that we
might give control over.
Thank you.
With regards to Memcache, how are you handling key
distribution and how durable are the instances?
Like if one goes down, and how often do they go down?
Do you lose all your keys or is that distributed?
GUIDO VAN ROSSUM: It's a hash table.
When the Memcache server assigned to your app goes
down, you do lose all your keys. So that's why I put in
that warning.
There's a talk tomorrow about life in Google AppEngine
production I believe where they go into
a little more detail.
But basically the model you need to think about is that each
application typically uses one specific Memcache server,
which is shared with other applications.
And the Memcache server tries to give each app its fair
share of key and data space.
It does somewhat penalize applications that use their
cache inefficiently.
Like if you have very few cache hits I think it might
evict your data a little sooner than when you're really
good at cache hits.
Thank you.
My question is tangentially related to scaling because I'm
finding myself having to create a lot of reports that
are generated basically by my app's own data but aren't from
a browser-based client.
And so I've read up on the Mapper API and some other ways
to kind of get around my own mental limitations of
Bigtable versus SQL and some smart ways to do that with
task queues and whatnot.
But when I look at the data and the kinds of information
I'm trying to pull out, it's essentially a replication of
the kinds of things I would get if it were a browser-based
client running on Google Analytics.
So my question to you is, are there any plans or any thought
or anything in the roadmap to try to get server-side
analytics in a nice clean package?
Come to the next talk.
Thank you.
I'm curious if you guys have any per user
throttling in the system?
I've been using BulkLoader to try and download all of our
app's data.
We only have several gigs of data, but it still takes eight
hours plus.
So I'm wondering if I'm being throttled on a per user basis
or if it's just something in the BulkLoader side.

JUSTIN HAUGH: I don't have that much familiarity with the
BulkLoader but I do know that it does have some performance--
Because it's generally considered an offline process
so it's not something we're focusing too much thought into
making that very, very fast all the time.
But that is something we're aware of and I think--
Is there anything on the server?
Then there's nothing on the server side that's actually
intentionally throttling a user to be like, ah, you're
hitting the server too much?
That's possibly malicious and--
JUSTIN HAUGH: I don't think so.
GUIDO VAN ROSSUM: One thing you mentioned was that it's
considered an offline process.
So it may end up being treated at a lower priority.
It's also quite possible that your bulk loading is throttled
by some limitation on bandwidth on the network
between where you run the bulk loader client and our servers.
Also the bulk loader actually has a bunch of tuning
parameters, if you haven't already looked into those I
would definitely recommend that.
You're welcome.
JUSTIN HAUGH: Any more questions?
All right, thanks everybody, and we'll be around afterward.