Google I/O 2012 - Getting the Most Out of Python 2.7 on App Engine

Uploaded by GoogleDevelopers on 29.06.2012


BRIAN QUINLAN: Hey, everyone.
My name's Brian.
Thanks, Gus.
And I'm a software engineer on the App Engine Python team.

Four months ago, we released the latest App Engine runtime,
Python 2.7.
And already, in just those four months, it has gotten 47%
more applications than the Python 2.5 runtime, which we
released four years ago.
So this talk is basically about how you get the most out
of the new Python 2.7 runtime.
So I'm kind of targeting this talk for people who know
something about Python and know
something about App Engine.
If you don't know anything about either, this talk is
going to completely go over your head.
And this is where the people who are watching this on
YouTube have the huge advantage.
They can just hit the Back button and go find something
else to do.
The people here who are sitting in the middle are
going to have to use some sort of ninja skills to sneak by
people and find a new session without
getting in people's way.

Even if this talk isn't valuable for you, I actually
have App Engine plushies to give away at the end as some
form of bribery.
BRIAN QUINLAN: You can't pre-claim them.
They'll be first come, first serve at the end.

There's actually more to the Python 2.7 runtime than just
Python 2.7 language features, which is actually why it's the
second point on the slide, not the first.
So I'm going to talk about four things.
One is our new library support for the Python 2.7.
Then I will talk a bit about just language features in
Python 2.7 that are useful for web application development,
some of the limitations that we've removed from the Python
2.7 App Engine runtime versus 2.5, and concurrent requests,
which is hard to explain in one sentence.
But it's possibly the most valuable feature that we're
adding to the Python 2.7 runtime.

So I'm going to start talking about third-party libraries.
And I'm just going to leave this slide up for a couple of
seconds, so people read that I designed this part.
And then people laughed.
I have my acknowledgement.
And we can move on.
We actually have to start a bit in the past to really
understand why this is important.
This isn't actually the Big Bang.
This is a picture of the simulation of the LHC, Large
Hadron Collider, producing Higgs bosons, because, if you
remember, this is all what we were doing in 2008.
We were basically waiting for the LHC to start and end the
entire universe.

Well, the App Engine team actually
beat them to the punch.
And we released App Engine in April of 2008 with Python 2.5
as our launch runtime.
So I don't know if you remember what the environment
was like in 2008, but it was pretty primitive.
There were Tyrannosaurus rexes chasing Triceratops around.
Well not really, but we had some pretty barbaric software.
Django hadn't hit 1.0 yet.
It's, like, the premier Python web framework.
WebOb, which probably another half of
Python web applications are built on top of,
hadn't hit 1.0 yet.
And App Engine evolved.
We made our Datastore much more reliable.
We offered an SLA.
We went out of preview.
But we were kind of stuck using these older libraries
for backwards compatibility.
We had no real way of upgrading without breaking all
of our existing applications.
And backwards compatibility is something
we take really seriously.
It also meant that we couldn't add features to some of our
own libraries for the same reason.
We couldn't make changes that we wanted to make, just
because they would have broken people.
So the solution for this, well, for the dinosaurs, it
was a big comet.
And for App Engine, it was the release of
the Python 2.7 runtime.
This was our launch cake.
It was tasty.
And let me just jump to the punch here.
So in Python 2.7 on App Engine, anyone who's done any
App Engine development should recognize this configuration
file, or at least the top, non-red part.
It's basically we're saying, we have an application that
I've creatively called myapp and then just
some data for it.
It handles every URL pattern with a script called "main."
And the new part is, we have an explicit libraries
declaration at the end.
So this means my application is dependent on Django.
And I really need version 1.2.
I'm explicitly depending on version 1.2.
So version 1.2 of Django is not the latest version that we
support on App Engine.
But the person who wrote this application doesn't have to
care, because they've explicitly stated, this is the version I depend on.
And they can upgrade to a newer version later, whenever
they feel like it.
It's under their control.
So this basically solves the problem of us not being able
to provide newer software due to reverse
compatibility problems.
You just explicitly state what third-party modules you're
dependent on.
So let me make this more concrete by showing you a
real, Hello World web application.
So this is pulled out of the tutorial for Python 2.5.
So you can see we start by importing a very simple
framework that we wrote, called webapp.
And this is actually the framework I was saying that we
couldn't upgrade, or we had problems upgrading for
backwards compatibility reasons.
We define a very simple handler that, in response to
get requests, just outputs "Hello World!".
We make an application object.
We define a main function that runs the application object.
And then we do the standard Python if __name__ == "__main__"
check to run the main function.

It's simple.
And a lot of this is boilerplate.
So if you look at the Python--
So I have a terrible memory.
So I have audience plants to remind me of things.
So the reminder here was this code will work absolutely fine
in the Python 2.7 runtime.
You can just take this Python 2.5 application, and it will
run perfectly in 2.7.
But if you were writing new code, here's how
you would write it idiomatically in Python 2.7.
So you can see I've replaced the import
of webapp with webapp2.
So webapp2 is an open source, lightweight web framework that
was strongly inspired by Google's webapp Framework.
But it's versioned like the other
third-party libraries I showed you before, which means it
doesn't have to stay strictly compatible across releases.
You can say I am dependent on an exact version.
So in fact, it's so compatible that the only thing I had to
change to make the example work is I changed webapp to
webapp2 everywhere.
And you also might have noticed that the example is a
bit smaller.
That's because all of the boilerplate that came at the
end is no longer necessary in the App
Engine Python 2.7 runtime.
This is because the Python 2.7 runtime can natively speak
WSGI, also called "Wiz-gee," whereas the Python 2.5 runtime
could only use CGI.
So if you don't know what any of those mean, it doesn't
really matter.
Think of it this way, five lines of boilerplate are gone.

And here's what the configuration would look like for
Python 2.5.
I have defined my application, version 1.
I'm using Python.
This first part is completely standard.
It will be in basically every application.
I say every URL should be handled by
my Hello World script.
And in 2.7, it looks very similar.
The runtime has changed from Python to Python 2.7.
I have to add this explicit thread, safety declaration.
Right now, I'm just setting it to false, which is kind of the
safe option.
And I'll talk a lot about how you can set that to true at
the end of the talk.
The handler's script reference changed as well;
I'll talk about why you make that change later.
And finally, I say I'm dependent on the third-party
library webapp2.
And I want to use the latest version, which is 2.5.1.
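Putting those pieces together, the 2.7 configuration he's describing would look roughly like this; the application name and module names are my own stand-ins, since the talk doesn't show the full file.

```yaml
# Sketch of the Python 2.7 app.yaml described above.
application: helloworld
version: 1
runtime: python27
api_version: 1
threadsafe: false   # the safe default; true enables concurrent requests

handlers:
- url: /.*
  script: helloworld.app   # a WSGI application object, not a CGI script path

libraries:
- name: webapp2
  version: "2.5.1"
```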

OK, but no one is going to make a billion dollars with
creating a Hello World application.
So this is my Google+, some photos.
Here's me hiking with some friends, holding a big rock.
And yeah, it's interesting.
And here, this is an App Engine application I wrote.
And I just pasted the URL for that picture at the end.
And here's a picture of me hiking with friends as viewed
by The Predator two seconds before it kills us all.
Has everyone seen the movie Predator?
Oh, not everyone has their hand up.
It had great one-liners like, "Get to the chopper," and "If
it bleeds, we can kill it."
So anyway, I'm thinking about calling my
new start-up Predigram.
I think we can probably flip it for about $1 billion in a
year or so.
And so, if you have maybe $50 million of VC
money, give me a call.
And unlike most developers, I will show you the code for our
start-up before you even have to invest.
But looking at the code does commit you to investing, so
leave the room if you're-- ah, whatever.
So if you look at the first line of this example-- this is
a complete example, by the way, the entire application,
21 lines of code that you can see I
squashed the imports together.
So the first line is just imports
of core Python libraries.
And it's the next line that's actually a bit more interesting.
I'm importing NumPy, which is a Python numeric library, and
PIL, which is an imaging library.
So NumPy was the third most requested third-party library
for App Engine.
And PIL was the number one.
So both of those are now supported in Python 2.7.
And I'm not going to go into too much detail about how you
would actually do this kind of image transformation.
The basic idea is I don't care what the original colors in
the image are, I'm going to replace them by my own color
palette, which I defined in the beginning.
I'm using NumPy in a very trivial way when I write for h
in numpy.linspace.
That's basically saying I want a linear interpolation between 0
and 1 in 256 steps, so I build a 256-color palette.
And then, if you look in the get handler for the request, I
take a URL, which is all the stuff that comes after image
and the URL scheme.
I load it using the Python URL lib.
I open the image I've downloaded, convert it into
grayscale, replace the palette, convert it into PNG
and output it.
And that 21 lines of code can be yours for $50 million.
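The palette-building step he's describing can be sketched like this; the two endpoint colors are my own invention, since the talk doesn't show the actual "Predator vision" palette.

```python
import numpy as np

# Hypothetical endpoint colors for the gradient; the real palette
# from the demo isn't shown in the talk.
DARK = (0, 0, 128)
BRIGHT = (255, 64, 0)

# For each t in a linear interpolation between 0 and 1 in 256 steps,
# blend each RGB channel between the two endpoint colors.
palette = []
for t in np.linspace(0.0, 1.0, 256):
    for lo, hi in zip(DARK, BRIGHT):
        palette.append(int(lo + t * (hi - lo)))

# PIL's Image.putpalette expects a flat list of 256 * 3 = 768 ints,
# which is then applied to the grayscale image before saving as PNG.
```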
And the configuration--
this is what the configuration looks like.
It's basically the same as what we saw before, the same
initial five lines to define the application.
Now I'm handling /image/ in my URL space.
My script name changed a bit.
And I'm dependent on NumPy, PIL.
And I'm still using webapp2.

OK, so those are some neat libraries I touched on.
And webapp2 is useful, but less neat.
Here is the full set of third-party libraries that we
support explicitly in the Python 2.7 runtime.
I'm not going to talk about them all, but I'll point out a
few others.
So Jinja2 is a templating library.
In the Python 2.5 runtime, we had our own simple templating
system called webapp.template, which was a very simple facade
over Django.
And our use of Django was kind of sketchy.
It wasn't thread-safe.
It had some problems.
So we're actually recommending people who like Django
templates and Django configuration to just use
Django templates directly.
And if they want a very compatible solution without
having to do Django
configuration, just use Jinja2.
lxml I should probably mention, because it was the
second most requested third-party library for App
Engine Python.
I only didn't create a demo for it, because it's very hard
to make an interesting demo that involves XML parsing.
But please contact me if you can think of a way.
And then the rest is just--
there's more, but I'm not going to get into them all.
So I will mention that this is early days.
We released this runtime four months ago.
So this is the set four months after release.
This list will grow over time, or you can expect it
to grow over time.
And we are responsive to people filing feature requests
on our external issue tracker.
So you can see we packaged the first most-requested module,
the second most-requested module, the
third most-requested module.
You can kind of see there's probably a pattern here.
Yeah, I think that's it.
Move to the--
Audience plants, very important.
So I also mentioned that these are the libraries that we
provide explicit support for.
Python calls itself kind of a batteries-included language.
We throw in some more batteries that are included to
make it easier to do web development.
But if there's another battery you want, like your own web
application framework like Flask or Bottle or whatever,
then it's no problem.
You just download that yourself.
And it will get uploaded with your application.
Thank you, audience plant.
OK, cool.
This is my favorite section.
There's things in Python 2.7 itself that are new that are
useful for web development.
And the great thing about this is, basically, I can take
credit for work that other people did.

So-- oh, more demo.
So this is my value add.
Having a billion dollar company is cool, but
I want a bit more.
So I think this makes our company worth $1.5 billion.
So before, I had to paste a URL, and it would generate the
predator view of the image.
Now I can just take Google+ IDs and go to this different
application, called Predator +, and just paste the ID here.
And it'll show you a Google+ profile using Predator Vision.
So I think this will allow us to make our own Google+ clone
that provides the high-contrast colors that the
discriminating predator wants.
And we can completely corner the extraterrestrial market
for Google+ or, at least, homicidal extraterrestrials.
Anyway, value add.
Call me. $50 million gets you in.
So here's the app that adds that little feature.
Google+ has a JSON API.
Or sorry, it has a high-level Python API and also just a
raw JSON API.
I'm using the JSON API because, basically, my example
would suck if I didn't, because I'm trying to
demonstrate JSON.
So you access this profile URL, filling in, you can see I
have user ID there.
You fill in the user ID with the user's actual ID, that
number you saw.
And then let's look at the actual code.
So in my get request, instead of taking a URL, I'm
interpreting it as a user ID.
If we read this code from the inside out, I'm taking the
profile URL and formatting it using the passed-in user ID.
So Format is a feature that wasn't
available in Python 2.5.
The next thing I'm doing is I'm using urlopen to load
the Google+ profile.
So there's a timeout argument that is now available in
Python 2.7 for urlopen.
And that seems like a really small thing, but it's actually
a huge thing for web application development.
If you are using a Twitter API or Facebook or whatever,
you're interacting with any other service when you're
building your web application, it sucks if it's taking them
10 seconds to handle requests, if you are blocked for those
10 seconds waiting for them to come up with an answer.
It's probably preferable to say, look, I'll wait, in this
case, 200 milliseconds.
And if I don't get an answer from them, then I'll just
serve some sort of degraded response.
Well, in this particular case, it's very degraded.
I redirect them to a static image.
But you can probably do something smarter than that.
So I'm using this Timeout thing.
And finally, I'm using this new C implemented JSON module
that's available in Python 2.7 that
parses the Google+ profile.
There is the code.
By the way, this is a programming demo hat trick,
three new features in three lines of code.
And they're consecutive lines of code, so it's actually a
natural hat trick.
No one cares about hockey and-- anyway.

So finally, down here, you can see I build the URL just by
extracting the image from this.
The profile is a Python dictionary now.
I look up the image key, the URL key.
I split off all these query arguments that
I don't care about.
And I've got a URL.
And then I just proceed, like you saw in
the previous example.
There's good docs for the Google+ API, if you're
interested in how you would do something like this yourself.
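The three features he's pointing at, str.format, the urlopen timeout, and the C-implemented json module, can be sketched together like this; the endpoint URL and the canned response are made up for illustration.

```python
import json

# Hypothetical profile endpoint; the real Google+ URL isn't reproduced here.
PROFILE_URL = "https://example.com/plus/v1/people/{user_id}?fields=image"

# str.format with named fields: not available in Python 2.5.
url = PROFILE_URL.format(user_id="1234567890")

# In the real handler this would be fetched with the new timeout argument,
#   urllib2.urlopen(url, timeout=0.2).read()
# with a redirect to a static image if it times out. Here we parse a
# canned response instead, to show the json step without the network.
canned = '{"image": {"url": "https://example.com/photo.jpg?sz=50"}}'
profile = json.loads(canned)

# Look up the image key, then the url key, and split off the query
# arguments we don't care about.
image_url = profile["image"]["url"].split("?")[0]
```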

OK, I'm kind of hurt, actually.
You all laughed at my VC thing, but no one actually
offered money.
So I'm going to go into banking, because banking is
where the money's at.
Sorry, that was bad.
But I'm going to highlight a few little things here.
So I'm trying to model a banking account system here.

You can see that the first thing that's kind of new, like
in the last couple of months new, is I'm using a new module
called NDB.
So NDB is the new, next-generation Python
Datastore framework.
It has some great features that are really hard to see,
so I'll just tell you about them.
So one feature is it does implicit caching.
So when you're creating items or looking them up, it'll
cache them locally in local memory.

It has another level of caching
where it will use Memcache.
And then, finally, it will use the Datastore
as persistent storage.
And that's done all behind the scenes.
You don't have to worry about how that actually works.
Another thing is it can be used asynchronously, which I
guess I could show, but I won't.
And I will show one feature, because Guido van Rossum wrote
this, and he told me I had to.
And it's called Structured Properties.
So you can see I have an Account class.
And the second thing I've defined, the most important
thing, is the balance, how much cash you have.
But above that, I've said there's account_holder.
And the account_holder has a Structured
Property, which is a contact.
And in this bit of code, I don't have to define exactly
what a contact is.
That can be another NDB model defined elsewhere, in this
case, another module that defines
exactly what's in a contact.
And that can be reused across different NDB entities.
So this is a huge boon for people who are doing
complicated data modeling.
It means that they don't have to have--
you don't need one entity that defines 50,000 properties in
it, like everything that you would need to
have a real bank account.
It means you can abstract away content
into different modules.
OK, but here's where the important stuff happens.
How do you transfer money?
So ignore that decorator on the next line, the NDB
transactional thing.
So here's how our function works.
So we're going to take account_ID, where we're
transferring the money from, another account_ID that we're
transferring to, and an amount.
That's probably fairly straightforward.
So we're going to actually, given the ID, we're going to
look up the two accounts, the from_account and the to_account.
And then we're going to just update the balances.
So from_account gets the amount deducted from it,
to_account has the balance added to it.
If the from_account balance is negative, then you get an
insufficient cash exception.
Otherwise, both of the entities get saved.
Now let's look back at the decorator, NDB
transactional xg=True.
So what this means-- and this is a fairly new feature in App
Engine-- it means I'm defining a transaction.
So when you're calling this function, this is done in a
transaction.
Either both accounts will be updated with the semantics I
defined here, or neither will be.
And this is guaranteed.
So until recently, you could only guarantee modifications
to one entity group in a transaction.
Now you can do several.
This is a huge win for writing these kinds of apps, where you
have to guarantee that modifications are made to
several entities at once.
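Stripped of NDB and the transaction decorator, the transfer logic he's describing amounts to something like this; the account IDs, balances, and exception name are stand-ins.

```python
class InsufficientCash(Exception):
    pass

# In-memory stand-in for the Datastore; the real code uses NDB entities.
accounts = {"vc": 1000000, "brian": 200}

def transfer_money(from_id, to_id, amount):
    from_balance = accounts[from_id] - amount
    if from_balance < 0:
        # Raising before any write means neither balance is modified.
        raise InsufficientCash()
    accounts[from_id] = from_balance
    accounts[to_id] += amount

transfer_money("vc", "brian", 500000)
```

With NDB, it's the @ndb.transactional(xg=True) decorator that extends this all-or-nothing guarantee across both entity groups in the Datastore.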
OK, I've written this code, but who knows if it actually
works or not.
So one thing that a lot of people don't take good
advantage of in App Engine is we have a really sweet testing
framework called Testbed.

And it basically allows you to write unit tests, but using
the simulated App Engine framework, so you don't have
to do complicated mocking.
So let me skip over, basically, all of the imports,
the set up and tear down.
This is boilerplate that you can get just by looking at the
Testbed documentation.
And let's look at this test, insufficient cash.
So basically, I create and store two bank accounts, one
for the VC, which has $1 million in it, and one for me,
which has $200 in it.
And then this next line is kind of interesting.
This is a Python 2.7 testing feature.
It means the following code, like the code within this with
block, I expect it to raise the
insufficient cash exception.
If it raises that exception, it's working as intended.
And if it doesn't raise that exception, it's
not working as intended.
And then I transfer the money from the VC
account to Brian account.
But I want $5 million, not $1 million,
which is all they have.
So it all fails.
So this assertRaises is new in Python 2.7.
But this is like a tiny tip of what's new for testing in 2.7.
Python 2.7 actually added over 20 new test methods.
They rethought how the framework fits together.
There's a lot more places where you can
inject your test logic.
So I would, if you're using Python 2.7, look at the
release notes for it and look at all the new testing things.
It's very easy to let testing features slip by, unless
you're really keeping up.
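The assertRaises context manager he's describing looks like this in isolation; the Testbed setup is omitted, and InsufficientCash and transfer_money are simplified stand-ins for the banking code.

```python
import unittest

class InsufficientCash(Exception):
    pass

def transfer_money(balance, amount):
    if balance - amount < 0:
        raise InsufficientCash()
    return balance - amount

class TransferTest(unittest.TestCase):
    def test_insufficient_cash(self):
        # New in Python 2.7: the test passes only if the with-block
        # raises the named exception, and fails if it doesn't.
        with self.assertRaises(InsufficientCash):
            transfer_money(1000000, 5000000)

suite = unittest.TestLoader().loadTestsFromTestCase(TransferTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```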
There's a bunch of other stuff that's added to Python 2.7.
So I asked some Python people what they
thought was most important.
I started by asking myself.
And I thought this was the most important thing, the
datetime.timedelta class gained a total_seconds method.
I added this myself.
I think this will revolutionize Python programming.
It replaces the one-liner that you had to use before.
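For reference, the method in question:

```python
from datetime import timedelta

delta = timedelta(days=1, seconds=30)

# New in Python 2.7; previously you needed the one-liner
# (delta.microseconds + (delta.seconds + delta.days * 24 * 3600) * 10**6) / 10**6
seconds = delta.total_seconds()
```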

I asked some other people, some core Python developers.
I got this really excited email from Guido.
It goes, Python 2.7 advantages, class decorators!
Very exciting for him.
Alex Martelli, author, Python luminary,
"Multi-context with" was his thing.
Brett Cannon, huge contributor, "Dict and set
comprehensions." So what we really learned here is that
core Python developers like really esoteric features.

But I have some other real users.
This conversation's going on on my Google+.
So here's what some people think.
"There was a big revamp to itertools.
A lot of powerful features there." "The collections
library has a lot of new classes in it, like Counter
and OrderedDict." "You have set literals now." And someone
pointed out that the docs are much better than
they were for 2.5.

OK, done taking credit time.
So next, I'm going to talk about some of the limitations
we've removed from Python 2.7 versus Python 2.5.

So most of them are fairly esoteric, but this is kind of
an interesting one.
Who here knows the Python timeit module?
Oh, good.
I actually think that's probably
the best Python module.
And I looked up who wrote it, because I was going to send
them flowers or something.
But it turned out it was Guido.
And I think he's got enough credit for
the language already.
So it basically allows you to run code many times and
generate timings for it.
So the code I'm testing is in the code variable.
And then I have this pickle dumps line.
So what that's doing is it's taking a Python data
structure, serializing it, and writing it into a string I/O.
And setup, the block above it, defines the setup that's used
before that code is run.
Then I define the timer, which takes the code in the setup.
And I basically run this 5,000 times.
I do 10 trials.
And I take the fastest one.
And so I did this.
And I ran it in Python 2.5.
And I ran it in Python 2.7.
And the Python 2.5 output was 3.33, meaning that it took
3.33 seconds to execute this benchmark.
And then I ran it again in Python 2.7, which took 0.45
seconds to run this benchmark.
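A reconstruction of that benchmark: the payload is made up, since the talk doesn't show the actual data structure, and on Python 3 the C-accelerated cPickle and cStringIO he's comparing are simply pickle and io.

```python
import timeit

# Setup runs once before the timed code; the data structure is hypothetical.
setup = """
import io
import pickle
data = {"numbers": list(range(100)), "text": "x" * 100}
"""

# Serialize the structure into an in-memory buffer, as described above.
code = "pickle.dump(data, io.BytesIO())"

timer = timeit.Timer(code, setup)
# Run the statement 5,000 times per trial, do 10 trials, keep the fastest.
best = min(timer.repeat(repeat=10, number=5000))
```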
So you can see--
Yeah, be excited.
So you can see, in my previous examples, like when I was
doing PIL image manipulation, I was using cStringIO, a very
commonly used Python class.

cPickle is a lot less commonly used, at least explicitly.
But we use it internally in App Engine for things like
Memcache, when you're storing data in Memcache, and also
when you're doing things like deferred tasks.
So even if you don't use Pickle directly, you can
expect your App Engine apps that indirectly use it, their
performance to improve.
Most of the rest of this stuff is a bit esoteric, so I'll
just skim through it.
So we have some new modules, cPickle and cStringIO.
Before, in Python 2.5, they were just aliased to their
slower pure Python versions.
We have imp.
Please don't use that.
It's kind of a dangerous module to use, unless you
really know what you're doing.
marshal, also fairly esoteric.
We support bytecode modification on Python 2.7.
So if you want to add gotos to Python, you can do that.
Actually, I can't think of a non-esoteric use for bytecode
modification either.
And finally, we have direct support for compiled Python
files, so you don't have to upload your
source in Python 2.7.
You can just upload the compiled data.
Probably, you don't want to do that most of the time.
But if you have a vendor who's only providing you compiled
code, then you can just use it without needing the source.
Or you can just re-negotiate with your vendor,
which is what I do.
Finally, concurrent requests.

I like my tagline.
It's basically saving you money while
making your app faster.
So we need a bit of background here.
Basically, what concurrent requests do is they reduce the
number of instances that your application needs.
So an instance is basically a copy of the Python interpreter
that's used to handle your requests.
So this is from the App Engine Admin Console.
I've clicked on the Instances tab.
And it's showing me here that I have seven instances.
And it's giving me some data about them.
So for many applications, the biggest charge that you're
going to get for App Engine is going to be for the instances
that you're using.
So keeping your instance usage low is potentially a big win.

So this is just kind of a pro tip.
So if you're in your App Engine
dashboard, there's a chart.
And there's a little pop-up there.
And buried in there is an item called Instances, which shows,
broken down into 6, 12, 24, et cetera, time chunks, how many
instances you're using, how many you're getting charged
for and how many there are total.
So App Engine will sometimes keep some instances around and
not charge you for that, just to handle extra load.
But you shouldn't count on that.
That's just a thing we do, if we have spare capacity.
And so you can see how your instance charges are
changing over time.

OK, so let me explain the actual feature.
So this is the way requests are handled at a 50,000-foot
view without concurrent requests.
So let's say you have an application and you have one
instance, so one copy of Python, that's not currently
handling a request.
So a request will come in for your application.
And App Engine will go, oh, I have a free instance.
And it will forward the request to the instance.
That's probably what you expected would happen.
So if you get another request that comes in and that one
instance, which is the only instance you have in this
example, is already busy handling the first request,
then App Engine has to create a second instance, wait til
that's created, and then forward that request there.
So if you have two concurrent requests, it basically means
you need two instances to handle them.
So looking at the math, very simply, if you have 10
requests per second, like every second you're getting 10
requests, and each request takes one second, then you'd
need 10 instances to handle the load, if I did the math
right, which I think I did.
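That back-of-the-envelope math, written out as a quick check:

```python
# Without concurrent requests, every in-flight request occupies its own
# instance, so instances needed = arrival rate * time per request.
requests_per_second = 10
seconds_per_request = 1.0

instances_needed = requests_per_second * seconds_per_request
```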
So with concurrent requests, it changes a bit.
So your app receives a request.
And the request is forwarded to the instance.
And if you get another request, the request is
forwarded to the same instance, which then creates a
thread and handles the request in that thread.
So this means, in this diagram, we're serving twice
the traffic with a single--
I'm sorry, we're serving the same traffic, but with half
the instances.
So the obvious question here is, will this make my
application slower?
Because I have the same CPU resources for one instance,
but I'm now sending two requests for
it instead of one.
And the answer, of course, is yes, this will make the
request handling slower.
I'm trying not to look at the App Engine people who might be
glaring at me now.
But the slowdown is probably much smaller than you expect.
And for, basically, the rest of the talk, I'll walk through
an example where I load test a real application--
and by real, I mean my Predator application--
and see how it behaves.
So here's a reminder of, basically, what the Google+
variation of the Predator application does.
I removed some of the exception handling, because
cool people don't do exception handling, and it makes the
code too big.
And I've put comments that break the code
out into three stages.
So first, we load the Google+ profile.
And basically, when we're loading the Google+ profile,
we're not using the CPU.
We're just waiting for a network to get back to
us with some data.
Then we load the image, the profile image.
And once again, we're just waiting for IO there.
And finally, we use a ton of CPU to decode the JPEG,
convert to grayscale, do a palette substitution, and then
re-compress as PNG.
So if you look at this over time, I did timing.
And of course, these timings lined up perfectly on 100
millisecond boundaries.
It always takes exactly 100
milliseconds to load the profile.
It takes 300 milliseconds to load the image.
And then it takes 100 milliseconds, exactly, every
single time, to transform the image.
OK, I totally made these numbers up.
No, I didn't make them up, but they're obviously
approximations, based on many runs.
And I've rounded to the nearest 100 milliseconds.
Now, if we look at this program flow versus CPU usage,
we can see we start by basically using no CPU.
We're just waiting for a URL to be
downloaded from the internet.
And that continues the same for when we're loading the
profile image.
We're just waiting for the image to come
back from the network.
And finally, we have a big chunk of CPU usage while we're
transforming the image.
So in this particular case, we can see that we're spending,
basically, one fifth of our request time using the CPU.
And the rest, it's idle.
So without concurrent requests, that CPU time is
basically wasted.
There's just nothing for the CPU to do while it's doing
this downloading.

So that's the theory.
Now let's load test it.
So I don't want to actually do URL fetching in my load test,
because the performance is too variable.
Downloading stuff from the internet takes a variable
amount of time.
So I've cleverly replaced the downloading with sleep calls.
So I sleep 100 milliseconds to simulate loading the Google+
profile and another 300 milliseconds to simulate
loading the image.
And I put a picture of a rock in the application directory.
And that's the one I transform.
So I upload this application.
And then I use a tool called jMeter.
So has anyone used jMeter?
Oh, wow, lots of people.
So jMeter is basically a load testing tool.
And basically what it does is you tell it, create a number
of fake clients that hit a particular URL.
And it simulates hitting that URL.
And as soon as it's loading, hit refresh, wait til it loads
again and keep hitting refresh.
It's basically simulating a person who
keeps hitting refresh.
And in this particular example, I'm simulating 25
people hitting refresh continuously on my Predator application.
So at the top, I've copied out the data that I got here.
So in this case, it's done 790 requests, so it's giving it a
throughput of 1,500 requests per minute.
And the average and median numbers you see on the right
are latency.
So the average request is taking 950 milliseconds to be
responded to.
So I actually did this load test.
And I compiled a table with my results.
So if you look at the top row, this is just the default
configuration for App Engine use.
It's what's called F1 instances.
There are 600 megahertz virtual CPUs.
And I generated the timings for that without concurrent
requests enabled.
So the latency that I measured was, basically, one second, so
1,015 milliseconds.
It created 24 instances to handle this load.
And I just normalized the dollar cost.
To handle this load, let's say it costs $1.
Now in the next line what I've done is I've done the same
load test, but turning concurrent requests on.
So you can see that latency actually increased by 30%.
Kind of sucks.
But the number of instances required to handle this
traffic went from 24 to 9, which is a big reduction.
And finally, the normalized dollar cost is $0.38.
So our latency increased by 30%, but our cost
was reduced by 60%.
So this might be a trade-off you're willing to make.
You might be saying, well, to pay 60% less money, I'm
willing to make my users wait 30% longer.
But if you don't like that trade-off, you can bump up the
instance cost from an F1 to an F2.
So an F2 is basically twice as fast as an F1.
So we get slightly different latency numbers.
Instead of being 30% slower with concurrent requests on, we're
now a bit less than 10% slower.
We're using eight instances, instead of nine.
And our normalized dollar cost is 60-something cents.
So this is also a possible trade-off that you'd make.
Latency has only gone up a bit, but you're still saving
30% of the cost.
And then I did a final increase.
I jumped from an F2 to an F4 and reran the benchmark.
So in this particular case, latency actually fell from one
second to 0.9 seconds, about.
Number of instances went to six.
And due to a coincidence of math, the normalized dollar
cost is the same.
So you have 1/4 of the instances, but each of them is
four times more expensive.
So you end up with the same dollar cost.
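The normalized-cost arithmetic works out as follows (a quick check, assuming F2 and F4 instances are billed at 2x and 4x the F1 rate, which matches the numbers quoted in the talk):

```python
def normalized_cost(instances, rate_multiplier, baseline=24):
    """Dollar cost relative to the 24-F1-instance baseline (defined as $1)."""
    return instances * rate_multiplier / float(baseline)

print(normalized_cost(24, 1))            # F1, no concurrency: 1.0
print(normalized_cost(9, 1))             # F1, concurrent: 0.375, i.e. $0.38
print(round(normalized_cost(8, 2), 2))   # F2, concurrent: 0.67
print(normalized_cost(6, 4))             # F4, concurrent: back to 1.0
```

So six F4 instances cost exactly as much as twenty-four F1 instances, which is the "coincidence of math" mentioned above.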
So you can see it, concurrent requests, in this scenario,
are a win on, basically, every level.
If you turn them on and switch to an F4 instance, you would
get reduced latency, so a better user
experience at the same cost.
Or you could choose to give the user a slightly degraded
user experience from a latency point of view and save
yourself some money.

And I will point out that this workload is actually pretty
bad from App Engine's point of view.
If you think about what these requests are doing, they're
basically idling the CPU.
And then they're doing a huge CPU spike.
So that's a very tough workload for a scheduler to
deal with, because it's like, oh, this
instance, it's doing nothing.
I can start packing on requests.
And then suddenly, there's a huge CPU spike that actually
lasts for quite a long time.
So there are other workloads that have a more even mix of
CPU and IO.
And by IO, I don't mean just URL fetch.
I mean Datastore, Memcache.
Anything where the CPU is not used will generate a better
trade-off than the one you see here.
So I've clearly convinced you that concurrent requests are
awesome and that you should use them.
So how do you actually do it?
So the first thing you have to do is make sure that your code
is actually thread safe.
And thread safety is this huge thing that I don't want to
spend too much time talking about.
But basically, the easiest way to get yourself into trouble
and make your code not thread safe is to, basically, change
global variables without using some sort of lock.
So has anyone figured out what the problem with
this code is yet?
So I've cleverly decided to add a cache of the images I've
already transformed before.
So basically, what I'm going to do is, when a URL comes in,
I'm going to generate the Predator image.
And then I'm going to save it in this cache.

So if the URL is already in the cache, which you can see
in the second line in the get handler, then I'm going to
just output the cached version.
Otherwise, I'm going to compute the image as
normal in the end.
I don't want the cache to grow forever.
So if there are more than 10 items in it, I'm just going to
randomly throw an item away.
And then I'm going to store the image I just
generated in the cache.
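The pattern being described looks roughly like this (a reconstruction, not the talk's slide code; the webapp2 scaffolding is stripped out and transform() is a made-up stand-in for the Predator image generation):

```python
import random

CACHE = {}  # global mutable state, shared by every request thread

def transform(url):
    """Stand-in for the expensive Predator-image generation."""
    return "predator-image-for-" + url

def handle(url):
    if url in CACHE:
        # DANGER: another thread can evict url between this check...
        return CACHE[url]  # ...and this read, raising KeyError
    image = transform(url)
    if len(CACHE) > 10:
        # randomly throw an item away so the cache can't grow forever
        CACHE.pop(random.choice(list(CACHE)))
    CACHE[url] = image
    return image
```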
So the disaster here is I can check to see if the image is
in the cache while a concurrent request runs along.
And it removes that item from the cache.
And then, when I get to the next line, where I'm actually
reading the cached item, I blow up, because
the value's not there.
So this isn't a comprehensive examination of thread safety,
but this is the kind of pattern that will screw people
up where they're mutating a global variable without having
some sort of lock.
But I'm sure it'll be easy for you to fix that, if you have
problems like this.
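One minimal fix is to guard every cache access with a threading.Lock (a sketch under assumed names, not the talk's slide code; transform() is a made-up stand-in for the image generation):

```python
import random
import threading

CACHE = {}
CACHE_LOCK = threading.Lock()  # serializes all access to the shared dict

def transform(url):
    """Stand-in for the expensive image generation."""
    return "predator-image-for-" + url

def handle(url):
    with CACHE_LOCK:  # the check and the read are now one atomic step
        if url in CACHE:
            return CACHE[url]
    image = transform(url)  # do the slow work outside the lock
    with CACHE_LOCK:
        if len(CACHE) > 10:
            CACHE.pop(random.choice(list(CACHE)))
        CACHE[url] = image
    return image
```

Holding the lock only around the dictionary operations, not the expensive transform, keeps concurrent requests mostly parallel.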
So just do that and then go to step number two.
So step number two is you need to define a
WSGI application object.
Most existing App Engine apps will pretty much
already be doing that.
In all my examples, I've been defining a webapp2 WSGI
application, but you can use any WSGI application you want,
Django, Bottle, Flask.
WSGI is the dominant web application standard for
Python, so you're going to have no problems using
alternatives, if you want.
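A WSGI application is just a callable that takes environ and start_response, so the shape is easy to see without any framework at all (a minimal sketch, not tied to App Engine):

```python
def app(environ, start_response):
    """The WSGI contract: take environ, call start_response, return body."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from a WSGI app"]

# To serve it locally with the stdlib reference server:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

webapp2, Django, Bottle, and Flask all ultimately hand you an object obeying this same contract.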
Then you basically take your app.yaml file, take the
name of your module, put it in the script field, then a period,
then the name of the application object, and you're done.
Surprisingly, that is not the easiest step.
This is the easiest step.
You just take that threadsafe: false I showed you before, set
it to true, and you have concurrent requests enabled.
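Put together, the app.yaml changes look something like this (module and handler names here are hypothetical):

```yaml
runtime: python27
api_version: 1
threadsafe: true        # step three: flip threadsafe from false to true

handlers:
- url: /.*
  script: main.app      # step two: module "main", WSGI object "app"
```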
And you're all done.
Speaking of all done, I'm all done, except for just let me
summarize what we've talked about.
So with the Python 2.7 runtime, you get a much nicer
third-party library experience.
We support a bunch of libraries that people have
been really asking for.
And we support it in a way that's basically future-proof,
where we have an upgrade path for them.
There's some Python 2.7 features that are really
useful for web application development.
And there's some other features that are really
esoteric, but core Python developers seem to like them.
We have fewer limitations in the runtime.
And finally, we support concurrent requests, which
have the potential to save you some serious money and/or make
your apps faster for your users.

So thanks.
Thanks for coming.
BRIAN QUINLAN: Thanks to Wikipedia for
the dinosaur pictures.
Thanks to the Python community for making Python.
We couldn't really do App Engine without it.
And thanks for all you App Engine people,
for making cool apps.
AUDIENCE: Questions?
BRIAN QUINLAN: Absolutely.
Could you go up to the mics, just so this is recorded?
AUDIENCE: So I have a question.
Do you plan or is there any plan to support additional
runtimes, namely PyPy?
BRIAN QUINLAN: That's a great question that I won't answer.

We're thinking about new runtimes all the time, but we
can't really talk about whether we're actually going
to support them or not.
Sorry about that.
You can bug our product management folks-- for
example, there's one right there-- and try and beat some
sort of answer out of him, if you want.
AUDIENCE: A second quick question.
AUDIENCE: Can you use doctests with
the unittest framework?
BRIAN QUINLAN: I haven't tried it, but I can't see why that
wouldn't work.
AUDIENCE: OK, thank you.
BRIAN QUINLAN: No worries.
Maybe alternate mics?
AUDIENCE: So you talked about concurrent requests.
One of the common requirements was to reduce the cost by
having concurrent requests.
AUDIENCE: The other most important thing is the
functionality that comes with concurrent requests,
especially with stateful applications.
So earlier, you never had to go to the same server.
It always goes to a different server.
AUDIENCE: But if you have to have a server with data
already in memory and you have to go back to the same server,
that's not possible with App Engine.
Now, with concurrent requests, is that possible?
Or is it still not possible with App Engine?
BRIAN QUINLAN: So it's not possible in the model you're
thinking about.
So App Engine will send concurrent requests to one
instance, but it won't do that forever.
So say you have, for example, a million concurrent requests.
It's not going to send a million requests at one Python
instance.
It will start ramping up new instances as the
existing ones get saturated.
So you won't have shared memory, potentially, across
those concurrent requests.
Now we have another thing called Servers.
I don't know if you've looked into that.
AUDIENCE: Backends?
AUDIENCE: You mean backends?
BRIAN QUINLAN: Sorry, backends.
Sorry, my mistake, backends.
And they basically allow you to have some fixed number of
backends with, potentially, a large amount of memory to deal
with this kind of thing.
BRIAN QUINLAN: But actually, maybe you could come by the
booth afterwards.
And we could talk about this.
I'm curious what your use case is.
So could you wait a second?
[INAUDIBLE] will switch sides.
I have a question.
AUDIENCE: In your code you show that you were using the
webapp to HTTP request handler.
AUDIENCE: In order to use concurrent requests, do you
need to use those handlers?
BRIAN QUINLAN: Absolutely not.
You can use any thread safe WSGI application framework.
So you can use webapp2, Django, Bottle, Flask, whatever.
AUDIENCE: And I have a side question to that.
AUDIENCE: In the webapp2, you define a get method.
Do those methods map to the request type?
AUDIENCE: Like as opposed--
BRIAN QUINLAN: Yeah, sorry, they do.
So if you want to handle get requests, you define a method
called get.
If you want to handle post, you define a method called
post, put for put, et cetera, et cetera.
AUDIENCE: OK, so it's like a magical wrap kind of thing?
BRIAN QUINLAN: Yeah, yeah, but if there's a framework that
you're more comfortable with, just use it.

BRIAN QUINLAN: Not that webapp2 is bad
or anything, but--
AUDIENCE: So how does the saturation of an instance, how
is that determined?
What I'm specifically wondering is, if you have
requests that are really variable with their latency--
so in your benchmark example, they were all uniform latency
per request--
but if that's highly variable, say you have some that are a
couple of hundred milliseconds, some that are
over a second, how is that going to play into threading?
BRIAN QUINLAN: So it's hard to give you an answer that's not
going to get dated, because we want to be able to make
scheduled improvements without committing ourselves to a
particular implementation strategy.
But right now, we're basically sampling the CPU usage, going
back, I think, a couple of hundred milliseconds.
And we're using that to judge whether the instance can
handle more requests.
So this is why this is pathological, because it
doesn't really have a long-term history of how
requests have behaved over time.
AUDIENCE: Mm-hm, I see.
BRIAN QUINLAN: Well, was that sufficiently precise?
AUDIENCE: Yeah, yeah, yeah.
I think that makes sense.
That's what I was wondering about.

AUDIENCE: You mentioned the new NDB module.
AUDIENCE: And you talked about a new property type there.
AUDIENCE: What were the differences between that and
the reference property, in short?
BRIAN QUINLAN: So a reference property creates
two separate entities.
So it's basically, you have two
separate entities in Datastore.
And a reference property joins them up.
What a structured property does is, basically, it puts
them into the same logical entity.
But it's an organizational thing, not a
Datastore level thing.
So it helps you with data modeling, but it doesn't
actually change the entity abstraction.
I felt I explained that really poorly.
Feel free to ask a question that gets me to clarify what I
meant there.
AUDIENCE: No, that's clear enough.
Thank you.
BRIAN QUINLAN: OK, thank you.
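In ndb terms, the difference looks roughly like this (a sketch assuming the App Engine ndb API; the model and field names are made up, and this needs the App Engine SDK to run):

```python
from google.appengine.ext import ndb

class Address(ndb.Model):
    line_one = ndb.StringProperty()
    zip_code = ndb.StringProperty()

class ContactWithReference(ndb.Model):
    # Reference style: the Address is a second, separate Datastore
    # entity, joined up through its key.
    address_key = ndb.KeyProperty(kind=Address)

class ContactWithStructure(ndb.Model):
    # StructuredProperty: the Address fields are stored inside this
    # one entity; the grouping is organizational, not a Datastore-level
    # join.
    address = ndb.StructuredProperty(Address)
```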
AUDIENCE: So structured properties and indexing,
what's the impact on the number of Datastore writes
per-- yeah.
BRIAN QUINLAN: It essentially doesn't change anything.
So if you have an indexed structured property--
sorry, let me phrase it in another way.
Whether a property appears as a structured property or in
the top level of a model, it has the same cost from an
indexing point of view.
You have another question.
AUDIENCE: But is it going to create an index for each
property within this structure?
BRIAN QUINLAN: If the properties within the
structured property are defined to be indexed, then it
will create an index for each of those fields.
So basically, imagine you have a model with a group of fields
that's logically combinable, like an address: line one, line two,
zip code, whatever.
If you extract that out and put it in a structured property,
it doesn't change the indexing behavior at all.
I hope that helped.
BRIAN QUINLAN: Sorry, sorry, structured property,
structured property.
Did I say referenced property?
BRIAN QUINLAN: Well, no, because now you just have an--
sorry, the question was, does referenced property change the
indexing characteristics?
And the answer is, with referenced property, you have
another Datastore entity that represents the
other set of data.
And that other entity will still get indexed.
So no, no, it won't change the indexing characteristics.

AUDIENCE: So it might be a while since I looked at this,
but is it possible now, with a third-party library, to have
one that connects to an external service using a
custom TCP protocol?
BRIAN QUINLAN: No, it's not.
AUDIENCE: And is it ever going to be possible?
BRIAN QUINLAN: Ever going to be possible is such a hard
question to answer.
AUDIENCE: Well, is there any plans to make that possible?
BRIAN QUINLAN: Oh, it is on the road map, so yes.
AUDIENCE: Thank you.
BRIAN QUINLAN: Sorry, I didn't know if it was public or not.
My mistake.
Thanks for the question.
It allowed us to highlight a new feature
or an upcoming feature.
Yeah, sorry.
AUDIENCE: I understand that we have to pay for bandwidth when
we're talking between App Engine instances.
Is that going to be free at some point?
BRIAN QUINLAN: I have no information on
pricing changes there.
But I suspect not, because a PM is nodding his head.

But once again, he is available for haranguing or other
entertainment, if you want to have a go at him.
Anything else?
Or should I start giving away stuff?
AUDIENCE: I want you to start giving away stuff, but are you
still working on startup time at all, to reduce that a
little bit?
BRIAN QUINLAN: I am, personally, not working on
startup time.
I'm trying to think if anyone's doing anything to
reduce startup time right now.
I don't think so, actually.
But feel free to talk to me, if you're hitting some case
where startup time is slow.


AUDIENCE: Throw it.
BRIAN QUINLAN: You want me to throw them?
Oh, really?
BRIAN QUINLAN: No, no, no, no, I like the throw it into the
crowd strategy.
Ah, it seems so unsafe.
AUDIENCE: I want one!
AUDIENCE: Give me one for my boss.
He would kill me, if I come home without one.
BRIAN QUINLAN: Well, if you have some sort of death
threat, then--

could you go stand with that group and take your chances?

BRIAN QUINLAN: Oh, sorry, it wasn't for--
AUDIENCE: I think we're out, Brian.
Sorry, guys.