Make the Web Fast: Automagic site optimization with mod_pagespeed 1.0!


Uploaded by GoogleDevelopers on 04.10.2012

Transcript:

ILYA GRIGORIK: Hello everyone, and welcome to our "Make the
Web Fast" series here on Google Developers Live.
Today we'll be talking about a very awesome tool called
mod_pagespeed, which is a performance
JIT for your website.
So first off, my name is Ilya Grigorik.
I work with the "Make the Web Fast" team.
And with me today--
JOSHUA MARANTZ: Hi.
I'm Josh Marantz, and I work on the mod_pagespeed team.
ILYA GRIGORIK: Awesome.
So Josh, I heard through the rumor mill that there is an
upcoming 1.0 release for mod_pagespeed.
JOSHUA MARANTZ: That's correct.
The rumors are correct.
We are releasing mod_pagespeed 1.0 next week.
After two years of beta it's ready for broader adoption.
ILYA GRIGORIK: Wow.
So true to Google philosophy of keeping everything in beta,
that's two years and quite a few users, I
think, as well, right?
So it'll be really interesting to dive in and understand
what's happening underneath, because I spent some time
looking at what you guys have built, certainly
worked with the team.
And the first thing that stands out to me--
I guess we have some slides that I've prepared--
is that if you care about performance, there's a lot of
stuff that you need to care about.
In fact, my officemate, Steve Souders, wrote a couple books
on the subject.
And there is stuff like image compression, combining your
resources, and deferring JavaScript, cache extensions.
Frankly, it's a full-time job to keep up with all the things
that you need to do to keep your website fast.
JOSHUA MARANTZ: Yeah.
We believe that if you're building a website, you're
trying to convey information.
You're trying to sell a product, gain acceptance of an
idea, communicate something.
You're not trying to hack your website to make it faster.
So the less you have to focus on that, the better.
Now, it is important that you make websites fast because you
get more engagement, happier users.
People will come back.
But you don't want to spend all your time doing that.
ILYA GRIGORIK: Right.
And that's actually a very good point, very important
point, which is of course we care about speed.
But speed in itself can be almost a full-time job,
because even these best practices change over time.
The browsers get smarter.
We get more video on the web, more images on the web.
So these things shift, and you need to keep up with it.
And not only that, but some techniques, like let's say
spriting images together or inlining resources on the
page, are actually pretty heavy in terms of required
work, either for the design team or for the dev team.
So it adds additional complexity to
your development cycle.
So it's not all just easy wins along the way.
JOSHUA MARANTZ: And the other aspect is that sometimes it's
hard to draw a balance between being able to keep your site
up to date and keeping it cached.
For example, everybody knows who works on web performance
that the more assets on a website that
are cached, the better.
So if you go back to a typical newspaper site every day to
see what the news is, you'll typically wind up downloading
their JavaScript and their CSS, which hasn't changed--
usually has not changed--
every day, because they have to set a
fairly short cache lifetime.
I typically see a minute or an hour at the most in order so
that they can push changes out when they want to.
But they don't do it every day.
They probably do it once every few weeks.
And you want to be able to change your website on the fly
and have that propagate quickly to everybody, but also
have it so that when you don't change it, it stays in
people's caches.
So this is a complicated and messy thing to do manually.
It's a very easy thing to do automatically.
ILYA GRIGORIK: Right.
And I think that is the core insight behind mod_pagespeed,
which is to say, sure, you can apply all these
optimizations yourself.
And in fact, you should know them, because they should be
best practices on your team.
But we can, in fact, automate some of this stuff.
And that's what mod_pagespeed is all about.
That's why we say it's a performance JIT, Just In Time
compiler, in your web server.
So maybe you can tell us a little bit as to what that
actually means and how you guys have gone
about doing this work.
JOSHUA MARANTZ: Sure.
There are a number of approaches to automated
website optimization.
And our approach was to make it really easy to adopt.
So half the websites around the world are powered by
Apache web servers.
And so what we did was we packaged our optimization
framework as an open-source Apache module.
So you pretty much in three commands can download our
package, install it, and restart Apache, and your
website runs faster.
ILYA GRIGORIK: That's a pretty compelling pitch.
JOSHUA MARANTZ: There's then more you can do.
There's a core set of filters that we believe is very safe,
will benefit websites a lot and be very safe
to run on all websites.
And those come on when you do that process.
But then there's more that you can do if you're willing to
investigate and tune it a little bit.
But the whole idea was out of the box, really good
performance.
ILYA GRIGORIK: Right.
And safe, right?
So your website shouldn't be broken.
And I think we'll actually take a look at kind of deep in
the guts of some of the filters and how they work.
But before we even get there, one of the things that I
wanted to highlight was that I believe mod_pagespeed is
actually based on another open-source project.
JOSHUA MARANTZ: That's right.
So the way that it is structured is that we thought
Apache was a very good delivery vehicle for our
technology.
But we know it's not the only delivery vehicle for our
technology that can ever exist.
So we layered this as an optimization framework called
the PageSpeed Optimization Libraries, which is not tied
to any particular server.
It's a plug-in architecture.
And we'll get into that more a little bit later.
And then we packaged that with a connection to Apache, an
Apache gasket, if you will, so that it's just plug-and-play
and you don't have to modify the structure.
ILYA GRIGORIK: So mod_pagespeed is basically a
wrapper around PageSpeed Optimization
Libraries for Apache.
But if I want to adapt it to some other server--
maybe I've written one myself, or I'm using some other
popular server--
I could actually still reuse that same code.
JOSHUA MARANTZ: Yes.
And in fact, this is what we have done with PageSpeed
Service, so that we've now deployed this on two very
different server stacks, one based on Apache, one an
internal Google one.
But we can bring the same technology in two very
different deployments.
And we are looking to expand to any server that rises in
popularity as well.
ILYA GRIGORIK: Right.
And PageSpeed Service is our hosted version of this, which
is actually still in beta, and we're still kind of
field-testing it with other customers.
But it's actually running on the same code
base, if you will.
JOSHUA MARANTZ: Yes.
ILYA GRIGORIK: Very cool.
So I think we covered some of this.
We have a 1.0 coming.
We know that it's an Apache module.
You guys have been working on it for over two years.
And you actually mentioned some of the core filters and
optional filters.
And it sounds like there's quite a few.
JOSHUA MARANTZ: Yeah.
There's a wide variety.
There's a lot of ideas.
Web performance is a topic that invites papers.
It invites conferences.
Many companies are founded around this.
And there's lots of ideas that are pouring into this.
But we try to take the ones that are most effective, that
are incredibly robust and predictable, and put them into
the core set so that the out-of-the-box experience is
really good.
And then there are a lot of other things that we are
working on, that we're validating, that we're making
sure are really solid and will make it in.
And there's others that we think will probably always be
kind of a manual configuration kind of option.
A good one of these is where we defer JavaScript.
That's a complicated thing to do and has generally amazing
effects on websites.
But it is something that you want to hand-validate, and you
don't want to just turn that on.
ILYA GRIGORIK: Right.
So I think this highlights kind of a general point, which
is to say there is a core set of filters that you should be
able to turn on or that will be turned on once you install
mod_pagespeed, and your site should just go faster.
But depending on your site, you probably want to spend
some time going through the available filters and just
seeing which ones may apply to your site.
And you'll be able to get more performance benefits out of
mod_pagespeed.
JOSHUA MARANTZ: Right.
ILYA GRIGORIK: Very cool.
So I think we're going to dive into the
details of some of these.
But I do want to touch on one point, which is we do support
2.2 and 2.4 of Apache?
JOSHUA MARANTZ: That's correct.
Apache 2.4 support came out recently.
And that's in our 1.0 release as well.
ILYA GRIGORIK: Awesome.
And you mentioned that it's just a couple of lines.
So we have, I guess, packages for Debian and RPMs that you
can just install.
JOSHUA MARANTZ: We do, although external developers
have generated packages for openSUSE and even for Windows.

It's an open-source product.
We have a build process and instructions for doing that.
And so people can put up other packages.
FreeBSD is another one that has support.
ILYA GRIGORIK: Right.
So I can just build it from source, right?
JOSHUA MARANTZ: Yes.
ILYA GRIGORIK: OK.
Cool.
And then one more thing for 1.0.
I know that until recently, or until we ship 1.0, there was
one release tree, if you will, or one package.
And I think moving forward, once we release 1.0, we'll
actually have two.
JOSHUA MARANTZ: That's right.
The current release package, if you're using mod_pagespeed
today, you're on the beta channel if you
installed from binaries.
And we're going to continue to have that beta channel.
But the 1.0 release introduces a new stable channel.
And the way that this will work is that we will release
new features into beta.
And after we're really comfortable and solid with
them, then we'll update the stable channel.
And then when you update your packages and your operating
system with yum update or updating the Debian package
system, you'll upgrade based on the
channel that you've selected.
ILYA GRIGORIK: Cool.
That makes sense.
All right.
So let's dive into the guts of it.
But I think we touched on this already.
I'll just mention it briefly.
The whole point, I guess, with mod_pagespeed is to highlight
the things that you don't need to do.
So instead of having to worry about, do I need to have an
extra build process for optimizing images or
concatenating my CSS or JavaScript or all the rest,
all that is taken care of by mod_pagespeed.
And in fact, that means that I don't need to modify my
current workflow or my team's workflow to take advantage of
all these optimizations.
JOSHUA MARANTZ: That's exactly the point.
It's a drop-in solution for performing best practices for
web clients.
ILYA GRIGORIK: So the clients, or the visitors I should say,
would see the optimized resources.
I still have my original resources in my dev
environment.
And mod_pagespeed does the rest.
JOSHUA MARANTZ: Exactly.
ILYA GRIGORIK: Very cool.
So you mentioned this earlier.
We have over 100,000 mod_pagespeed installs today,
since you guys announced the product.
And in fact, there is a number of partners who have already
installed it as part of their hosting infrastructure.
So for example, I know that in DreamHost or Go Daddy, you can
actually go into your control panel, click--
I think it's in these settings.
I'm not sure exactly where in the menu it is.
But I know there is a check-box.
You say, please accelerate my site.
And all of a sudden, the site goes faster.
And what happens under the hood is they enable
mod_pagespeed for your site.
JOSHUA MARANTZ: That's exactly right.
Having that check-box to just turn it on is even easier than
the three-step install process that I mentioned earlier.
ILYA GRIGORIK: So it's like the Turbo button
back in the old days.
It's like make my site fast.
Why wouldn't you turn that on?
Very cool.
So let's dive into, I guess, some of the guts.
Here's an example.
JOSHUA MARANTZ: Yeah.
So the best way to see it in action, this
is a shopping website.
And we made a video comparing a first view of that site on
Chrome with mod_pagespeed off and with mod_pagespeed on.
And this is--
ILYA GRIGORIK: So this is a side-by-side
recording of on and off.
Wow.
JOSHUA MARANTZ: Correct.
So this video was made with WebPagetest, which offers all
kinds of opportunities--
ILYA GRIGORIK: I'm going to play that again just so you
guys can see it.
So the page loads in 2.1 seconds.
And then the other one takes about five seconds, if I
scroll back here.
And you can see that the mod_pagespeed
one loads a lot faster.
Now, there could be many reasons for this, right?
I'm guessing it's a combination of filters that
come into effect to make this difference.
But it's 200% faster, or more, here.
JOSHUA MARANTZ: Yeah.
It's also important to note that even though the rendering
takes 2.1 seconds, it's actually pretty visible after
probably less than a second.
ILYA GRIGORIK: Right.
So if I just scrub back here.
So we have 1.8 seconds in, we already have the page, whereas
the other one is blank.
So that's a dramatic difference.
JOSHUA MARANTZ: So we can see a little bit of why that
happened by going to the next slide.

So those that have dabbled in web performance have seen
these waterfall diagrams that are available.
ILYA GRIGORIK: That's the bread and butter.
JOSHUA MARANTZ: They're available in Firebug and the
Chrome Developer Tools.
And these are from WebPagetest.
Again, this is from Chrome.
ILYA GRIGORIK: So this is the same website, right?
We're just looking at the waterfall charts.
JOSHUA MARANTZ: Exactly.
The same exact website.
On the left, the waterfall chart is tall, which means
there's a lot of resources.
And it's wide.
There's a lot of wide blue bars.
Those wide blue bars are big images which don't need to be
nearly that big.
And so on the right, they become a lot skinnier.
There's also a lot less bars.
So the two things that are most visible in the waterfall
diagram, in terms of the effects of mod_pagespeed
filters, are one, optimizing images.
So there's actually three ways in which
the images are optimized.
Number one, mod_pagespeed looks at the context in which
images are displayed.
Very often, images are taken from cameras at full
resolution and instantiated into very small divs or
elements in HTML, 100 by 200 or something.
And there's way more pixels being sent down to the browser
than the browser needs.
And this wastes bandwidth and it wastes CPU time on the
browser resizing.
So it's much better to resize on the server.
But who wants to do that?
Well, turn on mod_pagespeed and it happens automatically.
The other thing is that the images are typically at a much
higher quality ratio than you need for an LCD display or a
retina display.
And it is pretty straightforward to remove a
lot of the bytes of that image without
reducing any visible quality.
And the third point is that modern web browsers, including
Chrome and Opera, support a more modern format of image
called WebP, which Google released over a year ago,
which for the same quality can get you about 30% less bytes.
And so this is not something you would do manually.
But an automated tool can tune the experience, tune the HTML
that's delivered and the images that are delivered to
the browser in question.
So mod_pagespeed can take a JPEG resource and transcode it
and deliver it as WebP to Chrome and Opera, and to other
browsers deliver it as JPEG, so that it works either way.
So between all of those, we shrunk this site way down.
And actually, the waterfall diagram, that blue line
represents the onload event.
What happens after the onload event in this particular site
is third-party widgets that are loading asynchronously,
which is great.
So the site was built with that really well.
And so there's analytics running.
There's buttons from different third-party vendors that are
loading at that point.
But it doesn't block onload, so the user is fully
interactive at the time of that blue line, which happens
way earlier than on the other site.
ILYA GRIGORIK: So it's really interesting that just by kind
of blurring your eyes, we can look at this waterfall and
just figure out what's happening just on the shape,
without even looking at the resources.
So you can say like, we optimized the images.
We probably concatenated some files, which is why it became
shorter, and a few other things.
JOSHUA MARANTZ: Yeah.
I mean, mod_pagespeed is not necessarily all that you would
ever want to do to make your site fast.
It's now running in two-point-something seconds.
So you could probably get it down to one second, because
there's a lot of kind of cascading effect here.
And there's not much parallelism,
especially after onload.
And so diving deeper into the waterfalls is something that
you might want to do if you want that next level.
But kind of without any effort at all you can get--
ILYA GRIGORIK: Yeah.
I'll take it.
Right.
2X just for turning on a flag.
I'll take it.
All right.
So now we will go under the hood of this thing.

OK.
So we'll start with a simple one.
So HTML Collapse.
This is an example filter, a very simple one but it will be
a good introduction to some of the more
interesting ones later.
JOSHUA MARANTZ: Yeah.
So this is kind of the simplest possible filter that
you can have.
And this is actually a filter that we have in
mod_pagespeed today.
It's a little bit more involved than this in reality.
But this is essentially it.
You can, as every filter can, register for interest in
various HTML events, as it were.
And as the events stream through the system, we can
say, hey, this one's interested in
a Characters node.
That's all Collapse Whitespace cares about.
And then it basically just wipes out extra spaces that
it's pretty sure can't matter.
And cases where it would matter is if it's in a pre
tag, and there's other cases as well.
ILYA GRIGORIK: Right.
So it's not as simple as it looks.
This is not like run a gsub and remove all the spaces.
You're actually parsing the HTML.
And you're saying, hey, this is inside of a pre tag or a
script tag, so the white space is significant.
But nonetheless, inside of your regular HTML markup, you
can still compress the extra white space.
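Editor's sketch: the event-driven idea described above can be illustrated in a few lines of Python. This is not mod_pagespeed's code (the real filter lives in the C++ PageSpeed Optimization Libraries). The class name, the PRESERVE list, and the choice to collapse every run to a single space are assumptions made for illustration only.

```python
import re
from html.parser import HTMLParser

# Elements whose whitespace is treated as significant in this sketch (assumed list).
PRESERVE = {"pre", "script", "style", "textarea"}


class CollapseWhitespace(HTMLParser):
    """Receives HTML parse events and collapses whitespace in character data."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.preserve_depth = 0  # > 0 while inside a whitespace-sensitive element

    def handle_starttag(self, tag, attrs):
        if tag in PRESERVE:
            self.preserve_depth += 1
        self.out.append(self.get_starttag_text())  # echo the tag unchanged

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in PRESERVE and self.preserve_depth > 0:
            self.preserve_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The only event this filter really cares about: character data.
        if self.preserve_depth == 0:
            data = re.sub(r"\s+", " ", data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def collapse_whitespace(html):
    parser = CollapseWhitespace()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, collapse_whitespace("<p>a   b</p><pre>x   y</pre>") shrinks the run inside the paragraph but leaves the pre block untouched. A sketch like this cannot see whitespace that becomes significant through CSS or JavaScript at runtime.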
JOSHUA MARANTZ: Exactly.
And this is a relatively popular filter.
It's actually not a core filter.
ILYA GRIGORIK: Interesting.
JOSHUA MARANTZ: And that's because we are a little bit
conservative.
And it is quite possible for an element to have its white
space become significant due to a JavaScript event, which
is not something that mod_pagespeed
currently looks at.
So this is a filter that's pretty safe to do.
But we leave it up to users to turn it on rather than enabling it by default.
I've noticed a lot of users do turn this on, because it's
mostly pretty safe.
ILYA GRIGORIK: Right.
OK.
That makes sense.

So this is a more interesting one, right?
So now we're talking combining multiple CSS files.
So how does this work?
JOSHUA MARANTZ: Right.
So the basic idea here is that as HTML is streaming through
mod_pagespeed, we're parsing tags.
We're saying, hey, here's four link tags.
Let's collect all of those together, collect the contents
of those, and collect the names of the CSS files.
And we'll get into how that happens in a few minutes.
But what it is that happens is that those four link tags get
replaced with a single link tag.
ILYA GRIGORIK: Which is the one we see
on the bottom, right?
JOSHUA MARANTZ: Which is the one that we see on the bottom.
And it has the names of the original CSS files separated
by plus signs, literally.
And then there is a .pagespeed keyword, which is something
that we look for when we are serving it.
ILYA GRIGORIK: It's kind of a hint to
mod_pagespeed, if you will.
JOSHUA MARANTZ: Yeah.
And then there's a code "cc," which actually
means Combine CSS.
And then there's a HASH.
And this HASH is very important.
This HASH is the technology that lets mod_pagespeed serve
any resource with a one-year cache lifetime, because that
HASH is the MD5 sum of the optimized resource.
ILYA GRIGORIK: So it's the MD5 sum of the combined CSS files.
JOSHUA MARANTZ: Right.
So it's kind of a signature for this file.
Or you could think of it as a version of this file.
ILYA GRIGORIK: Right.
So if I modify, let's say, big.css, and I add extra white
space, and I save it, the MD5 sum would change, and you
would regenerate this resource.
JOSHUA MARANTZ: That's correct if we didn't minify that CSS
file and get rid of that white space.
ILYA GRIGORIK: OK.
Right.
So white space is a bad example.
JOSHUA MARANTZ: But if you actually change the content of
the CSS file, then we would have a different MD5 sum.
So we might have cached this one for a year.
And you might think it's stale, but it's OK because
we'll never reference it again.
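Editor's sketch: the naming scheme described above (original names joined by plus signs, the .pagespeed keyword, the "cc" code, then a content hash) could look roughly like this. The exact URL layout, the use of the full hex digest, and joining files with a newline are assumptions, not the real mod_pagespeed encoding.

```python
import hashlib


def combined_css_url(names, contents):
    """Return (url, combined_css) for a list of CSS file names and their contents.

    The URL is built as: a.css+b.css.pagespeed.cc.<md5>.css (layout assumed).
    """
    combined = "\n".join(contents)
    # The hash is taken over the combined output, so it changes whenever that
    # output changes, which makes the URL safe to cache for a long time.
    digest = hashlib.md5(combined.encode("utf-8")).hexdigest()
    url = "+".join(names) + ".pagespeed.cc." + digest + ".css"
    return url, combined
```

Because the hash is computed over the output, an edit to any input file that survives optimization yields a new URL, and the old URL is simply never referenced again.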
ILYA GRIGORIK: Right.
OK.
And I guess maybe to backtrack a little bit, and the reason I
guess we want to do this is fetching multiple files
consumes maybe additional TCP connections.
So by combining it all together, we have one
resource, which you can fetch down faster.
And hopefully that'll lead to a faster render on the page.
JOSHUA MARANTZ: Exactly.
This is kind of the height of that waterfall
chart that you see.
If your waterfall chart, for example, doesn't fit on your
screen, you know that you have some work to do.
ILYA GRIGORIK: Right.
Yes.
That's a good rule of thumb, in general.
JOSHUA MARANTZ: The other point that I want to make is
that by providing long cache lifetimes, you make all the
caches that are in the network in between the server and the
client more effective.
You make the browser cache more effective.
You make any caching done at the ISP layer more effective.
And you make content delivery networks more effective,
because the versions of the assets that they store, they
know that they don't have to check back with the origin to
revalidate for a year.
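Editor's sketch: the caching policy described here comes down to choosing a Cache-Control value by URL type. The function name, the "public" directive, and the 300-second fallback are hypothetical; only the one-year lifetime for hashed .pagespeed. URLs comes from the discussion.

```python
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds


def cache_control_for(url, short_ttl_seconds=300):
    """Pick a Cache-Control header value for a URL (illustrative policy only)."""
    if ".pagespeed." in url:
        # The content hash is part of the URL, so the bytes behind it never change.
        return f"public, max-age={ONE_YEAR_SECONDS}"
    # Unversioned resources need a short lifetime so edits propagate quickly.
    return f"max-age={short_ttl_seconds}"
```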
ILYA GRIGORIK: Right.
OK.
Very interesting.
So let's take a look at the monster diagram.
So maybe you can just walk us through what happens when an
HTTP request comes in.
JOSHUA MARANTZ: Sure.
So this is the view of what happens in Apache.
Apache has a module architecture, which allows
anybody to write their own Apache module that can help
make some kind of transformation to the content
or the networking.
ILYA GRIGORIK: Some examples are, like, mod_deflate,
mod_security.
There's lots and lots of these things.
JOSHUA MARANTZ: Actually, mod_deflate is a really good
example, because what that one does is-- the most important
thing that you can do even before you run mod_pagespeed
is make sure to always compress your output.
And that is basically an output filter that just looks
at the stream of bytes that are coming through it and just
makes them smaller and adds the header to say, by the way,
I gzipped it.
ILYA GRIGORIK: This is a little bit of an aside, but
would you use mod_deflate with mod_pagespeed?
JOSHUA MARANTZ: Actually, if you have mod_pagespeed, we
will turn on mod_deflate.
So they work together.
Mod_pagespeed would be less effective if mod_deflate
wasn't there.
But they're complementary, because mod_pagespeed doesn't
attempt to gzip assets itself.
It depends on mod_deflate to do that.
But it does make them smaller in the first place.
And image compression is not really addressed by
mod_deflate as well.
So the way that an Apache module works is that it can
install into the Apache kernel an input filter, which takes
requests and mutates them in some way that's particular to
the filter.
Content generators can look at URLs and say, either I know
how to handle that one.
I'll take it over.
Nobody else needs to worry about it.
Or, that one's not for me.
I decline it.
I'll pass it on to the next one.
And they can install output filters, which just get put
into the chain of the byte stream as it goes through.
ILYA GRIGORIK: Right.
So here in this diagram, you just have the PHP handler.
So if I have a .PHP file, it would intercept that and say,
hey, that's for me.
I will generate the byte stream.
JOSHUA MARANTZ: Exactly.
So mod_pagespeed puts a handler in which looks at
those .pagespeed. resources.
ILYA GRIGORIK: So it's like a custom extension.
JOSHUA MARANTZ: Exactly.
And that's for handling resources, for handling
images, CSS, and JavaScript.
For HTML, it installs an output filter where it looks
at this stream of bytes going by.
And whenever it finds HTML, it parses it and tries to make
optimizations in it as it goes through.
So if an HTML file comes into Apache, what will typically
happen is it'll go through the input filters.
The PageSpeed resource handler will look at it.
But it won't do anything with it,
because it's not a resource.
The PHP handler, if PHP was handling those, would take the
URL and generate HTML out, which would then be sent to
mod_pagespeed's output filter, which would start looking at
HTML and deciding, based on the tags and the characters
that are parsed, whether it wants to mutate
those bytes or not.
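Editor's sketch: the request flow just described (content generators either take a URL or decline it, then output filters see the resulting bytes) can be modeled in plain Python. This is a toy model, not the Apache module API. Every function name here is hypothetical, and the whitespace squeeze in the output filter is only a stand-in for real HTML rewriting.

```python
DECLINED = None  # sentinel a handler returns to say "that one's not for me"


def pagespeed_resource_handler(url):
    # Only URLs carrying the .pagespeed. marker are served by this handler.
    if ".pagespeed." in url:
        return f"<optimized bytes for {url}>"
    return DECLINED


def php_handler(url):
    if url.endswith(".php"):
        return "<html><body>  generated   by PHP </body></html>"
    return DECLINED


def pagespeed_output_filter(body):
    # Stand-in for HTML rewriting: act only on HTML, pass other bytes through.
    if body.lstrip().startswith("<html"):
        return " ".join(body.split())
    return body


def serve(url, handlers, output_filters):
    for handler in handlers:
        body = handler(url)
        if body is not DECLINED:
            break  # the first handler that accepts the URL generates the content
    else:
        return "404 Not Found"
    for output_filter in output_filters:
        body = output_filter(body)
    return body
```

Calling serve("/index.php", [pagespeed_resource_handler, php_handler], [pagespeed_output_filter]) walks the same path as the HTML example above: the resource handler declines, the PHP handler generates HTML, and the output filter rewrites it on the way out.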

An important thing that mod_pagespeed tries to do is
never slow down the page.
So some of the things that mod_pagespeed does are
actually compute-intensive or rely on the network.
ILYA GRIGORIK: Right.
That was actually going to be my question.
It sounds like a lot of work.
JOSHUA MARANTZ: There is definitely work going on.
So there's HTML parsing, but streaming parsers go fast, so
that's not really a problem.
But when we have to go and optimize an image-- well, we
have to fetch images, we have to optimize images-- we'll do
that in the background, typically, and also optimize
them in the background.
So we will only do the tag replacement for images if we
already had that in cache.
ILYA GRIGORIK: Interesting.
So let's say I've just started my web server.
Nobody has hit it.
And I make the first request, I would still get the original
unoptimized resource then?
JOSHUA MARANTZ: That's right.
Probably for the most part, your resources will come
through unoptimized, but Collapse
Whitespace would work.
ILYA GRIGORIK: Right.
OK.
So you would apply filters that work really fast.
And then on the second hit, you would actually serve me
the optimized content.
JOSHUA MARANTZ: Exactly.
ILYA GRIGORIK: Right.
That's a very good point.
OK.
So maybe one more quick note, which is to say we talked
about PHP, but I think it's important to note that one of
the strong or popular applications for Apache is
that it can act as a proxy.
So if you have some other server running somewhere--
that can be another app server, maybe it's a Ruby app
server, Java, what have you--
and you're using mod_proxy, this still applies, right,
because it's effectively another handler?
JOSHUA MARANTZ: That's correct.
It's easy to set up mod_pagespeed as a reverse
proxy or actually as a forward proxy as well.
And that way, it can optimize content that's not necessarily
even generated within the Apache server.
ILYA GRIGORIK: So if I have a Java server running right now
serving my assets, I could actually put Apache in front,
turn on mod_pagespeed, and maybe inherit some of these
optimizations for free.
JOSHUA MARANTZ: Right.
That would be a reverse proxy application.
ILYA GRIGORIK: That's right.
Very cool.
So we talked a little about images.
And images are a big deal on the internet today.
Just prior to this, we were kind of talking, and we said
that over 50% of all the bandwidth on the internet is
video, which is moving pictures.
But then the second-biggest component is still images.
So you guys put a lot of work into optimizing images, in
particular.
And you already covered some examples, but this is kind of
an in-depth look at what happens.
JOSHUA MARANTZ: Yeah.
This is kind of the life of an image as it flies through
mod_pagespeed.
You're right.
A lot of the benefit of mod_pagespeed, the real wins
in terms of bandwidth usage and latency that mod_pagespeed
gets, at least in the core filter set, on first view are
from making images smaller.
And so we put a lot of effort into that.
And this is how it works at a high level.
So we install a filter called the image rewriting filter,
which scans for elements with image tags, and it looks for
the source attribute.
And the way that it works, in order to not slow down HTML
even on the first view, is it looks in a metadata cache to
see if we've seen this resource at this width and
height before.
So because we're optimizing images for the element that
they're going to be drawn into, those all go into the
key of the metadata cache, if you will.
And so when that's a hit, if we have a warm server, it
doesn't matter whether the browser cache is warm or cold,
but if the server cache is warm, then all we have to do
to deliver that optimized image is swap out that source
attribute with the one that we found in our metadata cache.
ILYA GRIGORIK: The optimized version of the image.
JOSHUA MARANTZ: Exactly.
And so if it's a miss, though, then we pretty much have to
give up on this round, because we're not going to fetch a
large image and optimize it on the fly without
delaying the HTML.
So we spin up a machine that runs in the background--
not a physical machine, but a finite-state machine that runs
in the software--
and it goes off and it does the fetch
of the image resource.
It runs the image optimization algorithms.
And we discussed what those were before.
So we can do transcoding.
We'll do resizing.
And we'll do recompression.
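Editor's sketch: the "rewrite only on a cache hit, otherwise optimize in the background" pattern can be expressed as follows. The class, the callback signature, and the fake URL format are hypothetical. A real server would run the background work on another thread or event loop instead of an explicit method call.

```python
from collections import deque


class ImageRewriter:
    """Swaps an img src for an optimized URL only when the result is already cached."""

    def __init__(self, optimize):
        self.optimize = optimize      # hypothetical callback: (src, width, height) -> URL
        self.metadata_cache = {}      # (src, width, height) -> optimized URL
        self.pending = deque()        # keys waiting for background optimization

    def rewrite_src(self, src, width, height):
        key = (src, width, height)    # dimensions are part of the cache key
        hit = self.metadata_cache.get(key)
        if hit is not None:
            return hit                # warm cache: swap the attribute, no delay
        if key not in self.pending:
            self.pending.append(key)  # cold cache: schedule work, do not block HTML
        return src                    # serve the original URL this time

    def run_background_work(self):
        while self.pending:
            key = self.pending.popleft()
            self.metadata_cache[key] = self.optimize(*key)


def fake_optimize(src, width, height):
    # Placeholder for fetch + resize + recompress + transcode.
    return f"optimized-{width}x{height}-{src}"
```

The first request for a given image and size gets the original URL back. Once the background work has run, later requests get the optimized URL.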
ILYA GRIGORIK: Right.
And I'm guessing you guys also do stuff like removing extra
metadata, which is pretty popular in
like PNG images, right?
JOSHUA MARANTZ: Sure.
That's actually--
in the core set, we'll remove the metadata and resize, and
then it's an option to recompress.
ILYA GRIGORIK: Actually, I'll highlight the resize, because
I think this is very important.
You mentioned it, but I think it's still worth talking about
for a little bit.
So if I have an image--
say if I have an image tag that says the width of this
image is 100 pixels and the height is 100
pixels, so it's square--
but I can actually push a larger image into it.
It can be 1,000 by 1,000, which is actually not uncommon
on the internet.
Somebody takes a photo.
They resize it in whatever editor.
They upload it.
And you're actually getting the full-res image, which then
gets rescaled in the browser.
So just by providing the width and height in the markup,
mod_pagespeed will be smart enough to look at that and
say, yes, but the origin image is much bigger, so let me
rescale that and serve the proper version.
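Editor's sketch: the resize decision in this example is simple arithmetic. The function names are hypothetical, and the rule used here (shrink only when the declared box is smaller in both dimensions, never upscale) is an assumption about reasonable behavior, not a statement of mod_pagespeed's exact policy.

```python
def size_to_serve(actual, declared):
    """Return the (width, height) to send, given actual and declared sizes."""
    actual_w, actual_h = actual
    declared_w, declared_h = declared
    if declared_w < actual_w and declared_h < actual_h:
        return declared  # the browser would downscale anyway, so do it on the server
    return actual        # never upscale on the server


def pixel_savings(actual, served):
    """Fraction of pixels no longer sent (pixel count, not file bytes)."""
    return 1 - (served[0] * served[1]) / (actual[0] * actual[1])
```

For the 1,000 by 1,000 image displayed at 100 by 100, this sends 10,000 pixels instead of 1,000,000, which is 99% fewer pixels. The byte savings depend on the image format and content.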
JOSHUA MARANTZ: Yeah.
I would go further to say not only is it not uncommon, it's
quite common to take images from your camera
and put them online.
ILYA GRIGORIK: So this alone saves me a lot of time,
because if I'm thinking about--
if I have a lot of images, you mentioned kind of the
newspaper use case earlier, right?
Lots of images there.
I can just define the width and height and push kind of
the resizing logic to mod_pagespeed.
JOSHUA MARANTZ: Exactly.
ILYA GRIGORIK: That's very cool.
JOSHUA MARANTZ: And so we do this kind of gauntlet of image
optimizations.
And when it comes out the other side, we have a new URL
with kind of the instructions on how that got created
encoded into it.
So this image in this example-- this is on
modpagespeed.com, which has all of our examples.
On modpagespeed.com, you'll find this Puzzle.jpg is the
origin image.
That's shown in green.
The width in which it was displayed in our sample page
is 256 by 192.
ILYA GRIGORIK: Right.
So this is from the HTML markup.
JOSHUA MARANTZ: Exactly.
It was a JPG file originally, but we were
displaying it in Chrome.
And we took it and now we're going to transcode it to WebP
so that it's delivered more efficiently.
We also put into the URL the MD5 sum of this image file so
we can serve it for a long time.
And even if I change Puzzle.jpg, then it won't be a
problem with stale caches.
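
A rough sketch of why a content hash in the URL avoids stale caches. The name layout below is modeled loosely on the slide as described (dimensions, origin name, a hash, the new extension); the exact layout, the hash truncation, and the function name are assumptions for illustration.

```python
import hashlib


def rewritten_name(origin_name, width, height, image_bytes, ext):
    """Build a cache-friendly name that changes whenever the image bytes change."""
    # Shortened MD5 digest of the content; the 10-character length is arbitrary.
    digest = hashlib.md5(image_bytes).hexdigest()[:10]
    return f"{width}x{height}x{origin_name}.pagespeed.ic.{digest}.{ext}"


old = rewritten_name("Puzzle.jpg", 256, 192, b"first version", "webp")
new = rewritten_name("Puzzle.jpg", 256, 192, b"edited version", "webp")
print(old)
print(new)
print(old != new)  # a changed image gets a new URL, so old cache entries are never reused
```

Because the URL itself changes with the content, each URL can be served with a very long cache lifetime.
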
ILYA GRIGORIK: Right.
It's kind of a similar pattern to what we
saw with CSS earlier.
JOSHUA MARANTZ: Exactly.
ILYA GRIGORIK: OK.
And I guess the WebP one is really interesting, because
this would get served-- you mentioned because this was in
Chrome, you'd get WebP.
But if I visited the same website in, let's say, Firefox
browser, which currently does not, unfortunately, support
WebP, I would still get a JPG.
JOSHUA MARANTZ: Exactly.
So as a site owner, you can make a decision, by using
mod_pagespeed, that you're going to serve images in a
modern web format that is not supported by all browsers, but
your site will still work well on all browsers.
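
The per-browser decision can be pictured roughly as follows. The talk does not say how mod_pagespeed detects WebP support, so the User-Agent check and the list of capable browsers here are purely illustrative assumptions; real detection would need to be far more careful about browser versions.

```python
# Assumption for illustration only: treat any User-Agent containing one of
# these tokens as WebP-capable.
WEBP_CAPABLE_TOKENS = ("Chrome",)


def pick_image_format(user_agent, origin_ext):
    """Return the extension to serve for a given browser and origin image type."""
    is_jpeg = origin_ext.lower() in ("jpg", "jpeg")
    if is_jpeg and any(token in user_agent for token in WEBP_CAPABLE_TOKENS):
        return "webp"
    # Everyone else keeps the format they already understand.
    return origin_ext


print(pick_image_format("Mozilla/5.0 Chrome/22.0 Safari/537.4", "jpg"))
print(pick_image_format("Mozilla/5.0 Gecko/20100101 Firefox/15.0", "jpg"))
```

This is the part a build step cannot do: the choice is made per request, at serving time.
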
ILYA GRIGORIK: Very cool.
So that's not even something that I could do with a build
step, right?
JOSHUA MARANTZ: Correct.
ILYA GRIGORIK: Yes.
Very nice.
JOSHUA MARANTZ: So I wanted to dive into a little bit of what
the PageSpeed Optimization Library is.
ILYA GRIGORIK: So this is the part that powers
modpagespeed.com, right?
JOSHUA MARANTZ: Right.
So this is a server-independent library
that does all of these optimizations.
And the way that it gets hooked up to-- and again, this
is completely open-source software.
But the way that it gets hooked into a server stack is
that whoever is doing that supplies some mechanism to do
HTTP fetching and some mechanism to do caching.
And in different environments, there are different
technologies for accomplishing these things.
ILYA GRIGORIK: These things are implemented in Apache.
So Apache, I'm guessing, already has an HTTP fetcher,
which you reuse, but the cache is likely something that you
guys have implemented yourself.
JOSHUA MARANTZ: Sure.
Actually, the cache that we use for mod_pagespeed is also
open sourced and would be the default setting.
But typically, in a serving environment that has some
maturity to it, there will be some other caching solution
you'll want to use instead of the one that
we have open sourced.
ILYA GRIGORIK: So in fact, maybe could I even use
something-- like if I'm building something with this
library, I could use memcached, right?
JOSHUA MARANTZ: Yes.
Yes.
You're kind of forcing me to tip my hand.
So a feature that we will be releasing soon but is not yet
in 1.0 is support of memcached, which is an
important feature for scaling up websites.
ILYA GRIGORIK: Right.
Nice.
OK.
So if I have a custom server, I could actually take this and
build my own mod_pagespeed variant.
JOSHUA MARANTZ: Exactly.
Yeah.
There's API documentation on the web in
the developers' site.
And we would be happy to support actively anybody
interested in porting this to a new platform.
ILYA GRIGORIK: Right.
And we'll mention this later, but you guys do have an active
Google Group where people can come in and discuss, propose
new filters, file bugs, all that kind of stuff.
JOSHUA MARANTZ: Yeah.
There's actually a variety of support forums.
There's the Google Groups.
There's the issues list.
People seem to be fairly active on
Stack Overflow as well.
We try to be responsive.
ILYA GRIGORIK: Yeah.
I see a lot of questions there.
JOSHUA MARANTZ: We try to be responsive to that.
But we track everything in our issues list, which is all
accessed off of code.google.com.
ILYA GRIGORIK: Right.
OK.
Perfect.
So I wanted to highlight a few kind of tips, configuration
tricks, and a few other things.
We looked at the guts.
We talked about kind of high-level things.
But one question that I get quite commonly with
mod_pagespeed is like, OK, great.
So I grabbed this, installed it.
I ran these three commands.
Now it's on.
What if it doesn't work, or I'm scared, or how can I
experiment with mod_pagespeed?
And there's a couple of ways to do that.
First of all, because we have this additional module
installed, you can actually configure through a couple of
different ways.
So you can use query parameters that will be
intercepted by mod_pagespeed.
So for example in this rewrite CSS example, we have
ModPagespeed=on, which basically says turn on
mod_pagespeed for this request only.
So you can have it disabled, but I'm going
to enable it here.
And by the way, enable this specific filter.
So if I want to experiment with some non-core filter, I
can just pass this in, see what happens, kind of test the
waters, and then decide if I want to make that the default
for my configuration or not.
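
As a small sketch of the query-parameter approach, the helper below builds a test URL. ModPagespeed and ModPagespeedFilters are the parameter names used with mod_pagespeed; the helper itself and the example.com URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_pagespeed(url, filters, enabled=True):
    """Append per-request mod_pagespeed switches to a URL's query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("ModPagespeed", "on" if enabled else "off"))
    if filters:
        # Several filters are passed as one comma-separated value.
        query.append(("ModPagespeedFilters", ",".join(filters)))
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))


print(with_pagespeed("http://www.example.com/index.html", ["rewrite_css"]))
```

Loading the printed URL in a browser applies the filter to that one request, which makes it easy to compare the page with and without it.
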
JOSHUA MARANTZ: Yeah.
It's kind of a way to interactively rapidly iterate
on your site without having to restart Apache or anything.
ILYA GRIGORIK: I think it's one of my favorite features.
I love just being able to quickly get feedback on, how
is this going to look?
One alternative to that is to actually send HTTP headers.
So if you have some sort of a client or server solution that
you want to test with, that's another way.
And then the last one is-- we actually mentioned this-- the
mod_proxy forward proxy example, where you can
actually say, please fetch me this other site and run it
through PageSpeed and show me what will
happen when we do that.
JOSHUA MARANTZ: Yeah.
This is a very good way if you're considering the option
of using mod_pagespeed on your site, but you're nervous about
like installing it and rolling out to your users without kind
of looking at it first--
ILYA GRIGORIK: Yeah.
Let's install it on 1,000 servers and see what happens.
JOSHUA MARANTZ: --you can install it on one server local
to your system, which is running your origin content.
It's running as a proxy.
And then you can look at your site through mod_pagespeed by
setting a browser proxy.
ILYA GRIGORIK: Right.
That's a very handy tool.
And in fact, all three of these are documented really
well on the mod_pagespeed site.
So I have a link down here.
But if you guys search on your favorite search engine for
mod_pagespeed and experiment, you'll find instructions for
how to set up the mod_proxy, which is really handy.
I wanted to highlight this, which is we mentioned already
that there is a lot of different filters.
And we do have good documentation.
And there's a couple different resources.
So one that you mentioned, which is modpagespeed.com,
where we actually list all the filters.
And we actually also provide the demos.
So it's usually kind of a simple file which illustrates
what the filter does.
So if you guys want to take a look at that, that's a very
good place.
And another one is, once again, the configuration, or
config filters, page on our developers.google.com site,
where we actually explain what each one does.
And we also highlight which ones are in the
core set and not.
And another thing I'll mention is that by default, when you
enable mod_pagespeed, as Josh said, you
have your core filters.
But you can actually say, don't worry
about the core filters.
I'm going to hand-tune all the filters myself.
So you can customize it completely for your site.
JOSHUA MARANTZ: Yeah.
By turning on the core filters, what you're doing is
you're kind of letting us make the decision as we move the
software and advance it of what we think is safe for most
sites, and you'll take that.
If you want to have total control, and when you upgrade
you'll decide which filters you want to enable for the new
release, then you can put it in pass-through mode and then
add the filters that you want.
ILYA GRIGORIK: Right.
So that's a good point.
So I should probably, unless I have a specific reason to
avoid core filters, I could leave that on because maybe in
the subsequent release you guys have added another filter
or improved another filter such that now it's considered
safe, and that would just be automatically included during
an upgrade.
JOSHUA MARANTZ: Yeah.
I'll give you an example.
I believe in the current release, we have a filter
called Flatten CSS Imports.
One of the biggest anti-patterns for performance
in CSS files is to use @import.
But it's incredibly convenient to do it.
As a designer, that's what you want.
You want to be able to structure your code.
You want modular code.
So that's a good thing.
It's bad how it's delivered.
Mod_pagespeed with the Flatten CSS Imports filter will
flatten those all out so you get the best performance when
you deliver it, but you don't have to maintain that.
That was something that we built into the
product some time ago.
But we wanted to do a lot of testing on it to make sure it
was rock solid.
That's being promoted into the core filters in the next
release after 1.0.
And so if you have core filters then
you just get that.
ILYA GRIGORIK: Interesting. OK.
That's good to know.
So we also touched on some configuration.
But one of the really nice things about Apache is that
you can configure it in a million different ways.
So there's your Apache config, where you can specify your
virtual hosts.
So mod_pagespeed can be configured at a v-host level.
So an example, down here we're saying mod_pagespeed is on for
this example site, and pass-through is actually the
command that tells us, don't include the core filters.
I'll hand-tune the filters that I want.
So we're just enabling these, I guess, five filters for this
example site.
But I can also be much more granular.
I can use the .htaccess file.
So for example, I have my v-host.
I have my example file.
But if in my, I don't know, /assets directory I want to have a
different set of filters, I could actually drop in an
.htaccess file with another configuration.
JOSHUA MARANTZ: Right.
And there's yet another twist, which is you can use a
directory scope in the configuration file.
ILYA GRIGORIK: Right.
So I could literally have different filters running on
different subsections of my website.
JOSHUA MARANTZ: Exactly.
Actually, the implementation of just how the options get
configured is itself a pretty big topic within the
mod_pagespeed codebase, because you can configure by
request headers, by query parameters, by virtual host,
by directory scope, by .htaccess, and at the root.
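
One way to picture these layers is as a stack of option sets where a more specific scope overrides a broader one. The sketch below is a generic model, not mod_pagespeed's implementation; in particular the precedence order and the option names are assumptions chosen for the example.

```python
# Assumed order, broadest first. The real precedence rules may differ.
SCOPES = [
    "root",
    "virtual_host",
    "directory",
    "htaccess",
    "request_header",
    "query_param",
]


def effective_options(settings):
    """Merge per-scope option dicts; later (more specific) scopes win."""
    merged = {}
    for scope in SCOPES:
        merged.update(settings.get(scope, {}))
    return merged


settings = {
    "root": {"enabled": False},
    "virtual_host": {"enabled": True, "filters": "core"},
    "query_param": {"filters": "rewrite_css"},
}
print(effective_options(settings))
```
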
ILYA GRIGORIK: Yeah.
But I think it highlights the fact that our users
have asked for that.
So they are using all of these mechanisms to
customize their sites.
So we needed to have it.
And it allows you a lot of flexibility,
which is very nice.

And experiments.
So I think this is something that you
guys added just recently.
JOSHUA MARANTZ: That's right.
We've been traditionally using WebPagetest, which is an
amazing tool for doing detailed analysis.
That's how we produced the video and the waterfall
diagrams that we saw earlier.
But WebPagetest will allow you to run your tests from a set
of servers that are running in some corner of the world.
There's ones in Singapore, in Dublin, in
Virginia, and so on.
But what you really want to do at some point after you deploy
is see what experience your actual users are having.
And so what this does is it injects some performance
measurement, using Google Analytics, right into the web
pages and allows you to bucket users into experiment groups.
And you can say, for example, first of all you would
establish what Google Analytics ID you want to
report the data to.
And then you can say, well, I'm going to send a third of
my users into kind of a control bucket which doesn't
have any optimizations in it.
mod_pagespeed is running, but it isn't doing anything except
injecting the Analytics experiment.
The second one we can say, let's just have the image
compression and nothing else.
And the third one, let's have the default settings.
Or there's a whole set of options that you can do to
customize your experiments.
Then you can let this run for a day, a week.
Depending on the experiments you might leave a small
control group just to see how it's doing, and go back to log
into Google Analytics and see how users for each bucket are
faring in terms of the latency that they're seeing
on their web pages.
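
The three-way split can be sketched as below. The talk does not describe how mod_pagespeed assigns visitors to buckets, so hashing a visitor identifier is a stand-in technique, and the bucket names and identifiers are hypothetical.

```python
import hashlib

# One third of visitors each, mirroring the example in the discussion.
BUCKETS = ["control", "images_only", "defaults"]


def bucket_for(visitor_id):
    """Deterministically map a visitor identifier to one experiment bucket."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and spread it evenly over the buckets.
    return BUCKETS[int.from_bytes(digest[:8], "big") % len(BUCKETS)]


for visitor in ("visitor-1", "visitor-2", "visitor-3"):
    print(visitor, bucket_for(visitor))
```

The property that matters is that a visitor stays in the same bucket on every page view, so the latency reported for each bucket reflects one consistent configuration.
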
ILYA GRIGORIK: So this is really cool.
So what you've described there is the difference between
synthetic testing and real user measurement, which I
think is what you're referring to when you're saying Google
Analytics, right?
JOSHUA MARANTZ: Yes.
ILYA GRIGORIK: And we actually had an episode with Justin
Cutroni from Google Analytics where we talked about
navigation timing and why it's so important.
And the point that Justin always loves to make is that
it's great that the developers want to optimize the site.
They always want to optimize the site.
But how does it affect my bottom line?
Like the business metrics, the dollars as he put it.
So this will actually tell you.
So we have three buckets here.
And if I have in my Google Analytics some conversion
metrics-- that could be a purchase, that could be a
registration, even time on site or bounce rate--
now I could measure against that and say,
well, you know what?
Users that get a faster experience
are staying for longer.
Maybe they're converting for more.
And that makes for a very compelling case to the rest of
the team to say, this is why we should invest into more
performance optimization.
JOSHUA MARANTZ: Exactly.
ILYA GRIGORIK: Awesome.
I love the business use case.
It's not just like speed for speed's sake.
Although speed for speed's sake is also good, because it
makes the web faster.
So this example, this is actually a very common
question that we see, which is many people have already
applied some optimizations to their site.
So a good example of that is something like domain
sharding, where the problem is that modern browsers allow up
to six connections per host.
So if you're hosting a lot of images on your domain, you may
get blocked as you're trying to download a lot of images.
So the general best practice for that is to say, well, host
it on different subdomains.
And then that will allow the browser to open multiple
connections--
more than six, I should say.
But that creates a little bit of complexity for
mod_pagespeed.
This is where you need to kind of hand-tune your
configuration.
So can you explain what's happening here?
JOSHUA MARANTZ: Sure.
So the challenge is that you want to--
well, there's a couple challenges.
So if somebody has hand-sharded their domains or,
in many cases, just done a simple best practice of moving
their resources to cookie-less domains, which is all good,
the first thing that you have to do if you want
mod_pagespeed to be effective is you have to let us know
what those domains are, because mod_pagespeed doesn't
know what the domain mapping is.
So we have pagespeed.conf settings to tell us.
So the first thing you have to tell us is what are the
domains that are basically equivalent on your site.
And so if you have, like, static.example.com, your HTML
is coming on www.example.com, you have to authorize, at
least with ModPagespeedDomain, static.example.com.

And if you've done hand-sharding, you may have to
authorize more than one of those and tell us that they
are essentially equivalent by mapping them to kind of a
canonical name.
ILYA GRIGORIK: Right.
So if I'm running example.com, and I'm serving images from
example.com, then mod_pagespeed would say, yes,
I know that I'm hosting this.
Hence, I can optimize this asset.
JOSHUA MARANTZ: Exactly.
ILYA GRIGORIK: But if I'm hosting on a cdn.example.com,
that could be anywhere, or it could be a third-party asset.
So mod_pagespeed won't touch that by default.
JOSHUA MARANTZ: Right.
If for example you're serving an image on Flickr or
something, Flickr is not yet running mod_pagespeed.
And so if you just rewrite the URL the way we did with the
.pagespeed [INAUDIBLE]
and it's on Flickr, then it just won't work, because
Flickr won't be able to decode that name.
So we wouldn't necessarily authorize that.

But if you have images on your site that you want to put onto
a CDN that knows how to reach back to your origin, then you
can do a domain mapping to say, I want to take the images
that are on example.com and put them on cdn.example.com.
Now when mod_pagespeed rewrites that URL, when it
optimizes the image or the CSS file, et cetera, it will
rewrite the domain to go onto the CDN.
This is, I think, kind of a development feature which
allows you, for example, to develop locally
and turn that off.
But then when you're ready to actually push resources to the
CDN, you can turn that on.
This also allows you to apply sharding.
So by establishing the shards, if you want to, for example,
shard two ways, then you can use the command that we gave
here, ModPagespeedShardDomain example.com to
example1, and example2.
ILYA GRIGORIK: That's the bottom one here.
JOSHUA MARANTZ: And then mod_pagespeed will kind of
randomly disperse the resources to those two domains
so that you can have more parallel connections.
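
A minimal sketch of two-way sharding follows. Josh describes the dispersal as "kind of random"; this example uses a stable hash of the path instead, so that a given resource always lands on the same shard and stays cacheable under one URL. That choice, and the shard host names, are assumptions for illustration.

```python
import zlib
from urllib.parse import urlsplit, urlunsplit

# Hypothetical shard hosts standing in for the two shards on the slide.
SHARDS = ["static1.example.com", "static2.example.com"]


def shard_url(url, shards=SHARDS):
    """Rewrite the host of a resource URL to one of the shard domains."""
    parts = urlsplit(url)
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    index = zlib.crc32(parts.path.encode("utf-8")) % len(shards)
    return urlunsplit(parts._replace(netloc=shards[index]))


for path in ("/images/a.png", "/images/b.png", "/styles/site.css"):
    print(shard_url("http://example.com" + path))
```

Changing the number of shards is then a one-line change to the list, which is the kind of experiment that is painful when the domains are edited into the HTML by hand.
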
ILYA GRIGORIK: So this is definitely a more advanced use
case where that's going to reach deeper into
mod_pagespeed and also think about how does this work in
the context of me using a CDN.
But that in itself is actually an important point.
It is CDN-friendly.
So you can make it work with your CDN provider and help
your CDN serve optimized assets.
JOSHUA MARANTZ: Exactly.
And this is something that I think it's useful to
experiment with.
One of the things that you probably don't want to do is
try to hand-shard your resources in your HTML file,
because the best practice is to shard domains,
but exactly to what?
I've seen the right answer be four, the right answer be two,
the right answer can sometimes just be one.
And so all the effort you put into hacking your HTML to edit
the domains really is kind of counter to the notion that you
want to experiment with it.
And you can experiment very easily by just iterating over
your pagespeed.conf file and looking at WebPagetest.
ILYA GRIGORIK: Yes.
That certainly makes it a lot easier.
Yeah.
So we talked about the forward proxy.
But I recently came across a blog post, I think it was
Frank Denis that wrote this really awesome blog post that
kind of blew me away, because what he did was he used
mod_pagespeed as a forwarding proxy for his phone.
And the basic observation was that when you're on your
mobile device, you probably don't have a Wi-Fi connection
most of the time.
You're in 3G.
If you're lucky, you're in 4G, what have you.
And you're downloading these massive websites.
So instead of using mod_pagespeed to accelerate
your site, why not use mod_pagespeed to accelerate
the rest of the web as you fetch it?
So in this diagram here, I have my phone.
We're sending a request through this forward proxy,
which is running mod_pagespeed.
Mod_pagespeed requests the actual site that I requested.
I get this fat response back with all kinds of unoptimized
images, et cetera.
Mod_pagespeed crunches all of that and sends me the
optimized assets, which I thought was
really, really clever.
So he did this with his iPhone.
And he observed that for the sites that he tested it on, he
got much faster renders and much fewer bytes.
And in fact, he shared some examples.
And we'll take a look at those later.
But these are the actual filters that he used.
So he shared those.
And some examples that I wanted to highlight was first,
he enabled core filters.
So that's kind of by default.
But I think he just wanted to have it
in there to be explicit.
He said, I'm going to rewrite images, convert JPGs to WebP--
so he knows that he's accessing this on Chrome on
iOS when he's using that--
convert PNG to JPG.
And in fact, this is an important one that you
mentioned earlier, when I'm on a mobile device, I have a
small screen.
I probably don't want 100% fidelity of all the pixels.
I'm OK with the 75% compression ratio.
And that gives me a lot of savings,
byte savings for images.
So this is kind of an interesting example.
And he also did a couple of aggressive filters, which say
defer all iframes until after onload and other things, just
to accelerate his browsing.
JOSHUA MARANTZ: Yes.
And pointing out, in particular, defer JavaScript
has a huge impact on the speed of websites.
It's something that you want to look at the results
of when you do it.
It was aggressive to put it into a forward proxy, but he
was extremely happy having done it.
ILYA GRIGORIK: Right.
Yeah.
And these are some examples.
So of course, this is not representative of the entire
web, but he kind of highlighted a few.
So for example, this over-blog URL, it went from 400
kilobytes to 271, which is pretty significant.
Going from 39 seconds of onloads to 2 seconds is a big
improvement.
And not only that, but you can see that because he was
combining resources, it went from 34 to 21.
So the mobile browser had to make fewer requests.
All of those things are a win.
And he got a better mobile experience.
Now this next one just kind of completely blew me away,
because I didn't believe it.
But it serves as a good example.
Cooking With Frank.
So this is a blog, lots of pictures.
And guess what?
The unoptimized version is 3.15 megs.
With compression, it comes out to be 10 times
smaller, 340 kilobytes.
So when I'm on my mobile data plan, I probably want the 340
kilobyte version.
It'll load much faster.
Instead of making 85 requests, it made 28 requests.
So this is a dramatic difference.
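
The figures quoted for the two sites can be turned into percentages with a little arithmetic. The numbers are the ones read out above; treating 3.15 megs as 3,150 kilobytes is an assumption about the units.

```python
def percent_saved(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100 * (before - after) / before


# First example site: page weight in kilobytes, then request count.
print(f"bytes saved:    {percent_saved(400, 271):.0f}%")
print(f"requests saved: {percent_saved(34, 21):.0f}%")

# Cooking With Frank: 3.15 MB taken as 3150 KB (assumed decimal units).
print(f"bytes saved:    {percent_saved(3150, 340):.0f}%")
print(f"requests saved: {percent_saved(85, 28):.0f}%")
print(f"size ratio:     {3150 / 340:.1f}x")
```
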
JOSHUA MARANTZ: Yeah.
We're still learning exactly what works really well on what
kind of mobile device and what kind of connection.
But it seems likely that having a lot less requests
will benefit mobile even more than it will benefit desktop.
ILYA GRIGORIK: Yeah.
So this, in general, seems like a very interesting area
to explore for mod_pagespeed, like I want this on my phone.

So just a quick recap.
We've covered a lot of stuff here.
So we talked about the upcoming 1.0 release.
It's an open-source Apache module.
It works with 2.2 and 2.4.
Kind of the pitch, if you will, is just-in-time
performance optimization for your website.
And it's already very widely deployed across the web.
So we feel it's 1.0 ready.
It's 1.0 ready by Google standards, which
is perpetual beta.
So that says a lot.
But one question I do have for you is, what's after 1.0?
Are we done?
JOSHUA MARANTZ: I feel like we're at the
beginning of this process.
We've definitely discovered that there is some meat to
chew on here.
There's a lot more that we can do.

SPDY is an obvious topic.
The rules change when you're working with SPDY.
Combining becomes less important, because you can
multiplex multiple resources over the same connection.
Inlining becomes less important.
ILYA GRIGORIK: Same reason, right?
JOSHUA MARANTZ: In the release that is coming after the 1.0
release, we'll start seeing some of the deeper SPDY
integration.
So Google also has a module called mod_spdy, which we work
pretty well with.
And look for more in that space.

I would say the big wins that we have right now, images;
extending cache lifetime, which is something that really
benefits repeat viewers to things like news sites;
deferring JavaScript.
There's kind of other big areas where we're more aware
of the networking characteristics of the page
and we're optimizing.

I feel like we're relatively early in our understanding.
We've found a lot of good things to do.
But when we find good things to do, it usually uncovers 10
more that we don't have time to do yet.
ILYA GRIGORIK: Yeah.
So I think that's very representative of the web
performance community in general.
I think we're still finding a lot of interesting edge cases.
And the browsers are only getting smarter.
We're only getting more and more assets on the web.
So in fact, we know that the web pages are growing, both in
size and number of requests.
So it sounds like there's a lot of work to do.
JOSHUA MARANTZ: There's an astounding
amount of work to do.
But I think that we've come to a point now where we have a
stake in the ground where we have demonstrable benefit.
We have adoption.
And we'd like to grow it.
And we're ready to take off from here.
ILYA GRIGORIK: So I'm glad that you guys are doing it,
because that makes my life a little bit easier.
I can install this and inherit all of the work that you've
put into this.
So I think for the last slide here, we've
covered some of these.
But I want to highlight these, because I get these questions
quite frequently on Stack Overflow, through email, and
through other means.
So I kind of bucketed them.
We already talked about mod_deflate, mod_expires.
So those work together with mod_pagespeed.
JOSHUA MARANTZ: That's right.
In fact, mod_pagespeed turns mod_deflate on.
And it's kind of dependent on mod_expires, because we have
to know how often to pull the origin resource.
And you definitely want to put an expires header.
You want to use that.
Tell us how often to check back to see if your resource
has updated.
Actually, I just want to point out one other thing.
Mod_pagespeed can also look directly at the file system,
in which case it can just stat the file to see if it's
changed, which is a little bit more efficient if your files
are right there on the same server, as opposed to being
generated by PHP or pulled from somewhere else.
ILYA GRIGORIK: Actually, that's a good point.
That's another config flag that you can find in our
documentation.
JOSHUA MARANTZ: That's correct.
So that's ModPagespeedLoadFromFile.
And I think that if the files are there on your disk, just
get mod_pagespeed to look at them directly.
But if they're not and we have to do a fetch to get them,
then you definitely want to use mod_expires to tell us how
often to do that fetch.
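
The "just stat the file" check can be sketched like this. It illustrates the general technique rather than mod_pagespeed's code; the function names and the choice of modification time plus size as the signature are assumptions.

```python
import os


def file_signature(path):
    """Cheap fingerprint of a file taken from its metadata, without reading it."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def has_changed(path, last_seen):
    """True if the file's metadata differs from the signature recorded earlier."""
    return file_signature(path) != last_seen
```

A server would record `file_signature(path)` when it optimizes a resource and call `has_changed` on later requests. That costs a single system call, compared with an HTTP fetch when the resource has to be requested from the origin.
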
ILYA GRIGORIK: Right.
OK.
For the CDN, I think we've covered a little bit.
JOSHUA MARANTZ: Yeah.
We covered it.
CDNs are driven by the cacheability of resources, and we
make things cacheable for a year.
ILYA GRIGORIK: Which also, I think, answers the next
question which is, if you're using a CDN-- or maybe if
you're not using a CDN, rather, but you are using
another cache in front, maybe a Squid, a Varnish, what have
you, maybe Nginx, those should still work, right?
JOSHUA MARANTZ: Exactly.
ILYA GRIGORIK: They'd just be more efficient.
JOSHUA MARANTZ: Exactly.
They'll just have to pull the origin less often.
ILYA GRIGORIK: Yeah.
OK.
Perfect.
So we talked about or we mentioned the
mod_pagespeed cache.
So mod_pagespeed has its own cache.
We talked about the upcoming memcached support.
But as a developer, do I need to worry about that?
So if I have my assets--
and who manages that?
If I update my asset, do I need to worry about flushing
the cache, et cetera?
JOSHUA MARANTZ: So mod_pagespeed comes
pre-configured to use the file system as a cache.
And that works reasonably well.
As you scale up websites, you have to think a little bit.

We set the default cache, I think, at 100 megabytes.
Is that enough for your assets?
Or do you want to make that grow?
ILYA GRIGORIK: So it's something you can probably
tweak in the configuration.
JOSHUA MARANTZ: That's another configuration parameter.
How often we go and garbage collect that
cache is another question.
So when you change your assets, you don't have to
manually purge the cache.
Mod_pagespeed will just do it automatically.
ILYA GRIGORIK: And that was actually that file name kind
of scheme that we looked at earlier, right?

JOSHUA MARANTZ: Yeah.
Well, the files on the cache have recognizable names.
But they're not exactly that scheme.

But the hierarchy of your URL space for your assets is
reflected in the cache.
So you can kind of poke around the cache and see
what we have in there.
And you can just delete it.
They're just files.
ILYA GRIGORIK: But it sounds like generally speaking, I
shouldn't be touching them.
JOSHUA MARANTZ: But you don't really need to touch it.
You can just configure how big you want it to be and how
often you think we should go and purge it.
ILYA GRIGORIK: Perfect.
JOSHUA MARANTZ: And upcoming, you'll be able to say, well,
instead of storing the files on the disk, I want to store
them in memcached.
And here are the host and port numbers
of my memcached instances.
And then you can share that cache among multiple servers
so that you can scale up your website a little bit better.
ILYA GRIGORIK: Yeah.
That's very cool.
So we actually talked about affecting or not affecting the
page load time when the cache is empty.
So that was that if we don't have the image resource
optimized, we will just serve the original image.
But on the next hit, you will get the optimized resource.
So as you said, the last thing that mod_pagespeed wants to do
is to make your site slower.
That would be the anti-pattern.
So that should never happen.
But I'm guessing all of this work does
consume some resources.
So what should we expect?
If I install this on my server is there kind
of an average number?
Does it really vary based on the site, because it seems
like it would, right?
JOSHUA MARANTZ: Sure.
A very image-rich site that installs mod_pagespeed for the
first time will go through a period where we'll use
resources on the server to optimize the images.
There will be a bounded amount of resources.
This is actually another config parameter that you can
set, because we don't know exactly how many CPUs you have
or anything.
But by default, we will do, I believe, eight concurrent
image optimizations maximum per physical machine.
ILYA GRIGORIK: Right.
So it's like background workers
optimizing these images.
JOSHUA MARANTZ: And that's across all
of the Apache processes.

And so it doesn't just fan out arbitrarily until it kills
your machine.
ILYA GRIGORIK: Right.
That would be an anti-pattern.
JOSHUA MARANTZ: Yes.
That would be another anti-pattern for serving your
resources efficiently.
But what will happen is if you have a page full of images,
and the first time somebody goes to them, we'll start
spinning up the optimization of those, once those are in
cache, that'll settle back down.
So there will typically be a few minutes--
it would vary on the site--
of where all these images get optimized, put into the cache,
and then you're good to go.
If the cache is too small, then it might be ongoing.
ILYA GRIGORIK: So for the most part, if your website doesn't
change dramatically every couple of minutes, chances are
your visitors will be just hitting the cache.
And you would only see this extra work being done when you
have new assets or, for whatever reason, that asset
got evicted from the cache.
JOSHUA MARANTZ: Right.
ILYA GRIGORIK: And that's where you may want to go back
and configure or check, is your cache being used up?
Maybe you should increase the size or
something to that extent.
JOSHUA MARANTZ: This is probably also a good time to
point out that mod_pagespeed offers some visibility into
what it's doing, because it has a statistics page.
So on the local server, you can go to
/mod_pagespeed_statistics, which by default is accessible
only from localhost.
But you can configure that too.
And then you'll see how many image rewrites are going on.
You'll see a variety of statistics, which kind of give
you a way to put your finger on the pulse of mod_pagespeed.
ILYA GRIGORIK: So I'm guessing if I'm using some monitoring
system, I could probably get the variables out of there,
shove it into Ganglia or some other system, and track all
that performance there as well.
JOSHUA MARANTZ: It's very scrape-able.
And in fact, I think very soon after mod_pagespeed was
released people started to say, well, I've hooked this up
to this visualization system, and here's what it's doing.
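
Scraping such a page into a monitoring system could look roughly like this. The sample text and its "name: value" layout are assumptions about what the statistics output looks like, and the counter names are made up for the example.

```python
def parse_statistics(text):
    """Turn lines of the form 'name: value' into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip headings, blank lines, and anything that is not a plain integer.
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


# Hypothetical sample; real counter names will differ.
sample = """cache_hits: 1520
cache_misses: 48
image_rewrites: 37
"""
print(parse_statistics(sample))
```

The resulting dictionary can then be forwarded to whatever graphing or alerting system is already in place.
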
ILYA GRIGORIK: That's the first thing that I would look
for as well.
That makes perfect sense.
So shifting gears a little bit, we didn't specifically
talk about mobile, with the exception, I guess, of the
forward proxy.
But is there anything in particular that we need to be
aware of for mobile and mod_pagespeed?

JOSHUA MARANTZ: Mod_pagespeed, this is actually one of the
areas where I think we can do a lot more in the future.
But we're already providing a substantial benefit, making
things smaller and less requests.
It's all good.
ILYA GRIGORIK: It's images.
We saw that, right?
JOSHUA MARANTZ: It's all good.
What Frank Denis did was he cranked the quality
level down to 75.
Typically, we would recommend if you want to do this for
desktop, we would say 85 is a very safe number.
But for mobile, you might want to crank it down further.

I can't think of anything that mod_pagespeed does that would
be undesirable for mobile.
I think it's all good.
ILYA GRIGORIK: Smaller resources, fewer requests, all
of those things are prime candidates for improving
mobile performance.
JOSHUA MARANTZ: The only question, is
there more we can do?
And the answer is absolutely.
Stay tuned.
ILYA GRIGORIK: Right.
I think that's a good note to kind of end this on.
I'll just mention that we do have a lot of online resources
about mod_pagespeed, if we didn't answer your question.
So good places to start are modpagespeed.com.
I think there's actually links to the Google Group, the issue
list, and demos there.
So that's a great place to kind of kick off your
exploration.
We do have a Google Group where you can ask questions.
And of course, you can also just reach out to myself or
Josh, and we will be happy to answer any questions.
So thank you, guys.
JOSHUA MARANTZ: Thanks.