Google I/O 2012 - For Butter or Worse: Smoothing Out Performance in Android UIs


Uploaded by GoogleDevelopers on 29.06.2012

Transcript:
>>> Ladies and gentlemen, would you please welcome Chet Haase and Romain Guy.
[ Applause ] >>Chet Haase: Thanks. Welcome to "For Butter
or Worse," which I kind of thought we would actually have to explain what we meant by
"butter," but since they actually called it "Project Butter" in the keynote yesterday,
maybe you're absolutely on the same slide with us now.
This is Romain Guy. >>Romain Guy: So my name is Romain Guy. I'm
the tech lead of the Android UI toolkit. >>Chet Haase: And I'm Chet Haase, on the same
team. And we work on graphics and UI and animation and all that stuff. Moving stuff around which
is kind of what this stuff is about. It's a bit self-indulgent because we're going to
tell you a lot of stuff that we did internally to make the platform better but it is our
subjective and totally selfish belief that understanding more about the architecture
and what's actually going on in the platform will help you write better applications.
>>Romain Guy: There are three parts to this talk. The first part is what Chet just mentioned,
we're going to explain how things work under the hood, then we're going to show you a few
tools that you can use to identify problems in your application, and based on the result
of these tools we'll show you a few tips and tricks you can use in your application to
fix those issues. >>Chet Haase: But before we get there, we
don't -- >>Romain Guy: I must say he took that picture.
[ Laughter ] >>Chet Haase: He focuses on landscape and
I focus on fattening food products. We don't like to give a talk here at Google and Google
I/O without giving something away, so if you look under your seat, you will find a stick
of butter. It's not there because that would be ridiculous.
[ Laughter ] >>Romain Guy: And very gross. So let's define
a couple of terms for you. There's jank and there's butter. So jank is about two things.
It's basically discontinuous experiences in the UI. So one is choppy performance: hiccups
in animations as they're happening; rendering is not quite happening fast enough, so the
user sees stutters. The other meaning that we assigned to jank is discontinuous experiences
like layout -- sort of relayouting and resizing itself on the screen, in front of the user,
very discontinuous, very disconcerting. >>Romain Guy: And this is a word we use a
lot, so someone yesterday asked the question in the fireside chat, he wanted to know what
skills do we need to work on the Android team. Someone replied you need to know how to skydive.
Well, you also need to use this word in every sentence. If you come around the office, it's
pretty annoying, you hear jank, jank, jank, jank, jank, jank. I hate that word.
>>Chet Haase: Swiping the home screen feels very jank. Yeah, we actually got really tired
of this word; on the other hand, it was a nice word to describe exactly what we meant,
which was the experience is not good enough, what can we do to make it more buttery. So
butter means two things: Smooth performance, which is basically the opposite of that meaning
of jank, let's smooth things out and remove that choppiness in animations and rendering.
For instance, home screen swiping is very buttery in Android 4.1, Jelly Bean. It also means a
fattening spread. We're not actually talking about that meaning today, but there are some
rice crispy treats right out there and you can go get your dose.
>>Romain Guy: And I have tip for you. If you take a stick of butter and you smudge it across
your screen you will get free antialiasing on all your applications.
[ Laughing ] >>Romain Guy: I'm joking, but remember "The
Empire Strikes Back," remember the big robots at the beginning, that was stop-motion animation,
that's what they did to do the motion blur, because you can see -- usually when they use
that technique you can see it's puppets. Well, they were just smudging some sort of jelly
in front of the camera to make it look blurry and unappetizing.
>>Chet Haase: That would have been a lot less effort. We could have spent the last several
months doing something else besides actually working on performance if we just had enough
butter. So, as I said, we're addressing the first concern of jank right now, choppy performance,
the other is those sort of discontinuous UI experiences. Separate problem. This one is
all about graphic performance, animation performance, speeding up the platform as well as tips that
you can do to speed up your applications in particular.
So for any food product, there's always a recipe. If you want to make butter you're
going to need cream. If you want to make butter in Android there are essentially two elements
that we're talking about here. One is low latency, by which we mean the time elapsed between
when something happens, like the user touching the screen, and the effects of that action appearing
on the screen so the user actually sees the consequence of that action. So what we want
to do is minimize latency. At the same time we also want to speed up frame rate. We want
to make these things happen as fast and as continuously as possible so that the user
has a continuous experience through user events as they're dragging things along or animations
that are happening on the screen during transitions or whatever.
So, first of all, let's talk about the latency issue. Specifically we'll talk about what
we did in the input area. So the overall way that input events appear on the screen, a
consequence of that action is there's a bunch of events. You can picture the finger hitting
the screen, there's a down action, moves around a bunch, there's a bunch of move events, eventually
the finger comes up. So there's a pile of events that come in and asynchronously put
into this event queue and they're batched up and they're sent over across some wire
and eventually we process those events and then we draw. And the effects of those drawing
operations are eventually seen on the screen by the user. So we flip
the buffer, they see the display. So that's the whole process, old and new. This is how
things happen. Events come in, we react to it, we draw, user sees the results.
>>Romain Guy: So here's a way to visualize the issue. So here we have a screen shot of
the Nexus 7, so the device you all got yesterday. And you can see we have an icon in the top
left corner of the launcher, we're just going to simulate a finger, long pressing on the
icon and moving it around. So we pick up the icon and then when we move it, that's latency.
You can see that the finger is moving faster than the icon itself. The maximum difference
between the position of the icon and the position of the finger, that's latency. And latency
is due to many things. Some of it is introduced by the software we write, some of it is because
of the hardware, it can be the touch screen, it can be the memory, it could be many, many
things. So we're going to talk about what we did in our software to reduce the latency
that we introduced in the system. >>Chet Haase: Specifically, let's talk about
what happened in the input system. So we drew a really complicated diagram. Get used to
complicated diagrams, because we're basically talking about complicated things, and complicated
things require complicated diagrams with primary colors in them.
>>Romain Guy: And if you can't, you can do what we do and just pretend you understand.
>>Chet Haase: So this is not a real situation but this is essentially what's going on. You
can sort of picture the green string along the top as a series of input events that are
continuously coming into the system. And then occasionally -- this is in the pre-Jelly Bean
era -- we would batch up the events that were there already and we'd put them into a queue
to await later processing, eventually the thread kicks in at the activity level that
processes those events that were batched earlier, and then we draw the results and then we see
the results on the screen. And what you're really looking for is when do the things appear
to the user which is basically at the end of that drawing cycle. And you can think about
the latency involved in different parts of this system.
So, for instance, event A, which is the first one up there, didn't appear -- the consequences
of that didn't appear to the user until the end of that first draw action, so that's a fairly long
latency period. Event C, which happened just slightly after A, also didn't appear till
then, right? They were batched up together and all of them appeared, the results of those
actions appeared at the end of the first drawing cycle.
And then we go in and we batched up further events in the meantime asynchronously and
then they come in and we process those, events D-J and they also have these fairly extended
latency periods between when the user did something and when the results of that action
actually appeared on the screen. So one of the big things that we did to smooth
out performance in Jelly Bean was to address the system of too many layers of interaction
in queues and dispatching operations happening in input. What we really want to do is get
the input to the process as soon as possible so that when the user is ready to actually
process input events, it gets the latest thing that happens. So we can take a look at the
diagram as it actually happens in Jelly Bean. Again, we had this string of events going
on. We have vsync, we'll talk about vsync a little bit later, but basically this is
the time at which we can actually sort of grab things and process it which is a little
out of sync with the way the diagram happens to be drawn. We're going to come in -- you'll
notice the blue bars went away. So we no longer have a dispatching operation that bundled
these things up and then sends them off into the ether somewhere. Instead, at the time
when we process, which is the proc A-F bar here, we grab all the events at that time,
so we're not just stuck with the A-C events that happened further away in time; instead,
we can grab every event that happened up to that point. And the net effect is that we
have much smaller latencies. The older event may still take a while to actually reach the
screen, but you can see the newer event that happened right before the process operation
appears in a much smaller time period at the end of the draw operation. So what we've got
is much better latency overall. There's a couple nuances to this. In the old system
it was also possible to really bog down the system. If you were sending in a lot of input
events, you could actually bog down the dispatcher and we would be way, way behind and getting
further behind all the time. The other nuance which they mentioned in the
keynote yesterday was not only are we streaming the events to get them there faster, but we're
also able to anticipate in some cases where the input events should be at exactly the
time that we need it. So we go back and we get the most recent event and if there is
no input event at that specific time, we could plot a little bit of a trajectory and say,
well, where would an input event be given the information we have about the preceding
events, such as the finger moving on the screen. And this gives us many more input events, and
much more recent input events, that we can track things with.
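That extrapolation idea can be sketched in a few lines of plain Java. This is only an illustration of the concept (the real resampling code lives deep in the platform's input pipeline and is more careful), and the class and method names here are invented:

```java
// Illustration only: predict a touch position at render time by extending the
// velocity of the two most recent samples. Names are invented; the real
// Jelly Bean resampling lives inside the platform's input pipeline.
public class TouchExtrapolator {

    // Linear prediction: continue the velocity of the last two samples.
    // Times are in milliseconds, positions in pixels.
    public static float predictX(float x0, long t0, float x1, long t1, long tRender) {
        if (t1 == t0) {
            return x1; // no velocity information; reuse the latest sample
        }
        float velocity = (x1 - x0) / (float) (t1 - t0); // pixels per millisecond
        return x1 + velocity * (tRender - t1);
    }

    public static void main(String[] args) {
        // Samples at t=0 ms (x=100) and t=8 ms (x=116): the finger moves 2 px/ms,
        // so the predicted position at the 16 ms vsync is 132.
        System.out.println(predictX(100f, 0, 116f, 8, 16)); // 132.0
    }
}
```

Given samples at 0 ms and 8 ms that are 16 pixels apart, the finger is moving at 2 pixels per millisecond, so the predicted position at the 16 ms vsync is 132.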
And the best thing to do, we're not showing demos today because it's really hard to show
this stuff on the screen especially with this display technology. Set an Ice Cream Sandwich
device next to a Jelly Bean device and just do the same swiping and launching operations
on them and you'll see what we mean, especially with --
>>Romain Guy: And just one thing. On this diagram, the latency, for instance, of inputs
D and G looks a lot worse than it actually is. When we measure latency, we measure the
number of frames of latency, so here D and G have only one frame of latency, and that's
really, really, really good. To give you an idea, like, on the best devices
that you can get today, or the kind of performance we get with Jelly Bean on the Nexus, you can
expect around five frames of latency between the time, you know, your finger causes a reaction
on the device and the time you see the frame that corresponds to that action.
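Converting a latency measured in milliseconds into frames of latency is simple arithmetic; this little helper (hypothetical names, fixed refresh rate assumed) shows the conversion:

```java
// Illustration only: express a measured latency as a number of display frames.
// Hypothetical helper; assumes a fixed refresh rate.
public class FrameLatency {

    // One vsync interval is 1000 / refreshHz milliseconds; any fraction of an
    // interval still costs a whole frame, hence the ceiling.
    public static long framesOfLatency(double latencyMs, double refreshHz) {
        double frameMs = 1000.0 / refreshHz;
        return (long) Math.ceil(latencyMs / frameMs);
    }

    public static void main(String[] args) {
        // Around five frames of touch-to-display latency at 60 Hz is roughly 83 ms.
        System.out.println(framesOfLatency(83.0, 60.0)); // 5
        System.out.println(framesOfLatency(17.0, 60.0)); // 2
    }
}
```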
>>Chet Haase: I've been talking for a long time, I don't know --
>>Romain Guy: I like to listen to you. >>Chet Haase: Then I'll talk some more. So
frame rate is the other one. So reducing latency is good. Also making the frame rate faster
and more consistent is a good thing and frame rate is all about making the drawing time
smaller. So how can you actually draw your stuff faster. That's kind of what it's all
about, but that's not really all there is to it. For one thing, there's a lot of things
that go into what we mean by drawing and drawing faster. So we thought it might help to go
into an architecture diagram. >>Romain Guy: Yeah, so here's the explanation.
>>Chet Haase: Give you a second, everybody done? All right, good. So moving on, that's
how the rendering system works. Next topic -- maybe it would help to actually
walk through this a little bit 'cause we're serious, that is the diagram and you're going
to understand it in five minutes or you have to leave.
So there's essentially three parts to the process: Something happens, some event occurs
or an animation event, finger-dragging, whatever it is, something happens that we need to react
to, then we draw; big, nebulous box that has a lot of subcomponents to it, and then we
hand over control to the surface flinger object to actually get the results of that onto the
screen, that's the buffer flipping, it's posting the pixels actually onto the screen, the buffer
where the pixels are. So let's walk through the different parts of these three processes.
>>Romain Guy: And so surface flinger, for those of you who don't know what it is, it's
our window compositor, so it takes all the windows that are visible on the screen at
the same time and makes one single image and then pushes it to the display. So it's not
something you interact with directly, but the window manager and the
UI toolkit code we write interact with surface flinger a lot.
>>Chet Haase: So the first part of the process, something happens. So let's say there's an
event, the user touches a screen. That will set a property value or propagate the event
somehow, let's say, you know, setting a translation property on some view, whatever. That causes
an invalidate, that's what we mean by something happens basically. There's an event, we react
to it, we set a value, we invalidate which tells the system next time we come around
to do something we need to redraw the affected view.
>>Romain Guy: And when we say "invalidate," we really mean View.invalidate(). It's
probably a method that you have used yourself; this is what is happening here.
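Here is a toy model of that invalidate contract in plain Java. It is nothing like the real android.view.View internals (which track dirty regions and propagate invalidation up the hierarchy), but it shows the idea: setting a property only marks the view dirty, and the next traversal redraws just what was invalidated:

```java
// Toy model of the invalidate/draw contract. Real android.view.View does far
// more (dirty rects, propagation to the parent); this only shows the idea that
// invalidate() schedules a redraw rather than drawing immediately.
import java.util.ArrayList;
import java.util.List;

public class InvalidateModel {

    public static class View {
        public final String name;
        public float translationX;
        public boolean dirty;

        public View(String name) { this.name = name; }

        // Setting a property marks the view dirty for the next traversal.
        public void setTranslationX(float tx) {
            if (tx != translationX) {
                translationX = tx;
                invalidate();
            }
        }

        public void invalidate() { dirty = true; }
    }

    // One "frame": redraw only the dirty views, then clear their flags.
    public static List<String> drawPass(List<View> views) {
        List<String> drawn = new ArrayList<>();
        for (View v : views) {
            if (v.dirty) {
                drawn.add(v.name);
                v.dirty = false;
            }
        }
        return drawn;
    }

    public static void main(String[] args) {
        View a = new View("a");
        View b = new View("b");
        a.setTranslationX(10f); // only 'a' becomes dirty
        System.out.println(drawPass(List.of(a, b))); // [a]
        System.out.println(drawPass(List.of(a, b))); // []
    }
}
```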
>>Chet Haase: And it's what happens internally as well, when you change something on one
of the standard views, we invalidate internally. Very core method. And there's right and wrong
ways to use it and we'll get into that later. Okay, now we're into the big part of the process.
The bit in the middle that says draw. A lot of different pieces to this. Measure and layout
is certainly part of draw. In an optimal situation, if you're in the middle of the animation,
hopefully you're not measuring and laying out, lay outing? Doing stuff with layout.
>>Romain Guy: That's what you shouldn't be doing.
>>Chet Haase: Yeah. But it is a standard part where we come in to do rendering, we say do
we need to do measure and layout and then we go on to the next step which is preparing
the draw. >>Romain Guy: And we have this really particular
name where we dequeue a buffer. We talk to surface flinger so we have this window compositor
and we ask surface flinger to give us a buffer in which we can draw so this is dequeuing
the buffer. >>Chet Haase: The next step is to update the
display list. So as of Android 3.0, in Honeycomb, we started GPU accelerating the Canvas API
and a lot of the applications and framework that we run with internally and the way that
we did this was by creating these structures called display lists which is an intermediate
representation of all the drawing commands that a view does. So we create this display
list that the entire hierarchy basically says I need to move here, draw this, set a cliprect,
all this stuff, and then a display list later is issued into --
>>Romain Guy: If you want to know more about those two little blue boxes, update display
list and display list, you can go to YouTube, and we gave a talk last year at Google I/O,
a one-hour talk where we talk in detail about display lists and how they work and how you should
use them. >>Chet Haase: But not now because we don't
have an hour for you to watch that, you can do that later.
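For a rough intuition of what a display list is, here is a toy sketch in plain Java (real display lists record Canvas calls in a GPU-friendly native format; here a "command" is just a string): the commands are recorded once and replayed on later frames without re-running the drawing code:

```java
// Toy sketch of a display list: record the drawing commands once, replay them
// cheaply on later frames. Real display lists record Canvas calls in a
// GPU-friendly format; here a "command" is just a string.
import java.util.ArrayList;
import java.util.List;

public class DisplayListSketch {

    public final List<String> commands = new ArrayList<>();
    public boolean valid = false;

    // Re-recording happens only after an invalidate, i.e. when content changed.
    public void record() {
        commands.clear();
        commands.add("translate 10 10");
        commands.add("drawRect 0 0 100 100");
        commands.add("drawText hello");
        valid = true;
    }

    // Issue the cached commands; much cheaper than re-running the drawing code.
    public List<String> replay() {
        if (!valid) {
            record();
        }
        return commands;
    }

    public static void main(String[] args) {
        DisplayListSketch list = new DisplayListSketch();
        System.out.println(list.replay().size()); // 3, recorded on the first draw
        System.out.println(list.replay().size()); // 3, replayed from the cache
    }
}
```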
So we update the display list and then we draw the display list. So now that we have
all the information about all the drawing commands that we need to issue, now we go
ahead and issue the commands, which is basically a process of telling OpenGL, draw this line,
draw this bitmap, do these operations. And these are both working from this common display
list data structure. And then we get to the point where, okay, we've done all our drawings,
we're ready to tell surface flinger, hey, swap the buffers, I'm done, flip it, get it
onto the screen. So then we do the opposite of dequeue, we do an enqueue; we say, here's our
buffer, we're done with it. >>Romain Guy: So once surface flinger gets
the buffer back from our application, it has to composite all the windows that you see
on the screen. It does that into a single buffer, we're simplifying a little bit, so
we compose everything into a single buffer and that's what we display. So compositing
windows, we'll talk a bit more about that a little later.
>>Chet Haase: So now everybody understands the architecture, right?
>>Romain Guy: Yes. So it's not that difficult. But it's important to understand the different
stages so -- to know that we invalidate and update the display lists, then we draw the
display lists and we send our buffer to surface flinger, because we'll see at the end of the talk what you
can do to make every one of those steps a little faster.
So we mentioned during the keynote that we introduced vsync on Android. It's not quite
true. So for those of you who don't know what vsync is, it's basically the ability to synchronize
yourself, your code, with the refresh rate of the displays. So typically we talk about
60 Hertz, we think of displays as running at 60 FPS, so every 16 milliseconds you get
a pulse from the display -- that's the vsync -- that says I'm about to render, to put a new
buffer on the screen. And if you manage to synchronize everything with the display, you
can get much smoother performance and we'll explain how.
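The per-frame budget implied by a refresh rate is just the reciprocal; a tiny helper (hypothetical, not a platform API) makes the numbers concrete:

```java
// Illustration only: the per-frame drawing budget implied by a refresh rate.
public class FrameBudget {

    public static double frameBudgetMs(double refreshHz) {
        return 1000.0 / refreshHz; // one vsync interval, in milliseconds
    }

    public static void main(String[] args) {
        System.out.printf("60 Hz: %.2f ms per frame%n", frameBudgetMs(60)); // 16.67
        System.out.printf("55 Hz: %.2f ms per frame%n", frameBudgetMs(55)); // 18.18
    }
}
```

A slower panel actually gives you a slightly bigger budget: at 55 Hz you have about a millisecond and a half more per frame than at 60 Hz.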
On Android, it's not quite true that the display is always at 60 Hertz, it will depend on the
device. So some devices will run at 55 Hertz, some devices will run at 57 Hertz. The Nexus
7 in particular runs at 60 Hertz, so this is a good device to test on. This can be a
good thing: when the display is at 55 Hertz, you have a few more milliseconds per frame to
draw and still look really smooth. We talked about introducing vsync, but we've
always had vsync on Android. So surface flinger itself was synchronized with the display,
so that we never suffer from tearing. So tearing is an effect you see in games sometimes, especially
on consoles, when you turn around really quickly, you get this effect where you're seeing at
the same time on the screen the old buffer, so the last position where you were in the
game and then you see the new buffer coming in and they're both on the screen at the same
time because the display is busy refreshing line by line.
So to avoid that issue we had vsync. But now what we do is also use the vsync pulse to
synchronize the logic inside the applications. So we synchronize all our animations, the
scrolling, the flings, dispatching the touch events with the vsync to get smoother results.
>>Chet Haase: The process of getting pixels to the screen. This is more education for
you, how do things work, so that we can see actually how to optimize it, is basically
three steps you can think of this as. First, we're updating the display list, which we
saw before. This is generally CPU operations, so this is calling your onDraw method or
our onDraw method internally for the standard components. And then we're drawing the display list;
this is done on the GPU, we actually call GL draw and then it queues the things up in
the GL and does its operations on the GPU and then actually displaying the pixels, flipping
the buffer, this is, again, GPU operations, to take that buffer and post it to the screen.
And then visible is just more of a point in time where the user can actually see it which
comes in handy when we're going to talk about how we sped things up to make things visible
as consistently as possible. So in the previous platform, in Ice Cream
Sandwich, this is kind of how things looked. Yes, we used vsync, but did it at a very low
level, which was when we get around to posting the buffer, we post it during the refresh
interval, which means there's no tearing when showing the buffer. However, nothing else in
the system was actually working off of vsync. We were just working on event systems that
had similar timing behavior but had nothing really to do with vsync itself.
So we would come in at any random time during the frame to actually do our drawing operation.
So you can see here the display at the top, this is the buffer that it's currently showing.
So it may be showing buffer 0, which means that we can work on operations for buffer
1. So we do our update display list calls on the CPU when we're done with that, and
we pass control over to the GPU to do the actual issuing to do OpenGL. And they're both
working on buffer 1. And then when that's done, then that can be displayed on the next
vsync interval. So we go into the first frame, display showing buffer 1. Life is good. Now,
say for some reason there's been a delay before we actually get around to rendering on the
CPU and updating the list again, so we go in and can't draw into buffer 2 until very
near the end of that vsync frame, which means that by the time we're done with that and
then the GPU operations that follow after it, it's too late. We can't vsync. So what
we end up with is jank; right? We basically displayed the buffer 1 twice in a row, which
means to the user they have seen a hiccup in an animation or a rendering because they
saw the same information on the screen for two frames in a row.
And 16 milliseconds may not seem like a lot of time and 60 frames per second may seem
really fast, but you see two of these, you may not know exactly what you saw, but you
will notice it. >>Romain Guy: What's difficult about it is,
if you see that kind of issue in your application and you use Traceview or one of our (indiscernible)
tools, you'll see that your code may be running really fast. Even if you take a fraction of
a millisecond to execute your draw methods, if that happens across the vsync boundary,
you will get the jank. There's nothing you can do about it. So we fixed it for you.
>>Chet Haase: Like this. So now we pulse everything in the system related
to rendering -- animations, graphics, input events -- and then rendering results from those actions.
And we key it all off of vsync. So, basically, we give ourselves as much time as possible
within a frame to finish all those operations, get all the information to the buffer so that
by the time we get around to posting the buffer on the next vsync event, we're ready to go.
In this case, there's no jank because in all of these frames we had the information soon
enough that we could actually post the buffer and display the next frame, and there was not
a hiccup for the user. >>Romain Guy: Now if you're missing frames,
it's your fault because you're taking more than 16 milliseconds.
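Here is a toy model of that 16-millisecond rule (invented names, idealized timing): each frame's total CPU-plus-GPU work has to fit inside one vsync interval, and every interval it spills into shows the previous buffer again:

```java
// Toy model, invented names: a frame's result reaches the screen only at a
// vsync boundary, so any frame whose work spills past one interval repeats
// the previous buffer for every extra interval it spans.
public class VsyncSim {

    // For each frame's total CPU+GPU work time, count repeated buffers.
    public static int countJanks(double[] frameWorkMs, double vsyncMs) {
        int janks = 0;
        for (double work : frameWorkMs) {
            janks += (int) Math.ceil(work / vsyncMs) - 1;
        }
        return janks;
    }

    public static void main(String[] args) {
        double vsync = 1000.0 / 60.0; // about 16.67 ms at 60 Hz
        // Three quick frames and one 20 ms frame: one repeated buffer.
        System.out.println(countJanks(new double[] {5, 8, 20, 7}, vsync)); // 1
    }
}
```

Note that the 20 ms frame janks even though it is only a few milliseconds over budget; as the talk says, crossing the vsync boundary is what matters, not the absolute cost.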
>>Chet Haase: There's a lot of information and advice about how things work. This is
not to say that you can't still jank, obviously. If any of these bars goes longer than 16 milliseconds
or if together they take longer than 16 milliseconds, that's a jank, too. This is just in a well-behaved
application, now that we have vsync, we have a much greater probability of actually having
a smooth experience. So display lists, as we said, are a core part
of rendering with our GPU. They cache the intermediate rendering commands so we can
go ahead and issue them very efficiently to OpenGL. So the process that this happens
with is a bunch of properties will be set, let's say you're fading in a view and sliding
it at the same time. We're setting translation and alpha properties on the view. All of these
cause their separate invalidates to propagate through the system. On the next vsync, we
get an event from the vsync pulse that says, okay, time to actually render. So we
go through and update the display lists that have changed because of these operations.
We don't update everything, but we update the views that were changed. And then we draw
the display list in its current state. So what we found in this release was there
were certain key properties that were being animated over and over again, let's say in
launcher going between all apps and the home screen or swiping back and forth that we could
handle much more efficiently by simply setting some properties in a data structure. So we
called this display list properties. So rather than, on the previous slide, if you're setting
alpha and translation properties, instead of recreating the display lists for all of
the affected views and then redrawing them, what we can actually do is simply set some
properties that the display list basically grabs on its way to the grocery store, on
its way to talking to GL, it says where are you now and what's your alpha value? And then
it draws it in there. Much more efficient, and there should be a little graph in here,
maybe on the next slide. It will be very exciting. This is specific to the properties that we
added in 3.0, which are the transform properties, translation X and Y, rotation, scale, as well
as the alpha property. So if you're setting these properties correctly or running object
animators or view property animators that are setting these properties, then they will
be taking this much more efficient route of simply setting some properties instead of
causing the invalidations, which costs time, as well as the rerendering of the views that
were affected. And here's the pretty graph.
>>Romain Guy: Here's why it matters. So in blue, you can see the time -- so this is taking
the example of: you're in launcher, you click on the All Apps button, and there's an
animation to take you to a list of all the installed applications. So in blue, it's the
time we were taking to run through the Dalvik code of launcher to run all the draw methods
on ICS. The red line is the time we take on Jelly
Bean, because it's using display list properties, so now we don't have to run the draw methods
of any view in the system. We just go poke at the display list. We can avoid, like, most
of the invalidate work, and then we just start drawing the display list.
In this particular case, we gain up to about 2 milliseconds per frame, which matters
a lot when you are trying to hit a target that's maximum 16 milliseconds.
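A toy model of the difference between the two paths (invented names; the real display list properties live in the native renderer): animating through invalidate re-records the view's drawing commands every frame, while the display-list-property path just writes a field that the renderer reads at draw time:

```java
// Toy model, invented names: two ways to animate alpha. The invalidate path
// re-records the view's drawing commands every frame; the display-list-property
// path just writes a field that the renderer reads at draw time.
public class DisplayListProperties {

    public int recordCount = 0; // times the display list had to be re-recorded
    public float alpha = 1f;    // property stored on the display list itself

    // Pre-Jelly-Bean path: a property change invalidates and re-records.
    public void setAlphaByInvalidate(float a) {
        alpha = a;
        recordCount++; // stands in for re-running onDraw and re-recording
    }

    // Display-list-property path: just poke the stored value.
    public void setAlphaProperty(float a) {
        alpha = a;
    }

    public static void main(String[] args) {
        DisplayListProperties view = new DisplayListProperties();
        for (int frame = 0; frame < 60; frame++) {
            view.setAlphaByInvalidate(frame / 60f);
        }
        System.out.println(view.recordCount); // 60 re-records for a one-second fade

        view.recordCount = 0;
        for (int frame = 0; frame < 60; frame++) {
            view.setAlphaProperty(frame / 60f);
        }
        System.out.println(view.recordCount); // 0: the cached list is only replayed
    }
}
```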
>>Chet Haase: And this was an animation that was specifically either going into all apps
or back from all apps. So this is just data that I collected at the time to make sure
that I hadn't wasted my time for the past couple of weeks.
Somebody made the comment at the time, really, one millisecond, you save one millisecond,
does that matter? Think about 60 frames a second. That's 16 milliseconds in which you
have to accomplish everything; right? Every millisecond counts, especially in something
you're going to run really commonly like this. So, yeah, maybe I only saved a millisecond,
it was only taking one or two or three. But what if it was doing something more at the
time and it was just enough to push it over those 60 frames per second boundary? Then
we just avoided jank. >>Romain Guy: And as far as the UI toolkit
is concerned, that's a pretty simple animation. We're only changing the scale and the alpha
property. But if you were to change 20 of those properties at the same time, the gains
would be much, much higher. >>Chet Haase: Parallel processing. Let's talk
about triple buffering. So one of the things that we did in Jelly
Bean was to enable more parts of the system to work in parallel to give us more benefits.
If you think about it, there's three distinct parts of the system that all need to talk
to the GPU at one time, where we're doing work on the CPU that needs to sort of grab
a buffer and have a place to store the information. Then somebody else is going to be doing work
on the GPU. They need access to that buffer to get the information that we stored there
and then talk to the GPU and say these are the drawing commands, and then OpenGL is actually
doing work to process those commands as well. And, then, finally, the display system is
-- enqueuing and dequeuing and posting the buffers -- it needs access to the same information.
In the previous version of the platform, we were double buffered, which is very common,
which means I can be displaying a buffer while you're working on it. Well, there's three
people that work here; right? So, yeah, maybe you're working on it and I'm displaying the
buffer. But somebody else is waiting for one of those to free up to do their part of the
work chain. So what we did was enable triple buffering
in a lazy way where, when you need it, it will be there. So this way we can get the low
latency that double buffering gives you, but we can also get the consistency that triple
buffering can give you, as shown in the following fascinating diagram.
Here is a very well behaved application with vsync, so we're drawing at the beginning of
the frame and can do all the CPU and GPU work in enough time that we're ready for the next
vsync interval, which means we're displaying buffer A, B, A, B, A. Life is good. But what
if some of these operations took a little bit too long? Which means that buffer B doesn't
get displayed; we're not finished with buffer B in time for the vsync to flip the buffer,
which means that we're going to see buffer A twice. That's a jank. And then, finally,
we get around to the next vsync interval. Finally buffer A is freed up so that we can
operate on it again. And you can see all the wasted time at the end of that second frame
where we're just sitting there, waiting for buffer A to be free.
And, again, maybe we take a little bit too long and then we jank again. Right?
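The buffer arithmetic behind that stall can be sketched with a toy helper (invented names; the real buffer queue is considerably more involved): the display, the GPU, and the CPU each need to own a buffer at the same moment:

```java
// Toy model, invented names: the display, the GPU, and the CPU each need to
// own a buffer at the same moment. With two buffers the CPU stalls after a
// missed frame; a third buffer lets it start the next frame immediately.
public class BufferPool {

    public static boolean cpuCanStart(int totalBuffers, int heldByDisplay, int heldByGpu) {
        return totalBuffers - heldByDisplay - heldByGpu >= 1;
    }

    public static void main(String[] args) {
        // After a miss: the display is re-showing buffer A, the GPU still owns B.
        System.out.println(cpuCanStart(2, 1, 1)); // false: double buffering stalls
        System.out.println(cpuCanStart(3, 1, 1)); // true: buffer C is free
    }
}
```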
What we can do instead is allow these things to happen in parallel, so we may still have
a jank the first time we took too long to draw, but after that, we allocate a third
buffer, and that buffer is available immediately. So even though the display is still sitting
on buffer A, our CPU operations can actually be working with buffer C at the time, because
the GPU is busy with buffer B. And all of a sudden there's three pieces that are in
play at the same time for the three different components that need them, and we can get
-- yes, we will have an occasional jank when the system gets into this state. But then
we're going to have much more consistent performance after that.
>>Romain Guy: So to improve input latency, we removed the buffer, and to improve frame
rate consistency, we added the buffer. That makes no sense to me.
[ Laughter ] >>Chet Haase: All the stuff we're talking
about is actually far more complicated than we're unloading on you today. But these are
the sort of high-level concepts behind them. >>Romain Guy: So we wanted -- we mentioned
window composition before. I just wanted to explain really quickly how window composition
works. So window composition can use the GPU and
something we call the hardware composer. Besides the GPU, hardware often has the notion of
what we call an overlay, which can be seen as a bitmap. Hardware usually has a limited
number of overlays you can use to composite things together without using the GPU. Most
of the time, what we try to do is use overlays, we try to put the window of your application
inside an overlay. If we can fit all the windows on screen inside the overlays, the GPU is
completely free and we can keep those resources for the apps.
Unfortunately, sometimes we cannot use the overlays for all the windows on screen, and
we have to use the GPU. So you will see here on the left, we have something called the
frame buffers. So instead of your window drawing into an overlay, we
send it to what we call a frame buffer. Surface flinger has to take them and compose them
together using the GPU and send that to an overlay.
The problem is that when we use the GPU, it takes time; GPUs have a limited number
of pixels they can manipulate per second. Anything we do on top of what the application
is doing can be a source of problems. So now I want to show you a few tools that
you can use in your application to identify frame rate issues and fix them.
So the first one is called dumpsys gfxinfo. You have to use the command line: you run
adb shell dumpsys gfxinfo. But before you can do that, you go into developer
options -- it's labeled as such on your Jelly Bean phone -- and at the end, you'll find an
item called profile GPU rendering. Turn that on. Then make sure to kill your application
first, or at least the window that you want to profile, and then you can run
the command. So I'll show you what the result of the command
looks like. I have a Nexus 7 hooked up right here. And
I'm just going to scroll a few times in the settings applications, and I'm going to output
the results. So I'm just scrolling. Run the command.
You'll get a lot of information about all the running processes. If you want, you can
specify the name of your process to eliminate a lot of information.
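As a concrete sketch (the package name is hypothetical), the whole workflow fits in a couple of commands; the awk line sums the three per-frame columns so you can spot frames over the 16 millisecond budget without a spreadsheet:

```shell
# Enable Settings > Developer options > "Profile GPU rendering",
# restart the app, exercise its UI, then dump the frame times:
adb shell dumpsys gfxinfo com.example.myapp > frames.txt

# Each data row has three columns of milliseconds (update display
# list, process display list, swap buffers); sum them per frame:
awk 'NF == 3 && $1 ~ /^[0-9]/ { printf "%.2f\n", $1 + $2 + $3 }' frames.txt
```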
This is mentioned in other talks, if you want to know more about the rest of the information.
But what matters here are those three columns of data that you see. So you find the three
columns, grab all this data, and copy and paste it into a spreadsheet.
And then you will get a result like this one. One sec.
So this is the data you can grab. Here, I just created a stacked graph, so every bar contains
the sum of the three columns. So the first column on the left in the data that we output
is the time it takes to update the display list on every frame. The middle column is
called process display list. It's the time we take to draw the actual display list. And
the last one is the time we take to swap the buffers, so to give the buffer back to surface
flinger. So here, when I was scrolling through settings,
you can see that when we sum all this data, we are well below the 16 millisecond
limit. So this app is running at 60 FPS, we're vsync'd, everything is going great. And
most of your time should be spent in process display list: drawing, executing the display
list, should be where you spend the bulk of the time.
The blue part is your code. When you write your Java code, your onDraw method, that's in
the blue bar at the bottom. And this is where you can do most of the optimizations.
Next, I wanted to show you Systrace. Systrace was mentioned several times; if you
attended the tools talk earlier today, you saw what it looks like. It was mentioned
during the keynote.
And I wanted to show you how you can use it to identify issues.
So let's imagine that you have an app that's misbehaving. You can use Systrace to understand
what's -- what in the system is making your app misbehave.
So, first of all, you have to enable Systrace. You go back to developer options in settings
and look for something called enable traces. You'll get a little dialog, and
here you can select the types of information that you want to trace.
In this particular case, we're only interested in graphics performance, so we're going to
select Graphics and View. There is a lot more you can use. If you're
doing audio processing or video playing, you can enable that.
Now, using the tool, once you have that enabled, is very simple. So you go back to your terminal.
Let's remove this. And I'm just going to capture a trace while
scrolling through settings -- the same test as I just did with dumpsys gfxinfo.
In the SDK, you have to go into the directory called tools/systrace, and there you'll find
a Python script called systrace.py. So you run that, and while it's running, just make
your app react. Here I'm scrolling the ListView in settings.
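The full invocation looks something like this (the SDK path is relative to your install):

```shell
# From the SDK install directory:
cd tools/systrace
python systrace.py

# Interact with the app while the trace is captured (about five
# seconds by default), then open the generated trace.html in Chrome.
```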
>>Chet Haase: Captures the trace for five seconds.
>>Romain Guy: Yes. So you get an HTML file. Going to open.
And this is the result. So this gives you an overview of everything
that's going on in the system at the time you took the trace.
>>Chet Haase: In very comforting pastel colors. >>Romain Guy: It's very interesting, because
here you can see how many CPUs were active when I started the trace. This is a quad core
device but I needed only one CPU to get the job done, so we can see that only one CPU
is active. If I zoom in on it -- the UI is a little complicated; if you've played Doom or
Quake games on a PC, you know WASD, and it's the same keys to navigate
in the tool. We're going to get a better UI, I hope.
So we zoom in on the CPU, and you can see exactly what the CPU is doing. So, for instance,
here, if I click on it, I can see that settings, the settings application was doing something
for about 2 milliseconds. You can see that next, we had something called binder, so a
binder thread that was doing something for a tenth of a millisecond. It's a really powerful
tool, because you can get a lot of details about the system.
And the important part when you want to improve performance in your application is to look
at the process called surface flinger, remember, surface flinger is in charge, ultimately,
of putting your pixels on the screen. What we are capturing here, every bar that you
see is basically one frame. You can see here everything is very regular, so we're posting
our frames on the vsync at about 60 FPS. The only two gaps that you see here are because
I just reached the end of the list when I was scrolling it; so for a fraction of a second,
the time it took me to start going the other direction, nothing was happening.
You can also zoom in on your application, and here, you can see the graph we are talking
about. So deliver input event, this is where we received the touch events. This is where
we deliver the touch events to all your views. This is where you run your own code if you
intercept touch events. And then we have this performTraversals method. This
is where we draw. This is where we dequeue the buffer. So we get a buffer from surface
flinger. Then get display list, this is what we call
update display list in the diagram. So here we spent about 1.2 milliseconds going through
the views in the UI toolkit, asking them to recreate that display list. So it was very
fast. Then draw display list, this is where we execute the display list, and you can see
it took only 2.3 milliseconds, and then we swap buffers and give the buffer back to surface
flinger. When we are done, if you look at the surface
flinger process, you can see that we finished our work and at the next vsync, surface flinger
takes our buffer and posts it on screen. So this example, I mean, just shows you how
to use the tool. It's pretty simple. You can do multiple selections to see everything that's
happening, you know, in a given period of time.
What's more interesting is to look at the tool when things go wrong.
So I won't show you the application, but I wrote a little application, it's a simple
ListView that's doing something bad. And so when you scroll the list, it's very janky,
very choppy. We're going to look at the output of Systrace to try and understand what was
going on. So let me find the trace.
Now, this one -- there we go. So looks the same. But already, when you look
at surface flinger, you can see that all the frames that we post, it's not regular anymore.
We have those big gaps. And if we zoom in on the application I was running, you can
see -- and we look for drawing, so here we are drawing, you can see that we're spending
only 4 milliseconds drawing. So the app should be smooth. The problem is not in our drawing
code. But if you look at what's happening between two frames, we see this huge block
called deliver input event. So because it's a ListView and I was calling
it with my finger on screen, that means we were spending time in the dispatching of the
touch events. In the case of a ListView, we called the adapter method called get view
while you are scrolling the screen. So chances are that's where we are spending most of our
time. Here, we can go further. If we zoom in on
this block here, there's a tiny bar, maybe it's hard to see, but you can see the state
of the thread. So here at the beginning for about 5 milliseconds, my process was running.
I was actually running code. And then after that, it's blank. That means that my thread
was waiting on something, my process was sleeping. So what we can do is go all the way to the
top and look at the CPU and see what the CPU was doing at the time. Here we can see that
a thread called binder 1 was doing a lot of work. At the bottom, you can see the thread
ID for binder 1. And then you can go back to the shell and use adb shell ps, where
you can see all the processes running -- ps -t to see all the threads running -- and you can
identify that thread. And in this particular case, the thread was identified as a thread
that belongs to the contacts process. And it was slow because my application was
making a query for the content provider of contacts between -- in get view every other
frame. So that's what was blocking the application.
And usually when you see -- when you identify an issue like this, you can stop using Systrace,
and then you can use Traceview. So how many of you have used Traceview before?
Just raise your hand. Okay. That's pretty good. Everybody should
use Traceview. If you haven't, that means you either don't care
is awesome, in which case, congratulations. [ Laughter ]
>>Romain Guy: I wish all my applications were that good.
Let's see. So it's a trace I captured in the same application.
But you can invoke Traceview pretty easily yourself from DDMS or Eclipse.
So this is what it looks like. When you see a blank section in Traceview, it means that
your thread, your application was not running. So if you see one of those things, you can
go in Systrace and Systrace will tell you what's going on elsewhere in the system. So
we've already identified that we are waiting on this binder 1, we're waiting on the contacts
process. If we look at this block and we navigate through the parents, we can see that we were
doing a content resolver query. In my list activity, I have something called a slow adapter
that makes it easy to find the bug. So in the getView method, I was calling a query,
and about 52 milliseconds per call were spent doing the query.
So, obviously, I should stop querying the database on every frame.
>>Chet Haase: So there's an interesting relationship between Traceview and Systrace. Actually,
Traceview kind of inspired at least my desire for something like Systrace, which, fortunately,
people wrote. That was great. Because sometimes you look in Traceview and you say, well, actually
my stuff is not taking very long, but in the middle of a method call which I know for sure
is not doing anything, my thread is just swapped out and there's something else going on in
the system maybe regularly. And Systrace allows you to see what that other thing is that's
happening. Maybe you have a service running somewhere that is syncing on a regular basis
and, basically, stealing CPU cycles away from you. So they're both useful in their own way,
we don't do the per-method tracing in Systrace. It's more to get a system-wide overview of
what's going on. Traceview is really important to use to see what's actually going on in
the methods of your -- >>Romain Guy: All the vsync and triple
buffering work was actually made possible thanks to Systrace.
So just a reminder: Systrace by default will capture five seconds of traces. You can change
that with a command-line argument. It will output an HTML file. The UI is not
that great right now, but we will improve it. And the benefit of the HTML file is that
you can attach it to a bug report or send it by email. You don't need a special tool to
visualize it. You just open the HTML file in Chrome and you're good to go.
I mentioned before, window composition, I made a distinction between the GPU composition
with frame buffers and overlays. So you can use a tool called dumpsys SurfaceFlinger
to see the state of overlays and frame buffers in the system.
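The invocation itself is one command; run it while your app is actively drawing so the table reflects the animating state:

```shell
# Dump SurfaceFlinger state, including the table of windows and
# whether each one is on an overlay or in a frame buffer:
adb shell dumpsys SurfaceFlinger
```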
If you run that command, and I'm not going to run it because we're running out of time,
you're going to get a huge amount of logs. And somewhere towards
the end, you're going to see a little table that lists all the windows currently
visible on screen and whether or not they are on overlays. Here I wrote a simple
application, the same application I showed you in Systrace. You can see we have the status
bar and we have the navigation bar at the bottom with the home button and the back button.
And everything is in an overlay. Everything is great. We're not using the GPU for the
window composition. You have access to the entire GPU inside your application.
If I modify the application to invoke a popup window, then we are running out of overlays.
This was on the Nexus 7; when we run out of overlays, we have to revert back to GPU composition.
So suddenly, I have three windows that are in frame buffers.
So the time it takes to composite those three windows together is taking away from my application.
So here, if I see that in my application and I need the screen to be really fast, you should
think about whether or not you need that extra window. Like, maybe you can turn that window
into a view in the main activity. Now, be very careful when you use this tool,
dumpsys SurfaceFlinger. We have a special optimization: after the application is done
drawing, everything reverts back to overlays. Make sure that you run this command as the
application is drawing, when you're scrolling, when there's an animation going on -- what
do you mean the other way around? Sorry, yes. I'm glad that we have a surface
flinger guy in the room. It reverts to frame buffers when you're done drawing. And there's
a good reason for that, basically: to save battery.
So the takeaway: make sure that your application is drawing when you're running the command
or the information you're going to get is not going to be very useful. That's what I
just said. So a few other tools you should use. Traceview,
we just saw Traceview. Hierarchy Viewer, for those of you who don't
know what this tool is about, you can check out the documentation on developer.android.com.
And if you're not aware of it, this tool is very useful to develop your
UI, but it will not work on retail devices. So if you go to the store and buy a device,
Hierarchy Viewer will not work. So on GitHub, I put a little library called
ViewServer. It's one class, one Java file that you put in your project. Just read the
documentation. You have to add about two lines per activity to make it work. That will
enable the use of Hierarchy Viewer in your application.
Tracer for OpenGL ES: if you're using hardware acceleration in your application and your
frame rate is choppy, you can take a look at it to identify what view is taking so much time.
There was a demo of it earlier today. Allocation tracker. How many of you have used
allocation tracker before? That's awesome. The number is increasing every year. I like
that. It's funny, as I was taking this screen shot
a couple weeks before we finished Jelly Bean, I identified an allocation that the framework
was doing several times per frame. >>Chet Haase: I don't know where that came
from. >>Romain Guy: That's okay. We were just allocating
dozens of exceptions and all their strings, per frame.
[ Laughter ] >>Chet Haase: It was exceptional coding.
>>Romain Guy: It's good that we have tools. We have two types of tools, this one and this
one. [ Laughter ]
[ Applause ] >>Romain Guy: All right. Now the part that
you're probably most interested in, tips and tricks. What can you do in your application
to solve the problems that you identified with the tools.
So these are the things that you can fix. You can make the frame rate more consistent.
You can lower the latency. You can increase the speed of drawing display list. You can
increase the speed of updating display list, and, finally, you can free up the GPU for
your own application. So first one.
>>Chet Haase: Is about allocations, related to the allocation tracker.
>>Romain Guy: How many of you have used this keyword before?
>>Chet Haase: Okay. Seem to know how to use it.
The best way to use it, in fact, is like this: don't use it. Obviously,
we need to allocate objects; just don't do it during an animation, or in your inner loop,
or during performance-sensitive operations or methods like these.
Because, basically, even though that looks like a really small temporary object, it may
take up significant time actually creating the object, and, more importantly, it creates
garbage that will have to be collected at some point in time.
We put a fair amount of effort -- besides that previous bug that we were talking about
-- into making sure that we are not doing allocations in the middle of our animation
routines or our rendering logic, because we don't want to cause jank in the middle of
an animation. And one of the ways that that happens regularly is by creating little bits
of garbage here and there and then in the middle of that 500 millisecond animation,
you're going to pause for 4 milliseconds, which may be enough to shove you over the
frame barrier. So avoid allocating if you can. There are various techniques for this:
transient static objects that you reuse, pass around, or whatever. But try to avoid it when
you can. This will give you a more consistent frame rate
overall, because, basically, you avoid those hiccups.
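As a minimal, non-Android sketch of that technique (all names here are illustrative): reuse one preallocated scratch object per frame instead of creating garbage.

```java
// Sketch: avoid per-frame allocation by reusing a scratch buffer.
public class ScratchDemo {
    // One reusable scratch buffer instead of a new float[] per frame.
    private final float[] scratch = new float[2];

    // Bad: allocates a new array on every call, creating garbage
    // that the GC will eventually have to collect mid-animation.
    public float[] positionAllocating(float t) {
        return new float[] { t * 100f, t * 50f };
    }

    // Good: fills the same preallocated buffer on every call.
    public float[] position(float t) {
        scratch[0] = t * 100f;
        scratch[1] = t * 50f;
        return scratch;
    }

    public static void main(String[] args) {
        ScratchDemo demo = new ScratchDemo();
        // The reusable version hands back the same instance every frame.
        System.out.println(demo.position(0.0f) == demo.position(1.0f)); // prints true
    }
}
```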
>>Romain Guy: The next big thing you can do is stop writing code. Just do as little as
you can. It's pretty obvious. But we had a great example. In one of our applications
that shipped on Jelly Bean, we identified a performance issue. What was happening was,
in the getView method, it first was creating an object. But the constructor of that object
was doing about 200 string comparisons. And we were actually spending most of our time
doing the string comparisons instead of drawing or doing the layout.
And, you know, it looks innocent when you write the code. And over the years, like,
your code will be used in different places or differently than you intended it to be.
So be very careful. Use Traceview, use Systrace. If you do so, you're going to improve the
consistency of the frame rate and you're going to lower latency.
>>Chet Haase: Choreographer is a new class, a new API that was introduced
around the vsync capabilities. Most of its capabilities are actually under the hood. It's
basically the logic in charge of making sure that everybody is actually running on that
vsync pulse: the animations, the input events, and the rendering. So you don't typically
talk to Choreographer directly, but you can. If you are using the system in an atypical
way, then you can actually link into the vsync system that everybody else --
>>Romain Guy: Just like the tools we showed you, choreographer, that's what we used in
the platform. So all the vsync work we talked about, the UI toolkit uses choreographer to
make it happen. >>Chet Haase: And the way that you would hook
into this, actually, you hook into it automatically, just through using the mechanisms of the platform.
If you're using animators, those animators are actually running
on the vsync pulse provided by Choreographer. You don't need to do anything. But if you're
doing something else -- your own custom Runnable animations that are getting
posted later -- you can hook into the same vsync pulse by calling methods such as the new
postInvalidateOnAnimation. >>Romain Guy: And postInvalidateOnAnimation,
and this one, postOnAnimation, are available in the support library. So on previous versions
of the platform, they will behave just like a plain post or postInvalidate. And on
Jelly Bean, you'll automatically benefit from the vsync.
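A minimal sketch of what that looks like for a custom Runnable animation, using the support library (the view and the animation logic are hypothetical; on pre-Jelly Bean devices ViewCompat falls back to a plain post):

```java
import android.support.v4.view.ViewCompat;
import android.view.View;

// Sketch: drive a custom Runnable animation on the vsync pulse.
public class PulseAnimation implements Runnable {
    private final View view;

    public PulseAnimation(View view) {
        this.view = view;
    }

    @Override
    public void run() {
        // Update the animated state, then schedule the next step on the
        // next vsync pulse instead of posting immediately.
        view.setAlpha(computeNextAlpha());
        ViewCompat.postOnAnimation(view, this);
    }

    private float computeNextAlpha() {
        // Placeholder for real animation logic.
        return (float) Math.abs(Math.sin(System.nanoTime() * 1e-9));
    }
}
```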
>>Chet Haase: And there's also a way to register a callback that gets called
on the next frame. All of these are one-shot deals. You basically post a request, you get
a callback. If you want to be called back on a regular basis, you would post another
one. >>Romain Guy: This one in particular, postFrameCallback: if you're writing a game or
some OpenGL animation, this is probably the API you want to use. It's completely
independent from View. So just take a look
at it and use it. I think this one, though, is not in the support
library. >>Chet Haase: So using choreographer helps
give you that consistent frame rate, because now everything in the universe is synced on
that vsync pulse, for obvious and wonderful reasons. And lower latency, because you're
reducing the amount of time that you're actually spending in the frame. You're doing things
as far up-front during the refresh frame as possible.
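The postFrameCallback pattern for a game or GL loop can be sketched like this (Jelly Bean API; the class and update method are illustrative) -- since each callback is a one-shot, it re-posts itself every frame:

```java
import android.view.Choreographer;

// Sketch: run a game/GL loop on the vsync pulse via Choreographer.
public class GameLoop implements Choreographer.FrameCallback {
    public void start() {
        Choreographer.getInstance().postFrameCallback(this);
    }

    @Override
    public void doFrame(long frameTimeNanos) {
        // Advance the simulation using the vsync timestamp, render,
        // then ask for the next frame (callbacks are one-shot).
        update(frameTimeNanos);
        Choreographer.getInstance().postFrameCallback(this);
    }

    private void update(long frameTimeNanos) {
        // Game logic and rendering would go here.
    }
}
```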
>>Romain Guy: Layers. Use hardware layers. We talked about layers a lot last year. So
if you go back to our Google I/O 2011 talk, Accelerated Android Rendering.
>>Chet Haase: Accelerated -- >>Romain Guy: The talk we gave last year,
we went on and on and on about layers. So if you want to know more about layers, go
watch that talk. Basically, what you want to do if you're writing
animations and you're using the view property animator, so first of all, the view property
animator has tons of optimizations under the hood that we cannot apply with normal animations.
So you get a lot of benefits from using it. And we introduced -- well, Chet introduced
a new API this time around called withLayer. We will automatically set up a layer on your
view at the beginning of the animation, and we will remove it at the end of the animation.
So do not go wild and use withLayer on every view. Try to do that on large views or complex
views that have tons of children. But don't do that on a button or just a simple text
view. So: consistent frame rate, faster display list drawing.
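A sketch of using it (the view, distance, and duration are illustrative):

```java
import android.view.View;

// Sketch: animate a complex view with a hardware layer that exists
// only for the duration of the animation (withLayer, API 16).
public class SlideOut {
    public static void slideAway(View complexView) {
        complexView.animate()
                .withLayer()       // layer set at start, removed at end
                .translationX(500f)
                .alpha(0f)
                .setDuration(300);
    }
}
```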
Clipping. Clipping is very important. The UI toolkit does a lot of clipping for you.
It's one of the biggest optimizations that we have and that you have at your disposal
when you're writing custom code. The first part of clipping is to -- is about
doing proper invalidations. So it's very tempting when something changes in the view to just
call view.invalidate. It's easy. It works. Now, what we want you to do is call invalidate
and tell us what part of the view really needs to be redrawn. Because if you do so, we can
avoid a lot of work. So if you call invalidate on a view -- let's say we have the view at
the top, and that's the tree of display lists for the view and its children -- and you call invalidate,
we'll have to redraw everything. It's a lot of work, and in the case of a ListView, for
example, or a complex view, it can take a lot of time.
Now, if you do an invalidate with a specific rect, what we can do is
reject the views and display lists that are outside of this rect, and we do
a lot less work. And in very complex applications, when
have hundreds of views, it can be very, very important, because we're going to do that
rejection work as early as possible in the tree so we can get rid of most of the views
and just ignore them completely. So if you're spending too much time updating
display list or drawing display list, you can do that.
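A sketch of a partial invalidation (the view, region, and class name are hypothetical):

```java
import android.graphics.Rect;
import android.view.View;

// Sketch: invalidate only the region that actually changed so the
// toolkit can clip-reject everything outside of it.
public class Badge {
    private final Rect dirty = new Rect();  // reused, no per-frame allocation

    public void updateCounter(View view, int left, int top, int right, int bottom) {
        dirty.set(left, top, right, bottom);
        // Only this region will be redrawn, not the whole view.
        view.invalidate(dirty);
    }
}
```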
And if you want to know if you're doing things right, go to developer options. There is a
new option called show GPU view updates. If you turn that on, I hope you're not epileptic.
The system is going to quickly flash in red the regions of the screen that we redraw.
It's very useful to see exactly what your application is doing.
>>Chet Haase: So this is a tip that came out of one of the applications that we were working
on during Jelly Bean. >>Romain Guy: Google Now.
>>Chet Haase: Yes, that one. So we're a 2D API. We have no idea of the
structure of your application. All we know is the rendering commands that you give us.
So if you have a view that's really complex and you're drawing lines and text and bitmaps
and all that stuff in the view, and then you have another view that's drawn directly
on top of it, what -- we're going to draw the first one, and then we're going to draw
the second one. And the user is not going to see most of what we spent that time drawing
for the first view because it's covered by the second view. So the idea here is, you
have that information about your activity. We don't. Well, you can tell us about it.
So you can actually tell us the information about what's being clipped out. So, basically,
don't waste our time trying to draw that stuff if it's not going to show up to the user anyway.
So in this case, there are two cards here. The first one you only see that header information.
And the second one, you see all of the content in it.
So maybe in the first one, you can actually just tell us, you know, what, don't draw the
stuff beyond here, because we're going to -- you're going to waste your time doing that
anyway. So the red region that we're showing here is the overdraw. It's the time that we
wasted drawing all this information that didn't appear to the user.
The easiest way to do this is to simply tell us to clip out that information. So you know
that you're being displayed with something else on top of you. You're only in header
mode where you only want to show the stuff at the top. Set a clipRect and then go ahead
and draw your content. You're going to spend a little bit of time drawing that content.
So maybe the simpler logic is, you know, that's okay. You are spending a little bit of time.
But in the meantime, you've given us a really important piece of information that when we
go to talk to OpenGL and say render the following commands, we're going to check it against
the clipRect and clip-reject it immediately and just get rid of that information without
actually bothering to render it. So, again, you set the clipRect, and we're
not drawing the rest of the stuff in it. We just draw that header, and then we draw the other
thing on top of it. And it gave this one application a big performance boost, because they had
several of these stacked cards, and they were basically wasting a lot of time drawing information
that never appeared to the user. So make the display list issuing much faster
by simply not having us do stuff that really doesn't matter.
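The card example can be sketched like this (the class, fields, and drawing helpers are hypothetical):

```java
import android.content.Context;
import android.graphics.Canvas;
import android.view.View;

// Sketch: when the card is collapsed so only its header shows, set a
// clipRect up front so the renderer can clip-reject every drawing
// command below the header instead of rasterizing covered pixels.
public class CardView extends View {
    private boolean collapsed;
    private int headerHeight;

    public CardView(Context context) {
        super(context);
    }

    @Override
    protected void onDraw(Canvas canvas) {
        if (collapsed) {
            // Everything drawn below headerHeight gets rejected.
            canvas.clipRect(0, 0, getWidth(), headerHeight);
        }
        drawHeader(canvas);
        drawBody(canvas);  // mostly clip-rejected when collapsed
    }

    private void drawHeader(Canvas canvas) { /* ... */ }
    private void drawBody(Canvas canvas) { /* ... */ }
}
```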
>>Romain Guy: And you can go even further. If you use clipRect, there's a method on Canvas
called quickReject. You pass a rectangle, and the method will tell you whether or not that
rectangle will be visible on screen when it comes time to draw.
So this is what we use a lot inside the UI toolkit to know whether or not we have to
draw a view. But if you set your own clipRect, you can avoid running extra code in your application
by just checking the state of the clipRect. So, for instance, here we have a list of items.
We have set a cliprect. And we can check whether each item will be visible or not. If the item
is not visible, we can skip it entirely. So we're going to avoid doing extra work, running
Java code that will generate drawing commands that we'll have to queue in the display list
and so on and so on. Our display lists have a lot of optimizations around
clipping. We're going to try to do clipping ahead of time so that we don't have to do
it on every frame. But by doing this, you're going to avoid running extra Java code, which
can matter a lot. So if you do a quickReject, we're going to
make display list drawing faster, and updating display list will also be faster.
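A sketch of that check inside drawing code (the class and item bounds are hypothetical):

```java
import android.graphics.Canvas;
import android.graphics.RectF;

// Sketch: skip items that quickReject says are entirely outside the
// current clip, so we never run their Java drawing code or queue
// their commands into the display list.
public class ItemRenderer {
    private final RectF bounds = new RectF();  // reused scratch rect

    public void drawItem(Canvas canvas, float left, float top,
                         float right, float bottom) {
        bounds.set(left, top, right, bottom);
        if (canvas.quickReject(bounds, Canvas.EdgeType.BW)) {
            return;  // entirely clipped out: no work at all
        }
        // Actual item drawing would go here.
    }
}
```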
>>Chet Haase: So this is an optimization that came out of another application that shipped
on the device. This came from the contacts application, where they were having a janky
experience when animating one of these cards. You'd click on the icon, the thumbnail, and
it would animate up. And you'd get this nice experience where you see the activity below,
and then you get this sort of translucent, dim thing on top of that, and then you get
the contacts in full view. And when it was just static, it was great. When it was animating
into view, it was horrible. And we used Systrace and looked at what was going on in surface
flinger to see where the hiccups were coming from. It turns out it was coming from the dim
window. So you set an attribute on the window manager layout params -- you say dim
behind -- and then, basically, the thing behind your current window will dim; you'll get this
nice translucent effect. And it's great if all you're doing is looking at this statically.
But if you're actually running a lot of code at the same time, in this case, the contacts
application was running an animation in the view on the top, then, basically, you're asking
the GPU to do a lot of work because you're shoved into that frame buffer versus overlay
situation. So all of a sudden now we have the GPU compositing frame buffers together
to put them into an overlay to get composited onto the screen. And at the same time, you're
doing a lot of work on the GPU just to draw your application because you're trying to
animate this thing in at the time. There was a very easy fix to this. It was
a very nonobvious problem until we looked at Systrace output and saw what was going
on. But there's fortunately a very easy fix.
Don't use dim window if you're running into this sort of situation, especially when running
an animation on that thing on top. Instead, you can simply set a background that
is the color or the translucent color that you need. You can get the exact same visual
effect without actually introducing that extra window into the hierarchy that tossed everything
into frame buffers and caused the problem to begin with.
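The replacement they describe can be sketched like this (the scrim color and names are illustrative):

```java
import android.view.View;

// Sketch: instead of the window manager's dim-behind flag (which adds
// an extra window to composite), draw your own translucent black
// background for the same visual effect.
public class DimFix {
    public static void applyScrim(View contentRoot) {
        // ~60% black scrim, drawn by the app itself.
        contentRoot.setBackgroundColor(0x99000000);
    }
}
```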
So in this case, you're going to get faster display list drawing, because we're simply
not doing as much stuff in parallel with surface flinger also doing stuff on the GPU.
And, again, you'll get faster composition. So I think the takeaway here is, obviously,
especially in America, with our love of food --
>>Romain Guy: I have to say, this slide is entirely his fault, okay?
>>Chet Haase: I will take the credit. I'm okay with that, because I have to own it.
Spread the word. [ Laughter ]
[ Applause ] >>Chet Haase: Thank you.
[ Applause ] >>Romain Guy: If you want to know more about
performance in rendering and graphics, check out our Google I/O 2011 talk. And you can
also go on parleys.com, where we have a bunch of talks, videos, and slides where we also
touch those subjects. And we have exactly two minutes left for Q&A.
If you have a question, walk up to the mic, and we'll try to answer it.
>>Chet Haase: We're going to be watching some talks this afternoon.
Otherwise, we'll be in and out of Android office hours. >>Romain Guy: You should go to the next session
by Jeff Sharkey. You will learn even more about performance.
>>Chet Haase: Yes. Yes.
>>> So one question is, I've got a ListView with a ton of images. They're drawing. And
I think that the drawing of those bitmaps is slow. And so I have two questions. Number
one is, is there a way -- I heard that compressed textures are faster to draw, but I have
bitmaps that I'm passing as BitmapDrawables to an ImageView.
Is there any way to use compressed textures there?
>>Romain Guy: No. We won't be able to use compressed textures.
How big are your drawables? How big are the images?
>>> They're 256 by 256 bitmaps. Like, 20 on a page.
>>Romain Guy: That shouldn't be an issue. The issue is probably somewhere else.
>>Chet Haase: One question, too, is are you -- are they drawing at that size or are you
scaling them into a different size? >>> They're at that size.
>>Romain Guy: So that's -- I would say it's almost definitely not the issue.
>>> Well, so it's definitely also, like, loading those bitmaps, especially as JPEGs. And,
unfortunately, there's, like, thousands of them on a page. So I was wondering if I could
store them as compressed textures and then load them directly to you guys.
>>Romain Guy: Yeah, you won't be able to do that. But use Systrace and Traceview and you'll
be able to figure out what's going on. I really doubt that the issue is drawing.
>>> Okay. >>Chet Haase: I think, given 30 seconds, I
think we're out of time. We should probably get off the mics.
Thank you. [ Applause ]