Faster HTML and CSS: Layout Engine Internals for Web Developers


Uploaded by GoogleTechTalks on 17.11.2008

Transcript:
>> Thanks for coming. This talk started when we were thinking about doing some experiments around performance in CSS, and searching the web we found very little. There's not a lot of research out there on this, so it occurred to us that maybe we should go to the guy with the secret knowledge. How does this stuff really work? We're poking at the black box, but we don't even know how it works. So, I want to introduce David Baron. He's a software engineer at Mozilla, where he's worked since 1998. He's a member of the W3C CSS working group, and he is here to present.
>> BARON: Thanks. So, this will be a little interesting because right now I have the slides there, but I don't have them here. So, you know, I may be a little confused as to which way I'm moving the mouse at some point, but in any case. One little pointer: if anyone can't see them, I also put the slides on the web, at dbaron.org/talks. They do sort of require a relatively recent browser to look at, because I basically wrote them against Mozilla trunk using HTML and SVG. So it's a little fun. Anyway, essentially what I'm going to talk about is sort of how browsers work, from my perspective. You know, you ask any engineer who works on web browsers how they work, and they work more in one particular area, so they're going to be biased towards talking about that area of the browser rather than some of the other areas that they have less experience in. So, I'm going to try to give a how-browsers-work talk, but edited pretty heavily to focus on the parts that are relevant to what authors can do to make webpages faster. But it's still fundamentally "this is how a web browser goes from getting stuff on the wire to displaying stuff on the screen," with a lot of other comments interjected. I'm coming at this from the perspective of somebody who
works at Mozilla. You know, I work on one web browser, and I know a lot more about it than I know about the others. So of the details that I'm going to get into, some are pretty common across browsers, whereas others vary more between browsers, and I'm not even necessarily sure in some cases which are which. Beyond that, a lot of these things vary between browser versions. When you're writing a web browser you have a lot of compatibility requirements: basically anything that you've shipped before that other browsers do too, you have to keep doing. It's not necessarily that way with performance. If you were slow at something before, nobody is really going to complain if you find some way to speed it up. So we have a lot more ability to change the performance characteristics of our layout engine than we do to change the output in terms of, you know, visual output or behavior or other characteristics. An example of that: in Firefox 3 we made some pretty significant changes--well, there were many significant changes between Firefox 2 and 3, but one that's relevant here is that the way we handle style changes changed drastically, so that we coalesce separate style changes and process them all at once rather than processing them all separately, which could pretty significantly change the performance characteristics of pages that were exercising that. But, you know, it doesn't break anyone. We have the ability to do that. So, with that preface in mind, I want to just dive
in and talk about the data structures we have in web browsers and some of the things we
do. So, one of the central data structures we have is the DOM tree, or the content tree.
We call it a bunch of different things. But basically, HTML, this bunch of tags, is a serialization of a tree structure, and most modern browsers actually turn that into an actual in-memory tree structure. In some older browsers that actually wasn't the case, but these days most browsers actually have a tree in memory, and when you use the DOM APIs that work on a tree, there's an underlying tree data structure that looks pretty much like you'd think it would look. So, you know, a simple HTML document has a bunch of HTML element nodes and a bunch of text nodes and so on. And the types of the nodes in this content tree are things like HTML elements, and then there are specific types of HTML elements, and they differ based on what DOM methods they have. You can also have an SVG
DOM tree or if you have something like the slides I'm using today, you can have a DOM
tree that mixes both of them. But the thing about this tree structure is that the types
of the nodes are related to the types of the elements. Now, in addition to that--and this starts to vary a little bit more between browsers, but I think it's still reasonably similar--we have a second tree structure that represents what we render for all of these elements. We call it the frame tree, which is sort of odd, and I will tend to use that term just because it's the term I'm used to using. People will also call it the rendering tree; I'll probably use them interchangeably. The nodes in this tree all represent rectangles, essentially. But the more important difference is that the types of the objects in this tree aren't things like element types. The values of the CSS display property mostly correspond to the types of nodes in this tree: block or inline or various table types or text nodes. In many cases there's a one-to-one correspondence between the nodes in these trees, but in some cases there isn't. For example, a node that has CSS display: none wouldn't create any nodes in the rendering tree. So, like you see in this slide, there are no nodes in the rendering tree pointing to the head element, because we make the head element display: none, so there's just nothing generated there. Likewise, there are cases--especially when we break elements across lines or pages--where you'll have multiple rectangles representing a single element in the DOM.
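As an illustrative sketch only (the frame objects here are made up, not a real browser API), frame construction can be thought of as a recursive walk over the DOM that skips display: none subtrees and picks box types from the computed display value:

    // Minimal sketch of building a rendering (frame) tree from a DOM tree.
    function buildFrameTree(element, parentFrame) {
      const display = getComputedStyle(element).display;  // e.g. "block", "inline", "none"
      if (display === 'none') {
        return;                               // no frames at all for this subtree
      }
      // The frame's type comes from the computed display value, not the tag name.
      const frame = { type: display, element, children: [] };
      parentFrame.children.push(frame);
      for (const child of element.children) {
        buildFrameTree(child, frame);
      }
      // Note: a real engine may later split one element into several frames
      // (continuations) when it breaks across lines or pages.
    }

    // Usage sketch: buildFrameTree(document.documentElement, { children: [] });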
So, with these two data structures in mind, I want to walk through the process of what we do as we display a webpage. I'm actually going to start off at the left edge of this slide. We start off, you know, just reading HTML. Things like parsing are mostly linear time in the length of the thing you're parsing; there's nothing all that complicated in terms of parsing, at least that I'm interested in. Then again, it's not as much the area that I work on, so I'm sure there are other people at Mozilla who could talk about performance aspects of parsing for quite a while. But in some cases the process of parsing a document actually ends up not being linear time, not so much because of the algorithms but because of the way it's done incrementally. In particular, if one element has a very, very large number of children, you end up with some quadratic-time behavior showing up just because of the process of incrementally loading a document and incrementally adding those children. It's a very small quadratic term, because there are some operations--especially when you're laying out the document and displaying it--where you're going to walk over that child list as it incrementally grows every single time, though by and large it's linear time. So, the more interesting stuff about
the process of loading a document is dealing with things like loading style sheets and loading scripts and loading images. What I'm going through now is the case of displaying an HTML document that's just static and doesn't change dynamically, which is the basic case, and then I'm going to go back over it again and talk about how we handle dynamic changes, which are a lot more relevant to the use of HTML in applications. In any case, loading images is sort of straightforward. When you're constructing the DOM tree, you hit a node that's an image, you kick off a network load that starts loading that image, and you just keep going. It's asynchronous. It's not giving you some huge penalty, although there are, of course, issues with the fact that we limit the number of HTTP connections to a given server, so there's some serialization against other resources that you might be loading. Scripts and style sheets are a little bit
more interesting, because scripts have this model that I suspect a lot of you are familiar with, where what's in a script executes at that point. Right where the script is linked, you are executing that script. So you have to wait for the script to load, because the script could document.write a start tag and not an end tag; it could document.write all sorts of things. The programming model used on the web is a synchronous model where the script has to execute at the point it's loaded. So when you're waiting for scripts to load, you're essentially not even parsing the HTML to find other things to load. That was actually true until yesterday on Mozilla trunk: we landed a patch yesterday that finally, sort of, speculatively parses the HTML after the script, on the assumption that the script isn't going to do anything too serious, and starts initiating the network loads for things that it finds linked there. But even so, this is something that you have to be pretty careful with.
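To make the blocking concrete, here's a small hypothetical example of why the parser can't safely run ahead of an external script: the script may rewrite the structure of everything that follows it.

    // blocker.js -- loaded via an external <script src="blocker.js"> mid-page.
    // Because this writes a start tag with no matching end tag, everything the
    // parser would have read after the <script> element now lands inside this div.
    document.write('<div class="injected-wrapper">');
    // The parser therefore has to wait for this script to download and run
    // before it can know what the rest of the document even means.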
Style sheets are sort of an in-between case, because the idea with style sheets is that they can have a drastic effect on the rendering tree but they have no effect on the DOM tree at all. So you really want to wait for style sheets to load before you construct the rendering tree, but you can keep building the DOM tree and potentially executing script even while they're loading. That said, the way that's handled has changed over time in Mozilla. The original way this was implemented is somebody said, "Oh, hmm, we need to do this. Why don't we just reuse the code for script?" So in the old days, we actually did all the things that we do for scripts for style sheets as well. That's changed, such that we will now continue parsing the HTML and continue loading the page while we're waiting for a style sheet to load. But potentially that means you're running scripts, and those scripts could ask for layout information, which means suddenly we need a rendering tree in order to give the script the information it asked for, which can produce a problem that web developers hate so much that they've given it a name: Flash of Unstyled Content. And since we started doing this for style sheets in Mozilla, we've actually started having this type of problem in a few rare cases where a page asks for layout-related information. So, this is all sort of the preface
to building up this content tree. Once we have the content tree, we then have to decide what types of objects to put in the rendering tree. And since the types of objects we put in the rendering tree depend on CSS styles like the display property and some other properties, we actually have to compute the style for each element in order to construct the rendering tree. So, the next thing I'm going to talk about is CSS selector matching, which, from an algorithmic perspective, looks like it ought to be a bit of a performance hotspot, although it actually ends up not being bad in most cases because of the way we optimize it. So the basic idea of CSS selector
matching is that you have a set of elements in the content tree and a set of CSS rules, and for each element you're asking: if this selector matches this element, then we'll use this rule. So fundamentally, you have a problem where you're running this algorithm for every pair of element and selector, which can add up to a lot. So the question is, first of all, how do we optimize that? And second of all, what is it that can make that more or less expensive? I want to briefly step through the un-optimized version of CSS selector matching, just so that it's clear how this works, because it says a few things about what types of CSS selectors can be faster or slower.
So, the things over here on the right side of this slide are examples of pretty simple CSS selectors. The first one represents any div element, the second one represents any element with a class attribute that's "item", the third is any element with an id attribute that's "sidebar", the fourth is any div element with ID "sidebar", and the fifth one represents any p element that is a descendant of a div element. So, this is one of the things I wanted to
talk about here, which is essentially the process of matching. Suppose we were trying to figure out which selectors match the body element. In the un-optimized case, you'd sort of look at each of these and say "No, it's not a div element," "No, it doesn't have class item," "No, it doesn't have ID sidebar." It doesn't really matter which one you check first. To match the last one, the way pretty much all browsers do it is that CSS selectors match from right to left. So, the first thing you look at when you're trying to match the selector is the part to the right of the rightmost combinator, where the space is a combinator that represents "is a descendant of," the greater-than sign represents "is a child of," and so on. So, say you're trying to match this p element here. There are no combinators in the first four selectors; you just look at the one simple selector, which is the unit between combinators, and in those four cases it doesn't match. In the last case, when you try to find out if the p matches, the rightmost simple selector does match, so then you look at the combinator and say, "Well, we want to find an ancestor that's a div." So you look up and say, "Okay, this one matches." So it turns out that selector matches. Whereas with a selector like ul p, if you're trying to see if it matches this p, you start at the p, it matches, then you look for an ancestor, and you end up walking all the way up the tree looking for an ancestor to see if it matches. So, if you're dealing with a deep document tree, you can potentially spend a lot of time even just on a single selector, never mind this problem of multiplying all the elements times all the selectors.
So, some selectors are even worse in that there's backtracking required. For example, with this one here, you're looking for a p that is a descendant of a div that's a child of body. If you're trying to see if this p matches, what the browser will do is see that the p matches the rightmost part. Then it'll find that this div matches the next part, but this div doesn't match the part after that: since the body-div relationship is a child relationship, you have to backtrack and try to match this div against the middle part. That succeeds, but this fails to match body. Backtrack, match this against div, and finally you get a match the third time. So that's the un-optimized case.
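A rough sketch of that right-to-left matching for a single descendant combinator, using the DOM's own matches() as a stand-in for the engine's internal simple-selector check:

    // Illustrative sketch only: the selector "div p" split into
    // ancestorPart "div" and rightmostPart "p".
    function matchesDescendantSelector(element, ancestorPart, rightmostPart) {
      if (!element.matches(rightmostPart)) {
        return false;                 // most candidate rules are rejected right here
      }
      // Otherwise walk up the tree looking for a matching ancestor; on a deep
      // tree this walk is what makes loose selectors like "ul p" expensive.
      for (let node = element.parentElement; node; node = node.parentElement) {
        if (node.matches(ancestorPart)) {
          return true;
        }
      }
      return false;
    }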
Now, the way we avoid the problem of having to match every element against every selector is that we hash the selectors into a bunch of buckets in advance, to filter out the ones that we know aren't going to match. That filtering is done on the rightmost part of the selector, in other words the part to the right of the last combinator. So essentially, if a selector has an ID in that rightmost part, we'll stick it into a hash table for selectors that have an ID, keyed by that ID. If it has a class, we'll stick it into a hash table for selectors that have a class, unless it was already in the hash table for those with an ID. If it has a tag name, we'll stick it in a hash table by tag name, and otherwise we'll just stick it in the list of all the selectors that we couldn't classify. So then, when we want to find the selectors that match, say, this div here, what we'll do is go to the hash of selectors with IDs and pull out the entry for ID equals "sidebar". There's no class on that div, so that hash doesn't matter. We'll then go to the hash of selectors by tag name and pull out the selectors that have div in that rightmost part. We will then combine those lists and only run those selectors. So what this is saying is that in Mozilla, and I think this is also reasonably true in other browser engines, your selectors are going to cause much less of a performance problem if the rightmost part of them is as specific as possible. Because then there won't even be any code run at all to test them against all these other elements that are probably going to fail, but maybe not all that quickly, in the matching algorithm.
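As a sketch of that bucketing (the rule shape with a pre-parsed rightmost part is made up for the example, not the real data structure):

    // Classify each rule by its rightmost simple selector so that most rules
    // never even get tested against a given element.
    function buildRuleHash(rules) {
      const hash = { byId: new Map(), byClass: new Map(), byTag: new Map(), universal: [] };
      const add = (map, key, rule) => {
        if (!map.has(key)) map.set(key, []);
        map.get(key).push(rule);
      };
      for (const rule of rules) {
        const r = rule.rightmost;        // assumed pre-parsed { id, class, tag }
        if (r.id)         add(hash.byId, r.id, rule);
        else if (r.class) add(hash.byClass, r.class, rule);
        else if (r.tag)   add(hash.byTag, r.tag, rule);
        else              hash.universal.push(rule);
      }
      return hash;
    }
    // For <div id="sidebar">, only the rules filed under id "sidebar", under tag
    // "div", and the unclassified ones need to go through full selector matching.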
So in any case, once we have the list of selectors that match, we have a list of CSS rules that match. You take all the declarations, compute a value for every property on every element, and start constructing this rendering tree. There aren't too many interesting performance characteristics to constructing the rendering tree in the static case; it's pretty much "go build objects" and pretty boring. Then once we have a rendering tree, we compute all the positions of those objects, which, like constructing the rendering tree, is a recursive process. Essentially, the process of layout, which we sometimes call reflow at Mozilla, involves assigning coordinates to the rectangle for each of these rendering objects. Traditional document layout algorithms tend to treat widths as inputs and heights as outputs. So it's essentially done as a recursive algorithm where the parent will have some width input: it will compute its own width, tell its children to fit in that width, they'll add up to some amount of height, and then you come back out to the parent, which determines its own height and passes that back up to its own parent. Now, it's not completely true that widths are input and heights are output; there are cases where we use intrinsic widths of content, where essentially widths are output, but that's not too relevant here.
Now, the code that does this is going to vary a lot by frame type, and how it's optimized is going to vary a lot by frame type. Things like blocks and tables are probably optimized reasonably well; unusual things might not be as careful about being efficient. Then once we've computed all these rectangles, we come along and want to actually display something, so we build a display list of all the things that we have to display within a rectangle, and then we essentially paint that display list in back-to-front order using a 2D graphics API. Now, in that painting process there are some things that make it slower. For example, if you have opacity, which is group opacity, you have to paint things into an off-screen surface and then composite that onto the rendering, and so on. But I don't want to go into that too much. Now, that's the one-pass-through, simplified version for the static case. Now, when you're
building applications, you end up potentially dealing with more possibilities here. So, when you're writing script, there's a bunch of different types of dynamic changes that can happen to cause changes in this whole pipeline. One of the simplest is adding and removing elements from the DOM, which is something you can do with DOM APIs. That's pretty common. Basically in that case, you run through this same pattern on the elements you added, the same static pattern, in a pretty straightforward way, so it's not all that interesting in terms of unusual performance characteristics. However, there are a bunch of other types of changes that have different performance characteristics. Essentially, web browsers are, in some sense, CSS-centric, in that they've used the design of CSS as a central part of their architecture. So a lot of changes that affect layout are often indirectly CSS changes: they happen because something causes the computed values of CSS properties to change, which in turn causes the display to change. The simplest example of that kind of change is simply changing the style attribute. If you change element.style on some element, you're pretty clearly changing the computed style of that element. So there are a bunch of different paths I drew here, thinking about what types of content changes there are.
I drew here. So I'm sort of thinking about, you know, what types of content changes. There
are--if you change the computed style of an element, if--sorry, the style attribute of
an element, you're going to change the computed style, there's no way around that. However
there are some other types of changes where you can avoid changing the computed style
but that sometimes change computed style. Like if you change an attribute, like the
class--the class attribute is pretty likely to affect computed style. But there are some
attributes that are more or less likely to change the style. And we have some optimizations
to detect whether or not they will which I'll talk about in a second. Then there sort of
a third class of changes where the changes are actually not completely avoiding the system
at all. One of the interesting ones there is scrolling which scrolling is a pretty optimized
process because it's something users do a lot and it's something that graphics cards
are reasonably good at doing in the common case which is in most cases scrolling down
a few pixels, you can simply tell the graphics card to move everything a few pixels up and
then you manually repaint the little slice at the bottom that appeared. So that's--that
not only avoids dealing with all of the systems except for painting, but it also avoids even
repainting anything but the little region that changed at the bottom. Now, there are
a bunch of cases where that's actually not the case where we have to repaint everything.
Some of the obvious ones are if you used background detachment fixed or position fixed which basically
are a way of creating something that doesn't move when you scroll. Then if you're drawing
something that's a composite of things that do move when you scroll on things that don't,
you have to repaint the whole thing. You can't just move bits on the screen. There's actually
a 3rd case there that's sort of interesting which is when you have overflow on an element
that's--so the CSS overflow property lets you create something that's scrollable inside
a document. If you have overflow on an element that is--that does not--that has a transparent
background and it's on top of something that's not uniform, then we again have to repaint
the whole thing when you scroll. And we've gotten better at detecting some of the optimization
cases there. I think at this point we will actually detect, you know, if you have something
that is--if you're scrolling something that has a transparent background but it's on top
of something that has a uniform background, I think we'll still optimize that but that's
probably something that differs a good bit across browsers and versions as well. To move
away from this--scrolling is sort of an interesting side point because it's something that you
can do programmatically through the DOM. You can change element.scroll top and element.scroll
left which is often a much faster way of doing something that affects the people do by changing
element.style.top and using absolute position in a relative positioning to move things.
When--there are--some of those effects can also be accomplished by scrolling something
programmatically and this could be something with overflow hidden which still can be scrolled
programmatically, and that's sort of a way to bypass this whole pipeline and just deal
with the repainting. So then there's this 3rd set of changes that sometimes cause new--sometimes
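As a hypothetical illustration of that technique (the element and CSS setup are made up):

    // Instead of animating child.style.top (which goes through style recomputation
    // and layout), scroll a clipped container; only repainting is needed.
    const container = document.getElementById('viewport');  // assumed overflow: hidden in CSS
    function panDownBy(pixels) {
      container.scrollTop += pixels;          // bypasses restyle and reflow
    }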
So then there's this third set of changes, which sometimes cause re-computation of style and sometimes don't. These are things like changing attributes. And the classic case that needs to be pretty heavily optimized is what we call event states. The event state that's important in terms of optimization is the hover state, because :hover is a CSS selector that applies to whatever element is underneath the mouse pointer and all of its ancestors. So, in theory, which elements match that selector changes every time the user moves the mouse. The optimization through which we avoid doing this re-computation of style is geared towards that case, so that we don't have to re-compute style every time the user moves the mouse. When an element changes whether or not it's in the hover state, we look at all of the CSS selectors that have :hover somewhere in them--not necessarily in the rightmost part, but any CSS selector that has :hover in any part of the selector. If that part of the selector, all the way to the left end, matches the element, then there might be some style change, because you can have a selector like ":hover p" that applies to any paragraph inside an element that's currently in the hover state. So we need to check not only the rightmost part but every part of the selector, which yields the second guideline for fast CSS selectors: any time you write something like :hover in a selector, or write an attribute selector based on an attribute that changes a lot, it's also worthwhile to have that part of the selector be as specific as possible, even if it's not the rightmost part of the selector.
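For instance (a made-up illustration of that guideline, not an example from the slides), compare two selectors that both style paragraphs under a hovered element:

    // Every time an element's hover state changes, selectors containing :hover
    // are checked against that element.
    const loose = ':hover p';              // the :hover part matches any element,
                                           // so it can trigger restyling almost anywhere
    const tight = 'li.menu-item:hover p';  // the :hover part is specific, so most
                                           // hovered elements are rejected immediately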
So, the reason this is so valuable to optimize is that at that point we can check only one element. Whereas once we decide that we need to go through and re-compute style for an element, that implies that we're also re-computing style for all of its descendants, both because of CSS inheritance--a lot of properties are inherited--and because it's pretty common to have selectors that select based on ancestors. So we just handle all of that in the same code. It turns out that once we decide that we do need to do this restyling, we then coalesce as many restyles as possible. Essentially what this means in the normal case is that we post an event to the main event loop and say, "When this event fires, we'll process all the restyles that have happened between the first one and when the event fires." However, the web has evolved with a synchronous programming model that doesn't let us quite do that, because the expectation of script authors, and of all the pages that they've written that we have to be compatible with, is that changes take effect immediately. So when we do things asynchronously, we can do that as an optimization, but then if a script asks for information that depends on the thing that we're planning to do later, we suddenly have to do it immediately to provide the information the script wanted. And there are a lot of DOM APIs that actually require this information. If you're looking at the computed style for an element, that requires that all the style be up to date, and in fact it requires that the layout be up to date in some cases. If you're asking for various properties, like offsetTop or offsetLeft, that requires that the layout be up to date, which in turn requires that the style be up to date. So there are a lot of things that cause us to flush our queue of all the things that we would like to coalesce. And that poses a potential danger to script authors, because it's pretty easy to write a loop where you're making a change and then reading something that requires that change to be flushed. Whereas if this were split into two loops, you could read all the data that you needed and then make all the changes; you would then have all those changes coalesced and it would be much faster to make all of them than if you force them to all be flushed separately.
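Here's a hypothetical illustration of that pattern; the class name and the sizing rule are made up, but the structure is the point:

    const items = document.querySelectorAll('.row');   // hypothetical elements

    // Slow: each iteration writes a style and then reads layout, forcing the
    // browser to flush the coalesced style/layout work on every pass.
    for (const item of items) {
      item.style.height = '20px';
      console.log(item.offsetTop);        // forces style and layout to be up to date
    }

    // Faster: read everything first, then write everything, so all the style
    // changes can be coalesced into a single restyle/reflow.
    const tops = [];
    for (const item of items) {
      tops.push(item.offsetTop);          // reads batched together
    }
    for (const item of items) {
      item.style.height = '20px';         // writes batched together
    }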
When we re-compute the style for a bunch of elements, we then essentially compare the old style data and the new style data. So we find out, for example, that the CSS display property changed. The display property affects what types of frames we construct, so if the display property changes, we need to go reconstruct frames and then go through the rest of the pipeline. If, say, the width property changes, that doesn't affect what types of rendering objects we construct, but it affects the layout, so we need to go through the pipeline from there. It could also be that, say, the color property changed; the color property doesn't affect the first two steps, so we can jump straight to the third. So when we handle all these things that change CSS properties, depending on the property we'll do a different amount of work to handle that change.
So, then there's the question of how much work these different types of things take. Reconstructing frames: basically, if we reconstruct the frame for an element, we're also reconstructing the frames for all its descendants. That's just an invariant that we maintain; I don't know whether other browsers do that or not. But there's no interesting behavior regarding the depth of the tree, except for a few odd cases where we have to go reconstruct the ancestors--a few really strange cases, like when you have blocks inside of inline elements, where there's enough complicated fix-up that we need to go a lot further up in order to redo the fix-up from the top of the tree. Doing re-layout, or reflow, is a little bit
more interesting, in that a re-layout is always a recursive process running down from the top of the tree, because we have this algorithm where the widths are input and the heights are output. So if there is some change way down in the depth of the tree, it's possible that that change propagates out into different heights all the way back up to the top. There are also potentially some other things that we need to update during layout, like the regions of overflow, which are sort of like a second rectangle. So when we do incremental reflow, there is this aspect that's a function of the depth of the tree. This is a diagram that I stole from a presentation a colleague did six years ago. Essentially, we call these re-layout methods all the way down the tree, and some of them aren't necessarily going to be all that efficient, but at least they aren't going to re-layout all of their children; they're going to re-layout just the child on the path to get to what needs the layout. So the cost of doing re-layout can be pretty heavily affected by the ancestors of that element. For example, if you have an element that's inside a floating element that's got a lot of floating siblings, the state we have to recover for floating elements is pretty substantial, because we need to rebuild state in order to know what areas we can wrap around and what areas not to wrap around. So the cost of a reflow can vary a lot depending on what something is inside, not just on what it is that's being laid out again.
Then the final step is repainting, where essentially what we're doing is invalidating regions--telling the operating system that a region is invalid--and it will then come back to us with a paint event telling us to repaint that region. So there's this hierarchy of CSS properties in terms of which cause more damaging style changes than others, and this can introduce some tradeoffs. For example, if you want to hide an element, there are actually multiple ways to do it. You can change it to display: none, and like I said earlier, making something display: none means we're not going to construct any rendering objects for it. So if you change something to display: none, we're going to destroy all the frames for it, and then if you change it back from display: none to whatever it was before, we have to rebuild all the frames, lay them out over again, and paint everything. If, in turn, you hide something with the visibility property, you don't incur any of those costs, because the visibility property doesn't affect the frame tree and doesn't affect the layout. But you have slightly higher costs in terms of what you're doing every time. In other words, the tradeoff between display and visibility is essentially that with visibility, changes are cheaper, but the overall cost while it's not displayed is higher, because with the visibility property you still have the rendering object, you still have to do all the layout, but then you just don't paint it.
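An illustrative way to see that tradeoff in code (the element is hypothetical):

    const panel = document.getElementById('panel');

    // Cheap to toggle, but the hidden element keeps its frames and keeps
    // participating in layout:
    panel.style.visibility = 'hidden';
    panel.style.visibility = 'visible';

    // Cheaper while hidden (no frames, no layout), but toggling it back means
    // reconstructing frames, re-laying them out, and repainting:
    panel.style.display = 'none';
    panel.style.display = '';            // back to its stylesheet-specified value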
So, I want to go back now and talk about some implications of these things for ways to test. This is one of the things Lindsey asked me about when he asked me to give this talk: people were thinking about what types of things are useful in terms of testing the performance of web pages. Some of that depends on what it is you want to test, but one example is, if you want to figure out what the cost of building the frames and laying them out is for some particular piece of content, you could do something like the following. You could set the element--or maybe it's the body element--to display: none, then get the offsetTop property of some random element in the tree, which will in turn flush all the style changes and flush the layout, so that you've essentially flushed the buffer of what's queued up. Then you get a timestamp, then set the thing that you had set to display: none back to its original value, then again access offsetTop of some random element in order to flush everything. Getting offsetTop will flush all the style changes; it will recreate the frames and lay them out again. And then you can look at another timestamp to see essentially how heavy that chunk of markup is.
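A sketch of that recipe as a function (the names are hypothetical; since timer granularity can be coarse, repeating the measurement helps, as comes up in the Q&A below):

    // Rough measurement of how heavy a chunk of markup is to construct and lay out.
    function measureFrameConstruction(container) {
      const probe = document.body;             // any element works for forcing a flush
      container.style.display = 'none';
      void probe.offsetTop;                    // flush pending style and layout work
      const start = Date.now();
      container.style.display = '';            // restore the original display value
      void probe.offsetTop;                    // forces frame construction + layout
      return Date.now() - start;               // milliseconds for that chunk of markup
    }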
Now, doing that within something that's dynamic could potentially throw in some confounding factors, because by forcing these flushes you're also splitting up things that could potentially be coalesced within a real application. Likewise, something I talked about a little bit earlier is the cost of incremental layout. One of the things that might be interesting to test in terms of layout is how expensive some structure is in terms of its performance effect on re-layout of what's inside of it, because, like I mentioned, the layout process depends on the depth of the tree. So you can do similar things by checking an offsetTop, making some small change, and then seeing whether different structures take shorter or longer amounts of time to handle a re-layout. I'm sure there are lots and lots of other examples here, but those are just a small number, and hopefully I've given you some ideas for other things that you can test. And I'm certainly open to questions about pretty much anything I talked about. Thanks. And I was told if you have questions you should use the microphone.
>> Hello. Well, thanks for the great presentation. I do have a question. A while back you suggested using overflow: hidden and changing the scroll position in order to move things around...
>> BARON: Yes.
>> ...but I've found in the past when I've done that, it causes unrelated elements on the screen to kind of have stuff flash behind them. Do you know if there's a reason or workaround for that?
>> BARON: I don't know. I'd be interested to see a test case for that. It's really not something that should happen. I don't know.
>> Okay. I was just curious. Thank you.
>> Thanks for the great talk. I have a question about absolute positioning and what kind of optimization you do to not reflow the rest of the tree when you move it around, for example.
>> BARON: So, absolute positioning is sort of interesting. The CSS specification defines this concept that it calls the containing block. For normal elements, it's the nearest block-level ancestor. But for absolutely positioned elements, it's the nearest positioned element--whether it's relatively or absolutely positioned--or the viewport if there is no such ancestor. When we build the frame tree, the absolutely positioned element is a child of its containing block. So, for absolutely positioned elements that are positioned relative to the viewport--in other words, if their containing block is the viewport--then in our implementation their parent is the viewport, so there's essentially no structure that you have to delve down through in order to get to them. But if an absolutely positioned element is inside a relatively positioned element that's inside some complicated structure, we actually are going to go all the way down through that structure to the relatively positioned element and then jump from there straight to the absolutely positioned element.
>> All right. Thanks.
>> BARON: Sure.
>> So, you discussed the performance differences
between hiding an object using display: none versus visibility: hidden. That sounds sort of like an implementation detail--or is that behavior somehow part of the standard?
>> BARON: So, in terms of visibility, it sort
of is part of the standard, because visibility is something that can be overridden by descendants. Inside something that's visibility: hidden, you could actually have something that's explicitly visibility: visible, and then it suddenly appears again. So it is effectively part of the standard that elements with visibility: hidden need to be laid out. As far as display, it's not strictly part of the standard, but if something's display: none, you don't know what display value it would have if it weren't none, so if you were to try to lay it out, you wouldn't know what display type to give it, because what type of rendering objects you construct and how you lay it out are a function of its display value.
>> So if you were building a three-column layout for performance, do you pick tables or do you pick floats? It sounds from what you said that you distinguish table elements as being very optimized compared to other things, so I don't know what your [INDISTINCT] is.
>> BARON: Well, tables and floats are both reasonably well optimized, although probably for different cases. When I'm trying to do a layout, I tend not to be worried so much about the performance aspect. In terms of floats versus tables, I worry more about what can actually do the layout I want, because usually there's only one answer that I can come up with, at which point I just run with it. I don't think there's one answer I would give for that. I think it depends on what exactly you're trying to do--what you want to be flexible, what you know the widths of, and so on.
>> Question. Near the end, you were talking
about how, if we were doing timing, we should start the clock, do whatever it is we want to measure, and then stop the clock. I've found that any concept of timers or wall clock is incredibly jittery with regard to browsers. Is there any other measure of work I can use in order to time it?
>> BARON: Not that I know of. I think there have been some improvements to the accuracy of things like Date.now() recently. It used to be very inaccurate on Windows, but I think that's fixed now--that is, in Mozilla on Windows. My solution to timing things when the timers are inaccurate is always to just do it more times.
>> Do you have performance benchmarks for
layout and rendering that you use to test what you've improved?
>> BARON: Yes. We have a bunch of different performance benchmarks that we keep track of. Some of them are just page-loading benchmarks, where we're essentially timing the load of a whole set of pages that were downloaded at some point and which we've archived for that benchmark. We also have some other benchmarks that are testing particular things--a bunch of constructed tests for DOM performance and graphics performance. We also have some benchmarks for application performance as well. But those are probably the key benchmarks that we're looking at all the time. People also look at specific test cases, but not for tracking purposes.
>> Coming back to Lindsey's question about tables versus floats for horizontal layouts. What if you have deep nesting? Like, if you want to create a general structure for laying things out horizontally, is deep nesting of tables generally more expensive than deep nesting of floats?
>> BARON: I can't think off the top of my head
why one of them is going to be worse than the other immediately. One factor is going to be that in some cases tables require more intrinsic width computation. Any piece of content has two intrinsic widths. The simple way to think about it is using a paragraph as an example: if you lay out all the text in a paragraph on one line, that's the larger of the intrinsic widths, and the smaller of the two is the width of the longest word in the paragraph; and then you can extrapolate those intrinsic widths outwards. Table cells have a rule that they will never go smaller than the smaller of those intrinsic widths, which means they always have to compute that one, even if you've assigned them a width. They will still compute it and make sure that they don't go below it, unless you're using fixed table layout, in which case you don't have to deal with that. Now, with deep nesting of tables, depending on the browser--basically, whenever you have deep nesting of things that require intrinsic width computation, some browsers are going to respond pretty badly; in particular, Mozilla before Firefox 3, so Firefox 2 or earlier, and I suspect Internet Explorer also. That's because there are two fundamentally different designs for how to do this intrinsic width calculation, and between Firefox 2 and 3, Gecko changed from one to the other. So now there's not as much of a penalty for dealing with deeply nested tables, because we've essentially separated the two. What we did back in Firefox 2 was that intrinsic width computation was essentially also treated as a layout pass, where we would say "do a layout at some arbitrary width"--at infinite width, essentially. And the process of doing that would destroy the information that you had in the normal layout. So if you had a series of deeply nested things that all needed intrinsic width information, you could get into trouble, essentially throwing away the layout to compute an intrinsic width and then having to rebuild it multiple times as you recursed down and up the tree in order to lay things out. So that's one reason you could get in trouble with deeply nested structures, although that should be much less of a problem in Firefox now and shouldn't be a problem in WebKit or Opera.
>> You had also mentioned that dynamically altering an element that has a float element as an ancestor is potentially expensive, because more contextual information has to be recomputed.
>> BARON: Well, it's more that it's really any change where there's a bunch of floats somewhere along that path. Essentially, when we're doing layout on an element because one of its descendants needs to be laid out again, we still have to look at each one of its children and say, "Does this child need to be laid out? Does this child need to be laid out?" and so on. And if that child is a float, then there's a bit of extra information that we deal with. So if something along the path has a lot of float children, that might be a problem. But some of those problems are things that only show up if you're, say, using single-pixel divs to build a 1,000 by 1,000 image. People who do that tend to find all these performance problems in browsers that nobody else finds. So some of those things might only be things you hit with very large numbers of children. Anyway, thanks.