Stephen Woods: Creating Responsive HTML5 Touch Interfaces

Uploaded by yuilibrary on 19.02.2012

>> ALLEN RABINOVICH: Please welcome another frontend extraordinaire from Flickr: Stephen Woods.
>> AUDIENCE MEMBER: Yay Stephen! I love you!
>> STEPHEN WOODS: I love you too. So there's a really bizarre delay going on. It's going
to probably make it sound like I'm stuttering, so my apologies. I'm here today to talk about
HTML5 touch interfaces, which is, I guess, on everyone's mind right now. There's an ongoing
debate about are we doing native or are we doing web pages. I'm not going to talk about
it. I do web pages; that's my job, so I like them.
This is who I am. I'm Stephen Woods. I've been at Flickr for two years, I've been at
Yahoo! for five and a half. It's just horrible, you don't want to work here. I'm kidding.
This is the iPad touch experience we've created for Flickr. If you go to a Flickr photo page
on your iPad or your Android tablet and you click the magnifying glass, you're going to
get kind of a view like this. You can swipe on it, you can press play and just have an
animation go, and you can pinch to zoom on iOS devices. This is all done with JavaScript
and HTML; we're not using native pinch to zoom. Please check it out when you get a chance
because it is, in my opinion, having done it, it's pretty smooth.
I've been working on desktops for a long time, and I think a lot of you probably have as
well. As frontends, what we worry about all the time is the browser technology. We write
all this stuff, and even for the coolest stuff we need to make sure that we can support every
kind of browser, we need to have a fallback using the filter for IE8. On mobile we don't
have to worry about any of that stuff. On mobile we get to use WebKit almost all the
time. Unfortunately we have to worry about something worse: devices.
When there are only five browsers, it's easy to test in five browsers. On mobile we have...
I mean, I can just look out in the audience right now and I see iPads, I see Android phones,
I see the big Android phone, the little Android phone, Kindle Fires. They're all a different
screen size, they're all a different CPU, but they all use the same browser engine.
I'm not going to spend too much time on screen sizes, because this has been covered really
well here by Ethan Marcotte. I hope I said his name right. This presentation is up on
SlideShare, so you can get the link later. Media queries, break points, liquid layouts,
kind of solve this problem of screen size.
But there's another problem that's worse. That's an iPhone 3GS. 256 megabytes of RAM,
and a 271 Geekbench score. It's an iMac, it's a Bondi Blue iMac. We're trying to make awesome
interfaces on a 10 year old computer. It may look like a cell phone, but really you have
to think about it as a bad computer. Now it does have a 3dfx Voodoo card in it,
but otherwise it's a crappy computer. They're crappy computers with good video cards, but
it's still possible to make an experience that seems not crappy. If you use this, you'll
feel like it's not a bad experience. We do that through a lot of tricks, but the biggest
one is perceived performance. Again, this echo is interesting.
I guess five years or so ago I saw a talk about TiVo. The early TiVo devices were
really, really, really slow. You would click a button and wait a minute for your show to
play, but no one ever complained about how slow the TiVo was. That's because the moment
you pushed the button, you heard 'boop boop', immediate feedback, so it felt like you were
being listened to. I'm asking a machine to do something and I get a response. On the
desktop we solved this problem really easily. I press delete on a comment on Flickr and
I get the little spinning balls telling me the interface heard what I asked and it's
working on it.
Now, touch devices are a different problem. Like a lot of people in my generation, my
first experience with multi-touch was on Star Trek: The Next Generation. You remember the
transporter room in Star Trek: The Next Generation? If not, if you don't remember this by the
way, you're not real nerds and you're not allowed to be here. Chief O'Brien did this:
he took all three fingers and pulled down on a control. If you saw over his shoulder,
there was a little widget that's moving with his fingers. It's giving him immediate feedback
as to what he's doing. He's pulling the transporter things, I don't know what those do, but it
feels to him like he's using the old original series machine that was mixer levers or whatever
they were.
The feedback has to be continuous. The same thing the TiVo did, when you press the button
and it makes a little beep, you need to give the user feedback from the touch events, but
it has to be continuous feedback or they think the interface died. If you've ever done this,
if you've ever used an app that's bad, or anything else on your touch devices, you'll
be scrolling down, the scroll stops, and you think you did something wrong, the device
crashed. It certainly doesn't feel like it's just loading. It's not doing something natural.
I know that my wife told me that when she's using her iPad and it stops scrolling, she
thinks that her finger did something wrong, like her finger is broken or something. Regardless,
it doesn't feel right.
This brings me to this: conventions. If you remember, if you're my age, if you're in your
thirties, you remember this when you were a kid. I grew up with this stuff; I know what
a close box is, what a scroll is. They're conventions of clicky interfaces, desktop
interfaces, and just second nature. Mobile has conventions too, and those conventions
are part of what make an interface feel responsive. That's my cat. He's very friendly.
The slide to unlock. Now, if you remember when the iPhone first came out, no one had
ever done it, so that's why it had to say 'slide to unlock'. I recently got a Kindle
Fire. It has the same widget, but it doesn't say what you're supposed to do, because there's
the convention now. People see this on a touch interface, and they know oh, I'm supposed
to slide to unlock. So conventions have already built up around touch interfaces, and when
you break those conventions, again, it feels like the interface is not working properly.
So on a webpage, how do we handle it? We get three big events: touchstart, touchmove, touchend.
Touchstart fires once, touchend fires once. This is really all you need to do almost every
possible interaction or gesture. On iOS you get the touches array. It's the same in Android,
but you only get one touch. Ice Cream Sandwich theoretically has more, but I haven't actually
seen it in action yet. On iOS you get eleven touch points; I'm not sure what the eleventh
is for, but you get it. Each touch gives you position information, and sometimes it gives
you scale and rotation. There's a synthetic gesture; I'll talk about that in a minute.
But even if you just pinch on a regular array, those touches arrays give you some information
about scale, which we'll use in a minute for pinch to zoom.
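A minimal sketch of handling the three events and reading positions out of the touches list. The helper `createTouchTracker` and the shape of its `state` object are illustrative names, not something shown in the talk; `touches`, `pageX` and `pageY` are the standard touch event properties.

```javascript
// Hypothetical helper: follows the first finger across touchstart/touchmove/touchend.
function createTouchTracker() {
  const state = { active: false, startX: 0, startY: 0, x: 0, y: 0, fingers: 0 };

  function read(event) {
    const first = event.touches[0];        // first touch point in the list
    state.fingers = event.touches.length;  // how many fingers are down
    state.x = first.pageX;
    state.y = first.pageY;
  }

  return {
    state,
    onStart(event) {
      read(event);
      state.active = true;
      state.startX = state.x;
      state.startY = state.y;
    },
    onMove(event) {
      if (state.active) read(event);
    },
    onEnd(event) {
      // On touchend, `touches` lists only the fingers still on the screen.
      state.fingers = event.touches.length;
      if (state.fingers === 0) state.active = false;
    },
  };
}

// Wiring, only when a DOM is present (the sketch also runs outside a browser).
if (typeof document !== "undefined") {
  const tracker = createTouchTracker();
  document.addEventListener("touchstart", tracker.onStart, false);
  document.addEventListener("touchmove", tracker.onMove, false);
  document.addEventListener("touchend", tracker.onEnd, false);
}
```

The handlers close over `state` rather than using `this`, so they can be passed straight to `addEventListener`.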
On iOS there are gesture events. I don't recommend using those because they only work on iOS,
and if you look around this room there's a lot of Android devices out there and if you're
using iOS only stuff you're probably making a mistake. The gesturestart, gesturechange,
gestureend -- it's really handy. You can do pinch to zoom easier, you can do rotation
easier. Just forget it exists.
The best source for information about these events is... Apple provides a complete documentation
of everything you need to know about touch events. However, you will have to use your
brain a little bit to figure out where Android's not going to work or where your Android problems
are going to happen.
That said, how do I make it work? How do I make it feel not crappy? This first tip: prioritize
the user feedback. There's nothing more important than while you're making a gesture, telling
the user you heard the gesture. So you have to use hardware acceleration, and you need
to manage your memory. 256 megabytes of RAM, remember -- that's like nothing. Probably
most of us have at least 4 GB on our desktop. So what does it mean to prioritize user feedback?
Don't do anything when someone is doing a gesture. Don't load stuff, don't calculate,
because everything is expensive and in JavaScript everything blocks, with some exceptions; I'll
talk about that in a minute.
Now, treat the DOM as write-only. This has been really beneficial at Flickr. You do your
own math, and when at all possible, CSS transitions, because then you're offloading the work into
a separate thread and it's not blocking the UI.
All right. So what do I mean when I say write-only DOM? The DOM touches are really expensive,
no matter what. Anytime you have to ask the DOM a question, or set something to the DOM,
work has to be done and it's going to be blocking execution of the JS. You already know where
everything on the page is, you have a device that uses WebKit, so you know a pixel's a
pixel. You've kept track of where everything is, you don't have to keep checking. You don't
need to feel insecure or make sure you've got everything right. The other thing is use
matrix transforms. I'll tell you more about that later, but it's a really great way to
handle it.
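A minimal sketch of the write-only idea: the position lives in a JavaScript variable and the element is only ever written to. `createWriteOnlyMover` is a hypothetical name, and the unprefixed `transform` property is an assumption for current browsers (WebKit at the time of the talk needed the prefixed `webkitTransform`).

```javascript
// Hypothetical wrapper: take the starting position once, then never read the DOM again.
function createWriteOnlyMover(el, initialX) {
  let x = initialX; // our own record of where the element is

  return {
    moveBy(dx) {
      x += dx; // do the math in JS...
      el.style.transform = "translate3d(" + x + "px, 0, 0)"; // ...then write once
      return x;
    },
    position() {
      return x; // answered from memory, no DOM read
    },
  };
}
```

In a page this might be created as `createWriteOnlyMover(node, 0)` once at the start of a session, with every later question about position answered by `position()` instead of by the DOM.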
The swipe gesture. This is the most common gesture you see on a mobile interface. If
your web page supports touch, it needs to support swipes. One touch I really like, no
pun intended: on the New York Times website they have these featured photos, and they
don't say anything on them about swipeable, but they are swipeable. You can touch and
swipe because it's a natural expectation; if I see a photo or if I see any kind of content,
I think I can interact with it with swipes. The swipe gesture's easy, right? It's distance
equals my current position minus my start position, applied to translate3D. I used translate3D
because it's hardware accelerated. That means you're getting that off of the CPU, you need
the CPU for doing JavaScript.
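The swipe arithmetic described here can be sketched as below. The function names are illustrative, only horizontal movement is handled, and the unprefixed `transform` property is an assumption for current browsers.

```javascript
// distance = current position - start position
function swipeDistance(startX, currentX) {
  return currentX - startX;
}

// Apply the distance through translate3d so the move is hardware accelerated.
function swipeTransform(startX, currentX) {
  return "translate3d(" + swipeDistance(startX, currentX) + "px, 0, 0)";
}

// Hypothetical wiring: the element follows the finger while it moves.
function attachSwipe(el) {
  let startX = 0;
  el.addEventListener("touchstart", function (event) {
    startX = event.touches[0].pageX;
  }, false);
  el.addEventListener("touchmove", function (event) {
    el.style.transform = swipeTransform(startX, event.touches[0].pageX);
  }, false);
}
```

Nothing else happens inside the `touchmove` handler: one subtraction and one style write per event.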
Another important trick is what I call a snap back/snap forward. If you recall on the iPhone
or iOS device, when you scroll with your finger, if you hit the bottom of the page, the page
doesn't just stop, it keeps going. You let go and it bounces back. That's what I was
talking about before, the feedback telling the user that you heard what they were saying.
So if I scroll down and the scroll just stopped, I wouldn't know if the page crashed, if my
finger broke, something didn't work, or I wouldn't know what to think. But by bouncing
back you tell the user oh, I heard you scroll, but you're just at the bottom of the page.
The same thing you need to do. When you swipe, if that gesture's happening, your element
has to be moving. Then you schedule a snap. You know how far is too far, you know how
far is not far enough, so when they let go, you snap forward and use a transition for...
iOS uses physics for the scroll; they actually calculate momentum and have it pretty nice.
You don't really need to do that. It's better, but it's expensive; you have to calculate
it in JS, you have to do the math. Just schedule a transition with easing. In my experience
it's close enough, and it'll do the job.
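A rough sketch of the snap decision, using a CSS transition with easing rather than momentum physics. The `threshold` parameter, the 300ms duration and the `ease-out` curve are assumed tuning values, not numbers from the talk.

```javascript
// Decide where the element should settle once the finger lifts.
// `threshold` is the assumed "far enough" distance in pixels.
function snapTarget(distance, slideWidth, threshold) {
  if (distance <= -threshold) return -slideWidth; // far enough left: snap forward
  if (distance >= threshold) return slideWidth;   // far enough right: snap to previous
  return 0;                                       // not far enough: snap back
}

// Schedule the snap as a transition so the browser animates it.
function snap(el, distance, slideWidth, threshold) {
  const target = snapTarget(distance, slideWidth, threshold);
  el.style.transition = "transform 300ms ease-out";
  el.style.transform = "translate3d(" + target + "px, 0, 0)";
  return target;
}
```

Calling `snap` from the `touchend` handler keeps the element moving after the finger lifts, so the gesture always ends in visible feedback.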
Another word about scrolling. Use native scrolling if you can, because there are a lot of people
out there who, to overcome some interface things, have implemented scrolling themselves.
But scrolling is the most fundamental interaction you do with a device, so if you are trying
to make scrolling feel right, you're up against a real... You have to crawl up a long way
because the user knows intuitively what scrolling is going to be like. These days you can use
-webkit-overflow-scrolling. It's pretty nice. It doesn't quite work; it works well enough.
It doesn't bounce the whole way, so what happens is it bounces the entire page. Libraries like
Scrollability and iScroll 4, they'll handle that bouncing, but use native when available.
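As a small sketch, opting a container into native momentum scrolling from script might look like this. `enableNativeScrolling` is a hypothetical helper; `webkitOverflowScrolling` is the script name of the `-webkit-overflow-scrolling` property, which is non-standard and may be ignored by newer browsers.

```javascript
// Hypothetical helper: make a container scroll natively with momentum on iOS WebKit.
function enableNativeScrolling(el) {
  el.style.overflowY = "scroll";               // the container itself must scroll
  el.style.webkitOverflowScrolling = "touch";  // -webkit-overflow-scrolling: touch
  return el;
}
```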
One of the things everyone wants to be able to do is pinch to zoom. Pinch to zoom is,
like I said, like the slide to unlock, like the swipe, a fundamental interaction with
the iOS device. Users come to your web page and the first thing they do, if they see something
they can't see, is they start unpinching. With the Flickr website, we didn't support
pinch zoom for a long time, and our users were constantly complaining because they saw
a small photo, they wanted it bigger, and pinch to zoom didn't work. We had broken the...
Now, why can't you do native pinch to zoom? People ask this all the time about Flickr:
why don't you just enable pinch to zoom? Because native pinch to zoom is for zooming on a web
page, not zooming in on a part of a web page. There's no API, there's no way to control
it, so the user zooms in and they lose the entire interface.
So how do you make pinch to zoom? I'm sorry; this is totally difficult and the more I think
about it, the more difficult it is. But it's possible. You can do it, it's just going to
require some thinking. Use matrix transforms. That keeps you off the DOM as much as possible;
you can queue up all the different changes you need to make, make those transformations to
the matrix, and then apply them. Anyone who talks about matrix transforms throws this
up, so there you go.
This is actually not that useful to you, because you don't need to know the math. If you're
not doing rotations, this is all you need to know. Those are scale, those are translate.
The actual facts of how matrix mathematics work, what is happening when a transform is
being applied, it's really interesting. You can do cool stuff. Most of what you need to
do you don't need to know about. For 3D, this is the yellow ones, those are scale, and green
is translate. That's all you need to know. The rest, you can just let it go -- unless
you want to, then you can do some neat stuff.
The cool thing about transforms, though, is it keeps whatever complex state you figured
out, and for pinch to zoom it's really important, because you have multiple transforms. Rather
than applying them one at a time -- remember, each time you touch the DOM it's not free
-- this allows you to queue up the transforms, apply them in one go, and then move on.
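A minimal sketch of queuing transforms in a matrix and writing the result once. It uses the 2D CSS form `matrix(a, b, c, d, tx, ty)`, where, without rotation, `a` and `d` are the scale and `tx`, `ty` are the translate. The helper names are illustrative.

```javascript
// A 2D transform stored as the six numbers of CSS matrix(a, b, c, d, e, f).
function identity() { return { a: 1, b: 0, c: 0, d: 1, e: 0, f: 0 }; }
function scaling(s) { return { a: s, b: 0, c: 0, d: s, e: 0, f: 0 }; }
function translation(x, y) { return { a: 1, b: 0, c: 0, d: 1, e: x, f: y }; }

// Matrix product m * n: the combined transform applies n first, then m.
function multiply(m, n) {
  return {
    a: m.a * n.a + m.c * n.b,
    b: m.b * n.a + m.d * n.b,
    c: m.a * n.c + m.c * n.d,
    d: m.b * n.c + m.d * n.d,
    e: m.a * n.e + m.c * n.f + m.e,
    f: m.b * n.e + m.d * n.f + m.f,
  };
}

function toCss(m) {
  return "matrix(" + [m.a, m.b, m.c, m.d, m.e, m.f].join(", ") + ")";
}

// Queue up several changes in JS, combine them, then touch the DOM once.
const queued = [translation(10, 20), scaling(2)];
const combined = queued.reduce(multiply, identity());
const cssValue = toCss(combined); // a single value to write to the element's transform
```

Order matters: `translation(10, 20)` followed by `scaling(2)` in the queue matches the CSS value `translate(10px, 20px) scale(2)`.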
OK, so pinch to zoom. The hard part about pinch to zoom is understanding what's actually
happening. The first thing you do is determine the center of the touch points, determine
the scale factor, and scale the element. It's a lot, right? It's easier to look at; I've
found it's easier to explain if you can actually see it. If pinch to zoom works the way the
demos work, it would be like this. The center point is the object, the little star. You
pinch with your finger, and if you just apply the scale value to the image, or the element,
it's going to scale out from the center. That feels wrong, right, because the experience
of pinch to zoom is like I'm stretching the image. But this approach makes it look like
I'm turning a dial or something, I'm not stretching the image.
This is what it's supposed to be like. It's supposed to scale around the center of the
touch point. How do you do that? It's pretty simple. We scale and we translate, so you
have to figure out how much to translate by to keep the center of the scale point in the
same place. The math to do this takes a while to derive; I'm not going to explain it because
I did the work for you. You need to set the transform origin to the top left. Here's your
magic formula -- this gives you the number you want. It took me a long time to figure
this out. There are a lot of people who will try and explain the math. I've found that
you probably don't need to understand it, you just need to use it.
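The formula from the slide is not in this transcript, so the sketch below is an independent derivation, not the speaker's code. It assumes `transform-origin` is the top left, the element's transform is `matrix(scale, 0, 0, scale, x, y)`, and the pinch center is measured relative to the element's untransformed top-left corner. A point `q` inside the element then appears at `scale * q + offset`; keeping the point under the pinch center fixed while the scale changes gives `newOffset = center - (center - offset) * (newScale / scale)`.

```javascript
// Midpoint between two touch points.
function touchCenter(t1, t2) {
  return { x: (t1.pageX + t2.pageX) / 2, y: (t1.pageY + t2.pageY) / 2 };
}

// Distance between two touch points; scale factor = current distance / start distance.
function touchDistance(t1, t2) {
  return Math.hypot(t2.pageX - t1.pageX, t2.pageY - t1.pageY);
}

// Scale to `newScale` while keeping the content under `center` in place.
// `state` is { scale, x, y } with the transform origin at the top left.
function zoomAround(state, center, newScale) {
  const ratio = newScale / state.scale;
  return {
    scale: newScale,
    x: center.x - (center.x - state.x) * ratio,
    y: center.y - (center.y - state.y) * ratio,
  };
}

// One value to write to the element: scale and translate together.
function pinchCss(state) {
  return "matrix(" + [state.scale, 0, 0, state.scale, state.x, state.y].join(", ") + ")";
}
```

For example, doubling the scale of an untransformed element around the point (100, 50) moves its top-left corner to (-100, -50), so the content at (100, 50) stays under the fingers.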
Like all good presentations, I have some pro tips. The virtual pixels on high DPI Android
devices and actually on all iOS devices, what you think is a pixel is sort of a nebulous
number that's kind of derived from the pixels, but it's not device pixels. When you say oh,
I want to go to 321 on the device, it's giving you a number based on whatever you set the
viewport width in the meta tag. So it's not really pixels, it's just a grid reference. Now, in
order to move things by pixels, it's going to have to round, so you're trusting the device
to do rounding. That's why moving the transform origin doesn't work, because if you move the
transform origin, which some of you might have thought is natural and the easiest way
to change the center of scale, there's only so many pixels it can move it by. So the more
you scale, the more it's rounding and the less accurate that center becomes. As you
scale out, it starts to slip around.
Let's talk about what we did at Flickr. The Flickr touch lightbox, it took a lot of time
but it's actually really simple. This is what it looks like. It's Flickr, so it's cats.
It's really like that on the device, by the way, it is that fast, and that's thanks to
translate3D. So it's simple, it's like any carousel. I'm keeping three nodes open: the
next, the previous, and the one you're looking at. I only move what you can see, which seems
pretty obvious; it took me a long time to realize. When you start to swipe, I move two
of the images. So if you're swiping to the left, I move the center and the next, or to
the right, the opposite. And I move them separately because it turned out that moving them as
one element doesn't get you anything in your performance. I tried it out. I think the fact
is it still has to do the work.
There's an event listener on the top for all events, because you don't want to have the
user have to worry about where they're swiping, especially if they're swiping between slides;
they just want to be able to grab anywhere. I use translate3D, and I do everything when
you stop moving. When the user is swiping, nothing is happening. I set, like, a semaphore
so I can make sure that if I think something should happen, I can queue it up for later and
wait to do the operations after the gesture is done, because remember, the most important
thing is user feedback. If you stop -- and you're going to stop if you touch the DOM
-- you will make it feel slow. Even if it's fast, even if I'm preloading images while
they're swiping, oh my gosh, that's so great, it feels bad because preloading isn't free,
the device has to do work.
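One possible shape for the semaphore and the queued-up work, as a sketch only. `createGestureGate` and its method names are hypothetical and are not taken from the Flickr code.

```javascript
// Hypothetical gate: while a gesture is in progress, work is queued instead of run.
function createGestureGate() {
  let gesturing = false; // the "semaphore"
  const pending = [];    // tasks waiting for the gesture to finish

  return {
    begin() {
      gesturing = true;  // call from touchstart
    },
    run(task) {
      if (gesturing) pending.push(task); // defer: user feedback comes first
      else task();
    },
    end() {
      gesturing = false; // call from touchend, after scheduling the snap
      while (pending.length) pending.shift()();
    },
  };
}
```

Preloading the next image, for instance, would go through `gate.run(...)` so that it never competes with moving the swiped element.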
Aggressive pruning. I think as frontend developers, we get in the habit of letting the browser
do garbage collection. We believe the browser can handle memory, that we don't have to worry
about memory. 256 megabytes of RAM, so you can't have anything. Now, one thing I discovered
-- and I have another slide for this, but thinking about it now -- when you offload
things to the hardware acceleration, you say translate3D, when it runs out of memory or
the textures, it just crashes on the iOS device. It doesn't swap, because where is it going
to swap? It just crashes. The way I handle that is aggressively pruning. I make sure there
are only three nodes other than the interface there at all times.
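A sketch of the pruning bookkeeping under some assumptions: `live` is a `Map` from photo index to its node, and `removeNode` is whatever detaches a node from the page. Both names are hypothetical.

```javascript
// Indices that should have a node for the photo at `index`: previous, current, next.
function indicesToKeep(index, total) {
  return [index - 1, index, index + 1].filter(function (i) {
    return i >= 0 && i < total;
  });
}

// Drop every live node that is not one of the three kept ones.
function pruneNodes(live, index, total, removeNode) {
  const keep = new Set(indicesToKeep(index, total));
  Array.from(live.keys()).forEach(function (i) {
    if (!keep.has(i)) {
      removeNode(live.get(i)); // e.g. node.parentNode.removeChild(node) in a page
      live.delete(i);
    }
  });
  return live;
}
```

Running this after each completed swipe, once the gesture has ended, keeps the node count fixed no matter how far the user has swiped.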
I clean off the CSS transforms. Once the transform's done, I take it off. Then it's not on the
hardware anymore, I just say transform equals null, and then it's clean.
Write-only DOM. I read it once at the beginning of the session and then I keep track of what's
going on, because there's really no point in reading; you already know what's happening.
Again, like I said, I don't do anything. When the swiping... The only job of the JavaScript
is to move the swiped element.
A couple of frustrating limitations. The retina screen is huge and device memory is small.
I've found that Apple's trying to optimize this a lot. They're trying to make sure you
don't crash the device. But that means that pinch to zoom only gets you so much, because
you can only make... I forget what it is, now. I think it's 1024 on the side? It's 3
megapixels anyway, for a JPEG, maximum, that it will display. If the image is bigger than
that, it downsamples it. So you're pinching to zoom but the image was downsampled, and
you don't get to look into it. The way I handle that is to tile, like a map, but we haven't
done that at Flickr yet.
Hardware acceleration is crashing. It doesn't do anything but crash, it doesn't seem to
slow down, it just crashes. There are tons of little places in iOS where it's trying
to automatically optimize. It's trying to outthink you as a developer because it thinks
you're an idiot, so it's trying to help the user to make it fast but it's optimizing in
ways that might make it frustrating for you as a developer. I just found lots of places
with that, and it was really frustrating.
I guess I'm going fast; I must be nervous or something. But here is the...
>> AUDIENCE MEMBER: Love you, Stephen!
>> STEPHEN WOODS: You love me? I'm glad you love me. Here's the SlideShare link. Does
anyone have any questions?
[inaudible question]
OK, there are no tools. OK, he wanted to know what kind of tools I use to monitor memory.
The tool I use is the browser crashed. That's the trick. The tooling isn't great. The simulators
don't work -- so they work, but they're not the same so you can't trust them. If you're
developing on Android or iOS, you need a device in your hand because the simulator... I mean,
it literally is not the same thing. I guess what I did is I would push it as far as it
would go, watch it crash, and then start rolling it back until it worked. I mean, we're all
so used to using these great tools like the Chrome developer tool, which does work on
iOS. I can't remember what it's called. There's a proxy you can use that will actually run
the WebKit developer tools on iOS, which is really handy, but it still doesn't tell you
when the memory card's going to barf and crash. You just have to try it.
>> TED DRAKE [offscreen]: Hi. For those who don't know I'm with the Accessibility Lab
here at Yahoo!, and whenever you use a custom touch gesture with JavaScript, it fails as
soon as you turn on a screen reader. We haven't found the solution for this yet, so all that
we ask until we do find that perfect solution is make sure you have backup buttons. Don't
make the only way someone can do something on your site a custom swipe. If you have an
alternate set of buttons or something like that, it'll work with the screen readers for
now, but that is one thing we are worried about and you can see they have the newer
and the older buttons at the top.
>> STEPHEN WOODS: I'd like to mention, too, from accessibility, I didn't mention the images
are divs. I found out that image tags are way slower than divs in terms of performance,
and I don't know why, it just is. But what that did is it didn't tell the screenreaders
that there was an image. They had a blog post on YDN about this, about how we used the ARIA role
image on the div to tell the screenreader that there was an image in the interface.
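A small sketch of marking a div as an image for screen readers. The attribute value for the ARIA image role is `img`; the `aria-label` text and the helper name `markAsImage` are assumptions added for illustration, since the talk does not say how the image was described.

```javascript
// Hypothetical helper: expose a div that shows a photo as an image to assistive technology.
function markAsImage(div, description) {
  div.setAttribute("role", "img");             // ARIA image role
  div.setAttribute("aria-label", description); // assumed: a text alternative for the photo
  return div;
}
```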
Other questions? You want to get onto the sound.
>> ALLEN RABINOVICH: All right, it looks like we're done.