Bufferbloat: Dark Buffers in the Internet


Uploaded by GoogleTechTalks on 01.06.2011

Transcript:
>>
Let me turn the proceedings over to Professor Cerf; doctor, professor…
>> CERF: Hairless doctor. Hairless doctor. Perfect timing. Thanks.
>> GETTYS: How are you? >> CERF: Good to see you.
>> GETTYS: Good to see you. >> CERF: Well, good afternoon, everyone. My
name is Vint Cerf. I'm your chief Internet evangelist, in case you didn't know that.
So may all of your packets land in the right bit bucket. It's a real pleasure for me to
introduce Jim Gettys who has an interesting and varied background. He used to be an astronomer.
But he's been corrupted by computers like many of us have. And as a consequence, he
joined Digital Equipment Corporation and he worked on something called Project Athena,
not the one that you know of currently, but another one many years ago in a galaxy far,
far away. But in the course of that work, he was one of the creators of the X Window
System, something which, if you don't know about, you should learn about. And the
reason is that a lot of what we do at Google is, in a sense, X Windows on steroids,
if you think of, you know, HTML5 as being a way of producing a visible result
from some remote processing system, which, of course, is our Cloud paradigm. So his background
and experience is very much in keeping with some of the philosophy that we have here.
He was the editor of the HTTP/1.1 spec, which had a very important effect on our business
because it lets you have one server with many, many different websites, whereas before that
didn't work out very well. He worked on Linux for handheld PDAs, something which would resonate
with our Android experiences. And after that, he became, among other things, the Vice President
of One Laptop per Child, which I think some of you will know we were involved in to some
degree in helping to bring to fruition. So, he ended up at Bell Labs. And he says
he's supposed to be working on immersive teleconferencing. But one of the things that this triggers is
a desire to understand why the network doesn't support accurate and adequate teleconferencing.
And it is in consequence of trying to use his home network to do this sort of immersive
two-way interactive collaboration that he encountered the famous, now famous Bufferbloat
problem. And so, I think this--even if there is a debate on the conclusions that Jim has
reached on this subject, it's important to know something about the symptoms which he's
uncovered because it may have a severe impact on our own ability to deliver timely services.
And we all understand that Google is all about low latency. And so anything that gets in
the way of low latency will get under Larry Page's skin, and that's not where any of us want to
be. So, if this is an issue that we need to do something about, Jim is the guy to help
us understand it. So, Jim, I turn it over to you and I thank you very much for taking
the time to join us today.
>> GETTYS: This really started as a really
personal history. I really need to acknowledge many, many people
who've helped along the way. And, in fact, some of you have helped and helped provide
pieces of this puzzle. My apologies if I've overlooked your contributions. And we'll take
it from there. So, I've been assembling a puzzle and the way I've characterized it is
something that many of you may have heard, "The Internet is slow today, daddy." I know
I did with my family again and again. And I would go try to debug my network and the
problem would typically go away. Although a number of times I was able to get it to
go on long enough that I was making support calls with my ISP and all that sort of stuff.
So about a year ago, I was trying to prove that the empty Blue Box--a few of you
know what that is in this audience--needed to be thrown away. So I was playing with it
and doing simple-minded performance tests, but also monitoring my latency. And
I saw this horrible behavior of one to two seconds of latency with very rapidly varying
jitter. And since I tend to be a systematic type, I then tried the same thing without
the Blue Box and I got the same result, and this was bizarre. At the time I was busy,
so I had to put it on hold for a couple of months. And I got back to it late last June or early
July. I was trying to figure out what was going on. In the intervening time, for the third
time in the last three years, lightning had struck near my house and all of my existing
home network kit had been blown up again. And that actually plays into the story because
I tended to have the latest and greatest hardware, whether it be Ethernet switches
or cable modems or the like, which also made it more entertaining to try to debug. So, I want to
take us back a little bit in history for a moment to the 1980s and '90s. In that era,
those of you who lived through it with me, remember that we had really severe congestion
problems all over the Internet and unacceptable latencies. There was tons of research done. There
was a whole set of algorithms that go under the rubric of AQM, Active Queue Management,
that were developed at the time. The most famous and widely deployed is
one called RED, which will play into this later. But in any case, by the
late 1990s, things seemed to ease, partly because a lot more fiber got into the
ground. So, people actually caught up with the demand at least temporarily. And RED was
deployed. People were turning it on. And so, in fact, the congestion problems we experienced
really went down to a dull roar. And so a lot of us, myself included, thought the
problem was solved. This is probably old hat to most of you in this audience, but sometimes
I get surprised what people do and don't know, so. One of the properties of TCP is that it
will fill any buffer just before a choke point in a network path, given enough
time. And there's an unstated design assumption in TCP, which is that if there's congestion,
there will be timely packet loss, which allows it to react. It's an end-to-end servo system,
okay. If you don't do that in a timely fashion, it destroys the congestion avoidance and the
control loops in TCP. And this applies to other transport protocols too. In general, TCP is just
one of quite a few, and whether they are at least as good as TCP is how we judge if they may be reasonable
to deploy into the Internet. So now I ask the question, "What happens if this timeliness
assumption is violated by a lot?" And, you know, if you're working hard
on trying to avoid packet loss, lots of engineers think that dropping packets is evil.
We should never do that now, should we? But, in fact, in a congested network, it is essential
right now, since we don't have ECN deployed, for timely packet loss to occur
for the network to function correctly. Okay. So, you'll see this map a lot in this talk.
I have to talk about things all over the Internet. You'll see the same map again and again with
different things circled to try to keep you from getting completely lost. So here I am
at home on my poor little laptop, talking to a machine at MIT. It's almost exactly this
network configuration, though there were a couple more hops in the middle than in this
particular diagram. That's almost exactly what my initial tests were doing. So, I'd
seen this problem in April and I'd been trying to chase it, but lightning had done me in
and I was having trouble reproducing it. So, I'd try to find out what could be doing funny
things to my network, and I was suspecting a Comcast feature they call PowerBoost. And so, it
turns out that Rich Woundy lives in the adjacent town to me and I had already sent
a mail saying, "Hey, let's go get together for lunch." The morning I was going to
have lunch with him, I finally figured out how to reproduce my problems again. That lunch
really covered a pile of topics and he handed me a bunch of puzzle pieces. One of
which was, "Hey, there's this big buffer problem." Dave Clark had warned them of this several
years before. He'd been trying to chase it and he'd never been able to prove it. He was pushing
back on vendors to keep their buffering minimal, but it's hard to do without the proof.
And we explored topics of how I might rule in or out the PowerBoost feature and so on.
He also told me, to my surprise, that RED wasn't always turned on. I filed that away
for future reference. That surprised me. I thought it should be. He also
told me there are issues around ECN such that it's never been properly deployed. He also
gave me a pointer to the ICSI Netalyzr project. Any of you, whenever you go to a
new network, should, as the first thing you do, run Netalyzr to find out what's
broken about your network. If you just sit down, it's a nice little Java thing that
has as much as the Netalyzr folks could figure out how to test in about a five-minute period.
And it will tell you all sorts of wonderful things, so I highly recommend it. That
is how I was able to know, even without looking at my interface, that IPv6 wasn't on
Google Wi-Fi. So, the real smoking gun I got was the next day after lunch. And this is
a wonderful tool called SmokePing, and I happened to be copying or syncing the old [INDISTINCT]
archives from my house to MIT over this 10-millisecond long path. If you look at this, SmokePing
is reporting both latencies averaging well above one second, okay, along with bad packet
loss just while copying a file or actually a whole pile of files. It's about 20 gigabytes
of stuff. And the only times when the latency is low is because trying to surf the web or
read mail was so painful, I occasionally suspended it just so I could do something else. This
is the daddy-the-Internet-is-slow-today problem. And you can do it to yourself. I was doing
it to myself. I had just realized how to reproduce it at will. So, in any case, I decided, having
done this in a previous life, that I should take a packet capture of what was
going on, because this looked very bizarre to me. On a ten-millisecond path, I should see
maybe a millisecond or two of additional latency, but not a second of latency. So I just
took Wireshark and there were these bursts of really strange behavior I could see, periodic
bursts. I could see that in Wireshark as well as in SmokePing. And so I resurrected,
after several weeks of looking around, my old skills with tcptrace and xplot to be able
to plot it up graphically, so you could actually see what's going on. And so I did that. And
what I got--this is, by the way, what part of an optimal single TCP transfer is supposed
to look like on a properly functioning network. Instead, what I got was this, okay. There
are a number of things to note here. Why is there something like, you know, a quarter
to a half megabyte of data in flight over a 10-millisecond path? Why is it doing these horrific
bursty things, which exactly lined up with all the weird things I was seeing in
the packet capture? You know, all of this sort of stuff. This is over a period
of minutes, by the way. Why is it oscillating with, like, ten-second periods? And this looked
like no TCP I expected at all. It should never occur that way. Okay. So the next week, I
went down to New Jersey and played around with my in-laws' FiOS service, okay. And so
where I am now is playing around with their home router, basically provided by the
FiOS guys, both wireless and wired, and doing the same sort of capture and the same
sort of data. It's not as clean because I wasn't able to lock the family out, so there
was some cross traffic, but I got this out of that. Guess what? It's very much the same
sort of thing: it happens to be several hundred milliseconds of latency over a 20-millisecond
path. You know, again, this is bizarre. And over wireless, things were even much worse.
If you look at it in detail, it has the same sort of signature but, again, 400-millisecond
kinds of latencies out of this and the like. So, I'm seeing this stuff and--my cable's
not working right, now FiOS isn't working right. What's going on here, guys? So, at this
point, having had to stare at TCP to some extent in the 1990s, I knew enough of the
right people to call for help. And I did call a bunch of people, including Greg here, and
later Dick Sites, but a bunch of other people as well who looked at the traces in more detail.
Van looked at the data as well. It turns out that since I was using Linux on both ends,
there was even timestamp data in the data sets I already had. So, after about two
weeks of discussion, we [INDISTINCT] on this, we even had to accept that congestion
avoidance had been defeated by all of this. Note that classifying traffic can't
really help you. There's only a single queue in these broadband devices. You know,
the telephony service that a carrier, an ISP, provides is being separately
provisioned from your data service. You have no way to cause your VoIP traffic to preferentially
go over your data service in any of the broadband stuff right now. So, you know, it
fundamentally can't help you. All that would do, at most, is change who, when, and where
gets the pain; a little bit more on that later. Okay. Here's the key dataset for broadband.
This is from ICSI, and I'd like to thank them very much for the ability to
use their data. This was stuff published last June at [INDISTINCT] and again later this
fall. And this particular set of diagrams is split out by different technologies. You
can see cable, DSL and fiber. And it turns out, if you look at these carefully, you can
see that there is bad latency in all of the technologies. And in this case, green here
is half a second of latency. Everything above this is more than half a second. And the normal telephony
standard for good audio is actually around 150 milliseconds. So if we drew the
line where 150 milliseconds would be, it would actually be back here. Now, the other
thing is this is a lower bound. It turns out there was a bug in Netalyzr and it
would sometimes misidentify the buffer sizes. So the latencies are often much worse.
This, by the way, is Netalyzr four-second latencies, okay? So, the broadband edge,
in all the technologies, is broken. The only question is to what degree. Not a pleasant
situation. What triggers this? All you have to do is to saturate your path, to fill
the buffers. That induces the latency while they're full. So this can
be all sorts of things: YouTube uploads, or I once caught Google Chrome, with my permission,
uploading its dead corpse to Google for crash analysis. My network went to hell and I looked
at what it was doing in the background. And by God, Google Chrome was uploading its corpse.
Email with large attachments, BitTorrent (I'll get to that later), file copies/backups, eventually
things that exists, lots of other things; video teleconferencing and so on and so forth.
And I'll talk more about web browsing in a little bit which is another interesting topic
in and of itself in this area. Okay. So, latency isn't the only thing that happens here. This
is also inducing abnormal packet loss, but not hugely dramatically so. But it seems to,
at least on my circuit, do so in bursts, which is not so great. But this causes
timeouts in lots of other protocols. So if you've wondered why you've seen occasional
DNS lookup failures or DHCP failures on busy networks, there is pretty good reason to believe
it's this sort of thing. What you want to do is to think about what protocols you are involved
in which have those presumptions of timely behavior that are now being violated by
Bufferbloat. Of course, gamers hate latency. They know this more than almost anybody. So,
ICSI has proven the broadband edge is just plain broken. But, unfortunately, I've gotten
really paranoid and I think it's almost everywhere and I'll try to prove this to you now. And
I hope you go away as paranoid as I have been since about last October or even before that.
I feared in August, in that mail exchange, what I was going to find when I poked and,
unfortunately, my worst fears were confirmed. So, along the way, I caught my--this behavior
on my home router, again, by SmokePing. Here, I'm observing eight-second latencies. This
just happened to be--any time I managed to catch it graphically, it was in SmokePing.
I can induce it trivially as I described in my blog. So, yes--so, I just wanted to understand.
Okay. Broadband is broken. Now what else is broken? So, I think the home routers are broken.
So, I started doing experiments just around my home routers and the like. So,
eventually, having tried three or four commercial home routers, all of which were lousy about
this, I decided I really wanted to play with things. And I'd been talking to a number
of people including Ted Ts'o and the like along the way. I'd already understood that other
buffering like Linux's transmit queue might be involved, and so I installed OpenWRT so
I could actually twist the knobs on my own router. And I remember the day I twisted the
knob and absolutely nothing happened. Two days later, I finally realized that now I
was copying from my laptop upstream, which means that the bottleneck is the 802.11 link,
which means the buffers are in my laptop. And as soon as I twisted the knob on my laptop,
the latencies came down. So, any time your broadband's bandwidth exceeds your wireless
transfer rates at home, the bottleneck shifts, okay. And so, you think you fixed it in one
place; well, there's lots of other places it can be happening. So I already have
this problem in spades. I happen to have a house with a big chimney, so it's really easy
for me to get lower bandwidth at times than my broadband. [INDISTINCT] in the process
of doing experiments, I upped my service to a higher broadband service. Okay. Well, what's
going on here, of course, is that most of these home routers now are based on general
purpose operating systems; that's where this is coming from. And modern OSes
have a laundry list of places where they like to hide buffers without much thinking.
Let's do a simple calculation with, say, 256 packets, which turns out to be a number we see pretty
frequently. That's on the order of three million bits, so that, even at ten megabits a second,
is a third of a second. Now, what happens when you're at a busy conference and your
fair share is one megabit per second or less? You can easily go from a third of a
second to where nothing works, okay. You know, so, along with interesting packet losses,
which I also think are probably occurring, though I'd love to see more real data on that. So,
you may say, as I sort of started questioning myself, "Why don't I see this on the Ethernet?"
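
The 256-packet arithmetic above can be checked with a few lines of Python. This is only a sketch: the 1500-byte packet size is an assumption (the talk gives the packet count and the rough bit total, not the packet size), and `drain_time_seconds` is a name made up for illustration.

```python
# Back-of-the-envelope delay added by a full buffer sitting in front of a link.
# Assumption (not stated in the talk): every packet is a full 1500-byte frame.
PACKET_BYTES = 1500


def drain_time_seconds(packets: int, link_bits_per_second: float,
                       packet_bytes: int = PACKET_BYTES) -> float:
    """Seconds needed to transmit a buffer of `packets` at the given link rate."""
    return packets * packet_bytes * 8 / link_bits_per_second


# 256 packets * 1500 bytes * 8 bits = 3,072,000 bits ("on the order of three million")
buffer_bits = 256 * PACKET_BYTES * 8

for rate_mbps in (100, 10, 1):
    delay = drain_time_seconds(256, rate_mbps * 1_000_000)
    print(f"{rate_mbps:>4} Mbit/s: {delay:.3f} s of queueing delay when the buffer is full")
```

Under that assumption, the same 256-packet buffer that costs about a third of a second at 10 Mbit/s costs roughly three seconds when the available share drops to 1 Mbit/s.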
This is an interesting story by itself. So you do actually see it trivially on Mac OS X
and Linux on 100 megabit Ethernet. I just plugged my gigabit NIC into a 100 megabit switch, copying
just to another machine, and saturated the link. And it turns out that our Ethernet drivers
typically have rings of roughly 256 packet entries,
plus or minus some, in their interfaces. Windows is particularly interesting because it doesn't
seem to suffer from this. I wandered around Microsoft's website. After an hour or so,
I discovered that they almost certainly ran into this problem. They didn't quite understand
it, but put in a very pragmatic thing to do, which is that, by default, all
versions of Windows rate limit how fast they transmit to keep from saturating a 100-megabit switch.
That explains that. And as soon as, for example, I put the NIC to 10 megabits per second on
Windows, I saw over 250 packets' worth of buffering in the driver, from the delay and
doing the simple computation. So where does this hurt us? Well, it hurts us in lots of different
places. Potentially in your machine rooms, though, you know, often not; in our handsets,
potentially; and all over our laptops and the like. So it was a grave surprise to me
to understand that this kind of queue management and buffer management is needed
in our host operating systems and it is not there currently. Okay. This is a little bit
more complicated, but it's really back to the 1990s congestion phenomena. If you've
got a lot of people sharing the same network and there's a lot of buffers all over, then
they can start to interact. And that's indeed what we see in 802.11 environments and 3G
kinds of environments. And so, you can see phenomena where the latency goes up as more
and more people use the network and, you know, peaking at some point. People time out before
the packets go away, so the packets may get delivered but the people have timed
out. So, you weren't observing packet loss in the way you thought you were. The loss
was the person who moved on, not necessarily because the network didn't transport them.
This is exactly what Dave Reed reported on end2end-interest about 18 months ago and got
flamed about. So, I now know that this is true in 3G networks, both in
the RNCs and also in the backhaul networks. So, there are problems in multiple places
there. Some of our carriers have real problems that they've not fully understood, I believe.
Okay, so, you all remember that one of the puzzle pieces handed to me is why RED is not
being used everywhere, and I wanted to try to understand why that was, or other AQM. So,
in August or so, I called up Van and asked him. Got a wonderful story.
Kathy Nichols, about 10 years ago, wandered into his office and proved to him
in a period of an hour or two that the RED algorithm is fundamentally flawed, and that result is
as a hundred papers in the intervening decade have proven. It requires careful tuning to
be used effectively. Since, if you tune it wrong, it can hurt you, some network operators
have been understandably wary of turning it on when they should. And, therefore, some
networks are running with queue management to keep the buffers under control in the routers,
and some are not. You can get some of the ideas of what's wrong with classic
RED by looking at an early version from the first attempt to publish it. It did happen
to escape onto the Internet. It's called "RED in a Different Light." The first time they
tried to publish it, the program committee didn't like the fact that he used a diagram
of a flushed toilet to explain the end-to-end servomechanism in TCP. The second time, it
was a blinded review. They got back reviewers' comments which said, "This can't possibly
be reasonable. The authors of this paper should go read the fundamental work on the subject,
e.g. Floyd and Jacobson in 1993." Shall we say that Van tried but, you know, it didn't happen.
And so now, I'm in the unenviable position of trying to encourage them to get this paper
finished and out the door, as it has a much more cogent explanation and, potentially,
a better algorithm than classic RED. So where are we seeing this? Well, we're seeing this
in any place you've got a lot of wireless networks, and you see it in hotels, you see
it in some ISPs. There's a paper characterizing residential broadband
networks which specifically tried to look for queue management at the head ends of
the different technologies. And it looks like, from that paper, about one third of those
head ends are running without any queue management as well. Then we have it in 3G as I've already
mentioned. I have no idea what the state of LTE is. But also, three years ago, we were
seeing 22-second round trip times over a satellite link to the Ministry of Education
in Peru. I mean, there are lots of places where it ought to be running and hasn't been. In
any case, there's still a little bit of trivial math. This is to try to convince you
that given the huge dynamic range of modern wireless networks, you need to be very careful.
Even when you're sharing that busy network and you might be only getting a few megabits
per second, even a packet or two of buffering is significant. But what happens if you go
to 25 packets or more? And then, unfortunately, there are various things in 802.11 that you
may have thought were good ideas that often are not, like trying to retransmit excessively
and stuff like that. So that's a different discussion, and one I'm not very expert
at. In any case, what you get out of all this is really ugly behavior, and it's really
complicated when you actually see it and look at packet traces. You see this at conferences
using 802.11, schools and hotels. All of us have seen the so-called hotel networks
and that sort of thing. This is sort of where we had the OLPC mesh network and even non-mesh
network melt. And we couldn't understand--we actually had to set up a full test bed to
try to figure out what was going on under load. And we did not understand where this
was going. I now believe it was one of the three and probably the second most important
phenomena we were suffering with, maybe the most important part for me to know at that
time. So this is the laundry list, which, of course, the real laundry list
is much longer than this. If you think about places where packets can hide or buffers can
hide, you're almost certainly correct. So wireless chips themselves may have multiple
packets of buffering and the network device drivers may squirrel away one or more packets.
And there are the ring buffers and that sort of thing. If you're running one
VM system on top of another, then you've got two stacks potentially of buffers, one on top
of the other. Yes, so you have problems, obviously, in your 3G things, your backhaul
networks, you know, again and again. Even what you think of as Ethernet switches have
buffering in them. They have to, to function correctly. Dave Reed pointed out to me--if
I remember the number correctly, he said, "Well, you know, to do a gigabit on a loaded
network and not drop packets in a switch, you need 8 milliseconds of buffering," which
I hadn't thought about but it makes sense. So in any case, this gets to be really ugly.
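
To put the "8 milliseconds of buffering at a gigabit" figure above into bytes, here is a small sketch. The function names are invented for illustration, and the slower link rates are hypothetical examples, not numbers from the talk.

```python
def buffer_bytes(link_bits_per_second: float, buffering_seconds: float) -> float:
    """Bytes of memory needed to hold `buffering_seconds` of traffic at line rate."""
    return link_bits_per_second * buffering_seconds / 8


def delay_seconds(buffer_size_bytes: float, link_bits_per_second: float) -> float:
    """Worst-case queueing delay if that same buffer fills in front of a slower link."""
    return buffer_size_bytes * 8 / link_bits_per_second


# 8 ms at 1 Gbit/s works out to about one megabyte of buffer.
gigabit_buffer = buffer_bytes(1e9, 0.008)
print(f"8 ms at 1 Gbit/s = {gigabit_buffer:,.0f} bytes")

# Hypothetical slower links: the same memory represents far more delay.
for rate_mbps in (1000, 100, 10):
    delay_ms = delay_seconds(gigabit_buffer, rate_mbps * 1e6) * 1000
    print(f"{rate_mbps:>5} Mbit/s: {delay_ms:.0f} ms if the buffer is full")
```

The point of the second loop is that a buffer sized sensibly for one link speed becomes a large delay when the port ends up running at a tenth or a hundredth of that speed.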
So, there are other places you need to worry about. I was using SmokePing inside of Bell
Labs monitoring our network. And it turns out that we have very sophisticated classification
for VoIP and a number of other things all configured on our routers and no queue management.
So nothing is signaling and controlling the queues by telling the endpoints to slow down.
And for various reasons, this is going to get worse. I'll go into that, I think, a bit
more in a moment. Let's see. There are tunnel devices where you can have this.
I've seen this in our IPSec infrastructure. The bottleneck there seems to be where it
touches ground in the firewall complex, as far as I can tell from the round
trip times. The question marks are places I haven't actually
looked but expect to find it. You know, things like encryption buffers, firewall relays of
various sorts, you name it. Let's see. So buffers are only detectable when they're next
to a saturated bottleneck. At all other times, the buffers are empty and don't
hurt you. So you could have buffers throughout your network and you only see them if
and when a buffer becomes adjacent to a bottleneck. All the other times they're dark.
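
The "dark buffer" idea can be illustrated with a toy model. This is a minimal sketch under loud assumptions: a constant-rate sender rather than real TCP, a made-up tick-based fluid model, hypothetical link rates in packets per tick, and tail-drop queues of 256 packets. `simulate_path` is an illustrative name, not anything from the talk.

```python
def simulate_path(link_rates, buffer_packets, offered_rate, ticks):
    """Toy model of links in series, each with a tail-drop queue in front of it.

    link_rates:     packets each link can forward per tick (hypothetical units)
    buffer_packets: queue capacity in front of every link
    offered_rate:   packets the sender injects per tick (constant, not real TCP)
    Returns (final queue occupancy per link, total packets dropped).
    """
    queues = [0] * len(link_rates)
    dropped = 0
    for _ in range(ticks):
        arriving = offered_rate
        for i, rate in enumerate(link_rates):
            backlog = queues[i] + arriving
            forwarded = min(backlog, rate)       # the link sends what it can
            remaining = backlog - forwarded
            if remaining > buffer_packets:       # tail drop once the buffer is full
                dropped += remaining - buffer_packets
                remaining = buffer_packets
            queues[i] = remaining
            arriving = forwarded                 # output of this hop feeds the next
    return queues, dropped


# The bottleneck is the middle link: only its buffer fills, the others stay dark.
print(simulate_path([100, 10, 50], 256, offered_rate=20, ticks=100)[0])
# Move the bottleneck to the last link and the full buffer moves with it.
print(simulate_path([100, 50, 10], 256, offered_rate=20, ticks=100)[0])
```

In this model every hop has the same 256-packet buffer, yet only the one directly in front of the slowest link ever holds packets. When the slowest link changes, as when the wireless hop becomes slower than the broadband hop, a different buffer lights up.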
They don't seem to hurt you. They cost you money and power, but they aren't actually
hurting latency. So, you know, you have lots of these sorts of things. And, you
know, so I think I've gone through most of this. So, we'll move on. Let's see. Oh, I
hit the button wrong. So in any case, where do we have it? Well, you know, as you can
see, our poor network map is getting awfully busy with circles, isn't it? Do you see why
paranoia--why I've gotten really pretty paranoid? Okay. So--oh, I don't know, it was two or
three, four years ago, I forget exactly when, I was noting that the browser guys were changing
how many parallel TCP connections they would use. Originally in HTTP, we put a limit in
the specs saying, "Please don't use more than two TCP connections simultaneously." The reason
why we put that in the spec was at that time there were insufficiently buffered dial-up
modems in widespread use in the 1990s. And so if you ran a pile of TCP connections
at once, you would get horrific packet loss when too many packets arrived at these dial-up
modem banks on a loaded evening. So we said, "Guys, you know, be friendly. Do something
nice." So when the browsers started sprouting six or even more connections in
parallel recently, I thought about it and was nervous, this was two or three years ago,
but I had no reason to worry about it much. Well, unfortunately, it's a real problem and
it's a problem that's now being compounded at the server end by the changes that various
people are making or want to make with the initial congestion window, to raise that from
four to, I guess, the current proposal of 8 or 10. I don't know the exact state of things.
And the Google complex, the last I knew, was set at eight. Due to a mistake in a--in a
[INDISTINCT] up the IPF, there was a blog post where somebody who had seen this noted
it. The Microsoft website was--they had infinity. That turns out to have been a broken load
balancer around the pit stop. So please don't--we're all in this bloat together, please don't play
with them at this point. I've been assured that it's been fixed or is being fixed. The
problem is, if you go to an image-heavy webpage, what happens? The browser opens up a
pile of TCP connections and simultaneously makes a request on each, and back comes some
number of packets, at least a word or four. And, of course, if the congestion window is
larger, it may be more like 10 for each of those connections. But, of course, with sharded
websites, things like that, it may in fact usually be many more than that.
Some browsers in some circumstances will open more than six connections at once. It's
easy to get a hundred or more packets, you know, effectively coming back at your poor
little--lonely little broadband connection to arrive essentially simultaneously, and they
go splat. Now, there's enough buffering there that these buffers that are in our broadband
edge will absorb these huge impulses, but it takes a very long time for the bits to
trickle out from there. And so even on the 50 megabit broadband service that I now have,
I'm observing at times 150 milliseconds of bursty latency when I hit an image-heavy
webpage. Now, if you ever had delusions that you'd like to have high quality telephony or voice
or video teleconferencing, as I would like to have, and I think many other people here
would like to have, you'd sort of like it if the other people around you, or even yourself
in routine web browsing, didn't cause dropouts in your audio and video, and if you
didn't have so much latency in your connection that it's like talking to
the moon, which it often is. We tried to deal with this a long time ago in HTTP 1.1. We
have the thing called pipelining. It's never been widely deployed in browsers. Almost all
servers support it to one degree or another. Opera has had support for a while. The Mozilla
folks seem to be having some pretty successful experiments with finally running it recently,
only a decade or more later than it should have been. There's a lot of brokenness that
has crept in in the meanwhile, so there's yet more band-aids that you have to do to work
around the broken websites. In any case, I want to warn people, both working on the web
server side and the browser side, the two together are multiplying. Guys, we need to
step back and think about this multiplication. It's not good if you bump from two connections and four
packets each to six or more connections and 10 packets each. Big difference, guys, and we're in the middle of doing this.
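To make the multiplication concrete, here is a rough back-of-the-envelope sketch. The connection counts, the window sizes and the 50 megabit link come from the talk; the 1500-byte packet size, the four sharded hostnames, the 5 megabit comparison link and the function names are assumptions for illustration only.

```python
# Rough sketch: how many packets can arrive at once, and how long they
# take to drain through the broadband link. Inputs are illustrative.

PACKET_BYTES = 1500  # assumed full-size packet


def burst_packets(connections: int, initial_window: int, hosts: int = 1) -> int:
    """Packets that can be in flight at once right after the connections open."""
    return connections * initial_window * hosts


def drain_ms(packets: int, link_mbps: float) -> float:
    """Time for a burst to trickle out of a buffer at the link rate, in ms."""
    bits = packets * PACKET_BYTES * 8
    return bits / (link_mbps * 1_000_000) * 1000


old = burst_packets(connections=2, initial_window=4)    # 8 packets
new = burst_packets(connections=6, initial_window=10)   # 60 packets
sharded = burst_packets(6, 10, hosts=4)                 # 240 packets (4 hostnames assumed)

for label, n in (("old", old), ("new", new), ("sharded", sharded)):
    print(f"{label}: {n} packets, "
          f"{drain_ms(n, 50):.1f} ms at 50 Mbit/s, "
          f"{drain_ms(n, 5):.1f} ms at 5 Mbit/s")
```

The point of the sketch is only the scaling: the burst grows with the product of connections and window size, and the time it occupies the link grows as the link gets slower.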
Is this good? I don't think so. Okay. So then there are some other interesting subtleties
about all of this. As I was thinking about all of this, I realized, "Huh." Actually,
it was Dave Clark, I guess, in conversation said, "Oh, at BitTorrent, we do this too."
So I believe, and nobody so far has come to me and tried to claim I'm not right, that
at least part of BitTorrent's problems were incorrectly diagnosed in all of this. The
buffers were already very large in some of these devices when BitTorrent deployed. And
it turns out that Windows XP does not enable window scaling. So it never gets more
than 64 kilobytes in flight at once. It actually takes multiple TCP connections just
to saturate a typical broadband device. So I think they misdiagnosed this. And the fundamental
regulatory thing from my point of view is that operational problems in the internet
should not be able to be kept secret. I think they were getting more and more complaints
from customers, and it was costing them more and more in service calls from the kids
who were downloading whatever and whose parents were calling up about network problems.
I don't--so that's sort of point one. I think this helped trigger--I've looked at the timing,
I think that this is true--helped trigger the whole discussions we're now having, sometimes
the [INDISTINCT] around network neutrality. The second thing is it turns out that all
of our broadband technologies actually don't use the data channels for telephony services.
DSL is an analog split; the cable guys have put it on a separate channel. The same thing is
true, I'm told, about fiber. So this means that conventional SIP VoIP or Skype is at
a fundamental disadvantage right now to what the ISPs can provide. I don't believe this was
intentional. I know the network--the conspiracy theorists will have a field day. I really don't
believe them, okay? I've just been arguing with such a person recently and I've, I think,
convinced them. But the fundamental issue out of all of this is if we want to deploy
anything with reasonable latency, if we don't roll up our sleeves and get this fixed, we're
not going to be able to deploy these services and have them actually work. We've got more and
more--XP is going away, so there's a higher and higher fraction of applications that are able
to trivially saturate these links. And we have more and more applications that are saturating
the links. This is getting worse. Okay. So what can you do about it? So some of you by
the time you go to bed tonight, some of the pain, you--when I say you, I'm talking about
people in the audience here, the typical hacker here--can start to deal with this. They're
not ideal mitigations, but you can help--you can do this quite easily. A lot
of the mid-range or high-end home routers have--you'll find buried away under the QoS
section sometimes, sometimes it's called bandwidth shaping, knobs that you can twist. And you
want to set them below what your ISP actually provisions for you. And if you do that, you
can keep the buffers in the broadband gear from filling, okay, at least under many or
most circumstances. Now, this unfortunately defeats things like PowerBoost or
turbo boost, whatever they call it. I never get that right. It also can't fix everything
in the broadband connection. So, it's--but you can go do that immediately and I'll show
you some results of that in a second. You know, obviously, educate people that, well,
you know, you probably need some queue management in your piece of the network. You better turn
it on. That's always entertaining because people who deal with operational networks
are justifiably very reluctant to go messing around with the router settings, particularly
with anything that might ever conceivably hurt them. We have to fix this. We need better
queue management algorithms. Part of the problem here is that RED is not a great algorithm
and it--Van says it has no prayer of working in 802.11. There are several other alternatives
that are worth exploring, but we need to get moving on this. So, let's see. So that sort
of thing. So this is where I started, okay? This is the first--the first SmokePing I showed
you. So having twisted my knobs, at some cost of my bandwidth, significant cost of
my bandwidth, what I end up with--notice this is still 1.2 seconds. So now I end up with
this, a little bit nicer, guys. Okay. So there I am with, you know, variance of a little, you
know, 10 or 15 milliseconds. That's more like it, guys. The 20 milliseconds, this happens
to be the--from where this is being observed from. So it really has made nearly [INDISTINCT]
of magnitude difference in my latency and jitter. So, that's sort of nice. And many of you could
go do this today. First thing you do is run Netalyzr when you get home and it will tell
you things like buffering. And there's also the measurement lab tools which are specifically
trying to test this specific thing. Okay, we have--yes, Dave.
>> So those are all the [INDISTINCT]? >> GETTYS: Yes, that's what I did myself to
make my home network work well. >> Okay.
>> GETTYS: Okay. It's not the end of the story because I still have the wireless hub.
>> Right. That's [INDISTINCT]. >> GETTYS: And this--that was just to make
the broadband problem go away, okay? I still got the home router and host problem, okay,
which we have to fix that too. We have to fix our operating systems. That's why the
home routers are broken and so on as well. So--but if you carefully make sure that you
always have more bandwidth on your wireless link than you do in your broadband link, the
bottleneck will always be in the broadband link. So if you can't control something, if
you can at least shove the bottleneck into a place where you can control it, you can
win temporarily. >> [INDISTINCT]
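A toy model may help show why moving the bottleneck works. This is a minimal sketch, assuming two queues in series (a home router shaper feeding a broadband modem), constant rates, and unbounded buffers; the function name and all the rates are made up for illustration, and real devices drop or manage packets rather than queueing forever.

```python
# Sketch: two queues in series (home router shaper -> broadband modem).
# Whichever hop is slowest is where the backlog accumulates.


def simulate(offered_mbps, shaper_mbps, modem_mbps, seconds=10.0, step=0.01):
    """Return (shaper_backlog_bits, modem_backlog_bits) after `seconds`."""
    shaper_q = modem_q = 0.0
    for _ in range(round(seconds / step)):
        # Traffic arrives at the shaper.
        shaper_q += offered_mbps * 1e6 * step
        # The shaper forwards at most its configured rate.
        sent = min(shaper_q, shaper_mbps * 1e6 * step)
        shaper_q -= sent
        # The modem receives that traffic and drains at the provisioned rate.
        modem_q += sent
        modem_q -= min(modem_q, modem_mbps * 1e6 * step)
    return shaper_q, modem_q


# Shaper effectively off (1000 Mbit/s): the backlog builds in the modem.
print(simulate(offered_mbps=12, shaper_mbps=1000, modem_mbps=10))
# Shaper set below the provisioned 10 Mbit/s: the modem queue stays empty
# and the backlog sits in the router, where it can be managed.
print(simulate(offered_mbps=12, shaper_mbps=9, modem_mbps=10))
```

The cost is visible in the second call: the shaper gives up one of the ten megabits in exchange for keeping the queue in a box you control.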
>> GETTYS: There's nothing--there's very little new here, Dave.
>> Oh, okay. >> GETTYS: This is--this is assembling a jigsaw
puzzle. That we've defeated TCP congestion avoidance, that's a--that's a real surprise;
the other few little puzzle pieces we knew. It's assembling the pieces here. That's the--that's
the whole bit here. So this whole thing is getting worse. There's this downward compatibility
constraint that hardware vendors have. We've gone through multiple generations of Ethernet.
And so the values that were selected to make things go fast over long-haul, big
pipes to supercomputer centers--that's where the knobs were tested because that's where
the research funding to go fast has gone--have caused us to set our knobs for conditions
that, in fact, we typically don't have. And each generation, it's been getting worse.
It's not even been scaled by the amount of bandwidth of a particular medium at all. So
we have this in Ethernet, we have this in 802.11, DOCSIS, so on and so forth. So that's
a piece of it. The next is more and more apps are saturating, including web browsers in
little bursts are saturating the edges. XP is going away, so that's been a lot of waste,
but this is why you may have heard reports of Vista or Windows 7 performing more poorly,
I know I've seen some of these reports, more poorly on a home network than Windows XP did.
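One way to see why the newer systems can look worse ties back to the earlier point that Windows XP does not enable window scaling: a connection limited to 64 kilobytes in flight is capped both in throughput and in how much queue it can build. A small sketch, where the 100 ms round-trip time and the 1 Mbit/s uplink are assumed values and the helper names are hypothetical:

```python
# Sketch: what a 64 KB window cap means for throughput and queueing delay.
import math

MAX_UNSCALED_WINDOW = 65_535  # bytes; 16-bit TCP window field without scaling


def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one connection's throughput: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


def connections_to_saturate(link_mbps: float, rtt_ms: float) -> int:
    """Capped connections needed to fill the link at the given round-trip time."""
    return math.ceil(link_mbps / max_throughput_mbps(MAX_UNSCALED_WINDOW, rtt_ms))


def max_queue_delay_ms(window_bytes: int, link_mbps: float) -> float:
    """Worst-case delay if the whole window sits queued in front of the link."""
    return window_bytes * 8 / (link_mbps * 1_000_000) * 1000


print(max_throughput_mbps(MAX_UNSCALED_WINDOW, rtt_ms=100))   # about 5.2 Mbit/s
print(connections_to_saturate(link_mbps=50, rtt_ms=100))      # 10
print(max_queue_delay_ms(MAX_UNSCALED_WINDOW, link_mbps=1))   # about 524 ms
```

With window scaling enabled the per-connection cap goes away, so a single transfer can fill whatever buffer sits in front of the slow link, and the delay is then bounded by the buffer size rather than by the window.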
Okay. There's--and, of course, the other thing is that memory is now just, for most places,
very cheap. And so people don't even think about it. Okay. You can't get memory chips
small enough even if you did. So there's always too much memory. So why don't you just throw
it in there, seems to be the attitude. The other major--another major problem is that
our typical tests are not testing for latency under load, and so it's not visible to people.
So, just go to use--go to speedtest.net and copy a file from your host to your favorite
Google server sometime and see what speedtest.net starts telling you then. You won't like it,
but the tests aren't doing that. Needless to say, we've had certain conversations with
certain people about trying to cause some of the tests to be--make this obvious. This
is not hard to test for. Then there is this other interesting phenomenon we know is going
on. If you have a hardware vendor who's making hardware and their hardware and/or firmware
is slightly buggy, if they can paper over a performance problem to make some
arbitrary bandwidth spec by doubling or quadrupling the buffer size--memory is free, we already
have it in there, it won't cost us anything, we'll just turn it up to get that bandwidth
point. I know this is occurring from talking with people who are in the position of putting
people onto an approved vendor list for a major ISP. So that phenomenon--this
is the one that got Dave Clark really upset. This is the thing where he realized we have
to start pushing back on this immediately. So we really have to--and I'd love people
to think about this, what's the right marketing metric. They've stolen the speed word from
us. Speed used to be really latency, but they now say that
the capacity, or bandwidth, is speed. It's not, of course. What's the right marketing metric?
Do we really need two numbers, or can we invent one that can somehow capture bandwidth
in relationship to the latency as well? Bob Briscoe had some interesting ideas there. I
need to push on it a bit more to see if we could get to a simple, nice marketing number. So
we really have to change the marketing dialogue so that when people go in to buy kit in a--in
a store, that that's one of the features on the boxes and that sort of thing. But what's
the right figure in there? We really need to have something simple to change the marketing
dialogue. >> [INDISTINCT]
>> GETTYS: It gets complicated. We can have a larger discussion. Yes. I mean, it's latency
[INDISTINCT] path begins at it because they're only measuring the one [INDISTINCT]. So you
see the--that's the sort of thing. So I was at the IETF. I gave essentially this talk
in abbreviated form at the transport area talk. This is some of the stuff that I put
together for that talk of what the IETF should do. People here have a lot of these puzzle
pieces that they can help worry about. You could immediately start mitigating problems
in various places. Please do. Obviously, worry about your handsets, both wireless and cellular.
Or if you got any middle boxes of various sorts, you might think about them, market
pressure on vendors. You're in a good place for that, too; browsers where we already have
the long discussion about browsers and the like.
>> [INDISTINCT] >> GETTYS: Be that as it may, I'm…
>> [INDISTINCT] >> GETTYS: But the 1.1 is already hugely,
widely deployed. I don't have anything against SPDY at all because 1.1 is out there.
>> [INDISTINCT] >> GETTYS: In the long term, too, but they're
short term things. It might be helpful in the short term.
>> [INDISTINCT] >> GETTYS: The enemy of the good is the perfect.
>> No, no, no, I think [INDISTINCT]. >> GETTYS: It turns out that ultimately the
latency starts being more dominated by a total number of bits and running lots of TCP connections
starts wasting new bits. >> [INDISTINCT]
>> GETTYS: Yes, I understand. I mean, look--look, I wanted something--well, I tried to get people
to worry about something like SPDY in the 1990s. I'm not--you know, fine. Just get it
done. That's the point. I'm not arguing against it at all, just--we needed out--we need things
like that out there. >> [INDISTINCT]
>> GETTYS: Be that as it may, in any case, there's lots of research to do here. Can we
deploy ECN? That's an interesting question, given the brokenness that's out there. That
would be wonderful to not necessarily throw a packet away. Particularly when you're talking
to various wireless guys, the concept of dropping on the floor a packet that they've sweated blood
to get across some highly [INDISTINCT] link doesn't go down very well. This--politically,
I can tell you if we can--if we can deploy ECN safely, it's going to be a much easier
sell in certain parts of the industry. >> [INDISTINCT]
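The mark-instead-of-drop idea can be sketched in a few lines. The codepoint values below are the ones defined for the two ECN bits of the IP header in RFC 3168; the `on_congestion` helper is hypothetical and stands in for whatever queue manager has already decided that this packet should carry a congestion signal.

```python
# Sketch: mark instead of drop, using the two low-order ECN bits of the
# IP TOS / traffic-class byte (codepoints per RFC 3168).

NOT_ECT = 0b00  # sender is not ECN-capable
ECT_1 = 0b01    # ECN-capable transport
ECT_0 = 0b10    # ECN-capable transport
CE = 0b11       # congestion experienced


def on_congestion(tos: int):
    """Return the TOS byte to forward, or None if the packet must be dropped.

    Hypothetical helper: called only when the queue manager has decided
    this packet should signal congestion to the sender.
    """
    if tos & 0b11 == NOT_ECT:
        return None        # no ECN: dropping is the only available signal
    return tos | CE        # ECN-capable: set CE and still deliver the packet


print(on_congestion(0b00))    # None: dropped
print(on_congestion(ECT_0))   # 3: marked CE, packet survives
```

For an ECN-capable flow the packet that was carried across the difficult link is still delivered; only the two bits change, and the receiver echoes the signal back so the sender slows down.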
>> GETTYS: All right. So what else is going on here--I'm trying to wrap this up and get
into general questions, then to ask that I start just getting it out there because conventional
publication would--is going to take too long. So I gave the talk in Murray Hill and at the IETF.
I'm here this week because, you know--so what brings you here, Vint asked me to do his next
upcoming IEEE Internet Computing editorial, whatever that's called. I forget what it's
called, Backspace or something, I don't know. There's an ACM Queue case study being put
together for this. Some of the--some of those papers wound up in CACM. We set up a website
which has--you know, which has bug tracking and mailing lists and all that sort of stuff.
One of the possible AQM algorithms is, I think, called Stochastic Fair Blue, which has been
sitting out of tree in Linux for several years. It's not actually up tree in the latest Linux
release. We need to start testing that. Vint also pointed us at some very interesting buffer
management work done by folks in Ireland, Tianji Li, and I don't know the other guys' names.
Tianji Li is some place in the United States now. Their algorithm, called eBDP, has been
re-implemented and needs testing. I'm actually running it on my Linux machine right now.
It does seem to be working better than the--where we had a little bit later. John Linville,
who's the Linux wireless maintainer, is maintaining a "debloat testing" tree so we can integrate
patches for people to not have to necessarily patch the kernel [INDISTINCT] for things.
Dave Täht, who's in the room here, has been working mightily recently to get us to the
point where OpenWRT can take that kernel and start running more debloated. We need lots
of help from people on testing this stuff and making it better. At the moment, since we've
got both operating system problems and device
driver problems, we've decided to try to keep our life simpler. So the initial home router target
is a WNDR3700. It costs about $120. You can go buy one today and help. The cable industry
has already started to deal with mitigating bufferbloat. At the IETF, I found out that
they are at least now able to control--or will be able in the future
to control--how much buffering is in the cable modems from the CMTS. So the problem
will get somewhat better. It won't be solved, but it will get better rather than worse, because they were
already testing 150 to 300 megabit modems that would have had buffering that, when plugged into,
say, a 10 megabit per second service, or provisioned
at 10 megabit per second, would have been bloated by factors of 15 or 30 at a minimum. So this
is at least one step in the right direction that is already happening. For the cable guys,
I applaud them for that. There's just tons of work to do all over our systems. Please
come help. You know, please help, you know, in all the different places. And, you know,
so it's, you know, all--you know, we all are together in this--in this problem. So what's
the picture? So what's been killing latency in the internet? It's the dread, dread, Puffer
fish. Oh, for those of you who don't know about Puffer fish, they are highly poisonous.
That's what--they are what the Japanese call Fugu, which is--which people sometimes like
to slightly poison their lips with and sometimes more than their lips. So that's been poisoning
things. It's bloated, it's spiky, you know, that sort of thing. So that's what's been
going on. And it's time for general questions. Yes, Dave. How can YouTube help me? Well,
you know, we're recording this talk today, so we need to get it up on YouTube.
>> What can I change? >> GETTYS: What can you change? That's an
interesting question. When I was--when I gave this talk at Apple this morning, they--there
was a person in the audience there who said I should draw to your attention that YouTube
seems to be sort of piling as many packets as possible into the network all at once and
that the Flash Player that Google uses is actually friendlier. Whether that's true or
not, I don't know. But it may not be true, as I said, I don't know, you know, so.
>> [INDISTINCT] >> GETTYS: Yes. Well--and, of course, all
these things, when you've got big buffers hidden in various places means that all your--all
your control loops of trying to figure out how they're supposed to behave are going to
be really hard to make work well. And they're going to be all over map because the amount
of buffering is all over the map. Okay. Yes, Dave?
>> [INDISTINCT] >> GETTYS: Yes, and that poisons all these
little protocols: DHCP, RA, the--you know, you name it. All the little background
things that don't take much bandwidth, they actually care about those delays. Yes, sir.
>> [INDISTINCT] >> GETTYS: Okay. So, the story I heard from
Dave Oran is--as I met with him for lunch, he's an old friend, last fall--was that about
ten years ago, there was a certain Taiwanese vendor of home boxes from which a lot
of the home kit was descended for a long time and spread, where if it ever saw an ECN bit,
it would just out and out crash. And that's not the only instance that we've seen. Some
other instances--there used to be an ECN Hall of Shame, which unfortunately has gone missing
on the web. The database rotted or something. It used to be somebody was keeping track of
the broken stuff. Steve Bauer at MIT has been studying this in a really major way. He's
probed the top million [INDISTINCT] websites to see what's going on there and he's looking
at some other stuff. He wrote--there are some preliminary reports on this at the CAIDA workshop
in February and my blog has a pointer to his slides. So you can see a bit of the work in
progress he's doing to try to understand how much is broken. The actual state of deployment
on the server side is going decently well. It's gone from about 1% capable of supporting
ECN to, I think, the number was 12 to 14% in the last two years. And if you look at
the timing of when the operating systems released with server ECN support turned--you know,
enabled--in other words, this is--a server is willing to talk ECN if the client says
it wants to. If you look at the timing of that, that's actually not too bad a result.
It does seem to be getting deployed. And, of course, particularly in particular
areas like the handsets and the like, where the traffic patterns would be quite
different from the home where a lot of the broken kit may be, we may be able to use
it, we may not. I don't know. I mean, that's research that we need to do to know if we
can start really using it. >> If you'll forgive me for intervening, I
want to make sure that we have a formal thank you for our speaker so that I can sneak out
before more dirt is cast at the TCP than already has been cast.
>> GETTYS: Oh, by the way, anything [INDISTINCT] these buffers; UDP can just as easily fill
them. >> I know, I understand. Okay. I'm just making
a [INDISTINCT]. Anyway, Jim, I really appreciate your coming out and raising this as an issue.
I hope there's something that we can do to contribute to the solution. So let me formally
close the session. You can stay here for as long as you can stand it with the people who are
wanting to ask questions. But let me officially thank you very much for your time.