Latency and Cost Tradeoffs for Efficient Peer-to-Peer Assisted Content Distribution

Uploaded by GoogleTechTalks on 23.11.2009

SHAKKOTTAI: Thank you all for being here, and thanks, Andre, for having me over. My
name is Srinivas Shakkottai and I teach at Texas A&M University. The kind of stuff I've
been doing. So I got my PhD at Illinois Urbana-Champaign in 2007 and I was a post-doc at Stanford for
about a year and I've teaching at Texas A&M since early 2008. So I've been there about
a year and more, about two years now. So the stuff I'm going to talk about today has to
do with latency and cost tradeoffs, in particular, for peer-to-peer assisted content distribution.
But there's a much more general problem here on queries and how you kind of distribute
them, you know, based on the latency that they suffer. So before I get into the talk,
I just want to kind of give a little overview of the kind of stuff I've been doing and where
it fits in with networking in general. So this is a picture of the Internet created
by Brad Huffaker at CAIDA. So CAIDA is the place where I interned a couple of times.
So I've been wandering up there a bit and I spent two or three months in a whole bunch
of places. Now, the picture here basically shows the Internet at what is called the AS
level, the autonomous system level. So basically, what it indicates here is that, you know,
the location, the radial location, of the particular autonomous system is based on the
kind of number of customers it has, the amount of traffic it carries kind of stuff, and the
location on the ring is based on geography. So this is North America, that is the Atlantic
Ocean, Europe, Asia, Pacific Ocean. So, the reason why I put up this picture quite often
is that when you look at this, you see that there's a tremendous amount of interconnectivity,
you know, between multiple different locations. And very often, we don't really exploit this
diversity, you know? What I need might not be just in one location but it might be in
multiple locations. So really, the kind of stuff I've been doing, kind of in the space
of network control and network algorithms. I'm going to talk about content distribution
today. But in general, I've been working, for example, on protocols which take advantage
of the multiple routes between a source and the destination. I've been also looking at
how to provision QS, you know, on a end-to-end basis. I've been looking at access methods.
For example, if I'm sitting around with my laptop, I see multiple different access point.
Can I somehow split my traffic and take advantage of the fact that I might have diversity in
terms of number of channels available to me? Multitask takes the advantage of the idea
that there might be multiple people interested in the same thing. So why should I transfer
multiple Unicast to somebody when lots of people want it together? In particular, the
application we were looking at there was kind of like in a football stadium where people
have smartphones and people want to look at pretty much the same things, scores perhaps,
replays, and that kind of thing. So this is all has to do in the space of network optimization
and control. I also have been working on network economics. How do I price this thing? Each
of these kind of boxes is what--is an autonomous system. Google, for example, is an autonomous
system. The guy in the middle over there is Verizon business which is AS number 701. It's
had a long history. It used to be UUNET and Debian Planet and finally Verizon bought it.
Now, there's a good deal of competition between ISPs which operate at different levels. So,
good pricing models which are stable is a very interesting question, this plays right
into net neutrality because the economics has to do with the QIS as well, because you
cannot have QIS unless somebody is going to pay for it and you can't have anybody to pay
to for it unless you give them QIS. So economics and technology really ties up in the space
of the Internet economics because you need good pricing schemes, you need to know what
ISP competition is likely to lead to and that kind of thing. Finally, and this where Andre
comes in, I've been working a fair bit with CAIDA on network topology. Why this thing
looks this way, how does it evolve with time, understanding the performance of protocols
in the real world, and that kind of thing. So I can, if people are interested in this
space, I can of course spend some time, I'm here till the end of next week. So the thing
I started with just a little while ago was if I want something why should I go to the
other end of the world to fetch it? This here is Houston, which is close to college association
which I'm at, and that is Bombay in India. Now, if I want a piece of content, there is
no reason to ship it from Bombay, you have to get--you can get it from multiple different
locations. Indeed, if you have queries to be processed, you don't need to process it
at one particular server farm or one particular cluster you might have multiple different
locations you might be able to process it. So the question really is, how do we take
advantage of the diversity present in where my query or my content can be serviced from?
So this is a little picture which is generated by Pando Networks, which is a startup with
whom we have had some contact. So they basically illustrate the fact that, you know, when the
Internet was created, like this was a little earlier, but roughly speaking, the Internet
really took off in the early '90s. The big applications there were FTP and email, right?
So the idea was because the Internet was created to ship large datasets between, you know,
big labs like cyclotron data, which is these huge mass of datasets. You ship them from
SMAC in Stanford perhaps to Illinois where there have a supercomputer center. So you
know this FTP was enormous and email has a--was a--was in existence, and then people realized
that, "Hey, we can use this Internet for many other things." In particular, the Web started
growing up. And so Web traffic really took off in the mid to late '90s, but more recently,
the understanding that queries can be processed from multiple locations. So if I have a file
on my machine I can give it to somebody else. If somebody else has it, they can give it
to more people. So in other words, using end hosts as servers has really, really taken
off in the past couple of years. And you noticed that in 2006, for example, a good deal, about
70% of the bytes that these guys measured, various different measurement traces seem
to indicate anywhere between about 60 to 70 odd percent or 75% of the byte from the Internet
seemed to appear for peer-to-peer traffic. Now further, this curve stops at 2006. More
recently, there was a report in NANOG this year on the fact that there is a good deal
of video traffic growing up on the Internet. And so the P2P in North America for file sharing
seems to be going down but peer-to-peer for video distribution is really popular in East
Asia. In China, there are these things called PPLive, TvAnts, QQLive, and so on, all of
which, which you are using now, peer-to-peer ideas for video distribution. So the question,
high level again, is how do we use kind of a distributed resource like peer-to-peer servers
or multiple clusters in a server farm? The two servers might request rather than servicing
it in just one location. So, I'm going to talk about--these two, I'll talk about in
fair detail and the third one will be a little less. So the first question I'm going to deal
with has to do with latency and cost in so-called ISP friendly peer-to-peer. So idea here is
that, you know, I can service my query at multiple locations. There is latency at multiple
locations and there's a cost to get there. This might be a transit fee, it might be roundtrip
time, it might be a bandwidth cost. So the question is, how do am I going to do load
balancing in such a system in a very simple fast and distributed manner? So I need to
do this in a parallel fashion. Now, the second question I'm going to talk about has to do
with file sharing. So very often when things get popular, they get popular very quickly.
This is called a flash crowd. I'll give a little more idea of what it means in a little
while. But really, we're going to understand what constitutes a flash crowd and how do
I service such a crowd. Do I use a server alone or a fixed capacity content distribution
network? Do I use peer-to-peer or do I kind of join them somehow together? And finally,
if I do get time, I'll put up a slide or two on using peer-to-peer for video delivery as
well. So, the first problem has to do with latency versus cost in so-called ISP friendly
peer-to-peer. The motivation is the following, supposing I have multiple different Internet
service providers, say Comcast is one of them, Suddenlink is another, Roadrunner is one more.
To the peer-to-peer, when I asked for a list of peers to communicate with, what it gives
me is a list list of peers from all over the place. So in particular, if I belonged to
Comcast and I get some peers from there, some peers from there, some from here. Now, is
that good or is it bad? Now, it is good in the sense that it gives me the best peers.
I get the peers who are most likely to be able to give me something worthwhile. It is
bad in the sense that Internet service providers have a cost of transit traffic between each
other. So if I have to move traffic from here to here, it costs something. If I have to
move from there to there, it costs something and so on. So Internet service providers had
been kind of trying to kind of tamp down on peer-to-peer traffic due simply because it
increases between ISP costs. But if you just look at cost alone you don't realize that
the latency of users can suffer if you just force them to live in one particular ISP.
So to give a clearer description of what the problem is, we would like a system which takes
into account both transit fees and latency. Now, this is very important to this--the distributed
query processing as well, which I will just mention in just a second. So I can, you know,
reduce transit fees by simply localizing traffic. I just keep it local, no transit fee. I can
reduce latency by choosing the best guy who can serve me. The question is; how do I handle
both of these together? How do we achieve this optimal operating point? So this tussle
between fees and latency is what I want to talk about and there's a bit of work in this
space. Very often it's kind of heuristic. It doesn't quite capture the essence of the
problem in my opinion. But nevertheless, there are some solutions which work fairly well.
And what I'm going to talk about is a little bit about the analytical solution to the problem
and how we implemented a very nice distributed solution for this. So here is my problem.
So supposing, for example, let's say I'm thinking of a query processing location. So, basically
these guys are all front end clusters or basically, when my query shows up, it is handed off to
some particular front end. These requests which are coming down here are all the queries.
We'll use a pen. So, for example, this is a particular location. There's a fair number
of requests which are coming to this front end. Now this front end, usually what we would
think is it is connected or is associated with some backend processing engine. It could
be a cluster of some kind, so all these nodes could be some kind of processors, and this
front end's job perhaps is to assign these queries to that backend cluster. Fair enough.
If this just happens if there is just one front end and one backend available to it.
What happens if I have multiple locations? So here what the objective to this is, I have
lots of request coming in here and a fairly limited processing capability here. Here,
I have fewer requests but a larger amount of processing capability because I have more
nodes there. So really, I should--I would think that, "Hey, I should send some of these
requests over there so that they can be handled quicker." However, the quickness or the latency
of handling is not quite everything because there could potentially be either a roundtrip
end latency between these locations or a bandwidth cost or in my ISP case there could be a transit
fee between. So really, it's not purely about latency it is both about latency and fees.
Now, these--the other is just the same, exactly the same ideas are applied to load balancing
of queries. All queries were chopped, they have be handed off somewhere for processing.
Do I hand them off to my local cluster here? Do I hand them off somewhere else? What do
I do with it? So, that is really the requirement. This business of steady state and transient
refers to the fact that there could be clusters for whom the request rate is less than the
available capacity, while there could be other clusters for whom the request rate is bigger
than the available capacity. So clearly, I should hand off some part of this over here,
but what fraction? How do I do it in a dynamic fashion? How do I measure the latencies in
a feedback fashion? How do I do it in a simple fashion so that it converges quickly and efficiently?
So that is my first problem here. So, they have a little bit of notation here. So, all
I'm saying is that this superscript refers to the backend cluster, while the subscript
will refer to the front end which does the assignment. So really what I'm saying is,
any cluster, we assume the cluster named "i" is assumed to have "C" sup "i" queries per
unit time capacity. So for example, this cluster out here would have "C" sup one, C1 amount
of capacity for processing queries. The request arrived at the front end server, which I'm
calling a multi-track, mTracker, or whatever the front end server is. At some process,
it could be a Poisson process, it could be anything really, with certain parameter as
"X" sub "j" queries per unit time. So to go back to my picture, I might have request coming
in here at "X" sub "1" request per unit time. So the question is, what do I do with this
"X" sub "1" request per unit time? Do I send them local? Do I split them up? What do I
do with them? So the important thing is I actually don't need to know either of these
for my algorithm to work properly. I can work with measured feedback latency signals as
I'll illustrate when I get to the algorithms. So anyway, so we have a kind of time scale
separation argument here, which are the large time scale, the load changes. So in my P2P
system where I've taken several measurements, it's--the things that you get popular get
popular together. So of the order of like 20 to half an hour, 20 minutes to half an
hour, nothing much changes in terms of the number of request per second. So I can think
of each of these as kind of half an hour timeslots. Now, the capacity of the systems, each peer-to-peer
cluster, are--if you're thinking of backend clusters, the clusters themselves could be
dynamic and that some things could go online, some things could go offline, and so on, so
I assume that in the order of minutes the cluster capacity or the peer-to-peer cloud
capacity itself can change. So, this was like half an hour intervals. This could be something
like 10 minute intervals. Finally, I have to do my traffic assignment. Where am I going
to route my traffic to? This I do of the order of seconds. So really, I'm first going to
just deal with this. Looking at capacity as constant, load as constant. And then, I'm
going to go up here, and say, "How do I change the capacity?" Finally, I'm going to talk
about how does the load change? So, the notation which we had it's exactly the same thing that
I had earlier. So we have this "X" sub "j", the request rate at the front end server "j"
in number of queries per unit time and an assignment metrics "X" is the way I assigned
this. So what does this mean? So I have this front end server "j" and it has multiple options,
let's say, he has three options. So then, he has to decide how much of the traffic he's
going to send to option one, which will go to the cluster one, how much to send to two,
how much to send to three and so on. So basically, this is the way the front end server "j" partitions
his load. Now, there are multiple front end servers so each one is doing this partitioning.
We call the whole metrics of all the partitions, the assignment metrics, as capital X. So is
there any question on what this means? Basically to give another illustration of what's going
on, I have an "X" sub "j" coming in here, I have an "X11" going in here, an "X12" going
there, an "X13" going here, potentially. Similarly, this guy can send traffic to each one. So
this, the assignment metrics would be a 9x9 matrix, because each guy has potentially three
options. And so, you can have--some of them could be zero, of course; but potentially,
these are 9x9 assignment matrix here. >> I think you meant 3x3.
>> SHAKKOTTAI: Oh, sorry, 3x3, a total of nine. Yes, sorry. So there we have our assignment
matrix. The question is, what am I going to do with this? So again, to just recollect
the notation, the origin is always a subscript, this is the front end where the query is actually
received, and the superscript is the destination to which the query is routed. So that's the
notation all over the place. So "X" sub "j" sup "2" means that the "j" origin will route
this much amount of load to that particular cluster. And obviously, I cannot drop any
queries so the sum of the partition or the sum of the assignments must add up to a load
at that particular front end. So the question is, how am I going to do this assignment?
Now, in order to understand how to do the assignment I need to have some idea of what
latency is likely to look like. So what is latency? So really, I send you a query and
after a little while you respond. That little while or a long time depends on how many others
are also querying the same thing. So this is a very simple latency function. It comes
from queueing theory, and it's really straightforward to understand what's going on here. So basically,
the idea is, I have an engine which can respond to queries at rate, "C" sup "i" queries per
unit time and I feed in a load at a rate "z" sup "i" number of queries per unit time. These
things go and setup and queue up, and this query engine keeps processing them and throws
them out. The question is, somebody comes in here, he sits there, he kind of goes across,
gets served and gets out, so question is how long is he going to sit there? What kind of
shape of the function is this going to look like? And the answer is it always looks convex
increasing because the idea is, in fact that if you look at that expression, as "z" is
much smaller than "C", you have very low load. As "z" comes close to "C", the denominator
goes to zero, therefore the whole thing goes towards infinite. So the idea is, as I increase
the load on a particular server, the latency, the time between the arrival and departure,
is going to increase and our convex increase as well. And this is true not only of such
function it's true of pretty much any system you choose to measure. Like this is true for
deterministic--for a deterministic job size. It is true, for example, in peer-to-peer systems
which I've got some measurements on. It's pretty much always happens that as you get
closer and closer to capacity the number of the latency involved. In fact, if you look
at the 101 at six o' clock you'll see this in action so, you know, as you get closer
and closer to capacity, you have huge latencies. So, the total latency, which is my unhappiness
of my users, so the per user latency was that, and the number of users per unit time or number
of queries per unit time is "z" sup "i," so the total latency suffered by all the users
getting in here is this. So--well, how does that play into the picture of this distributed
cache--distributed query processing? So, that was my picture right? I had, you know, traffic
coming in or request coming in from multiple locations to a particular front end and that
in turn went to the backend to be processed. So how much traffic is coming in? So, we have
basically, traffic from there, there, and there to be summed up so basically, we have
traffic from front end "r" coming to cluster "i". So on the cluster "i," the total load,
is just the sum of traffic which comes from the various different locations. So in that
particular diagram, the cluster number one is handling X from two to one, from three
to one, and one from one to one itself so that's why I have to sum all the traffic which
comes in. And of course, the total latency has some expression. I'm just using the MM-1
expression here for illustration. So I just stick in "z" is equal to the total load coming
in and this is my total latency experienced at some particular cluster. So that is my
latency. Great. What else do I have? I have this transit fee. As I said, a transit fee
can be roundtrip time on the link, which is potentially propagation delay plus queuing
delay. It could be the bandwidth cost, because I have to pay somebody for bandwidth cost.
In my guess, what I'm saying is that it's a transit fee between ISPs. So whatever it
is, there is a certain fee which is charged or which is the cost incurred for transferring
users from one location to another, and that is charged on a per query basis. So if I transfer
one user from "r" to cluster "i," "r" is the front end where the user actually showed up,
and "i" is the cluster where it finally gets processed; per user, it cost me that much.
So again, we have to find out how much is the total transit fee to get to a particular
cluster. And that we do by simply summing up the total load. So in other words, I have
traffic coming in from two to one, from three to one, and from one directly to one. One
to one cost me nothing, two to one cost me per user something, three to one cost me per
user something, and that is really what is being summed up here. So this is the total
transit fee or the total cost in getting to the cluster, right? So much for that. So what
is my system cost? My system cost that I look at is the total latency plus the total transit
fee. So we developed an expression for the latency which is basically load divided by
"C" minus load. So if you get rid of this, this basically says for any particular cluster
"i," what we have is the latency on that cluster plus the transit fee to get to that cluster.
And then, I have to add up all of my clusters because that is my whole system. So, if I
have multiple locations each location experience is a latency, a total latency. Each location,
there's a cost to get there, that is the total cost of a particular cluster, overall clusters,
is the sum. All right, so we need to design a controller that somehow minimizes this.
This really is the unhappiness of the world as far as I'm concerned. So, how do I minimize
this unhappiness, right? It costs me and latency cost my users because they get unhappy. Transit
fees cost me because I have to pay it all the time. So the question is, how do I design
such a controller that is both simple, distributed and fast? So, these are the three things.
Distributed in particular is very common among network optimization folks simply because,
you know, there are millions and millions of flows on the Internet. There is no way
on Earth you can do an optimization with all those flows sitting in one particular, you
know, central computer. So, a distributed processing of such optimization problems has
been a big question. And there's lots of ways of handling this. Now, so the objective really
is, you know, this is my unhappiness that is the lowest unhappiness that somehow I want
to attain and I want a distributed way of getting there. So this makes me as little
unhappy as feasible. Now, there's a whole bunch of controllers which can do this--oops--so
the question is how do I solve it? And there are lots and lots of optimization techniques,
and I'm going to pick one which is particularly useful this problem. So usually, there are
some which are fairly complex in terms of the computation. You can do centralized computation,
you know, in one big machine. And these methods, although they give you very accurate solutions,
they require a lot of processing in a particular machine. So the other methods which I've put
down here, these are some of them, are basically distributed methods. So there's Newton's method,
there's log-linear learning, best-response which means do the best thing given the state,
gradient descent and so on. So what I'm going to describe is actually a coupling of these
two, which is particularly suited for this state. But really, what I wanted to say here
is there is whole bunch of tools which have been developed in the kind of OR optimization,
the operations research literature, which are useful for such problems. And so very
often, when we solve network optimization problems, you can go and you know, look in
this bag full of controllers and pull out one which is most suited and then show that
it's suitable for your particular application. So, I'll tell you what my controller is going
to be in a second. But before doing that, I need to know what the kind of per unit cost
is going to be. So what does this mean? I told you that the total cost of the system
is this, the total latency plus total transit fee. Now the question is, if I take a deficient
on how much load to send to a particular cluster, how does this cost change? So for example,
I have two clusters and I have an arrival rate of, you know, request coming in. Now
if I move some of the--some to one cluster and some to the other, maybe it's good, maybe
it's bad. The question is how good or how bad is it? So the question is, if I change
this, if I change my assignment on a particulate dimension so I move a little from one to the
other, how does the cost change? Is it going to be simple function? Is it going to be messy?
How do I this? So the answer is, usually, in fact, in this particular example, it's
actually very simple. So, all I need to do is basically, I need to understand the impact
on the cost of changing one dimension. So it recollect "x" over sub "j" of "i" was the
amount of traffic which "j," the front end "j," was sending to cluster "i." So the question
is what impact does that one dimension change have on the overall cost of the system? And
the answer is obtained by partially differentiating one with the other and the answer has a really
nice form here. What it says is, I call this the marginal cost, marginal cost of a cluster,
the front end "j" on the cluster "i," and this marginal cost has three terms, one, is
the marginal latency. I send people there, they become unhappy. So how unhappy do they
become? I have a per user unhappiness. What--the next thing is the transit fee. I send one
guy from one place to the other, it cost you "P-j-i" because it costs you per user that
much. So in other words these two are very intuitive. I send one person from "A" to "B."
If he gets unhappy there's going to be amount of latency experiences at "B." If I send one
guy from "A" to "B," the amounted cost in terms of transit fee is "P-j-i." Great. There's
a third one, which is not quite so obvious, which is my impact on the other's unhappiness.
So in other words, I send a guy from "A" to "B," there are already guys being served out
there so the more I send there the other guys who are being served also get more unhappy
because my impact on their latency has to be considered as well. That is the third term,
which I call a congestion cost. So, these three together is really the impact on the
total cost of the system. So there's a marginal latency which is the user experiences, a marginal
transit fee which I have to pay, and a marginal congestion which is experienced by everybody.
So, this really is how the unhappiness of the world is going to change as I change my
particular dimension. Now these three, mind you, are all measurable. In fact in the system
which I'm going to consider, they don't even know our "C" or the "x" or--we know the "p"
but we don't know anything else. You can measure this, you can measure that, and you know this
value because it's a cost given to you. So the question is, now that I know what my marginal
impact on the system is, how am I going to kind of change my assignment? So that's where
this--the dynamics comes in. So to repeat, I have currently an assignment matrix, who
is getting sent where, I want to change this matrix. I change it on each dimension separately.
So I have to change really the amount that I'm assigning from "j" to "s." So the "j"
is the front end to which the request came, "s" is the cluster where it was served, how
do I change the number of users I'm sending to each one? And the answer in this case turns
out to be a very nice one and this is one called replicator dynamics. This comes from
evolutionary thought. The idea is things which are doing well you should more of them. Things
which are doing badly, do less of them. So that's pretty much all just the same. So supposedly
I was sitting here and I have three options available to me. Option one costs of, let's
say, $10. Option two costs, I don't know, $10 or $20. Option three costs me $5. So clearly
I'm going to send my guys to the $5 option. I keep doing that until the $5 becomes $10.
Now, the 10 and the 10, I have two of them which have exactly the same cost, then perhaps
I'll do it equally until they all hit 20 and then, you know, the system is essentially
stable. So really what I'm looking for is a comparison between the cost per unit for
a particular option from--of sending users from "j" to "s" with the average cost over
all the options available to me. So again in the three option example, I would calculate
my average cost over the three options and I would know the individual cost because I
calculated. If the--a particular option is doing better than average, which means it
has lower cost, I should do more of it. So if this has lower cost than the average, do
more of it. So this minus this will become positive because this has a lower cost than
the average cost so it goes up. If the particular options cost is higher than the average, which
means this is smaller than this because this cost is lower than this, this thing turns
out to be negative I do less of it. So do more of things which work well, do less of
things which work badly. And the--we can prove actually, this is actually stable. It actually
attains this point out here. That if we do this over and over again as a dynamic system
it goes to that lowest cost point which we were looking at, if we use the fee in this
fashion. So again, we have to consider per user latency, per user transit, and the impact
on others, the per user increase in congestion. If you add up these three and push it into
this kind of do more of whatever works for you, it actually can be proved to be optimal.
And how well does it work? Now this is on something called network simulator. So basically
we modified the BitTorrent client there so that you could actually move traffic from
one location to the other. And so the three curves which we are looking at is the following.
So single tracker means that we don't allow you to split, you have those multiple locations.
I say, "I will not allow you to move your traffic from one location to the other." Now
that is of course going to do very well in terms of transit fee because I'm not moving
anything it doesn't cost me anything. But it's horrible in terms of latency because
there are some clusters which have extra load, some clusters which have less load. So the
cluster which have extra load more than their capacity, they do really badly, so the overall
latency goes up significantly. Now the others which I'm comparing here are the following,
so the green one assumes that there is no price--no transit fee, that is. So basically
it says, "Okay. Shall we--move your traffic wherever you feel like. It's your business."
Of course, it does very well in terms of latency, it reduces the latency to bare minimum, but
it does cost you some, a fair bit, in terms of transit fee. The third one, which takes
into account both of these and uses this Replicator Dynamics kind of stuff, what happens to it?
It has a fairly low latency and a fairly low transit fee. We add up the two in the next
diagram. Out here, what we have is the lowest possible latency plus transit fee that can
be achieved with this particular load. So in other words, the assignment matrix here
is the optimal assignment matrix. And the nice thing about this is it converges fairly
fast. So basically, in our peer-to-peer system which we implemented, we basically took these
assignment decisions every eight seconds. So we took a decision, measured for eight
seconds, took a new decision, measured for eight seconds, and so on. So we measured basically
the per user latency which can be done in the cluster and fed back. We measured this
increase in congestion, which is the change in latency for a change in load, fed that
back. And of course the price was known, the transit fee was known so we took these decisions
every eight seconds. And we found in about four--one, two, three, four, five, so roughly
five or so, you know, splits or four or five number of assignment decisions, it converges.
In fact, this has very good convergence properties. So--and as I said earlier, the really nice
thing about this is, it is completely distributed, each front end can take its own decision based
on the cost that it sees so you don't have to solve, you know, centralize optimization
problem. Each one, each front end simply looks at the latencies which it sees in various
things. It sees the cost and it knows the congestion because it--it's fed that back
as well and it takes a decision at each time, based on with what query shows up. Now are
there any questions on this kind of assignment problem here? Yes?
>> You said this was made here from a system that you built, so what did you play it on?
Or how many front ends were there? >> SHAKKOTTAI: Okay. This is actually a network
simulator--simulations on a system, but it's not actually deployed. We're building a test
bed. But that, we don't yet have answers yet. So what this is, it's a network simulator,
simulates the whole protocol stack, and you basically take multiple BitTorrent clients
which are riding on that. So each guy comes, he starts up multiple TCP connections to various
different peers, tries to get it back. And we, on top of that, we built this so-called
tracker system which allows people to be assigned to various different domains and we artificially
assign cost of going to each of these domains. So under that is where our simulation is.
So this is basically we had of the order of something like, I don't know, something like
5,000 clients or so running simultaneously. And the number of times we ran it order to
get these plots, we ran the whole thing about 40 times and averaged out in these or these
standard deviation marks. >> For how many front ends?
>> SHAKKOTTAI: Four. >> Okay.
>> SHAKKOTTAI: Right. >> And that'll affect your convergence time
quite a bit, I would imagine. >> SHAKKOTTAI: That it does. But the thing
is each one has only so many options available. So the question is, how many--if you give
each front end the option of choosing everything on the planet, of course, it'll have lower
convergence time. If each front end has a set which it's allowed to choose from, it'll,
you know, decrease your convergence time significantly. So that's the thing. So, our idea was we don't
want to fetch, for example, going back to my Bombay versus Houston example, I don't
want to interact with a peer-to-peer network in Bombay, I would like to interact with things,
other things, in Texas and possibly California. So, the number of options should not be that
high because in any case my transit fee will get really large if I interact with multiple
faraway options. So, that was the object. >> I have one question. What's the addition
function that you use when you sum up the latency and their transit fee?
>> SHAKKOTTAI: We are just using plain addition. You can pick any convex combination and it
will still work. So we basically thought latency we measured in dollars, transit fee we measured
in dollars and just add them up together. So it's just simple addition here.
>> Okay. >> SHAKKOTTAI: But you can change the constant
in the front end; it will change nothing in the crux of the matter.
>> Can the transit be calculated simply? >> SHAKKOTAI: It could be. If the transit
fee itself is kind of point-to-point link latency, it could also be in that. So it's
there, in which case it's all latencies. >> Okay.
>> SHAKKOTTAI: So, the next question we wanted to answer in this space has to do with admission
control. The idea is, supposing my clusters are of a certain fixed capacity, whatever
it is, and supposing the load is much higher than that, then there is no assignment on
the planet which will already use my latency to an acceptable level. So at that point,
I need know what to drop. How much can I admit, how much do I, you know, drop? So, the kind
of idea expressed here is that given a particular load vector, I know how to minimize the total
unhappiness of the world. But then, this total unhappiness of the world, which is minimized,
might itself be unacceptable because there's simple too much load. In which case I better
start dropping some of them and how do I drop? So, the idea is that each user has a happiness
associated with himself that can be thought of, supposing for example, each user brings
$1 worth of happiness so each user pays me a dollar to be serviced. In which case, the
rate of reducers arrive as "xj," the rate of each money--I mean, each guy brings is
$1, in which case this log won't exist, I will just have dollars times "x." So each
user brings $1 and I try to minimize the happiness minus unhappiness. In a kind of generalized
setting in a network optimization, we often consider concave increasing functions like
log. The idea is that the marginal impact of each user becomes lower and lower. Like,
for example, supposing I have a query which comes as soft, say a request for search results
with images, with ads, and so on. So each marginal part of this, so I guess the search
results are really important, which is a big text to our, you know, a bunch of text which
I need. The images are perhaps slightly less important. Then perhaps the ads, I don't know
how important they are. I presume they are important, but the question is how so at some
point you have to decide which is more important that the other. So this idea here is that
the log tries to indicate that, you know, the marginal impact of the increasing number
of queries, somehow it is important, it does increase your happiness but it doesn't increase
it as much as, you know, in a linear function. So, if I have three queries, my first query
must be handled, the second query perhaps not so good, the third query perhaps even
lesser, you know, that kind of thing. So, this will work of course for linear log and
a whole bunch of concave functions, so. But in general, the high level idea here is you
want to maximize the sum total happiness or minus the sum total unhappiness. So, just
to put this in the right framework, the load assignment problem's job is to just minimize
total unhappiness, whereas the utility maximization problem is to maximize the total happiness
which is the, you know, things which make me feel good and for the things which make
me feel bad. So, the question is, how am I going to do this admission control in a simple
distributed fashion? So essentially what this means is each of these front end servers has
to decide whether or not to admit a query and it needs to do so, understanding what
the impact of that query is going to be on the happiness of the system. So really, if
this is the total happiness of the system, I need to know what the impact of changing
the "x," which is the load which is coming in on the system. In this case, it turns out
it's a very simple function. We just have to differentiate with the respect to "x-j."
And what we can do is use something called a gradient ascent-type controller, which essentially
says if something is, you know, increasing the cost then cut down on it, if something
is decreasing the cost, go up. And we implemented this as well on our NS tool, I think. Oops,
I went backwards. All right, and so, these three colors now are completely different.
Basically, there's just one system with three front ends. So there's a blue front end, a
red front end, and a green front end and they have to take admission decisions and the objective
here is to show that although they take their admission decisions in a distributed fashion,
it does converge and the total system happiness also converges. So the idea is that somehow
in a distributed fashion, I need to minimize cost and maximize happiness. So that is what
is being done here. This interestingly is about 7% from the optimal solution. This is
again being done over a huge number of BitTorrent clients. With actually in this case, it was
three front ends. So, now--so the key insight from this part of the talk is basically that
it is possible to align incentives in terms of end user latency and cost objectives and
it is possible to do it in a distributed fashion, in a fairly simple way, because everything
that you need is measurable. So I don't even need to know this, the capital "C," the capacity,
I don't need to know the load. All I need to know is this marginal impact which can
be measured because I can measure latency, I know cost. I can measure congestion. So
ongoing work in the P2P space, capacity changes in a weird fashion because people come, people
leave, and the question is how do I incentivize people to stay and serve other guys? And also,
we are just working on a very, very simple test bed which we want. We just want three
servers right now, in which--which are connected by real network links, with a certain cost
and want to see if this performs as it should. So that's the current state of that. Any further
questions on this part of the talk? Yes? >> I have a question of what happens when
the prices change over time? >> SHAKKOTTAI: When the prices change over
time. Okay. So if the price is going to change over time, it's...
>> It was 18 probably, like then there's nine times over...
>> SHAKKOTTAI: So then--okay. You--the question. Oh, okay, over that timescale nothing will
change. Because this, the convergence of the innermost loop is of the heart of actually
40 seconds to a minute. So, if your prices change of the order of days, no change whatsoever.
It will converge long before that, if will keep converging. So, any further things?
>> So what happens when you actually drop it? I mean, in a real system, wouldn't the
client then just try again and then... >> SHAKKOTTAI: Right.
>>'re still paying that transit cost and everything? I mean...
>> SHAKKOTTAI: If have you drop it, well, you'll drop it again. So basically, once you've
got this value out here, this is really the supportable traffic. So basically this says
once it's converged, that if I want to maximize the happiness of the world, this is the amount
of load I can support. So any further load, I just have to keep dropping.
>> Aren't you though paying that transit fee every time the request is made because you're
not making that decision to accept it until it arrives at the server?
>> SHAKKOTTAI: Right. So, you--okay. The query itself, as far as P2P goes, the query itself
is just this tiny thing saying, "Give me a piece or give me a part of the file." The
transit fee actually has--is in the service itself. So once I've been accepted, shipping
the traffic across will cost... >> It's actually not the--when you say the
cost is a per query basis, that "P-i-J" it's really only the response and not the actual?
>> SHAKKOTTAI: Yes, the service itself, yes. >> Okay.
>> SHAKKOTTAI: If I decide to accept it, otherwise I just drop it, yes. Although, in the admission
control problem, the admission control is done at the front end where it originated
so you won't even--the query won't go across to the other side if I'm not going to accept
it in the first place. So, what I'm trying to get at here is, I've got the--oops. Oh,
this is a good picture. So, here is where I take my admission decision. So I won't actually
send the query there if I think that, you know, it's going to have a bad impact on the
system. So, once you are dropped here, you're dropped. So you might come back, you might
be dropped again. So, it's only if I think that, you know, the system performance is
not going to be massively impacted by you that I accept you and in the inner loop I
decide where you're going to be shipped. So, the inner loop decides where to move users
around, the outer loop decides whether or not to admit them in the first place, if they
have anything there. >> Yes. And I was just thinking about it from
the search problem where the actual transit causes the additional latency.
>> SHAKKOTTAI: Yes. >> And there you're, you know, that's being
paid on the drop. It takes you that much extra longer to decide that your query is not going
to be serviced. >> SHAKKOTTAI: Actually, no, because it's
dropped at source. So, if anything, it'll add to the latency of the query proceeds as
in the front end, this front end, not that. >> Okay.
>> SHAKKOTTAI: So, in other words, you're dropped where your request originated. So,
there's... >> So, when you talk about congestion, do
you also consider the degrees in congestion? Let's say I go from one to two, so you'll
talk about the backend to increase in the congestion? Do you take also like the degrees
in congestion at one? You sum it up or all the...
>> SHAKKOTTAI: It has--that's where this stuff about comparing with the average happens.
Because I compare, so, the query comes here and I compare the latency out here with the
average or all the options available to me. So, I'm inherently comparing what will happen
if I sent it locally, because it's doing better there is why I send it there. If it were to
do better locally, I wouldn't even bother. So yes, there is a comparison there. It's
implicit, there's no explicit, you know, comparison now between the two. All right, I'm actually
running out of time, but nevertheless let me talk about this stuff about flash crowds.
And I--yes, that's out here. Oops, right. So, the idea here is--has to do with how do
I share files when they are very, very popular? So, the question here is the following. This
is a cute little graphic from Pando again. So, Pando basically says, "If I have a CDN
only," you can think of it as a big server, "As I have more and more people who request
the content from the server there's worse and worse performance because the latency
is bound to increase in exactly the same convex fashion." And if I want to service them at
a given quality, I either have to portion a lot, which is higher cost, or those guys
will, you know, be unhappy with me and leave, which again means a decreased revenue. So,
that the point is that, you know, if you can somehow utilize end user resources. And, you
know, you can think of these end users, one possibility is to think of them as caches
which are closer to the end homes, perhaps at the top level or perhaps even at the cable
add-ins, or you can think of them as end users like you and me. So basically, the point is,
if I can somehow utilize the resources lowered down in the hierarchy then more and more users,
or more and more viewers, of this content might mean better and better performance simply
because there are so many more copies of it available. And you can lower your cost because
you don't have to unicast all the way from your server. So the question is how are we
going to combine these two methods, you know, the server-based and the peer-to-peer based?
So before we ask how do we combine it we need to know what kind of load we are likely to
see. So, this is an analytical model for describing such things. What it describes really, is
a viral distribution, a viral propagation of interest. So, I of (t) is the number of
people who are interested in a particular piece of content of some type. So this is
the total population of guys who say, "I want it." Now how does that change with time? So,
this is called a bass diffusion. This is--comes from Frank Bass's work in 1960's, where he
was looking at the propagation of interest in microwave ovens. So there's this new gadget,
this microwave oven. How do people request it? What is the rate of change of interest
in such things? Now this is being used much earlier in population dynamics in viral model
into one. The idea is the following, so there are so many interested people right now, right,
and "N" is the total number of people. So "N" minus "i" are the people who aren't interested
right now. If I advertise, it reaches a constant fraction of the people who aren't interested.
So "K" is a constant which describes my advertising budget. Maybe I do television, billboards,
whatever, and it reaches some constant fraction of the population. On the other hand, people
tell each other things. So, each guy who's interested might tell the guy, "Hey, this
microwave oven is so cool." So, people who are interested pick somebody at random and
say this is interesting. The probability you'll find somebody who is not interested is the
number not interested by "N". So really, this captures two effects, an advertising effect
and a viral effect. Now, very often in the things like YouTube videos for example, there
isn't very much of an advertising effort, it's by and large viral. So, for a file distribution
of this kind we're going to drop the advertising and look it's essentially only at the viral
propagation, although the results which I'm going to talk about, essentially it holds
for that as well--the advertising as well. So, we can solve the differential equation
explicitly. You see this all over the place. This is the number of interested people and
it follows the so-called logistic function. The idea is initially nobody much is interested,
people start telling each other. At some point, suddenly you have this growth. This is an
exponential growth at that point and then it levels off. This is characteristic of what
is called a flash crowd. Essentially the idea is, at some point, you see sudden increases
in interest of something. The question is how am I going to service this interest? Okay,
this is fairly common. I worked with a guy called Mike Friedman, who runs a content distribution
network on Coral called CoralCDN. It's based on PlanetLab which is this universal network
and he hosted a whole bunch of files on it. In particular, tsunami home videos, you know.
Basically, it's when the Asian tsunami struck there were lots of home videos and this is
one example of such, you know, interest growth. You know, you have this sudden spike of interest.
All right, so what am I going to do? I can use the CDN, which I call a bank of servers,
I can use peer-to-peer, or I can use both. And since I'm giving this talk, I'm going
to basically show that hybrid has some phenomenal performance improvements. So, a little bit
of notation again. I(t) is the number of interested people in this file, it's a cumulative demand.
P(t) is the number of people who possess the file, it's the cumulative service. So, I really
need to just work with these two and I'll say why this total latency is the area between
those in just a second. So, I have a server, it has a capacity "C," how am I going to,
kind of, understand the latency experienced? So what happens is this blue curve is the
demand, the I(t), and the green one is the service curve. So, what happens? Initially,
very few people asked me so the rate of request is much lower than the capacity, I can serve
it pretty much instantaneous, so it goes along there. At some point, the rate of requesting
is more than the capacity available. At that point I see this divergence. So the idea here
is--so, if somebody shows up here. So, the blue curve is demand. So, somebody showed
up at this time. He got served at this time. So there is a little delay. So, as time progresses,
since the arrival rate is so large, somebody who shows up here got served here. Shows up
here, got served here. So this area between the curves is the latency or the delay experience
between the demand and service. So, the slope of this curve is the capacity because the
server is running at capacity at this point. It simply cannot keep up. At some point when
the demand slackens off again, it catches up. So great. So basically we have this need
when my server just gives up and eventually it'll catch up and the area between these
curves is what I'm interested. Now, I can actually analytically characterize these exactly.
And the area scaling looks like theta "N" squared over "C". Don't bother to remember
this for now I'll do a comparison at the end. But it's just intuitive to realize that this
looks kind of like a triangle because the interest looks almost like a staircase and
the service looks almost like a straight line. So the area of the triangle roughly gives
you the intuition, although we can do an exact calculation.
>> "N" was the total number of users? >> SHAKKOTTAI: "N" is the total interested
users, yes, finally. The service model for peer-to-peer is essentially the same kind
of thing. It's basically, if I'm interested but I don't have it I ask somebody at random
and if they have it I can get it off them. So really, the thing about this is the interest
and the possession coupled into each other. Because only if I'm interested will I ask,
only if I have can I serve, so that is the point. These two can be solved explicitly.
It gives you a green curve which looks like that. The interesting thing here is initially
nothing much happens. All the way I asked people, do you have it? Well, they don't have
it. So, nobody has it, nothing happens. But once a small number of people have it, it
just shoots up exponentially because each one can serve a whole number--a fair number
of other guys. So, basically until about login time, nothing happens and by two login time,
everything is over. So again, we can characterize this area, and the most interesting thing
is the CDN does well initially until the server is overloaded. P2P does well once the installed
user base is large, once some number of people have it. So, how do I combine them? I first
use the server, then I use P2P, and we can show that it gives you very good improvement.
So, I'll give you an example here, supposing I used the server capacity "N" by login. "N"
you may recollect is the total number of users who are eventually interested. The server
alone or the CDN can give you latency of the other login. P2P alone can give you a login.
The hybrid though can give you a log login which for fairly large values is essentially
constant. Further, I use this server only for a very short time, only for login time
in the hybrid case. So, the thesis here is that you must serve for a short time using
your server, at which point your P2P has enough capacity to really ramp up. So, these are
simulations in the space. So, the first one is the server based, again, demand and supply.
This is the peer-to-peer one. You can see there's a significant area between these curves.
Essentially, the peer user average delay was of the arc of 10 seconds here. Here, it's
all of the arc of one second roughly because the area is much lower. I do server first,
then peer-to-peer. And well, then the question rather is, what happens if I do both simultaneously?
The answer is not very much, because if I first asked the peer and then asked the server,
initially everybody gets served by the server. And so this pink line is the amount of server
usage. So, initially everybody is getting from the server. But once you hit a certain
value, nobody much--you know, it just levels off and most people are getting it using the
peers. So again, this just reinforces the idea that an initial boost using a server
is what a peer-to-peer system needs. You do that boost and the peer-to-peer can take over
after that. And this is true of CDNs like ACMA as well, where you might want to do unicasts
to the caches at an initial amount of time, after which, you know, the caches themselves
can handle it. So, I think I should stop here--on and take questions. If you're interested in
peer-to-peer streaming, I'm around, as I said, at Stanford until of next week. So please
do send me an email and we can discuss some of this. Also, I've given a hundred of papers
on the--in the space. And so I will just stop here and say thank you, and take questions.
Thank you. Anything further? Yes? >> One thing in--within one thing to a peer,
I always run the risk that that user randomly decided, "So okay, I'm not interested anymore,
now I'm walking away." >> SHAKKOTTAI: Right.
>> And now I get, even if it's just for a short time, again, the timeout so my service
is worse... >> SHAKKOTTAI: Is worse.
>> ...then if ever would be connected directly. >> SHAKKOTTAI: Right. So, they're very rarely
implementing this or the way they are looking at how it should be kind of implemented is
as a feedback system where I keep track of my, you know, my average rate and the system
as a whole, has a target average rate. If your rate falls below a threshold, the server
should boost you. If your rate is above the threshold, you're fine. So, you might get
timed out, but this is on a per chunk basis. So, in other words, there might be a few chunks
which you might, you know, have a lower rate more but the server will quickly boost you
if your service rate falls below a threshold. So, essentially that was what I was talking
about here, in the case where people aren't leaving, I've done simulations on when people
leave. Even there you get significant input. Here what I was saying was if people don't
leave the server is not used after awhile, it just remains constant. So there's the accumulative.
So, essentially if I looked at the derivative this would go to, you know, zero after a point.
But so, the server is essentially used only initially for this. Now if people start leaving,
again, the number of people who possess, which is this red line, would kind of go down, in
which case the server will be used again in the end. We did some simulations on that for
various different departure rates. Even there, you do get significant improvements in the
amount of server usage. So the server can be multiplex or multiple pieces of content,
rather than, you know, having to serve everybody through the same unicast. So, yes, as long
as there are peers in the system you use them. If at any point it falls below that your service
is bad your feedback loop will take care of it by boosting use in the server. So that's
how you'd want to combine the two to ensure a quality of service guarantees. Anything
further? >> Yes. Have you looked at the social aspect
of peer-to-peers like the people involved that you mentioned?
>> SHAKKOTTAI: Right. >> So, they have this like, you connect to
your social network and the whole factors that you incentivize your social network.
So, like Victor and by default, you know... >> SHAKKOTTAI: Correct.
>> ...has the initial upload rate which is the same for all the peers it is connected
to. >> SHAKKOTTAI: Yes.
>> Whereas I have done some work in those and the people we worked was more extension
of that then. >> SHAKKOTTAI: Yes.
>> We've seen like up to 20% to 30% decrease in the total download time when you use this
specter. >> SHAKKOTTAI: Yes.
>> Have you looked at those? >> SHAKKOTTAI: What we are looking at actually
is this kind of like an exchange kind of situation. Like essentially, I have content, I have bandwidth.
I can sell it to somebody essentially. So, we're looking at it as a distributed market
system. Now we look at the prices there as the best thing I can get. Now if you're my
friend, I might want to subsidize you that is an aspect which I've not considered. But
I've been looking at peer-to-peer as an exchange economy where, you know, you essentially can
exchange your resources for somebody else's resource. So, yes, some work but not in the
context of friends giving to other friends for cheap so, no.
>> There has been some work about making sure that you'd stay on the system. Have you heard
of Big Tyrant? >> SHAKKOTTAI: Yes. So, yeah, there's a lot
on how to incentivise users to stay. One managed Victor and automatically does this you know,
it just kind of slows you down at the end so that you have a little more, so...
>> Yes. >> SHAKKOTTAI: ...I think these incentives
have to be considered. So, if you look at it as I said, as an exchange economy, many
of these incentives are automatic because I'm staying, because I make some currency,
and I can use that later on for something else which I want to download. So, the answer
is, there is a lot of space in this region. But, one question which automatically arises
to is who owns the content? So, if I'm making money out of the content, through kind of
uploading it to somebody else, am I somehow, you know, kind of withdrawing or detracting
from the amount which would be made by the owner of the content. So, I don't know. Some
kind of scheme like the way that Google does it, you know, each publisher can--will get
a small amount of money based on, you know, visits to their Web page and clicking on ads
might be a feasible solution. So, it's unclear how to create such exchanges. I mean, as in
the--how the currency mechanism would actually work. But we can show you, get significant
improvements, if there were such a currency, so it's all...