Uploaded by GoogleTechTalks on 23.11.2009

Transcript:

>> SHAKKOTTAI: Thank you all for being here, and thanks, Andre, for having me over. My name is Srinivas Shakkottai and I teach at Texas A&M University. First, the kind of stuff I've been doing. I got my PhD at Illinois Urbana-Champaign in 2007, I was a post-doc at Stanford for about a year, and I've been teaching at Texas A&M since early 2008. So I've been there about two years now. The stuff I'm going to talk about today has to

do with latency and cost tradeoffs, in particular, for peer-to-peer assisted content distribution.

But there's a much more general problem here on queries and how you kind of distribute

them, you know, based on the latency that they suffer. So before I get into the talk,

I just want to kind of give a little overview of the kind of stuff I've been doing and where

it fits in with networking in general. So this is a picture of the Internet created

by Brad Huffaker at CAIDA. So CAIDA is the place where I interned a couple of times.

So I've been wandering up there a bit and I spent two or three months in a whole bunch

of places. Now, the picture here basically shows the Internet at what is called the AS

level, the autonomous system level. So basically, what it indicates is that the radial location of a particular autonomous system is based on the number of customers it has, the amount of traffic it carries, that kind of stuff, and the location on the ring is based on geography. So this is North America, that is the Atlantic

Ocean, Europe, Asia, Pacific Ocean. So, the reason why I put up this picture quite often

is that when you look at this, you see that there's a tremendous amount of interconnectivity,

you know, between multiple different locations. And very often, we don't really exploit this

diversity, you know? What I need might not be just in one location but it might be in

multiple locations. So really, the kind of stuff I've been doing is in the space of network control and network algorithms. I'm going to talk about content distribution

today. But in general, I've been working, for example, on protocols which take advantage

of the multiple routes between a source and the destination. I've been also looking at

how to provision QoS, you know, on an end-to-end basis. I've been looking at access methods.

For example, if I'm sitting around with my laptop, I see multiple different access points.

Can I somehow split my traffic and take advantage of the fact that I might have diversity in

terms of the number of channels available to me? Multicast takes advantage of the idea that there might be multiple people interested in the same thing. So why should I send multiple unicast copies when lots of people want the same thing together? In particular, the

application we were looking at there was kind of like in a football stadium where people

have smartphones and people want to look at pretty much the same things, scores perhaps,

replays, and that kind of thing. So this all falls in the space of network optimization and control. I've also been working on network economics. How do I price this thing? Each

of these kinds of boxes is an autonomous system. Google, for example, is an autonomous system. The guy in the middle over there is Verizon Business, which is AS number 701. It's had a long history: it used to be UUNET, then MCI WorldCom, and finally Verizon bought it.

Now, there's a good deal of competition between ISPs which operate at different levels. So,

good pricing models which are stable is a very interesting question. This plays right into net neutrality, because the economics has to do with QoS as well: you cannot have QoS unless somebody is going to pay for it, and you can't have anybody pay for it unless you give them QoS. So economics and technology really tie together in the space of Internet economics, because you need good pricing schemes, and you need to know what ISP competition is likely to lead to, that kind of thing. Finally, and this is where Andre

comes in, I've been working a fair bit with CAIDA on network topology. Why this thing

looks this way, how does it evolve with time, understanding the performance of protocols

in the real world, and that kind of thing. If people are interested in this space, I can of course spend some time on it; I'm here till the end of next week. So the thing

I started with just a little while ago was if I want something why should I go to the

other end of the world to fetch it? This here is Houston, which is close to College Station, where I'm at, and that is Bombay in India. Now, if I want a piece of content, there is no reason to ship it from Bombay; you can get it from multiple different locations. Indeed, if you have queries to be processed, you don't need to process them at one particular server farm or one particular cluster; there might be multiple different locations where you can process them. So the question really is, how do we take

advantage of the diversity present in where my query or my content can be serviced from?

So this is a little picture which is generated by Pando Networks, which is a startup with

whom we have had some contact. So they basically illustrate the fact that, you know, the Internet was created a little earlier, but roughly speaking, it really took off in the early '90s. The big applications there were FTP and email, right?

So the Internet was created to ship large datasets between, you know, big labs, like cyclotron data, these huge masses of data. You ship them from SLAC at Stanford, perhaps, to Illinois, where they have a supercomputer center. So FTP was enormous, and email was in existence, and then people realized that, "Hey, we can use this Internet for many other things." In particular, the Web started growing up. And so Web traffic really took off in the mid to late '90s, but more recently,

there's been the understanding that queries can be serviced from multiple locations. So if I have a file on my machine I can give it to somebody else. If somebody else has it, they can give it to more people. So in other words, using end hosts as servers has really, really taken off in the past couple of years. And you notice that by 2006, for example, various measurement traces seemed to indicate that anywhere between about 60 and 75 percent of the bytes on the Internet were peer-to-peer traffic. Now further, this curve stops at 2006. More

recently, there was a report in NANOG this year on the fact that there is a good deal

of video traffic growing on the Internet. And so P2P in North America for file sharing seems to be going down, but peer-to-peer for video distribution is really popular in East Asia. In China, there are these things called PPLive, TvAnts, QQLive, and so on, all of which are now using peer-to-peer ideas for video distribution. So the question,

high level again, is how do we use a distributed resource, like peer-to-peer servers or multiple clusters in a server farm, to service requests, rather than servicing them in just one location? So, of these topics, I'll talk about the first two in fair detail and the third one a little less. So the first question I'm going to deal

with has to do with latency and cost in so-called ISP-friendly peer-to-peer. So the idea here is that, you know, I can service my query at multiple locations. There is a latency at each location and there's a cost to get there. This might be a transit fee, it might be round-trip time, it might be a bandwidth cost. So the question is, how am I going to do load balancing in such a system in a very simple, fast, and distributed manner? So I need to

do this in a parallel fashion. Now, the second question I'm going to talk about has to do

with file sharing. So very often when things get popular, they get popular very quickly.

This is called a flash crowd. I'll give a little more idea of what it means in a little

while. But really, we're going to understand what constitutes a flash crowd and how do

I service such a crowd. Do I use a server alone or a fixed capacity content distribution

network? Do I use peer-to-peer or do I kind of join them somehow together? And finally,

if I do get time, I'll put up a slide or two on using peer-to-peer for video delivery as

well. So, the first problem has to do with latency versus cost in so-called ISP-friendly peer-to-peer. The motivation is the following: suppose I have multiple different Internet

service providers, say Comcast is one of them, Suddenlink is another, Roadrunner is one more.

In the peer-to-peer system, when I ask for a list of peers to communicate with, what it gives me is a list of peers from all over the place. So in particular, if I belong to Comcast, I get some peers from there, some peers from there, some from here. Now, is

that good or is it bad? Now, it is good in the sense that it gives me the best peers.

I get the peers who are most likely to be able to give me something worthwhile. It is

bad in the sense that Internet service providers have a cost of transit traffic between each

other. So if I have to move traffic from here to here, it costs something. If I have to

move from there to there, it costs something, and so on. So Internet service providers have been trying to tamp down on peer-to-peer traffic simply because it increases inter-ISP costs. But if you just look at cost alone, you don't realize that the latency of users can suffer if you force them to live in one particular ISP.

So to give a clearer description of what the problem is, we would like a system which takes

into account both transit fees and latency. Now, this is very important for distributed query processing as well, which I will mention in just a second. So I can, you know,

reduce transit fees by simply localizing traffic. I just keep it local, no transit fee. I can

reduce latency by choosing the best guy who can serve me. The question is: how do I handle both of these together? How do we achieve the optimal operating point? So this tussle

between fees and latency is what I want to talk about and there's a bit of work in this

space. Very often it's kind of heuristic. It doesn't quite capture the essence of the

problem in my opinion. But nevertheless, there are some solutions which work fairly well.

And what I'm going to talk about is a little bit about the analytical solution to the problem

and how we implemented a very nice distributed solution for this. So here is my problem.

So suppose, for example, I'm thinking of query processing locations. So basically, these guys are all front end clusters: when my query shows up, it is handed off to some particular front end. These requests which are coming down here are all the queries. We'll use a pen. So, for example, this is a particular location. There's a fair number

of requests which are coming to this front end. Now this front end, usually what we would

think is it is connected or is associated with some backend processing engine. It could

be a cluster of some kind, so all these nodes could be some kind of processors, and this

front end's job perhaps is to assign these queries to that backend cluster. Fair enough.

If this just happens if there is just one front end and one backend available to it.

What happens if I have multiple locations? So here the situation is, I have lots of requests coming in here and a fairly limited processing capability here. Here, I have fewer requests but a larger amount of processing capability, because I have more nodes there. So really, I would think, "Hey, I should send some of these requests over there so that they can be handled quicker." However, the quickness or the latency

of handling is not quite everything, because there could potentially be a round-trip latency between these locations, or a bandwidth cost, or, in my ISP case, a transit fee between them. So really, it's not purely about latency; it is both about latency and fees. Now, exactly the same ideas apply to load balancing of queries. All the queries that show up have to be handed off somewhere for processing. Do I hand them off to my local cluster here? Do I hand them off somewhere else? What do

I do with it? So, that is really the requirement. This business of steady state and transient

refers to the fact that there could be clusters for whom the request rate is less than the

available capacity, while there could be other clusters for whom the request rate is bigger

than the available capacity. So clearly, I should hand off some part of this over here,

but what fraction? How do I do it in a dynamic fashion? How do I measure the latencies in

a feedback fashion? How do I do it in a simple fashion so that it converges quickly and efficiently?

So that is my first problem here. Now, we have a little bit of notation here. All I'm saying is that the superscript refers to the backend cluster, while the subscript refers to the front end which does the assignment. So really what I'm saying is, any cluster named i is assumed to have a capacity of C^i queries per unit time. So for example, this cluster out here would have C^1 amount

of capacity for processing queries. The requests arrive at the front end server, which I'm calling a multi-tracker, an mTracker, or whatever the front end server is, according to some process, it could be a Poisson process, it could be anything really, with a certain rate of X_j queries per unit time. So to go back to my picture, I might have requests coming in here at X_1 requests per unit time. So the question is, what do I do with these X_1 requests per unit time? Do I serve them locally? Do I split them up? What do I do with them? So the important thing is, I actually don't need to know either of these

for my algorithm to work properly. I can work with measured feedback latency signals as

I'll illustrate when I get to the algorithms. So anyway, we have a kind of time scale separation argument here: at the large time scale, the load changes. So in my P2P system, where I've taken several measurements, the things that get popular get popular together. On the order of 20 minutes to half an hour, nothing much changes in terms of the number of requests per second. So I can think of each of these as kind of half-hour timeslots. Now, the capacity of the system, each peer-to-peer

cluster, or, if you're thinking of backend clusters, the clusters themselves, could be dynamic, in that some things could go online, some things could go offline, and so on. So I assume that on the order of minutes the cluster capacity or the peer-to-peer cloud capacity itself can change. So, the load was like half-hour intervals; this could be something like 10-minute intervals. Finally, I have to do my traffic assignment. Where am I going

to route my traffic to? This I do on the order of seconds. So really, I'm first going to just deal with this, looking at capacity as constant and load as constant. And then, I'm going to go up here and say, "How do I change the capacity?" Finally, I'm going to talk about how the load changes. So, the notation we have is exactly the same thing that

I had earlier. So we have this X_j, the request rate at the front end server j in number of queries per unit time, and an assignment matrix X, which is the way I assign this load. So what does this mean? I have this front end server j and it has multiple options; let's say he has three options. So then, he has to decide how much of the traffic he's going to send to option one, which will go to cluster one, how much to send to two, how much to send to three, and so on. So basically, this is the way the front end server j partitions his load. Now, there are multiple front end servers, so each one is doing this partitioning. We call the whole matrix of all the partitions, the assignment matrix, capital X. So is there any question on what this means? Basically, to give another illustration of what's going

on, I have an X_j coming in here, and I have an x_j^1 going in here, an x_j^2 going there, an x_j^3 going here, potentially. Similarly, this guy can send traffic to each one. So the assignment matrix would be a 9x9 matrix, because each guy has potentially three options. And some of them could be zero, of course; but potentially, this is a 9x9 assignment matrix here.
>> I think you meant 3x3.

>> SHAKKOTTAI: Oh, sorry, 3x3, a total of nine. Yes, sorry. So there we have our assignment

matrix. The question is, what am I going to do with this? So again, to recollect the notation, the origin is always the subscript, that is the front end where the query is actually received, and the superscript is the destination to which the query is routed. That's the notation all over the place. So x_j^2 means that origin j will route that much load to cluster 2. And obviously, I cannot drop any queries, so the sum of the partitions, the sum of the assignments, must add up to the load at that particular front end. So the question is, how am I going to do this assignment?
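To make the conservation constraint concrete, here is a minimal Python sketch of the assignment matrix; the names (`x`, `X`, `is_valid_assignment`) are my own for illustration, not from the talk.

```python
# x[j][i] is the load that front end j routes to backend cluster i.
# The only hard constraint is conservation: each front end must assign
# all of its incoming load, sum_i x[j][i] == X[j], with no negative entries.

def is_valid_assignment(x, arrival_rates, tol=1e-9):
    """Check that an assignment matrix conserves load at every front end."""
    for j, row in enumerate(x):
        if any(v < 0 for v in row):
            return False
        if abs(sum(row) - arrival_rates[j]) > tol:
            return False
    return True

# Three front ends, three clusters: front end 0 splits its 30 queries/s
# across clusters 0 and 2, and the other two keep their load local.
X = [30.0, 10.0, 20.0]
x = [
    [18.0, 0.0, 12.0],
    [0.0, 10.0, 0.0],
    [0.0, 0.0, 20.0],
]
assert is_valid_assignment(x, X)
```

Zero entries are allowed, as he notes; the matrix just has to be row-wise consistent with the arrival rates.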

Now, in order to understand how to do the assignment I need to have some idea of what

latency is likely to look like. So what is latency? So really, I send you a query and

after a little while you respond. That little while or a long time depends on how many others

are also querying the same thing. So this is a very simple latency function. It comes

from queueing theory, and it's really straightforward to understand what's going on here. So basically,

the idea is, I have an engine which can respond to queries at a rate of C^i queries per unit time, and I feed in load at a rate of z^i queries per unit time. These things come in and queue up, and this query engine keeps processing them and throwing them out. So somebody comes in, he sits there, he kind of goes across, gets served, and gets out; the question is, how long is he going to sit there? What kind of shape is this function going to have? And the answer is it always looks convex

increasing, because if you look at that expression, when z is much smaller than C, you have very low load. As z comes close to C, the denominator goes to zero, therefore the whole thing goes towards infinity. So the idea is, as I increase the load on a particular server, the latency, the time between arrival and departure, is going to increase, and increase in a convex fashion as well. And this is true not only of such

a function; it's true of pretty much any system you choose to measure. It is true for a deterministic job size. It is true, for example, in peer-to-peer systems, which I've got some measurements on. It pretty much always happens that as you get closer and closer to capacity, the latency blows up. In fact, if you look at the 101 at six o'clock you'll see this in action: as you get closer and closer to capacity, you have huge latencies. So, the total latency, which is the unhappiness

of my users, is the per-user latency, 1/(C^i - z^i), times the number of queries per unit time, z^i; so the total latency suffered by all the users coming in here is z^i/(C^i - z^i). So, how does that play into the picture of this distributed

cache, this distributed query processing? So, that was my picture, right? I had, you know, requests coming in from multiple locations to a particular front end, and that

in turn went to the backend to be processed. So how much traffic is coming in? We have traffic from there, there, and there to be summed up: we have traffic x_r^i from each front end r coming to cluster i. So on cluster i, the total load is just the sum of the traffic which comes from the various different locations. So in that particular diagram, cluster number one is handling traffic from two to one, from three to one, and from one to one itself, so that's why I have to sum all the traffic which comes in. And of course, the total latency has some expression; I'm just using the M/M/1 expression here for illustration. So I just stick in z equal to the total load coming in, and this is my total latency experienced at some particular cluster. So that is my

latency. Great. What else do I have? I have this transit fee. As I said, a transit fee

can be roundtrip time on the link, which is potentially propagation delay plus queuing

delay. It could be a bandwidth cost, because I have to pay somebody for bandwidth. In my case, what I'm saying is that it's a transit fee between ISPs. So whatever it is, there is a certain fee which is charged, or a cost incurred, for transferring users from one location to another, and that is charged on a per-query basis. So if I transfer one user from r to cluster i, where r is the front end where the user actually showed up and i is the cluster where it finally gets processed, it costs me p_r^i per user.

So again, we have to find out the total transit fee to get to a particular cluster, and that we do by simply summing over the load. So in other words, I have traffic coming in from two to one, from three to one, and from one directly to one. One to one costs me nothing, two to one costs me something per user, three to one costs me something per user, and that is really what is being summed up here. So this is the total transit fee, the total cost of getting to the cluster, right? So much for that. So what

is my system cost? The system cost that I look at is the total latency plus the total transit fee. So we developed an expression for the latency, which is basically load divided by C minus load. This basically says, for any particular cluster i, what we have is the latency on that cluster plus the transit fee to get to that cluster.

And then, I have to add up over all of my clusters, because that is my whole system. So, if I have multiple locations, each location experiences a latency, a total latency, and for each location there's a cost to get there; that is the total cost of a particular cluster, and over all clusters it is the sum. All right, so we need to design a controller that somehow minimizes this. This really is the unhappiness of the world as far as I'm concerned. So, how do I minimize this unhappiness, right? Latency costs my users because they get unhappy. Transit fees cost me because I have to pay them all the time. So the question is, how do I design such a controller that is simple, distributed, and fast? So, these are the three things.

Distributed in particular is very common among network optimization folks simply because,

you know, there are millions and millions of flows on the Internet. There is no way

on Earth you can do an optimization with all those flows sitting in one particular, you

know, central computer. So, a distributed processing of such optimization problems has

been a big question. And there's lots of ways of handling this. Now, the objective really is, you know, this is my unhappiness, that is the lowest unhappiness that I somehow want to attain, and I want a distributed way of getting there, so that this makes me as little unhappy as feasible. Now, there's a whole bunch of controllers which can do this, so the question is how do I solve it? There are lots and lots of optimization techniques, and I'm going to pick one which is particularly useful for this problem. So usually, there are

some which are fairly complex in terms of computation: you can do centralized computation, you know, in one big machine, and these methods, although they give you very accurate solutions, require a lot of processing in a particular machine. The other methods which I've put down here, these are some of them, are basically distributed methods. So there's Newton's method, there's log-linear learning, best response, which means do the best thing given the state, gradient descent, and so on. So what I'm going to describe is actually a coupling of these two, which is particularly suited for this setting. But really, what I wanted to say here

is that there is a whole bunch of tools which have been developed in the OR, the operations research, literature which are useful for such problems. And so very often, when we solve network optimization problems, you can go, you know, look in this bag full of controllers, pull out the one which is most suited, and then show that it's suitable for your particular application. So, I'll tell you what my controller is going

to be in a second. But before doing that, I need to know what the kind of per unit cost

is going to be. So what does this mean? I told you that the total cost of the system is the total latency plus the total transit fee. Now the question is, if I make a decision on how much load to send to a particular cluster, how does this cost change? So for example, I have two clusters and I have an arrival rate of, you know, requests coming in. Now if I move some to one cluster and some to the other, maybe it's good, maybe it's bad. The question is, how good or how bad is it? So if I change my assignment on a particular dimension, so I move a little from one to the other, how does the cost change? Is it going to be a simple function? Is it going to be messy? How do I compute it? So the answer is, in fact, in this particular example, it's

actually very simple. All I need to do is understand the impact on the cost of changing one dimension. So recollect, x_j^i was the amount of traffic which the front end j was sending to cluster i. So the question is, what impact does that one-dimensional change have on the overall cost of the system? The answer is obtained by partially differentiating one with respect to the other, and the answer has a really nice form here. I call this the marginal cost, the marginal cost of the front end j on the cluster i, and this marginal cost has three terms. One is the marginal latency: I send people there, they become unhappy, so how unhappy do they become? I have a per-user unhappiness. The next thing is the transit fee: I send one guy from one place to the other, it costs p_j^i, because it costs that much per user. So in other words, these two are very intuitive. I send one person from A to B; he experiences the latency at B. If I send one guy from A to B, the added cost in terms of transit fee is p_j^i. Great. There's

a third one, which is not quite so obvious, which is my impact on the others' unhappiness. So in other words, when I send a guy from A to B, there are already guys being served out there, so the more I send there, the more unhappy the other guys who are being served get, because my impact on their latency has to be considered as well. That is the third term, which I call a congestion cost. So, these three together are really the impact on the total cost of the system. There's a marginal latency which the user experiences, a marginal transit fee which I have to pay, and a marginal congestion which is experienced by everybody. So, this really is how the unhappiness of the world is going to change as I change a particular dimension. Now these three, mind you, are all measurable. In fact, in the system

which I'm going to consider, we don't even know C or x; we know p, but we don't know anything else. You can measure the latency, you can measure the congestion, and you know the fee because it's a cost given to you. So the question is, now that I know what my marginal impact on the system is, how am I going to change my assignment? That's where the dynamics comes in. So to repeat, I currently have an assignment matrix, who is getting sent where, and I want to change this matrix. I change it on each dimension separately.

So I have to change the amount that I'm assigning from j to s, where j is the front end to which the request came and s is the cluster where it is served. How do I change the number of users I'm sending to each one? The answer in this case turns out to be a very nice one, and it is called replicator dynamics. This comes from evolutionary thought. The idea is: things which are doing well, you should do more of them; things which are doing badly, do less of them. That's pretty much all there is to it. So suppose

I'm sitting here and I have three options available to me. Option one costs, let's say, $10. Option two costs, I don't know, $20. Option three costs me $5. So clearly I'm going to send my guys to the $5 option. I keep doing that until the $5 becomes $10. Now I have two of them which have exactly the same cost, $10 and $10, so perhaps I'll split equally until they all hit $20, and then, you know, the system is essentially stable. So really what I'm looking for is a comparison between the cost per unit for

a particular option, of sending users from j to s, with the average cost over all the options available to me. So again, in the three-option example, I would calculate my average cost over the three options, and I would know the individual costs because I calculated them. If a particular option is doing better than average, which means it has lower cost, I should do more of it: the average cost minus its cost is positive, so its share goes up. If a particular option's cost is higher than the average, the difference turns out to be negative, and I do less of it. So do more of things which work well, do less of

things which work badly. And we can actually prove this is stable: if we do this over and over again as a dynamic system, it attains that lowest-cost point which we were looking at, if we use the fee in this fashion. So again, we have to consider the per-user latency, the per-user transit fee, and the impact on others, the per-user increase in congestion. If you add up these three and push it into this kind of "do more of whatever works for you," it actually can be proved to be optimal.
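A sketch of how such a controller might look in code, under the M/M/1 latency used above for illustration; the variable names and the step size are my own choices, not the talk's implementation. Each front end compares each option's marginal cost, per-user latency plus fee plus congestion, with its load-weighted average, and shifts share toward below-average options.

```python
def marginal_cost(z, c, fee):
    """Marginal system cost of one extra query on a cluster (M/M/1 latency).

    Three terms: per-user latency 1/(c-z), per-query fee, and the congestion
    term z/(c-z)^2 (the extra latency inflicted on users already there).
    Assumes the cluster stays strictly below capacity (z < c).
    """
    return 1.0 / (c - z) + fee + z / (c - z) ** 2

def replicator_step(x, capacities, fees, step=0.05):
    """One replicator-dynamics update of the assignment matrix x (in place)."""
    n_clusters = len(capacities)
    # total load currently on each cluster (snapshot before updating)
    z = [sum(row[i] for row in x) for i in range(n_clusters)]
    for j, row in enumerate(x):
        X_j = sum(row)
        q = [marginal_cost(z[i], capacities[i], fees[j][i]) for i in range(n_clusters)]
        q_avg = sum(row[i] * q[i] for i in range(n_clusters)) / X_j  # weighted average
        for i in range(n_clusters):
            # grow shares whose cost is below average, shrink the rest;
            # the updates sum to zero, so the total load X_j is conserved
            row[i] += step * row[i] * (q_avg - q[i])
    return x
```

One standard property of replicator dynamics: a share that is exactly zero never grows, so in practice every option is seeded with a small positive share.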

And how well does it work? Now, this is on something called the network simulator. Basically, we modified the BitTorrent client there so that you could actually move traffic from one location to the other. And the three curves which we are looking at are the following. Single tracker means that we don't allow you to split: you have those multiple locations, and I say, "I will not allow you to move your traffic from one location to the other." Now

that is of course going to do very well in terms of transit fee: because I'm not moving anything, it doesn't cost me anything. But it's horrible in terms of latency, because there are some clusters which have extra load and some clusters which have less load. The clusters which have more load than their capacity do really badly, so the overall latency goes up significantly. Now, the others which I'm comparing here are the following:

so the green one assumes that there is no price--no transit fee, that is. So basically

it says, "Okay, move your traffic wherever you feel like. It's your business."

Of course, it does very well in terms of latency, it reduces the latency to bare minimum, but

it does cost you some, a fair bit, in terms of transit fee. The third one, which takes

into account both of these and uses this Replicator Dynamics kind of stuff, what happens to it?

It has a fairly low latency and a fairly low transit fee. We add up the two in the next

diagram. Out here, what we have is the lowest possible latency plus transit fee that can

be achieved with this particular load. So in other words, the assignment matrix here

is the optimal assignment matrix. And the nice thing about this is it converges fairly

fast. So basically, in our peer-to-peer system which we implemented, we basically took these

assignment decisions every eight seconds. So we took a decision, measured for eight

seconds, took a new decision, measured for eight seconds, and so on. So we measured basically

the per user latency, which can be measured in the cluster and fed back. We measured this

increase in congestion, which is the change in latency for a change in load, fed that

back. And of course the price was known, the transit fee was known so we took these decisions

every eight seconds. And we found that in roughly four or five assignment decisions, it converges.

In fact, this has very good convergence properties. So--and as I said earlier, the really nice

thing about this is, it is completely distributed, each front end can take its own decision based

on the cost that it sees, so you don't have to solve, you know, a centralized optimization

problem. Each front end simply looks at the latencies which it sees in various

things. It sees the cost and it knows the congestion because it--it's fed that back

as well, and it takes a decision each time, based on what queries show up. Now are

there any questions on this kind of assignment problem here? Yes?

>> You said this was measured from a system that you built, so what did you deploy it on?

Or how many front ends were there? >> SHAKKOTTAI: Okay. This is actually a network

simulator simulation of the system, but it's not actually deployed. We're building a test

bed. But for that, we don't have answers yet. So what this is, it's a network simulator,

simulates the whole protocol stack, and you basically take multiple BitTorrent clients

which are riding on that. So each guy comes, he starts up multiple TCP connections to various

different peers, and tries to get the file. And on top of that, we built this so-called

tracker system which allows people to be assigned to various different domains and we artificially

assign cost of going to each of these domains. So under that is where our simulation is.

So basically we had of the order of something like 5,000 clients or so running

simultaneously. And to get these plots, we ran the whole thing about 40 times and averaged; those are the

standard deviation marks. >> For how many front ends?

>> SHAKKOTTAI: Four. >> Okay.

>> SHAKKOTTAI: Right. >> And that'll affect your convergence time

quite a bit, I would imagine. >> SHAKKOTTAI: That it does. But the thing

is each one has only so many options available. If you give each front end the option of

choosing everything on the planet, of course, it'll have a higher convergence time. If each

front end has a restricted set which it's allowed to choose from, that will, you know,

decrease your convergence time significantly. So that's the thing. So, our idea was we don't

want to fetch, for example, going back to my Bombay versus Houston example, I don't

want to interact with a peer-to-peer network in Bombay, I would like to interact with things,

other things, in Texas and possibly California. So, the number of options should not be that

high because in any case my transit fee will get really large if I interact with multiple

faraway options. So, that was the objective. >> I have one question. What's the addition

function that you use when you sum up the latency and their transit fee?

>> SHAKKOTTAI: We are just using plain addition. You can pick any convex combination and it

will still work. So we basically thought latency we measured in dollars, transit fee we measured

in dollars and just add them up together. So it's just simple addition here.

>> Okay. >> SHAKKOTTAI: But you can change the constant

in the front end; it will change nothing in the crux of the matter.

>> Can the transit be calculated simply? >> SHAKKOTTAI: It could be. The transit

fee itself could be a kind of point-to-point link latency, in which case it's all latencies.

>> Okay.

>> SHAKKOTTAI: So, the next question we wanted to answer in this space has to do with admission

control. The idea is, supposing my clusters are of a certain fixed capacity, whatever

it is, and supposing the load is much higher than that, then there is no assignment on

the planet which will already use my latency to an acceptable level. So at that point,

I need know what to drop. How much can I admit, how much do I, you know, drop? So, the kind

of idea expressed here is that given a particular load vector, I know how to minimize the total

unhappiness of the world. But then, this total unhappiness of the world, which is minimized,

might itself be unacceptable because there's simply too much load. In which case I'd better

start dropping some of them and how do I drop? So, the idea is that each user has a happiness

associated with himself that can be thought of, supposing for example, each user brings

$1 worth of happiness, so each user pays me a dollar to be serviced. In which case, if the

rate at which users arrive is "x-j" and each guy brings $1, then this log won't exist;

I will just have a dollar times "x." So each

user brings $1 and I try to maximize the happiness minus unhappiness. In a kind of generalized

setting in a network optimization, we often consider concave increasing functions like

log. The idea is that the marginal impact of each user becomes lower and lower. Like,

for example, supposing I have a composite query, say a request for search results

with images, with ads, and so on. Each marginal part of this has its own importance. So I

guess the search results are really important, you know, a bunch of text which

I need. The images are perhaps slightly less important. Then perhaps the ads, I don't know

how important they are. I presume they are important, but at some point you have to decide

which is more important than the other. So the idea here is that

the log tries to indicate that, you know, the marginal impact of the increasing number

of queries, somehow it is important, it does increase your happiness but it doesn't increase

it as much as, you know, in a linear function. So, if I have three queries, my first query

must be handled, the second query perhaps not so good, the third query perhaps even

lesser, you know, that kind of thing. So, this will work of course for linear, log, and

a whole bunch of concave functions. But in general, the high-level idea here is you

want to maximize the sum total happiness, or minimize the sum total unhappiness. So, just

to put this in the right framework, the load assignment problem's job is to just minimize

total unhappiness, whereas the utility maximization problem is to maximize the total happiness

which is, you know, the things which make me feel good minus the things which make

me feel bad. So, the question is, how am I going to do this admission control in a simple

distributed fashion? So essentially what this means is each of these front end servers has

to decide whether or not to admit a query and it needs to do so, understanding what

the impact of that query is going to be on the happiness of the system. So really, if

this is the total happiness of the system, I need to know what the impact of changing

the "x," which is the load which is coming in on the system. In this case, it turns out

it's a very simple function. We just have to differentiate with respect to "x-j."
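That marginal computation and the update it drives can be sketched as follows. This is an editor's illustration under assumed models: a log happiness per front end and a single shared M/M/1-style delay cost; the real system measures these marginals rather than computing them from a formula.

```python
# Distributed admission control as a gradient-style sketch: each front end j
# steps its admitted rate x_j along d/dx_j [sum_k log(x_k) - D(x)], i.e.,
# marginal happiness minus marginal delay cost. Models are illustrative.

CAPACITY = 10.0

def marginal_delay(total):
    """dD/dx for an assumed total delay cost D(x) = x / (C - x)."""
    return CAPACITY / (CAPACITY - total) ** 2

def admit_step(rates, step=0.05):
    """Raise x_j while its marginal happiness 1/x_j exceeds the marginal cost."""
    md = marginal_delay(sum(rates))
    return [max(x + step * (1.0 / x - md), 1e-3) for x in rates]

rates = [0.5, 1.0, 2.0]          # three front ends, arbitrary starting rates
for _ in range(2000):
    rates = admit_step(rates)
# At convergence every front end equalizes 1/x_j with the shared marginal
# delay: that balance point is the supportable admitted load.
```

Each front end only needs its own rate and the measured marginal delay, so the update is fully distributed, matching the point made in the talk.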

And what we can do is use something called a gradient ascent-type controller, which essentially

says if something is, you know, increasing the cost then cut down on it, if something

is decreasing the cost, go up. And we implemented this as well on our NS tool, I think. Oops,

I went backwards. All right, and so, these three colors now are completely different.

Basically, there's just one system with three front ends. So there's a blue front end, a

red front end, and a green front end and they have to take admission decisions and the objective

here is to show that although they take their admission decisions in a distributed fashion,

it does converge and the total system happiness also converges. So the idea is that somehow

in a distributed fashion, I need to minimize cost and maximize happiness. So that is what

is being done here. This interestingly is about 7% from the optimal solution. This is

again being done over a huge number of BitTorrent clients. Actually in this case, it was

three front ends. So, the key insight from this part of the talk is basically that

it is possible to align incentives in terms of end user latency and cost objectives and

it is possible to do it in a distributed fashion, in a fairly simple way, because everything

that you need is measurable. So I don't even need to know this, the capital "C," the capacity,

I don't need to know the load. All I need to know is this marginal impact which can

be measured because I can measure latency, I know cost. I can measure congestion. So

ongoing work in the P2P space, capacity changes in a weird fashion because people come, people

leave, and the question is how do I incentivize people to stay and serve other guys? And also,

we are just building a very, very simple test bed. We just want three

servers right now, which are connected by real network links, with a certain cost,

and want to see if this performs as it should. So that's the current state of that. Any further

questions on this part of the talk? Yes? >> I have a question of what happens when

the prices change over time? >> SHAKKOTTAI: When the prices change over

time. Okay. So if the price is going to change over time, it's...

>> It was 18 probably, like then there's nine times over...

>> SHAKKOTTAI: So then--okay. You--the question. Oh, okay, over that timescale nothing will

change. Because the convergence of the innermost loop is of the order of

40 seconds to a minute. So, if your prices change on the order of days, no change whatsoever.

It will converge long before that; it will keep converging. So, any further things?

>> So what happens when you actually drop it? I mean, in a real system, wouldn't the

client then just try again and then... >> SHAKKOTTAI: Right.

>> ...you're still paying that transit cost and everything? I mean...

>> SHAKKOTTAI: If you drop it, well, you'll drop it again. So basically, once you've

got this value out here, this is really the supportable traffic. So basically this says

once it's converged, that if I want to maximize the happiness of the world, this is the amount

of load I can support. So any further load, I just have to keep dropping.

>> Aren't you though paying that transit fee every time the request is made because you're

not making that decision to accept it until it arrives at the server?

>> SHAKKOTTAI: Right. So, you--okay. The query itself, as far as P2P goes, the query itself

is just this tiny thing saying, "Give me a piece or give me a part of the file." The

transit fee actually has--is in the service itself. So once I've been accepted, shipping

the traffic across will cost... >> It's actually not the--when you say the

cost is a per query basis, that "P-i-J" it's really only the response and not the actual?

>> SHAKKOTTAI: Yes, the service itself, yes. >> Okay.

>> SHAKKOTTAI: If I decide to accept it, otherwise I just drop it, yes. Although, in the admission

control problem, the admission control is done at the front end where it originated

so you won't even--the query won't go across to the other side if I'm not going to accept

it in the first place. So, what I'm trying to get at here is, I've got the--oops. Oh,

this is a good picture. So, here is where I take my admission decision. So I won't actually

send the query there if I think that, you know, it's going to have a bad impact on the

system. So, once you are dropped here, you're dropped. So you might come back, you might

be dropped again. So, it's only if I think that, you know, the system performance is

not going to be massively impacted by you that I accept you and in the inner loop I

decide where you're going to be shipped. So, the inner loop decides where to move users

around, the outer loop decides whether or not to admit them in the first place, if they

have anything there. >> Yes. And I was just thinking about it from

the search problem where the actual transit causes the additional latency.

>> SHAKKOTTAI: Yes. >> And there you're, you know, that's being

paid on the drop. It takes you that much extra longer to decide that your query is not going

to be serviced. >> SHAKKOTTAI: Actually, no, because it's

dropped at source. So, if anything, it'll add to the latency of the query processing

at this front end, not that one. >> Okay.

>> SHAKKOTTAI: So, in other words, you're dropped where your request originated. So,

there's... >> So, when you talk about congestion, do

you also consider the decrease in congestion? Let's say I go from one to two, so you

talk about the increase in congestion at two. Do you also take the decrease

in congestion at one? You sum it up or all the...

>> SHAKKOTTAI: It has--that's where this stuff about comparing with the average happens.

Because I compare, so, the query comes here and I compare the latency out here with the

average over all the options available to me. So, I'm inherently comparing what will happen

if I sent it locally, because it's doing better there is why I send it there. If it were to

do better locally, I wouldn't even bother. So yes, there is a comparison there. It's

implicit, there's no explicit, you know, comparison now between the two. All right, I'm actually

running out of time, but nevertheless let me talk about this stuff about flash crowds.

And I--yes, that's out here. Oops, right. So, the idea here is--has to do with how do

I share files when they are very, very popular? So, the question here is the following. This

is a cute little graphic from Pando again. So, Pando basically says, "If I have a CDN

only," you can think of it as a big server, "As I have more and more people who request

the content from the server there's worse and worse performance because the latency

is bound to increase in exactly the same convex fashion." And if I want to service them at

a given quality, I either have to provision a lot, which is higher cost, or those guys

will, you know, be unhappy with me and leave, which again means a decreased revenue. So,

the point is, what if you could somehow utilize end user resources? And, you

know, you can think of these end users, one possibility is to think of them as caches

which are closer to the end homes, perhaps at the top level or perhaps even at the cable

head-ends, or you can think of them as end users like you and me. So basically, the point is,

if I can somehow utilize the resources lower down in the hierarchy, then more and more users,

or more and more viewers, of this content might mean better and better performance simply

because there are so many more copies of it available. And you can lower your cost because

you don't have to unicast all the way from your server. So the question is how are we

going to combine these two methods, you know, the server-based and the peer-to-peer based?

So before we ask how do we combine it we need to know what kind of load we are likely to

see. So, this is an analytical model for describing such things. What it describes really, is

a viral distribution, a viral propagation of interest. So, I(t) is the number of

people who are interested in a particular piece of content of some type. So this is

the total population of guys who say, "I want it." Now how does that change with time? So,

this is called Bass diffusion. It comes from Frank Bass's work in the 1960s, where he

was looking at the propagation of interest in microwave ovens. So there's this new gadget,

this microwave oven. How do people request it? What is the rate of change of interest

in such things? Now this was used much earlier in population dynamics, in viral models.

The idea is the following: there are so many interested people right now, right,

and "N" is the total number of people. So "N" minus "i" are the people who aren't interested

right now. If I advertise, it reaches a constant fraction of the people who aren't interested.

So "K" is a constant which describes my advertising budget. Maybe I do television, billboards,

whatever, and it reaches some constant fraction of the population. On the other hand, people

tell each other things. So, each guy who's interested might tell the guy, "Hey, this

microwave oven is so cool." So, people who are interested pick somebody at random and

say this is interesting. The probability you'll find somebody who is not interested is the

number not interested by "N". So really, this captures two effects, an advertising effect

and a viral effect. Now, very often in the things like YouTube videos for example, there

isn't very much of an advertising effort; it's by and large viral. So, for a file distribution

of this kind, we're going to drop the advertising term and look essentially only at the viral

propagation, although the results which I'm going to talk about essentially hold

for the advertising case as well. So, we can solve the differential equation

explicitly. You see this all over the place. This is the number of interested people and

it follows the so-called logistic function. The idea is initially nobody much is interested,

people start telling each other. At some point, suddenly you have this growth. This is an

exponential growth at that point and then it levels off. This is characteristic of what

is called a flash crowd. Essentially the idea is, at some point, you see sudden increases

in interest of something. The question is how am I going to service this interest? Okay,

this is fairly common. I worked with a guy called Mike Freedman, who runs a content distribution

network called CoralCDN. It's based on PlanetLab, which is this worldwide research network,

and he hosted a whole bunch of files on it. In particular, tsunami home videos, you know.

Basically, it's when the Asian tsunami struck there were lots of home videos and this is

one example of such, you know, interest growth. You know, you have this sudden spike of interest.

All right, so what am I going to do? I can use the CDN, which I call a bank of servers,

I can use peer-to-peer, or I can use both. And since I'm giving this talk, I'm going

to basically show that hybrid has some phenomenal performance improvements. So, a little bit

of notation again. I(t) is the number of interested people in this file, it's a cumulative demand.

P(t) is the number of people who possess the file, it's the cumulative service. So, I really

need to just work with these two and I'll say why this total latency is the area between

those in just a second. So, I have a server, it has a capacity "C," how am I going to,

kind of, understand the latency experienced? So what happens is this blue curve is the

demand, the I(t), and the green one is the service curve. So, what happens? Initially,

very few people ask me, so the rate of requests is much lower than the capacity; I can serve

it pretty much instantaneously, so it goes along there. At some point, the rate of requesting

is more than the capacity available. At that point I see this divergence. So the idea here

is--so, if somebody shows up here. So, the blue curve is demand. So, somebody showed

up at this time. He got served at this time. So there is a little delay. So, as time progresses,

since the arrival rate is so large, somebody who shows up here got served here. Shows up

here, got served here. So this area between the curves is the latency or the delay experience

between the demand and service. So, the slope of this curve is the capacity because the

server is running at capacity at this point. It simply cannot keep up. At some point when

the demand slackens off again, it catches up. So basically, we have this knee

where my server just gives up, and eventually it'll catch up, and the area between these

curves is what I'm interested in. Now, I can actually analytically characterize these exactly.

And the area scaling looks like theta of "N" squared over "C." Don't bother to remember

this for now; I'll do a comparison at the end. But it's just intuitive to realize that this

looks kind of like a triangle because the interest looks almost like a staircase and

the service looks almost like a straight line. So the area of the triangle roughly gives

you the intuition, although we can do an exact calculation.
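The server-only model described above can be sketched numerically. This is an editor's illustration with assumed parameters: purely viral Bass demand di/dt = b·i·(N - i)/N, a fixed server capacity C, and the area between the cumulative demand and service curves as the total user delay.

```python
# Server-only (CDN) model: logistic viral demand served at fixed capacity C.
# The accumulated gap between cumulative demand and cumulative service is
# the total delay. All parameter values are illustrative assumptions.

N, b, C, dt = 10_000.0, 1.0, 400.0, 0.01

interested, served, area = 1.0, 1.0, 0.0
for _ in range(int(40 / dt)):             # 40 time units of Euler steps
    interested = min(interested + dt * b * interested * (N - interested) / N, N)
    served = min(served + dt * C, interested)   # server works at capacity
    area += (interested - served) * dt          # total delay accumulates
```

The flash crowd outruns the server during the exponential phase, the service curve climbs at slope C until it catches up, and the resulting area is consistent with the theta(N^2/C) scaling mentioned above.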

>> "N" was the total number of users? >> SHAKKOTTAI: "N" is the total eventually

interested users, yes. The service model for peer-to-peer is essentially the same kind

of thing. It's basically, if I'm interested but I don't have it I ask somebody at random

and if they have it I can get it off them. So really, the thing about this is the interest

and the possession are coupled to each other. Because only if I'm interested will I ask,

only if I have can I serve, so that is the point. These two can be solved explicitly.

It gives you a green curve which looks like that. The interesting thing here is initially

nothing much happens. Early on, I ask people, do you have it? Well, they don't have

it. So, nobody has it, nothing happens. But once a small number of people have it, it

just shoots up exponentially because each one can serve a whole number--a fair number

of other guys. So, basically until about log "N" time, nothing happens, and by two log "N" time,

everything is over. So again, we can characterize this area, and the most interesting thing

is the CDN does well initially until the server is overloaded. P2P does well once the installed

user base is large, once some number of people have it. So, how do I combine them? I first

use the server, then I use P2P, and we can show that it gives you very good improvement.
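The P2P and hybrid regimes can be compared with the same kind of sketch. Again an editor's illustration with assumed parameters: possession spreads virally (a possessor serves an interested peer who finds them at random), and in the hybrid the server contributes its capacity whenever a backlog exists, which supplies exactly the initial boost discussed here.

```python
# Compare P2P-only against the server-boosted hybrid. Demand is viral
# (Bass, no advertising); possession grows by peer exchange plus, in the
# hybrid, server capacity C while a backlog exists. Numbers are illustrative.

def total_delay(N=10_000.0, b=1.0, beta=1.0, C=100.0, horizon=60.0,
                dt=0.01, use_server=True):
    interested, have, area = 1.0, 1.0, 0.0
    for _ in range(int(horizon / dt)):
        interested = min(interested + dt * b * interested * (N - interested) / N, N)
        p2p = beta * have * (interested - have) / N   # random peer exchange
        server = C if (use_server and interested > have) else 0.0
        have = min(have + dt * (p2p + server), interested)
        area += (interested - have) * dt              # total delay
    return area

p2p_only = total_delay(use_server=False)
hybrid = total_delay(use_server=True)
```

The hybrid's total delay comes out a fraction of the P2P-only delay: the modest server capacity removes the long startup phase during which almost nobody possesses the file, after which the swarm's own capacity takes over.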

So, I'll give you an example here, supposing I use a server capacity of "N" over log "N." "N"

you may recollect is the total number of users who are eventually interested. The server

alone, the CDN, can give you latency of the order of log "N." P2P alone can give you a log "N."

The hybrid, though, can give you a log log "N," which for fairly large values is essentially

constant. Further, I use this server only for a very short time, only for log "N" time

in the hybrid case. So, the thesis here is that you must serve for a short time using

your server, at which point your P2P has enough capacity to really ramp up. So, these are

simulations in the space. So, the first one is the server based, again, demand and supply.

This is the peer-to-peer one. You can see there's a significant area between these curves.

Essentially, the per-user average delay was of the order of 10 seconds here. Here, it's

of the order of one second roughly, because the area is much lower. I do server first,

then peer-to-peer. And well, the question rather is, what happens if I do both simultaneously?

The answer is, not much changes, because if I first ask the peers and then ask the server,

initially everybody gets served by the server. And so this pink line is the amount of server

usage. So, initially everybody is getting from the server. But once you hit a certain

value, nobody much--you know, it just levels off and most people are getting it using the

peers. So again, this just reinforces the idea that an initial boost using a server

is what a peer-to-peer system needs. You do that boost and the peer-to-peer can take over

after that. And this is true of CDNs like Akamai as well, where you might want to do unicasts

to the caches at an initial amount of time, after which, you know, the caches themselves

can handle it. So, I think I should stop here and take questions. If you're interested in

peer-to-peer streaming, I'm around, as I said, at Stanford until next week. So please

do send me an email and we can discuss some of this. Also, I have a number of papers

in this space. And so I will just stop here and say thank you, and take questions.

Thank you. Anything further? Yes? >> When I'm connected to a peer,

I always run the risk that that user randomly decides, "So okay, I'm not interested anymore,

now I'm walking away." >> SHAKKOTTAI: Right.

>> And now I get, even if it's just for a short time, again, the timeout so my service

is worse... >> SHAKKOTTAI: Is worse.

>> ...than if I were connected directly. >> SHAKKOTTAI: Right. So, we're not really

implementing this yet; the way we are looking at how it should be kind of implemented is

as a feedback system where I keep track of my, you know, my average rate and the system

as a whole, has a target average rate. If your rate falls below a threshold, the server

should boost you. If your rate is above the threshold, you're fine. So, you might get

timed out, but this is on a per chunk basis. So, in other words, there might be a few chunks

which you might, you know, have a lower rate on, but the server will quickly boost you

if your service rate falls below a threshold. So, essentially that was what I was talking

about here, in the case where people aren't leaving, I've done simulations on when people

leave. Even there you get significant improvement. Here what I was saying was, if people don't

leave, the server is not used after a while; it just remains constant. So this is the cumulative usage.

So, essentially, if I looked at the derivative, this would go to, you know, zero after a point.

But so, the server is essentially used only initially for this. Now if people start leaving,

again, the number of people who possess, which is this red line, would kind of go down, in

which case the server will be used again in the end. We did some simulations on that for

various different departure rates. Even there, you do get significant improvements in the

amount of server usage. So the server can be multiplexed over multiple pieces of content,

rather than, you know, having to serve everybody through the same unicast. So, yes, as long

as there are peers in the system, you use them. If at any point your rate falls below the

threshold and your service is bad, your feedback loop will take care of it by boosting you using the server. So that's

how you'd want to combine the two to ensure quality of service guarantees. Anything

further? >> Yes. Have you looked at the social aspect

of peer-to-peers like the people involved that you mentioned?

>> SHAKKOTTAI: Right. >> So, they have this like, you connect to

your social network, and the whole idea is that you incentivize your social network.

So, like BitTorrent by default, you know... >> SHAKKOTTAI: Correct.

>> ...has the initial upload rate which is the same for all the peers it is connected

to. >> SHAKKOTTAI: Yes.

>> Whereas we have done some work on those, and what we worked on was more an extension

of that. >> SHAKKOTTAI: Yes.

>> We've seen like up to 20% to 30% decrease in the total download time when you use this

approach. >> SHAKKOTTAI: Yes.

>> Have you looked at those? >> SHAKKOTTAI: What we are looking at actually

is this kind of like an exchange kind of situation. Like essentially, I have content, I have bandwidth.

I can sell it to somebody essentially. So, we're looking at it as a distributed market

system. Now we look at the prices there as the best thing I can get. Now if you're my

friend, I might want to subsidize you that is an aspect which I've not considered. But

I've been looking at peer-to-peer as an exchange economy where, you know, you essentially can

exchange your resources for somebody else's resource. So, yes, some work but not in the

context of friends giving to other friends for cheap so, no.

>> There has been some work about making sure that you'd stay on the system. Have you heard

of BitTyrant? >> SHAKKOTTAI: Yes. So, yeah, there's a lot

on how to incentivize users to stay. BitTorrent kind of automatically does this, you know;

it just kind of slows you down at the end so that you stay a little more, so...

>> Yes. >> SHAKKOTTAI: ...I think these incentives

have to be considered. So, if you look at it as I said, as an exchange economy, many

of these incentives are automatic because I'm staying, because I make some currency,

and I can use that later on for something else which I want to download. So, the answer

is, there is a lot of space in this region. But, one question which automatically arises

is who owns the content? So, if I'm making

uploading it to somebody else, am I somehow, you know, kind of withdrawing or detracting

from the amount which would be made by the owner of the content. So, I don't know. Some

kind of scheme like the way that Google does it, you know, where each publisher will get

a small amount of money based on, you know, visits to their Web page and clicking on ads

might be a feasible solution. So, it's unclear how to create such exchanges. I mean, as in

how the currency mechanism would actually work. But we can show you get significant

improvements if there were such a currency, so it's all...

SHAKKOTTAI: Thank you all for being here, and thanks, Andre, for having me over. My

name is Srinivas Shakkottai and I teach at Texas A&M University. The kind of stuff I've

been doing. So I got my PhD at Illinois Urbana-Champaign in 2007 and I was a post-doc at Stanford for

about a year and I've teaching at Texas A&M since early 2008. So I've been there about

a year and more, about two years now. So the stuff I'm going to talk about today has to

do with latency and cost tradeoffs, in particular, for peer-to-peer assisted content distribution.

But there's a much more general problem here on queries and how you kind of distribute

them, you know, based on the latency that they suffer. So before I get into the talk,

I just want to kind of give a little overview of the kind of stuff I've been doing and where

it fits in with networking in general. So this is a picture of the Internet created

by Brad Huffaker at CAIDA. So CAIDA is the place where I interned a couple of times.

So I've been wandering up there a bit and I spent two or three months in a whole bunch

of places. Now, the picture here basically shows the Internet at what is called the AS

level, the autonomous system level. Basically, what it indicates here is that the radial location of a particular autonomous system is based on things like the number of customers it has and the amount of traffic it carries, and the location on the ring is based on geography. So this is North America, that is the Atlantic

Ocean, Europe, Asia, Pacific Ocean. So, the reason why I put up this picture quite often

is that when you look at this, you see that there's a tremendous amount of interconnectivity,

you know, between multiple different locations. And very often, we don't really exploit this

diversity, you know? What I need might not be just in one location but it might be in

multiple locations. So really, the kind of stuff I've been doing, kind of in the space

of network control and network algorithms. I'm going to talk about content distribution

today. But in general, I've been working, for example, on protocols which take advantage

of the multiple routes between a source and a destination. I've also been looking at how to provision QoS, you know, on an end-to-end basis. I've been looking at access methods. For example, if I'm sitting around with my laptop, I see multiple different access points.

Can I somehow split my traffic and take advantage of the fact that I might have diversity in

terms of the number of channels available to me? Multicast takes advantage of the idea that there might be multiple people interested in the same thing. So why should I send multiple unicasts when lots of people want the same thing together? In particular, the

application we were looking at there was kind of like in a football stadium where people

have smartphones and people want to look at pretty much the same things, scores perhaps,

replays, and that kind of thing. So this all lies in the space of network optimization and control. I have also been working on network economics. How do I price this thing? Each

of these boxes is an autonomous system. Google, for example, is an autonomous system. The guy in the middle over there is Verizon Business, which is AS number 701. It's had a long history: it used to be UUNET, then part of MCI WorldCom, and finally Verizon bought it.

Now, there's a good deal of competition between ISPs which operate at different levels. So,

good pricing models which are stable is a very interesting question. This plays right into net neutrality, because the economics has to do with QoS as well: you cannot have QoS unless somebody is going to pay for it, and you can't get anybody to pay for it unless you give them QoS. So economics and technology really tie together in the space of Internet economics, because you need good pricing schemes, you need to know what ISP competition is likely to lead to, and that kind of thing. Finally, and this is where Andre

comes in, I've been working a fair bit with CAIDA on network topology. Why this thing

looks this way, how does it evolve with time, understanding the performance of protocols

in the real world, and that kind of thing. If people are interested in this space, I can of course spend some time; I'm here till the end of next week. So the thing

I started with just a little while ago was if I want something why should I go to the

other end of the world to fetch it? This here is Houston, which is close to College Station, where I'm at, and that is Bombay in India. Now, if I want a piece of content, there is no reason to ship it from Bombay; you can get it from multiple different locations. Indeed, if you have queries to be processed, you don't need to process them at one particular server farm or one particular cluster; you might have multiple different locations where you can process them. So the question really is, how do we take

advantage of the diversity present in where my query or my content can be serviced from?

So this is a little picture which is generated by Pando Networks, which is a startup with

whom we have had some contact. So they basically illustrate the fact that, you know, when the

Internet was created, like this was a little earlier, but roughly speaking, the Internet

really took off in the early '90s. The big applications there were FTP and email, right?

So the idea was, the Internet was created to ship large datasets between big labs, like cyclotron data, these huge masses of data. You might ship them from SLAC at Stanford, perhaps, to Illinois, where they have a supercomputer center. So FTP was enormous, and email was in existence, and then people realized that, "Hey, we can use this Internet for many other things." In particular, the Web started growing up. And so Web traffic really took off in the mid to late '90s, but more recently,

there came the understanding that queries can be processed from multiple locations. So if I have a file on my machine, I can give it to somebody else. If somebody else has it, they can give it to more people. In other words, using end hosts as servers has really, really taken off in the past couple of years. And you notice that by 2006, various different measurement traces seemed to indicate that anywhere between about 60 and 75 percent of the bytes on the Internet were peer-to-peer traffic. Now further, this curve stops at 2006. More

recently, there was a report in NANOG this year on the fact that there is a good deal

of video traffic growing up on the Internet. And so the P2P in North America for file sharing

seems to be going down but peer-to-peer for video distribution is really popular in East

Asia. In China, there are these things called PPLive, TVAnts, QQLive, and so on, all of which are using peer-to-peer ideas for video distribution. So the question, at a high level again, is how do we use a distributed resource, like peer-to-peer servers or multiple clusters in a server farm, to service my requests, rather than servicing them in just one location? So, of these, I'll talk about the first two in fair detail and the third one a little less. So the first question I'm going to deal

with has to do with latency and cost in so-called ISP-friendly peer-to-peer. So the idea here is that, you know, I can service my query at multiple locations. There is latency at multiple locations and there's a cost to get there. This might be a transit fee, it might be roundtrip time, it might be a bandwidth cost. So the question is, how am I going to do load balancing in such a system in a very simple, fast, and distributed manner? So I need to

do this in a parallel fashion. Now, the second question I'm going to talk about has to do

with file sharing. So very often when things get popular, they get popular very quickly.

This is called a flash crowd. I'll give a little more idea of what it means in a little

while. But really, we're going to understand what constitutes a flash crowd and how do

I service such a crowd. Do I use a server alone or a fixed capacity content distribution

network? Do I use peer-to-peer or do I kind of join them somehow together? And finally,

if I do get time, I'll put up a slide or two on using peer-to-peer for video delivery as

well. So, the first problem has to do with latency versus cost in so-called ISP friendly

peer-to-peer. The motivation is the following, supposing I have multiple different Internet

service providers, say Comcast is one of them, Suddenlink is another, Roadrunner is one more.

When I ask the peer-to-peer tracker for a list of peers to communicate with, what it gives me is a list of peers from all over the place. So in particular, if I belong to Comcast, I get some peers from there, some peers from there, some from here. Now, is

that good or is it bad? Now, it is good in the sense that it gives me the best peers.

I get the peers who are most likely to be able to give me something worthwhile. It is

bad in the sense that Internet service providers have a cost of transit traffic between each

other. So if I have to move traffic from here to here, it costs something. If I have to

move from there to there, it costs something, and so on. So Internet service providers have been trying to tamp down on peer-to-peer traffic simply because it increases inter-ISP costs. But if you just look at cost alone, you don't realize that the latency of users can suffer if you just force them to stay within one particular ISP.

So to give a clearer description of what the problem is, we would like a system which takes

into account both transit fees and latency. Now, this is very important for the distributed query processing setting as well, which I will mention in just a second. So I can, you know, reduce transit fees by simply localizing traffic: I just keep it local, no transit fee. I can reduce latency by choosing the best guy who can serve me. The question is: how do I handle both of these together? How do we achieve the optimal operating point? So this tussle

between fees and latency is what I want to talk about and there's a bit of work in this

space. Very often it's kind of heuristic. It doesn't quite capture the essence of the

problem in my opinion. But nevertheless, there are some solutions which work fairly well.

And what I'm going to talk about is a little bit about the analytical solution to the problem

and how we implemented a very nice distributed solution for this. So here is my problem.

So supposing, for example, let's say I'm thinking of query processing locations. These guys are all front ends; basically, when my query shows up, it is handed off to some particular front end. These requests which are coming down here are all the queries.

We'll use a pen. So, for example, this is a particular location. There's a fair number

of requests which are coming to this front end. Now this front end, usually what we would

think is it is connected or is associated with some backend processing engine. It could

be a cluster of some kind, so all these nodes could be some kind of processors, and this

front end's job, perhaps, is to assign these queries to that backend cluster. Fair enough; that's easy if there is just one front end and one backend available to it. What happens if I have multiple locations? So here the situation is, I have lots of requests coming in here and a fairly limited processing capability here. Here,

I have fewer requests but a larger amount of processing capability because I have more

nodes there. So really, I should--I would think that, "Hey, I should send some of these

requests over there so that they can be handled quicker." However, the quickness, or the latency, of handling is not quite everything, because there could potentially be a roundtrip latency between these locations, or a bandwidth cost, or, in my ISP case, a transit fee between them. So really, it's not purely about latency; it is about both latency and fees.
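To make that tradeoff concrete, here is a minimal sketch (not the speaker's code; all numbers are made up) of two clusters where the busy one can offload a fraction of its queries to the lightly loaded one for a per-query fee, using an M/M/1-style delay z/(C - z):

```python
# Hypothetical two-cluster example: cluster 1 receives 8 queries/sec,
# cluster 2 receives 2 queries/sec, and both can process 10 queries/sec.
def total_cost(f, x1=8.0, x2=2.0, c1=10.0, c2=10.0, fee=0.5):
    """Total latency plus transit fee when cluster 1 offloads fraction f."""
    z1 = x1 * (1.0 - f)            # load kept at cluster 1
    z2 = x2 + x1 * f               # load at cluster 2 after offloading
    latency = z1 / (c1 - z1) + z2 / (c2 - z2)  # M/M/1-style total delay
    transit = fee * x1 * f         # per-query fee on offloaded traffic
    return latency + transit

# Sweep the split: neither "keep everything local" (f = 0) nor
# "move as much as possible" is best; some intermediate split wins.
best_f = min((k / 100.0 for k in range(100)), key=total_cost)
```

Sweeping f shows that keeping everything local avoids all fees but leaves the loaded cluster miserable, while an intermediate split lowers the combined cost, which is exactly the tussle being described.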

Now, exactly the same ideas apply to load balancing of queries. Once queries show up, they have to be handed off somewhere for processing. Do I hand them off to my local cluster here? Do I hand them off somewhere else? What do I do with them? So, that is really the requirement. This business of steady state and transient

refers to the fact that there could be clusters for whom the request rate is less than the

available capacity, while there could be other clusters for whom the request rate is bigger

than the available capacity. So clearly, I should hand off some part of this over here,

but what fraction? How do I do it in a dynamic fashion? How do I measure the latencies in

a feedback fashion? How do I do it in a simple fashion so that it converges quickly and efficiently?

So that is my first problem here. Now, a little bit of notation. All I'm saying is that the superscript refers to the backend cluster, while the subscript refers to the front end which does the assignment. So really what I'm saying is, any cluster named i is assumed to have a capacity of C^i queries per unit time. So for example, this cluster out here would have C^1 amount of capacity for processing queries. Requests arrive at the front end server, which I'm calling an mTracker, or whatever the front end server is, according to some process, it could be a Poisson process, it could be anything really, with a certain rate of x_j queries per unit time. So to go back to my picture, I might have requests coming

in here at x_1 requests per unit time. So the question is, what do I do with these x_1 requests per unit time? Do I keep them local? Do I split them up? What do I do with them? So the important thing is, I actually don't need to know either of these

for my algorithm to work properly. I can work with measured feedback latency signals as

I'll illustrate when I get to the algorithms. So anyway, so we have a kind of time scale

separation argument here, where at the largest time scale, the load changes. So in my P2P system, where I've taken several measurements, the things that get popular get popular together, but on the order of 20 minutes to half an hour, nothing much changes in terms of the number of requests per second. So I can think of each of these as kind of half-hour timeslots. Now, the capacity of the system, each peer-to-peer cluster or backend cluster, could itself be dynamic, in that some things could go online, some things could go offline, and so on. So

I assume that on the order of minutes, the cluster capacity or the peer-to-peer cloud capacity itself can change. So the load changes over half-hour intervals, while this could be something like 10-minute intervals. Finally, I have to do my traffic assignment: where am I going to route my traffic? This I do on the order of seconds. So really, I'm first going to just deal with this, looking at capacity as constant and load as constant. And then, I'm

going to go up here, and say, "How do I change the capacity?" Finally, I'm going to talk

about how the load changes. So, the notation here is exactly the same as I had earlier. We have this x_j, the request rate at front end server j in queries per unit time, and an assignment matrix X, which is the way I assign this. So what does this mean? I have this front end server j and it has multiple options,

let's say, he has three options. So then, he has to decide how much of the traffic he's

going to send to option one, which will go to the cluster one, how much to send to two,

how much to send to three, and so on. So basically, this is the way the front end server j partitions its load. Now, there are multiple front end servers, so each one is doing this partitioning. We call the whole matrix of all the partitions, the assignment matrix, capital X. So is there any question on what this means? To give another illustration of what's going on, I have an x_1 coming in here, I have an x_1^1 going in here, an x_1^2 going there, an x_1^3 going here, potentially. Similarly, this guy can send traffic to each one. So the assignment matrix would be a 9x9 matrix, because each guy has potentially three options. And so, some of them could be zero, of course; but potentially, this is a 9x9 assignment matrix here.
>> I think you meant 3x3.

>> SHAKKOTTAI: Oh, sorry, 3x3, a total of nine entries. Yes, sorry. So there we have our assignment matrix. The question is, what am I going to do with this? So again, to recollect the notation: the origin is always the subscript, this is the front end where the query is actually received, and the superscript is the destination cluster to which the query is routed. That's the notation all over the place. So x_j^2 means that origin j will route that much load to cluster 2. And obviously, I cannot drop any queries, so the sum of the assignments must add up to the load at that particular front end: the x_j^i summed over i equal x_j. So the question is, how am I going to do this assignment?
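As a concrete, hypothetical instance of that notation, here is a small sketch: a 3x3 assignment matrix whose rows are the front ends' splits, with the no-dropped-queries constraint checked explicitly. The numbers are invented for illustration.

```python
# Row j holds front end j's split x_j^1, x_j^2, x_j^3 across the clusters.
X = [
    [5.0, 2.0, 1.0],   # front end 1 splits its 8 queries/sec
    [0.0, 3.0, 0.0],   # front end 2 keeps all 3 queries/sec on cluster 2
    [2.0, 0.0, 4.0],   # front end 3 splits its 6 queries/sec
]
loads = [8.0, 3.0, 6.0]            # x_j: arrival rate at front end j

# No query may be dropped: row j must sum to front end j's load x_j.
assert all(abs(sum(row) - xj) < 1e-9 for row, xj in zip(X, loads))

# The load z^i on cluster i is the column sum: traffic from every front end.
z = [sum(row[i] for row in X) for i in range(3)]
```

Entries may be zero (a front end that keeps everything local), but each row must still account for every query that arrived there.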

Now, in order to understand how to do the assignment I need to have some idea of what

latency is likely to look like. So what is latency? So really, I send you a query and

after a little while you respond. That little while or a long time depends on how many others

are also querying the same thing. So this is a very simple latency function. It comes from queueing theory, and it's really straightforward to understand what's going on here. Basically, the idea is, I have an engine which can respond to queries at a rate of C^i queries per unit time, and I feed in load at a rate of z^i queries per unit time. These things come in and queue up, and this query engine keeps processing them and sending them out. The question is, somebody comes in here, he sits there, he kind of goes across, gets served and gets out; so the question is, how long is he going to sit there? What kind of

shape of function is this going to look like? And the answer is, it always looks convex increasing, because if you look at that expression, when z is much smaller than C, you have very low load and low latency. As z comes close to C, the denominator goes to zero, and therefore the whole thing goes towards infinity. So the idea is, as I increase the load on a particular server, the latency, the time between arrival and departure, is going to increase, and increase in a convex way as well. And this is true not only of such a

function; it's true of pretty much any system you choose to measure. It is true for a deterministic job size, for example, and it is true in peer-to-peer systems, which I've got some measurements on. It pretty much always happens that as you get closer and closer to capacity, the latency blows up. In fact, if you look at the 101 at six o'clock, you'll see this in action: as you get closer and closer to capacity, you have huge latencies. So, the total latency, which is the unhappiness of my users, is the per-user latency times the number of queries per unit time, z^i; the total latency suffered by all the users getting in here is z^i/(C^i - z^i). So, how does that play into the picture of this distributed

query processing setup? So, that was my picture, right? I had requests coming in from multiple locations to a particular front end, and that in turn went to the backend to be processed. So how much traffic is coming in? We have

basically, traffic from there, there, and there to be summed up; we have traffic x_r^i from front end r coming to cluster i. So on cluster i, the total load z^i is just the sum of the traffic which comes from the various different locations: z^i is the sum over r of x_r^i. In that particular diagram, cluster number one is handling x_2^1, x_3^1, and x_1^1 from itself, so that's why I have to sum all the traffic which comes in. And of course, the total latency has some expression; I'm just using the M/M/1 expression here for illustration. So I just stick in z^i equal to the total load coming in, and this is my total latency experienced at some particular cluster. So that is my

latency. Great. What else do I have? I have this transit fee. As I said, a transit fee

can be roundtrip time on the link, which is potentially propagation delay plus queuing

delay. It could be the bandwidth cost, because I have to pay somebody for bandwidth cost.

In my case, what I'm saying is that it's a transit fee between ISPs. So whatever it is, there is a certain fee which is charged, or a cost which is incurred, for transferring users from one location to another, and it is charged on a per-query basis. So if I transfer one user from r to cluster i, where r is the front end where the user actually showed up and i is the cluster where it finally gets processed, it costs me p_r^i per user. So again, we have to find out the total transit fee to get to a particular cluster, and that we do by simply summing over the load: the sum over r of p_r^i times x_r^i. In other words, I have traffic coming in from two to one, from three to one, and from one directly to one. One to one costs me nothing, two to one costs me something per user, three to one costs me something per user, and that is really what is being summed up here. So this is the total transit fee, or the total cost, of getting to the cluster. So much for that. So what

is my system cost? The system cost that I look at is the total latency plus the total transit fee. We developed an expression for the latency, which is basically load divided by C minus load. So this says, for any particular cluster i, what we have is the latency on that cluster plus the transit fee to get to that cluster. And then I have to sum over all of my clusters, because that is my whole system. So if I have multiple locations, each location experiences a total latency, and for each location there's a cost to get there; that is the total cost of a particular cluster, and over all clusters, the system cost is the sum. All right, so we need to design a controller that somehow minimizes this.
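The objective just described can be written down in a few lines; this is a sketch with invented numbers, using the M/M/1 total-latency expression z/(C - z) plus per-query transit fees:

```python
def system_cost(X, C, p):
    """X[r][i]: traffic front end r sends to cluster i (queries/sec);
    C[i]: capacity of cluster i; p[r][i]: per-query transit fee r -> i."""
    total = 0.0
    for i in range(len(C)):
        z = sum(row[i] for row in X)                 # load on cluster i
        total += z / (C[i] - z)                      # total latency there
        total += sum(p[r][i] * X[r][i] for r in range(len(X)))  # fees
    return total

# Hypothetical two-front-end, two-cluster setting.
C_cap = [10.0, 10.0]
p = [[0.0, 0.5], [0.5, 0.0]]           # staying local is free
all_local = [[8.0, 0.0], [0.0, 2.0]]   # each front end keeps its own load
offload   = [[6.0, 2.0], [0.0, 2.0]]   # front end 1 ships 2 queries/sec away
```

With these numbers, keeping all traffic local costs 4.25, while offloading 2 queries/sec from the loaded front end brings the total down to about 3.17 even after paying the fee.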

This really is the unhappiness of the world as far as I'm concerned. So, how do I minimize this unhappiness? Latency costs my users, because they get unhappy. Transit fees cost me, because I have to pay them all the time. So the question is, how do I design such a controller that is simple, distributed, and fast? Those are the three things.

Distributed in particular is a very common requirement among network optimization folks, simply because there are millions and millions of flows on the Internet. There is no way on Earth you can do an optimization with all those flows sitting in one particular central computer. So, distributed processing of such optimization problems has been a big question, and there are lots of ways of handling it. Now, the objective really is: this is my unhappiness, that is the lowest unhappiness that I somehow want to attain, and I want a distributed way of getting there. So this makes me as little

unhappy as feasible. Now, there's a whole bunch of controllers which can do this, so the question is how do I solve it? There are lots and lots of optimization techniques, and I'm going to pick one which is particularly useful for this problem. So usually, there are

some which are fairly complex in terms of the computation. You can do centralized computation,

you know, in one big machine. And these methods, although they give you very accurate solutions,

they require a lot of processing in a particular machine. So the other methods which I've put

down here, these are some of them, are basically distributed methods. So there's Newton's method, there's log-linear learning, best response, which means do the best thing given the state, gradient descent, and so on. What I'm going to describe is actually a coupling of these two approaches, which is particularly suited for this setting. But really, what I wanted to say here

is, there is a whole bunch of tools which have been developed in the OR,

the operations research literature, which are useful for such problems. And so very

often, when we solve network optimization problems, you can go and you know, look in

this bag full of controllers and pull out one which is most suited and then show that

it's suitable for your particular application. So, I'll tell you what my controller is going

to be in a second. But before doing that, I need to know what the kind of per unit cost

is going to be. So what does this mean? I told you that the total cost of the system is the total latency plus the total transit fee. Now the question is, if I take a decision on how much load to send to a particular cluster, how does this cost change? For example, I have two clusters and I have an arrival rate of requests coming in. Now if I move some to one cluster and some to the other, maybe it's good, maybe it's bad. The question is, how good or how bad is it? So the question is, if I change my assignment on a particular dimension, so I move a little from one to the other, how does the cost change? Is it going to be a simple function? Is it going to be messy? How do I compute it? The answer is, in this particular example, it's

actually very simple. So, all I need to do is basically, I need to understand the impact

on the cost of changing one dimension. So recollect that x_j^i was the amount of traffic which front end j was sending to cluster i. The question is, what impact does changing that one dimension have on the overall cost of the system? The answer is obtained by partially differentiating one with respect to the other, and it has a really nice form. I call this the marginal cost of front end j on cluster i, and this marginal cost has three terms. One is

the marginal latency. I send people there, they become unhappy. So how unhappy do they

become? I have a per-user unhappiness. The next thing is the transit fee. I send one guy from one place to the other, and it costs p_j^i, because it costs that much per user. So in other words, these two are very intuitive. I send one person from A to B; he experiences some amount of latency at B. I send one guy from A to B; the added cost in terms of transit fee is p_j^i. Great. There's a third one, which is not quite so obvious, which is my impact on the others' unhappiness.

So in other words, I send a guy from "A" to "B," there are already guys being served out

there so the more I send there the other guys who are being served also get more unhappy

because my impact on their latency has to be considered as well. That is the third term,

which I call a congestion cost. So, these three together give the impact on the total cost of the system. There's a marginal latency which the user experiences, a marginal

transit fee which I have to pay, and a marginal congestion which is experienced by everybody.
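For the M/M/1 total latency z/(C - z), those three terms can be written out and checked against the exact derivative. A sketch with illustrative numbers:

```python
def marginal_cost(z, C, p):
    """Three-term marginal cost for M/M/1 total latency z/(C - z)."""
    marginal_latency = 1.0 / (C - z)      # the newcomer's own delay
    congestion = z / (C - z) ** 2         # extra delay inflicted on others
    return marginal_latency + p + congestion

# Sanity check: the three terms sum to the exact derivative of
# z/(C - z) + p*z with respect to z, namely C/(C - z)^2 + p.
z, C, p = 6.0, 10.0, 0.5
assert abs(marginal_cost(z, C, p) - (C / (C - z) ** 2 + p)) < 1e-12
```

The decomposition is what makes the scheme measurable: the first and third terms can be estimated from observed latencies, and the fee is given.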

So, this really is how the unhappiness of the world is going to change as I change my

particular dimension. Now these three, mind you, are all measurable. In fact, in the system which I'm going to consider, we don't even know C or z; we know p, but we don't know anything else. You can measure the latency, you can measure the congestion, and you know the fee because it's a cost given to you. So the question is, now that I know what my marginal impact on the system is, how am I going to change my assignment? So that's where

this--the dynamics comes in. So to repeat, I have currently an assignment matrix, who

is getting sent where; I want to change this matrix, and I change it on each dimension separately. So I have to change the amount that I'm assigning from j to s, where j is the front end at which the request arrived and s is the cluster where it is served. How do I change the number of users I'm sending to each one? The answer in this case turns out to be a very nice one, called replicator dynamics. This comes from evolutionary thought. The idea is, things which are doing well, you should do more of them; things which are doing badly, do less of them. So suppose I was sitting here and I have three options available to me. Option one costs, let's say, $10. Option two costs $20. Option three costs me $5. So clearly

I'm going to send my guys to the $5 option. I keep doing that until the $5 option's cost rises to $10. Now I have two options with exactly the same cost, $10 and $10, so perhaps I'll split between them equally until they all hit $20, and then, you know, the system is essentially

stable. So really what I'm looking for is a comparison between the per-unit cost of a particular option, of sending users from j to s, and the average cost over all the options available to me. So again, in the three-option example, I would calculate my average cost over the three options, and I would know the individual costs because I calculated them. If a particular option is doing better than average, which means it has lower cost, I should do more of it: the average cost minus this option's cost is positive, so its share goes up. If a particular option's cost is higher than the average, that difference turns out to be negative, and I do less of it. So do more of things which work well, do less of

things which work badly. And we can actually prove that this is stable: if we do this over and over again as a dynamic system, it attains that lowest-cost point which we were looking at, if we use the fees in this fashion. So again, we have to consider the per-user latency, the per-user transit fee, and the impact on others, the per-user increase in congestion. If you add up these three and plug that into this kind of "do more of whatever works for you" rule, it can actually be proved to be optimal.
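Here is a sketch of a discrete-time replicator update on the three-option example. The congestion-sensitive cost function and the step size are my own assumptions, not from the talk; the point is only that shares of below-average-cost options grow until the costs of all used options equalize.

```python
def cost(i, x, base=(10.0, 20.0, 5.0)):
    """Cost of option i: a base price plus an assumed congestion term."""
    return base[i] + 30.0 * x[i]

def replicator_step(x, step=0.02):
    costs = [cost(i, x) for i in range(len(x))]
    avg = sum(c * s for c, s in zip(costs, x))   # share-weighted average cost
    # Do more of what beats the average, less of what does worse.
    x = [s * (1.0 + step * (avg - c)) for s, c in zip(x, costs)]
    total = sum(x)
    return [s / total for s in x]                # guard against drift

x = [1.0 / 3.0] * 3                              # start with an even split
for _ in range(2000):
    x = replicator_step(x)
costs = [cost(i, x) for i in range(3)]
```

Starting from an even split, the $5 option's share grows until the costs line up, which is the stable point being described.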

And how well does it work? Now, this is on the network simulator, ns. Basically, we modified the BitTorrent client there so that you could actually move traffic from one location to the other. And so the three curves which we are looking at are the following.

So "single tracker" means that we don't allow you to split: you have those multiple locations, and I say, "I will not allow you to move your traffic from one location to the other." Now, that is of course going to do very well in terms of transit fee, because I'm not moving anything, so it doesn't cost me anything. But it's horrible in terms of latency, because some clusters have extra load and some clusters have less load. The clusters which have load beyond their capacity do really badly, so the overall latency goes up significantly. Now, the others which I'm comparing here are the following:

the green one assumes that there is no price, no transit fee, that is. So basically it says, "Okay, move your traffic wherever you feel like. It's your business." Of course, it does very well in terms of latency, it reduces the latency to the bare minimum, but it does cost you a fair bit in terms of transit fee. The third one takes into account both of these and uses this replicator dynamics kind of stuff; what happens to it? It has a fairly low latency and a fairly low transit fee. We add up the two in the next

diagram. Out here, what we have is the lowest possible latency plus transit fee that can

be achieved with this particular load. So in other words, the assignment matrix here

is the optimal assignment matrix. And the nice thing about this is it converges fairly

fast. So basically, in our peer-to-peer system which we implemented, we basically took these

assignment decisions every eight seconds. So we took a decision, measured for eight

seconds, took a new decision, measured for eight seconds, and so on. So we measured basically

the per-user latency, which can be measured at the cluster and fed back. We measured the increase in congestion, which is the change in latency for a change in load, and fed that

back. And of course the price was known, the transit fee was known so we took these decisions

every eight seconds. And we found that in roughly four or five assignment decisions, it converges.
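This measure-and-reassign loop can be sketched as a replicator-dynamics update (a minimal, hypothetical sketch; the three clusters, the cost numbers, and the step size are invented for illustration, not taken from the talk):

```python
def replicator_step(split, costs, eta=0.2):
    """One replicator-dynamics update of a front end's traffic split.

    split[i] -- fraction of this front end's queries sent to cluster i
    costs[i] -- measured per-query cost of cluster i: latency (priced in
                dollars) + the congestion term + the transit fee
    Clusters cheaper than the current average gain traffic share and
    costlier ones lose it, so the split drifts toward equalized cost.
    """
    avg = sum(s * c for s, c in zip(split, costs))
    new = [max(s + eta * s * (avg - c), 0.0) for s, c in zip(split, costs)]
    total = sum(new)
    return [s / total for s in new]

# Toy run: three clusters with made-up costs; one step per 8-second epoch.
split = [1 / 3, 1 / 3, 1 / 3]
for _ in range(10):
    costs = [2.0, 1.5, 1.0]   # measured latency + congestion + fee, $/query
    split = replicator_step(split, costs)
```

Each front end needs only its own measured costs, which is what makes the scheme distributed: there is no shared state beyond the fed-back latencies.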

In fact, this has very good convergence properties. So--and as I said earlier, the really nice

thing about this is, it is completely distributed, each front end can take its own decision based

on the cost that it sees, so you don't have to solve a centralized optimization problem. Each front end simply looks at the latencies which it sees in various

things. It sees the cost and it knows the congestion because it--it's fed that back

as well and it takes a decision each time, based on what query shows up. Now, are there any questions on this kind of assignment problem here? Yes?

>> You said this was measured from a system that you built, so what did you deploy it on?

Or how many front ends were there? >> SHAKKOTTAI: Okay. This is actually a network

simulator--these are simulations of a system, but it's not actually deployed. We're building a test bed, but we don't have answers from that yet. So what this is, it's a network simulator that

simulates the whole protocol stack, and you basically take multiple BitTorrent clients

which are riding on that. So each guy comes, starts up multiple TCP connections to various different peers, and tries to get pieces of the file. And on top of that, we built this so-called

tracker system which allows people to be assigned to various different domains and we artificially

assign cost of going to each of these domains. So under that is where our simulation is.

So basically we had on the order of something like 5,000 clients or so running simultaneously. And in order to get these plots, we ran the whole thing about 40 times and averaged; those are the

standard deviation marks. >> For how many front ends?

>> SHAKKOTTAI: Four. >> Okay.

>> SHAKKOTTAI: Right. >> And that'll affect your convergence time

quite a bit, I would imagine. >> SHAKKOTTAI: That it does. But the thing

is each one has only so many options available. If you give each front end the option of choosing everything on the planet, of course, it'll have a longer convergence time. If each front end has a limited set which it's allowed to choose from, that will decrease your convergence time significantly. So that's the thing. So, our idea was we don't

want to fetch from far away. For example, going back to my Bombay versus Houston example, I don't

want to interact with a peer-to-peer network in Bombay, I would like to interact with things,

other things, in Texas and possibly California. So, the number of options should not be that

high because in any case my transit fee will get really large if I interact with multiple

faraway options. So, that was the objective. >> I have one question. What's the addition

function that you use when you sum up the latency and their transit fee?

>> SHAKKOTTAI: We are just using plain addition. You can pick any convex combination and it

will still work. So we basically said latency we measure in dollars, transit fee we measure in dollars, and we just add them up. So it's just simple addition here.

>> Okay. >> SHAKKOTTAI: But you can change the constant in front; it will change nothing in the crux of the matter.

>> Can the transit fee be a latency as well? >> SHAKKOTTAI: It could be. If the transit fee itself is kind of a point-to-point link latency, it could also be expressed in that, in which case it's all latencies. >> Okay.

>> SHAKKOTTAI: So, the next question we wanted to answer in this space has to do with admission

control. The idea is, supposing my clusters are of a certain fixed capacity, whatever

it is, and supposing the load is much higher than that, then there is no assignment on the planet which will reduce my latency to an acceptable level. So at that point, I need to know what to drop. How much can I admit, how much do I, you know, drop? So, the kind

of idea expressed here is that given a particular load vector, I know how to minimize the total

unhappiness of the world. But then, this total unhappiness of the world, which is minimized,

might itself be unacceptable because there's simply too much load. In which case I better

start dropping some of them and how do I drop? So, the idea is that each user has a happiness

associated with himself that can be thought of, supposing for example, each user brings

$1 worth of happiness, so each user pays me a dollar to be serviced. In which case, the rate at which users arrive is "x-j," the money each guy brings is $1, in which case this log won't exist; I will just have a dollar times "x." So each user brings $1 and I try to maximize the happiness minus unhappiness. In a kind of generalized

setting in a network optimization, we often consider concave increasing functions like

log. The idea is that the marginal impact of each user becomes lower and lower. Like,

for example, supposing I have a query which is composite, say a request for search results with images, with ads, and so on. I guess the search results are really important; that's, you know, a bunch of text which I need. The images are perhaps slightly less important. Then perhaps the ads; I don't know how important they are. I presume they are important, but the question is how much, so at some point you have to decide which is more important than the other. So this idea here is that

the log tries to indicate that the marginal impact of each additional query is important: it does increase your happiness, but it doesn't increase it as much as a linear function would. So, if I have three queries, my first query

must be handled, the second query is perhaps not as critical, the third even less so, that kind of thing. So, this will work of course for linear, log, and a whole bunch of concave functions. But in general, the high-level idea here is you

want to maximize the sum total happiness or minus the sum total unhappiness. So, just

to put this in the right framework, the load assignment problem's job is to just minimize

total unhappiness, whereas the utility maximization problem is to maximize the total happiness minus the unhappiness: the things which make me feel good minus the things which make me feel bad. So, the question is, how am I going to do this admission control in a simple

distributed fashion? So essentially what this means is each of these front end servers has

to decide whether or not to admit a query and it needs to do so, understanding what

the impact of that query is going to be on the happiness of the system. So really, if

this is the total happiness of the system, I need to know what the impact of changing

the "x," which is the load which is coming in on the system. In this case, it turns out

it's a very simple function. We just have to differentiate with respect to "x-j."
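One way to turn this derivative into an admission controller is to nudge the admitted rate along it (a hypothetical Python sketch; the log utility follows the talk, but the linear marginal-cost model, the gain kappa, and all numbers are invented):

```python
def admission_step(x, marginal_cost, kappa=0.05):
    """One gradient step on the admitted query rate x_j at a front end.

    Objective: log(x_j) - D(x_j), happiness minus unhappiness.
    marginal_cost -- the measured derivative dD/dx_j (extra latency cost
                     that one more admitted query would impose)
    If an extra query adds more happiness (1/x_j) than cost, admit more;
    otherwise throttle.  Only measurable quantities appear.
    """
    gradient = 1.0 / x - marginal_cost   # d/dx [log(x) - D(x)]
    return max(1e-6, x + kappa * gradient)

# Toy run with an invented linear marginal cost dD/dx = 0.5 * x:
# the controller should settle where 1/x = 0.5 * x, i.e. x = sqrt(2).
x = 0.1
for _ in range(2000):
    x = admission_step(x, marginal_cost=0.5 * x)
```

Note that capacity and total load never appear explicitly; the controller only needs the measured marginal cost, which is the point made at the end of this part of the talk.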

And what we can do is use something called a gradient ascent-type controller, which essentially

says if something is, you know, increasing the cost then cut down on it, if something

is decreasing the cost, go up. And we implemented this as well on our NS tool, I think. Oops,

I went backwards. All right, and so, these three colors now are completely different.

Basically, there's just one system with three front ends. So there's a blue front end, a

red front end, and a green front end and they have to take admission decisions and the objective

here is to show that although they take their admission decisions in a distributed fashion,

it does converge and the total system happiness also converges. So the idea is that somehow

in a distributed fashion, I need to minimize cost and maximize happiness. So that is what

is being done here. This interestingly is about 7% from the optimal solution. This is

again being done over a huge number of BitTorrent clients, though actually, in this case, it was three front ends. So, now--so the key insight from this part of the talk is basically that

it is possible to align incentives in terms of end user latency and cost objectives and

it is possible to do it in a distributed fashion, in a fairly simple way, because everything

that you need is measurable. So I don't even need to know this, the capital "C," the capacity,

I don't need to know the load. All I need to know is this marginal impact which can

be measured because I can measure latency, I know cost. I can measure congestion. So

ongoing work in the P2P space, capacity changes in a weird fashion because people come, people

leave, and the question is how do I incentivize people to stay and serve other guys? And also,

we are just working on a very, very simple test bed. We just want three servers right now, connected by real network links with a certain cost, and we want to see if this performs as it should. So that's the current state of that. Any further

questions on this part of the talk? Yes? >> I have a question: what happens when the prices change over time? >> SHAKKOTTAI: When the prices change over

time. Okay. So if the price is going to change over time, it's...

>> It was 18 probably, like then there's nine times over...

>> SHAKKOTTAI: So then--okay. You--the question. Oh, okay, over that timescale nothing will

change. Because the convergence of the innermost loop is of the order of 40 seconds to a minute. So, if your prices change on the order of days, no change whatsoever. It will converge long before that; it will keep converging. So, any further things?

>> So what happens when you actually drop it? I mean, in a real system, wouldn't the

client then just try again and then... >> SHAKKOTTAI: Right.

>> ...you're still paying that transit cost and everything? I mean...

>> SHAKKOTTAI: If you drop it, well, it'll be dropped again. So basically, once you've

got this value out here, this is really the supportable traffic. So basically this says

once it's converged, that if I want to maximize the happiness of the world, this is the amount

of load I can support. So any further load, I just have to keep dropping.

>> Aren't you though paying that transit fee every time the request is made because you're

not making that decision to accept it until it arrives at the server?

>> SHAKKOTTAI: Right. So, you--okay. The query itself, as far as P2P goes, the query itself

is just this tiny thing saying, "Give me a piece or give me a part of the file." The

transit fee actually is incurred in the service itself. So once I've been accepted, shipping

the traffic across will cost... >> So when you say the cost is on a per-query basis, that "P-i-J" is really only for the response and not the actual query?

>> SHAKKOTTAI: Yes, the service itself, yes. >> Okay.

>> SHAKKOTTAI: If I decide to accept it, otherwise I just drop it, yes. Although, in the admission

control problem, the admission control is done at the front end where it originated

so you won't even--the query won't go across to the other side if I'm not going to accept

it in the first place. So, what I'm trying to get at here is, I've got the--oops. Oh,

this is a good picture. So, here is where I take my admission decision. So I won't actually

send the query there if I think that, you know, it's going to have a bad impact on the

system. So, once you are dropped here, you're dropped. So you might come back, you might

be dropped again. So, it's only if I think that, you know, the system performance is

not going to be massively impacted by you that I accept you and in the inner loop I

decide where you're going to be shipped. So, the inner loop decides where to move users

around, the outer loop decides whether or not to admit them in the first place. >> Yes. And I was just thinking about it from

the search problem where the actual transit causes the additional latency.

>> SHAKKOTTAI: Yes. >> And there you're, you know, that's being

paid on the drop. It takes you that much extra longer to decide that your query is not going

to be serviced. >> SHAKKOTTAI: Actually, no, because it's

dropped at source. So, if anything, it'll add to the latency of the query proceeds as

in the front end, this front end, not that. >> Okay.

>> SHAKKOTTAI: So, in other words, you're dropped where your request originated. So,

there's... >> So, when you talk about congestion, do you also consider the decrease in congestion? Let's say I move a query from one to two; you talk about the increase in congestion at the backend of two. Do you also take the decrease in congestion at one? Do you sum it up over all the...

>> SHAKKOTTAI: It has--that's where this stuff about comparing with the average happens.

Because I compare--so, the query comes here and I compare the latency out here with the average over all the options available to me. So, I'm inherently comparing what will happen

if I served it locally; it's because it's doing better there that I send it there. If it were to

do better locally, I wouldn't even bother. So yes, there is a comparison there. It's

implicit, there's no explicit, you know, comparison now between the two. All right, I'm actually

running out of time, but nevertheless let me talk about this stuff about flash crowds.

And I--yes, that's out here. Oops, right. So, the idea here is--has to do with how do

I share files when they are very, very popular? So, the question here is the following. This

is a cute little graphic from Pando again. So, Pando basically says, "If I have a CDN

only," you can think of it as a big server, "As I have more and more people who request

the content from the server there's worse and worse performance because the latency

is bound to increase in exactly the same convex fashion." And if I want to service them at

a given quality, I either have to provision a lot, which is higher cost, or those guys

will, you know, be unhappy with me and leave, which again means decreased revenue. So the point is, you know, to somehow utilize end-user resources. And, you

know, you can think of these end users, one possibility is to think of them as caches

which are closer to the end homes, perhaps at the ISP level or perhaps even at the cable head-ends, or you can think of them as end users like you and me. So basically, the point is,

if I can somehow utilize the resources lower down in the hierarchy, then more and more users,

or more and more viewers, of this content might mean better and better performance simply

because there are so many more copies of it available. And you can lower your cost because

you don't have to unicast all the way from your server. So the question is how are we

going to combine these two methods, you know, the server-based and the peer-to-peer based?

So before we ask how do we combine it we need to know what kind of load we are likely to

see. So, this is an analytical model for describing such things. What it describes really, is

a viral distribution, a viral propagation of interest. So, I(t) is the number of

people who are interested in a particular piece of content of some type. So this is

the total population of guys who say, "I want it." Now how does that change with time? So,

this is called a Bass diffusion. It comes from Frank Bass's work in the 1960s, where he

was looking at the propagation of interest in microwave ovens. So there's this new gadget,

this microwave oven. How do people request it? What is the rate of change of interest

in such things? Now this was used much earlier in population dynamics, in viral models and so on. The idea is the following: there are so many interested people right now, right,

and "N" is the total number of people. So "N" minus "i" are the people who aren't interested

right now. If I advertise, it reaches a constant fraction of the people who aren't interested.

So "K" is a constant which describes my advertising budget. Maybe I do television, billboards,

whatever, and it reaches some constant fraction of the population. On the other hand, people

tell each other things. So, each guy who's interested might tell the guy, "Hey, this

microwave oven is so cool." So, people who are interested pick somebody at random and

say this is interesting. The probability you'll find somebody who is not interested is the number not interested divided by "N". So really, this captures two effects, an advertising effect

and a viral effect. Now, very often in the things like YouTube videos for example, there

isn't very much of an advertising effort, it's by and large viral. So, for a file distribution

of this kind, we're going to drop the advertising and look essentially only at the viral propagation, although the results which I'm going to talk about essentially hold for the advertising case as well. So, we can solve the differential equation

explicitly. You see this all over the place. This is the number of interested people and

it follows the so-called logistic function. The idea is initially nobody much is interested,

people start telling each other. At some point, suddenly you have this growth. This is an

exponential growth at that point and then it levels off. This is characteristic of what

is called a flash crowd. Essentially the idea is, at some point, you see sudden increases

in interest of something. The question is how am I going to service this interest? Okay,

this is fairly common. I worked with a guy called Mike Freedman, who runs a content distribution network called CoralCDN. It's based on PlanetLab, which is this worldwide research network,

and he hosted a whole bunch of files on it. In particular, tsunami home videos, you know.

Basically, when the Asian tsunami struck, there were lots of home videos, and this is

one example of such, you know, interest growth. You know, you have this sudden spike of interest.

All right, so what am I going to do? I can use the CDN, which I call a bank of servers,

I can use peer-to-peer, or I can use both. And since I'm giving this talk, I'm going

to basically show that hybrid has some phenomenal performance improvements. So, a little bit

of notation again. I(t) is the number of interested people in this file, it's a cumulative demand.

P(t) is the number of people who possess the file, it's the cumulative service. So, I really

need to just work with these two and I'll say why this total latency is the area between

those in just a second. So, I have a server, it has a capacity "C," how am I going to,

kind of, understand the latency experienced? So what happens is this blue curve is the

demand, the I(t), and the green one is the service curve. So, what happens? Initially,

very few people ask me, so the rate of requests is much lower than the capacity; I can serve pretty much instantaneously, so it goes along there. At some point, the rate of requesting

is more than the capacity available. At that point I see this divergence. So the idea here

is--so, if somebody shows up here. So, the blue curve is demand. So, somebody showed

up at this time. He got served at this time. So there is a little delay. So, as time progresses,

since the arrival rate is so large, somebody who shows up here got served here. Shows up

here, got served here. So this area between the curves is the latency or the delay experience

between the demand and service. So, the slope of this curve is the capacity because the

server is running at capacity at this point. It simply cannot keep up. At some point when

the demand slackens off again, it catches up. So great. So basically we have this knee where my server just gives up, and eventually it'll catch up, and the area between these curves is what I'm interested in. Now, I can actually analytically characterize these exactly.

And the area scales like Theta of "N" squared over "C". Don't bother to remember this for now; I'll do a comparison at the end. But it's just intuitive to realize that this

looks kind of like a triangle because the interest looks almost like a staircase and

the service looks almost like a straight line. So the area of the triangle roughly gives

you the intuition, although we can do an exact calculation.
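The triangle intuition can be written out as a quick back-of-the-envelope calculation (a sketch only, not the exact analysis from the talk): treat the demand as a step of height roughly N at the flash-crowd instant t_0, and the service as a straight line of slope C.

```latex
% Backlog at time t after the flash crowd: I(t) - P(t) \approx N - C\,(t - t_0),
% which the server drains by time t_0 + N/C.  The delay area is then
% roughly a triangle of base N/C and height N:
\text{total delay} \;\approx\; \int_{t_0}^{\,t_0 + N/C} \bigl(I(t) - P(t)\bigr)\,dt
  \;\approx\; \frac{1}{2}\cdot\frac{N}{C}\cdot N \;=\; \Theta\!\left(\frac{N^2}{C}\right).
```

Dividing by the N users gives a per-user delay of order N/C, which is the quantity compared against peer-to-peer later in the talk.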

>> "N" was the total number of users? >> SHAKKOTTAI: "N" is the total number of eventually interested users, yes. The service model for peer-to-peer is essentially the same kind

of thing. It's basically, if I'm interested but I don't have it I ask somebody at random

and if they have it I can get it off them. So really, the thing about this is the interest and the possession are coupled with each other. Because only if I'm interested will I ask,

only if I have can I serve, so that is the point. These two can be solved explicitly.
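These coupled equations can also be integrated numerically in a few lines (a hypothetical sketch; the rates beta and mu, the time step, and the population sizes are invented, not from the talk):

```python
# Coupled dynamics from the talk (viral interest, peer-to-peer service):
#   di/dt = beta * i * (N - i) / N   -- interested users recruit the rest
#   dp/dt = mu * p * (i - p) / N     -- an interested user without the file
#                                       asks a random peer, and is served
#                                       if that peer already possesses it
def time_to_serve(N, beta=1.0, mu=1.0, dt=0.01):
    i, p, t = 1.0, 1.0, 0.0          # one seed for interest and possession
    while p < 0.99 * N:
        di = beta * i * (N - i) / N
        dp = mu * p * (i - p) / N
        i, p = i + di * dt, min(p + dp * dt, i)  # possession <= interest
        t += dt
    return t                         # time until 99% of users possess it

t_small, t_big = time_to_serve(1_000), time_to_serve(1_000_000)
```

Running this shows the behavior described in the talk: almost nothing happens for a long initial stretch, then possession shoots up exponentially, and the finishing time grows only logarithmically in the population size.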

It gives you a green curve which looks like that. The interesting thing here is initially

nothing much happens. All the way I asked people, do you have it? Well, they don't have

it. So, nobody has it, nothing happens. But once a small number of people have it, it

just shoots up exponentially because each one can serve a whole number--a fair number

of other guys. So, basically until about log N time, nothing happens, and by two log N time, everything is over. So again, we can characterize this area, and the most interesting thing

is the CDN does well initially until the server is overloaded. P2P does well once the installed

user base is large, once some number of people have it. So, how do I combine them? I first

use the server, then I use P2P, and we can show that it gives you very good improvement.
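That claim can be checked with a small fluid simulation, where total user delay is the area between the interest and service curves (a hypothetical sketch; N, C, mu, and the time step are invented, and the "hybrid" here simply runs both service mechanisms, so the server dominates early and the peers dominate late):

```python
def delay_area(N=10_000, C=50.0, mu=1.0, dt=0.01, server=True, p2p=True):
    """Total user delay = area between interest I(t) and service P(t)."""
    i, p, area, t = 1.0, 1.0, 0.0, 0.0
    while p < 0.999 * N and t < 10_000:
        di = i * (N - i) / N                   # viral interest growth
        dp = (C if server else 0.0)            # unicast from the CDN
        if p2p:
            dp += mu * p * (i - p) / N         # random peer matching
        p = min(p + dp * dt, i)                # can't serve the uninterested
        i += di * dt
        area += (i - p) * dt
        t += dt
    return area

cdn_only = delay_area(p2p=False)
p2p_only = delay_area(server=False)
hybrid = delay_area()                 # server gives the initial boost
```

In this toy model the hybrid's delay area is strictly smaller than either alternative, because the server covers exactly the early phase where the installed peer base is still too small to help.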

So, I'll give you an example here: supposing I use a server capacity of "N" over log N. "N", you may recollect, is the total number of users who are eventually interested. The server alone, the CDN, can give you latency of the order log N. P2P alone can give you log N. The hybrid, though, can give you log log N, which for fairly large values is essentially constant. Further, I use the server only for a very short time, only for log N time,

your server, at which point your P2P has enough capacity to really ramp up. So, these are

simulations in the space. So, the first one is the server based, again, demand and supply.

This is the peer-to-peer one. You can see there's a significant area between these curves.

Essentially, the per-user average delay was of the order of 10 seconds here. Here, it's of the order of one second roughly, because the area is much lower; I do server first, then peer-to-peer. And well, then the question rather is, what happens if I do both simultaneously?

The answer is not very much, because if I first ask the peers and then ask the server,

initially everybody gets served by the server. And so this pink line is the amount of server

usage. So, initially everybody is getting from the server. But once you hit a certain

value, nobody much--you know, it just levels off and most people are getting it using the

peers. So again, this just reinforces the idea that an initial boost using a server

is what a peer-to-peer system needs. You do that boost and the peer-to-peer can take over

after that. And this is true of CDNs like Akamai as well, where you might want to do unicasts

to the caches for an initial amount of time, after which, you know, the caches themselves can handle it. So, I think I should stop here and take questions. If you're interested in peer-to-peer streaming, I'm around, as I said, at Stanford until next week. So please do send me an email and we can discuss some of this. Also, I've written a number of papers in this space. And so I will just stop here and say thank you, and take questions.

Thank you. Anything further? Yes? >> If I hand something off to a peer, I always run the risk that that user randomly decides, "So okay, I'm not interested anymore,

now I'm walking away." >> SHAKKOTTAI: Right.

>> And now I get, even if it's just for a short time, a timeout, so my service

is worse... >> SHAKKOTTAI: Is worse.

>> ...than it would be if I were connected directly. >> SHAKKOTTAI: Right. So, the way we are looking at how it should be implemented is as a feedback system where I keep track of my, you know, my average rate, and the system

as a whole, has a target average rate. If your rate falls below a threshold, the server

should boost you. If your rate is above the threshold, you're fine. So, you might get

timed out, but this is on a per-chunk basis. So, in other words, there might be a few chunks on which you might, you know, have a lower rate, but the server will quickly boost you

if your service rate falls below a threshold. So, essentially that was what I was talking

about here, in the case where people aren't leaving; I've done simulations on when people leave. Even there you get significant improvement. Here what I was saying was, if people don't leave, the server is not used after a while; it just remains constant. So this is the cumulative curve.

So, essentially if I looked at the derivative this would go to, you know, zero after a point.

But so, the server is essentially used only initially for this. Now if people start leaving,

again, the number of people who possess, which is this red line, would kind of go down, in

which case the server will be used again in the end. We did some simulations on that for

various different departure rates. Even there, you do get significant improvements in the

amount of server usage. So the server can be multiplexed over multiple pieces of content, rather than, you know, having to serve everybody through the same unicast. So, yes, as long

as there are peers in the system, you use them. If at any point your rate falls so low that your service is bad, your feedback loop will take care of it by boosting you using the server. So that's how you'd want to combine the two to ensure quality of service guarantees. Anything

further? >> Yes. Have you looked at the social aspect of peer-to-peer, like the people involved that you mentioned?

>> SHAKKOTTAI: Right. >> So, they have this like, you connect to

your social network, and the idea is that you incentivize your social network. So, like BitTorrent by default, you know... >> SHAKKOTTAI: Correct.

>> ...has the initial upload rate which is the same for all the peers it is connected

to. >> SHAKKOTTAI: Yes.

>> We have done some work on those, and what we worked on was more an extension of that. >> SHAKKOTTAI: Yes.

>> We've seen up to a 20% to 30% decrease in the total download time when you use this approach. >> SHAKKOTTAI: Yes.

>> Have you looked at those? >> SHAKKOTTAI: What we are looking at actually

is this kind of like an exchange kind of situation. Like essentially, I have content, I have bandwidth.

I can sell it to somebody essentially. So, we're looking at it as a distributed market

system. Now we look at the prices there as the best thing I can get. Now if you're my

friend, I might want to subsidize you that is an aspect which I've not considered. But

I've been looking at peer-to-peer as an exchange economy where, you know, you essentially can

exchange your resources for somebody else's resource. So, yes, some work but not in the

context of friends giving to other friends for cheap so, no.

>> There has been some work about making sure that you'd stay on the system. Have you heard of BitTyrant? >> SHAKKOTTAI: Yes. So, yeah, there's a lot on how to incentivize users to stay. The mainline BitTorrent client automatically does this, you know; it just kind of slows you down at the end so that you stay a little more, so...

>> Yes. >> SHAKKOTTAI: ...I think these incentives

have to be considered. So, if you look at it as I said, as an exchange economy, many

of these incentives are automatic because I'm staying, because I make some currency,

and I can use that later on for something else which I want to download. So, the answer

is, there is a lot of space in this region. But, one question which automatically arises

to is who owns the content? So, if I'm making money out of the content, through kind of

uploading it to somebody else, am I somehow, you know, detracting from the amount which would be made by the owner of the content? So, I don't know. Some

kind of scheme like the way that Google does it, where each publisher will get a small amount of money based on, you know, visits to their Web page and clicks on ads, might be a feasible solution. So, it's unclear how to create such exchanges, I mean, how the currency mechanism would actually work. But we can show you get significant improvements if there were such a currency, so it's all...