Uploaded by NCState on 17.02.2012

Transcript:

It’s a great pleasure today for me to introduce Tom Dietterich. I’ve known Tom for quite

some time. He… for which I brought a prop. He actually was an author on my first machine

learning textbook. And I got to meet him when I was a graduate student at Berkeley. And I

guess this was not long after Tom started his first faculty position after getting his

Ph.D. at Stanford. He came down to visit Berkeley and I enjoyed getting to meet him then. And

even then as you see with this textbook he was playing a very important role in shaping

the machine learning community. And he’s gone on since then to continue to play a role

in shaping the machine learning community. So he is an AAAI Fellow and an ACM Fellow.

He’s been program chair for AAAI, program chair for NIPS. He’s been very active

in the International Machine Learning Society, and really a mentor in the field to a lot

of young people. And Tom is one of the few people to this day who really sees the entire

field of machine learning. And as the fields have become increasingly specialized, it’s

rare to find people who can appreciate the whole field and take it all in. And that’s

one of the many great things that Tom is known for. And today he’s going to be

telling us about a very important application of machine learning, which is to computational

ecology and environmental management.

Thank you very much Ron. So, the work I’m going to be describing today is obviously

very collaborative and interdisciplinary. The collaborators in particular that I want

to mention are my graduate student Ethan Dereszynski, two post-docs Rebecca Hutchinson and Dan Sheldon,

and then colleagues Weng-Keen Wong, who’s in Computer Science, a machine learning person,

and Claire Montgomery, who’s a forest economist. And then several folks at the Cornell Lab

of Ornithology.

So, if we look at the earth’s ecosystems or the biosphere, it’s a very complex system.

And I think we can agree that in many ways we have not managed it in a sustainable way.

And so I thought I would start the talk by asking about why is that so, and is there

anything that computer science can do to help? And I think, I mean everybody had their own

views of why this is so. But I think maybe there are three reasons. First of all we don’t

understand the system very well. So, it’s very hard to manage a system when it’s behaving

very unpredictably. And there was a very thought-provoking article by a group of authors, first

author Doak, in 2008, where they ask the question: are ecological surprises

inevitable? Are the dynamics of ecosystems so complex that we will never really be able

to predict the behavior of these systems reliably? And to support this thesis they

go through, I don’t know, fifteen or twenty different examples of situations where either

something completely surprising happened like the population of a species in the Gulf of

Alaska suddenly exploded and then five years later disappeared again, with no one knowing

why. Or examples where we attempted an intervention in an ecosystem and the

outcome was very different from what we had intended. And one example that is very

current right now in the Pacific Northwest is the Northern Spotted Owl. So, during the

late 80’s and 1990’s we had what we call the owl wars in Oregon, because there’s

this species that was listed as an endangered species, the Northern Spotted Owl, and its

preferred habitat is old growth forest. And most of the old growth forests

on private land had already been cut, and so now there was a lot of logging in the national

forests in the public lands and the conservation community wanted to shut down all that logging.

And obviously the Forest Products Industry which was a very important part of the Oregon

economy was dependent on it to a large extent. And it took, you know, the President coming

to the state and bringing everybody together. And they came up with this plan called the

Northwest Forest Plan, which by and large did stop logging in forests on federal lands,

which had a devastating impact on the economy. And the hope was that this would help the

spotted owl recover. But spotted owl numbers have continued to decline since then. And

partly that’s because there was another species that has come in from the North. The

Canadian Invader, which is known as the Barred Owl. And it turns out it is more reproductively

successful and more aggressive. And it seems to be pushing out the spotted owl. So, that’s

another example of an ecological surprise, and it’s one of the reasons managing

the ecosystems is so difficult.

I think another reason that we’ve had trouble managing ecosystems is that we’ve often

focused on only a small part of a very large system, because the system is so complicated

that we can only deal with one piece of it. So, that could be a single species like the

Northern Spotted Owl might be an example of that. And we’ve often also ignored some

of the larger contexts. There’s a colleague of mine, Heidi Jo Albers who has studied things

like creating forest reserves in tropical forests. And often, when you

design these reserves, you need to consider what the native people might be using that

forest for. If you don’t take that into account (in her case, taking it into account

meant creating large buffer zones around the actual reserve), you end up with those people

making incursions into your bio-reserve and degrading it in one way or another.

So, having to consider the spatial aspects, the interactions among multiple species, these

are things that are often ignored in a lot of ecology and ecosystem management. And finally,

I think particularly if you look in agriculture, we often deliberately manipulate a system

to simplify it in order to try to manage it. So, in crop agriculture for example we try

to remove all of the other species so we only have to worry about one species. But as a

consequence we have to provide a lot of the support for that species that would normally

be provided by other species, like fertilizers and pest management and so on. We have to

provide those as exogenous inputs. And many of those inputs, like some of the nutrients

that we’re providing now are becoming expensive. And this is not a sustainable way of managing

those systems.

Well, I’m sure you could go on and list many other things. What does Computer Science

have to offer? I mean the reason I’m here is because I think there are several things.

First of all, if we look at the question of our lack of knowledge of the function

and structure of the systems, we now have a couple of ways that we can contribute. First

of all, you know, we and our colleagues in nanotechnology and electrical engineering

are producing all kinds of novel sensors, so we have wireless sensor networks.

We can create thousands of sensors, put them into these systems, and be able to monitor

them much better.

And of course the machine learning community and computational statistics community have

been working on building modeling techniques that can scale up to much larger systems.

It’s still a challenge of course, but much more is possible than, say, twenty

years ago. When it comes to this question about focusing on subsystems, it’s some of the

same story. Obviously with our modeling tools we can now look at the larger system in which

the smaller system is embedded. But I think we also now have tools in say mechanism design

to look at the interactions of different parties that might be competing for a resource or

tools in modern optimization that let us find good solutions to very large and complex optimization

problems.

And again when we come to agriculture it’s a different combination of these three things.

But better sensing, better modeling, and better optimization all have a role to play in allowing

us to model these systems and manage them better.

So, this general field that we’re calling computational sustainability is one of the

big things in my group: jointly with Carla Gomes at Cornell, we have one of the NSF

Expeditions in Computing projects. So, a ten million dollar grant to try to boldly

go where no computer scientist has gone before, and in particular to look at computational

methods that can contribute to sustainable management of (unintelligible) systems. And

so as a machine learning person I tend to think about the computational challenges that

are here in terms of a pipeline, from data to models to policies. And so what I’m going

to do in this talk is first talk about what I see as some of the work that’s going on

in each of these areas outside of my group briefly, and then drill down on three specific

things that we’re doing in my group that contribute to this area. And so I’m hoping

you’ll get a sense of the range of challenge problems that are here and some of the opportunities

from a Computer Science perspective.

So, the first thing I want to talk about is sensor placement. And Andreas Krause

and his students have been doing some really exciting things there. So, this particular

example is a case where (which I’m not supposed to point with this, I point with

this) this is a city’s water network. And they want to know where should we place

sensors in this network in order to detect pollutants or maybe an attack, but a chemical

that’s introduced into the system. And the main tool that they use is something called

submodularity, which is the idea that you have a function of a set, in this case the

set of places where you have put your sensors, and

it exhibits a diminishing returns property: once you’ve placed K sensors,

the (K+1)st one is going to give you less benefit than the Kth one, and so on. If

your objective function is submodular, then the

greedy algorithm, and various sophisticated variants of it, gives a performance that is

within a constant fraction of optimal. So, you can get very

good results, and in fact they won a competition for water quality monitoring. And

they’ve looked at many other problems as well. So, that’s sensor placement, and of course

this has a lot of relationship to the huge literature on experimental design.
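To make the greedy idea concrete, here is a minimal sketch in Python. The coverage function below is a toy stand-in I've made up for illustration, not the actual water-network objective; any monotone submodular utility could be plugged in.

```python
def greedy_placement(candidates, utility, k):
    """Choose k sensor sites greedily by marginal gain.

    For a monotone submodular utility, the greedy solution is
    guaranteed to be within a factor (1 - 1/e) of optimal.
    """
    chosen = set()
    for _ in range(k):
        base = utility(chosen)
        best_site, best_gain = None, float("-inf")
        for site in candidates - chosen:
            gain = utility(chosen | {site}) - base  # marginal benefit
            if gain > best_gain:
                best_site, best_gain = site, gain
        chosen.add(best_site)
    return chosen

# Toy stand-in utility: number of distinct regions covered (submodular,
# since adding a site to a bigger set can never cover more new regions).
coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d"}}
util = lambda sites: len(set().union(*(coverage[s] for s in sites)))
picked = greedy_placement(set(coverage), util, 2)
```

Each round re-scores only the marginal gain, which is where the diminishing-returns property does the work; in practice lazy evaluation makes this much faster.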

The second thing that comes up is what I call data interpretation for lack of a better word,

which is often the raw data you get from your sensors, is not at the level you want for

your modeling effort. And this is particularly true for image data. So, for the last eight

years I’ve been running a project that we call the Bug ID Project, where we take photos

of moths, soil arthropods, and freshwater insect larvae. And we want to identify them to the

genus level and ideally to the species level. And this might for instance be input to building

a model of the distribution of species in space, or to tracking invasive species, or even to water

quality monitoring, where you want a histogram by species of how many individuals you had

in a given stream. So, this particular picture here is from a collaborator of mine, Qing

Yao, who’s looking at rice pests, and they put out these light traps at night. And moths

wonderfully trap themselves in these traps. And then they spread them out on a glass table,

photograph them from above and below, and then they want to count and identify them to the

species level.

The third problem then I call data integration. I guess that’s an established term. The

problem is with a lot of ecological modeling challenges you have data coming from a wide

variety of sources, and a wide variety of scales in time and in space. And you need

to somehow pull all this together in order to then fit a model to the data. And so in

what we’re doing for instance on bird migration modeling, we’re dealing with data on everything

from stuff that basically never changes, like a digital elevation model of the terrain, to

things that are maybe changing on a fifteen minute time scale, like the temperature or

the weather and having to integrate all of these things.

And then we come to the part that you know is really my core competence, which is model

fitting and machine learning. And so there are of course a wide range of models in ecology

that people would like to fit. We’ve been looking really at just three kinds of models.

The first are what are known as species distribution models. And the question there is can we create

a map of where a species is found in the landscape. And so that’s very close to sort of the

core machine learning supervised learning problem. You’re given a site with

some set of features describing it, and then either the species is present there or absent

there.
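As a concrete, deliberately tiny illustration of that supervised-learning framing, here is a sketch that fits a logistic regression to synthetic site records. The features (elevation, canopy cover) and all the numbers are made up for illustration; real species distribution models use richer features and models.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # bias plus one weight per feature
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi  # gradient of the log loss
            w[0] -= lr * err
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * err * xj
    return w

def predict(w, x):
    """Probability that the species is present at a site with features x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Synthetic sites: [normalized elevation, canopy cover]; in this toy data
# the species favors low elevation and high canopy cover.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = [1, 1, 0, 0]
w = fit_logistic(X, y)
```

Given a new site's features, `predict` returns an occupancy probability, which is exactly the map-of-where-the-species-is-found output described above.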

Another kind of model is something called a Meta-Population Model. And here we imagine

that we have a set of patches arranged in space. And a patch may be occupied by a species

or not. And over time the species may reproduce. It may spread to other patches; it may go

locally extinct and then get re-colonized. So, that’s sort of focusing on space and

looking at what comes in and out of a cell. And then the other kind is migration

or dispersal models, where you follow the organism instead. So, you want to model

the trajectory say that a bird follows or the timing of movement.

And so there’s work in machine learning on all of these. One I want to show is what’s

called a STEM Model that was developed by Daniel Fink at the Cornell Lab of Ornithology.

And so at the Lab of Ornithology they have a big project called eBird, where

if you’re a birder you can go out observing in the morning say and then fill out a checklist

on their webpage and say here’s what I saw and I didn’t see anything else. You can

click a button for that and then upload it. There are a lot of avid birders out there.

So, we’re now getting like a million data points a month from people uploading. And

they exceeded three million points in May, sort of the peak of the breeding season. And

so there’s a lot of data. Unfortunately it’s completely uncontrolled. Right? So,

you have lots of variation in expertise. You have no control over where people go. But

you can still do some interesting things. And what Daniel does is fit ensembles of decision

trees to try to predict whether this species, in this case the Indigo Bunting, will be present

or absent at a particular place and time. And so I’m going to show you this movie,

but it’s important to realize this is a series of snapshots. There’s no dynamical

model here. But this species winters down in Central America. And you’ll see the orange

colors, where the species is predicted to be present, first along the Mississippi

Valley and then sort of spreading out through the entire eastern U.S. And then as we move

into September, with a clock ticking along the bottom, you see the species go

back down the Mississippi Valley and disappear from the U.S. And so this is I think a

very nice model. And it was used as part of something called the State of the Birds

Report to try to estimate what fraction of habitat for each of the something like two

hundred species of birds is publicly owned versus privately owned. And this report came

out late last year.

So, once we have built a model like this then it’s time to say well it’s great that

we have this model of birds, but how can we use it to make policy decisions

to manage the ecosystem. And I don’t have a good example for management with birds but

with fish there’s John Leathwick, who does excellent work in New Zealand. So, I don’t know if

you can tell, but these gray things over there are the islands of New Zealand.

And these blue and red dots correspond to places where fishing trawlers

did not or did harvest (the red ones are positive) a particular species

of fish, the Mora moro. And the blue line around the outside is the exclusive economic

zone of New Zealand. And so using this data, he fit a species distribution model similar

to the one that I was just describing except that instead of estimating presence or absence

he’s estimating the catch in kilograms, so the biomass of the fish. And so these are

his estimates. The blue areas there are no fish at all and then you can see this pattern.

And then what he wanted to do was use that to prioritize regions for their conservation

value, in terms of allowing this population to grow. And the left plot

is prioritizing them if we ignore the fishing industry and just say what would be the places

that would best encourage the growth of the species. But of course you really need to

consider these within an economic context. And so the right diagram re-prioritizes them

now taking into account the cost of the fishing industry. And you can see, I mean the main

lesson here I think is that there’s still a lot of places that we can conserve and yet

also still have the benefit of fishing.

So, this is a kind of spatial optimization problem to solve. And I’ll be talking about

some more of those. So, finally we have the problem of policy execution; there is usually

of course a chasm to go from a designed policy to one that we can convince people to actually

adopt. And you know at the simplest level we just have a policy where at each time step

we observe the state of the system. And then we choose the action that our policy tells

us to choose. And we go ahead and act. But in practice we’re often called upon to act

in a lot of ecosystem management problems, before we have a very good model of what’s

going on. And so they’re really what we would call a partially observable Markov decision

process, or worse, where we don’t have a complete understanding of the system we’re

trying to model. I think a challenge here is that, this means that our policy in our

early actions should be designed not only to achieve the ecosystem goal, but also to

help us gather more information about the system so that we can improve our model. So,

we have dual objectives. And these are very difficult to optimize.

And one of the big concerns I think in particularly in light of these ecological surprises is

can we design policies that are robust to our lack of knowledge? Both to the known unknowns,

the things where we know that we’re uncertain and can model our uncertainty, and

also to the unknown unknowns, the factors that we forgot to include in the model. And I think

that’s one of the most interesting intellectual questions. I don’t have an answer for it,

but I think that there are some things we might be able to do.

Okay, so that’s the review of the sort of pipeline. And now I’d like to look at, talk

about three specific projects at Oregon State. These will be in data interpretation

and model fitting and in policy optimization. So, the first project is the dissertation

project of my student Ethan Dereszynski. And he’s going to be graduating soon, so he’s

looking for a job. And what he works on is automated data cleaning in sensor networks.

So, Oregon State University operates something called the H.J. Andrews Long-Term Ecological

Research Site. So, NSF funds a collection of these study sites that are

committed to collecting data over long periods of time and doing long-term experiments. So,

one of my colleagues Mark Harmon for instance has started an experiment that is going to

last two hundred years that’s called the Roth Experiment. It’s about trees and how

long it takes them to decay. But you know it takes forever to get tenure in this field.

Anyway, in this case we’re looking at these weather stations that are there. And I’m

going to talk mostly about four thermometers. So, this is a weather tower here and these

little L-shaped things coming off each have a thermometer on them. And they’re allegedly

at one and a half, two and a half, three and a half, and four and a half meters above the

ground. And we get data from them that looks something like this. So, every fifteen minutes

we get a temperature reading. And you can see on these curves the up and down motion;

those are the daily cycle, the diurnal cycle. So, it’s warming up in the daytime

and cooling off at night. And it’s kind of fun, because the thermometer that’s nearest

the ground, which is the black line, is the one that’s coldest at night and hottest

in the day. So, they flip back and forth like this. And the problem is that these sensors

are out in the world and bad things happen to them. And so someone has to do data quality

assurance on these, on the sensor data and clean it up before we try to do any analysis

on it. Now traditionally in the Andrews Forest, we’ve got these towers;

there are many more, but three main ones have been in operation

since the 80’s. And with twelve thermometers it’s not really much of a burden for someone

to go check this data. They just eyeball it and cluster it in various ways and look for

outliers. But of course we’ve now got Wi-Fi over the entire forest

and we want to put out huge networks of things. And if we have a thousand thermometers, this

human data cleaning becomes infeasible, unless we can figure out how to make a CAPTCHA out

of it and maybe get people to do it.

So, the kinds of things that go wrong, like for instance here this is an instance of what’s

called a broken sun shield. And so, the air temperature sensor is now measuring actually

the surface skin temperature of the thermometer with the sun directly beating down on it.

And so you can see in the daytime it spikes way high, as many as ten degrees higher than

the true air temperature. At night it’s a perfectly good air temperature sensor, but

in the daytime, particularly sunny days, not so good.

Can anyone guess what’s going on in the bottom case here? Our 1.5 meter

sensor is flatlining for a while.

Yes. So, the problem here is this is week three. So, that means it’s right about now.

But it was in 1996. We had a big snowstorm. And so this is now a snow temperature sensor,

instead of an air temperature sensor. In some sense the thermometer is still functioning

correctly, it’s just that the metadata is wrong. But there’s a lot more going on here.

So, you notice that the 4.5 meter thermometer is still bouncing up and down rather nicely.

I mean obviously over here it’s quite cold these days, nights included; even in the daytime

it’s just barely getting above freezing. But then what’s happening over here? It

really warmed up. I mean it’s almost in the fifties at the top of the

thermometer tower. And right around 3500 here it’s starting to rain. And so the snow

temperature moves up to sort of the triple point of water for a while. And now the snow

is melting, and the university is closed right around 4500 because we had

such a huge flood that you couldn’t get to campus. So, this is how you get

a big flood in Oregon is to have what’s called a rain on snow event. And this was,

this was one of them.

So, we’d like to detect these things also, you know, because they’re interesting, but we

don’t want to assume that this thermometer is measuring air temperature during this entire

period.

So, how can we do this? Well, we’d like a data cleaning system to do really two functions.

The first is we’d like it to mark every data value that we think is anomalous. And

so in this case this is a different set of data, but we’ve put what they call a

rug, a little red tick, underneath each data point that our model

predicts is incorrect, where something is wrong.

And then the other thing you’d like it to do is to impute, or predict, the missing

values: what the thermometer should have been reading if it had been working correctly.

And we’re going to do both of these things using a probabilistic

model.

So, the basic probabilistic model we’re going to use (these are, you know, Bayesian

networks, or probabilistic graphical models) is the following. We’re going to have one

node here for each of our variables of interest, and the one that is gray is an observed

node. So, this is the observed temperature at time t. And then there is a hidden node,

which is our true temperature that we wish we could observe directly. Then up here is

our sensor state variable. And I’ve made it a box to indicate that it’s discrete,

whereas these are continuous variables. And the idea is a very simple sensor model that

says when the sensor state is one, that is, normal or working, then the observed temperature

has a Gaussian distribution whose mean is the true temperature x but with some

small variance around it. But when the thermometer is broken, and so the state is zero, then the

observed temperature has a mean of zero and a gigantic variance. So, basically we’re

saying it’s completely unrelated to the true temperature. So, this is a very simple model,

and why do we adopt this kind of model? Well you could try to think about this kind of

data as if it were a diagnosis problem, where the sensor has various fault modes and failure

modes and you want to predict what they are. And so you could do a kind of Bayesian diagnosis

where you could say well given the sensor readings and my expectations it looks like

it’s a broken sunshield or it looks like it’s a flat line because of a communications

failure or something like this. But the trouble is we were not confident that we could enumerate

in advance all the ways a sensor could fail. We wanted to have an open-ended set. So, the

idea here is to treat it more as an anomaly detection problem where we model the normal

behavior of the sensor as accurately as we can. And then anything that is a serious departure

from normal, the normal model will give it very low

likelihood, and it’ll instead get picked up by this sort of very generic failure model.

So, that’s the idea here.
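Here is a minimal numerical sketch of that two-state sensor model. The prior and the variances are illustrative assumptions, not the values used in the actual system:

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def most_likely_state(observed, true_temp, p_working=0.95,
                      var_working=1.0, var_broken=1000.0):
    """Return ('working' or 'broken', posterior probability of working).

    Working: observation ~ Gaussian(true_temp, small variance).
    Broken:  observation ~ Gaussian(0, huge variance), i.e. essentially
    unrelated to the true temperature.
    """
    l_work = p_working * gaussian_pdf(observed, true_temp, var_working)
    l_broke = (1 - p_working) * gaussian_pdf(observed, 0.0, var_broken)
    post = l_work / (l_work + l_broke)
    return ("working" if post >= 0.5 else "broken"), post
```

With the working-state variance small and the broken-state variance huge, any observation far from the expected true temperature gets almost all of its probability mass from the generic failure state, which is exactly the anomaly-detection behavior described above.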

So, we can do anomaly detection then by doing probabilistic inference. We ask the query

you know what is the most likely value of the state of this sensor at time t. And that’s

just the argmax over the possible states of the probability of the states given the observation.

And we can also do imputation by asking instead what’s the most likely temperature given

the observed temperature. So, basic probabilistic inference techniques work just fine. But of

course this is a very bad model of the sensor here. So, the next thing we want to do is

add some sort of Markov Model so that we can look at the history of the sensor. Because

we’d like to say, well, if the sensor was working fifteen minutes ago, it’s probably

still working now. And if it was broken fifteen minutes ago, it’s very likely it’s still

broken now. So, we’d like to do that. And similarly of course the actual real temperature

doesn’t change that drastically either. So, we’d like to have some model of the

true temperature changes over time. So this gives us now a Markov version of this. And

now we can ask a query like: what’s the most likely state of this sensor at this time given

the entire observation history. And that also can be reasonably calculated. But we can go

even further than this if we have multiple sensors as we do on these towers. We could

build a separate copy of the model for each of them and then couple those somehow. So

we could say that you know if we know the temperature of the sensor at the bottom of

the tower then we should be able to predict with reasonable accuracy the sensor next up

on the tower. And so this is the kind of thing we do. In general we learn a sparse joint

Gaussian distribution among all of the t variables, so that we have a connected model.

Unfortunately probabilistic inference in these models starts to become intractable. So, even

in the single sensor model, with the Markovian independence, you would think

that would not be a problem. But it is, because of our observed variables. If

all the variables were discrete then we could solve it very easily; a simple

message passing algorithm will do it. But because our variables are continuous,

conditional Gaussians, when you marginalize away the history it gives you a mixture of

Gaussians that grows exponentially with the number of time steps. And so it becomes

impractical to do, you know, more than just a few time steps; that won’t

work. So, what we do is basically a forward filtering process where at each time step

we ask what’s the most likely state of my sensor. And then we say okay we’ll believe

it. We’ll adopt that state and treat it as evidence, and then at time two we’ll ask,

okay, what’s the most likely state at time two given that I already committed to the

state at time one. And we do this. And we also have to bound the variance

on the true temperature, because if you have a long string of time steps where

the sensor is bad, the true temperature becomes extremely uncertain. And you can’t let that

grow too far.
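That forward-filtering-with-commitment strategy can be sketched as follows. The transition and noise parameters are illustrative, and the real model also tracks the true-temperature distribution over time, which this sketch sidesteps by taking predicted true temperatures as given:

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def filter_states(observations, predictions, p_stay=0.9,
                  var_working=1.0, var_broken=1000.0):
    """Greedy forward filtering: commit to the most likely sensor state
    at each step given the state committed at the previous step.

    predictions[t] is the model's expected true temperature at time t.
    """
    states, prev = [], "working"
    for obs, pred in zip(observations, predictions):
        scores = {}
        for s in ("working", "broken"):
            # Persistence: a sensor tends to stay in its previous state.
            trans = p_stay if s == prev else 1 - p_stay
            mean, var = (pred, var_working) if s == "working" else (0.0, var_broken)
            scores[s] = trans * gaussian_pdf(obs, mean, var)
        prev = max(scores, key=scores.get)  # commit and move on
        states.append(prev)
    return states
```

Committing at each step keeps the computation linear in the number of time steps, at the cost of never revisiting an early decision, which is exactly the trade-off described above.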

Probabilistic inference is also infeasible in the Multiple Sensor Model, even if you follow

this step by step commitment strategy. And so the solution we’re using right now which

seems to work best is something we’re calling Search MAP, which at each time step you start

by assuming that all of the sensors are working. And you score how well that accounts for the

observations. And then you ask can I improve that score by breaking one of the sensors.

And you do this in a greedy algorithm, basically hill climbing, to try to find a MAP solution.

You don’t always find the true maximum, because there are local optima.

But even the simple greedy algorithm takes polynomial time that’s quite substantial

in the number of sensors.
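The Search MAP idea can be sketched as plain greedy hill climbing over working/broken assignments. The score function here is a toy stand-in for the model's log-probability of the observations under an assignment:

```python
def search_map(sensors, score):
    """Start with all sensors working; repeatedly flip to 'broken' the
    single sensor whose flip most improves the score, until no flip helps."""
    assignment = {s: "working" for s in sensors}
    current = score(assignment)
    improved = True
    while improved:
        improved = False
        best_flip, best_score = None, current
        for s in sensors:
            if assignment[s] == "working":
                trial = dict(assignment, **{s: "broken"})
                sc = score(trial)
                if sc > best_score:
                    best_flip, best_score = s, sc
        if best_flip is not None:
            assignment[best_flip] = "broken"
            current = best_score
            improved = True
    return assignment

# Toy stand-in score: squared error against an expected reading of 10.0
# when "working", a fixed penalty when "broken".
obs = {"a": 10.0, "b": 10.5, "c": 42.0}
def toy_score(assign):
    return sum(-(obs[s] - 10.0) ** 2 if st == "working" else -4.0
               for s, st in assign.items())
flags = search_map(list(obs), toy_score)
```

Each pass re-scores every candidate flip, which is where the polynomial (but substantial) cost in the number of sensors comes from.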

Yeah?

(Unintelligible) working even if in the previous time step’s commitment you decided one of them was

broken.

That’s what we’re doing right now. But we could start with our MAP guess

from the previous time step too. And you can also consider a variation where, having broken

one sensor, you might reconsider your previous decision, in which case you do what’s

sometimes called a floating greedy algorithm, which takes

even longer but gives you better solutions. And we’ve tried a whole bunch of other things,

you know, various kinds of expectation propagation and the whole bag of tricks in the machine

learning probabilistic modeling area, but…actually one thing we haven’t tried yet is particle

filters. He’s working on that right now. (Unintelligible). Rob (unintelligible).

Well, here are single sensor results. So, on the broken sunshield case,

the bottom plot is the data and the top plot is the predicted

temperature of just the one thermometer that’s closest to the ground.

And then along the bottom we color code it: our domain experts wanted us

to not just have broken or working but to actually have four levels of performance

from very good, good, bad, and very bad. So, very bad would be black and there are just

a couple of spots at the peaks of these days when there are some black spots there. But

otherwise it’s mostly marked things as red for bad. And at night, of course, it’s still

a very good sensor. So, we’re able to do this using just a single sensor model.

And there’s a lot more in the single sensor model: we build a baseline expectation based

on previous years, so that we know what week six looks like in general.

And then for the multi-sensor case: Ethan did an internship at EPFL in Switzerland, where they put out these short-term deployments of sensor networks, and he learned a conditional Gaussian Bayesian network over the true temperatures and then fit that combined model. And so these are the results. You can see it's doing quite well in some cases. It's picking out a lot of these cases where we have extremely bad, spiky sensors. But on these long flat lines it's doing okay, except sometimes, the dashed line here being the imputed value, when the predicted value happens to coincide with the flat line it says, oh, the sensor's working again. So, this is a case where we probably really should have a flat-line model, because these flat lines happen when the data link is lost.

Okay. And there are many other challenges. I mean, we're working with a single time step, but of course it really should be multiple time scales. And we're also working on integrating more heterogeneous sensors than just temperature.

Okay. Well, so that's an example of this automated data cleaning work. The next problem is model fitting with an explicit detection model. And this is work by a post-doc of mine, Rebecca Hutchinson, who's wrapping up her post-doc later this spring.

And I already talked about species distribution modeling. Often, particularly with birds and wildlife in general, when you go out and do a wildlife survey the species could be there, but you just fail to detect it. And this is a well-known problem in ecology. So, imagine that there's some landscape and we've chosen some set of these black squares that we're going to go survey, but when we go out there it turns out some of the birds are in the vegetation and we don't see them. So, although every one of those squares was occupied by our species, we only see it twice. What can we do about that? Well, one solution is to make repeated visits that are close enough together in time that you think the birds have not moved around, like when they're sitting on their nests or something, but far enough apart in time that you think you're getting independent measurements of their hiding behavior. So, if we go back another day, maybe now we see the bird in the first cell, but the bird in the second cell is hiding. The third one we still think is unoccupied, because that bird was hiding the whole time, and so on. So, this is one strategy that you can use.

And if you look at the kind of data you get, you get what are called detection histories. So, suppose we have four different sites. Three of them are in forest, and one is in grassland. And suppose that there is this true occupancy, which we say is a latent or hidden variable. The first three sites are occupied and the fourth one is unoccupied. But we don't know that; that's hidden from us. So, on the first day we go out, and it turns out it's a rainy day and we're going out at lunch time, and we don't see any birds. So, we have all zeros here. Now another day we go out early in the morning, a very good time to go birding, and it's a clear day. And we detect the birds in the first two sites, but we don't detect this guy here in site three, and of course we don't detect anything at site four. So, we're going to assume no false detections here, no hallucinations, although that's not always a safe assumption. And then the third day it's a clear day, but we're a little late getting out, so we only detect the bird in the first site. So, a thing like 0, 1, 1 or 0, 1, 0 is called a detection history. And from the detection histories, if you assume these are independent trials of your detecting ability, you can get a naive estimate of your detection probability.

So, in this case we know from our data that sites A and B are occupied by the species. And we know we had six opportunities to detect the birds, three at each site, and we succeeded three times. So, our naive estimate of our detection probability would be point five. But in fact we really had nine chances to observe this species, and we only saw it three times. So, our true detection probability, at least the maximum likelihood estimate thereof, would be one third, about point three three.
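The arithmetic above can be checked in a few lines of Python, using the detection histories from the example (sites A through D, three visits each):

```python
# Detection histories from the example: 1 = species detected on that visit.
histories = {"A": [0, 1, 1], "B": [0, 1, 0], "C": [0, 0, 0], "D": [0, 0, 0]}

detections = sum(sum(h) for h in histories.values())  # 3 total detections

# Naive estimate: only sites with at least one detection (A, B) are known
# occupied, giving 2 sites x 3 visits = 6 trials.
known_occupied_trials = sum(len(h) for h in histories.values() if any(h))
naive_p = detections / known_occupied_trials        # 3 / 6 = 0.5

# With the true (hidden) occupancy -- A, B, and C occupied -- there were
# really 3 sites x 3 visits = 9 trials.
truly_occupied = {"A", "B", "C"}
true_trials = sum(len(histories[s]) for s in truly_occupied)
true_p = detections / true_trials                   # 3 / 9 = 1/3
```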

So, the big challenge is: how can we tell the difference between an all-zeros history that is due to our failure to detect, versus an all-zeros history that's due to the fact that the site is unoccupied? And the answer, of course, is to build a probabilistic model. And so this is a plate-style model. And for those of you who aren't familiar with the notation, think of these dashed boxes as being for-loops. So, we have a loop where we iterate over the sites; i indexes the site. And xi is some set of features that describes the site, like it's a forest and it's at three hundred meters of elevation. And at each site, based on its features or properties, there's going to be some occupancy probability (unintelligible). And we're going to assume that birds toss a coin with that probability of heads to decide whether to occupy the site. And zi is the true occupancy status of that site, either a zero or a one. Now, the variable t is going to index over our visits to that site when we go observing. So, wit is some description, say that it was 6 a.m. and it was sunny, that might influence or account for our detection probability. And then yit is the actual report, the data that we get. So, we actually observe x, w, and y, when we really want z. So, we'd like to extract out of this the species distribution model: the probability of the site being occupied given the properties of that site. I'm going to name that function f, so f of xi is going to be the occupancy probability. And we'd love to plot that on a map. But then we have this nuisance model, which is our observation model, and we'll let dit be the value of this function g that is our detection probability. And so we can say the probability of reporting a 1, that we saw the bird, is the product of zi, which will be 1 if the bird is there, and dit, which is the probability that we detect it. So, that's the model. And this was developed by a group, MacKenzie et al., from the USGS, and it is a very nice and well-established model.
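A minimal generative sketch of this occupancy-detection model can make the plate notation concrete. The logistic forms chosen for f and g below are hypothetical placeholders (the talk's actual parametrizations differ); only the structure, z drawn per site and y equal to z times a per-visit detection coin, follows the model as described.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def simulate_od(x, w, f_weights, g_weights, rng):
    """Simulate the occupancy-detection model.

    x: per-site features, shape [n_sites][n_feat]
    w: per-visit features, shape [n_sites][n_visits][n_feat]
    Returns the hidden occupancies z and the observed detection histories y.
    Hypothetical choice: f and g are logistic in the features.
    """
    z, y = [], []
    for xi, wi in zip(x, w):
        occ_p = sigmoid(sum(a * b for a, b in zip(f_weights, xi)))  # f(x_i)
        zi = int(rng.random() < occ_p)          # true occupancy (hidden)
        z.append(zi)
        yi = []
        for wit in wi:
            det_p = sigmoid(sum(a * b for a, b in zip(g_weights, wit)))  # g(w_it)
            # P(y_it = 1) = z_i * d_it: no false detections.
            yi.append(int(zi and rng.random() < det_p))
        y.append(yi)
    return z, y

rng = random.Random(0)
x = [[1.0, 0.3], [1.0, 0.9], [1.0, 0.1]]    # three sites
w = [[[1.0, 0.5]] * 3 for _ in x]           # three visits per site
z, y = simulate_od(x, w, [0.5, 1.0], [-0.5, 1.0], rng)
```

Note how an all-zeros row of y can arise either from zi being 0 or from three failed detection coin flips, which is exactly the ambiguity the model has to resolve.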

But I'm a machine learning person. And, you know, in machine learning there are sort of two parallel communities. There's the community that loves probabilistic models, and there's the community that loves non-parametric decision models like support vector machines and decision trees. And there are people like me who have one foot in both camps. But the two communities really have very different outlooks.

Why do we like probabilistic graphical models? Well, it's a terrific language for expressing our models, and we have wonderful machinery, probabilistic inference, for reasoning about them. So, we know what the semantics of the models are, at least what they're intended to be. And we can also write down models that have hidden or latent variables that describe some hidden process we're trying to make inferences about. So, probabilistic graphical models are kind of the declarative representation of machine learning. But there are some disadvantages, particularly when you're exploring a new domain and you don't understand the system well. Because you, as the designer, have to choose the parametric form of each of the probability distributions in the model, and you need to decide whether you think there are interactions among the variables, and you need to include those interactions in the model. The data typically have to be pretreated, scaled and so on; if you assumed linearity in your model, you may need to transform your data so that it will have a linear relationship. And one of the most important things we've learned in machine learning is the importance of adapting the complexity of your model to the complexity of the data. And it's difficult to adapt the complexity of a parametric model. I mean, there are some things you can do with regularization, but it's not as flexible as using these flexible machine learning models. So, you know, back at that very first machine learning workshop, from which that book came out, Ross Quinlan gave a talk about a classification tree method that he was developing. And it was about a couple of years later that Leo Breiman and company published the book on CART.

So, classification and regression trees are a very powerful kind of exploratory non-parametric method. And one of the beauties is that you can just use them off the shelf. Right? You don't have to design your model. You don't have to pre-process or transform your data. They automatically discover interactions if they're there, and sometimes even if they're not there. And they can achieve higher accuracy if you use them in ensembles, so boosting and bagging and random forest type techniques.

And then of course, since then, the support vector machine revolution has swept through machine learning. These still require the same data preprocessing and transformation steps, but by using kernels you can introduce non-linearities in an extremely flexible way. And there are very powerful ways of tuning the model complexity to match the complexity of the problem. So, they also work remarkably well without a lot of careful design work.

So, a challenge is: can we have our cake and eat it too? Can we write down probabilistic graphical models with latent variables in them that describe processes we care about, and yet also have the benefits of these non-parametric methods? And this is a major open problem in machine learning. And there are several efforts. There's been a lot of work recently in the SVM family. There's Bayesian non-parametrics using mixture models. The approach we're exploring is boosted regression trees.

So, I don't really have a lot of time to describe boosted regression trees. But they grew out of boosting work in machine learning. And then first Mason, and then Jerry Friedman at Stanford, noticed that these could really be viewed as instances of a generic algorithm schema where you're going to fit a weighted sum of regression trees to data. And so he developed this thing called boosted tree regression, or tree boosting.

So, the standard approach in these occupancy models is to represent these functions f and g as linear logistic regressions. What we're going to do is replace those functions f and g with non-parametric, flexible models: boosted regression trees. And this can be done using the algorithm schema called functional gradient descent, or you could actually also do a functional EM. And we had a paper at AAAI last summer that describes the method. So, I'll just give you a little flavor of the results.
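To give a flavor of the tree-boosting machinery involved, here is a toy, pure-Python sketch of gradient boosting with depth-one regression trees (stumps) under a logistic loss. It is a generic illustration of functional gradient descent, each stump fit to the negative gradient of the loss, and not the occupancy-model algorithm from the AAAI paper.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_stump(x, r):
    """Fit a depth-1 regression tree to residuals r by squared error."""
    best = None
    for thr in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or err < best[0]:
            best = (err, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda v, t=thr, a=ml, b=mr: a if v <= t else b

def boost(x, y, rounds=50, lr=0.3):
    """Functional gradient descent: F <- F + lr * h, where each stump h
    is fit to the negative gradient of the logistic loss, y - sigmoid(F)."""
    trees = []
    F = [0.0] * len(x)
    for _ in range(rounds):
        resid = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        h = fit_stump(x, resid)
        trees.append(h)
        F = [fi + lr * h(xi) for fi, xi in zip(F, x)]
    return lambda v: sigmoid(sum(lr * t(v) for t in trees))

# A non-linear step function that the ensemble should pick up.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
predict = boost(xs, ys)
```

The flexibility comes from the ensemble adapting its complexity to the data, which is exactly the property being imported into the occupancy-detection framework.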

Of course, there are methodological problems in studying latent variable models, and that is that you don't know the true values of those variables; they're hidden from you. So, how do you know whether you're doing well? And so I'm going to describe results for one synthetic bird species, where we simulate a species using real data but faked occupancy and detection processes. So, we made this model additive but non-linear. And this is a scatter plot showing, on the horizontal axis, the true occupancy probabilities for this simulated species.

And on the vertical axis is what different families of models predict. So, the left column is models that are trained without latent variables, treating it as a supervised learning problem. And you can see that they systematically underestimate the true occupancy probabilities, because they assume the only positive examples are the cases where you actually detected the bird, which is obviously an underestimate of what's really going on. In the right-hand column are models using this latent variable model, the Occupancy-Detection Model, the OD model. And the top row is where we're using logistic regression as our parametrization. And you can see that on the top right it's more or less unbiased. So, the true probabilities and the predicted ones more or less lie on that diagonal line, which is where they should be. But there's a lot of scatter, and that's because the true model is non-linear and we're fitting a linear model. Whereas if we use the boosted regression trees, on the bottom, we're doing a lot better; we're much closer to the line. I'd like to omit a couple of the points that are far from the line, but otherwise we're pretty happy with that fit.

And so, in general, this is what we find: we can train these flexible boosted regression tree models within a graphical models framework and get more accurate results. And so we've been applying this to data for several bird species.

So, it looks like I'm running tight on time here. So, let me briefly just describe the final problem, which is managing fire in eastern Oregon. Conveniently, this is the problem where we don't have any results yet. So, if I hadn't said anything, you wouldn't have noticed. But this is now a policy problem, not really a data problem.

So, you know, since the late 1910s or 1920s the U.S. Forest Service has had a policy of suppressing essentially all fires. Part of the political argument that was used to sell the creation of the Forest Service was that we will prevent these terrible catastrophic wildfires. Of course, it turns out you can't prevent them; you can only postpone them. And that's now coming to pass. We believe that the sort of natural state of forests, particularly in eastern Oregon, should look something like this, where we have very large Ponderosa Pines and then what's called an open understory, so just very small vegetation on the ground.

I don't have a picture of it, but what we have right now, because fire has been suppressed for a long time, is all kinds of vegetation on the forest floor, and small trees of all different sizes, lodgepole pines in particular, that have grown up among these Ponderosa Pines. When you have open ground like that and a fire happens, it burns along the ground and actually maintains that openness. And the Ponderosa Pines have this big, thick, fire-resistant bark, so they're actually happy with this fire coming through and getting rid of some of their competitors. But since that hasn't been happening, now when a fire occurs it is able to climb up the smaller vegetation, reach the crown, and actually destroy the forest, kill all the trees. And you end up with these really very intense catastrophic fires.

And so one question is, is there anything we can do to manage this landscape? We have a study area in eastern Oregon that's divided up into about 4,000 cells. They're irregularly shaped, based on the homogeneity of the landscape. And there are four things you can do to each of these cells each year. You can do nothing. You can do what's called mechanical fuel treatment, where you send people in and they cut down a lot of that small vegetation and cart it out. You can do clear cutting, where you harvest the trees but leave behind a lot of debris; while that gives you timber value, it actually increases fire risk. Or you can do clear cutting and fuel treatment, and then fire just can't burn at all in that area, at least for a few years.

So, the question is, how should we position these treatments in the landscape if we want to, say, minimize the risk of big catastrophic fires and maybe maximize the probability of these low-intensity ground fires? Well, we can think about this as a kind of game against nature. In each time step we observe the current state of the landscape, maybe something like a fire risk map. And then we have to choose an action, which is actually a vector of actions, one action in each cell; maybe we choose to treat these particular cells. And then nature has its turn, and it lights fires and burns them. And then it's our turn again.

And so we can model this as a big Markov decision process. But unfortunately, it's a Markov decision process with an exponentially large state space. So, if each of these cells in my landscape has five tree ages and five fuel levels, then I have twenty-five to the four-thousandth power possible states of the landscape, which is not going to fit into memory very easily. And similarly, each time I take an action, I have an action vector with four thousand elements and four possibilities in each position. So, I have four to the four-thousandth possible actions to consider. Even with all the cleverness of the reinforcement learning community and approximate dynamic programming, we don't know how to solve these problems.
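Just to make those numbers concrete, Python's arbitrary-precision integers let you count the digits of these state and action space sizes directly:

```python
# 4,000 cells, each with 5 tree ages x 5 fuel levels = 25 local states,
# and 4 possible actions per cell.
n_cells = 4000
n_states = 25 ** n_cells        # size of the joint state space
n_actions = 4 ** n_cells        # size of the joint action space

digits_states = len(str(n_states))    # 5,592 decimal digits
digits_actions = len(str(n_actions))  # 2,409 decimal digits
```

For comparison, estimates of the number of atoms in the observable universe run to only about 80 digits.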

There's been a little bit of work. There was a paper by Wei et al. a couple of years ago where they looked at just a one-year planning problem: if I have just one year to make treatments and then there are going to be fires for a hundred years, where should I put my treatments? And they were able to formulate and solve a mixed integer program for this optimal one-shot solution. They were trying to completely prevent fire, which is really not the right problem. But in any case, we're trying now to see whether we can build on that work or come up with some method to solve this MDP over a hundred-year horizon.

Okay. So, in summary, I've talked about this pipeline for the ways computation could help in addressing problems in ecology and ecosystem management. I've talked about automated data cleaning, about fitting these flexible models within a latent variable modeling framework, and then very briefly about policy optimization. And as I mentioned, this is part of our larger effort in what we call computational sustainability. And there are many other opportunities to contribute. You know, I haven't talked about energy. I haven't talked about sustainable development or smart cities or any of those things. But there are lots of computational problems there as well.

I'd like to point out that the Computing Community Consortium, the CCC, is funding some travel grants and prizes for papers in this area at several AI conferences. I know about ICML and AAAI, but I think there are some other conferences where they're doing this this year. So, there's a special track there that you could submit to. And on my joint grant with Cornell, we have created something called the Institute for Computational Sustainability. And we have a website with all kinds of information about what's going on, not just in our own research, but throughout the computer science community.

And I'll just thank the people that I mentioned at the start. On the fire project there are two other graduate students, Rachel Houtman and Sean McGregor, who have been working on it, and of course the National Science Foundation, which has been very generous here.

Well, thank you for your attention, and I'll answer questions. So, how does this work, local versus remote? What we usually do is give the remote sites a chance to go first, because they might lose the connection later on.

Okay. Remote sites? Go ahead.

(Question being asked)

Okay. Yeah. So, what they do is run several thousand simulated fires and try to calculate, for each cell in their landscape, the probability that it will burn. And they decompose that into the probability that it will burn because the fire ignited in that cell, and the probability that it will burn because fire propagated from one of its neighbors. So, they can basically build a sort of probabilistic flow model that gives the probability that this cell will burn conditioned on whether its neighbors burned. And then they model a fuel treatment very simply: if I treat this cell, then no fire will be able to propagate through that cell. And so, with a couple of other approximations, they can turn this into a flow problem, where we want to minimize the total flow subject to a budget constraint on how many cells we can afford to treat. So, they basically have one integer variable for each cell and an objective, and then they can solve it. I mean, in our case there would be four thousand integer variables, which would be a little bit scary. Their problem, I think, had more like nine hundred cells, so it's still quite substantial. But, you know, CPLEX is a wonderful thing, and it was able to find the solution.
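As a toy illustration of the flavor of that model (not Wei et al.'s actual mixed integer program), consider a one-dimensional chain of cells where fire either ignites locally or spreads from the previous cell, a treated cell blocks all propagation, and we greedily spend a treatment budget to minimize total expected burn:

```python
def burn_probs(p_ignite, p_spread, treated):
    """Expected burn probability per cell on a left-to-right chain.
    A treated cell neither burns nor propagates fire."""
    probs = []
    prev = 0.0
    for i, pi in enumerate(p_ignite):
        if i in treated:
            prev = 0.0
            probs.append(0.0)
            continue
        # Burns if it ignites, or if the previous cell burned and fire spread.
        p = 1.0 - (1.0 - pi) * (1.0 - prev * p_spread)
        probs.append(p)
        prev = p
    return probs

def greedy_treatments(p_ignite, p_spread, budget):
    """Greedily pick the cells whose treatment most reduces total burn."""
    treated = set()
    for _ in range(budget):
        base = sum(burn_probs(p_ignite, p_spread, treated))
        gains = {
            c: base - sum(burn_probs(p_ignite, p_spread, treated | {c}))
            for c in range(len(p_ignite)) if c not in treated
        }
        treated.add(max(gains, key=gains.get))
    return treated

# One risky ignition source at cell 0; fire spreads readily to the right.
p_ignite = [0.9, 0.05, 0.05, 0.05, 0.05]
chosen = greedy_treatments(p_ignite, p_spread=0.8, budget=1)
```

On this toy chain, the greedy choice is to treat the high-ignition cell, cutting off the whole downstream cascade; the real formulation replaces this greedy loop with integer variables and a solver.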

(Question being asked)

Uh huh. Okay. Right. Well, this was one-shot as opposed to sequential decision making. So, here we just get one time step at which we're allowed to take actions, and from there on out nature gets all the moves in the game. So, that's the sense in which it's a single decision, a one-shot, totally up-front plan, in other words. And there are a lot of problems in ecology where we end up having to take that view, where we're just going to say we want to buy all of the following territory. So, we've looked at some of those. There's an endangered species called the Red-cockaded Woodpecker that, I believe, is here in North Carolina. And my post-doc Dan Sheldon did some very nice work where there are two pockets of this species, one I think at Camp Lejeune and the other in the Palmetto Palm Reserve or something like this. And the question was, could they buy a series of intermediate sites to encourage those two populations to mix and have some gene flow between them? So, it's a problem of basically trying to encourage flow instead of trying to prevent flow. And they were also able to formulate this and solve it for the one-shot case, in terms of building a network that would maximize flow subject to budget constraints.

But in the real problem, you can't buy all the property all at once. You don't have the money, and it isn't all available. So, you really need to be online and every year take some actions that you can afford, to keep moving toward that objective. So, turning that into a Markov decision problem, or what's often called adaptive management in the environmental literature, is still an open problem. We don't know how to do that.

(Question being asked)

That, yeah, that is a good question. And we do wonder whether there is some way we could come up with some set of spatial basis functions that would let us, for instance, represent... suppose we had an optimal policy for laying out treatments in a landscape, but we could only compute it for a particular fixed landscape. Could we somehow generalize from that to a more general policy? Maybe some kind of set of spatial basis functions would allow us to do that. And the same is true for looking at the structure of the landscape. There's certainly a lot of work in the atmospheric sciences and weather, where they basically use PCA to create a set of basis functions that they can then use to approximate a lot of things. So, it's something we'd like to explore more.

(Question being asked)

I'm sorry. Right. Well, particularly here we're intervening in the system, and so the trouble is that you have this search space where, if I take these actions, then these fires will burn; if I take those actions, something else will happen. And you end up having to do exponentially many simulations just to simulate one set of circumstances. And so obviously we have to rely on some kind of sampling, or some way of capturing the spatial scale beyond which we can ignore the spatial components. It's not clear, really, how to proceed.

(Question being asked)

Well, that is a very good question. Right now we've mostly been looking at just this one site, and we have the weather data and all the data about the site, which we need to be able to do the work. And it's a good question whether there are generalizable lessons that you could take away from this. I also have some projects in invasive species management, and we're asking the same question there. And often it's kind of disturbing. I mean, you get a solution like this big map here, wherever it was, that says, well, these are the optimal places to treat. But is there any pattern to that? Is there any way we could explain it as a set of rules that we could apply to a different situation? How could we generalize from this particular landscape? And we need to do that just to explain it to our domain experts. And obviously policy makers are not going to be happy just being told, well, it's optimal, our algorithm said so. Particularly because we won't be able to say that; we'll have to say it's approximately optimal, but we don't know how far off, or something like that. And so we're really going to need to be able to give them some qualitative understanding and let them play with it, and modify it, and explore, and understand how good it is. And that's a huge challenge: once you've done ten million simulations, what lesson can you take away from it? Okay.

So, I've got a question that maybe dovetails with that.

Uh huh.

So, to what extent do you feel like these techniques, and the recommendations or policies you're producing using these techniques, are getting traction with the people who are actually implementing policy decisions? Is it something where you feel like you're having impact now, or do you feel like maybe it'll be five years, ten years? What time scale are we talking about here?

Um, I would guess five to ten years. I mean, we're very fortunate with the forest situation that we have some of the Forest Service people on our team. And a lot of them are former students of Claire Montgomery, who is on the team. And so we have a nice working relationship with them. But it is a good question whether they would ever be able to execute our particular policies. I think one of the main things we're trying to do is give them backup ammunition for supporting the actions that they are taking. Right now, the idea that they might want to treat the landscape in a particular way, or, in a related problem, that they might want to let a fire burn instead of suppressing it, is an extremely controversial, politically difficult decision. If we could provide some analysis showing that yes, under a wide variety of scenarios it would be better to let this fire burn, or it's better to treat this than those other things, that might help them persuade their stakeholders to go along with it. Of course, another thing that would help them persuade their stakeholders is if we could say, well, for these small communities that have timber mills, we can also guarantee you a certain economic benefit from doing this. And so there's a whole set of economic objectives we would perhaps like to have. We would maybe also like to have a whole bunch of endangered species habitat objectives. So, the real problem gets messier and messier. But we won't be able to attack any of those unless we can really come up with a methodology that works for these problems.

What you just laid out is a hard scenario for any algorithm, you know, any procedure, to optimize. But as it is, it has to be optimized by humans. I mean, in other words, there are people actually making decisions about whether or not to let a fire burn, and they have to process all of it. So...

Right.

I mean.

Well, mostly they are not letting fires burn, because it's just too risky, and plus the firefighting money doesn't come out of their budget; it's somebody else's budget. So, there's not really an incentive for them. For the fuel treatment, though, you're right. Right now they are making some guesses about where to treat, trying to balance all of these issues, and I would say they're not very happy with that. They would like some more rational basis for making those decisions.

Yeah. I guess my point was you may not have to get optimal. You may just have to do better

than humans guessing.

Right. Well, but we have to convince them that we are doing better yeah. And that comes

into a lot of this broader contextual thing as well.

Yes.

You sort of apply the basic approach. Could you just take the particular plans or policies that they are using, or thinking of using, as a prior, and then go from there and simplify your model, because you're working from a targeted assumption...

Uh huh.

...base.

Oh, that's an interesting idea, yeah. That would be to see if we could, in some sense, model what they're doing and then ask locally how we could improve it, maybe without walking too far away from it so it doesn't look so strange or threatening. We hadn't thought about that, but that's an interesting idea. Okay.

Thanks.

Well thank you very much. My pleasure.

some time. He…which I brought a prop. He actually was an author on my first machine

learning textbook. And I got to meet him when I was a graduate student at Berkley. And I

guess this was not long after Tom started his first faculty position after getting his

Ph.D. at Stanford. He came down to visit Berkley and I enjoyed getting to meet him then. And

even then as you see with this textbook he was playing a very important role in shaping

the machine learning community. And he’s gone on since then to continue to play a role

in shaping the machine learning community. So he is a Triple AI fellow and ACM fellow.

He’s been program chair for for Triple AI, program chair for NIPS. He’s been very active

in the International Machine Learning Society, and really a mentor in the field to a lot

of young people. And Tom is one of the few people to this day who really sees the entire

field of machine learning. And as the fields have become increasingly specialized, it’s

rare to find people who can appreciate the whole field and take it all in. And that’s

one, the great, many great things that Tom is known for. And today he’s going to be

telling us about a very important application of machine learning, which is to computational

ecology and environmental management.

Thank you very much Ron. So, the work I’m going to be describing today is obviously

a very collaborative interdisciplinary, and the collaborators in particular that I want

to mention is my graduate student Ethan Derigensky, two post docs Rebecca Hutchinson and Dan Shelton,

and then colleagues Wanking Wong who’s in Computer Science, a machine learning person

Clair Montgomery who’s a forest ecologist. And then several folks at the Cornell Lab

of Ornithology.

So, if we look at the earth’s ecosystems or the biosphere, it’s a very complex system.

And I think we can agree that in many ways we have not managed it in a sustainable way.

And so I thought I would start the talk by asking about why is that so, and is there

anything that computer science can do to help? And I think, I mean everybody had their own

views of why this is so. But I think maybe there are three reasons. First of all we don’t

understand the system very well. So, it’s very hard to manage a system when it’s behaving

very unpredictably. And there was a very thought-provoking article by a group of authors, first author Doak, in 2008, where they ask the question: are ecological surprises inevitable? That is, are the dynamics of ecosystems so complex that we will never really be able to predict their behavior reliably? And to support this thesis they go through, I don't know, fifteen or twenty different examples of situations where either

something completely surprising happened like the population of a species in the Gulf of

Alaska suddenly exploded and then five years later disappeared again, with no one knowing

why. Or examples where we attempted an intervention in an ecosystem and the outcome was very different from what we had intended. And one example that is very

current right now in the Pacific Northwest is the Northern Spotted Owl. So, during the

late 80’s and 1990’s we had what we call the owl wars in Oregon, because there’s

this species that was listed as an endangered species, the Northern Spotted Owl, whose preferred habitat is old growth forest. And most of the old growth forests

on private land had already been cut, and so now there was a lot of logging in the national

forests in the public lands and the conservation community wanted to shut down all that logging.

And obviously the Forest Products Industry which was a very important part of the Oregon

economy was dependent on it to a large extent. And you know, the President had to come to the state and bring everybody together. And they came up with what's called the Northwest Forest Plan, which by and large did stop logging in the national forests and on federal lands,

which had a devastating impact on the economy. And the hope was that this would help the

spotted owl recover. But spotted owl numbers have continued to decline since then. And

partly that’s because there was another species that has come in from the North. The

Canadian Invader, which is known as the Barred Owl. And it turns out it is more reproductively

successful and more aggressive. And it seems to be pushing out the spotted owl. So, that's another example of an ecological surprise, and it's one of the reasons managing ecosystems is so difficult.

I think another reason that we’ve had trouble managing ecosystems is that we’ve often

focused on only a small part of a very large system; because the system is so complicated, we've focused on only one piece of it. A single species like the Northern Spotted Owl might be an example of that. And we've often also ignored some

of the larger contexts. There’s a colleague of mine, Heidi Jo Albers who has studied things

like creating forest reserves in tropical forests. And when you design these reserves, you need to consider what the native people might be using that forest for; in her case that meant creating large buffer zones around the actual reserve. If you don't take that into account, you end up with those people making incursions into your bioreserve and degrading it in one way or another.

So, having to consider the spatial aspects, the interactions among multiple species, these

are things that are often ignored in a lot of ecology and ecosystem management. And finally,

I think particularly if you look in agriculture, we often deliberately manipulate a system

to simplify it in order to try to manage it. So, in crop agriculture for example we try

to remove all of the other species so we only have to worry about one species. But as a

consequence we have to provide a lot of the support for that species that would normally

be provided by other species, like fertilizers and pest management and so on. We have to

provide those as exogenous inputs. And many of those, like some of the nutrients we're providing, are now becoming expensive. And this is not a sustainable way of managing

those systems.

Well, and I'm sure you could go on and list many other things. What does Computer Science have to offer? I mean, the reason I'm here is because I think there are several things.

First of all, if we look at the question of our lack of knowledge of the function and structure of these systems, we now have a couple of ways that we can contribute. For one, we and our colleagues in nanotechnology and electrical engineering are producing all kinds of novel sensors, so we have wireless sensor networks.

We can create thousands of sensors, put them into these systems, and be able to monitor

them much better.

And of course the machine learning community and computational statistics community have been working on building modeling techniques that can scale up to much larger systems. It's still a challenge, of course, but much more is possible than, say, twenty years ago. When it comes to the question of focusing on subsystems, it's some of the same story.

same story. Obviously with our modeling tools we can now look at the larger system in which

the smaller system is embedded. But I think we also now have tools in say mechanism design

to look at the interactions of different parties that might be competing for a resource or

tools in modern optimization that let us find good solutions to very large and complex optimization

problems.

And again when we come to agriculture it’s a different combination of these three things.

But better sensing, better modeling, and better optimization all have a role to play in allowing

us to model these systems and manage them better.

So, this general field that we're calling computational sustainability: one of the big things in my group is that, jointly with Carla Gomes at Cornell, we have one of the NSF Expeditions in Computing projects. So, a ten million dollar grant to try to boldly go where no computer scientist has gone before, and in particular to look at computational methods that can contribute to sustainable management of (unintelligible) systems. And

so as a machine learning person I tend to think about the computational challenges that

are here in terms of a pipeline, from data to models to policies. And so what I’m going

to do in this talk is first talk about what I see as some of the work that’s going on

in each of these areas outside of my group briefly, and then drill down on three specific

things that we’re doing in my group that contribute to this area. And so I’m hoping

you’ll get a sense of the range of challenge problems that are here and some of the opportunities

from a Computer Science perspective.

So, the first thing I want to talk about is sensor placement. And Andreas Krause and his students have been doing some really exciting things there. So, this particular example is a case where (and I'm not supposed to point with this, I point with this) this is a city's water network. And they want to know: where should we place sensors in this network in order to detect pollutants, or maybe an attack, some chemical that's introduced into the system? And the main tool that they use is something called submodularity, which is the idea that you have a function of a set, in this case the set of places where you have put your sensors, and it exhibits a diminishing-returns property: once you've placed K sensors, the (K+1)st one is going to give you less benefit than the Kth one, and so on. If your objective function is submodular, then the greedy algorithm, and various sophisticated variants of it, gives a performance that is within a constant fraction of optimal. So, you can get very good results, and in fact they won some competitions for water quality monitoring. And they've looked at many other problems as well. So, that's sensor placement, and of course it has a lot of relationship to the huge literature on experimental design.
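To make that diminishing-returns idea concrete, here is a minimal sketch of greedy placement in Python, using set coverage (a classic monotone submodular objective) as the function; the toy network, the candidate names, and the coverage sets are invented for illustration and are not from Krause's actual systems.

```python
def greedy_placement(candidates, coverage, k):
    """Greedy maximization of a monotone submodular coverage function.

    candidates: possible sensor locations
    coverage: dict mapping each location to the set of network nodes it covers
    k: number of sensors to place
    """
    chosen, covered = [], set()
    remaining = set(candidates)
    for _ in range(k):
        # Pick the location with the largest marginal gain: because the
        # objective is submodular, each added sensor helps less than the last.
        best = max(remaining, key=lambda c: len(coverage[c] - covered))
        if not coverage[best] - covered:
            break  # no remaining candidate adds anything
        chosen.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return chosen, covered

# Toy water network: each candidate junction "covers" the pipes it can monitor.
coverage = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5, 6, 7},
    "D": {1, 7},
}
sensors, covered = greedy_placement(coverage.keys(), coverage, k=2)
```

For monotone submodular objectives like this one, the classic result is that greedy achieves at least a (1 - 1/e) fraction of the optimal value, which is the "constant fraction of optimal" guarantee mentioned above.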

The second thing that comes up is what I call data interpretation, for lack of a better word: often the raw data you get from your sensors is not at the level you want for your modeling effort. And this is particularly true for image data. So, for the last eight years I've been running a project that we call the Bug ID Project, where we take photos of moths, soil arthropods, and freshwater larvae. And we want to identify them to the genus level, and ideally to the species level. And this might be, for instance, input to building a model of the distribution of species in space, or to tracking invasive species, or even to water quality monitoring, where you want a histogram by species of how many individuals you had in a given stream. So, this particular picture here is from a collaborator of mine, Qing Yao, who's looking at rice pests. They put out these light traps at night, and moths wonderfully trap themselves in these traps. And then they spread them out on a glass table, photograph them from above and below, and then they want to count and identify them to the species level.

The third problem then I call data integration. I guess that’s an established term. The

problem is with a lot of ecological modeling challenges you have data coming from a wide

variety of sources, and a wide variety of scales in time and in space. And you need

to somehow pull all this together in order to then fit a model to the data. And so in

what we're doing, for instance, on bird migration modeling, we're dealing with everything from data that basically never changes, like a digital elevation model of the terrain, to things that are maybe changing on a fifteen-minute time scale, like the temperature or the weather, and we have to integrate all of these things.

And then we come to the part that you know is really my core competence, which is model

fitting and machine learning. And so there are of course a wide range of models in ecology

that people would like to fit. We’ve been looking really at just three kinds of models.

The first are what are known as species distribution models. And the question there is can we create

a map of where a species is found in the landscape. And so that’s very close to sort of the

core machine learning supervised learning problem. You're given a site with some set of features describing it, and then either the species is present there or absent there.

Another kind of model is something called a Meta-Population Model. And here we imagine

that we have a set of patches arranged in space. And a patch may be occupied by a species

or not. And over time the species may reproduce. It may spread to other patches; it may go

locally extinct and then get re-colonized. So, that's sort of focusing on space and looking at what comes in and out of a cell. And then the third kind are migration or dispersal models, where you follow the organism instead. So, you want to model the trajectory, say, that a bird follows, or the timing of movement.

And so there’s work in machine learning on all of these. One I want to show is what’s

called a STEM model that was developed by Daniel Fink at the Cornell Lab of Ornithology. And so at the Lab of Ornithology they have a big project called eBird, where

And so at the Lab of Ornithology they have a big project called Project E-bird, where

if you’re a birder you can go out observing in the morning say and then fill out a checklist

on their webpage and say here’s what I saw and I didn’t see anything else. You can

click a button for that and then upload it. There are a lot of avid birders out there.

So, we’re now getting like a million data points a month from people uploading. And

they exceeded three million points in May, sort of the peak of the breeding season. And

so there's a lot of data. Unfortunately it's completely uncontrolled. Right? So, you have lots of variation in expertise. You have no control over where people go. But

you can still do some interesting things. And what Daniel does is fit ensembles of decision

trees to try to predict whether this species, in this case the Indigo Bunting, will be present

or absent at a particular place and time. And so I’m going to show you this movie,

but it’s important to realize this is a series of snapshots. There’s no dynamical

model here. But this species winters down in Central America. And you'll see the orange colors; that's where the species is predicted to be present, first along the Mississippi Valley and then sort of spreading out through the entire eastern U.S. And then as we move into September (there's a clock ticking along the bottom), you see the species go back down the Mississippi Valley and disappear from the U.S. And so this is, I think, a

very nice model. And it was used as part of something called the State of the Birds Report to try to estimate what fraction of the habitat for each of something like two

hundred species of birds is publicly owned versus privately owned. And this report came

out late last year.

So, once we have built a model like this, then it's time to say: well, it's great that we have this model of birds, but how can we use it to make policy decisions to manage the ecosystem? And I don't have a good example for management with birds, but with fish there's John Leathwick, who does excellent work in New Zealand. So, I don't know if you can tell, but see these gray things over there; those are the islands of New Zealand. And these blue and red dots correspond to places where fishing trawlers did or did not harvest a particular species of fish, Mora moro; the red ones are positive. And the blue line around the outside is the exclusive economic

zone of New Zealand. And so using this data, he fit a species distribution model similar

to the one that I was just describing except that instead of estimating presence or absence

he’s estimating the catch in kilograms, so the biomass of the fish. And so these are

his estimates. The blue areas there are no fish at all and then you can see this pattern.

And then what he wanted to do was use that to prioritize regions for their conservation value, in terms of allowing this population to grow. And the left plot prioritizes them if we ignore the fishing industry and just ask what would be the places that would best encourage the growth of the species. But of course you really need to

consider these within an economic context. And so the right diagram re-prioritizes them, now taking into account the cost to the fishing industry. And you can see, I mean, the main lesson here I think is that there are still a lot of places that we can conserve and yet still have the benefit of fishing.

So, this is a kind of spatial optimization problem to solve. And I'll be talking about some more of those. Finally, we have the problem of policy execution. Of course, there's usually a chasm to go from a designed policy to one that we can convince people to actually adopt. And you know, at the simplest level we just have a policy where at each time step

we observe the state of the system. And then we choose the action that our policy tells

us to choose. And we go ahead and act. But in practice we’re often called upon to act

in a lot of ecosystem management problems, before we have a very good model of what’s

going on. And so these are really what we would call a partially observable Markov decision process, or worse, where we don't have a complete understanding of the system we're trying to model. I think a challenge here is that this means our early actions should be designed not only to achieve the ecosystem goal, but also to help us gather more information about the system so that we can improve our model. So, we have dual objectives. And these are very difficult to optimize.

And one of the big concerns I think in particularly in light of these ecological surprises is

can we design policies that are robust to our lack of knowledge? Both to the known unknowns, the things where we know that we're uncertain and can model our uncertainty, and also to the unknown unknowns, the factors that we forgot to include in the model. And I think

that’s one of the most interesting intellectual questions. I don’t have an answer for it,

but I think that there are some things we might be able to do.

Okay, so that’s the review of the sort of pipeline. And now I’d like to look at, talk

about three specific projects at Oregon State. So, and these will be in data interpretation

and model fitting and in policy optimization. So, the first project is the dissertation

project of my student Ethan Dereszynski. And he’s going to be graduating soon, so he’s

looking for a job. And what he works on is automated data cleaning in sensor networks.

So, Oregon State University operates something called the H.J. Andrews Long-Term Ecological Research site. So, NSF funds a collection of these study sites that are committed to collecting data over long periods of time and doing long-term experiments. So,

one of my colleagues Mark Harmon for instance has started an experiment that is going to

last two hundred years that’s called the Roth Experiment. It’s about trees and how

long it takes them to decay. But you know it takes forever to get tenure in this field.

Anyway, in this case we’re looking at these weather stations that are there. And I’m

going to talk mostly about four thermometers. So, this is a weather tower here, and these little L-shaped things coming off each have a thermometer on them. And they're allegedly at one and a half, two and a half, three and a half, and four and a half meters above the ground. And we get data from them that looks something like this. So, every fifteen minutes we get a temperature reading. And you can see on these curves the up and down motion; that's the daily cycle, the diurnal cycle. So, it's warming up in the daytime and cooling off at night. And it's kind of fun, because the thermometer that's nearest the ground, which is the black line in the plot, is the one that's coldest at night and hottest in the day. So, they flip back and forth like this. And the problem is that these sensors

are out in the world and bad things happen to them. And so someone has to do data quality

assurance on these, on the sensor data and clean it up before we try to do any analysis

on it. Now traditionally in the Andrews Forest, we've got three of these towers, and then there are many more, but three main ones that have been in operation since the 80's. And with twelve thermometers it's not really much of a burden for someone to go check this data. They just eyeball it and cluster it in various ways and look for outliers. But we now have Wi-Fi over the entire forest, and we want to put out huge networks of things. And if we have a thousand thermometers, this human data cleaning becomes infeasible, unless we can figure out how to make a CAPTCHA out of it and maybe get people to do it.

So, the kinds of things that go wrong, like for instance here this is an instance of what’s

called a broken sun shield. And so, the air temperature sensor is now measuring actually

the surface skin temperature of the thermometer with the sun directly beating down on it.

And so you can see in the daytime it spikes way high, as much as ten degrees higher than the true air temperature. At night it's a perfectly good air temperature sensor, but

in the daytime, particularly sunny days, not so good.

Can anyone guess what's going on in the bottom case here? Our 1.5-meter sensor is flatlining for a while.

Yes. So, the problem here is this is week three. So, that means it’s right about now.

But it was in 1996. We had a big snowstorm. And so this is now a snow temperature sensor,

instead of an air temperature sensor. In some sense the thermometer is still functioning

correctly, it’s just that the metadata is wrong. But there’s a lot more going on here.

So, you notice that the 4.5 meter thermometer is still bouncing up and down rather nicely.

I mean obviously over here it's quite cold these days, even at night; even in the daytime it's just barely getting above freezing. But then what's happening over here? It really warmed up; it's almost in the fifties at the top of the thermometer tower. And right around 3500 here it starts to rain. And so the snow temperature moves up to sort of the triple point of water for a while, and now the snow is melting, and the university was closed right around 4500 because we had such a huge flood that you couldn't get to campus. So, this is how you get a big flood in Oregon: you have what's called a rain-on-snow event. And this was one of them.

So, we'd like to detect these things, also because they're interesting, but we don't want to assume that this thermometer is measuring air temperature during this entire period.

So, how can we do this? Well, we’d like a data cleaning system to do really two functions.

The first is we’d like it to mark every data value that we think is anomalous. And

so in this case, this is a different set of data, but we've put what they call a rug, a little red tick, underneath each data point that our model predicts is incorrect, as something wrong.

And then the other thing you'd like it to do is to impute, or predict, the missing values: what the thermometer should have been reading if it had been working correctly. And we're going to do both of these things using a probabilistic model.

So, the basic probabilistic model we're going to use, and these are, you know, Bayesian networks or probabilistic graphical models, is the following. We're going to have one node here for each of our variables of interest, and the one that is gray is an observed node. So, this is the observed temperature at time t. And then there is a hidden node, which is the true temperature that we wish we could observe directly. Then up here is our sensor state variable. And I've made it a box to indicate that it's discrete, whereas these are continuous variables. And the idea is a very simple sensor model that says when the sensor state is one, that is, normal or working, then the observed temperature has a Gaussian distribution whose mean is the true temperature x but with some small variance around it. But when the thermometer is broken, so the state is zero, then the observed temperature has a mean of zero and a gigantic variance. So, basically what we're saying is it's completely unrelated to the true temperature. So, this is a very simple model,

and why do we adopt this kind of model? Well, you could try to think about this kind of data as if it were a diagnosis problem, where the sensor has various fault modes and failure modes and you want to predict which one is active. And so you could do a kind of Bayesian diagnosis

where you could say well given the sensor readings and my expectations it looks like

it’s a broken sunshield or it looks like it’s a flat line because of a communications

failure or something like this. But the trouble is we were not confident that we could enumerate in advance all the ways a sensor could fail. We wanted to have an open-ended set. So, the idea here is to treat it more as an anomaly detection problem, where we model the normal behavior of the sensor as accurately as we can. And then anything that is a serious departure from normal, the normal model will give it very low likelihood, and it'll instead get picked up by this sort of very generic failure model.

So, that’s the idea here.
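That two-component idea can be sketched in a few lines of Python: a tight Gaussian around the predicted true temperature for the "working" state, and a huge-variance Gaussian for the "broken" state, with the posterior computed by Bayes' rule. The priors and variances below are illustrative assumptions, not numbers from the talk.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def sensor_state_posterior(obs, true_temp, p_working=0.95,
                           working_var=0.25, broken_var=1e4):
    """Posterior P(state = working | observation) for the two-state model.

    Working sensor: obs ~ N(true_temp, working_var)  (tight around the truth)
    Broken sensor:  obs ~ N(0, broken_var)           (huge variance: anything goes)
    All parameter values here are illustrative assumptions.
    """
    joint_working = p_working * gaussian_pdf(obs, true_temp, working_var)
    joint_broken = (1.0 - p_working) * gaussian_pdf(obs, 0.0, broken_var)
    return joint_working / (joint_working + joint_broken)

# A reading close to the expected temperature looks "working"...
p_near = sensor_state_posterior(obs=10.3, true_temp=10.0)
# ...while a reading 15 degrees off gets almost all its mass from the
# generic failure component, so it is flagged as anomalous.
p_far = sensor_state_posterior(obs=25.0, true_temp=10.0)
```

This is exactly why an enumerated fault catalog isn't needed: any serious departure from the normal model gets low likelihood under the tight Gaussian and is absorbed by the broad one.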

So, we can do anomaly detection then by doing probabilistic inference. We ask the query

you know what is the most likely value of the state of this sensor at time t. And that’s

just the argmax over the possible states of the probability of the states given the observation.

And we can also do imputation by asking instead what’s the most likely temperature given

the observed temperature. So, basic probabilistic inference techniques work just fine. But of

course this is a very bad model of the sensor here. So, the next thing we want to do is

add some sort of Markov model so that we can look at the history of the sensor. Because we'd like to say, well, if the sensor was working fifteen minutes ago, it's probably

still working now. And if it was broken fifteen minutes ago, it’s very likely it’s still

broken now. So, we’d like to do that. And similarly of course the actual real temperature

doesn’t change that drastically either. So, we’d like to have some model of the

true temperature changes over time. So this gives us now a Markov version of this. And

now we can ask a query like: what's the most likely state of this sensor at this time, given the entire observation history? And that also can be reasonably calculated. But we can go

even further than this if we have multiple sensors as we do on these towers. We could

build a separate copy of the model for each of them and then couple those somehow. So

we could say that you know if we know the temperature of the sensor at the bottom of

the tower then we should be able to predict with reasonable accuracy the sensor next up

on the tower. And so this is the kind of thing we do. In general we learn a sparse joint Gaussian distribution among all of the temperature variables, so that we have a connected model.

Unfortunately, probabilistic inference in these models starts to become intractable. So, even in the single-sensor model, with the Markovian independence, you would think that would not be a problem. But it is, because of our observed variable. If all the variables were discrete, then we could solve it very easily; a simple message-passing algorithm would do it. But because our variables are continuous, so there are conditional Gaussians, when you marginalize away the history it gives you a mixture of Gaussians that grows exponentially with the number of time steps. And so it becomes impractical to do more than just a few time steps; that won't work. So, what we do is basically a forward filtering process, where at each time step

we ask what's the most likely state of my sensor, and then we say, okay, we'll believe it. We'll adopt that state and treat it as evidence, and then at time two we'll ask, okay, what's the most likely state at time two given that I already committed to the state at time one. And so on. And we also have to bound the variance on the true temperature, because if you have a long string of time steps where the sensor is bad, the true temperature becomes extremely uncertain, and you can't let that grow too far.
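Here is a minimal sketch of that commit-as-you-go filtering, building on the same two-state Gaussian sensor model as before. The persistence probability and variances are illustrative assumptions, and a real implementation would also track the true-temperature distribution rather than take the model's prediction as given.

```python
import math

def gauss(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def filter_sensor_states(observations, predictions, p_stay=0.9,
                         p_working0=0.95, working_var=0.25, broken_var=1e4):
    """Greedy forward filtering with commitment.

    At each step: form a prior over {working, broken} from the committed
    previous state (sensors tend to stay in the same state), score the
    observation under each hypothesis, commit to the MAP state, and carry
    that commitment forward as evidence. Parameters are assumptions.
    """
    states = []
    prev_working = None
    for obs, pred in zip(observations, predictions):
        if prev_working is None:
            prior_w = p_working0  # no history yet
        else:
            prior_w = p_stay if prev_working else 1.0 - p_stay
        joint_w = prior_w * gauss(obs, pred, working_var)       # working model
        joint_b = (1.0 - prior_w) * gauss(obs, 0.0, broken_var)  # failure model
        prev_working = joint_w >= joint_b  # commit to the MAP state
        states.append(prev_working)
    return states

# Readings near the model's prediction are committed as "working"; a
# sustained 15-degree spike flips the commitment to "broken" until the
# readings return to normal.
states = filter_sensor_states([10.1, 10.2, 25.0, 24.8, 10.0],
                              [10.0, 10.0, 10.0, 10.0, 10.0])
```

Note how the commitment matters: once the sensor is committed broken, the prior tilts toward broken at the next step, so it takes a clearly normal reading to flip it back.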

Probabilistic inference is also infeasible in the multiple-sensor model, even if you follow this step-by-step commitment strategy. And so the solution we're using right now, which seems to work best, is something we're calling Search MAP, where at each time step you start by assuming that all of the sensors are working, and you score how well that accounts for the observations. And then you ask, can I improve that score by breaking one of the sensors? And you do this in a greedy algorithm, basically hill climbing, to try to find a MAP solution. You don't always find the true maximum, because there are local optima. But even the simple greedy algorithm takes polynomial time that's quite substantial in the number of sensors.
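The hill-climbing idea can be sketched as follows: score a fault assignment by summing per-sensor log-probabilities (tight Gaussian around the model's prediction if working, broad zero-mean Gaussian plus a prior penalty if broken), then greedily flip sensors to "broken" while the score improves. This is a hedged sketch of the search strategy, not the exact scoring function from the talk, and in the real system the predictions come from the coupled multi-sensor Gaussian model rather than being given.

```python
import math

def log_n(x, mean, var):
    """Log-density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def search_map(obs, pred, working_var=0.25, broken_var=1e4,
               log_prior_broken=math.log(0.05),
               log_prior_working=math.log(0.95)):
    """Greedy 'break one sensor at a time' search for a MAP fault assignment.

    obs, pred: per-sensor readings and model-predicted true temperatures.
    Starts with every sensor working and marks sensors broken while that
    improves the joint score. May stop at a local optimum.
    """
    n = len(obs)

    def score(broken):
        s = 0.0
        for i in range(n):
            if i in broken:
                s += log_prior_broken + log_n(obs[i], 0.0, broken_var)
            else:
                s += log_prior_working + log_n(obs[i], pred[i], working_var)
        return s

    broken = set()
    best = score(broken)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            if i in broken:
                continue
            cand = score(broken | {i})
            if cand > best:
                best, broken, improved = cand, broken | {i}, True
                break  # greedy: take the improving flip, then rescan
    return broken

# Four sensors, one reading far from its prediction: the search breaks
# exactly that sensor and stops.
faults = search_map(obs=[10.1, 9.8, 31.5, 10.2],
                    pred=[10.0, 10.0, 10.0, 10.0])
```

Each pass of the loop scores up to n candidate flips, and there can be up to n passes, which is the "polynomial time that's quite substantial in the number of sensors" mentioned above.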

Yeah?

(Unintelligible) working even if in the previous time step's commitment you decided one of them was broken?

That's what we're doing right now. But we could start with our MAP guess from the previous time step too. And you can also consider a variation where, having broken one sensor, you might reconsider your previous decisions, in which case you get what's sometimes called a floating greedy algorithm, which takes even longer but gives you better solutions. And we've tried a whole bunch of other things, you know, various kinds of expectation propagation and the whole bag of tricks in the machine learning and probabilistic modeling area. But actually, one thing we haven't tried yet is particle filters. He's working on that right now. (Unintelligible). Rob (unintelligible).

Well, here are single-sensor results. So, on the broken sunshield, the bottom plot is the data again, and the top plot is the predicted temperature of just the one thermometer, the one that's closest to the ground. And then along the curve we color-code it; our domain experts wanted us to not just have broken or working, but to have four levels of performance: very good, good, bad, and very bad. So, very bad would be black, and there are just a couple of spots at the peaks of these days where there are some black spots. But otherwise it's mostly marked things as red, for bad. And at night, of course, it's still a very good sensor. So, we're able to do this using just a single-sensor model. And there's a lot more in the single-sensor model: we build a baseline expectation based on previous years, so that we know what, say, week six looks like in general.

And then for the multi-sensor case, Ethan did an internship at EPFL in Switzerland, where they put out these short-term deployments of sensor networks, and he learned a conditional Gaussian Bayesian network over the true temperatures and then fit that combined model. And so these are the results. And you can see it's doing quite well in some cases. It's picking out a lot of these cases where we have extremely bad, spiky sensors. But on these long flat lines it's doing okay, except sometimes, the dashed line here is the imputed value, when the predicted value happens to coincide with the flat line, it says, oh, the sensor's working again. So, this is a case where we probably really should have a flat-line model, because these flat lines happen when the data link is lost.

Okay. And there are many other challenges. I mean, we're working at a single time scale, but of course it really should be multiple scales. And we're also working on integrating more heterogeneous sensors than just temperature.

Okay. Well, so that's an example of this automated data cleaning work. The next problem is model fitting with an explicit detection model. And this is work by a post-doc of mine, Rebecca Hutchinson, who's wrapping up her post-doc later this spring.

And I already talked about species distribution modeling. Often, particularly with birds and wildlife in general, when you go out and do a wildlife survey the species could be there, but you just fail to detect it. And this is a well-known problem in ecology. So, imagine that there's some landscape and we've chosen some set of these black squares that we're going to go survey, but when we go out there it turns out some of the birds are in the vegetation and we don't see them. So, although every one of those squares was occupied by our species, we only see it twice. What can we do about that? Well,

one solution is to make repeated visits that are close enough together in time that you think the birds have not moved around, like when they're sitting on their nests or something, but far enough apart in time that you think you're getting independent measurements of their hiding behavior. So, if we go back another day, maybe now we see the bird in the first cell, but the bird in the second cell is hiding. The third one we still think is unoccupied, because that bird was hiding the whole time, and so on. So, this is one strategy that you can use. And if you look at the kind of data you

get, you get what are called detection histories. So, suppose we have four different sites.

Three of them are in forests, and one is in grassland. And suppose that there is this

true occupancy, which we say is a latent or hidden variable, right. And the first three

sites are occupied and the fourth one is unoccupied. But we don’t know that. That’s hidden

from us. So, on the first day we go out and it turns out it’s a rainy day and we’re

going out at lunch time, and we don't see any birds. So, we have all zeros here.

Now another day, we go out early in the morning. It’s a very good time to go birding and

it’s a clear day. And we detect the birds in the first two sites, but we don’t detect

this guy here in site three, and of course we don’t detect anything at site four. So,

we’re going to assume no false detections here, no hallucinations. Although, that’s

not always a safe assumption. And then the third day, it’s a clear day, but we’re

a little late getting out. So, we only detect the bird in the first

site. So, a thing like 0, 1, 1 or 0, 1, 0 is called a detection history. And from the detection histories, if you assume these are independent trials of your detecting ability, you can get a naïve estimate of your detection probability.

So, in this case we know from our data that sites A and B are occupied by the species.

And we know we had six opportunities to detect the birds, three at each site, and we succeeded three times. So, our naïve estimate of our detection probability would be point five. But in fact we really had nine chances to observe this species, in which we only saw it three times. So, our true detection probability, or at least the maximum-likelihood estimate thereof, would be about point three, or one third.
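
The two estimates in this example can be reproduced in a few lines. This is just the arithmetic from the talk's example, with the detection histories written out as data:

```python
# Naive vs. occupancy-aware detection-probability estimates. The site names
# and histories mirror the example in the talk: sites A-C are occupied, D is
# not, but we only observe the histories.
histories = {
    "A": [0, 1, 1],   # occupied, detected on visits 2 and 3
    "B": [0, 1, 0],   # occupied, detected on visit 2 only
    "C": [0, 0, 0],   # occupied, but never detected
    "D": [0, 0, 0],   # truly unoccupied
}

# Naive estimate: only sites with at least one detection are known to be
# occupied, so we count successes over visits at those sites only.
known_occupied = [h for h in histories.values() if any(h)]
naive = sum(map(sum, known_occupied)) / sum(len(h) for h in known_occupied)
print(naive)  # 3 detections in 6 visits -> 0.5

# If an oracle told us that A, B, and C are all occupied, the maximum-
# likelihood estimate would use all nine visits to occupied sites.
occupied = [histories[s] for s in ("A", "B", "C")]
true_ml = sum(map(sum, occupied)) / sum(len(h) for h in occupied)
print(true_ml)  # 3 detections in 9 visits -> 1/3
```

The gap between the two numbers is exactly the bias the occupancy-detection model below is designed to remove.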

So, the big challenge is: how can we tell the difference between an all-zeros history that is due to our failure to detect versus an all-zeros history that is due to the fact that the site is unoccupied? And the answer of course is to build a probabilistic

model. And so this is a plate-style model. And for those of you who aren't familiar with the notation, think of these dashed boxes as being for-loops. So, we have a loop where we iterate over the sites. So, i indexes the site. And x_i is some set of features that describes the site, like it's a forest and it's at three hundred meters of elevation. And at each site, based on its features or its properties, there's going to be some occupancy probability o_i. And we're going to assume that birds toss a coin with probability of heads o_i to decide whether to occupy a site. And z_i is the true occupancy status of that site, either a zero or a one. Now the

variable t is going to index over our visits to that site when we go observing. So, w_it is some description, say that it was 6 a.m. and it was sunny, that might influence or account for our detection probability. And then y_it is the actual report, the data that we get. So, we actually observe x, w, and y, when we really want z. So, we'd like to extract out of this z_i, which gives the species distribution model: the probability of the site being occupied given the properties of that site. And we'll call that function f. So, f(x_i) is going to be the occupancy probability, and we'd love to plot that on a map. But then we have this nuisance model, which is our observation model, and we'll let d_it be the value of a function g that is our detection probability. And so we can say the probability of reporting a 1, that we saw the bird, is the product of z_i, which will be 1 if the bird is there, and d_it, which is the probability with which we detect it. So, that's the model. And this was developed by MacKenzie and colleagues from the USGS, and it is a very nice and well-established model.
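
As a sketch, the generative story of the OD model can be written in a few lines. The particular functions f and g below are invented toy stand-ins for illustration, not the fitted models from the talk:

```python
import random

random.seed(1)

def f(x):
    # Occupancy probability o_i from site features x_i
    # (toy rule: forest sites are more likely to be occupied).
    return 0.8 if x["habitat"] == "forest" else 0.2

def g(w):
    # Detection probability d_it from visit conditions w_it
    # (toy rule: morning visits detect more often).
    return 0.7 if w["time"] == "morning" else 0.3

def simulate_site(x, visits):
    # z_i ~ Bernoulli(f(x_i)): the true, latent occupancy of the site.
    z = 1 if random.random() < f(x) else 0
    # y_it ~ Bernoulli(z_i * g(w_it)): we can only detect a bird that is there.
    return z, [1 if z and random.random() < g(w) else 0 for w in visits]

x = {"habitat": "forest"}
visits = [{"time": "noon"}, {"time": "morning"}, {"time": "morning"}]
z, history = simulate_site(x, visits)
# With this seed the site is occupied but detected only on the third visit,
# producing exactly the kind of detection history discussed above.
print(z, history)
```

Fitting the model means recovering f and g from many such histories without ever seeing z.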

But I’m a machine learning person. And you know in machine learning there is sort of

two parallel communities. There’s the community that loves probabilistic models and there’s

the community that loves non-parametric kind of decision models like support vector machines

and decision trees. And these two communities, well they’re people like me that have one

foot in both camps. But they really have very different outlooks.

Why do we like probabilistic graphical models? Well, it’s a terrific language for expressing

our models. And we have wonderful machinery using probabilistic inference for reasoning

about them. So, we know what the semantics of the models are, or at least what they're intended to be. And we can also write down models that have hidden variables, latent

variables that describe some hidden process that we’re trying to make inferences about.

So, probabilistic graphical models are kind of like the declarative representation of

machine learning. But there are some disadvantages, particularly when you’re exploring in a

new domain and you don't understand the system well, because you as the designer have to choose the parametric form of each of the probability distributions in the model, and you need to decide whether you think there are interactions among the variables and include those interactions in the model. The data typically have to be pretreated, scaled and so on; if you assume linearity in your model, you may need to transform your data so that it has a linear relationship. And one of the most important things we've learned in machine learning is the importance of adapting the complexity of your model to the complexity of the data. And it's difficult to adapt the complexity of a parametric model. I mean, there are some things you can do with regularization, but it's

not as flexible as using the sort of flexible machine learning models. So, you know back

at that very first machine learning workshop from which that book came out Ross Quinlan

gave a talk about a classification tree method that he was developing. And it was about a couple of years later that Leo Breiman and company published the book on CART.

So, classification and regression trees are a very powerful kind of exploratory non-parametric

method. And one of the beauties is that you can just use them off the shelf. Right? You

don’t have to design your model. You don’t have to pre-process or transform your data.

They automatically discover interactions if they're there, and sometimes even if they're not there. And they can achieve higher accuracy if you use them in ensembles: boosting and bagging and random-forest-type techniques.

And then of course, since then the support vector machine revolution has swept through machine learning. These still require the same data preprocessing and transformation steps, but by using kernels you can introduce non-linearities in an extremely flexible way. And there are very powerful ways of tuning the model complexity to match the complexity of the problem. So, they also work remarkably well without a lot of careful design work.

So, a challenge is can we have our cake and eat it too? Can we write down probabilistic

graphical models with latent variables in them that describe processes we care about

and yet also have the benefits of these non-parametric methods? And this is a major open problem

in machine learning. And there are several efforts. There's been a lot of work recently in the SVM family. There are Bayesian non-parametrics that use mixture models. The approach we're exploring is boosted regression trees.

So, I don’t really have a lot of time to describe booster regression trees. But they

grew out of boosting work in machine learning. And then first Mason and then Friedman Jerry

Friedman and Sanford noticed that there, that these could really be viewed as part of a

generic algorithm schema where you’re going to fit a weighted sum of regression trees

to data. And so he develop this thing called boosted tree regression or tree boosting.
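
The "weighted sum of regression trees" schema just mentioned can be sketched in a few lines. This is a toy least-squares version with depth-1 stumps and made-up data, not the occupancy-model fitting algorithm from the talk:

```python
def fit_stump(xs, residuals):
    """Find the 1-D split that best fits the residuals in squared error."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def boost(xs, ys, rounds=50, lr=0.1):
    """F(x) = F0 + lr * sum_m tree_m(x), each tree fit to current residuals."""
    f0 = sum(ys) / len(ys)
    trees, preds = [], [f0] * len(xs)
    for _ in range(rounds):
        # Residuals are the negative gradient of squared loss at current preds.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + lr * tree(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + lr * sum(t(x) for t in trees)

# Nonlinear toy target: a step function, which a linear model cannot capture.
xs = [i / 10 for i in range(20)]
ys = [0.0 if x < 1.0 else 1.0 for x in xs]
model = boost(xs, ys)
print(round(model(0.5), 2), round(model(1.5), 2))  # -> 0.0 1.0
```

The ensemble recovers the step almost exactly, which is the flexibility argument being made here.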

So, the standard approach in these occupancy models is to represent these functions f and g as log-linear models, that is, logistic regressions. What we're going to do is replace those functions f and g with flexible non-parametric models, boosted regression trees. And this can be done using this algorithm schema called functional gradient descent, or you could actually also do a functional EM. And we had a paper at AAAI last summer that describes the method. So, I'll just give you a little flavor of the results.

Of course there is a methodological problem in studying latent variable models, and that is that you don't know the true values of those variables. They're hidden from you. So, how do you know whether you're doing well? And so I'm going to describe results for one synthetic bird species, where we simulate a species using real site data but simulated occupancy and detection. So, we made this model additive but non-linear. And this is a scatter plot

showing on the horizontal axis the true occupancy probabilities for this simulated species.

And on the vertical axis what different families of models predict. So, the left column is

models that are trained without latent variables treating it as a supervised learning problem.

And you can see that they systematically underestimate the true occupancy probabilities because they

assume the only positive examples they saw were the cases when you actually detected

the bird, which is obviously an underestimate of what’s really going on. In the right

hand column are ones that are using this latent variable model, the Occupancy-Detection Model, the OD model. And then the top row is where we're using logistic regression as our parameterization.

And you can see that on the top right, it’s more or less unbiased. So, the true probabilities

and the predicted ones more or less lie on that diagonal line which is where they should

be. But there’s a lot of scatter and that’s because the true model is non-linear and we’re

fitting a linear model. Whereas if we use the boosted regression trees on the bottom

we’re doing a lot better. We’re much closer to the line. I’d like to omit a couple of

the points that are far from the line. But otherwise we’re pretty happy with that fit.

And so, in general, what we find is that we can train these flexible boosted regression tree models within a graphical-models framework and get more accurate results. And so we've

been applying this to several bird species data.

So, it looks like I'm running tight on time here. So, let me just briefly describe the final problem, which is managing fire in eastern Oregon. Conveniently, this is the problem where we don't have any results yet. So, if I hadn't said anything, you wouldn't have noticed. But, so this is now a policy problem, not really a data problem.

So, you know, since the late 1910s and 1920s the U.S. Forest Service has had a policy of suppressing essentially all fires. Part of the political argument that was used to sell the creation of the Forest Service was that it would prevent these terrible catastrophic wildfires. Of course, it turns out you can't prevent them. You can only postpone them.

And that’s now coming to pass that our forests are filled with, we believe that the sort

of natural state of forests particularly in eastern Oregon, we should look something like

this where we have very large Ponderosa Pines, and then what’s called an open understory,

so just very small vegetation on the ground.

I don’t have a picture for it, but what we have right now is because fire has been

suppressed for a long time we have all kinds of vegetation on the forest floor. And we

have small trees of all different sizes, logical pines in particular that’s grown up among

these Ponderosa Pines. And so when you have an open ground like that and a fire happens

it burns through the ground and actually maintains that openness. But the Ponderosa Pines have

this big, thick fire resistant bark. And they’re actually happy with this fire coming through

and getting rid of some of their competitors. But what’s happened, since that hasn’t

happened now when a fire happens it is able to climb up the smaller vegetation, reach

the crown, and actually destroy the forest, kill all the trees. And you end up with the

really very intense catastrophic fires. And so one question is: is there anything we can do to manage this landscape? So, we have a study area in eastern Oregon that's divided up into about 4,000 cells. They're irregularly shaped; they're based on the homogeneity of the landscape there. And there are four things you can do to each of these cells each

year. You can do nothing. You can do what’s called mechanical fuel treatment. So, you

send people in and they cut down a lot of that small vegetation and cart it out. You can do clear-cutting, where you harvest the trees but leave behind a lot of debris, and while that gives you timber value it actually increases fire risk. Or you can do clear-cutting and fuel treatment, and then fire just can't burn at all in

that area at least for a few years.

So, the question is how should we position these treatments in the landscape if we want

to, say, minimize the risk of big catastrophic fires and maybe maximize the probability of these low-intensity ground fires. Well, we can think about this as kind of a game against

nature. In each time step we can observe the current state of the landscape; maybe this is like a fire-risk map. And then we choose an action, which is actually a vector of actions, one action in each cell. So these are the actions: maybe we choose to treat these particular cells. And then nature takes its turn and it lights fires and burns them. And then it's our turn again.

And so we can model this as a big Markov decision process. But unfortunately, it's a Markov decision process with an exponentially large state space. So, if each of these cells in my landscape has five tree ages and five fuel levels, then I have twenty-five to the four-thousandth power possible states of the landscape, which is not going to fit into memory very easily. And similarly, each time I take an action, I have an action vector with four thousand elements and four possibilities in each position. So, I have four to the four-thousandth possible actions to consider. Even with all the cleverness

of the reinforcement learning community and approximate dynamic programming we don’t

know how to solve these problems.
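
The sizes quoted here are easy to check, at least on a log scale, taking the talk's numbers (25 = 5 tree ages × 5 fuel levels per cell, 4,000 cells, 4 actions per cell):

```python
import math

cells = 4000
states_per_cell = 5 * 5        # tree ages x fuel levels
actions_per_cell = 4           # nothing, fuel treatment, clear-cut, or both

# Work in log10 because the numbers themselves overflow any float.
log10_states = cells * math.log10(states_per_cell)
log10_actions = cells * math.log10(actions_per_cell)
print(round(log10_states))   # landscape states ~ 10**5592
print(round(log10_actions))  # joint actions ~ 10**2408
```

For comparison, there are roughly 10^80 atoms in the observable universe, so tabular dynamic programming is hopeless by thousands of orders of magnitude.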

There’s been a little bit of work. There was a paper by Wei, et al. a couple of years

ago when they looked at just a one year planning problem. So, if I just had one year to make

treatments and then there’s going to be fires in a hundred years, where should I put

my treatments? And they were able to formulate and solve a mixed integer program for this

optimal one-shot solution. They were just trying to completely prevent fire, which is really not the right problem. But in any case, we're now trying to see whether we can build on that work or come up with some method by which we can solve this MDP over a hundred-year horizon.

Okay. So, in summary I’ve talked about this pipeline for the ways computation could help

in addressing problems in ecology and ecosystem management. I’ve talked about automated

data cleaning, about fitting these flexible models within a latent variable modeling framework.

And then, very briefly, about policy optimization. And as I mentioned, this is part of our larger effort in what we call computational sustainability. And there are many other opportunities to contribute. You know, I haven't talked about energy. I haven't talked about sustainable development or smart cities or any of these things. But there are lots of computational problems there as well.

I’d like to point out that the Computing Community Consortium I think the CCC is funding

some travel grants and prizes for papers in this area at several AI Conferences. I know

about the ICML and Triple AI, but I think there are some other conferences where they’re

doing this this year. So, there’s a special track for that that you could submit to. And

under my joint grant with Cornell, we have created something called the Institute for Computational Sustainability. And we have a website with all kinds of information about what's going

on, not just in our own research, but throughout the computer science community.

And I’ll just thank the people that I mentioned at the start of the project. On the fire project

there are two other graduate students Rachel Houtman and Sean McGregor who have been working

there and of course the National Science Foundation that has been very generous here.

Well, thank you for your attention, and I'll answer questions. So, how does this work, local versus remote? So, what we usually do is give the remote sites a chance to go first, because they might lose the connection later on.

Okay. Remote sites? Go ahead.

(Question being asked)

Okay. Yeah. So, what they do is they run several thousand fires, simulated fires. And try to

calculate for each cell in their landscape the probability that it will burn. And they

decompose that into the probability that it will burn because the fire ignited in that

cell. Or the probability that it will burn because fire propagated from one of its neighbors.

So, they can basically build a sort of probabilistic flow model that says the probability that

this cell will burn conditioned on whether its neighbors burned. And then they can model

a fuel treatment, which they model simply as if I treat this cell then no fire will

be able to propagate through that cell. Okay. And so, with a couple of other approximations, they can turn this into a flow problem, basically: the total flow is what we want to minimize, subject to some budget constraint about how many cells we can afford to treat. And so they basically then have one integer variable for each cell, and they have an objective, and then they can solve it. I mean, in our case there would be four thousand integer variables, which would be a little bit scary. Their problem I think had more like nine hundred cells, though. So, it's still quite substantial. But you know, CPLEX is a wonderful thing. And so it was able to find the solution to that.
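
As a rough illustration of this budgeted-treatment objective, here is a toy brute-force version on a five-cell chain. The ignition probabilities, the deterministic spread rule, and the budget are all invented for illustration; the real work formulates a mixed-integer program and solves it with CPLEX:

```python
from itertools import combinations

# Per-cell ignition probability on a chain of five cells; fire spreads
# deterministically along the chain through untreated cells, and treating
# a cell blocks all propagation through it.
ignition = [0.3, 0.05, 0.05, 0.05, 0.3]

def expected_burn(treated):
    """Expected number of burned cells for a given set of treated cells."""
    n = len(ignition)
    total = 0.0
    for cell in range(n):
        if cell in treated:
            continue  # treated cells do not burn
        # The cell burns if any ignition can reach it through untreated cells.
        p_no_burn = 1.0
        for src in range(n):
            lo, hi = min(src, cell), max(src, cell)
            if all(c not in treated for c in range(lo, hi + 1)):
                p_no_burn *= 1.0 - ignition[src]
        total += 1.0 - p_no_burn
    return total

budget = 1  # how many cells we can afford to treat
best = min(
    (frozenset(t) for r in range(budget + 1)
     for t in combinations(range(5), r)),
    key=expected_burn,
)
print(sorted(best), round(expected_burn(best), 3))  # -> [2] 1.34
```

With one treatment, the optimum is the middle cell: it cuts the chain in half, isolating the two high-ignition ends, which is exactly the kind of spatially non-obvious answer the integer program produces at scale.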

(Question being asked)

Uh huh. Okay. Right. Well, this was one-shot as opposed to sequential decision making. So here we just get one time step at which we're allowed to take actions, and from there on out nature gets all the moves in the game. So, that's the sense in which it's a single decision, a single one-shot plan, totally up-front planning in other words. And there are a lot of problems in ecology where we end up having to take that view, where we're just going to say we want to buy all the following territory. So,

we’ve looked at some, there’s an endangered species called the Red Cockaded Woodpecker

that I believe is here in North Carolina. And my post doc Dan Sheldon did some very

nice work where the question was there are two pockets of this species, one I think at

Camp Legune and the other in the Palmetto Palm Reserve or something like this. And the

question was could they buy a series of intermediate sites to encourage those two species to mix

and have some genetic flow between them. So, it’s a problem of basically trying to encourage

flow instead of trying to prevent flow. And they were able to also formulate this and

solve it for the one-shot case in terms of building a network that would maximize flow

subject to budget constraints.

But the real problem is, you can't buy all the property at once. You don't have the money, and it isn't all available. So, you really need to be online, and every year take some actions that you can afford, to keep moving toward that objective.

So, turning that into a Markov decision problem, or what's often called adaptive management in the environmental literature, that's still an open problem. We don't know how to do that.

(Question being asked)

That, yeah, that is a good question. And we do wonder whether there is some way we could come up with some set of spatial basis functions that would let us, for instance, represent…suppose we had an optimal policy for laying out treatments in a landscape, but we could only compute it for a particular fixed landscape. Could we somehow generalize from that to a more general policy? Maybe some kind of set of spatial basis functions would allow us to do that. And the same is true for looking at, yeah, the sort of structure of the landscape.

There’s certainly a lot of work done in atmospheric sciences and weather where they

basically use PCA to create a set of basis functions that they can use then to approximate

a lot of things. So, it’s something we’d like to explore more.

(Question being asked)

I’m sorry. Right. Well particularly here we’re intervening in the system, and so

yeah the trouble is that we, you have this research base where if I take these actions

then these fires will burn. If I take these actions something else will happen. And you

end up having to do exponentially many simulations just to simulate one set of circuitry. And

so obviously we have to rely on some kind of sampling or some kind of way of capturing

the spatial scale where we beyond which we can ignore the spatial components. It’s

not clear really how to proceed.

(Question being asked)

Well that is a very good question. Right now we’ve mostly been looking at just this one

site. And we have the weather data and all the data about the sites, which we need to

be able to do the work. And it’s a good question whether they are generalizable lessons

that you could take away from this. I also have some projects in invasive species management, and we're asking the same question there. And often it's kind of disturbing.

I mean, you get a solution like this big map here, wherever it was, that says, well, these are the optimal places to treat. But is there any pattern to that? Is there any way we could explain it as a sort of set of rules that we could apply to a different situation? How could we generalize from this particular landscape?

And we need to do that just to explain it to our domain experts. And obviously policy makers are not going to be happy just being told, well, it's optimal, our algorithm said so. Particularly because we won't be able to say that; we'll have to say it's approximately optimal, but we don't know how bad, or something like that. And so we're really going to need to be able to give them some qualitative understanding, and let them be able to play with it, and modify it, and explore, and understand, you know, how good it is. And that's a huge challenge: once you've done ten million simulations, what lesson can you take away from it? Okay.

So, I’ve got a question that maybe ducktails with that.

Uh huh.

So, to what extent do you feel like these techniques and the recommendations or policies

that you’re producing using these techniques are getting traction with the people who are

actually implementing policy decisions and you know is it something where you feel like

you’re having impact now, you feel like maybe it’ll be five years, ten years, how,

you know what time scale are we talking about here?

Um I would guess in five to ten years. I mean we’re very fortunate with the forest situation

that we have some of the Forest Service people on our team. And a lot of them are former

students of Claire Montgomery who was on the team. And so we have a nice working relationship

with them. But the question is, and that is a good question, whether they would ever be able to execute our particular policies. I think one of the main things we're trying

to do is give them backup ammunition for being able to support the actions that they are

taking. Right now the idea that they might want to treat the landscape in a particular

way or in a related problem they might want to let a fire burn instead of suppressing

it, that’s an extremely controversial politically difficult decision. If we could provide some

analysis that shows that yes, under a wide variety of scenarios that would be a, it would

be better to let this fire burn, or it’s better to treat this than those other things.

And that might help them persuade their stakeholders to go along with it. Of course another thing

that would help them persuade their stakeholders is if we could say well and for these small

communities that have timber mills we can also guarantee you a certain economic benefit

from doing this. And so there's perhaps a whole set of economic objectives that we would like to have. We'd maybe also like to have a whole bunch of endangered-species habitat objectives. So, the real problem, you know, gets messier and messier. But we won't

be able to attack any of those unless we really can come up with a methodology that works

for these problems.

What you just laid out is a hard scenario for any algorithm to work on, you know, for a procedure to optimize. But as it is, it has to be optimized by humans. I mean, in other words, there are people actually making decisions about whether or not to let a fire burn. And they have to process all of it. So…

Right.

I mean.

Well mostly they are not letting fires burn, because it’s just too risky and plus the

firefighting money doesn’t come out of their budget. It’s somebody else’s budget. So,

there’s not really an incentive for them. For the fuel treatment though you’re right.

Right now they are making some guesses about where to treat, trying to balance all of these issues, and I would say they're not very happy with that. They would like some more rational basis for making those decisions.

Yeah. I guess my point was you may not have to get optimal. You may just have to do better

than humans guessing.

Right. Well, but we have to convince them that we are doing better yeah. And that comes

into a lot of this broader contextual thing as well.

Yes.

You sort of apply the basic approach. Could you just take the particular plans or policies that they are using, or thinking of using, as a prior and then go from there, and simplify your model because you're working from a targeted assumption…

Uh. Huh.

base.

Oh that’s an interesting idea, yeah, would be to see if we could in some sense model

what they’re doing and then ask locally how could we improve it, without maybe without

walking too far away from it so it doesn’t look so strange or threatening. No we hadn’t

thought about that, but that’s an interesting idea. Okay.

Thanks.

Well thank you very much. My pleasure.