Uploaded by NCState on 17.02.2012

Transcript:

It’s a great pleasure today for me to introduce Tom Dietterich. I’ve known Tom for quite

some time. He… for which I brought a prop. He actually was an author on my first machine

learning textbook. And I got to meet him when I was a graduate student at Berkeley. And I

guess this was not long after Tom started his first faculty position after getting his

Ph.D. at Stanford. He came down to visit Berkeley and I enjoyed getting to meet him then. And

even then as you see with this textbook he was playing a very important role in shaping

the machine learning community. And he’s gone on since then to continue to play a role

in shaping the machine learning community. So he is an AAAI Fellow and an ACM Fellow.

He’s been program chair for AAAI, program chair for NIPS. He’s been very active

in the International Machine Learning Society, and really a mentor in the field to a lot

of young people. And Tom is one of the few people to this day who really sees the entire

field of machine learning. And as the fields have become increasingly specialized, it’s

rare to find people who can appreciate the whole field and take it all in. And that’s

one of the many great things that Tom is known for. And today he’s going to be

telling us about a very important application of machine learning, which is to computational

ecology and environmental management.

Thank you very much Ron. So, the work I’m going to be describing today is obviously

very collaborative and interdisciplinary. The collaborators in particular that I want

to mention are my graduate student Ethan Dereszynski, two post-docs Rebecca Hutchinson and Dan Sheldon,

and then colleagues Weng-Keen Wong, who’s in Computer Science, a machine learning person,

and Claire Montgomery, who’s a forest economist. And then several folks at the Cornell Lab

of Ornithology.

So, if we look at the earth’s ecosystems or the biosphere, it’s a very complex system.

And I think we can agree that in many ways we have not managed it in a sustainable way.

And so I thought I would start the talk by asking about why is that so, and is there

anything that computer science can do to help? And I think, I mean everybody had their own

views of why this is so. But I think maybe there are three reasons. First of all we don’t

understand the system very well. So, it’s very hard to manage a system when it’s behaving

very unpredictably. And there was a very thought-provoking article by a group of authors, first

author Doak, in 2008, where they ask the question: are ecological surprises

inevitable? Are the dynamics of ecosystems so complex that we will never really be able

to predict the behavior of these systems reliably? And to support this thesis they

go through, I don’t know, fifteen or twenty different examples of situations where either

something completely surprising happened like the population of a species in the Gulf of

Alaska suddenly exploded and then five years later disappeared again, with no one knowing

why. Or examples where we attempted an intervention in an ecosystem and the

outcome was very different from what we had intended. And one example that is very

current right now in the Pacific Northwest is the Northern Spotted Owl. So, during the

late 80’s and 1990’s we had what we call the owl wars in Oregon, because there’s

this species that was listed as an endangered species, the Northern Spotted Owl, and its

preferred habitat is old growth forest. And most of the old growth forests

on private land had already been cut, and so now there was a lot of logging in the national

forests in the public lands and the conservation community wanted to shut down all that logging.

And obviously the Forest Products Industry which was a very important part of the Oregon

economy was dependent on it to a large extent. And it took, you know, the President coming

to the state and bringing everybody together. And they came up with this plan called the

Northwest Forest Plan, which by and large did stop logging in forests on federal lands,

which had a devastating impact on the economy. And the hope was that this would help the

spotted owl recover. But spotted owl numbers have continued to decline since then. And

partly that’s because there was another species that has come in from the North. The

Canadian Invader, which is known as the Barred Owl. And it turns out it is more reproductively

successful and more aggressive. And it seems to be pushing out the spotted owl. So, that’s

another example of an ecological surprise, and it’s one of the reasons managing

the ecosystems is so difficult.

I think another reason that we’ve had trouble managing ecosystems is that we’ve often

focused on only a small part of a very large system, because the system is so complicated

that we can only deal with one piece of it. So, that could be a single species like the

Northern Spotted Owl might be an example of that. And we’ve often also ignored some

of the larger contexts. There’s a colleague of mine, Heidi Jo Albers who has studied things

like creating forest reserves in tropical forests. And often, when you

design these reserves, you need to consider what the native people might be using that

forest for. If you don’t take that into account (in her case, taking it into account

meant creating large buffer zones around the actual reserve), you end up with those people

making incursions into your bio-reserve and degrading it in one way or another.

So, having to consider the spatial aspects, the interactions among multiple species, these

are things that are often ignored in a lot of ecology and ecosystem management. And finally,

I think particularly if you look in agriculture, we often deliberately manipulate a system

to simplify it in order to try to manage it. So, in crop agriculture for example we try

to remove all of the other species so we only have to worry about one species. But as a

consequence we have to provide a lot of the support for that species that would normally

be provided by other species, like fertilizers and pest management and so on. We have to

provide those as exogenous inputs. And many of those inputs, like some of the nutrients

that we’re providing now are becoming expensive. And this is not a sustainable way of managing

those systems.

Well, I’m sure you could go on and list many other things. What does Computer Science

have to offer? I mean the reason I’m here is because I think there are several things.

First of all, if we look at the question of our lack of knowledge of the function

and structure of the systems, we now have a couple of ways that we can contribute. First

of all, you know, we and our colleagues in nanotechnology and electrical engineering

are producing all kinds of novel sensors, so we have wireless sensor networks.

We can create thousands of sensors, put them into these systems, and be able to monitor

them much better.

And of course the machine learning community and computational statistics community have

been working on building modeling techniques that can scale up to much larger systems.

It’s still a challenge of course, but much more is possible than, say, twenty

years ago. When it comes to this question about focusing on subsystems, it’s some of the

same story. Obviously with our modeling tools we can now look at the larger system in which

the smaller system is embedded. But I think we also now have tools in say mechanism design

to look at the interactions of different parties that might be competing for a resource or

tools in modern optimization that let us find good solutions to very large and complex optimization

problems.

And again when we come to agriculture it’s a different combination of these three things.

But better sensing, better modeling, and better optimization all have a role to play in allowing

us to model these systems and manage them better.

So, this general field that we’re calling computational sustainability is one of the

big things in my group: jointly with Carla Gomes at Cornell, we have one of the NSF

Expeditions in Computing projects. So, a ten million dollar grant to try to boldly

go where no computer scientist has gone before, and in particular to look at computational

methods that can contribute to sustainable management of (unintelligible) systems. And

so as a machine learning person I tend to think about the computational challenges that

are here in terms of a pipeline, from data to models to policies. And so what I’m going

to do in this talk is first talk about what I see as some of the work that’s going on

in each of these areas outside of my group briefly, and then drill down on three specific

things that we’re doing in my group that contribute to this area. And so I’m hoping

you’ll get a sense of the range of challenge problems that are here and some of the opportunities

from a Computer Science perspective.

So, the first thing I want to talk about is sensor placement. And Andreas Krause

and his students have been doing some really exciting things there. So, this particular

example is a case where (which I’m not supposed to point with this, I point with

this) this is a city’s water network. And they want to know where should we place

sensors in this network in order to detect pollutants or maybe an attack, but a chemical

that’s introduced into the system. And the main tool that they use is something called

submodularity, which is the idea that you have a function of a set, in this case the

set of places where you have put your sensors, and

it exhibits a diminishing returns property: once you’ve placed K sensors,

the (K+1)st one is going to give you less benefit than the Kth one, and so on. If

your objective function is submodular, then the

greedy algorithm, and various sophisticated variants of it, gives a performance that is

within a constant fraction of optimal. So, you can get very

good results, and in fact they won a competition for water quality monitoring. And

they’ve looked at many other problems as well. So, that’s sensor placement, and of course

this has a lot of relationship to the huge literature on experimental design.
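To make the greedy idea concrete, here is a minimal sketch in Python. The coverage function below is a toy stand-in I've made up for illustration, not the actual water-network objective; any monotone submodular utility could be plugged in.

```python
def greedy_placement(candidates, utility, k):
    """Choose k sensor sites greedily by marginal gain.

    For a monotone submodular utility, the greedy solution is
    guaranteed to be within a factor (1 - 1/e) of optimal.
    """
    chosen = set()
    for _ in range(k):
        base = utility(chosen)
        best_site, best_gain = None, float("-inf")
        for site in candidates - chosen:
            gain = utility(chosen | {site}) - base  # marginal benefit
            if gain > best_gain:
                best_site, best_gain = site, gain
        chosen.add(best_site)
    return chosen

# Toy stand-in utility: number of distinct regions covered (submodular,
# since adding a site to a bigger set can never cover more new regions).
coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d"}}
util = lambda sites: len(set().union(*(coverage[s] for s in sites)))
picked = greedy_placement(set(coverage), util, 2)
```

Each round re-scores only the marginal gain, which is where the diminishing-returns property does the work; in practice lazy evaluation makes this much faster.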

The second thing that comes up is what I call data interpretation for lack of a better word,

which is often the raw data you get from your sensors, is not at the level you want for

your modeling effort. And this is particularly true for image data. So, for the last eight

years I’ve been running a project that we call the Bug ID Project, where we take photos

of moths, soil arthropods, and freshwater insect larvae. And we want to identify them to the

genus level and ideally to the species level. And this might for instance be input to building

a model of the distribution of species in space, or to tracking invasive species, or even to water

quality monitoring, where you want a histogram by species of how many individuals you had

in a given stream. So, this particular picture here is from a collaborator of mine, Qing

Yao, who’s looking at rice pests, and they put out these light traps at night. And moths

wonderfully trap themselves in these traps. And then they spread them out on a glass table,

photograph them from above and below, and then they want to count and identify them to the

species level.

The third problem then I call data integration. I guess that’s an established term. The

problem is with a lot of ecological modeling challenges you have data coming from a wide

variety of sources, and a wide variety of scales in time and in space. And you need

to somehow pull all this together in order to then fit a model to the data. And so in

what we’re doing for instance on bird migration modeling, we’re dealing with data on everything

from stuff that basically never changes, like a digital elevation model of the terrain, to

things that are maybe changing on a fifteen minute time scale, like the temperature or

the weather and having to integrate all of these things.

And then we come to the part that you know is really my core competence, which is model

fitting and machine learning. And so there are of course a wide range of models in ecology

that people would like to fit. We’ve been looking really at just three kinds of models.

The first are what are known as species distribution models. And the question there is can we create

a map of where a species is found in the landscape. And so that’s very close to sort of the

core machine learning supervised learning problem. You’re given a site with

some set of features describing it, and then either the species is present there or absent

there.
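As a concrete, deliberately tiny illustration of that supervised-learning framing, here is a sketch that fits a logistic regression to synthetic site records. The features (elevation, canopy cover) and all the numbers are made up for illustration; real species distribution models use richer features and models.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # bias plus one weight per feature
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi  # gradient of the log loss
            w[0] -= lr * err
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * err * xj
    return w

def predict(w, x):
    """Probability that the species is present at a site with features x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Synthetic sites: [normalized elevation, canopy cover]; in this toy data
# the species favors low elevation and high canopy cover.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = [1, 1, 0, 0]
w = fit_logistic(X, y)
```

Given a new site's features, `predict` returns an occupancy probability, which is exactly the map-of-where-the-species-is-found output described above.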

Another kind of model is something called a Meta-Population Model. And here we imagine

that we have a set of patches arranged in space. And a patch may be occupied by a species

or not. And over time the species may reproduce. It may spread to other patches; it may go

locally extinct and then get re-colonized. So, that’s sort of focusing on space and

looking at what comes in and out of a cell. And then the other kind is migration

or dispersal models, where you follow the organism instead. So, you want to model

the trajectory say that a bird follows or the timing of movement.

And so there’s work in machine learning on all of these. One I want to show is what’s

called a STEM Model that was developed by Daniel Fink at the Cornell Lab of Ornithology.

And so at the Lab of Ornithology they have a big project called eBird, where

if you’re a birder you can go out observing in the morning say and then fill out a checklist

on their webpage and say here’s what I saw and I didn’t see anything else. You can

click a button for that and then upload it. There are a lot of avid birders out there.

So, we’re now getting like a million data points a month from people uploading. And

they exceeded three million points in May, sort of the peak of the breeding season. And

so there’s a lot of data. Unfortunately it’s completely uncontrolled. Right? So,

you have lots of variation in expertise. You have no control over where people go. But

you can still do some interesting things. And what Daniel does is fit ensembles of decision

trees to try to predict whether this species, in this case the Indigo Bunting, will be present

or absent at a particular place and time. And so I’m going to show you this movie,

but it’s important to realize this is a series of snapshots. There’s no dynamical

model here. But this species winters down in Central America. And you’ll see the orange

colors, where the species is predicted to be present, first along the Mississippi

Valley and then sort of spreading out through the entire eastern U.S. And then as we move

into September, with a clock ticking along the bottom, you see the species go

back down the Mississippi Valley and disappear from the U.S. And so this is I think a

very nice model. And it was used as part of something called the State of the Birds

Report to try to estimate what fraction of habitat for each of the something like two

hundred species of birds is publicly owned versus privately owned. And this report came

out late last year.

So, once we have built a model like this then it’s time to say well it’s great that

we have this model of birds, but how can we use it to make policy decisions

to manage the ecosystem. And I don’t have a good example for management with birds but

with fish there’s John Leathwick, who does excellent work in New Zealand. So, I don’t know if

you can tell, but these gray things over there are the islands of New Zealand.

And these blue and red dots correspond to places where fishing trawlers

did not or did harvest (the red ones are positive) a particular species

of fish, the Mora moro. And the blue line around the outside is the exclusive economic

zone of New Zealand. And so using this data, he fit a species distribution model similar

to the one that I was just describing except that instead of estimating presence or absence

he’s estimating the catch in kilograms, so the biomass of the fish. And so these are

his estimates. The blue areas there are no fish at all and then you can see this pattern.

And then what he wanted to do was use that to prioritize regions for their conservation

value, in terms of allowing this population to grow. And the left plot

is prioritizing them if we ignore the fishing industry and just say what would be the places

that would best encourage the growth of the species. But of course you really need to

consider these within an economic context. And so the right diagram re-prioritizes them

now taking into account the cost of the fishing industry. And you can see, I mean the main

lesson here I think is that there’s still a lot of places that we can conserve and yet

also still have the benefit of fishing.

So, this is a kind of spatial optimization problem to solve. And I’ll be talking about

some more of those. So, finally we have the problem of policy execution; there is usually

of course a chasm to go from a designed policy to one that we can convince people to actually

adopt. And you know at the simplest level we just have a policy where at each time step

we observe the state of the system. And then we choose the action that our policy tells

us to choose. And we go ahead and act. But in practice we’re often called upon to act

in a lot of ecosystem management problems, before we have a very good model of what’s

going on. And so they’re really what we would call a partially observable Markov decision

process, or worse, where we don’t have a complete understanding of the system we’re

trying to model. I think a challenge here is that, this means that our policy in our

early actions should be designed not only to achieve the ecosystem goal, but also to

help us gather more information about the system so that we can improve our model. So,

we have dual objectives. And these are very difficult to optimize.

And one of the big concerns I think in particularly in light of these ecological surprises is

can we design policies that are robust to our lack of knowledge? Both to the known unknowns,

the things where we know that we’re uncertain and can model our uncertainty, and

also to the unknown unknowns, the factors that we forgot to include in the model. And I think

that’s one of the most interesting intellectual questions. I don’t have an answer for it,

but I think that there are some things we might be able to do.

Okay, so that’s the review of the sort of pipeline. And now I’d like to look at, talk

about three specific projects at Oregon State. These will be in data interpretation

and model fitting and in policy optimization. So, the first project is the dissertation

project of my student Ethan Dereszynski. And he’s going to be graduating soon, so he’s

looking for a job. And what he works on is automated data cleaning in sensor networks.

So, Oregon State University operates something called the H.J. Andrews Long-Term Ecological

Research Site. So, NSF funds a collection of these study sites that are

committed to collecting data over long periods of time and doing long-term experiments. So,

one of my colleagues Mark Harmon for instance has started an experiment that is going to

last two hundred years that’s called the Roth Experiment. It’s about trees and how

long it takes them to decay. But you know it takes forever to get tenure in this field.

Anyway, in this case we’re looking at these weather stations that are there. And I’m

going to talk mostly about four thermometers. So, this is a weather tower here and these

little L-shaped things coming off each have a thermometer on them. And they’re allegedly

at one and a half, two and a half, three and a half, and four and a half meters above the

ground. And we get data from them that looks something like this. So, every fifteen minutes

we get a temperature reading. And you can see on these curves the up and down motion;

those are the daily cycle, the diurnal cycle. So, it’s warming up in the daytime

and cooling off at night. And it’s kind of fun, because the thermometer that’s nearest

the ground, which is the black line, is the one that’s coldest at night and hottest

in the day. So, they flip back and forth like this. And the problem is that these sensors

are out in the world and bad things happen to them. And so someone has to do data quality

assurance on these, on the sensor data and clean it up before we try to do any analysis

on it. Now traditionally in the Andrews Forest, we’ve got these towers;

there are many more, but three main ones have been in operation

since the 80’s. And with twelve thermometers it’s not really much of a burden for someone

to go check this data. They just eyeball it and cluster it in various ways and look for

outliers. But of course we’ve now got Wi-Fi over the entire forest

and we want to put out huge networks of things. And if we have a thousand thermometers, this

human data cleaning becomes infeasible, unless we can figure out how to make a CAPTCHA out

of it and maybe get people to do it.

So, the kinds of things that go wrong, like for instance here this is an instance of what’s

called a broken sun shield. And so, the air temperature sensor is now measuring actually

the surface skin temperature of the thermometer with the sun directly beating down on it.

And so you can see in the daytime it spikes way high, as many as ten degrees higher than

the true air temperature. At night it’s a perfectly good air temperature sensor, but

in the daytime, particularly sunny days, not so good.

Can anyone guess what’s going on in the bottom case here? Our 1.5 meter

sensor is flatlining for a while.

Yes. So, the problem here is this is week three. So, that means it’s right about now.

But it was in 1996. We had a big snowstorm. And so this is now a snow temperature sensor,

instead of an air temperature sensor. In some sense the thermometer is still functioning

correctly, it’s just that the metadata is wrong. But there’s a lot more going on here.

So, you notice that the 4.5 meter thermometer is still bouncing up and down rather nicely.

I mean obviously over here it’s quite cold these days, nights included; even in the daytime

it’s just barely getting above freezing. But then what’s happening over here? It

really warmed up. I mean it’s almost in the fifties at the top of the

thermometer tower. And right around 3500 here it’s starting to rain. And so the snow

temperature moves up to sort of the triple point of water for a while. And now the snow

is melting, and the university is closed right around 4500 because we had

such a huge flood that you couldn’t get to campus. So, this is how you get

a big flood in Oregon is to have what’s called a rain on snow event. And this was,

this was one of them.

So, we’d like to detect these things also, you know, because they’re interesting, but we

don’t want to assume that this thermometer is measuring air temperature during this entire

period.

So, how can we do this? Well, we’d like a data cleaning system to do really two functions.

The first is we’d like it to mark every data value that we think is anomalous. And

so in this case this is a different set of data, but we’ve put what they call a

rug, a little red tick, underneath each data point that our model

predicts is incorrect, where something is wrong.

And then the other thing you’d like it to do is to impute, or predict, the missing

values: what the thermometer should have been reading if it had been working correctly.

And we’re going to do both of these things using a probabilistic

model.

So, the basic probabilistic model we’re going to use (these are, you know, Bayesian

networks, or probabilistic graphical models) is the following. We’re going to have one

node here for each of our variables of interest, and the one that is gray is an observed

node. So, this is the observed temperature at time t. And then there is a hidden node,

which is our true temperature that we wish we could observe directly. Then up here is

our sensor state variable. And I’ve made it a box to indicate that it’s discrete,

whereas these are continuous variables. And the idea is a very simple sensor model that

says when the sensor state is one, that is, normal or working, then the observed temperature

has a Gaussian distribution whose mean is the true temperature x but with some

small variance around it. But when the thermometer is broken, and so the state is zero, then the

observed temperature has a mean of zero and a gigantic variance. So, basically we’re

saying it’s completely unrelated to the true temperature. So, this is a very simple model,

and why do we adopt this kind of model? Well you could try to think about this kind of

data as if it were a diagnosis problem, where the sensor has various fault modes and failure

modes and you want to predict what they are. And so you could do a kind of Bayesian diagnosis

where you could say well given the sensor readings and my expectations it looks like

it’s a broken sunshield or it looks like it’s a flat line because of a communications

failure or something like this. But the trouble is we were not confident that we could enumerate

in advance all the ways a sensor could fail. We wanted to have an open-ended set. So, the

idea here is to treat it more as an anomaly detection problem where we model the normal

behavior of the sensor as accurately as we can. And then anything that is a serious departure

from normal, the normal model will give it very low

likelihood, and it’ll instead get picked up by this sort of very generic failure model.

So, that’s the idea here.
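Here is a minimal numerical sketch of that two-state sensor model. The prior and the variances are illustrative assumptions, not the values used in the actual system:

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def most_likely_state(observed, true_temp, p_working=0.95,
                      var_working=1.0, var_broken=1000.0):
    """Return ('working' or 'broken', posterior probability of working).

    Working: observation ~ Gaussian(true_temp, small variance).
    Broken:  observation ~ Gaussian(0, huge variance), i.e. essentially
    unrelated to the true temperature.
    """
    l_work = p_working * gaussian_pdf(observed, true_temp, var_working)
    l_broke = (1 - p_working) * gaussian_pdf(observed, 0.0, var_broken)
    post = l_work / (l_work + l_broke)
    return ("working" if post >= 0.5 else "broken"), post
```

With the working-state variance small and the broken-state variance huge, any observation far from the expected true temperature gets almost all of its probability mass from the generic failure state, which is exactly the anomaly-detection behavior described above.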

So, we can do anomaly detection then by doing probabilistic inference. We ask the query

you know what is the most likely value of the state of this sensor at time t. And that’s

just the argmax over the possible states of the probability of the states given the observation.

And we can also do imputation by asking instead what’s the most likely temperature given

the observed temperature. So, basic probabilistic inference techniques work just fine. But of

course this is a very bad model of the sensor here. So, the next thing we want to do is

add some sort of Markov Model so that we can look at the history of the sensor. Because

we’d like to say, well, if the sensor was working fifteen minutes ago, it’s probably

still working now. And if it was broken fifteen minutes ago, it’s very likely it’s still

broken now. So, we’d like to do that. And similarly of course the actual real temperature

doesn’t change that drastically either. So, we’d like to have some model of the

true temperature changes over time. So this gives us now a Markov version of this. And

now we can ask a query like: what’s the most likely state of this sensor at this time given

the entire observation history. And that also can be reasonably calculated. But we can go

even further than this if we have multiple sensors as we do on these towers. We could

build a separate copy of the model for each of them and then couple those somehow. So

we could say that you know if we know the temperature of the sensor at the bottom of

the tower then we should be able to predict with reasonable accuracy the sensor next up

on the tower. And so this is the kind of thing we do. In general we learn a sparse joint

Gaussian distribution among all of the t variables, so that we have a connected model.

Unfortunately probabilistic inference in these models starts to become intractable. So, even

in the single sensor model, with the Markovian independence, you would think

that would not be a problem. But it is, because of our observed variables. If

all the variables were discrete then we could solve it very easily; a simple

message passing algorithm will do it. But because our variables are continuous,

conditional Gaussians, when you marginalize away the history it gives you a mixture of

Gaussians that grows exponentially with the number of time steps. And so it becomes

impractical to do, you know, more than just a few time steps; that won’t

work. So, what we do is basically a forward filtering process where at each time step

we ask what’s the most likely state of my sensor. And then we say okay we’ll believe

it. We’ll adopt that state and treat it as evidence, and then at time two we’ll ask,

okay, what’s the most likely state at time two given that I already committed to the

state at time one. And we do this. And we also have to bound the variance

on the true temperature, because if you have a long string of time steps where

the sensor is bad, the true temperature becomes extremely uncertain. And you can’t let that

grow too far.
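That forward-filtering-with-commitment strategy can be sketched as follows. The transition and noise parameters are illustrative, and the real model also tracks the true-temperature distribution over time, which this sketch sidesteps by taking predicted true temperatures as given:

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def filter_states(observations, predictions, p_stay=0.9,
                  var_working=1.0, var_broken=1000.0):
    """Greedy forward filtering: commit to the most likely sensor state
    at each step given the state committed at the previous step.

    predictions[t] is the model's expected true temperature at time t.
    """
    states, prev = [], "working"
    for obs, pred in zip(observations, predictions):
        scores = {}
        for s in ("working", "broken"):
            # Persistence: a sensor tends to stay in its previous state.
            trans = p_stay if s == prev else 1 - p_stay
            mean, var = (pred, var_working) if s == "working" else (0.0, var_broken)
            scores[s] = trans * gaussian_pdf(obs, mean, var)
        prev = max(scores, key=scores.get)  # commit and move on
        states.append(prev)
    return states
```

Committing at each step keeps the computation linear in the number of time steps, at the cost of never revisiting an early decision, which is exactly the trade-off described above.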

Probabilistic inference is also infeasible in the Multiple Sensor Model, even if you follow

this step by step commitment strategy. And so the solution we’re using right now which

seems to work best is something we’re calling Search MAP, which at each time step you start

by assuming that all of the sensors are working. And you score how well that accounts for the

observations. And then you ask can I improve that score by breaking one of the sensors.

And you do this in a greedy algorithm, basically hill climbing, to try to find a MAP solution.

You don’t always find the true maximum, because there are local optima.

But even the simple greedy algorithm takes polynomial time that’s quite substantial

in the number of sensors.
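The Search MAP idea can be sketched as plain greedy hill climbing over working/broken assignments. The score function here is a toy stand-in for the model's log-probability of the observations under an assignment:

```python
def search_map(sensors, score):
    """Start with all sensors working; repeatedly flip to 'broken' the
    single sensor whose flip most improves the score, until no flip helps."""
    assignment = {s: "working" for s in sensors}
    current = score(assignment)
    improved = True
    while improved:
        improved = False
        best_flip, best_score = None, current
        for s in sensors:
            if assignment[s] == "working":
                trial = dict(assignment, **{s: "broken"})
                sc = score(trial)
                if sc > best_score:
                    best_flip, best_score = s, sc
        if best_flip is not None:
            assignment[best_flip] = "broken"
            current = best_score
            improved = True
    return assignment

# Toy stand-in score: squared error against an expected reading of 10.0
# when "working", a fixed penalty when "broken".
obs = {"a": 10.0, "b": 10.5, "c": 42.0}
def toy_score(assign):
    return sum(-(obs[s] - 10.0) ** 2 if st == "working" else -4.0
               for s, st in assign.items())
flags = search_map(list(obs), toy_score)
```

Each pass re-scores every candidate flip, which is where the polynomial (but substantial) cost in the number of sensors comes from.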

Yeah?

(Unintelligible) working even if in the previous time step’s commitment you decided one of them was

broken.

That’s what we’re doing right now. But we could start with our MAP guess

from the previous time step too. And you can also consider a variation where, having broken

one sensor, you might reconsider your previous decision, in which case you do what’s

sometimes called a floating greedy algorithm, which takes

even longer but gives you better solutions. And we’ve tried a whole bunch of other things,

you know, various kinds of expectation propagation and the whole bag of tricks in the machine

learning probabilistic modeling area, but…actually one thing we haven’t tried yet is particle

filters. He’s working on that right now. (Unintelligible). Rob (unintelligible).

Well, here are single sensor results. So, on the broken sunshield case,

the bottom plot is the data and the top plot is the predicted

temperature of just the one thermometer that’s closest to the ground.

And then along the bottom we color code it: our domain experts wanted us

to not just have broken or working but to actually have four levels of performance

from very good, good, bad, and very bad. So, very bad would be black and there are just

a couple of spots at the peaks of these days when there are some black spots there. But

otherwise it’s mostly marked things as red for bad. And at night, of course, it’s still

a very good sensor. So, we’re able to do this using just a single sensor model.

And there’s a lot more in the single sensor model: we build a baseline expectation based

on previous years, so that we know what week six looks like in general.

And then for the multi-sensor case: Ethan did an internship at EPFL in Switzerland, where they put out these short-term deployments of sensor networks, and he learned a conditional Gaussian Bayesian network over the true temperatures and then fit that combined model. And so these are the results. You can see it's doing quite well in some cases. It's picking out a lot of these cases where we have extremely bad, spiky sensors. But on these long flat lines it's doing okay, except sometimes, the dashed line here being the imputed value, when the predicted value happens to coincide with the flat line it says, oh, the sensor's working again. So, this is a case where we probably really should have a flat-line model, because these flat lines happen when the data link is lost.

Okay. And there are many other challenges. I mean, we're working with a single time step, but of course it really should be multiple time scales. And we're also working on integrating more heterogeneous sensors than just temperature.

Okay. Well, so that's an example of this automated data cleaning work. The next problem is model fitting with an explicit detection model. And this is work by a post-doc of mine, Rebecca Hutchinson, who's wrapping up her post-doc later this spring.

And I already talked about species distribution modeling. Often, particularly with birds and wildlife in general, when you go out and do a wildlife survey the species could be there, but you just fail to detect it. And this is a well-known problem in ecology. So, imagine that there's some landscape and we've chosen some set of these black squares that we're going to go survey, but when we go out there it turns out some of the birds are in the vegetation and we don't see them. So, although every one of those squares was occupied by our species, we only see it twice. What can we do about that? Well, one solution is to make repeated visits that are close enough together in time that you think the birds have not moved around, like when they're sitting on their nests or something, but far enough apart in time that you think you're getting independent measurements of their hiding behavior. So, if we go back another day, maybe now we see the bird in the first cell, but the bird in the second cell is hiding. The third one we still think is unoccupied, because that bird was hiding the whole time, and so on. So, this is one strategy that you can use.

And if you look at the kind of data you get, you get what are called detection histories. So, suppose we have four different sites. Three of them are in forest, and one is in grassland. And suppose that there is this true occupancy, which we say is a latent or hidden variable. The first three sites are occupied and the fourth one is unoccupied. But we don't know that; that's hidden from us. So, on the first day we go out, and it turns out it's a rainy day and we're going out at lunch time, and we don't see any birds. So, we have all zeros here. Now another day we go out early in the morning, a very good time to go birding, and it's a clear day. And we detect the birds in the first two sites, but we don't detect this guy here in site three, and of course we don't detect anything at site four. So, we're going to assume no false detections here, no hallucinations, although that's not always a safe assumption. And then the third day it's a clear day, but we're a little late getting out, so we only detect the bird in the first site. So, a thing like 0, 1, 1 or 0, 1, 0 is called a detection history. And from the detection histories, if you assume these are independent trials of your detecting ability, you can get a naive estimate of your detection probability.

So, in this case we know from our data that sites A and B are occupied by the species. And we know we had six opportunities to detect the birds, three at each site, and we succeeded three times. So, our naive estimate of our detection probability would be point five. But in fact we really had nine chances to observe this species, and we only saw it three times. So, our true detection probability, at least the maximum likelihood estimate thereof, would be one third, about point three three.
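The arithmetic above can be checked in a few lines of Python, using the detection histories from the example (sites A through D, three visits each):

```python
# Detection histories from the example: 1 = species detected on that visit.
histories = {"A": [0, 1, 1], "B": [0, 1, 0], "C": [0, 0, 0], "D": [0, 0, 0]}

detections = sum(sum(h) for h in histories.values())  # 3 total detections

# Naive estimate: only sites with at least one detection (A, B) are known
# occupied, giving 2 sites x 3 visits = 6 trials.
known_occupied_trials = sum(len(h) for h in histories.values() if any(h))
naive_p = detections / known_occupied_trials        # 3 / 6 = 0.5

# With the true (hidden) occupancy -- A, B, and C occupied -- there were
# really 3 sites x 3 visits = 9 trials.
truly_occupied = {"A", "B", "C"}
true_trials = sum(len(histories[s]) for s in truly_occupied)
true_p = detections / true_trials                   # 3 / 9 = 1/3
```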

So, the big challenge is: how can we tell the difference between an all-zeros history that is due to our failure to detect, versus an all-zeros history that's due to the fact that the site is unoccupied? And the answer, of course, is to build a probabilistic model. And so this is a plate-style model. And for those of you who aren't familiar with the notation, think of these dashed boxes as being for-loops. So, we have a loop where we iterate over the sites; i indexes the site. And xi is some set of features that describes the site, like it's a forest and it's at three hundred meters of elevation. And at each site, based on its features or properties, there's going to be some occupancy probability (unintelligible). And we're going to assume that birds toss a coin with that probability of heads to decide whether to occupy the site. And zi is the true occupancy status of that site, either a zero or a one. Now, the variable t is going to index over our visits to that site when we go observing. So, wit is some description, say that it was 6 a.m. and it was sunny, that might influence or account for our detection probability. And then yit is the actual report, the data that we get. So, we actually observe x, w, and y, when we really want z. So, we'd like to extract out of this the species distribution model: the probability of the site being occupied given the properties of that site. I'm going to name that function f, so f of xi is going to be the occupancy probability. And we'd love to plot that on a map. But then we have this nuisance model, which is our observation model, and we'll let dit be the value of this function g that is our detection probability. And so we can say the probability of reporting a 1, that we saw the bird, is the product of zi, which will be 1 if the bird is there, and dit, which is the probability that we detect it. So, that's the model. And this was developed by a group, MacKenzie et al., from the USGS, and it is a very nice and well-established model.
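A minimal generative sketch of this occupancy-detection model can make the plate notation concrete. The logistic forms chosen for f and g below are hypothetical placeholders (the talk's actual parametrizations differ); only the structure, z drawn per site and y equal to z times a per-visit detection coin, follows the model as described.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def simulate_od(x, w, f_weights, g_weights, rng):
    """Simulate the occupancy-detection model.

    x: per-site features, shape [n_sites][n_feat]
    w: per-visit features, shape [n_sites][n_visits][n_feat]
    Returns the hidden occupancies z and the observed detection histories y.
    Hypothetical choice: f and g are logistic in the features.
    """
    z, y = [], []
    for xi, wi in zip(x, w):
        occ_p = sigmoid(sum(a * b for a, b in zip(f_weights, xi)))  # f(x_i)
        zi = int(rng.random() < occ_p)          # true occupancy (hidden)
        z.append(zi)
        yi = []
        for wit in wi:
            det_p = sigmoid(sum(a * b for a, b in zip(g_weights, wit)))  # g(w_it)
            # P(y_it = 1) = z_i * d_it: no false detections.
            yi.append(int(zi and rng.random() < det_p))
        y.append(yi)
    return z, y

rng = random.Random(0)
x = [[1.0, 0.3], [1.0, 0.9], [1.0, 0.1]]    # three sites
w = [[[1.0, 0.5]] * 3 for _ in x]           # three visits per site
z, y = simulate_od(x, w, [0.5, 1.0], [-0.5, 1.0], rng)
```

Note how an all-zeros row of y can arise either from zi being 0 or from three failed detection coin flips, which is exactly the ambiguity the model has to resolve.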

But I'm a machine learning person. And, you know, in machine learning there are sort of two parallel communities. There's the community that loves probabilistic models, and there's the community that loves non-parametric decision models like support vector machines and decision trees. And there are people like me who have one foot in both camps. But the two communities really have very different outlooks.

Why do we like probabilistic graphical models? Well, it's a terrific language for expressing our models, and we have wonderful machinery, probabilistic inference, for reasoning about them. So, we know what the semantics of the models are, at least what they're intended to be. And we can also write down models that have hidden or latent variables that describe some hidden process we're trying to make inferences about. So, probabilistic graphical models are kind of the declarative representation of machine learning. But there are some disadvantages, particularly when you're exploring a new domain and you don't understand the system well. Because you, as the designer, have to choose the parametric form of each of the probability distributions in the model, and you need to decide whether you think there are interactions among the variables, and you need to include those interactions in the model. The data typically have to be pretreated, scaled and so on; if you assumed linearity in your model, you may need to transform your data so that it will have a linear relationship. And one of the most important things we've learned in machine learning is the importance of adapting the complexity of your model to the complexity of the data. And it's difficult to adapt the complexity of a parametric model. I mean, there are some things you can do with regularization, but it's not as flexible as using these flexible machine learning models. So, you know, back at that very first machine learning workshop, from which that book came out, Ross Quinlan gave a talk about a classification tree method that he was developing. And it was about a couple of years later that Leo Breiman and company published the book on CART.

So, classification and regression trees are a very powerful kind of exploratory non-parametric method. And one of the beauties is that you can just use them off the shelf. Right? You don't have to design your model. You don't have to pre-process or transform your data. They automatically discover interactions if they're there, and sometimes even if they're not there. And they can achieve higher accuracy if you use them in ensembles, so boosting and bagging and random forest type techniques.

And then of course, since then, the support vector machine revolution has swept through machine learning. These still require the same data preprocessing and transformation steps, but by using kernels you can introduce non-linearities in an extremely flexible way. And there are very powerful ways of tuning the model complexity to match the complexity of the problem. So, they also work remarkably well without a lot of careful design work.

So, a challenge is: can we have our cake and eat it too? Can we write down probabilistic graphical models with latent variables in them that describe processes we care about, and yet also have the benefits of these non-parametric methods? And this is a major open problem in machine learning. And there are several efforts. There's been a lot of work recently in the SVM family. There's Bayesian non-parametrics using mixture models. The approach we're exploring is boosted regression trees.

So, I don't really have a lot of time to describe boosted regression trees. But they grew out of boosting work in machine learning. And then first Mason, and then Jerry Friedman at Stanford, noticed that these could really be viewed as instances of a generic algorithm schema where you're going to fit a weighted sum of regression trees to data. And so he developed this thing called boosted tree regression, or tree boosting.

So, the standard approach in these occupancy models is to represent these functions f and g as linear logistic regressions. What we're going to do is replace those functions f and g with non-parametric, flexible models: boosted regression trees. And this can be done using the algorithm schema called functional gradient descent, or you could actually also do a functional EM. And we had a paper at AAAI last summer that describes the method. So, I'll just give you a little flavor of the results.
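To give a flavor of the tree-boosting machinery involved, here is a toy, pure-Python sketch of gradient boosting with depth-one regression trees (stumps) under a logistic loss. It is a generic illustration of functional gradient descent, each stump fit to the negative gradient of the loss, and not the occupancy-model algorithm from the AAAI paper.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_stump(x, r):
    """Fit a depth-1 regression tree to residuals r by squared error."""
    best = None
    for thr in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or err < best[0]:
            best = (err, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda v, t=thr, a=ml, b=mr: a if v <= t else b

def boost(x, y, rounds=50, lr=0.3):
    """Functional gradient descent: F <- F + lr * h, where each stump h
    is fit to the negative gradient of the logistic loss, y - sigmoid(F)."""
    trees = []
    F = [0.0] * len(x)
    for _ in range(rounds):
        resid = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        h = fit_stump(x, resid)
        trees.append(h)
        F = [fi + lr * h(xi) for fi, xi in zip(F, x)]
    return lambda v: sigmoid(sum(lr * t(v) for t in trees))

# A non-linear step function that the ensemble should pick up.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
predict = boost(xs, ys)
```

The flexibility comes from the ensemble adapting its complexity to the data, which is exactly the property being imported into the occupancy-detection framework.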

Of course, there are methodological problems in studying latent variable models, and that is that you don't know the true values of those variables; they're hidden from you. So, how do you know whether you're doing well? And so I'm going to describe results for one synthetic bird species, where we simulate a species using real data but faked occupancy and detection processes. So, we made this model additive but non-linear. And this is a scatter plot showing, on the horizontal axis, the true occupancy probabilities for this simulated species.

And on the vertical axis is what different families of models predict. So, the left column is models that are trained without latent variables, treating it as a supervised learning problem. And you can see that they systematically underestimate the true occupancy probabilities, because they assume the only positive examples are the cases where you actually detected the bird, which is obviously an underestimate of what's really going on. In the right-hand column are models using this latent variable model, the Occupancy-Detection Model, the OD model. And the top row is where we're using logistic regression as our parametrization. And you can see that on the top right it's more or less unbiased. So, the true probabilities and the predicted ones more or less lie on that diagonal line, which is where they should be. But there's a lot of scatter, and that's because the true model is non-linear and we're fitting a linear model. Whereas if we use the boosted regression trees, on the bottom, we're doing a lot better; we're much closer to the line. I'd like to omit a couple of the points that are far from the line, but otherwise we're pretty happy with that fit.

And so, in general, this is what we find: we can train these flexible boosted regression tree models within a graphical models framework and get more accurate results. And so we've been applying this to data for several bird species.

So, it looks like I'm running tight on time here. So, let me briefly just describe the final problem, which is managing fire in eastern Oregon. Conveniently, this is the problem where we don't have any results yet. So, if I hadn't said anything, you wouldn't have noticed. But this is now a policy problem, not really a data problem.

So, you know, since the late 1910s or 1920s the U.S. Forest Service has had a policy of suppressing essentially all fires. Part of the political argument that was used to sell the creation of the Forest Service was that we will prevent these terrible catastrophic wildfires. Of course, it turns out you can't prevent them; you can only postpone them. And that's now coming to pass. We believe that the sort of natural state of forests, particularly in eastern Oregon, should look something like this, where we have very large Ponderosa Pines and then what's called an open understory, so just very small vegetation on the ground.

I don't have a picture of it, but what we have right now, because fire has been suppressed for a long time, is all kinds of vegetation on the forest floor, and small trees of all different sizes, lodgepole pines in particular, that have grown up among these Ponderosa Pines. When you have open ground like that and a fire happens, it burns along the ground and actually maintains that openness. And the Ponderosa Pines have this big, thick, fire-resistant bark, so they're actually happy with this fire coming through and getting rid of some of their competitors. But since that hasn't been happening, now when a fire occurs it is able to climb up the smaller vegetation, reach the crown, and actually destroy the forest, kill all the trees. And you end up with these really very intense catastrophic fires.

And so one question is, is there anything we can do to manage this landscape? We have a study area in eastern Oregon that's divided up into about 4,000 cells. They're irregularly shaped, based on the homogeneity of the landscape. And there are four things you can do to each of these cells each year. You can do nothing. You can do what's called mechanical fuel treatment, where you send people in and they cut down a lot of that small vegetation and cart it out. You can do clear cutting, where you harvest the trees but leave behind a lot of debris; while that gives you timber value, it actually increases fire risk. Or you can do clear cutting and fuel treatment, and then fire just can't burn at all in that area, at least for a few years.

So, the question is, how should we position these treatments in the landscape if we want to, say, minimize the risk of big catastrophic fires and maybe maximize the probability of these low-intensity ground fires? Well, we can think about this as a kind of game against nature. In each time step we observe the current state of the landscape, maybe something like a fire risk map. And then we have to choose an action, which is actually a vector of actions, one action in each cell; maybe we choose to treat these particular cells. And then nature has its turn, and it lights fires and burns them. And then it's our turn again.

And so we can model this as a big Markov decision process. But unfortunately, it's a Markov decision process with an exponentially large state space. So, if each of these cells in my landscape has five tree ages and five fuel levels, then I have twenty-five to the four-thousandth power possible states of the landscape, which is not going to fit into memory very easily. And similarly, each time I take an action, I have an action vector with four thousand elements and four possibilities in each position. So, I have four to the four-thousandth possible actions to consider. Even with all the cleverness of the reinforcement learning community and approximate dynamic programming, we don't know how to solve these problems.
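Just to make those numbers concrete, Python's arbitrary-precision integers let you count the digits of these state and action space sizes directly:

```python
# 4,000 cells, each with 5 tree ages x 5 fuel levels = 25 local states,
# and 4 possible actions per cell.
n_cells = 4000
n_states = 25 ** n_cells        # size of the joint state space
n_actions = 4 ** n_cells        # size of the joint action space

digits_states = len(str(n_states))    # 5,592 decimal digits
digits_actions = len(str(n_actions))  # 2,409 decimal digits
```

For comparison, estimates of the number of atoms in the observable universe run to only about 80 digits.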

There's been a little bit of work. There was a paper by Wei et al. a couple of years ago where they looked at just a one-year planning problem: if I have just one year to make treatments and then there are going to be fires for a hundred years, where should I put my treatments? And they were able to formulate and solve a mixed integer program for this optimal one-shot solution. They were trying to completely prevent fire, which is really not the right problem. But in any case, we're trying now to see whether we can build on that work or come up with some method to solve this MDP over a hundred-year horizon.

Okay. So, in summary, I've talked about this pipeline for the ways computation could help in addressing problems in ecology and ecosystem management. I've talked about automated data cleaning, about fitting these flexible models within a latent variable modeling framework, and then very briefly about policy optimization. And as I mentioned, this is part of our larger effort in what we call computational sustainability. And there are many other opportunities to contribute. You know, I haven't talked about energy. I haven't talked about sustainable development or smart cities or any of those things. But there are lots of computational problems there as well.

I'd like to point out that the Computing Community Consortium, the CCC, is funding some travel grants and prizes for papers in this area at several AI conferences. I know about ICML and AAAI, but I think there are some other conferences where they're doing this this year. So, there's a special track there that you could submit to. And on my joint grant with Cornell, we have created something called the Institute for Computational Sustainability. And we have a website with all kinds of information about what's going on, not just in our own research, but throughout the computer science community.

And I'll just thank the people that I mentioned at the start. On the fire project there are two other graduate students, Rachel Houtman and Sean McGregor, who have been working on it, and of course the National Science Foundation, which has been very generous here.

Well, thank you for your attention, and I'll answer questions. So, how does this work, local versus remote? What we usually do is give the remote sites a chance to go first, because they might lose the connection later on.

Okay. Remote sites? Go ahead.

(Question being asked)

Okay. Yeah. So, what they do is run several thousand simulated fires and try to calculate, for each cell in their landscape, the probability that it will burn. And they decompose that into the probability that it will burn because the fire ignited in that cell, and the probability that it will burn because fire propagated from one of its neighbors. So, they can basically build a sort of probabilistic flow model that gives the probability that this cell will burn conditioned on whether its neighbors burned. And then they model a fuel treatment very simply: if I treat this cell, then no fire will be able to propagate through that cell. And so, with a couple of other approximations, they can turn this into a flow problem, where we want to minimize the total flow subject to a budget constraint on how many cells we can afford to treat. So, they basically have one integer variable for each cell and an objective, and then they can solve it. I mean, in our case there would be four thousand integer variables, which would be a little bit scary. Their problem, I think, had more like nine hundred cells, so it's still quite substantial. But, you know, CPLEX is a wonderful thing, and it was able to find the solution.
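As a toy illustration of the flavor of that model (not Wei et al.'s actual mixed integer program), consider a one-dimensional chain of cells where fire either ignites locally or spreads from the previous cell, a treated cell blocks all propagation, and we greedily spend a treatment budget to minimize total expected burn:

```python
def burn_probs(p_ignite, p_spread, treated):
    """Expected burn probability per cell on a left-to-right chain.
    A treated cell neither burns nor propagates fire."""
    probs = []
    prev = 0.0
    for i, pi in enumerate(p_ignite):
        if i in treated:
            prev = 0.0
            probs.append(0.0)
            continue
        # Burns if it ignites, or if the previous cell burned and fire spread.
        p = 1.0 - (1.0 - pi) * (1.0 - prev * p_spread)
        probs.append(p)
        prev = p
    return probs

def greedy_treatments(p_ignite, p_spread, budget):
    """Greedily pick the cells whose treatment most reduces total burn."""
    treated = set()
    for _ in range(budget):
        base = sum(burn_probs(p_ignite, p_spread, treated))
        gains = {
            c: base - sum(burn_probs(p_ignite, p_spread, treated | {c}))
            for c in range(len(p_ignite)) if c not in treated
        }
        treated.add(max(gains, key=gains.get))
    return treated

# One risky ignition source at cell 0; fire spreads readily to the right.
p_ignite = [0.9, 0.05, 0.05, 0.05, 0.05]
chosen = greedy_treatments(p_ignite, p_spread=0.8, budget=1)
```

On this toy chain, the greedy choice is to treat the high-ignition cell, cutting off the whole downstream cascade; the real formulation replaces this greedy loop with integer variables and a solver.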

(Question being asked)

Uh huh. Okay. Right. Well, this was one-shot as opposed to sequential decision making. So, here we just get one time step at which we're allowed to take actions, and from there on out nature gets all the moves in the game. So, that's the sense in which it's a single decision, a one-shot, totally up-front plan, in other words. And there are a lot of problems in ecology where we end up having to take that view, where we're just going to say we want to buy all of the following territory. So, we've looked at some of those. There's an endangered species called the Red-cockaded Woodpecker that, I believe, is here in North Carolina. And my post-doc Dan Sheldon did some very nice work where there are two pockets of this species, one I think at Camp Lejeune and the other in the Palmetto Palm Reserve or something like this. And the question was, could they buy a series of intermediate sites to encourage those two populations to mix and have some gene flow between them? So, it's a problem of basically trying to encourage flow instead of trying to prevent flow. And they were also able to formulate this and solve it for the one-shot case, in terms of building a network that would maximize flow subject to budget constraints.

But in the real problem, you can't buy all the property all at once. You don't have the money, and it isn't all available. So, you really need to be online and every year take some actions that you can afford, to keep moving toward that objective. So, turning that into a Markov decision problem, or what's often called adaptive management in the environmental literature, is still an open problem. We don't know how to do that.

(Question being asked)

That, yeah, that is a good question. And we do wonder whether there is some way we could come up with some set of spatial basis functions that would let us, for instance, represent... suppose we had an optimal policy for laying out treatments in a landscape, but we could only compute it for a particular fixed landscape. Could we somehow generalize from that to a more general policy? Maybe some kind of set of spatial basis functions would allow us to do that. And the same is true for looking at the structure of the landscape. There's certainly a lot of work in the atmospheric sciences and weather, where they basically use PCA to create a set of basis functions that they can then use to approximate a lot of things. So, it's something we'd like to explore more.

(Question being asked)

I'm sorry. Right. Well, particularly here we're intervening in the system, and so the trouble is that you have this search space where, if I take these actions, then these fires will burn; if I take those actions, something else will happen. And you end up having to do exponentially many simulations just to simulate one set of circumstances. And so obviously we have to rely on some kind of sampling, or some way of capturing the spatial scale beyond which we can ignore the spatial components. It's not clear, really, how to proceed.

(Question being asked)

Well, that is a very good question. Right now we've mostly been looking at just this one site, and we have the weather data and all the data about the site, which we need to be able to do the work. And it's a good question whether there are generalizable lessons that you could take away from this. I also have some projects in invasive species management, and we're asking the same question there. And often it's kind of disturbing. I mean, you get a solution like this big map here, wherever it was, that says, well, these are the optimal places to treat. But is there any pattern to that? Is there any way we could explain it as a set of rules that we could apply to a different situation? How could we generalize from this particular landscape? And we need to do that just to explain it to our domain experts. And obviously policy makers are not going to be happy just being told, well, it's optimal, our algorithm said so. Particularly because we won't be able to say that; we'll have to say it's approximately optimal, but we don't know how far off, or something like that. And so we're really going to need to be able to give them some qualitative understanding and let them play with it, and modify it, and explore, and understand how good it is. And that's a huge challenge: once you've done ten million simulations, what lesson can you take away from it? Okay.

So, I've got a question that maybe dovetails with that.

Uh huh.

So, to what extent do you feel like these techniques, and the recommendations or policies you're producing using these techniques, are getting traction with the people who are actually implementing policy decisions? Is it something where you feel like you're having impact now, or do you feel like maybe it'll be five years, ten years? What time scale are we talking about here?

Um, I would guess five to ten years. I mean, we're very fortunate with the forest situation that we have some of the Forest Service people on our team. And a lot of them are former students of Claire Montgomery, who is on the team. And so we have a nice working relationship with them. But it is a good question whether they would ever be able to execute our particular policies. I think one of the main things we're trying to do is give them backup ammunition for supporting the actions that they are taking. Right now, the idea that they might want to treat the landscape in a particular way, or, in a related problem, that they might want to let a fire burn instead of suppressing it, is an extremely controversial, politically difficult decision. If we could provide some analysis showing that yes, under a wide variety of scenarios it would be better to let this fire burn, or it's better to treat this than those other things, that might help them persuade their stakeholders to go along with it. Of course, another thing that would help them persuade their stakeholders is if we could say, well, for these small communities that have timber mills, we can also guarantee you a certain economic benefit from doing this. And so there's a whole set of economic objectives we would perhaps like to have. We would maybe also like to have a whole bunch of endangered species habitat objectives. So, the real problem gets messier and messier. But we won't be able to attack any of those unless we can really come up with a methodology that works for these problems.

What you just laid out is a hard scenario for any algorithm, you know, any procedure, to optimize. But as it is, it has to be optimized by humans. I mean, in other words, there are people actually making decisions about whether or not to let a fire burn, and they have to process all of it. So...

Right.

I mean.

Well, mostly they are not letting fires burn, because it's just too risky, and plus the firefighting money doesn't come out of their budget; it's somebody else's budget. So, there's not really an incentive for them. For the fuel treatment, though, you're right. Right now they are making some guesses about where to treat, trying to balance all of these issues, and I would say they're not very happy with that. They would like some more rational basis for making those decisions.

Yeah. I guess my point was you may not have to get optimal. You may just have to do better

than humans guessing.

Right. Well, but we have to convince them that we are doing better yeah. And that comes

into a lot of this broader contextual thing as well.

Yes.

You sort of apply the basic approach. Could you just take the particular plans or policies that they are using, or thinking of using, as a prior, and then go from there and simplify your model, because you're working from a targeted assumption...

Uh huh.

...base.

Oh, that's an interesting idea, yeah. That would be to see if we could, in some sense, model what they're doing and then ask locally how we could improve it, maybe without walking too far away from it so it doesn't look so strange or threatening. We hadn't thought about that, but that's an interesting idea. Okay.

Thanks.

Well thank you very much. My pleasure.

some time. He…which I brought a prop. He actually was an author on my first machine

learning textbook. And I got to meet him when I was a graduate student at Berkley. And I

guess this was not long after Tom started his first faculty position after getting his

Ph.D. at Stanford. He came down to visit Berkley and I enjoyed getting to meet him then. And

even then as you see with this textbook he was playing a very important role in shaping

the machine learning community. And he’s gone on since then to continue to play a role

in shaping the machine learning community. So he is a Triple AI fellow and ACM fellow.

He’s been program chair for for Triple AI, program chair for NIPS. He’s been very active

in the International Machine Learning Society, and really a mentor in the field to a lot

of young people. And Tom is one of the few people to this day who really sees the entire

field of machine learning. And as the fields have become increasingly specialized, it’s

rare to find people who can appreciate the whole field and take it all in. And that’s

one, the great, many great things that Tom is known for. And today he’s going to be

telling us about a very important application of machine learning, which is to computational

ecology and environmental management.

Thank you very much Ron. So, the work I’m going to be describing today is obviously

a very collaborative interdisciplinary, and the collaborators in particular that I want

to mention is my graduate student Ethan Derigensky, two post docs Rebecca Hutchinson and Dan Shelton,

and then colleagues Wanking Wong who’s in Computer Science, a machine learning person

Clair Montgomery who’s a forest ecologist. And then several folks at the Cornell Lab

of Ornithology.

So, if we look at the earth’s ecosystems or the biosphere, it’s a very complex system.

And I think we can agree that in many ways we have not managed it in a sustainable way.

And so I thought I would start the talk by asking about why is that so, and is there

anything that computer science can do to help? And I think, I mean everybody had their own

views of why this is so. But I think maybe there are three reasons. First of all we don’t

understand the system very well. So, it’s very hard to manage a system when it’s behaving

very unpredictably. And there was a very thought-provoking article by a group of authors, first author Doak, in 2008, where they ask the question: are ecological surprises inevitable? That is, are the dynamics of ecosystems so complex that we will never really be able to predict their behavior reliably? And to support this thesis they go through, I don't know, fifteen or twenty different examples of situations where either

something completely surprising happened like the population of a species in the Gulf of

Alaska suddenly exploded and then five years later disappeared again, with no one knowing

why. Or examples where we attempted an intervention in an ecosystem and the outcome was very different from what we had intended. And one example that is very

current right now in the Pacific Northwest is the Northern Spotted Owl. So, during the

late 80’s and 1990’s we had what we call the owl wars in Oregon, because there’s

this species that was listed as an endangered species, the Northern Spotted Owl, whose preferred habitat is old growth forest. And most of the old growth forests

on private land had already been cut, and so now there was a lot of logging in the national

forests in the public lands and the conservation community wanted to shut down all that logging.

And obviously the Forest Products Industry which was a very important part of the Oregon

economy was dependent on it to a large extent. And you know, the President had to come to the state and bring everybody together. And they came up with what's called the Northwest Forest Plan, which by and large did stop logging in the national forests and on federal lands,

which had a devastating impact on the economy. And the hope was that this would help the

spotted owl recover. But spotted owl numbers have continued to decline since then. And

partly that’s because there was another species that has come in from the North. The

Canadian Invader, which is known as the Barred Owl. And it turns out it is more reproductively

successful and more aggressive. And it seems to be pushing out the spotted owl. So, that's another example of an ecological surprise, and it's one of the reasons managing ecosystems is so difficult.

I think another reason that we’ve had trouble managing ecosystems is that we’ve often

focused on only a small part of a very large system; because the system is so complicated, we've focused on only one piece of it. A single species like the Northern Spotted Owl might be an example of that. And we've often also ignored some

of the larger contexts. There’s a colleague of mine, Heidi Jo Albers who has studied things

like creating forest reserves in tropical forests. And when you design these reserves, you need to consider what the native people might be using that forest for; in her case that meant creating large buffer zones around the actual reserve. If you don't take that into account, you end up with those people making incursions into your bioreserve and degrading it in one way or another.

So, having to consider the spatial aspects, the interactions among multiple species, these

are things that are often ignored in a lot of ecology and ecosystem management. And finally,

I think particularly if you look in agriculture, we often deliberately manipulate a system

to simplify it in order to try to manage it. So, in crop agriculture for example we try

to remove all of the other species so we only have to worry about one species. But as a

consequence we have to provide a lot of the support for that species that would normally

be provided by other species, like fertilizers and pest management and so on. We have to

provide those as exogenous inputs. And many of those, like some of the nutrients we're providing, are now becoming expensive. And this is not a sustainable way of managing

those systems.

Well, and I'm sure you could go on and list many other things. What does Computer Science have to offer? I mean, the reason I'm here is because I think there are several things.

First of all, if we look at the question of our lack of knowledge of the function and structure of these systems, we now have a couple of ways that we can contribute. For one, we and our colleagues in nanotechnology and electrical engineering are producing all kinds of novel sensors, so we have wireless sensor networks.

We can create thousands of sensors, put them into these systems, and be able to monitor

them much better.

And of course the machine learning community and computational statistics community have been working on building modeling techniques that can scale up to much larger systems. It's still a challenge, of course, but much more is possible than, say, twenty years ago. When it comes to the question of focusing on subsystems, it's some of the same story.

same story. Obviously with our modeling tools we can now look at the larger system in which

the smaller system is embedded. But I think we also now have tools in say mechanism design

to look at the interactions of different parties that might be competing for a resource or

tools in modern optimization that let us find good solutions to very large and complex optimization

problems.

And again when we come to agriculture it’s a different combination of these three things.

But better sensing, better modeling, and better optimization all have a role to play in allowing

us to model these systems and manage them better.

So, this general field that we're calling computational sustainability: one of the big things in my group is that, jointly with Carla Gomes at Cornell, we have one of the NSF Expeditions in Computing projects. So, a ten million dollar grant to try to boldly go where no computer scientist has gone before, and in particular to look at computational methods that can contribute to sustainable management of (unintelligible) systems. And

so as a machine learning person I tend to think about the computational challenges that

are here in terms of a pipeline, from data to models to policies. And so what I’m going

to do in this talk is first talk about what I see as some of the work that’s going on

in each of these areas outside of my group briefly, and then drill down on three specific

things that we’re doing in my group that contribute to this area. And so I’m hoping

you’ll get a sense of the range of challenge problems that are here and some of the opportunities

from a Computer Science perspective.

So, the first thing I want to talk about is sensor placement. And Andreas Krause and his students have been doing some really exciting things there. So, this particular example is a case where (and I'm not supposed to point with this, I point with this) this is a city's water network. And they want to know: where should we place sensors in this network in order to detect pollutants, or maybe an attack, some chemical that's introduced into the system? And the main tool that they use is something called submodularity, which is the idea that you have a function of a set, in this case the set of places where you have put your sensors, and it exhibits a diminishing-returns property: once you've placed K sensors, the (K+1)st one is going to give you less benefit than the Kth one, and so on. If your objective function is submodular, then the greedy algorithm, and various sophisticated variants of it, gives a performance that is within a constant fraction of optimal. So, you can get very good results, and in fact they won some competitions for water quality monitoring. And they've looked at many other problems as well. So, that's sensor placement, and of course it has a lot of relationship to the huge literature on experimental design.
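To make that diminishing-returns idea concrete, here is a minimal sketch of greedy placement in Python, using set coverage (a classic monotone submodular objective) as the function; the toy network, the candidate names, and the coverage sets are invented for illustration and are not from Krause's actual systems.

```python
def greedy_placement(candidates, coverage, k):
    """Greedy maximization of a monotone submodular coverage function.

    candidates: possible sensor locations
    coverage: dict mapping each location to the set of network nodes it covers
    k: number of sensors to place
    """
    chosen, covered = [], set()
    remaining = set(candidates)
    for _ in range(k):
        # Pick the location with the largest marginal gain: because the
        # objective is submodular, each added sensor helps less than the last.
        best = max(remaining, key=lambda c: len(coverage[c] - covered))
        if not coverage[best] - covered:
            break  # no remaining candidate adds anything
        chosen.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return chosen, covered

# Toy water network: each candidate junction "covers" the pipes it can monitor.
coverage = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5, 6, 7},
    "D": {1, 7},
}
sensors, covered = greedy_placement(coverage.keys(), coverage, k=2)
```

For monotone submodular objectives like this one, the classic result is that greedy achieves at least a (1 - 1/e) fraction of the optimal value, which is the "constant fraction of optimal" guarantee mentioned above.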

The second thing that comes up is what I call data interpretation, for lack of a better word: often the raw data you get from your sensors is not at the level you want for your modeling effort. And this is particularly true for image data. So, for the last eight years I've been running a project that we call the Bug ID Project, where we take photos of moths, soil arthropods, and freshwater larvae. And we want to identify them to the genus level, and ideally to the species level. And this might be, for instance, input to building a model of the distribution of species in space, or to tracking invasive species, or even to water quality monitoring, where you want a histogram by species of how many individuals you had in a given stream. So, this particular picture here is from a collaborator of mine, Qing Yao, who's looking at rice pests. They put out these light traps at night, and moths wonderfully trap themselves in these traps. And then they spread them out on a glass table, photograph them from above and below, and then they want to count and identify them to the species level.

The third problem then I call data integration. I guess that’s an established term. The

problem is with a lot of ecological modeling challenges you have data coming from a wide

variety of sources, and a wide variety of scales in time and in space. And you need

to somehow pull all this together in order to then fit a model to the data. And so in

what we're doing, for instance, on bird migration modeling, we're dealing with everything from data that basically never changes, like a digital elevation model of the terrain, to things that are maybe changing on a fifteen-minute time scale, like the temperature or the weather, and we have to integrate all of these things.

And then we come to the part that you know is really my core competence, which is model

fitting and machine learning. And so there are of course a wide range of models in ecology

that people would like to fit. We’ve been looking really at just three kinds of models.

The first are what are known as species distribution models. And the question there is can we create

a map of where a species is found in the landscape. And so that’s very close to sort of the

core machine learning supervised learning problem. You're given a site with some set of features describing it, and then either the species is present there or absent there.

Another kind of model is something called a Meta-Population Model. And here we imagine

that we have a set of patches arranged in space. And a patch may be occupied by a species

or not. And over time the species may reproduce. It may spread to other patches; it may go

locally extinct and then get re-colonized. So, that's sort of focusing on space and looking at what comes in and out of a cell. And then the third kind are migration or dispersal models, where you follow the organism instead. So, you want to model the trajectory, say, that a bird follows, or the timing of movement.

And so there’s work in machine learning on all of these. One I want to show is what’s

called a STEM model that was developed by Daniel Fink at the Cornell Lab of Ornithology. And so at the Lab of Ornithology they have a big project called eBird, where

And so at the Lab of Ornithology they have a big project called Project E-bird, where

if you’re a birder you can go out observing in the morning say and then fill out a checklist

on their webpage and say here’s what I saw and I didn’t see anything else. You can

click a button for that and then upload it. There are a lot of avid birders out there.

So, we’re now getting like a million data points a month from people uploading. And

they exceeded three million points in May, sort of the peak of the breeding season. And

so there's a lot of data. Unfortunately it's completely uncontrolled. Right? So, you have lots of variation in expertise. You have no control over where people go. But

you can still do some interesting things. And what Daniel does is fit ensembles of decision

trees to try to predict whether this species, in this case the Indigo Bunting, will be present

or absent at a particular place and time. And so I’m going to show you this movie,

but it’s important to realize this is a series of snapshots. There’s no dynamical

model here. But this species winters down in Central America. And you'll see the orange colors; that's where the species is predicted to be present, first along the Mississippi Valley and then sort of spreading out through the entire eastern U.S. And then as we move into September (there's a clock ticking along the bottom), you see the species go back down the Mississippi Valley and disappear from the U.S. And so this is, I think, a

very nice model. And it was used as part of something called the State of the Birds Report to try to estimate what fraction of the habitat for each of something like two

hundred species of birds is publicly owned versus privately owned. And this report came

out late last year.

So, once we have built a model like this, then it's time to say: well, it's great that we have this model of birds, but how can we use it to make policy decisions to manage the ecosystem? And I don't have a good example for management with birds, but with fish there's John Leathwick, who does excellent work in New Zealand. So, I don't know if you can tell, but see these gray things over there; those are the islands of New Zealand. And these blue and red dots correspond to places where fishing trawlers did or did not harvest a particular species of fish, Mora moro; the red ones are positive. And the blue line around the outside is the exclusive economic

zone of New Zealand. And so using this data, he fit a species distribution model similar

to the one that I was just describing except that instead of estimating presence or absence

he’s estimating the catch in kilograms, so the biomass of the fish. And so these are

his estimates. The blue areas there are no fish at all and then you can see this pattern.

And then what he wanted to do was use that to prioritize regions for their conservation value, in terms of allowing this population to grow. And the left plot prioritizes them if we ignore the fishing industry and just ask what would be the places that would best encourage the growth of the species. But of course you really need to

consider these within an economic context. And so the right diagram re-prioritizes them, now taking into account the cost to the fishing industry. And you can see, I mean, the main lesson here I think is that there are still a lot of places that we can conserve and yet still have the benefit of fishing.

So, this is a kind of spatial optimization problem to solve. And I'll be talking about some more of those. Finally, we have the problem of policy execution. Of course, there's usually a chasm to go from a designed policy to one that we can convince people to actually adopt. And you know, at the simplest level we just have a policy where at each time step

we observe the state of the system. And then we choose the action that our policy tells

us to choose. And we go ahead and act. But in practice we’re often called upon to act

in a lot of ecosystem management problems, before we have a very good model of what’s

going on. And so these are really what we would call a partially observable Markov decision process, or worse, where we don't have a complete understanding of the system we're trying to model. I think a challenge here is that this means our early actions should be designed not only to achieve the ecosystem goal, but also to help us gather more information about the system so that we can improve our model. So, we have dual objectives. And these are very difficult to optimize.

And one of the big concerns I think in particularly in light of these ecological surprises is

can we design policies that are robust to our lack of knowledge? Both to the known unknowns, the things where we know that we're uncertain and can model our uncertainty, and also to the unknown unknowns, the factors that we forgot to include in the model. And I think

that’s one of the most interesting intellectual questions. I don’t have an answer for it,

but I think that there are some things we might be able to do.

Okay, so that’s the review of the sort of pipeline. And now I’d like to look at, talk

about three specific projects at Oregon State. So, and these will be in data interpretation

and model fitting and in policy optimization. So, the first project is the dissertation

project of my student Ethan Dereszynski. And he’s going to be graduating soon, so he’s

looking for a job. And what he works on is automated data cleaning in sensor networks.

So, Oregon State University operates something called the H.J. Andrews Long-Term Ecological Research site. So, NSF funds a collection of these study sites that are committed to collecting data over long periods of time and doing long-term experiments. So,

one of my colleagues Mark Harmon for instance has started an experiment that is going to

last two hundred years that’s called the Roth Experiment. It’s about trees and how

long it takes them to decay. But you know it takes forever to get tenure in this field.

Anyway, in this case we’re looking at these weather stations that are there. And I’m

going to talk mostly about four thermometers. So, this is a weather tower here, and these little L-shaped things coming off each have a thermometer on them. And they're allegedly at one and a half, two and a half, three and a half, and four and a half meters above the ground. And we get data from them that looks something like this. So, every fifteen minutes we get a temperature reading. And you can see on these curves the up and down motion; that's the daily cycle, the diurnal cycle. So, it's warming up in the daytime and cooling off at night. And it's kind of fun, because the thermometer that's nearest the ground, which is the black line in the plot, is the one that's coldest at night and hottest in the day. So, they flip back and forth like this. And the problem is that these sensors

are out in the world and bad things happen to them. And so someone has to do data quality

assurance on these, on the sensor data and clean it up before we try to do any analysis

on it. Now traditionally in the Andrews Forest, we've got three of these towers, and then there are many more, but three main ones that have been in operation since the 80's. And with twelve thermometers it's not really much of a burden for someone to go check this data. They just eyeball it and cluster it in various ways and look for outliers. But we now have Wi-Fi over the entire forest, and we want to put out huge networks of things. And if we have a thousand thermometers, this human data cleaning becomes infeasible, unless we can figure out how to make a CAPTCHA out of it and maybe get people to do it.

So, the kinds of things that go wrong, like for instance here this is an instance of what’s

called a broken sun shield. And so, the air temperature sensor is now measuring actually

the surface skin temperature of the thermometer with the sun directly beating down on it.

And so you can see in the daytime it spikes way high, as much as ten degrees higher than the true air temperature. At night it's a perfectly good air temperature sensor, but

in the daytime, particularly sunny days, not so good.

Can anyone guess what's going on in the bottom case here? Our 1.5-meter sensor is flatlining for a while.

Yes. So, the problem here is this is week three. So, that means it’s right about now.

But it was in 1996. We had a big snowstorm. And so this is now a snow temperature sensor,

instead of an air temperature sensor. In some sense the thermometer is still functioning

correctly, it’s just that the metadata is wrong. But there’s a lot more going on here.

So, you notice that the 4.5 meter thermometer is still bouncing up and down rather nicely.

I mean obviously over here it's quite cold these days, even at night; even in the daytime it's just barely getting above freezing. But then what's happening over here? It really warmed up; it's almost in the fifties at the top of the thermometer tower. And right around 3500 here it starts to rain. And so the snow temperature moves up to sort of the triple point of water for a while, and now the snow is melting, and the university was closed right around 4500 because we had such a huge flood that you couldn't get to campus. So, this is how you get a big flood in Oregon: you have what's called a rain-on-snow event. And this was one of them.

So, we'd like to detect these things, also because they're interesting, but we don't want to assume that this thermometer is measuring air temperature during this entire period.

So, how can we do this? Well, we’d like a data cleaning system to do really two functions.

The first is we’d like it to mark every data value that we think is anomalous. And

so in this case, this is a different set of data, but we've put what they call a rug, a little red tick, underneath each data point that our model predicts is incorrect, as something wrong.

And then the other thing you'd like it to do is to impute, or predict, the missing values: what the thermometer should have been reading if it had been working correctly. And we're going to do both of these things using a probabilistic model.

So, the basic probabilistic model we're going to use, and these are, you know, Bayesian networks or probabilistic graphical models, is the following. We're going to have one node here for each of our variables of interest, and the one that is gray is an observed node. So, this is the observed temperature at time t. And then there is a hidden node, which is the true temperature that we wish we could observe directly. Then up here is our sensor state variable. And I've made it a box to indicate that it's discrete, whereas these are continuous variables. And the idea is a very simple sensor model that says when the sensor state is one, that is, normal or working, then the observed temperature has a Gaussian distribution whose mean is the true temperature x but with some small variance around it. But when the thermometer is broken, so the state is zero, then the observed temperature has a mean of zero and a gigantic variance. So, basically what we're saying is it's completely unrelated to the true temperature. So, this is a very simple model,

and why do we adopt this kind of model? Well, you could try to think about this kind of data as if it were a diagnosis problem, where the sensor has various fault modes and failure modes and you want to predict which one is active. And so you could do a kind of Bayesian diagnosis

where you could say well given the sensor readings and my expectations it looks like

it’s a broken sunshield or it looks like it’s a flat line because of a communications

failure or something like this. But the trouble is we were not confident that we could enumerate in advance all the ways a sensor could fail. We wanted to have an open-ended set. So, the idea here is to treat it more as an anomaly detection problem, where we model the normal behavior of the sensor as accurately as we can. And then anything that is a serious departure from normal, the normal model will give it very low likelihood, and it'll instead get picked up by this sort of very generic failure model.

So, that’s the idea here.
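That two-component idea can be sketched in a few lines of Python: a tight Gaussian around the predicted true temperature for the "working" state, and a huge-variance Gaussian for the "broken" state, with the posterior computed by Bayes' rule. The priors and variances below are illustrative assumptions, not numbers from the talk.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def sensor_state_posterior(obs, true_temp, p_working=0.95,
                           working_var=0.25, broken_var=1e4):
    """Posterior P(state = working | observation) for the two-state model.

    Working sensor: obs ~ N(true_temp, working_var)  (tight around the truth)
    Broken sensor:  obs ~ N(0, broken_var)           (huge variance: anything goes)
    All parameter values here are illustrative assumptions.
    """
    joint_working = p_working * gaussian_pdf(obs, true_temp, working_var)
    joint_broken = (1.0 - p_working) * gaussian_pdf(obs, 0.0, broken_var)
    return joint_working / (joint_working + joint_broken)

# A reading close to the expected temperature looks "working"...
p_near = sensor_state_posterior(obs=10.3, true_temp=10.0)
# ...while a reading 15 degrees off gets almost all its mass from the
# generic failure component, so it is flagged as anomalous.
p_far = sensor_state_posterior(obs=25.0, true_temp=10.0)
```

This is exactly why an enumerated fault catalog isn't needed: any serious departure from the normal model gets low likelihood under the tight Gaussian and is absorbed by the broad one.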

So, we can do anomaly detection then by doing probabilistic inference. We ask the query

you know what is the most likely value of the state of this sensor at time t. And that’s

just the argmax over the possible states of the probability of the states given the observation.

And we can also do imputation by asking instead what’s the most likely temperature given

the observed temperature. So, basic probabilistic inference techniques work just fine. But of

course this is a very bad model of the sensor here. So, the next thing we want to do is

add some sort of Markov model so that we can look at the history of the sensor. Because we'd like to say, well, if the sensor was working fifteen minutes ago, it's probably

still working now. And if it was broken fifteen minutes ago, it’s very likely it’s still

broken now. So, we’d like to do that. And similarly of course the actual real temperature

doesn’t change that drastically either. So, we’d like to have some model of the

true temperature changes over time. So this gives us now a Markov version of this. And

now we can ask a query like: what's the most likely state of this sensor at this time, given the entire observation history? And that also can be reasonably calculated. But we can go

even further than this if we have multiple sensors as we do on these towers. We could

build a separate copy of the model for each of them and then couple those somehow. So

we could say that you know if we know the temperature of the sensor at the bottom of

the tower then we should be able to predict with reasonable accuracy the sensor next up

on the tower. And so this is the kind of thing we do. In general we learn a sparse joint Gaussian distribution among all of the temperature variables, so that we have a connected model.

Unfortunately, probabilistic inference in these models starts to become intractable. So, even in the single-sensor model, with the Markovian independence, you would think that would not be a problem. But it is, because of our observed variable. If all the variables were discrete, then we could solve it very easily; a simple message-passing algorithm would do it. But because our variables are continuous, so there are conditional Gaussians, when you marginalize away the history it gives you a mixture of Gaussians that grows exponentially with the number of time steps. And so it becomes impractical to do more than just a few time steps; that won't work. So, what we do is basically a forward filtering process, where at each time step

we ask what's the most likely state of my sensor, and then we say, okay, we'll believe it. We'll adopt that state and treat it as evidence, and then at time two we'll ask, okay, what's the most likely state at time two given that I already committed to the state at time one. And so on. And we also have to bound the variance on the true temperature, because if you have a long string of time steps where the sensor is bad, the true temperature becomes extremely uncertain, and you can't let that grow too far.
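Here is a minimal sketch of that commit-as-you-go filtering, building on the same two-state Gaussian sensor model as before. The persistence probability and variances are illustrative assumptions, and a real implementation would also track the true-temperature distribution rather than take the model's prediction as given.

```python
import math

def gauss(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def filter_sensor_states(observations, predictions, p_stay=0.9,
                         p_working0=0.95, working_var=0.25, broken_var=1e4):
    """Greedy forward filtering with commitment.

    At each step: form a prior over {working, broken} from the committed
    previous state (sensors tend to stay in the same state), score the
    observation under each hypothesis, commit to the MAP state, and carry
    that commitment forward as evidence. Parameters are assumptions.
    """
    states = []
    prev_working = None
    for obs, pred in zip(observations, predictions):
        if prev_working is None:
            prior_w = p_working0  # no history yet
        else:
            prior_w = p_stay if prev_working else 1.0 - p_stay
        joint_w = prior_w * gauss(obs, pred, working_var)       # working model
        joint_b = (1.0 - prior_w) * gauss(obs, 0.0, broken_var)  # failure model
        prev_working = joint_w >= joint_b  # commit to the MAP state
        states.append(prev_working)
    return states

# Readings near the model's prediction are committed as "working"; a
# sustained 15-degree spike flips the commitment to "broken" until the
# readings return to normal.
states = filter_sensor_states([10.1, 10.2, 25.0, 24.8, 10.0],
                              [10.0, 10.0, 10.0, 10.0, 10.0])
```

Note how the commitment matters: once the sensor is committed broken, the prior tilts toward broken at the next step, so it takes a clearly normal reading to flip it back.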

Probabilistic inference is also infeasible in the multiple-sensor model, even if you follow this step-by-step commitment strategy. And so the solution we're using right now, which seems to work best, is something we're calling Search MAP, where at each time step you start by assuming that all of the sensors are working, and you score how well that accounts for the observations. And then you ask, can I improve that score by breaking one of the sensors? And you do this in a greedy algorithm, basically hill climbing, to try to find a MAP solution. You don't always find the true maximum, because there are local optima. But even the simple greedy algorithm takes polynomial time that's quite substantial in the number of sensors.
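The hill-climbing idea can be sketched as follows: score a fault assignment by summing per-sensor log-probabilities (tight Gaussian around the model's prediction if working, broad zero-mean Gaussian plus a prior penalty if broken), then greedily flip sensors to "broken" while the score improves. This is a hedged sketch of the search strategy, not the exact scoring function from the talk, and in the real system the predictions come from the coupled multi-sensor Gaussian model rather than being given.

```python
import math

def log_n(x, mean, var):
    """Log-density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def search_map(obs, pred, working_var=0.25, broken_var=1e4,
               log_prior_broken=math.log(0.05),
               log_prior_working=math.log(0.95)):
    """Greedy 'break one sensor at a time' search for a MAP fault assignment.

    obs, pred: per-sensor readings and model-predicted true temperatures.
    Starts with every sensor working and marks sensors broken while that
    improves the joint score. May stop at a local optimum.
    """
    n = len(obs)

    def score(broken):
        s = 0.0
        for i in range(n):
            if i in broken:
                s += log_prior_broken + log_n(obs[i], 0.0, broken_var)
            else:
                s += log_prior_working + log_n(obs[i], pred[i], working_var)
        return s

    broken = set()
    best = score(broken)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            if i in broken:
                continue
            cand = score(broken | {i})
            if cand > best:
                best, broken, improved = cand, broken | {i}, True
                break  # greedy: take the improving flip, then rescan
    return broken

# Four sensors, one reading far from its prediction: the search breaks
# exactly that sensor and stops.
faults = search_map(obs=[10.1, 9.8, 31.5, 10.2],
                    pred=[10.0, 10.0, 10.0, 10.0])
```

Each pass of the loop scores up to n candidate flips, and there can be up to n passes, which is the "polynomial time that's quite substantial in the number of sensors" mentioned above.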

Yeah?

(Unintelligible) working even if in the previous time step's commitment you decided one of them was broken?

That's what we're doing right now. But we could start with our MAP guess from the previous time step too. And you can also consider a variation where, having broken one sensor, you might reconsider your previous decisions, in which case you get what's sometimes called a floating greedy algorithm, which takes even longer but gives you better solutions. And we've tried a whole bunch of other things, you know, various kinds of expectation propagation and the whole bag of tricks in the machine learning and probabilistic modeling area. But actually, one thing we haven't tried yet is particle filters. He's working on that right now. (Unintelligible). Rob (unintelligible).

Well, here are single-sensor results. So, on the broken sunshield, the bottom plot is the data again, and the top plot is the predicted temperature of just the one thermometer, the one that's closest to the ground. And then along the curve we color-code it; our domain experts wanted us to not just have broken or working, but to have four levels of performance: very good, good, bad, and very bad. So, very bad would be black, and there are just a couple of spots at the peaks of these days where there are some black spots. But otherwise it's mostly marked things as red, for bad. And at night, of course, it's still a very good sensor. So, we're able to do this using just a single-sensor model. And there's a lot more in the single-sensor model: we build a baseline expectation based on previous years, so that we know what, say, week six looks like in general.

And then for the multi-sensor case, Ethan did an internship at EPFL in Switzerland, where they put out these short-term deployments of sensor networks, and he learned a conditional Gaussian Bayesian network over the true temperatures and then fit that combined model. And so these are the results. And you can see it's doing quite well in some cases. It's picking out a lot of these cases where we have extremely bad, spiky sensors. But on these long flat lines it's doing okay, except sometimes, the dashed line here is the imputed value, when the predicted value happens to coincide with the flat line, it says, oh, the sensor's working again. So, this is a case where we probably really should have a flat-line model, because these flat lines happen when the data link is lost.

Okay. And there are many other challenges. I mean, we're working at a single time scale, but of course it really should be multiple scales. And we're also working on integrating more heterogeneous sensors than just temperature.

Okay. Well, so that's an example of this automated data cleaning work. The next problem is model fitting with an explicit detection model. And this is work by a post-doc of mine, Rebecca Hutchinson, who's wrapping up her post-doc later this spring.

And I already talked about species distribution modeling. Often, particularly with birds and wildlife in general, when you go out and do a wildlife survey the species could be there, but you just fail to detect it. And this is a well-known problem in ecology. So, imagine that there's some landscape and we've chosen some set of these black squares that we're going to go survey, but when we go out there it turns out some of the birds are in the vegetation and we don't see them. So, although every one of those squares was occupied by our species, we only see it twice. What can we do about that? Well,

one solution is to make repeated visits that are close enough together in time that you think the birds have not moved around, like when they're sitting on their nests or something, but far enough apart in time that you think you're getting independent measurements of their hiding behavior. So, if we go back another day, maybe now we see the bird in the first cell, but the bird in the second cell is hiding. The third one we still think is unoccupied, because that bird was hiding the whole time, and so on. So, this is one strategy that you can use. And if you look at the kind of data you

get, you get what are called detection histories. So, suppose we have four different sites.

Three of them are in forests, and one is in grassland. And suppose that there is this

true occupancy, which we say is a latent or hidden variable, right. And the first three

sites are occupied and the fourth one is unoccupied. But we don’t know that. That’s hidden

from us. So, on the first day we go out and it turns out it’s a rainy day and we’re

going out at lunch time, and we don't see any birds. So, we have all zeros here.

Now another day, we go out early in the morning. It’s a very good time to go birding and

it’s a clear day. And we detect the birds in the first two sites, but we don’t detect

this guy here in site three, and of course we don’t detect anything at site four. So,

we’re going to assume no false detections here, no hallucinations. Although, that’s

not always a safe assumption. And then the third day, it’s a clear day, but we’re

a little late getting out. So, we only detect the bird in the first

site. So, a thing like 0, 1, 1 or 0, 1, 0 is called a detection history. And from the detection histories, if you assume these are independent trials of your detecting ability, you can get a naïve estimate of your detection probability.

So, in this case we know from our data that sites A and B are occupied by the species.

And we know we had six opportunities to detect the birds, three at each site, and we succeeded three times. So, our naïve estimate of our detection probability would be point five. But in fact we really had nine chances to observe this species, in which we only saw it three times. So, our true detection probability, or at least the maximum-likelihood estimate thereof, would be about point three, or one third.
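
The two estimates in this example can be reproduced in a few lines. This is just the arithmetic from the talk's example, with the detection histories written out as data:

```python
# Naive vs. occupancy-aware detection-probability estimates. The site names
# and histories mirror the example in the talk: sites A-C are occupied, D is
# not, but we only observe the histories.
histories = {
    "A": [0, 1, 1],   # occupied, detected on visits 2 and 3
    "B": [0, 1, 0],   # occupied, detected on visit 2 only
    "C": [0, 0, 0],   # occupied, but never detected
    "D": [0, 0, 0],   # truly unoccupied
}

# Naive estimate: only sites with at least one detection are known to be
# occupied, so we count successes over visits at those sites only.
known_occupied = [h for h in histories.values() if any(h)]
naive = sum(map(sum, known_occupied)) / sum(len(h) for h in known_occupied)
print(naive)  # 3 detections in 6 visits -> 0.5

# If an oracle told us that A, B, and C are all occupied, the maximum-
# likelihood estimate would use all nine visits to occupied sites.
occupied = [histories[s] for s in ("A", "B", "C")]
true_ml = sum(map(sum, occupied)) / sum(len(h) for h in occupied)
print(true_ml)  # 3 detections in 9 visits -> 1/3
```

The gap between the two numbers is exactly the bias the occupancy-detection model below is designed to remove.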

So, the big challenge is: how can we tell the difference between an all-zeros history that is due to our failure to detect versus an all-zeros history that is due to the fact that the site is unoccupied? And the answer of course is to build a probabilistic

model. And so this is a plate-style model. And for those of you who aren't familiar with the notation, think of these dashed boxes as being for-loops. So, we have a loop where we iterate over the sites. So, i indexes the site. And x_i is some set of features that describes the site, like it's a forest and it's at three hundred meters of elevation. And at each site, based on its features or its properties, there's going to be some occupancy probability o_i. And we're going to assume that birds toss a coin with probability of heads o_i to decide whether to occupy a site. And z_i is the true occupancy status of that site, either a zero or a one. Now the

variable t is going to index over our visits to that site when we go observing. So, w_it is some description, say that it was 6 a.m. and it was sunny, that might influence or account for our detection probability. And then y_it is the actual report, the data that we get. So, we actually observe x, w, and y, when we really want z. So, we'd like to extract out of this z_i, which gives the species distribution model: the probability of the site being occupied given the properties of that site. And we'll call that function f. So, f(x_i) is going to be the occupancy probability, and we'd love to plot that on a map. But then we have this nuisance model, which is our observation model, and we'll let d_it be the value of a function g that is our detection probability. And so we can say the probability of reporting a 1, that we saw the bird, is the product of z_i, which will be 1 if the bird is there, and d_it, which is the probability with which we detect it. So, that's the model. And this was developed by MacKenzie and colleagues from the USGS, and it is a very nice and well-established model.
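
As a sketch, the generative story of the OD model can be written in a few lines. The particular functions f and g below are invented toy stand-ins for illustration, not the fitted models from the talk:

```python
import random

random.seed(1)

def f(x):
    # Occupancy probability o_i from site features x_i
    # (toy rule: forest sites are more likely to be occupied).
    return 0.8 if x["habitat"] == "forest" else 0.2

def g(w):
    # Detection probability d_it from visit conditions w_it
    # (toy rule: morning visits detect more often).
    return 0.7 if w["time"] == "morning" else 0.3

def simulate_site(x, visits):
    # z_i ~ Bernoulli(f(x_i)): the true, latent occupancy of the site.
    z = 1 if random.random() < f(x) else 0
    # y_it ~ Bernoulli(z_i * g(w_it)): we can only detect a bird that is there.
    return z, [1 if z and random.random() < g(w) else 0 for w in visits]

x = {"habitat": "forest"}
visits = [{"time": "noon"}, {"time": "morning"}, {"time": "morning"}]
z, history = simulate_site(x, visits)
# With this seed the site is occupied but detected only on the third visit,
# producing exactly the kind of detection history discussed above.
print(z, history)
```

Fitting the model means recovering f and g from many such histories without ever seeing z.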

But I’m a machine learning person. And you know in machine learning there is sort of

two parallel communities. There’s the community that loves probabilistic models and there’s

the community that loves non-parametric kind of decision models like support vector machines

and decision trees. And these two communities, well they’re people like me that have one

foot in both camps. But they really have very different outlooks.

Why do we like probabilistic graphical models? Well, it’s a terrific language for expressing

our models. And we have wonderful machinery using probabilistic inference for reasoning

about them. So, we know what the semantics of the models are, or at least what they're intended to be. And we can also write down models that have hidden variables, latent

variables that describe some hidden process that we’re trying to make inferences about.

So, probabilistic graphical models are kind of like the declarative representation of

machine learning. But there are some disadvantages, particularly when you’re exploring in a

new domain and you don't understand the system well, because you as the designer have to choose the parametric form of each of the probability distributions in the model, and you need to decide whether you think there are interactions among the variables and include those interactions in the model. The data typically have to be pretreated, scaled and so on; if you assume linearity in your model, you may need to transform your data so that it has a linear relationship. And one of the most important things we've learned in machine learning is the importance of adapting the complexity of your model to the complexity of the data. And it's difficult to adapt the complexity of a parametric model. I mean, there are some things you can do with regularization, but it's

not as flexible as using the sort of flexible machine learning models. So, you know back

at that very first machine learning workshop from which that book came out Ross Quinlan

gave a talk about a classification tree method that he was developing. And it was about a couple of years later that Leo Breiman and company published the book on CART.

So, classification and regression trees are a very powerful kind of exploratory non-parametric

method. And one of the beauties is that you can just use them off the shelf. Right? You

don’t have to design your model. You don’t have to pre-process or transform your data.

They automatically discover interactions if they're there, and sometimes even if they're not there. And they can achieve higher accuracy if you use them in ensembles: boosting and bagging and random-forest-type techniques.

And then of course, since then the support vector machine revolution has swept through machine learning. These still require the same data preprocessing and transformation steps, but by using kernels you can introduce non-linearities in an extremely flexible way. And there are very powerful ways of tuning the model complexity to match the complexity of the problem. So, they also work remarkably well without a lot of careful design work.

So, a challenge is can we have our cake and eat it too? Can we write down probabilistic

graphical models with latent variables in them that describe processes we care about

and yet also have the benefits of these non-parametric methods? And this is a major open problem

in machine learning. And there are several efforts. There's been a lot of work recently in the SVM family. There are Bayesian non-parametrics that use mixture models. The approach we're exploring is boosted regression trees.

So, I don’t really have a lot of time to describe booster regression trees. But they

grew out of boosting work in machine learning. And then first Mason and then Friedman Jerry

Friedman and Sanford noticed that there, that these could really be viewed as part of a

generic algorithm schema where you’re going to fit a weighted sum of regression trees

to data. And so he develop this thing called boosted tree regression or tree boosting.
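
The "weighted sum of regression trees" schema just mentioned can be sketched in a few lines. This is a toy least-squares version with depth-1 stumps and made-up data, not the occupancy-model fitting algorithm from the talk:

```python
def fit_stump(xs, residuals):
    """Find the 1-D split that best fits the residuals in squared error."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def boost(xs, ys, rounds=50, lr=0.1):
    """F(x) = F0 + lr * sum_m tree_m(x), each tree fit to current residuals."""
    f0 = sum(ys) / len(ys)
    trees, preds = [], [f0] * len(xs)
    for _ in range(rounds):
        # Residuals are the negative gradient of squared loss at current preds.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + lr * tree(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + lr * sum(t(x) for t in trees)

# Nonlinear toy target: a step function, which a linear model cannot capture.
xs = [i / 10 for i in range(20)]
ys = [0.0 if x < 1.0 else 1.0 for x in xs]
model = boost(xs, ys)
print(round(model(0.5), 2), round(model(1.5), 2))  # -> 0.0 1.0
```

The ensemble recovers the step almost exactly, which is the flexibility argument being made here.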

So, the standard approach in these occupancy models is to represent these functions f and g as log-linear models, that is, logistic regressions. What we're going to do is replace those functions f and g with flexible non-parametric models, boosted regression trees. And this can be done using this algorithm schema called functional gradient descent, or you could actually also do a functional EM. And we had a paper at AAAI last summer that describes the method. So, I'll just give you a little flavor of the results.

Of course there is a methodological problem in studying latent variable models, and that is that you don't know the true values of those variables. They're hidden from you. So, how do you know whether you're doing well? And so I'm going to describe results for one synthetic bird species, where we simulate a species using real site data but simulated occupancy and detection. So, we made this model additive but non-linear. And this is a scatter plot

showing on the horizontal axis the true occupancy probabilities for this simulated species.

And on the vertical axis what different families of models predict. So, the left column is

models that are trained without latent variables treating it as a supervised learning problem.

And you can see that they systematically underestimate the true occupancy probabilities because they

assume the only positive examples they saw were the cases when you actually detected

the bird, which is obviously an underestimate of what’s really going on. In the right

hand column are ones that are using this latent variable model, the Occupancy-Detection Model, the OD model. And then the top row is where we're using logistic regression as our parameterization.

And you can see that on the top right, it’s more or less unbiased. So, the true probabilities

and the predicted ones more or less lie on that diagonal line which is where they should

be. But there’s a lot of scatter and that’s because the true model is non-linear and we’re

fitting a linear model. Whereas if we use the boosted regression trees on the bottom

we’re doing a lot better. We’re much closer to the line. I’d like to omit a couple of

the points that are far from the line. But otherwise we’re pretty happy with that fit.

And so, in general, what we find is that we can train these flexible boosted regression tree models within a graphical-models framework and get more accurate results. And so we've

been applying this to several bird species data.

So, it looks like I'm running tight on time here. So, let me just briefly describe the final problem, which is managing fire in eastern Oregon. Conveniently, this is the problem where we don't have any results yet. So, if I hadn't said anything, you wouldn't have noticed. But, so this is now a policy problem, not really a data problem.

So, you know, since the late 1910s and 1920s the U.S. Forest Service has had a policy of suppressing essentially all fires. Part of the political argument that was used to sell the creation of the Forest Service was that it would prevent these terrible catastrophic wildfires. Of course, it turns out you can't prevent them. You can only postpone them.

And that’s now coming to pass that our forests are filled with, we believe that the sort

of natural state of forests particularly in eastern Oregon, we should look something like

this where we have very large Ponderosa Pines, and then what’s called an open understory,

so just very small vegetation on the ground.

I don’t have a picture for it, but what we have right now is because fire has been

suppressed for a long time we have all kinds of vegetation on the forest floor. And we

have small trees of all different sizes, logical pines in particular that’s grown up among

these Ponderosa Pines. And so when you have an open ground like that and a fire happens

it burns through the ground and actually maintains that openness. But the Ponderosa Pines have

this big, thick fire resistant bark. And they’re actually happy with this fire coming through

and getting rid of some of their competitors. But what’s happened, since that hasn’t

happened now when a fire happens it is able to climb up the smaller vegetation, reach

the crown, and actually destroy the forest, kill all the trees. And you end up with the

really very intense catastrophic fires. And so one question is: is there anything we can do to manage this landscape? So, we have a study area in eastern Oregon that's divided up into about 4,000 cells. They're irregularly shaped; they're based on the homogeneity of the landscape there. And there are four things you can do to each of these cells each

year. You can do nothing. You can do what’s called mechanical fuel treatment. So, you

send people in and they cut down a lot of that small vegetation and cart it out. You can do clear-cutting, where you harvest the trees but leave behind a lot of debris, and while that gives you timber value it actually increases fire risk. Or you can do clear-cutting and fuel treatment, and then fire just can't burn at all in

that area at least for a few years.

So, the question is how should we position these treatments in the landscape if we want

to, say, minimize the risk of big catastrophic fires and maybe maximize the probability of these low-intensity ground fires. Well, we can think about this as kind of a game against

nature. In each time step we can observe the current state of the landscape; maybe this is like a fire-risk map. And then we choose an action, which is actually a vector of actions, one action in each cell. So these are the actions: maybe we choose to treat these particular cells. And then nature takes its turn and it lights fires and burns them. And then it's our turn again.

And so we can model this as a big Markov decision process. But unfortunately, it's a Markov decision process with an exponentially large state space. So, if each of these cells in my landscape has five tree ages and five fuel levels, then I have twenty-five to the four-thousandth power possible states of the landscape, which is not going to fit into memory very easily. And similarly, each time I take an action, I have an action vector with four thousand elements and four possibilities in each position. So, I have four to the four-thousandth possible actions to consider. Even with all the cleverness

of the reinforcement learning community and approximate dynamic programming we don’t

know how to solve these problems.
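
The sizes quoted here are easy to check, at least on a log scale, taking the talk's numbers (25 = 5 tree ages × 5 fuel levels per cell, 4,000 cells, 4 actions per cell):

```python
import math

cells = 4000
states_per_cell = 5 * 5        # tree ages x fuel levels
actions_per_cell = 4           # nothing, fuel treatment, clear-cut, or both

# Work in log10 because the numbers themselves overflow any float.
log10_states = cells * math.log10(states_per_cell)
log10_actions = cells * math.log10(actions_per_cell)
print(round(log10_states))   # landscape states ~ 10**5592
print(round(log10_actions))  # joint actions ~ 10**2408
```

For comparison, there are roughly 10^80 atoms in the observable universe, so tabular dynamic programming is hopeless by thousands of orders of magnitude.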

There’s been a little bit of work. There was a paper by Wei, et al. a couple of years

ago when they looked at just a one year planning problem. So, if I just had one year to make

treatments and then there’s going to be fires in a hundred years, where should I put

my treatments? And they were able to formulate and solve a mixed integer program for this

optimal one-shot solution. They were just trying to completely prevent fire, which is really not the right problem. But in any case, we're now trying to see whether we can build on that work or come up with some method by which we can solve this MDP over a hundred-year horizon.

Okay. So, in summary I’ve talked about this pipeline for the ways computation could help

in addressing problems in ecology and ecosystem management. I’ve talked about automated

data cleaning, about fitting these flexible models within a latent variable modeling framework.

And then, very briefly, about policy optimization. And as I mentioned, this is part of our larger effort in what we call computational sustainability. And there are many other opportunities to contribute. You know, I haven't talked about energy. I haven't talked about sustainable development or smart cities or any of these things. But there are lots of computational problems there as well.

I’d like to point out that the Computing Community Consortium I think the CCC is funding

some travel grants and prizes for papers in this area at several AI Conferences. I know

about the ICML and Triple AI, but I think there are some other conferences where they’re

doing this this year. So, there’s a special track for that that you could submit to. And

under my joint grant with Cornell, we have created something called the Institute for Computational Sustainability. And we have a website with all kinds of information about what's going

on, not just in our own research, but throughout the computer science community.

And I’ll just thank the people that I mentioned at the start of the project. On the fire project

there are two other graduate students Rachel Houtman and Sean McGregor who have been working

there and of course the National Science Foundation that has been very generous here.

Well, thank you for your attention, and I'll answer questions. So, how does this work, local versus remote? So, what we usually do is give the remote sites a chance to go first, because they might lose the connection later on.

Okay. Remote sites? Go ahead.

(Question being asked)

Okay. Yeah. So, what they do is they run several thousand fires, simulated fires. And try to

calculate for each cell in their landscape the probability that it will burn. And they

decompose that into the probability that it will burn because the fire ignited in that

cell. Or the probability that it will burn because fire propagated from one of its neighbors.

So, they can basically build a sort of probabilistic flow model that says the probability that

this cell will burn conditioned on whether its neighbors burned. And then they can model

a fuel treatment, which they model simply as if I treat this cell then no fire will

be able to propagate through that cell. Okay. And so, with a couple of other approximations, they can turn this into a flow problem, basically: the total flow is what we want to minimize, subject to some budget constraint about how many cells we can afford to treat. And so they basically then have one integer variable for each cell, and they have an objective, and then they can solve it. I mean, in our case there would be four thousand integer variables, which would be a little bit scary. Their problem I think had more like nine hundred cells, though. So, it's still quite substantial. But you know, CPLEX is a wonderful thing. And so it was able to find the solution to that.
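
As a rough illustration of this budgeted-treatment objective, here is a toy brute-force version on a five-cell chain. The ignition probabilities, the deterministic spread rule, and the budget are all invented for illustration; the real work formulates a mixed-integer program and solves it with CPLEX:

```python
from itertools import combinations

# Per-cell ignition probability on a chain of five cells; fire spreads
# deterministically along the chain through untreated cells, and treating
# a cell blocks all propagation through it.
ignition = [0.3, 0.05, 0.05, 0.05, 0.3]

def expected_burn(treated):
    """Expected number of burned cells for a given set of treated cells."""
    n = len(ignition)
    total = 0.0
    for cell in range(n):
        if cell in treated:
            continue  # treated cells do not burn
        # The cell burns if any ignition can reach it through untreated cells.
        p_no_burn = 1.0
        for src in range(n):
            lo, hi = min(src, cell), max(src, cell)
            if all(c not in treated for c in range(lo, hi + 1)):
                p_no_burn *= 1.0 - ignition[src]
        total += 1.0 - p_no_burn
    return total

budget = 1  # how many cells we can afford to treat
best = min(
    (frozenset(t) for r in range(budget + 1)
     for t in combinations(range(5), r)),
    key=expected_burn,
)
print(sorted(best), round(expected_burn(best), 3))  # -> [2] 1.34
```

With one treatment, the optimum is the middle cell: it cuts the chain in half, isolating the two high-ignition ends, which is exactly the kind of spatially non-obvious answer the integer program produces at scale.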

(Question being asked)

Uh huh. Okay. Right. Well, this was one-shot as opposed to sequential decision making. So here we just get one time step at which we're allowed to take actions, and from there on out nature gets all the moves in the game. So, that's the sense in which it's a single decision, a single one-shot plan, totally up-front planning in other words. And there are a lot of problems in ecology where we end up having to take that view, where we're just going to say we want to buy all the following territory. So,

we’ve looked at some, there’s an endangered species called the Red Cockaded Woodpecker

that I believe is here in North Carolina. And my post doc Dan Sheldon did some very

nice work where the question was there are two pockets of this species, one I think at

Camp Legune and the other in the Palmetto Palm Reserve or something like this. And the

question was could they buy a series of intermediate sites to encourage those two species to mix

and have some genetic flow between them. So, it’s a problem of basically trying to encourage

flow instead of trying to prevent flow. And they were able to also formulate this and

solve it for the one-shot case in terms of building a network that would maximize flow

subject to budget constraints.

But the real problem is, you can't buy all the property at once. You don't have the money, and it isn't all available. So, you really need to be online, and every year take some actions that you can afford, to keep moving toward that objective.

So, turning that into a Markov decision problem, or what's often called adaptive management in the environmental literature, that's still an open problem. We don't know how to do that.

(Question being asked)

That, yeah, that is a good question. And we do wonder whether there is some way we could come up with some set of spatial basis functions that would let us, for instance, represent…suppose we had an optimal policy for laying out treatments in a landscape, but we could only compute it for a particular fixed landscape. Could we somehow generalize from that to a more general policy? Maybe some kind of set of spatial basis functions would allow us to do that. And the same is true for looking at, yeah, the sort of structure of the landscape.

There’s certainly a lot of work done in atmospheric sciences and weather where they

basically use PCA to create a set of basis functions that they can use then to approximate

a lot of things. So, it’s something we’d like to explore more.

(Question being asked)

I’m sorry. Right. Well particularly here we’re intervening in the system, and so

yeah the trouble is that we, you have this research base where if I take these actions

then these fires will burn. If I take these actions something else will happen. And you

end up having to do exponentially many simulations just to simulate one set of circuitry. And

so obviously we have to rely on some kind of sampling or some kind of way of capturing

the spatial scale where we beyond which we can ignore the spatial components. It’s

not clear really how to proceed.

(Question being asked)

Well that is a very good question. Right now we’ve mostly been looking at just this one

site. And we have the weather data and all the data about the sites, which we need to

be able to do the work. And it’s a good question whether they are generalizable lessons

that you could take away from this. I also have some projects in invasive species management, and we're asking the same question there. And often it's kind of disturbing.

I mean, you get a solution like this big map here, wherever it was, that says, well, these are the optimal places to treat. But is there any pattern to that? Is there any way we could explain it as a sort of set of rules that we could apply to a different situation? How could we generalize from this particular landscape?

And we need to do that just to explain it to our domain experts. And obviously policy makers are not going to be happy just being told, well, it's optimal, our algorithm said so. Particularly because we won't be able to say that; we'll have to say it's approximately optimal, but we don't know how bad, or something like that. And so we're really going to need to be able to give them some qualitative understanding, and let them be able to play with it, and modify it, and explore, and understand, you know, how good it is. And that's a huge challenge: once you've done ten million simulations, what lesson can you take away from it? Okay.

So, I’ve got a question that maybe ducktails with that.

Uh huh.

So, to what extent do you feel like these techniques and the recommendations or policies

that you’re producing using these techniques are getting traction with the people who are

actually implementing policy decisions and you know is it something where you feel like

you’re having impact now, you feel like maybe it’ll be five years, ten years, how,

you know what time scale are we talking about here?

Um I would guess in five to ten years. I mean we’re very fortunate with the forest situation

that we have some of the Forest Service people on our team. And a lot of them are former

students of Claire Montgomery who was on the team. And so we have a nice working relationship

with them. But the question is, and that is a good question, whether they would ever be able to execute our particular policies. I think one of the main things we're trying

to do is give them backup ammunition for being able to support the actions that they are

taking. Right now the idea that they might want to treat the landscape in a particular

way or in a related problem they might want to let a fire burn instead of suppressing

it, that’s an extremely controversial politically difficult decision. If we could provide some

analysis that shows that yes, under a wide variety of scenarios that would be a, it would

be better to let this fire burn, or it’s better to treat this than those other things.

And that might help them persuade their stakeholders to go along with it. Of course another thing

that would help them persuade their stakeholders is if we could say well and for these small

communities that have timber mills we can also guarantee you a certain economic benefit

from doing this. And so there's perhaps a whole set of economic objectives that we would like to have. We'd maybe also like to have a whole bunch of endangered-species habitat objectives. So, the real problem, you know, gets messier and messier. But we won't

be able to attack any of those unless we really can come up with a methodology that works

for these problems.

What you just laid out is a hard scenario for any algorithm to work on, you know, for a procedure to optimize. But as it is, it has to be optimized by humans. I mean, in other words, there are people actually making decisions about whether or not to let a fire burn. And they have to process all of it. So…

Right.

I mean.

Well mostly they are not letting fires burn, because it’s just too risky and plus the

firefighting money doesn’t come out of their budget. It’s somebody else’s budget. So,

there’s not really an incentive for them. For the fuel treatment though you’re right.

Right now they are making some guesses about where to treat, trying to balance all of these issues, and I would say they're not very happy with that. They would like some more rational basis for making those decisions.

Yeah. I guess my point was you may not have to get optimal. You may just have to do better

than humans guessing.

Right. Well, but we have to convince them that we are doing better yeah. And that comes

into a lot of this broader contextual thing as well.

Yes.

You sort of apply the basic approach. Could you just take the particular plans or policies that they are using, or thinking of using, as a prior and then go from there, and simplify your model because you're working from a targeted assumption…

Uh. Huh.

base.

Oh that’s an interesting idea, yeah, would be to see if we could in some sense model

what they’re doing and then ask locally how could we improve it, without maybe without

walking too far away from it so it doesn’t look so strange or threatening. No we hadn’t

thought about that, but that’s an interesting idea. Okay.

Thanks.

Well thank you very much. My pleasure.