This video treats a topic called absorbing Markov chains. An absorbing Markov chain is
a chain where’s there’s at least one state you get stuck if you get there.
A state like that is called an absorbing state. It’s a state that has the property that
once you get into the state it’s impossible to leave it.
Here’s an example of an absorbing Markov chain where state number one is an absorbing
state. If you look at the transition probabilities corresponding to the first row you see that
when you are in state one the probability is one that you’ll remain is state one and
zero that you’ll go to either one of the other two states. In this example state number
one is an absorbing state.
Let’s think about tracking students through a university. We have the four standard grade
classifications (freshman, sophomore, junior, and senior). When a student successfully completes
school they enter the graduated state and, of course, some students drop out. We’ll
have the four states corresponding to the four class years, we’ll have a graduated
state, and we’ll have a dropped out state. The data that we’re going to use in our
example is what’s shown right here in the transition diagram. Let’s look at the transition
diagram a bit just to make sure we’re interpreting the numbers correctly. For someone who enters
as a freshman, during their first year there’s a ten percent chance that they will fail their
year and remain classified as a freshman the following year. There’s an eighty percent
chance they’ll successfully complete their freshman year during that year’s time and
move on to become a sophomore. Or there’s a ten percent chance that they’ll drop out.
Notice that we don’t have an arrow going from freshman to the graduated state because
you can’t graduate in one year. Nobody that’s a freshman is graduated at the end of one
year. Read through the rest of the transition diagram and make sure that you’re clear
on all the transition probabilities that are shown here. With seniors, for example, ninety
percent of seniors graduate, five percent drop out, and five percent of them have to
repeat their senior year. Notice also that nobody ever moves backward. Ninety percent
of juniors become seniors, but of course once you’re a senior you never go back to being
a junior again. Now what about the graduated and dropped out states? We’re going to treat
both of those as absorbing states. If you look at the graduated state we have a one
on the arrow pointing from G back to G (which says that if somebody has graduated then at
the end of the next year they remain graduated). We’re going to treat dropouts the same way.
If a student drops out then they’re still dropped out at the end of the next year. In
other words, they don’t get readmitted to school. That’s a bit unusual. Most schools
of course don’t really have a policy like that, but for purposes of our example we’re
going to treat both graduated and dropped out as absorbing states.
That leads to a transition matrix that has the pattern shown here. I’m listing the
two absorbing states (graduated and dropped out) as states one and two, and then the other
states appear after that. Notice that the matrix has a particular pattern to it. If
we list our absorbing states first, then we get a little identity matrix up in the top
left hand corner. The size of that piece will be determined by how many absorbing states
we have, and then this upper right hand corner has nothing but zeros in it. Then we have
these other two pieces down here. The transition matrix splits into those four blocks. That
will always be the case if you have absorbing states and list your absorbing states first.
We’re going to be working with those four blocks, so we in fact are going to be giving
them names. Since this upper left hand corner is an identity matrix of some size determined
by how many absorbing states we have, we’ll called it I. This top right hand corner is
just in this instance it’s two rows and four columns for this little matrix up in
the corner. It’s a zero matrix of some size depending on how many absorbing and non-absorbing
states we have. This bottom left hand piece, which in this instance has four rows and two
columns, is matrix we’ll call R. Then the bottom right hand chump, which will always
be a square matrix (in this instance four by four) we’ll call Q. Now what we’re looking at on this slide
is the transition matrix raised to the third power (approximately). I’ve done some rounding
off here just because some of the decimal expressions were getting too long. This is
approximately T-cubed. Let’s think about the meaning interpretation of some of these
numbers that appear in the matrix T-cubed. For example, the point six one two here, that’s
the probability. Remember we learned back in section six point two that for T cubed
the numbers in T-cubed can be interpreted as the three step transition probabilities.
This point six one two is the three step transition probability from the freshman state to the
senior state. Which to put in more natural language would simply be saying it’s the
probability that someone who’s a freshman will become a senior in three years. That
would be where a freshman would be in three years if they stay on track, right? Because
the first year they would go from freshman to sophomore, the second year from sophomore
to junior, and the third year from junior to senior. This point six one two is the probability
that someone who starts as a freshman does make it to being classified as a senior within
the three years time frame, which they would make it to that if they were operating on
schedule. This little left hand bottom piece is of special interest because it indicates
the probabilities of hitting each of the two absorbing states (graduated and dropped out).
For example, for someone who starts as a freshman, we’re looking at three step probabilities
so this would say someone who starts as a freshman can’t possibly graduate in three
years, but there will be a point one nine three probability that someone who starts
as a freshman will drop out within their first three years. Someone who’s classified as
a sophomore within three years has almost a seventy percent chance of graduating (about
a fourteen percent chance of dropping out). Someone who’s a junior has a ninety percent
chance of graduating within the following three years and about a ten percent chance
of dropping out (and so forth). If you’re interested in which state a student winds
up in, this portion of T-cubed is of particular interest if you’re tracking the student
for three years and want to know what has become of them by the end of that time.
An interesting question and probably the most important question that can be asked is if
you’re looking at the distribution of data that we have in the transition matrix, what
kind of techniques can you use to determine the long term probabilities for entering each
of the absorbing states? It’s obvious that you would want to know that, isn’t it? Someone
who’s thinking about going to that school would be very interested in the question “If
I go to that school what are the chance I’ll eventually graduate?” There’s hardly any
information that would be more relevant than the answer to that question.
Fortunately there is a recipe for doing this and we’re going to track through that recipe
which involves some matrix information. Remember we slice the matrix up into the four pieces
shown here. Step one of the recipe is we’re going to take this bottom right hand chunk
Q, which in our example is four by four, and subtract it from the identity matrix of the
same size. In this matrix expression, I minus Q, we’re not really talking about this minus
this because their different sizes. This identity matrix being identified in this picture is
a two by two identity matrix. Understand we’re talking about I minus Q, we’re talking about
this four by four matrix Q and we’re talking about subtracting Q from the four by four
identity matrix because you have to have two matrices the same size if you’re going to
subtract them. The first step is a simple matrix subtraction. Four by four identity
matrix minus this four by four identity matrix Q. The next step is to find the inverse of
that. Having done the subtraction to find the inverse of that, and that’s what’s
called N. The matrix N is the inverse of that matrix we computed in step one. The final
step in the recipe is to multiply the product N times this matrix R, and that product is
called B. Let’s think about the sizes to make sure everything makes sense here. Q is
four by four, I minus Q is four by four, so the inverse of that will also be four by four.
N is a four by four matrix. In this example R is a four by two matrix, so we’ll be multiplying
a four by four matrix times a four by two matrix (which of course is possible), and
we’ll get when you multiply a four by four times a four by a two you get a four by two
matrix. The matrix B that we’re calculating in step three will in this example be a four
by two matrix (four rows and two columns).
In this example when you carry out the recipe for I minus Q we get the matrix shown here.
R we don’t have to do any calculation to get that, we just copy it down from the original
transition matrix. We know what R is. Remember we have to find the inverse of this guy. You
can find the inverse of that using the matrix tool and you come out with a matrix, which
rounded off a bit is shown at the top. The final multiplication (N times R), this four
by four times the four by two matrix R that we saw in the previous slide, and we come
out with the product matrix B that looks something like what’s shown here. This is the matrix
that contains a lot of valuable information.
To interpret that information what we do is we label the rows of B with the names of the
non-absorbing states in the same order that we had listed them in the original transition
matrix (freshman, sophomore, junior, and senior). These were states three, four, five, and six
in the original transition matrix. We label the rows with those same states in the same
order and we label the columns with the names of the two absorbing states. Again taken from
the original transition matrix keeping the same order. We originally had these as states
one and two in the original transition matrix. They’re are our column labels. With this
labeling, the numbers in this matrix B can be interpreted as the long term probabilities
from going from any of the non-absorbing states into any of the absorbing states. For example,
the point seven five three says that there’s about a seventy-five percent chance that someone
who enters as a freshman will eventually graduate. About a twenty-five percent chance they’ll
drop out instead. The point eight nine eight in it’s position is saying that for someone
who is currently a junior, there’s almost a ninety-percent chance that they will eventually
graduate and only about a ten percent chance that they will drop out. Obviously the long
term prospects of a junior are better than the long term prospects of a freshman simply
because they’re already much closer to graduation. The important point is that the numbers in
the matrix B show the probability of eventually arriving in each of the absorbing states depending
on which column you’re looking at from any of the non-absorbing states depending on which
row you’re looking at. The point one zero two is the probability that someone who is
classified as a junior will eventually drop out rather than graduate.
Now to move to an entirely different example. Sally and Becky are playing tennis. Deuce
is a score in tennis which says that in the game that’s being played, the two players
are tied with the same number of points and then at that point someone has to get two
points ahead in order to win the game. Now the assumption we’re making in this example
is that when the game is tied at deuce, Sally has two-thirds probability of winning the
next point and Becky has one-third probability of winning the point. On the other hand when
Sally has advantage, when she is a point ahead, then he has probability three-fourths of winning
the next point. When Becky has advantage, that is when Becky’s a point ahead, she
has probability one-half of winning the next point. The probabilities of either of them
winning the next point depends on what state they’re in relative to the score. We’re
going to set this up as a Markov chain with the absorbing states being Sally wins the
game and the game is over or Becky wins the game and the game is over or Sally’s advantage
(where Sally is a point ahead) or Becky’s advantage (where Becky’s a point ahead)
or deuce (where the score is tied). The play can go on for a long period of time. It has
to go on until somebody gets two points ahead (that’s the nature of the game of tennis).
One other question we’ll look at is if the game is at deuce, find how long the game is
expected to last and the probability that Becky wins. The second question we’ll look
at is if Sally has advantage (if Sally’s a point ahead) what’s the probability that
she eventually wins the game?
We set up names for the five states. The game can be at deuce, it can be Becky’s advantage
(Becky a point ahead), Sally’s advantage (meaning Sally’s a point ahead), or Becky
wins (which is an absorbing state because at that point the same is over), or Sally
wins (which is the other absorbing state because, again, the game is over is someone has won).
The transition diagram is as shown here based on the probabilities that we saw in the original
setup. If the game is tied at deuce there’s a two-third probability that Sally wins so
the score goes to Sally’s advantage. There’s a one-third probability that Becky wins the
point so it goes to Becky’s advantage. If it’s at Sally’s advantage there’s a
three-fourth probability that Sally wins the point (so Sally wins the game). There’s
a one-fourth probability that Sally wins the point, so it goes back to deuce. If it’s
Becky’s advantage then there’s a one-half probability that Becky wins the point (so
Becky wins the game). There’s a one-half probability that Sally wins the game so the
score goes back to deuce. Check all the number here to make sure that you agree that the
transition diagram we’ve drawn is consistent with the information we had to start out with.
Putting those numbers into a transition matrix (the transition matrix is the one that’s
shown) the top left corner is two by two identity because we again have two absorbing states
(all zeros in the top right piece). The lower left block R has three rows and two columns,
and the matrix Q in this example is a three by three matrix.
Following the recipe, we start with Q, subtract Q from the three by three identity matrix,
which takes us to here. R is just the lower left block we’ve copied over, and then we
find the inverse of R (a three by three matrix), which is the matrix shown right here.
The multiplication that’s done is to multiply that inverse matrix N times the block R (which
was taken from the lower left corner of the original transition matrix) and the result
of that matrix multiplication is the three by two matrix shown here. The number of rows
is the same as the number of non-absorbing states that we had in the original transition
matrix and the number of columns is the same as the number of absorbing states.
We label this matrix B with the row labels coming from the non-absorbing states in the
same order as in the original transition matrix and the columns being the absorbing states.
N itself is a square matrix and we label the rows and columns of N with the names of the
non-absorbing states. All the information we’re interested in, in problems that involve
absorbing Markov chains comes from one of these two matrices. The long-term probabilities
come from the matrix B. The meaning of the one-fourth is if the game is tied at deuce
(which is one of the non-absorbing states), then the long-term probability that Becky
wins the game is one-fourth. The long-term probability that Sally wins the game is three-fourths.
Another way of saying that in a natural fashion is to say that if the game is at deuce, Becky
has a one-fourth probability of eventually winning the game whereas Sally’s probability
of eventually winning the game is three-fourths. If it’s Becky’s advantage, then the chances
that Becky eventually wins the game is five-eights and Sally’s, three-eighths (and so forth).
This is where we go to get the long-term probabilities based on whatever assumption we want to make
on whichever non-absorbing state we happen to be in. We’re also in this example going
to illustrate the use of the matrix N. The numbers that appear in N represent long-term
expected values. For example, the three-fourths says that if the game is at Becky’s advantage
(Becky’s a point ahead) then the expected value for the number of times the game will
be at deuce prior to somebody winning the game is three-fourths. The five-fourths in
the bottom right hand corner says if the game is at Sally’s advantage (so if Sally is
a point ahead) then the expected number of times for that to be the score (in order words
the expected number of times that the game will be at Sally’s advantage prior to somebody
winning) is five-fourths. Notice that all of these numbers are bigger than one. They
have to be at least big as one because, for example, if you’re at Becky’s advantage
then it’s already one time that you’re at the score (so you’ll be at that score
at least that one time before the game ends). In fact this number is one and a quarter (one
point twenty-five), so the expected value for the number of times you’ll be at that
score before the game ends is a little bit bigger than one because it could go from Becky’s
advantage back to deuce and then back to Becky’s advantage again. The other interesting thing
about this matrix is that you can add up the numbers in any row to get the long-term number
of steps for the game to end. Looking at row-two, we’re looking at the scenario where it’s
tied at deuce. On average the score will be Becky’s advantage point five times before
the game ends. It’ll be deuce one point five times before the game ends and it will
be Sally’s advantage one time before the game ends. If you add up all of those numbers
you have the expected value for the total number of points played prior to the game
ending. In this case you add up the numbers in the second row and you get three. That’s
saying that if the score is tied at deuce, then on average they’re going to play three
point more in order for the game to end. The game could end in two points. Sally could
win the next two points and it would be over, but it might go back and forth and back and
forth and back and forth. They might play a large number of points before the game ends.
It averages out to three. If it’s at deuce the number of points played for the game to
end averages out to three. You could similarly add up the first row or the third row to find
the average number of points to be played if you’re starting from Becky’s advantage
or from Sally’s advantage.
If you go back to the example of tracking students and think about that same kind of
question, in that example the matrix N turned out to be what’s shown on this slide. If
we go back and label N with the labels form the non-absorbing states (freshman, sophomore,
junior, and senior), then what you have is the average number of years spent at each
of the classification levels (prior to either dropping out or graduating). Let’s look
at some of these numbers and interpret them. Let’s look at a freshman. What does this
number say? It says on average a freshman will spend one point eleven years classified
as a freshman prior to either dropping out or graduating. We know there’s one year
because they’re already a freshman right now. That year will contribute one year, but
there’s a possibility of more than one because they may fail their freshman year and have
to repeat it again. On average freshman spent one point eleven years classified as freshman.
What about this point eight eight four? The freshman might drop out before ever making
it to the junior year so they may not transition through the junior year (which is why that
number’s less than one). There’s no guarantee that they’ll be ever classified as a junior.
Of course they may be a junior more than one year. They may be a junior and fail that year
and have to repeat as a junior the next year, so it’s possible that a freshman could spend
more than one year as a junior (but it averages out to a little less than one year because
some freshman never make it that far), and even less for a senior. Why is there a zero
down in this spot? Well if somebody’s already a junior, then the expected number for the
expected number of additional years they’ll spend as sophomore is zero. They’ll never
go backwards to being a sophomore. If someone is a senior, why is this number a little bit
bigger than one? Well, they’re already guaranteed their one year as a senior since they’re
already a senior and if they fail their year they’ll have to repeat as a senior the next
year also. That has a small probability of happening, which means the average will be
a little bit more than one year that a senior will be in the school. Let’s think about
the row sums. If you add up the numbers in the first row you get about three point eight.
What is that saying? It’s saying that for somebody who’s a freshman, if you add up
the expected value for the number of years they’ll spend as a freshman, sophomore,
junior, and senior, you get about three-point eight as the expected value for the number
of years they’ll spend in school. Of course it will take them four years at least to graduate.
For someone who’s graduating it will be one year as a freshman, one as a sophomore,
one as a junior, and one as a senior guaranteed. Possibly additional years if they have to
repeat any grades. But on average how can this average out to be less than the four
years required for graduation? Well it averages out of that simply because some students drop
out before graduating (so they’re in school for a much shorter time). Remember these are
averages (expected values). For somebody who’s a sophomore, of course if you look at the
number of years to graduate, if they’re going to graduate then they’ll spend the
sophomore, junior, and senior years doing it. That’s at least three years. Not all
of them will graduate but for somebody who has made it successfully to the sophomore
year, the expected value now turns out to be a little more than three because the chances
of somebody repeating slightly more than offsets the possibility of dropping out. Some of the
sophomores will spend less than three years because they drop out. Some will spend more
because they have to repeat. It averages out to a little bit more than three years, and
similarly the junior row sum indicates that on average somebody who’s a junior will
spend a little more than two years in the school and for seniors, one point O point
three.