Lecture 2
How good is this assumption? What are some places where it doesn’t work? Does it work in the stock market? What about weather? Remark: What’s a good non-Markov process? For example, suppose you had $N$ closed boxes, and a prize in one of them. Call the location of the prize $Y$. Let $\{X_n\}_{n=1}^N$ be the process where, at each step, you pick uniformly at random from the unopened boxes until you hit the prize. Once you hit the prize, $X_n$ stays put. Given that the prize has not been hit, the next box $X_{n+1}$ depends on the whole past sequence of boxes opened, not just on $X_n$. So this ought to be non-Markov, barring some freak accident. But anyway, this is a pain in the butt to check.
Here’s a simpler example. Suppose you toss a fair coin; this is your zeroth toss. If it shows heads, you pick up a coin with probability $p$ of heads and continue to toss it. If it shows tails, you pick up a coin with probability $q$ of heads and continue to toss it. Let $X_i \in \{ H, T \}$.
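A quick simulation sketch of why this sequence fails the Markov property when $p \neq q$ (the values $p = 0.9$, $q = 0.1$ and the helper function are made up for illustration): the chance of heads on toss $2$, given heads on toss $1$, still depends on toss $0$.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(p, q, n_tosses, n_runs):
    # X0 is a fair toss; heads -> use the p-coin afterwards, tails -> the q-coin.
    x0 = rng.random(n_runs) < 0.5             # True = heads
    coin = np.where(x0, p, q)                 # bias of the coin used from toss 1 on
    tosses = rng.random((n_runs, n_tosses)) < coin[:, None]
    return x0, tosses

x0, t = simulate(p=0.9, q=0.1, n_tosses=2, n_runs=200_000)
x1, x2 = t[:, 0], t[:, 1]

# If (X_n) were Markov, P(X2=H | X1=H) could not depend on X0.
a = x2[x1 & x0].mean()        # estimates P(X2=H | X1=H, X0=H), ~ p
b = x2[x1 & ~x0].mean()       # estimates P(X2=H | X1=H, X0=T), ~ q
print(a, b)
```

The two conditional frequencies come out near $0.9$ and $0.1$ respectively, so conditioning on $X_1$ alone does not determine the future.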
Lecture 3
How would you understand the long-term behavior of the system? You would let $n \to \infty$. Suppose the limit $\lim_{n \to \infty} \phi P^{n} = \pi$ exists, where $\phi$ is the initial distribution; then $\pi$ is a vector of probabilities. Taking one more step, $\pi P = \lim_{n \to \infty} \phi P^{n+1} = \pi$, so $\pi$ is unchanged by the dynamics.
That’s why $\pi$ is called the invariant probability distribution.
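A minimal numpy sketch of this (the $2 \times 2$ matrix is a made-up example): iterate $\phi P^n$ for large $n$, and compare with the left eigenvector of $P$ for eigenvalue $1$.

```python
import numpy as np

# A small example chain (made-up numbers, just for illustration).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

phi = np.array([1.0, 0.0])          # start in state 0
pi_limit = phi @ np.linalg.matrix_power(P, 100)

# The limit satisfies pi P = pi: it is a left eigenvector with eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi_eig = np.real(v[:, np.argmax(np.real(w))])
pi_eig = pi_eig / pi_eig.sum()      # normalize to a probability vector

print(pi_limit, pi_eig)             # both ~ [0.8, 0.2]
```

The eigenvector computation is how you would find $\pi$ without taking a limit; the power iteration shows the limit agrees with it.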
Lecture 4
Remark I will give them Problems 1.1, 1.2, 1.3 and 1.4 to solve in class. This is an in-class tutorial.
Lecture 5
Let’s introduce two important examples, the random walk with reflection and the one with absorption.
Ex1: States $\{0,\ldots,4\}$. Interior states move to either neighbor with probability $1/2$; $0 \to 1$ with probability $1$ and $4 \to 3$ with probability $1$.
Ex2: States $\{0,\ldots,4\}$. Same interior moves, but $0 \to 0$ with probability $1$ and $4 \to 4$ with probability $1$.
Note the periodicity of the states. What’s the long-time behavior of $P^n$ in both these examples?
Ex3: Suppose $S = \{0,1,2,3,4\}$ and simply write down a transition matrix that splits into two sets of states.
I will show them a video of Plinko and then do the random walk with reflected barriers. Here’s a video with skinny Drew Carey.
Then I will do an introduction to Python. Tell them to get help online from PDFs, to use Stack Overflow and Stack Exchange to ask questions, and then to come to me with questions. Import numpy and scipy, two numerical libraries.
The random walk shows them clear periodicity. Show them how to do the random walk in Python too.
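A sketch of that simulation, assuming the reflecting walk of Example 1 (interior moves equally likely). Every step changes the parity of the state, which is the periodicity they should see.

```python
import numpy as np

rng = np.random.default_rng(1)

# Reflecting random walk on {0,...,4}: +-1 with prob 1/2 inside,
# forced back in from the boundary.
P = np.zeros((5, 5))
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 1] = 1.0
P[4, 3] = 1.0

def run_chain(P, x0, n):
    # Simulate n steps by sampling each step from the current row of P.
    x, path = x0, [x0]
    for _ in range(n):
        x = rng.choice(len(P), p=P[x])
        path.append(x)
    return np.array(path)

path = run_chain(P, 0, 20)
print(path)

# Parity flips at every step, so the chain has period 2:
print(np.linalg.matrix_power(P, 7)[0, 0])   # 0.0 -- no return to 0 in odd time
```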
Lecture 6 This sort of equivalence relation allows us to partition any space into separate classes. You can do it to groups, vector spaces, whatever. So usually you would write this as $S / \sim$, where $S$ is the state space.
Irreducibility. If the chain consists of only one communicating class, then the chain is irreducible. Recall Examples $1$ and $2$. Notice that any matrix satisfying $p(i,j) > 0$ for all $i, j \in S$ is irreducible. Remember how we showed these two properties. Example $1$ (reflecting RW) does not satisfy this hypothesis (it has period $2$, as we shall see), but has one communicating class. Example $2$ (absorbing RW) has three.
Classes can also be either recurrent or transient. Again, remember the absorbing random walk, which illustrates these two.
Write down the general transition matrix of a Markov chain in block form. It must have at least one recurrent class, and possibly some transient states.
Remark Maybe as HW it is good to show that a Markov chain starting in a recurrent class never leaves it.
Lecture 7 I made a mistake in the periodicity proof. To fix this, simply draw a picture.
Remark I ought to give them a bunch of gap filling exercises in the new HW.
Irreducible, aperiodic chains. Big theorem. In the big theorem, note that it says that $\pi(i) > 0$. Where does this come from? It comes from the Perron-Frobenius theorem.
Remark Is there any easy way to directly prove that $\pi(i) > 0$?
In any case, the proof goes as follows. For each state $i$, there is an $r(i)$ such that $p_{n}(i,i) > 0$ for all $n \ge r(i)$. This is because $d = 1$; this is where aperiodicity is used. Then for each pair there is an $m(i,j)$ such that $p_{m(i,j)}(i,j) > 0$, because the chain is irreducible (there is only one communicating class). Pick $n = \max_{i} r(i) + \max_{i,j} m(i,j)$. Then $p_n(i,j) \ge p_{n - m(i,j)}(i,i)\, p_{m(i,j)}(i,j) > 0$, since $n - m(i,j) \ge \max_i r(i) \ge r(i)$: we ensured that we can always return to $i$, and then make the jump from $i$ to $j$! This means that all the entries of $P^n$ are positive.
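A quick numerical sanity check of the conclusion, on a small made-up irreducible, aperiodic matrix (the self-loop at state $0$ kills periodicity):

```python
import numpy as np

# Irreducible: 0 -> 1 -> 2 -> 0 is a cycle. Aperiodic: 0 has a self-loop.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

# Find the first n with all entries of P^n strictly positive.
for n in range(1, 10):
    Pn = np.linalg.matrix_power(P, n)
    print(n, (Pn > 0).all())
```

For this chain the power $P^4$ is the first with every entry positive, matching the theorem's promise that some finite power works.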
Reducible or periodic chains. Divide into recurrent classes and transient states. Assume for the time being that each recurrent class $R_k$ has a stationary distribution $\pi_k$. Then $p_n(i,j) \to \pi_k(j)$ if $i,j$ are in the same recurrent class. What happens if $i$ and $j$ are in different recurrent classes? What if $j$ is a transient state? What happens if $i$ is a transient state and $j$ is in some recurrent class? In this last case, let $\alpha(k)$ be the probability that the chain started at $i$ ends up in recurrent class $k$. Let $j \in R_k$, the $k$\textsuperscript{th} recurrent class. Then $p_n(i,j) \to \alpha(k)\, \pi_k(j)$.
Now we return to the periodic behavior. Show them the python notebook with the reflecting states. In this case the stationary distribution does not represent the limit of $P^n$, which oscillates. In fact, only the time averages converge: $\frac{1}{n} \sum_{m=0}^{n-1} p_m(i,j) \to \pi(j)$.
Write down the generalization to period $d$.
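A sketch of what the notebook shows for the reflecting walk (period $2$): $P^n$ oscillates between two limits, but the running averages $\frac{1}{n}\sum_{m=0}^{n-1} P^m$ converge to $\pi$; for this chain one can check via detailed balance that $\pi \propto (1,2,2,2,1)$.

```python
import numpy as np

# Reflecting walk on {0,...,4} (period 2).
P = np.zeros((5, 5))
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 1] = 1.0
P[4, 3] = 1.0

# P^n oscillates between an even-time and an odd-time limit...
print(np.linalg.matrix_power(P, 1000)[0])
print(np.linalg.matrix_power(P, 1001)[0])

# ...but the Cesaro average (1/n) sum_{m<n} P^m converges to pi.
n = 2000
avg = sum(np.linalg.matrix_power(P, m) for m in range(n)) / n
print(avg[0])   # each row ~ pi = [1/8, 1/4, 1/4, 1/4, 1/8]
```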
Remark Note also that $\pi(i)$ represents the long-run fraction of time spent in state $i$.
Suppose $X$ is an irreducible chain.
Let $Y(j,n)$ be the amount of time spent in state $j$ up to time $n$, that is, $Y(j,n) = \sum_{m=0}^{n} \mathbf{1}\{X_m = j\}$. Then compute $\frac{1}{n} \E[ Y(j,n-1) \mid X_0 = i ]$, because it’s related to the transition probabilities. In fact, $\frac{1}{n} \E[ Y(j,n-1) \mid X_0 = i ] = \frac{1}{n} \sum_{m=0}^{n-1} p_m(i,j)$.
But we’ve shown (including in the periodic case) that these averages must go to $\pi(j)$!
Remark I ought to give this as an exercise.
Now we will relate the stationary distribution to the return time to a state. This is a beautiful argument. Define $T$ to be the return time to state $i$, and let $T_k$ be the time of the $k$th return. The excursion lengths $T_k - T_{k-1}$ are i.i.d., so by the law of large numbers $k^{-1} T_k \to \E[T]$. In other words, we have $k$ returns to state $i$ in approximately $k \E[T]$ steps. This means that the fraction of time we’ve spent in state $i$ is approximately $k / (k \E[T]) = 1/\E[T]$, so $\pi(i) = 1/\E_i[T]$.
Do the two-state example. The distribution of the two-state example can be written down explicitly.
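A sketch of the two-state example with made-up parameters $a, b$: for $P$ with off-diagonal entries $a$ and $b$, the stationary distribution is $\pi = (b/(a+b),\, a/(a+b))$, and simulated return times to state $0$ check the identity $\pi(0) = 1/\E[T]$.

```python
import numpy as np

rng = np.random.default_rng(2)

a, b = 0.3, 0.2                       # made-up transition parameters
P = np.array([[1 - a, a],
              [b, 1 - b]])
pi = np.array([b, a]) / (a + b)       # solves pi P = pi

def return_time(P, i):
    # Steps until the chain, started at i, first comes back to i.
    x = rng.choice(2, p=P[i])
    t = 1
    while x != i:
        x = rng.choice(2, p=P[x])
        t += 1
    return t

T = np.array([return_time(P, 0) for _ in range(100_000)])
print(1 / T.mean(), pi[0])            # both ~ 0.4
```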
Remark: Again, this is HW, because it gives a good review of taking expectations.
Next time, transient states, gambler’s ruin etc.
Lecture 9
Announce that transient states are not included. Remind them about the quiz this Friday.
Draw a matrix with at least two recurrent classes and some transient states.
We wanted to ask the question: suppose there are at least two different recurrence classes. Then, if we want to find $\alpha(t, r_1)$, the probability that the chain started at transient state $t$ is absorbed in class $r_1$, we get a cool formula for $A$, the matrix of the $\alpha$’s, by conditioning on the first step. Writing the transition matrix in block form with $Q$ (transient to transient) and $R$ (transient to recurrent), this gives me $A = QA + R$, i.e., $A = (I - Q)^{-1} R$.
This is quite an interesting formula, which I can’t seem to get directly!
Then do the absorbing random walk again, starting from state $1$. The probability of absorption at $0$ should come out to $3/4$.
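A sketch of that computation, using the standard first-step identity $A = (I-Q)^{-1}R$ with transient states $\{1,2,3\}$ and absorbing states $0$ and $4$:

```python
import numpy as np

# Absorbing random walk on {0,...,4}: 0 and 4 absorb, +-1 w.p. 1/2 inside.
# Block form: Q = transient -> transient, R = transient -> recurrent.
Q = np.array([[0.0, 0.5, 0.0],     # transient states 1, 2, 3
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
R = np.array([[0.5, 0.0],          # columns: absorbed at 0, absorbed at 4
              [0.0, 0.0],
              [0.0, 0.5]])

# Conditioning on the first step gives A = Q A + R, so A = (I - Q)^{-1} R.
A = np.linalg.solve(np.eye(3) - Q, R)
print(A[0])    # from state 1: [0.75, 0.25]
```

The first row confirms the $3/4$ probability of absorption at $0$ starting from state $1$.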
As a final example, I ought to do Gambler’s ruin. For Gambler’s ruin, I need to show them how to solve difference equations. This I will do on Wednesday.
Gambler’s ruin, SRW on a circle, Urn model, Cell Genetics and Card Shuffling. Remark: I didn’t do the expected time to absorption. This should be on HW too.
Random walk on a circle: some nice questions about the cover time of the circle, with a nice mix of conditioning and related tricks.
Urn model. It would be good to show that the stationary distribution is the binomial distribution. Remark: should be on HW.
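Assuming the urn model meant here is the Ehrenfest urn (pick a ball uniformly at random and move it to the other urn), a short check that Binomial$(N, 1/2)$ is stationary:

```python
import numpy as np
from math import comb

# Ehrenfest urn with N balls: state = number of balls in the first urn;
# moving a uniformly chosen ball gives p(k,k-1) = k/N, p(k,k+1) = (N-k)/N.
N = 6
P = np.zeros((N + 1, N + 1))
for k in range(N + 1):
    if k > 0:
        P[k, k - 1] = k / N
    if k < N:
        P[k, k + 1] = (N - k) / N

pi = np.array([comb(N, k) for k in range(N + 1)]) / 2**N   # Binomial(N, 1/2)
print(np.allclose(pi @ P, pi))    # True: the binomial is stationary
```

On paper this is a one-line detailed-balance check: $\binom{N}{k}\frac{N-k}{N} = \binom{N}{k+1}\frac{k+1}{N}$.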
Remark on the mixing time of Card Shuffling and tell them the story of Persi Diaconis.
Maybe show them the Google PageRank algorithm.
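If I do, a minimal power-iteration sketch would suffice (the four-page link graph and damping factor $d = 0.85$ here are made up): PageRank is the stationary distribution of a random surfer who follows a random outlink with probability $d$ and otherwise jumps to a uniform page.

```python
import numpy as np

# Tiny made-up link graph: page -> list of pages it links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, d = 4, 0.85

P = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        P[i, j] = 1 / len(outs)
G = d * P + (1 - d) / n               # mix in uniform teleportation

r = np.ones(n) / n
for _ in range(200):                  # power iteration: r <- r G
    r = r @ G
print(r, r.sum())                     # ranks form a probability vector
```

Note page $3$ has no incoming links, so its rank is just the teleportation mass; pages with more inlinks rank higher.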
I spent the previous week discussing exams and starting simulation from Chapter 11 in Ross. The exam had two problems, both on classifying states and drawing diagrams for Markov chains.
In the programming section, I’ve covered
Then I did sampling of uniform random variables from regions of the plane. This will be used to generate normal random variables in 2D.
Remark: Ask them in HW to show why this construction gives a uniform random variable.
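A sketch of the construction I have in mind, assuming it is the Marsaglia polar method: rejection-sample a uniform point in the unit disk, then transform it into two independent standard normals.

```python
import numpy as np

rng = np.random.default_rng(3)

def uniform_in_disk(n):
    # Rejection sampling: draw from the square [-1,1]^2, keep points in the disk.
    pts = np.empty((0, 2))
    while len(pts) < n:
        cand = rng.uniform(-1, 1, size=(2 * n, 2))
        cand = cand[(cand**2).sum(axis=1) < 1]
        pts = np.vstack([pts, cand])
    return pts[:n]

def polar_normals(n):
    # Marsaglia polar method: for (x, y) uniform in the unit disk with
    # s = x^2 + y^2, the pair (x, y) * sqrt(-2 log(s) / s) is standard normal.
    p = uniform_in_disk(n)
    s = (p**2).sum(axis=1)
    return p * np.sqrt(-2 * np.log(s) / s)[:, None]

z = polar_normals(100_000)
print(z.mean(axis=0), z.std(axis=0))   # ~ [0, 0] and ~ [1, 1]
```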
Finally, I asked them to simulate the reflecting random walk, and estimate the stationary probability by
a. directly simulating the Markov chain and looking at the proportion of time spent in each state;
b. sampling from the stationary distribution using ‘coupling from the past’.
Final Week of September I held a final week of programming, where they learned a couple of different ways to simulate Markov chains. Essentially, they looked at coupling from the past, which lets you sample directly from the stationary distribution.
Now, I’m moving onto countable Markov chains.
We finished countable Markov chains rather quickly, and I’m finishing up branching processes right now.
While reading about the branching process, I also learned about the things Francis Galton did. Quite remarkable! Eugenics, “regression towards the mean”, the Galton board (the Plinko we saw earlier in the course), and he was Darwin’s cousin.