MATH 5040/6810: Homework


HW 6

Due Dec 2

Do problems 3.1, 3.4, 3.6, 3.8, 3.11, and 3.12 from Lawler’s book.

3.1

a. From the Poisson distribution.
b. The number of calls in the second hour is independent of the first. Therefore,
c. We know the times $T_i$ are iid $\operatorname{Exp}(\lambda)$. Then
d. This is Bayes’ rule.
e. Similar to the above. Why do you think this is binomially distributed? Can you explain it?

3.4

It essentially follows from the hint. The last bit that one has to prove is the following. Since $P$ is the transition matrix of an irreducible, aperiodic Markov chain, we can write $P = Q D Q^{-1}$, where $D$ is a diagonal matrix with the eigenvalues $\{\lambda_1,\ldots,\lambda_n\}$ on the diagonal. We know that $\lambda_1 = 1$ and $|\lambda_i| < 1$ for $i \geq 2$. Then $A = a(P - I) = a\,Q(D - I)Q^{-1}$. Now it’s clear that $D - I$ is diagonal with entries $\{\lambda_1 - 1,\ldots,\lambda_n - 1\}$ on the diagonal. From this it follows that $A$ must have one $0$ eigenvalue and the rest must have negative real part.
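This eigenvalue bookkeeping is easy to check numerically. The rate matrix below is a made-up example (not the one from the problem): rows sum to zero, and $a$ is chosen larger in absolute value than every entry, as in the hint.

```python
import numpy as np

# Hypothetical rate matrix A: rows sum to 0, off-diagonal entries >= 0.
A = np.array([[-2.0, 1.0, 1.0],
              [3.0, -5.0, 2.0],
              [1.0, 4.0, -5.0]])

a = 6.0                  # larger in absolute value than every entry of A
P = np.eye(3) + A / a    # the stochastic matrix from the hint

# Eigenvalues of A = a(P - I): one is 0, the rest have negative real part.
evals = np.sort(np.linalg.eigvals(A).real)
print(evals)
```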

3.6

The solution is given in the accompanying python file. Simply find the Jordan decomposition of $A$ as $A = Q D Q^{-1}$. Then $e^{tA} = Q e^{tD} Q^{-1}$. The eigenvalues are $(0,-3,-7)$, from which we get $e^{tD} = \operatorname{diag}(1, e^{-3t}, e^{-7t})$.
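Since the accompanying python file isn’t reproduced here, here is a sketch of the same computation with a made-up $3 \times 3$ rate matrix (not the one from the problem): diagonalize $A$, exponentiate the eigenvalues, and cross-check against scipy.

```python
import numpy as np
from scipy.linalg import expm

# Made-up generator with rows summing to 0 (the problem's matrix lives in
# the accompanying file); this one happens to have eigenvalues (0, -5, -6).
A = np.array([[-3.0, 2.0, 1.0],
              [1.0, -4.0, 3.0],
              [2.0, 2.0, -4.0]])

# A = Q D Q^{-1}  =>  e^{tA} = Q e^{tD} Q^{-1}.
w, Q = np.linalg.eig(A)
t = 0.7
Pt_eig = (Q @ np.diag(np.exp(t * w)) @ np.linalg.inv(Q)).real

# Cross-check against scipy's matrix exponential.
Pt = expm(t * A)
print(np.sort(w.real))
```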

3.8

a. Again, computing using the python code, you get the eigenvector corresponding to the eigenvalue $0$. This is not normalized, so we normalize it to get the stationary distribution. b. You leave state $1$ with rate $1 + 1 + 1$, and therefore the time until you leave state $1$ is $T \sim \operatorname{Exp}(3)$. We know that $\E[ T ] = 1/3$. c. Make state $4$ absorbing, so you get a modified rate matrix. Let $T_4$ be the time to absorption into the $4$\textsuperscript{th} state, with $b(x) = \E[ T_4 \mid X_0 = x ]$. From page 74 in the textbook, solve the resulting linear system. Therefore, $b(1) = 1$. The $[1 \quad 1 \quad 1]^T$ notation means that you’re taking a row vector and taking its transpose to get a column vector.
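For part (c), the page-74 computation amounts to solving a linear system: restrict the rate matrix to the non-absorbed states and solve $A' b = -\mathbf{1}$, where $\mathbf{1} = [1 \quad 1 \quad 1]^T$. The $4$-state generator below is a made-up example, not the one from the problem.

```python
import numpy as np

# Made-up 4-state generator; state 4 (index 3) is absorbing (zero row).
A = np.array([[-3.0, 1.0, 1.0, 1.0],
              [2.0, -4.0, 1.0, 1.0],
              [1.0, 1.0, -3.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])

# Solve A' b = -1 on the transient states {1, 2, 3}.
Aprime = A[:3, :3]
b = np.linalg.solve(Aprime, -np.ones(3))
print(b)   # expected absorption times b(1), b(2), b(3)
```

In this made-up example each transient state feeds the absorbing state at rate $1$, so all the $b(x)$ come out equal to $1$.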

3.11

Since it seems like $\lambda_n > \mu_n$, we compute the usual series for transience. This is not summable in $n$ (it’s the harmonic series). Therefore the chain is not transient; it’s recurrent. Let’s determine if it’s positive recurrent. The sum for the stationary measure clearly blows up, and hence the chain is null recurrent.

For the second part, we have $\lambda_n = (n+1)/(n+2)$. Since the birth rate is lower than the death rate, we expect this to be recurrent. The series for transience is again not summable, therefore it’s recurrent. You might suspect that this case might be positive recurrent, but the sum for the stationary measure isn’t summable either, and this case is also null recurrent.

3.12

We did this in class. For transience we have So this is transient iff $\mu < \lambda$. For recurrence, notice that This means it’s always positive recurrent for $\lambda \leq \mu$. In fact, $\pi(0) = 1$.

Hints

3.4 This solution is due to Dylan Finlayson (thanks!). Choose $a$ larger in absolute value than all the entries of the matrix $A$. We have to show

  1. $P$ is a stochastic matrix.
  2. The chain is irreducible.
  3. $P$ is aperiodic. The only part that needs explanation is this one. To show that it’s aperiodic, simply notice that the diagonal entries $P_{ii}$ are bigger than $0$. This is because $P_{ii} = 1 + A_{ii}/a > 0$, since $a$ was chosen larger in absolute value than all the entries of $A$.

Recall the definition of periodicity: it’s the greatest common divisor of the set $\{\, n \geq 1 : p_n(i,i) > 0 \,\}$.

But we’ve seen that $P_{ii} = p_1(i,i) > 0$. One way of staying in state $i$ after $n$ steps is to keep taking the loop from $i$ to $i$ $n$ times. Therefore, $p_n(i,i) \geq (p_1(i,i))^n > 0$ for all $n \geq 1$. This shows that the period of the state $i$ is $1$. This is true for all the states, and hence the chain is aperiodic.

HW 5

Due Nov 11.

Problem 1

Consider the generating function $\phi(s) = \E[ s^X ]$, where $X$ is a non-negative integer-valued random variable. How does this compare with the moment generating function $M(t) = \E[ e^{tX} ]$?

Solution

Show that $\phi(s)$ is convex by differentiating the series $\phi(s) = \sum_{i \geq 0} p_i s^i$. What are the conditions on $X$ and its probabilities so that $\phi''(s) = 0$ for all $s \in (0,1)$?

Solution: Clearly $\phi''(s) = \sum_{i \geq 2} i(i-1)\, p_i\, s^{i-2}$. Since all the terms in the sum are non-negative, this is zero iff $p_i = 0$ for all $i \geq 2$. This means that $\phi(s) = p_0 + s p_1$, which makes it a linear function.

Using Jensen’s inequality, give me an easy lower bound for $\phi(s)$. Recall that Jensen’s inequality says that if $f$ is a convex function, then $\E[ f(X) ] \geq f( \E[X] )$.

Solution: We want to get a lower bound for $\phi(s) = \E[s^X]$. So we have to set $f(X) = s^X$ for any fixed $s$. To apply Jensen’s inequality you have to verify that $f$ is convex. This follows from $\frac{d^2}{dx^2} s^x = (\log s)^2\, s^x \geq 0$.

Then $\phi(s) = \E[ s^X ] \geq s^{\E[X]}$.
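A numerical sanity check of the Jensen bound, for a made-up pmf on $\{0,1,2,3\}$:

```python
import numpy as np

# Made-up pmf for X on {0, 1, 2, 3}.
p = np.array([0.1, 0.3, 0.4, 0.2])
x = np.arange(4)
mean = (x * p).sum()

s = np.linspace(0.01, 0.99, 25)
phi = (p[None, :] * s[:, None] ** x[None, :]).sum(axis=1)  # phi(s) = E[s^X]

# Jensen: phi(s) = E[s^X] >= s^{E[X]} for every s in (0, 1).
assert np.all(phi >= s ** mean)
print("Jensen bound holds at all sampled s")
```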

Problem 2

Consider the simple random walk (SRW) with state space $S = \{ 0, \pm 1, \pm 2,\ldots \}$. The walk steps forward or backward with equal probability. This is the one-dimensional simple random walk ($d=1$).

  1. Show the one-dimensional random walk is recurrent by showing that $p_{2n}(0,0) = \binom{2n}{n} 2^{-2n} \sim \frac{1}{\sqrt{\pi n}}$,

    and then using one of the criteria for recurrence/transience that we developed in class.

    Solution: This is straight from the textbook. Use Stirling, then show that $\sum_n p_n(0,0) = \infty$. This proves recurrence.

  2. In $d \geq 2$, the simple random walk can take a step (with equal probability) in each one of the $2d$ directions.

    • What is the probability that the random walk goes from $x = (x_1,\ldots,x_d)$ to $y = (x_1 + 1, x_2, \ldots,x_d)$ in one step? Solution: The notation is a little ambiguous, but I meant that only the first coordinate changes. This has probability $1/(2d)$, since there are $2d$ directions you can go in.

    • It turns out that for $d \geq 2$, $p_{2n}(0,0) \sim c_d\, n^{-d/2}$. Is the simple random walk transient or recurrent for $d \geq 2$? Again, use one of the criteria for recurrence/transience.

      Solution When $d = 2$, the expected number of visits to the origin is infinite ($\sum_n n^{-1}$ diverges), so the walk is recurrent. When $d \geq 3$, the expected number of visits is finite ($\sum_n n^{-d/2}$ converges), so the walk is transient.

  3. The simple random walk is recurrent in one dimension. Is it null recurrent or positive recurrent? Hint: Try to find a stationary distribution $\pi(x)$.

    • Find an equation that $\pi(x)$ must satisfy. Solution $\pi(x) = \frac12 \pi(x-1) + \frac12 \pi(x+1)$.
    • Substitute $g(x) = \pi(x) - \pi(x-1)$ and try to solve for $g(x)$. Or alternatively, write down a general form for the solution of $\pi(x)$ from the theory of recurrence relationships that we’ve developed in class. Solution We have $\pi(x+1) - \pi(x) = \pi(x) - \pi(x-1)$, i.e. $g(x+1) = g(x)$, by rearranging the terms. This means that $g(x) = C$, or in other words $\pi(x) = C x + \pi(0)$ for all $x$. But this means $\pi(x)$ is linear; there is no way it can sum to $1$ and be a bona fide probability.
    • Based on your findings, tell me whether the SRW is positive or null recurrent. Solution No stationary probability, hence null recurrent.
  4. Based on your findings in part 3, simply give me a non-mathematical argument for $d=2$. You can afford to be a little vague here. Give me an intuitive explanation for whether the walk is expected to be transient, positive recurrent or null recurrent. Solution Loosely speaking, in dimension $2$ there is a lot more room than $d=1$, so one expects it to be null recurrent again.

    Alternately, you can argue that the $d=2$ walk consists of two independent one dimensional walks. For the $d=2$ walk to return to $0$, both $d=1$ walks must return to $0$, and this has smaller probability. So it ought to take longer for the $d=2$ walk to return than the $d=1$ walk; but we've shown that the expected time of return of the $d=1$ is infinite. Therefore it's null recurrent.
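The return-probability asymptotics used in parts 1 and 2 are easy to check exactly, since $p_{2n}(0,0) = \binom{2n}{n}2^{-2n}$ for the $d=1$ walk; the code below compares this with the Stirling approximation $1/\sqrt{\pi n}$ and illustrates the divergent sum behind recurrence.

```python
from math import comb, pi, sqrt

# Exact return probability for the 1-d SRW after 2n steps.
def p_return(n):
    return comb(2 * n, n) / 4**n

# Stirling gives p_{2n}(0,0) ~ 1/sqrt(pi n); the ratio tends to 1.
for n in (10, 100, 1000):
    print(n, p_return(n) * sqrt(pi * n))

# The partial sums diverge (like a constant times sqrt(N)),
# which is the criterion for recurrence.
print(sum(p_return(n) for n in range(1, 2000)))
```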
    

Textbook problems

Problem 2.6

The hint ought to have helped you solve this one.

a. $p(x,x+1) = p$ and $p(x,0) = 1-p$. That is, if you’ve seen $x$ consecutive $1$s so far and you see a $0$, the current run of $1$s is reset to $0$.
b. The stationary equation is $\pi(x+1) = p\,\pi(x)$. Then $\pi(x) = p^x \pi(0)$ and $\sum_{x \geq 0} \pi(x) = \frac{\pi(0)}{1-p} = 1$. This gives $\pi(0) = 1 - p$. Therefore $\pi$ is geometric.

Hints

Problem 2.6

For 2.6(c), we have to find a recursion equation. Here’s how you’d get started. Let’s consider $\E[ T_k - T_{k-1}]$, the time to go from $k-1$ successive $1$s to $k$ successive ones. Given that we’ve seen $k-1$ successive ones, if the next digit is a $1$, then $T_k - T_{k-1} = 1$. If the next digit is a $0$, we’ve gone back to square one and then we must again wait to see $k$ ones in succession. Then, letting $\E[T_k] = a(k)$, we get $a(k) = a(k-1) + p \cdot 1 + (1-p)\bigl(1 + a(k)\bigr)$.

With a little bit of work, one sees that $a(k) = \frac{p^{-k} - 1}{1 - p}$. Verify that this formula solves the recursion.
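Rearranging the conditioning argument gives $a(k) = (1 + a(k-1))/p$, and the closed form above is the standard mean waiting time for $k$ consecutive successes (both displays here are my reconstruction of the elided formulas). A quick numerical verification:

```python
# Closed form for the mean time to see k consecutive 1s (standard formula).
p = 0.3

def a(k):
    return (p ** (-k) - 1) / (1 - p)

# It should satisfy a(0) = 0 and the recursion a(k) = (1 + a(k-1)) / p.
assert a(0) == 0
for k in range(1, 12):
    assert abs(a(k) - (1 + a(k - 1)) / p) < 1e-6
print("recursion verified for k = 1, ..., 11")
```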

Problem 2.10

If there is a chance the branching process runs off to infinity without returning to zero, this means that the chain is transient ($\mu > 1$)! If the branching process returns to zero with probability $1$ it must be recurrent ($\mu \leq 1$).

Problem 2.12

You do not have to submit the code electronically. Just use a programming language to run it, print out the code and the results, and then attach them to your HW. Note that $a(n) = \phi_n(0)$. This is the probability that the chain goes extinct by the $n$th step. To find this, recall the recursion $\phi_n(s) = \phi_{n-1}(\phi(s))$. So a sample python code to create the function $\phi_n$ would be:

import sys
sys.setrecursionlimit(10000)

# Define the numbers p_0, p_1, p_2 and p_3 first, or however many numbers
# you need to define phi. The values below are placeholders.
p_0, p_1, p_2, p_3 = 0.4, 0.3, 0.2, 0.1

def phi(s):
    return p_0 + p_1 * s + p_2 * s**2 + p_3 * s**3

def phin(n, s):
    if n == 1:
        return phi(s)
    else:
        return phin(n - 1, phi(s))

print(phin(10, 0))

HW 4

Due Oct 17. The code for this should be submitted electronically through canvas.

Problem (Time to absorption in Gambler’s ruin):

Start by reading example 3 on page 30, or please refer to your notes. Suppose you’re a gambler playing a simple game where you lose $1$ with probability $1-p$ and gain $1$ with probability $p$. We want to figure out how long it takes for the game to end. Let $T$ be the time, or the number of steps, to absorption into state $0$ or $N$ (you or the bank is ruined). Then, let the conditional expectation be $G(j) = \E[ T \mid X_0 = j ]$.

  • In the case when $p=1/2$, first figure out how the textbook finds the equation $G(j) = 1 + \tfrac12 G(j+1) + \tfrac12 G(j-1)$, with $G(0) = G(N) = 0$.

    What is the corresponding equation for general $p \neq \frac12$?

    Solution: $G(j) = 1 + p\,G(j+1) + (1-p)\,G(j-1)$.

In the $p=1/2$ case, how does one figure out the solution to the recursion equation above? The textbook suggests that we guess one of the solutions. Here is one way to make an educated guess. Manipulating the equation above, we get $G(j+1) - G(j) = G(j) - G(j-1) - 2$.

If we make the substitution $H(j) = G(j+1) - G(j)$, then we get $H(j) = H(j-1) - 2$.

$H$ is a discrete time-derivative of $G$, so let us pretend that these are smooth functions $G \colon \R \to \R$, and view the difference equation above as a differential equation $H'(x) = -2$.

  • Integrate both sides to get the general form $H(x) = -2x + c_1$. There was a typo in this problem, where I’d originally written $H'(x) = -1$. But this is easily fixable if you followed the line above.
  • Since $dG(x)/dx = H(x)$, use the form of $H(x)$ that you obtained and integrate both sides again. What does this give you for the general form of $G(x)$? Solution $G(x) = -x^2 + c_1 x + c_2$.
  • Compare your solution with the one in the textbook. Can you verify your solution by substituting it into the recursion? Solution: We verify this by plugging it into the recursion equation for $G$. Did you get it correct? What did the textbook get wrong? Solution The textbook had $G(j) = j^2 + c_1 j + c_2$, which has an incorrect sign for the quadratic term; it should be $G(j) = -j^2 + c_1 j + c_2$.

  • Use the boundary values $G(0) = G(N) = 0$ to get a final form for $G(x)$ without undetermined constants. Make a plot of this function using your favorite programming language. Interpret the plot in plain language. Solution Using $G(0) = 0$ gives $c_2 = 0$, and $G(N) = 0$ gives $c_1 = N$, so $G(j) = j(N-j)$. The plot tells you that the time to absorption looks like a parabola. The time is maximal at the midpoint $j=N/2$, where it behaves quadratically in $N$. It behaves linearly in $j$ when $j$ is close to either $0$ or $N$.
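The final form $G(j) = j(N-j)$ (i.e. $-j^2 + Nj$, with the constants found from the boundary values) can be checked directly against the $p=1/2$ recursion:

```python
# Final form of the expected absorption time for p = 1/2.
N = 10

def G(j):
    return j * (N - j)

# Boundary values and the recursion G(j) = 1 + (G(j+1) + G(j-1)) / 2.
assert G(0) == 0 and G(N) == 0
for j in range(1, N):
    assert G(j) == 1 + (G(j + 1) + G(j - 1)) / 2
print([G(j) for j in range(N + 1)])
```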

A small, interesting project is to try and guess the solution for the general case $p \neq \frac12$ using the above method. Does it work?

Problem (Stationary distribution for urn model):

  1. Suppose you run the Markov chain for the urn model starting with 100 red balls and 200 blue balls. What is the probability that you’re going to have fewer than 60 red balls in the urn after a really long period of time?

    Solution We know $X \sim \Binomial(300)$ is the stationary distribution of the number of blue balls. Then

  2. Use the central limit theorem to approximate the probability that you’ve previously written down. Hint: Recall the normal approximation for the binomial distribution from MATH 5010. Solution The binomial formula above is quite a pain to compute, but we know that the distribution is approximately $N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are the mean and variance of the binomial. Then, using the python code

        import scipy.stats as st
        import numpy as np
        st.norm.cdf(-90/np.sqrt(150/4))
    

    Gives a value of $3.37 \times 10^{-49}$ for this probability. So it’s quite unlikely.

Problem (Uniform random variables on areas):

We suggested in class that one way to make a random variable that would fall in an area $A$ was to do the following:

  • Pick a square box containing the area $A$ of side length $R$. Simulate uniform random variables $(U_1,U_2)$ with distribution $U_i \sim \text{Unif}([0,R])$.
  • Define the random variable $(X_1,X_2) = (U_1,U_2)$ if $(U_1,U_2) \in A$; otherwise reject the sample and draw fresh uniforms until they land in $A$.

I want you to prove that $(X_1,X_2)$ is uniformly distributed on the area $A$. To do so, assume that $B \subset A$, and compute $\Prob\bigl( (X_1,X_2) \in B \bigr)$.

Interpret the answer you get in terms of areas.

Solution: You’re essentially following the textbook. If $(X_1,X_2)$ is uniform in an area $A$, then for any set $B$ in $A$, the probability that $(X_1,X_2) \in B$ should be $\frac{\operatorname{area}(B)}{\operatorname{area}(A)}$.

Our procedure uses $U_1,U_2$ to generate $(X_1,X_2)$. If $(U_1,U_2)$ is accepted, then we set $(X_1,X_2) = (U_1,U_2)$. So we should compute the probability $\Prob\bigl( (U_1,U_2) \in B \mid (U_1,U_2) \in A \bigr) = \frac{\operatorname{area}(B)/R^2}{\operatorname{area}(A)/R^2} = \frac{\operatorname{area}(B)}{\operatorname{area}(A)}$. This proves that the $(X_1,X_2)$ that we’ve generated using $U_1,U_2$ is uniformly distributed on $A$.
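A small simulation of the procedure, with a made-up area $A$ (the unit disk centered at $(1,1)$ inside the square $[0,R]^2$ with $R=2$); the acceptance rate should come out close to $\operatorname{area}(A)/R^2 = \pi/4$.

```python
import random
from math import pi

random.seed(0)
R = 2.0

# Made-up area A: the unit disk centered at (1, 1), contained in [0, R]^2.
def in_A(x, y):
    return (x - 1) ** 2 + (y - 1) ** 2 <= 1.0

n, accepted = 100_000, 0
for _ in range(n):
    u1, u2 = random.uniform(0, R), random.uniform(0, R)
    if in_A(u1, u2):
        accepted += 1      # these (U1, U2) are kept as (X1, X2)

print(accepted / n, pi / 4)   # acceptance rate vs area(A) / R^2
```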

Problem (Simulation: reflecting random walk; make it pretty)

I want to see your code for this problem. It must be submitted with the HW on Canvas. An ideal format would be the jupyter notebooks in python, or R-markdown. Of course, you’re free to use anything, but the code must run. Please include graphs and pictures and make them pretty!

Consider the reflecting random walk on $5$ states. The transition matrix for this can be found in my code example on Canvas.

  1. Find $\pi$, the stationary probabilities, by simulation: run the chain for a long time and compute the average amount of time spent in each state.

  2. Sample from the stationary probabilities directly by ‘coupling from the past’. Simulate this random variable a few times and estimate the stationary probabilities by seeing how often it lands in each state. The algorithm may be found in Chapter 11 of Ross’ book (see syllabus), but we’ll also go over it carefully in class in case you don’t have the book.
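For part 1, here is a sketch of the long-run-frequency computation. The transition matrix below is my guess at a reflecting walk on $\{0,\ldots,4\}$ (the actual matrix is in the Canvas code example): bounce back deterministically at the ends, step left or right with probability $1/2$ in the interior. For this particular matrix, detailed balance gives $\pi = (1/8, 1/4, 1/4, 1/4, 1/8)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed reflecting walk on {0,...,4} (the Canvas matrix isn't shown here).
P = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0, 0.0]])

n = 100_000
counts = np.zeros(5)
x = 0
for _ in range(n):
    x = rng.choice(5, p=P[x])
    counts[x] += 1

print(counts / n)   # compare with pi = (1/8, 1/4, 1/4, 1/4, 1/8)
```

Note that this chain is periodic, but the fraction of time spent in each state still converges to $\pi$.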

Problem (Google’s search algorithm is a Markov chain)

Start by reading wikipedia’s page on the PageRank algorithm that Google uses to rank its search results. Get thoroughly confused. It sorta works like this:

  • Think of the pages of the world wide web as the states of the Markov chain.
  • The states are explored by a ‘random web surfer’. These are what are called ‘web-crawlers’.
  • The probability after a really really long time for the crawler to be in a particular webpage is the stationary probability of the Markov chain. The higher the probability of the webpage, the higher the PageRank.

After looking at the wikipedia page, see Firas’ presentation. There is an example of a Markov chain on $4$ or $5$ states there.

  • Form the transition matrix of the Markov chain on a computer.
  • Compute (using a programming language) the stationary distribution.
  • Simulate the Markov chain and determine the stationary distribution, and compare with the above computation.
  • Determine the page ranks using the stationary probabilities.
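A sketch of the whole pipeline on a made-up $4$-page web (not the example from the presentation): build the random-surfer matrix, get the stationary distribution from the left eigenvector for eigenvalue $1$, and read off the ranking. (Google’s actual PageRank also mixes in a damping factor so the chain is irreducible; that’s omitted here.)

```python
import numpy as np

# Made-up link structure: page -> pages it links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}

# Random-surfer transition matrix: follow a uniformly chosen outgoing link.
P = np.zeros((4, 4))
for i, outs in links.items():
    for j in outs:
        P[i, j] = 1.0 / len(outs)

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.abs(v[:, np.argmax(w.real)].real)
pi /= pi.sum()

ranking = np.argsort(-pi)
print(pi, ranking)   # page 3 has no inbound links, so pi[3] = 0
```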

HW 3

This HW was due on Sep 26, and you simply had to correct the errors on your exam. People who didn’t make any errors on the exam were exempt.

HW 2

Due on Sep 19, 2016. REMEMBER there is a quiz on Sep 16.

You’re going to be tested on classification of states, and mean return times in this one. You’re also going to be asked to fill in some details of proofs.

Problem 1: (10 pts)

In class we stated that there were two communication classes.

  1. Recurrent. By definition, once the chain enters this class, it can never leave it.
  2. Transient. We claimed that we would leave transient communication classes with probability one.

Are there any other types of communication classes? For example, is there a class $T$ that you can stay in forever with probability $p$ and leave eventually with probability $1-p$ (for $0 < p < 1$)? We will show that such classes cannot exist.

Let $T = \cup_{i=1}^k T_i$ be a disjoint union of all the transient communication classes, and let $R$ contain all the recurrent states.

  1. Fix a state $i \in T_1$. We show first that there must be a way for the chain to transition from $i$ to a state $j \in R$; i.e., show that this has non-zero probability. One way to do this is by contradiction. Assume that you can never get from $i$ to the set $R$ and arrive at a contradiction.

    Solution: Let $T_1,\ldots,T_k$ be all the transient communication classes. Let $R$ represent the union of all the recurrent classes. This is a key argument.

    Suppose you start in transient class $T_1$. There must be a way to leave class $T_1$; for if not, $T_1$ is a communicating class that you can never leave and this means that it is recurrent. Let us suppose we’ve ordered $T_1,\ldots,T_k$ so that one (of the many possible) classes that you can reach is $T_2$.

    We claim that you can never go back from $T_2$ to $T_1$. If it were possible to go back, then this would mean that $T_1$ and $T_2$ communicate; this violates our assumption about them being distinct communicating classes. Arguing this way, we find that there must be a path from $i$ to a final communication class $T_k$. Now, we cannot have an edge from $T_k$ back to any previous class $T_i$ for $i=1,\ldots,k-1$. If there were, this would mean that $T_i$ and $T_k$ communicate, and this is not allowed. This means that $T_k$ must have a state from which there is a positive probability of reaching $R$. In other words, there is a state $k_i \in T_k$ and $r \in R$ such that $p(k_i, r) > 0$.

  2. Next we must show that if $i \in T$, then $\Prob( \exists~n > 0 \text{ s.t. } X_n \in R \mid X_0 = i ) = 1$. To arrive at a contradiction, suppose again that we never exit the transient states. Since $T$ is a finite set, there must be at least one state that is repeated infinitely many times. Let us show this semi-rigorously. Let

    Then,

    represents the probability that you never leave the transient class. Let

    Let $R_j^{(k)}$ be the $k^\text{th}$ return to $j$. Then,

    Then, this is non-zero iff $\rho(j,j) = 1$ for at least one $j \in T$.

    However, $\rho(j,j) < 1$, since there is certainly one way of reaching $R$ from $j$ and never returning. This is a contradiction.

Problem 2: (10 pts)

Consider a two-state Markov chain with $S = \{0,1\}$. Let $p(0,1) = 1/2$ and $p(1,0) = 2/3$. Let $T$ be the first return time to $0$ given that you started in state $0$.

  • Compute $\E[T]$ using the stationary distribution of the chain.

    Solution: The stationary distribution is obtained by solving $\pi P = \pi$. We get $\pi = (4/7,\, 3/7)$. Then, using equation $1.11$, we get $\E[T] = 1/\pi(0) = 7/4$.

  • Find $\Prob(T \geq n)$ by a direct computation. Solution: This is on page $26$. Simply read the textbook and follow the computation. Notice how we’re computing the full distribution of the random variable $T$.

  • Compute $\E[T]$ using the formula $\E[T] = \sum_{n \geq 1} \Prob(T \geq n)$. Solution: It’s the line following $1.13$.
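All of this can be cross-checked numerically for this two-state chain:

```python
import numpy as np

# The two-state chain: p(0,1) = 1/2, p(1,0) = 2/3.
P = np.array([[1/2, 1/2],
              [2/3, 1/3]])

# Stationary distribution from pi P = pi (left eigenvector, eigenvalue 1).
w, v = np.linalg.eig(P.T)
pi = np.abs(v[:, np.argmax(w.real)].real)
pi /= pi.sum()

print(pi)          # (4/7, 3/7)
print(1 / pi[0])   # E[T] = 1 / pi(0) = 7/4
```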

Problem 3: (10 pts)

  • Suppose $a_n$ is a sequence of numbers such that $a_n \to a$. Show that $\frac1n \sum_{k=1}^n a_k \to a$. Solution: We use the $\e$-$\delta$ definition of a limit. Fix $\e$ and pick $N$ so large that $|a_n - a| < \e$ for all $n \geq N$. Since $\e$ was arbitrary, this completes the proof. We’ve used the triangle inequality in the third step.

  • Suppose $X$ is a Markov chain that’s irreducible and has period $d > 1$. Let $Y(n,i)$ be the amount of time spent in state $i$ in $n$ steps of the chain. We’ve seen that Show that $A \to \pi(i)$, where $\pi$ is the unique stationary probability distribution.

    Solution: We know that the limit holds along multiples of the period $d$; this is in the textbook. So set $\sum_{r=1}^d p_{0+r} = a_0$, $\sum_{r=1}^d p_{d+r} = a_1$, $\sum_{r=1}^d p_{2d+r} = a_2$, and so on. That is, we’re grouping every $d$ terms in the sum $A$. Then we have two terms. The first of these goes to $\pi(i)$ since, from what we’ve shown previously, $a_k \to d \pi(i)$ as $k \to \infty$. The second term goes to $0$ since $p_k(j,i) \leq 1$.

Problem 4: (10 pts)

Recall the umbrella example from HW 1.

  1. What is the average number of umbrellas you’d have on hand in the long term?

    Solution: Simply use the stationary probabilities and find $\sum_{i=0}^4 i\,\pi(i)$. This ranges from $2.2$ to $2.5$ depending on $p$. I could live with this.

  2. Suppose you start with $0$ umbrellas. What is the average amount of time until you have $0$ umbrellas again?

    Solution: $\E[T] = \frac{1}{\pi(0)} = (5-p)/(1-p)$. As $p \to 1$ (it rains all the time), the time between having no umbrellas goes to $\infty$! Can you see why? As $p \to 0$ (it shines all the time), this is $5$; can you see why?

HW 1

Due Sep 7, 2016.

Problem 1: Conditional probability.

There are two coins on the table. The first tosses heads with probability $p$ and the second with probability $r$. You select one at random, flip it, and get heads. What is the probability that the second coin was chosen?

Answer Let $H$ be the event that you get heads. Let $C_2$ be the event that the second coin is chosen. We want $\Prob(C_2 \mid H) = \frac{\Prob(H \mid C_2)\Prob(C_2)}{\Prob(H)} = \frac{\frac12 r}{\frac12 p + \frac12 r} = \frac{r}{p+r}$.

Problem 2: Conditional probabilities and Markov chains.

Let $A_0, A_1, \ldots, A_n$ be events of non-zero probability.

  • Show that $\Prob(A_0 \cap A_1 \cap \cdots \cap A_n) = \Prob(A_0)\,\Prob(A_1 \mid A_0)\cdots\Prob(A_n \mid A_0 \cap \cdots \cap A_{n-1})$.

    Answer: (2pts) We need the non-zero probability assumption because the conditional probabilities are meaningless otherwise. In other words, to define

    we need $\Prob(B) > 0$ to ensure that we’re not dividing by zero. So to make the above statement true, we need to assume that $\Prob(A_0 \cap \cdots \cap A_{n-1}) > 0$.

    The proof is by induction. For the induction hypothesis, assume that the statement is true for any $n$ events of non-zero probability. We will show that it’s true for the $n+1$ events $\{A_i\}_{i=0}^{n}$, using the definition of conditional probability in the second step and the induction hypothesis in the third step.

    We’ve used the notation (that we’ll use throughout the course) that

  • Suppose we have some stochastic process with state space $S = \{ a, b, c \}$. What does the equation in the previous part say about the event

    Answer (2pts)

  • Suppose the process is Markov, simplify the equation you obtained above.

    Answer (2pts) We simply apply the Markov property.

Problem 3: Linear Algebra

Consider a stochastic matrix of the form

  • Find the determinant of $A$.

    Answer (2pts): $\det(A) = 1/3 - 1/6 = 1/6$.

  • Find the eigenvalues of $A$ by solving the characteristic equation

    where $I$ is the identity matrix.

    Answer: (2pts) It’s easy to see that which gives the eigenvalues $x=1,x=1/6$.

  • Find normalized eigenvectors ($|v| = 1$) of $A$ by finding solutions to $Av = \lambda v$. Answer: (2 pts) For $\lambda = 1$, solve to get the left eigenvector $(0.6172, 0.762)$. For the eigenvalue $\lambda = 1/6$, we get the eigenvector $(-0.7211,0.7211)$. For this last step, I got python to solve the linear system of equations above. These are normalized left eigenvectors; i.e., they satisfy $\sqrt{v_1^2 + v_2^2} = 1$. The right eigenvectors may be found to be I agree that it looks rather weird because of the normalization. Nevertheless, this is what python chugs out if you ask for eigenvectors. If you didn’t use the computer but solved by hand, you would get something like this.

  • Find a matrix $Q$ such that $Q^{-1} A Q$ is diagonal. In other words, find the Jordan decomposition of $A$. Hint: once you’ve found the eigenvectors $v_1$ and $v_2$, let $Q = [v_1 v_2]$.

    Answer: (2 pts) This is the same as $A = Q \Lambda Q^{-1}$, where $\Lambda$ is the diagonal matrix of eigenvalues (or more generally something in Jordan form).

Problem 4: Transition matrix for a simple Markov chain

A thinker who owns 4 umbrellas travels back and forth between home and office, taking along an umbrella (if there is one at hand) in rain (probability $p$) but not in shine (probability $q=1-p$). Let the state be the number of umbrellas at hand, irrespective of whether the thinker is at home or at work. Suppose he has two umbrellas at home and two at work, and he’s at home. The state is $X_0 = 2$. It rains that day; he takes one umbrella from home to work, and so there are now three umbrellas at work. The state is now $3$. Set up the transition matrix and find the stationary probabilities.

Answer: First, draw a picture of the possible transitions.

Then form the matrix.

Solve $\pi P = \pi$ and use the constraint $\sum_{i=0}^4 \pi_i = 1$ (it’s a probability vector). This gives us the equations

So finally we get $\pi_1 = \pi_2 = \pi_3 = \pi_4$ and therefore, $\pi_0 = \frac{q}{4+q}$ and $\pi_i = \frac{1}{4+q}$ for $i = 1,\ldots,4$.
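A numerical check of this stationary distribution; the matrix below encodes the umbrella dynamics described above, with the state being the number of umbrellas at the current location.

```python
import numpy as np

p = 0.6            # probability of rain; any 0 < p < 1 gives the same form
q = 1 - p
N = 4              # total number of umbrellas

# From state 0 you travel without an umbrella; the other place has all N.
# From state i >= 1: rain (prob p) -> carry one over, arrive at N - i + 1;
# shine (prob q) -> leave them behind, arrive at N - i.
P = np.zeros((N + 1, N + 1))
P[0, N] = 1.0
for i in range(1, N + 1):
    P[i, N - i + 1] += p
    P[i, N - i] += q

w, v = np.linalg.eig(P.T)
pi = np.abs(v[:, np.argmax(w.real)].real)
pi /= pi.sum()

print(pi)                        # pi(0) = q/(4+q), pi(i) = 1/(4+q) otherwise
print(1 / pi[0], (5 - p) / q)    # mean return time to 0, matching HW 2
```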