504 Final Exam Notes

I didn’t understand some things about stopping times and the debut theorem when I taught this class, so I’ve added some extra notes here. Next time, I’ll certainly build some more general ideas about stochastic processes like joint and progressive measurability after constructing Brownian motion. I think this might be more helpful than doing fiddly little things about the reflection principle and Hausdorff dimension.

Carl Mueller showed me Greg Lowther’s notes about stochastic processes. They are very well written. I used them quite heavily for problem 11 and problem 12.

  1. Poincare recurrence (easy warmup). We have the usual setup $(\W,\mathcal{F},\Prob,T)$ with $T$ measure preserving and ergodic. Prove

    \[\lim_{n \to \infty} \frac1 n \# \left( k \leq n : T^k x \in Q \right) = \Prob(Q)\]

    Then, if $R$ is the return time to $Q$, prove that (you may repeat the proof in class) \(\Prob(R < \infty) = 1.\)

    Solution: This was essentially the ergodic theorem, and all of you got this right.

  2. An extreme point of a convex set cannot be written as a nontrivial convex combination of two other points in the set. Let $\mathcal{I}$ be the invariant $\sigma$-algebra. Let $\mathcal{M}$ be the space of all $T$-invariant probability measures.

    1. Show that a probability measure is ergodic iff it is an extreme point of $\mathcal{M}$.

      Solution: An extreme point of a convex set cannot be written as a nontrivial convex combination of two other points of that set. If $P$ is not ergodic, then there is a nontrivial invariant set $A$ for $\Prob$. You found $\mu$ and $\nu$ by writing $\mu(B) = P(B | A)$ and $\nu(B) = P(B | A^c)$. Therefore, $P$ cannot be an extreme point. All of you got this part of the argument right.

      Most of you did not get the second part right. If $P$ is not an extreme point of $\mathcal{M}$, then it can be written as a combination of $\nu$ and $\mu$, where $\nu$ and $\mu$ are both extreme points. We claim that $P$ cannot be ergodic.

      The only thing we can use at this point is the ergodic theorem, so lets keep that in mind as we proceed. If $\nu$ and $\mu$ are different measures, there must be a bounded function (an indicator of a set, e.g.) such that

      \[\int f d \nu \neq \int f d \mu\]

      So the pointwise ergodic theorem shows that there is a $\nu$-null set $B$ such that the ergodic average $S_n f \to \int f d \nu$ on $\W \setminus B$, and further, $B$ is invariant. Similarly, $S_n f \to \int f d \mu$ on $\W \setminus B’$. But we ought to have $\W \setminus B’ \subset B$, which means we must have $\mu(B) = 1$. Thus, we have found an invariant set $B$ such that

      \[P(B) = p \nu(B) + (1-p) \mu(B) < 1\]
    2. Assume that $\W$ is “nice”. Then, from your previous semester’s probability class, any measure $\Prob$ has a regular conditional probability with respect to the invariant $\sigma$-algebra such that

      \[\Prob(\w,A) = \E[ 1_A | \mathcal{I} ]\]

      Show that the regular conditional probability is ergodic for the translation map $T$.

      Solution: The niceness of the space lets you assume that a regular conditional probability exists. Our version of niceness always included a notion that said that our $\sigma$-algebra was countably generated. So it’s enough to check anything we have on a countable collection of generating sets.

      Most of you got this part right. Essentially, for any invariant set $E$, you have to check

      \[\Prob(\w,E) = \E[ 1_E | \mathcal{I} ] = 1_E \in \{0,1\}\]

      Since $E$ is $\mathcal{I}$ invariant, the conditional expectation leaves the indicator unchanges. This may be proved by testing on the countable generating set.

    3. Let $\mathcal{M}_e$ be the set of all ergodic invariant measures. Then, show that for any invariant measure $P$, there is a probability measure $\mu_P$ on the set $\mathcal{M}_e$ such that

      \[P(A) = \int_{\mathcal{M}_e} Q(A) \mu_P(dQ)\]

      Hint: For any measurable set $A$, prove that

      \[P(A) = \int_{\W} P(\w,A) dP(\w)\]

      Remark: This is an important fact. Any invariant probability measure can be decomposed as a convex combination of its ergodic components using the regular conditional probability with respect to the invariant $\sigma$-algebra.

      Solution: All of you checked the tower property.

      People tend to use Choquet’s theorem or Krein-Milman for the last two parts instead of using the conditional probability. See wikipedia or this pdf. In one line, the space of invariant measures is a topological vector space; restricting to probability measures makes it compact. Hence Choquet’s theorem applies.

  3. Recall the definition of unique ergodicity and generic points from your notes (or mine, on this website). It says that the following are equivalent.

    1. $(\W,T)$ is uniquely ergodic.
    2. Every $\w \in \W$ is generic.

    Recall that generic points are points for which the ergodic theorem holds for all continuous functions on $\W$ (it comes with a topology). The system is uniquely ergodic if there is only one measure $\mu$ that is invariant under $T$.

    Prove that Haar measure is uniquely ergodic for the circle rotation $T_\theta \colon [0,1) \mapsto [0,1)$ where $T_\theta(\w) = \w + \theta \mod 1$. Hint: Look in Durrett.

    Solution: Durrett shows that the ergodic theorem holds for indicators of half-open intervals in Theorem 7.2.4 (version 4.1 on his website).

  4. Write down the probability that

    \[\Prob_0((B(t_1),\ldots,B(t_n)) \in A_1 \times \cdots \times A_n)\]

    for Brownian motion in $\R^d$. Here, $A_i$ are measurable sets in $\R^d$ and $0 < t_1 < \cdots < t_n$.

    Yes, this problem is that easy.

    Solution

  5. Next, we —by we I mean I asked Carl for a reference, and he randomly picked McKean’s book. There’s no escaping him.— construct Brownian motion on $[0,1]$ using the Haar basis. This proof is due to Paul Lévy, and was later simplified by Ciesielski. Consider the Haar basis, defined by $f_0 = 1_{[0,1]}$ and

    \[f_{k2^{-n}} = \begin{cases} + 2^{(n-1)/2} & (k-1)2^{-n} \leq t < 2^{-n} \\ - 2^{(n-1)/2} & k 2^{-n} \leq t < (k+1) 2^{-n} \\ 0 & \text{otherwise} \\ \end{cases}\]

    for odd $k < 2^n$. Show that ${f_{k2^{-n}} }_{k,n}$ form an orthornormal basis for $L^2[0,1]$. Note that the integral of this basis is the Schauder basis

    \[\int_0^t f_{k 2^{-n}}(s)\,ds\]

    that forms tent-shaped functions (Draw pictures for yourself)

    Levy’s idea is to use the formal Haar series to define Brownian motion. Let $g_{k 2^{-n}}$ be an independent Gaussian family of mean-$0$ random variables with variance $1$. Define

    \[b(t) = g_0 \int_0^t f_{0} + \sum_{n=1}^\infty \sum_{k \text{ odd} < 2^n} g_{k 2^{-n}} \int_0^t f_{k 2^{-n}},\]

    a random Fourier series.

    There are two things to check:

    1. The series for $b(t)$ is uniformly convergent on $[0,1]$. Hence, $b(t)$ is continuous.
    2. The process $b(t)$ has the right correlations

      \[\E[ b(t) b(s) ] = t \wedge s\]

    For $1.$, let

    \[e_n = \Norm{\sum_{k \text{ odd} < 2^n} g_{k 2^{-n}} \int_0^t f_{k 2^{-n}}}{\infty} = 2^{-(n+1)/2} \max_{k \text{ odd } < 2^n} |g_{k 2^{-n}}|\]

    You will have to show the second equality above. Then, estimate

    \[\Prob( e_n > \theta \sqrt{(2^{-n} \log 2^n)})\]

    and then use the Borel-Cantelli lemma.

    For the correlations in $2.$, use Parseval’s relations for the Haar basis applied to the indicator functions $j_1$ and $j_2$ of the intervals $[0,t]$ and $[0,t]$. (This part is cute).

    Solution: All of you got this problem right, except for the correlation part. There is a bit of work to show that the basis is complete in $L^2$, but that’s quite easy to see. For example, SQ and DS showed that indicators of dyadic intervals can be written as a (limit of) a linear combination of the Haar basis.

    \[\begin{align} \E[ b(t) b(s) ] & = \int_0^1 1_{[0,t]} f_{0} \int_0^1 1_{[0,s]} f_0 + \sum_{n=1}^\infty \sum_{k \text{ odd} < 2^n} \int_0^1 1_{[0,s]} f_{k 2^{-n}} \int_0^1 1_{[0,t]} f_{k 2^{-n}}\\ & = \left( 1_{[0,t]} , f_{0} \right) (1_{[0,s]}, f_0) + \sum_{n=1}^\infty \sum_{k \text{ odd} < 2^n} ( 1_{[0,t]}, f_{k 2^{-n}} ) ( 1_{[0,s]}, f_{k 2^{-n}} ) \\ & = (1_{[0,t]}, 1_{[0,s]}) \end{align}\]

    where the last identity follows from Parseval’s relationship for the Haar basis. Parseval’s relationship is fairly straightforward to prove for yourself.

  6. Exercise 8.1.2 from Durrett. Show that Brownian motion is not Hölder 1/2 + 1/k.

  7. Exercise 8.2.3 from Durrett. The set of local maxima of Brownian motion is an almost surely a dense set.

    Solution: Show it first for some rational interval. Given some $(a,b)$, there is a point in the interior of $(a,b)$ where the local maximum is taken since there is a sequence of times $t_k \to a$ such that $B_{t_k} = B_a$. Recall that this was a consequence of Blumenthal’s 0-1 law. Then, use the Markov property.

  8. Let $\W_0 = \left\{ f | f \colon [0,\infty) \to \R \right\}$, and let $\mathcal{F}_0$ be the $\sigma$-algebra generated by finite dimensional sets of the form ${ f \colon f(t_i) \in A_i, t_i \in [0,\infty), A_i \subset \R, 1 \leq i \leq n }$. Let $\mathcal{C} = \{ f \colon t \to f(t) \text{ is continuous} \}$. Show that $\mathcal{C} \not \in \mathcal{F}_0$.

    Solution

    Both NC and DS noticed the following: show that the $\pi$-system containing sets of the form (for a fixed sequence ${ t_k }$ and measurable set $E$ in the product space $\mathcal{B}(\R^{\Z^+})$

    \[A = \left{ \w \colon \exists \{ t_k \} \text{ and } \E \in \mathcal{B}(\R^{\Z^+} ( B(t_1),B(t_2), \ldots ) \in E \right}\]

    is also a $\lambda$-system and is hence a $\sigma$-algebra. So any $\sigma$-algebra containing cylinders should contain this set.

    To complete the argument, you do the following: suppose continous functions were measurable, then there is a sequence ${t_k}$ and a set $E$ such that $\mathcal{C}$ is in the form above. However, consider the indicator of the point $c \in \R^+$, $1_{c}(x)$, which is not continuous. Therefore, some $t_k = c$, otherwise $1_c$ will belong to $\mathcal{C}$. But there are uncountably many $c$ and only a countable number of $t_k$.

  9. Suppose $f(t) > 0 \forall t > 0$. Show that for some $c \in [0,\infty]$ we have

    \[\overline{\lim_{t \to 0}} \frac{B(t)}{f(t)} = c \quad \Prob_0 \, \almostsurely\]

    Solution This was Blumenthal. I liked SQ’s solution which also used the law of the iterated logarithm for Brownian motion here.

  10. Suppose

    \[\begin{align} \mathcal{N}_x & = \left\{ A \colon A \subset D \, \Prob_x(D) = 0 \right\} \\ \mathcal{F}_s^x & = \sigma \left( \mathcal{F}^+_s \cup \mathcal{N}_x \right) \\ \mathcal{F}_s & = \cap_x \mathcal{F}^x_s \end{align}\]

    Show that $\mathcal{F}$ is a right-continuous $\sigma$-algebra.

    Solution: The key here, is to notice (like NC did) that since $\sigma \left( \mathcal{F}^+_s \cup \mathcal{N}_x \right)$ is a completion of a measure space, all sets in the $\sigma$-algebra can be written in the form

    \[A \cup B\]

    where $A \in \mathcal{F}^+_s \AND B \in \mathcal{N}_x$.

  11. If $G$ is open let $T = \inf \left\{ t \geq 0 \colon B_t \in G \right\}$, where $B_t$ is a d-dimensional Brownian motion. Show that $T$ is a stopping time. Repeat the problem with $K$ closed instead of $G$.

    Do you need continuity of the Brownian motion here?

    Can you find a discontinuous Markov process for which $T$ is not a stopping time?

    Solution: This is a problem that I didn’t fully understand when I gave it, but while solving it myself, I learned about the full debut theorem. I’ll give you a short explanation of it. All of these concepts are better explained in Greg Lowther’s notes.

    There are three ways to describe stochastic processes.

    1. As a collection of random variables ${X_t}_{t \geq 0}$, one for each $t$,
    2. As a path $t \to X_t(\w)$, one for each $\w$,
    3. And as a function from the product space $\R^+ \times \Omega \to \R^d$.

    The third is a useful way to think about things, especially when you want to talk about stopping times and such.

    Indistinguishability: Two processes $X_t$ and $Y_t$ are indistinguishable if

    \[\Prob(X_t = Y_t \forall t \geq 0 ) = 1\]

    This is also called equivalent up to evanescence.

    However, as we did in class, processes are usually described by their finite-dimensional distributions. So we need another weaker notion of equivalence.

    Stochastically equivalent: Two processes are stochastically equivalent if they have the same finite-dimensional distributions. A simpler way to say this is to say for each time $t \geq 0$,

    \[\Prob( X_t = Y_t) = 1.\]

    As we have seen before, stochastic equivalence does not imply indistinguishability. BUT, the following lemma is quite relates stochastic equivalence and indistinguishability when you have some extra continuity.

    Lemma: All right-continuous processes and left-continuous processes that are stochastically equivalent are indistinguishable.

    The next important concept is joint-measurability.

    Joint-measurability: A process is jointly measurable if it is measurable with respect to the product $\sigma$-algebra $\mathcal{B}(\R^+) \times \mathcal{F}$.

    Note the Borel $\sigma$-algebra is on the whole positive real line. Also notice that we’re not using the fact $\mathcal{F}_t$ is a filtration for the process $X_t$. There is a related notion called progressive measurability, that uses the fact that $\mathcal{F}_t$ is a filtration. I’ll touch upon that in a minute.

    Lemma: All right-continuous and left-continuous processes are jointly measurable.

    Suppose $\tau$ is any random time, $\tau \colon \Omega \to \R$. Note that $\tau$ is not a stopping time; like joint measurability, it’s not using the fact that $\mathcal{F}_t$ is a filtration.

    Lemma: If $X_t$ is a jointly measurable process, then $X_\tau$ is a measurable random variable.

    With what we have so far, let us state a baby version of the Debut Theorem. We assume that the filtration is right-continuous.

    Theorem: Let $X$ be an adapted right-continuous stochastic process taking values in $\R$ that is defined on a complete filtered probability space. If $K$ is any real number, let

    \[\tau(\w) = \inf \left\{ t \in \R^+ \colon X_t(\w) \geq K \right\}\]

    Then $\tau(\w)$ is a stopping time. This is proved in this post about stopping times and the debut theorem.

    I think the proof extends quite easily to any closed set $K \in \R^d$. However, this is not the most general result possible if we’re allowed to assume that the process (like Brownian motion) is progressively measurable. This general version of the debut theorem does not assume anything about the continuity of the process either.

    Next, we talk a little bit about measurable selection. This will lead into the generalized debut theorem, which talks about entrance times into measurable sets.

    I’ll ignore some of the material from the post titled Filtrations and Adapted Processes, but I highly recommend reading it. The one definition we need from it is the following.

    Progressive measurability: A process is called progressive measurable if for each fixed $t \geq 0$, the map $X_s(\w) \colon [0,t] \times \W \to \R^d$ is $\mathcal{B}([0,t]) \times \mathcal{F}$ measurable.

    Let $\pi_B$ be the projection operator from sets $A \times B$ onto $B$. That is $\pi_B((a,b)) = b$.

    Theorem (Measurable Projection): If $(\OFP)$ is a complete probability space and $A \in \mathcal{B}(\R) \times \mathcal{F}$, then $\pi_\W(A) \in \mathcal{F}$.

    This seems obvious, but it’s not. It’s also important for the space to be complete here.

    There is an interesting mistake related to this that Lebesgue made. The mistake was discovered by Suskin, which eventually led into the study of descriptive set theory.

    Definition (Graph): The graph of a random time is

    \[[\tau] = \left\{ (t,\w) \in \R^+ \times \W \colon t = \tau(\w) \right\}\]

    Measurable projection allows us to classify stopping times by graph measurability.

    Lemma: Let $\tau \colon \W \to \R^+ \cup \infty$ be any map. Then

    • $\tau$ is measurable iff $[\tau]$ is jointly measurable.
    • $\tau$ is a stopping time iff $[\tau]$ is progressive.

    The second line could use some explanation: it means that

    \[1_{[\tau]}(t,\w)\]

    is a progressively measurable process.

    The debut of a set $A \subset \R^+ \times \W$ is a map $D_A \colon \W \to \R^+$ defined by

    \[D_A(\w) = \inf \left\{ t \in \R^+ \colon (t,w) \in A \right\}\]

    That is, it’s the first time $t \geq 0$ when the process enters the set $A$.

    Theorem (Debut): If $A \subset \R^+ \times \W$ is progressively measurable and the filtration is right-continuous, then $D_A$ is a stopping time.

    Corollary: If $X$ is a progressively measurable process, $K \subset \R^d$ is Borel, and $\mathcal{F}$ is right continuous, then

    \[\tau = \inf \{ t \in \R^+ \colon X_t \in K \}\]

    is a stopping time.

    Comments: Since it’s clear that breaking continuity is not going it work so easily to beat the debut theorem, we should try to break something else. One way to make something not be a stopping time is to break the right-continuous filtration property. DS did the following: let $X$ be a Bernoulli(1/2) random variable, and define the process $A(t)$ as follows

    \[A(t) = \begin{cases} 0 & t = 0 \\ X & t > 0 \end{cases}\]

    Notice that the debut time $T$ of the set $(1/2,\infty)$ is not a stopping time since $T \leq 0$ is not $\mathcal{F}_0$ measurable. The property that DS broke is the following: $\mathcal{F}_0 \neq \cap_{t > 0} \mathcal{F}_t = \cap_{t > 0} \sigma(X)$.

  12. Give an example of a process that is Markov but not strong Markov.

    There are several simple examples. Here is one. Let $X(t)$ be the process

    \[X(t) = \begin{cases} B(t) & X(0) = x \\ 0 & X(0) = 0 \\ \end{cases}\]

    where $B(t)$ is a standard Brownian motion.

    We may check that it is Markov. For this, one has to check the following property.

    \[\E_x [ f(X_{t+s}) | \mathcal{F}_t ] = \E_{X_t} [ f(X_t) ]\]

    The left hand side is a conditional expectation. For $x=0$, this is easy to verify of course. So pick a set $B \in \mathcal{F}_t$ and let $x \neq 0$. Then, since $X_t$ is a Brownian motion, one only needs to note that $\Prob(X_t = 0) = 0$ for any fixed $t$, and therefore

    \[\begin{align} \E_x[ 1_B \E_x [ f(X_{t+s}) | \mathcal{F}_t ] ] & = \E_x[ 1_{B \cap \{ X_t \neq 0 \} } \E_x [ f(X_{t+s}) | \mathcal{F}_t ] ] \\ & = \E_x[ 1_{B \cap \{ X_t \neq 0 \} } \E_{X_t} [ f(X_{t+s}) ] ] \end{align}\]

    where in the last step we just used the Markov property for Brownian motion.

    I’ll also sketch Ito’s example to you, which is also quite simple. This is from the series of lectures he gave at the Tata Institute for Fundamental Research, Bombay in 1961. He uses a deterministic process

    \[\xi^{(a)}_t = \begin{cases} a + t & a \neq 0 \\ 0 & a = 0, t < \tau(\w) \\ t - \tau(\w) & a = 0, t \geq \tau(\w) \end{cases}\]

    That is, the process just increases linearly from its starting point $a$, except when it starts at the origin $a=0$. In the latter case, it waits for an exponential waiting time $\tau(\w)$ and then starts increasing linearly.

    Again, it’s easy to verify the Markov property. For the strong Markov property, Ito uses the hitting time $\sigma$ of the set $(0,\infty)$, and considers

    \[\Prob_0( \sigma(\w) > 0, \sigma( \theta_\sigma \w) > 0)\]

    where, like Durrett, I’ve used $\theta_t$ for the time-shift operator. It’s easy to show that the strong Markov property does not hold for this probability.

    There are processes called the strong Feller processes that always have the strong Markov property. It’s worth visiting their definition to see how we’re breaking the strong Markov property. In the proof of the Strong Markov property, Durrett uses the continuity of the function

    \[\phi(s,x) = \E_x[ f(X_s) ]\]

    where $f$ is some bounded continuous function. He then uses

    \[\lim_{(s,x) \to (0,y)} \phi(s,x) = \phi(0,y)\]

    This was easily shown to be true for Brownian motion, and essentially followed from the continuity of the process.

    the Feller property is usually defined using the semigroup (in time) $P_t$ acting on the space of continuous functions $C(\R^d)$:

    \[P_t f(x) = \E_x [ f(X_t) ]\]

    where $X_t$ represents the Markov Process at time $t$. Of course, we just called this $\phi(t,x) = P_t f(x)$. A Feller semigroup on $C(\R^d)$ satisfies:

    • $P_t f$ is a map from $C(\R^d)$ to $C(\R^d)$.
    • $P_t$ forms a semigroup; i.e., $P_{t + s} = P_t P_s$
    • $\Norm{P_t f}{} = \Norm{f}{}$ for all $t \geq 0$.
    • $\lim_{t \to 0} \Norm{P_t f - f}{} = 0$

    One has to build some theory to show that Feller semigroups define Markov processes, and then show that the process has the strong Markov property.

    In our examples, we break the first property. $P_t f(x)$ does not turn out to be continuous function for all continuous $f$. In our case, we get $P_t f(0) = f(0)$ for all bounded functions $f$. We then choose a function such that $\lim_{x \to 0} P_t f(x) \neq f(0)$.

    More questions for the future: Give me a process that is adapted but not jointly measurable. Give me a process that is jointly measurable but not progressively measurable.