Bayes AI

Unit 7: Markov Chain Monte Carlo

Vadim Sokolov
George Mason University
Spring 2025

MCMC Simulation

Suppose that \(X \sim F_X ( x )\) and let \(Y = g (X)\).

How do we find \(F_Y ( y )\) and \(f_Y ( y )\) ?

  • von Neumann

Given a uniform \(U\), how do we find \(X= g(U)\)?

  • In the bivariate case \((X,Y) \rightarrow (U,V)\).

We need to find \(f_{(U,V)} ( u , v )\) from \(f_{X,Y}(x,y)\)

  • Applications: simulation, MCMC, and particle filtering (PF).

Transformations

The cdf identity gives \[ F_Y ( y) = \mathbb{P} ( Y \leq y ) = \mathbb{P} ( g( X) \leq y ) \]

  • Hence, in general,

\[ F_Y ( y ) = \int_{ g( x) \leq y } f_X ( x ) dx \]

  • If \(g\) is monotone we can invert: if \(g\) is increasing, \(F_Y ( y ) = P( X \leq g^{-1} ( y ) ) = F_X ( g^{-1} ( y ) )\)

If \(g\) is decreasing, \(F_Y ( y ) = P( X \geq g^{-1} ( y ) ) = 1 - F_X ( g^{-1} ( y ) )\)

Transformation Identity

  1. Theorem 1: Let \(X\) have pdf \(f_X ( x)\) and let \(Y=g(X)\). Then if \(g\) is a monotone function we have

\[ f_Y ( y) = f_X ( g^{-1} ( y ) ) \left | \frac{ d}{dy} g^{-1} ( y ) \right | \] There’s also a multivariate version of this that we’ll see later.

  • Suppose \(X\) is a continuous random variable; what is the pdf of \(Y = X^2\)?

  • Let \(X \sim N ( 0 ,1 )\); what is the pdf of \(Y = X^2\)? (A simulation check follows below.)
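To sanity-check the second question, here is a minimal simulation sketch (assuming Python with numpy and scipy): squaring standard normal draws should reproduce the \(\chi^2_1\) density, which is what the transformation identity gives once both branches of \(g^{-1}(y) = \pm\sqrt{y}\) are accounted for.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Draw X ~ N(0, 1) and square it
x = rng.standard_normal(100_000)
y = x**2

# Change-of-variable answer with the two branches g^{-1}(y) = +sqrt(y), -sqrt(y):
# f_Y(y) = (f_X(sqrt(y)) + f_X(-sqrt(y))) / (2 sqrt(y)), the chi^2_1 density
grid = np.linspace(0.05, 5, 50)
f_y = (stats.norm.pdf(np.sqrt(grid)) + stats.norm.pdf(-np.sqrt(grid))) / (2 * np.sqrt(grid))

print(np.allclose(f_y, stats.chi2.pdf(grid, df=1)))  # True: Y = X^2 is chi-squared with 1 df
```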

Probability Integral Transform

Theorem: Suppose that \(U \sim U[0,1]\). Then for any continuous distribution function \(F\), the random variable \(X= F^{-1} (U)\) has distribution function \(F\).

  • Remember that for \(u \in [0,1]\), \(\mathbb{P} \left ( U \leq u \right ) = u\), so we have

\[ \mathbb{P} \left (X \leq x \right )= \mathbb{P} \left ( F^{-1} (U) \leq x \right )= \mathbb{P} \left ( U \leq F(x) \right )=F(x) \] Hence, to sample from \(F_X\), set \(X = F_X^{-1}(U)\).
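A minimal sketch of the transform in practice (assuming Python with numpy), using the Exponential(\(\lambda\)) distribution, whose inverse cdf \(F^{-1}(u) = -\log(1-u)/\lambda\) is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0  # rate of the target Exponential(lambda) distribution

# Probability integral transform: X = F^{-1}(U) with F(x) = 1 - exp(-lam * x)
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u) / lam

print(x.mean())  # should be close to 1 / lam = 0.5
```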

Normal

Sometimes there are shortcut formulas for generating random draws.

Normal \(N(0,I_2)\) (the Box-Muller transform): if \(x_1,x_2\) are independent uniforms on \([0,1]\), then \[ \begin{aligned} y_1 = & \sqrt{-2\log x_1}\cos(2\pi x_2)\\ y_2 = & \sqrt{-2\log x_1}\sin(2\pi x_2) \end{aligned} \] are independent \(N(0,1)\) draws.
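A minimal sketch of this shortcut in code (assuming Python with numpy):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two independent uniforms on [0, 1]
x1 = rng.uniform(size=100_000)
x2 = rng.uniform(size=100_000)

# Box-Muller: map the pair of uniforms to two independent N(0, 1) draws
r = np.sqrt(-2.0 * np.log(x1))
y1 = r * np.cos(2.0 * np.pi * x2)
y2 = r * np.sin(2.0 * np.pi * x2)

# Sanity check: means near 0, variances near 1, correlation near 0
print(y1.mean(), y1.var(), np.corrcoef(y1, y2)[0, 1])
```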

Simulation and Transformations

An important question is how to transform multiple random variables.

  • Suppose that we have random variables:

\[ ( X , Y ) \sim f_{ X , Y} ( x , y ) \] A transformation of interest is given by: \[ U = g ( X , Y ) \; \; {\rm and} \; \; V = h ( X , Y ) \]

  • The problem is how to compute \(f_{ U , V } ( u , v )\). We need the Jacobian

\[ J = \frac{ \partial ( x , y ) }{ \partial ( u , v ) } = \left | \begin{array}{cc} \frac{ \partial x }{ \partial u} & \frac{ \partial x }{ \partial v} \\ \frac{ \partial y }{ \partial u} & \frac{ \partial y }{ \partial v} \end{array} \right | \]

Bivariate Change of Variable

  • Theorem: (change of variable)

\[ f_{ U , V } ( u , v ) = f_{ X , Y} ( h_1 ( u , v ) , h_2 ( u , v ) ) \left | \frac{ \partial ( x , y ) }{ \partial ( u , v ) } \right | \] where \(x = h_1(u,v)\), \(y = h_2(u,v)\) is the inverse transformation; the last term is the Jacobian.

This can be calculated in two ways.

\[ \left | \frac{ \partial ( x , y ) }{ \partial ( u , v ) } \right | = 1 / \left | \frac{ \partial ( u , v ) }{ \partial ( x , y ) } \right | \]

  • So we do not always need to differentiate the inverse transformation \((x, y)\) as a function of \((u, v)\); we can differentiate the forward map instead. A worked example follows below.
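A small worked example (not from the slides): take \(U = X + Y\) and \(V = X - Y\). The inverse transformation is \(x = (u+v)/2\), \(y = (u-v)/2\), so \[ \frac{ \partial ( x , y ) }{ \partial ( u , v ) } = \left | \begin{array}{cc} 1/2 & 1/2 \\ 1/2 & -1/2 \end{array} \right | = -\frac{1}{2} \; \; {\rm and} \; \; f_{ U , V } ( u , v ) = \frac{1}{2} f_{ X , Y} \left ( \frac{u+v}{2} , \frac{u-v}{2} \right ) \] Equivalently, \(\partial ( u , v ) / \partial ( x , y ) = -2\), and the reciprocal of its absolute value gives the same factor of \(1/2\).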

Inequalities and Identities

  1. Markov

\[ \mathbb{P} \left ( g( X ) \geq c \right ) \leq \frac{ \mathbb{E} ( g(X) ) }{c } \; \; {\rm where} \; \; g( X) \geq 0 \]

  1. Chebyshev

\[ \mathbb{P} \left ( | X - \mu | \geq c \right ) \leq \frac{ Var(X) }{c^2 } \]

  1. Jensen

\[ \mathbb{E} \left ( \phi ( X ) \right ) \leq \phi \left ( \mathbb{E}( X ) \right ) \; \; {\rm for \; concave \;} \phi \] (For convex \(\phi\) the inequality is reversed.)

  1. Cauchy-Schwarz \[ | corr (X,Y) | \leq 1 \]

Chebyshev follows from Markov (see the derivation below). For more on Cauchy-Schwarz, see J. Michael Steele's The Cauchy-Schwarz Master Class.
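The derivation of Chebyshev from Markov applies Markov to the non-negative variable \(g(X) = (X - \mu)^2\) with threshold \(c^2\): \[ \mathbb{P} \left ( | X - \mu | \geq c \right ) = \mathbb{P} \left ( ( X - \mu )^2 \geq c^2 \right ) \leq \frac{ \mathbb{E} \left ( ( X - \mu )^2 \right ) }{ c^2 } = \frac{ Var(X) }{ c^2 } \]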

Markov Inequality

Let \(f\) be non-negative and non-decreasing with \(f(t) > 0\). Then \[ \begin{aligned} P ( Z > t ) &\leq P ( f(Z) \geq f(t) ) \\ & = E \left ( \mathbb{I} ( f( Z) \geq f(t ) ) \right ) \\ & \leq E \left ( \mathbb{I} ( f( Z) \geq f(t ) ) \frac{f(Z)}{f(t) } \right ) \\ & \leq E\left ( \frac{f(Z)}{f(t) } \right ) = \frac{ E ( f(Z) ) }{ f(t) } \end{aligned} \]
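A minimal numerical sanity check (assuming Python with numpy), taking \(f(z) = z\) and \(Z \sim\) Exponential(1), so the bound reads \(P(Z > t) \leq E(Z)/t\):

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.exponential(scale=1.0, size=1_000_000)  # Z >= 0 with E(Z) = 1

for t in [0.5, 1.0, 2.0, 4.0]:
    empirical = (z > t).mean()   # Monte Carlo estimate of P(Z > t)
    markov = z.mean() / t        # Markov bound E(Z) / t
    print(t, empirical, markov, empirical <= markov)
```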

Concentration Inequalities

Let \(Z = \sum_{i=1}^n X_i\), where the \(X_i\) are i.i.d. with variance \(\sigma^2\).

Law of Large Numbers \[ \lim_{ n \rightarrow \infty } \mathbb{P} \left ( | Z - E(Z) | > n \epsilon \right ) = 0 \; \; \forall \epsilon > 0 \]

Central Limit Theorem (CLT) \[ \lim_{ n \rightarrow \infty } \mathbb{P} \left ( \frac{ Z - E(Z) }{ \sigma \sqrt{n} } \leq x \right ) = \Phi ( x ) \]
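A minimal simulation sketch of the CLT (assuming Python with numpy and scipy): standardized sums of i.i.d. uniforms should match the standard normal cdf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 200, 50_000

# Z = sum of n iid Uniform(0, 1) draws; E(Z) = n/2, Var(Z) = n/12
z = rng.uniform(size=(reps, n)).sum(axis=1)
standardized = (z - n / 2) / np.sqrt(n / 12)

# Empirical cdf of the standardized sum versus Phi at a few points
for t in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(t, (standardized <= t).mean(), stats.norm.cdf(t))
```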

Posterior Concentration

Hoeffding and Bernstein

Let \(Z= \sum_{i=1}^n X_i\), where the \(X_i\) are independent.

Hoeffding (e.g., for \(X_i\) taking values in an interval of length 2, such as \([-1,1]\)) \[ P ( Z > E(Z) + t ) \leq \exp \left ( - \frac{ t^2}{2n} \right ) \]

Bernstein (e.g., for \(X_i\) with \(|X_i - E(X_i)| \leq 1\)) \[ P ( Z > E(Z) + t ) \leq \exp \left ( - \frac{ t^2}{ 2 ( Var(Z) + t/3 ) } \right ) \]

Large Deviations (Varadhan)
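A minimal check of the Hoeffding bound (assuming Python with numpy), using i.i.d. Rademacher (\(\pm 1\)) variables so that the \(\exp(-t^2/2n)\) form applies:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 200_000

# X_i = +/- 1 with probability 1/2 each, so E(Z) = 0 and each X_i lies in [-1, 1]
z = rng.choice([-1, 1], size=(reps, n)).sum(axis=1)

for t in [5, 10, 20, 30]:
    empirical = (z > t).mean()            # Monte Carlo estimate of P(Z > E(Z) + t)
    hoeffding = np.exp(-t**2 / (2 * n))   # Hoeffding upper bound
    print(t, empirical, hoeffding, empirical <= hoeffding)
```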

Special Distributions

See Common Distributions

  1. Bernoulli and Binomial

  2. Hypergeometric

  3. Poisson

  4. Negative Binomial

  5. Normal Distribution

  6. Gamma Distribution

  7. Beta Distribution

  8. Multinomial Distribution

  9. Bivariate Normal Distribution

  10. Wishart Distribution

\(\ldots\)

Example: Markov Dependence

  • We can always factor a joint distribution as

\[ p( X_n , X_{n-1} , \ldots , X_1 ) = p( X_n | X_{n-1} , \ldots , X_1 ) \ldots p( X_2 | X_1 ) p( X_1 ) \]

Example: A process has the Markov property if

\[ p( X_n | X_{n-1} , \ldots , X_1 ) = p( X_n | X_{n-1} ) \]

  • Only the most recent value matters when determining the probabilities.


A real-world probability model: Hidden Markov Models

Are stock returns a random walk?

Hidden Markov Models (Baum-Welch, Viterbi)

  • Daily returns on the SP500 stock market index.

Build a hidden Markov model to predict the ups and downs.

  • Suppose that stock market returns on the next four days are \(X_1 , \ldots , X_4\).

  • Let’s empirically determine the conditionals and marginals

SP500 Data

Marginal and Bivariate Distributions

  • Empirically, what do we get? Daily returns from \(1948-2007\).

| \(x\)          | Down  | Up    |
|----------------|-------|-------|
| \(P(X_i = x)\) | 0.474 | 0.526 |

  • Finding \(p( X_2 | X_1 )\) is twice as much computational effort: counting \(UU,UD,DU,DD\) transitions.

|                     | \(X_i\) = Down | \(X_i\) = Up |
|---------------------|----------------|--------------|
| \(X_{i-1}\) = Down  | 0.519          | 0.481        |
| \(X_{i-1}\) = Up    | 0.433          | 0.567        |
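A minimal sketch of how such a transition table can be estimated from a sequence of up/down labels (assuming Python with numpy; the `updown` sequence below is synthetic, not the actual SP500 series):

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic stand-in for the SP500 up/down sequence: 0 = Down, 1 = Up
updown = rng.choice([0, 1], size=10_000, p=[0.474, 0.526])

# Count (x_{i-1}, x_i) transitions and normalize each row
counts = np.zeros((2, 2))
for prev, curr in zip(updown[:-1], updown[1:]):
    counts[prev, curr] += 1
transition = counts / counts.sum(axis=1, keepdims=True)

# transition[r, c] estimates P(X_i = c | X_{i-1} = r), with rows/columns ordered Down, Up
print(transition)
```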

Conditioned on two days

  • Let’s do \(p( X_3 | X_2 , X_1 )\)

| \(X_{i-2}\) | \(X_{i-1}\) | Down  | Up    |
|-------------|-------------|-------|-------|
| Down        | Down        | 0.501 | 0.499 |
| Down        | Up          | 0.412 | 0.588 |
| Up          | Down        | 0.539 | 0.461 |
| Up          | Up          | 0.449 | 0.551 |

  • We could do the distribution \(p( X_2 , X_3 | X_1 )\). This is a joint, marginal and conditional distribution all at the same time.

Joint because it involves more than one variable \(( X_2 , X_3 )\), marginal because it ignores \(X_4\), and conditional because it is given \(X_1\).

Joint Probabilities

  • Under Markov dependence \[ \begin{aligned} P( UUD ) & = p( X_1 = U) p( X_2 = U | X_1 = U) p( X_3 = D | X_2 = U ) \\ & = ( 0.526 ) ( 0.567 ) ( 0.433) \\ & \approx 0.129 \end{aligned} \]

  • Under independence we would get \[ \begin{aligned} P(UUD) & = P( X_1 = U) p( X_2 = U) p( X_3 = D ) \\ & = (.526)(.526)(.474) \\ & = 0.131 \end{aligned} \] A short computation follows below.
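A short computation of both quantities from the table values above (Python):

```python
# Estimated probabilities taken from the slides
p_up = 0.526             # P(X_i = Up)
p_down = 0.474           # P(X_i = Down)
p_up_given_up = 0.567    # P(X_i = Up   | X_{i-1} = Up)
p_down_given_up = 0.433  # P(X_i = Down | X_{i-1} = Up)

# P(UUD) under Markov dependence versus under independence
p_markov = p_up * p_up_given_up * p_down_given_up
p_indep = p_up * p_up * p_down
print(round(p_markov, 3), round(p_indep, 3))  # roughly 0.129 and 0.131
```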