ST 321 Markov Chains
Task 1. Precipitation records from the
(a). What is the transition
probability matrix?
(b). If it rains today (Friday), what is the probability
that it will rain on Monday?
(c). A Markov chain is regular if some power of the
transition matrix has only positive elements.
For a regular chain, Pn approaches a limiting matrix W as n
goes to infinity. Each row of W is the
same. The jth element of the
row represents the long-run probability of state j. What is the
probability that a day picked at random from the distant future will be dry?
(d). Use the following program to compare simulated
independent data and simulated Markov chain data to the actual rainfall data.
Use a variety of transition probabilities, as well as those given above. Note: the program can be found in G:/ST321/Spring2004.
"markov.rainfall"<-function(p00
= 0.5, p11 = 0.5) {
x <- scan("g:/st321/wetdry.500")
par(mfrow = c(2, 1))
plot(x, type=”l”)
title(main = "Snoqualmie Rainfall Data")
P <-
rbind(c(p00, 1 - p00), c(1 - p11, p11)) #
#default
days are dry
mc <- rep(0, 500)
for(i in 2:500) {
if(mc[i - 1] == 0) {
#a dry day
if(runif(1) < P[1, 2]) {
#step to a rainy day
mc[i] <- 1
}
}
else {
#a rainy day
if(runif(1) < P[2, 2]) {
#step to a rainy day
mc[i] <- 1
}
}
}
plot(mc, type=”l”)
title(main = "Simulated Markov Chain")
par(mfrow = c(1, 1))
return()
}
Task 2. The following
program puts the transition matrix for an absorbing Markov chain into canonical
form and then computes various numerical characteristics of the chain,
including expected times to absorption and probabilities of absorption.
"Absorbing"<- function(P = rbind(c(0,
1/2, 0, 0, 1/2, 0), c(1/3, 0, 1/3, 1/3, 0, 0), c(0, 1/2, 0, 0, 0, 1/2), c(0, 1/3, 0, 0, 1/3, 1/3), c(0, 0, 0, 0, 1, 0), c(0,
0, 0, 0, 0, 1))) {
# this program puts the
transition probability matrix
# for an absorbing Markov
chain into canonical form
# and computes expected
time spent in transient state
# j starting from transient
state i (N=[n_ij]),
# expected time until absorption starting from
transient state i
# (tt=[t_i]), and
probability of absorption in absorbing state j
# starting from transient state i (B=[b_ij])
#
k <- dim(P)[1]
i <- 1
absorb <- c()
for(i in 1:k) {
if(P[i, i] == 1) {
#absorbing state
absorb <- c(absorb, i)
}
}
#end for loop
a <- length(absorb)
for(i in 1:a) {
#move each absorbing state to the bottom
Psave
<- P
P[k, ] <- c(rep(0, k - 1), 1)
P[1:(k - 1), 1:(k - 1)] <- Psave[ - absorb[i], - absorb[i]]
P[1:(k - 1), k] <- Psave[ - absorb[i], absorb[i]]
absorb[absorb > absorb[i]] <- absorb[absorb >
absorb[i]] - 1
}
transient <- k - a
Q <- P[1:transient, 1:transient]
R <- P[1:transient, (transient + 1):k]
N <- solve(diag(rep(1, transient)) - Q)
tt <- N %*% cbind(rep(1, transient))
B <- N
%*% R
return(P, Q, R, N, tt, B)
}
(a). Use the program to reproduce
the analysis done in class about the Drunkard’s Walk Example. What is the probability that the Drunkard
will end up in state 0 (home) if he starts in state 1? What is the expected number of steps in which
he is in state 2?
(b). The mean first passage time is the expected
number of steps to reach state j for the first time, starting from state i.
Techniques for absorbing chains can be used to compute mean first passage times
for non-absorbing chains by the following simple trick. Make a new Markov chain by setting pjj=1. Then the chain is absorbing, but before it
gets absorbed in state j, it behaves exactly like the old Markov chain. Use this idea to determine the expected
number of dry days before the first rainy day in the Snoqualmie data.
Task 3. Markov chains are
often used to describe genome data, where the states are nucleotides (a,c,g,t). The
following program uses data for the genome for baker’s yeast (available at http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html). The program estimates a transition
probability matrix from these data.
(a). Do the data look
independent? Is the Markov chain
regular?
(b). What is the probability that g
follows t? What is the probability that
a randomly selected nucleotide is g?
"markov.yeast"<-function()
{
x <- scan("g:/st321/bakers.yeast", list(x =
""))
y <- x$x
n <-
length(y)
z <- rep(1, n)
z[y == "c"] <- 2
z[y == "g"] <- 3
z[y == "t"] <- 4
plot(z[1:250])
title(main = "First 250 bp from baker's yeast")
zlead <- z[-1]
zlag <- z[ - n]
P1 <- table(zlead[zlag == 1])/length(zlead[zlag == 1])
P2 <- table(zlead[zlag == 2])/length(zlead[zlag == 2])
P3 <- table(zlead[zlag == 3])/length(zlead[zlag == 3])
P4 <- table(zlead[zlag == 4])/length(zlead[zlag == 4])
P <- rbind(P1, P2, P3, P4) #
Pn <- P # compute P^n as n gets large
for(i in 1:1000) {
Pn <-
Pn %*% P
}
return(P, Pn)
}