ST 321                                                            Markov Chains                                                   May 7, 2004

 

Task 1.  Precipitation records from the Snoqualmie Pass region of Washington State show that about 71.3% of dry days are followed by dry days, while 25.8% of rainy days are followed by dry days.  Model the presence or absence of daily precipitation as a two-state Markov chain with dry=0, wet=1. 

 

(a). What is the transition probability matrix? 

(b). If it rains today (Friday), what is the probability that it will rain on Monday?

(c). A Markov chain is regular if some power of the transition matrix has only positive elements.  For a regular chain, Pn approaches a limiting matrix W as n goes to infinity.  Each row of W is the same.  The jth element of the row represents the long-run probability of state j. What is the probability that a day picked at random from the distant future will be dry?

(d). Use the following program to compare simulated independent data and simulated Markov chain data to the actual rainfall data. Use a variety of transition probabilities, as well as those given above.  Note: the program can be found in G:/ST321/Spring2004.

 

"markov.rainfall"<-function(p00 = 0.5, p11 = 0.5) {

   x <- scan("g:/st321/wetdry.500")

   par(mfrow = c(2, 1))

   plot(x, type=”l”)

   title(main = "Snoqualmie Rainfall Data")

   P <- rbind(c(p00, 1 - p00), c(1 - p11, p11)) #

   #default days are dry

   mc <- rep(0, 500)

   for(i in 2:500) {

       if(mc[i - 1] == 0) {

#a dry day

          if(runif(1) < P[1, 2]) {

#step to a rainy day

              mc[i] <- 1

          }

       }

       else {

#a rainy day

          if(runif(1) < P[2, 2]) {

#step to a rainy day

              mc[i] <- 1

          }

       }

   }

   plot(mc, type=”l”)

   title(main = "Simulated Markov Chain")

   par(mfrow = c(1, 1))

   return()

}

 

Task 2. The following program puts the transition matrix for an absorbing Markov chain into canonical form and then computes various numerical characteristics of the chain, including expected times to absorption and probabilities of absorption.

 

"Absorbing"<- function(P = rbind(c(0, 1/2, 0, 0, 1/2, 0), c(1/3, 0, 1/3, 1/3, 0, 0), c(0, 1/2,  0, 0, 0, 1/2), c(0, 1/3, 0, 0, 1/3, 1/3), c(0, 0, 0, 0, 1, 0), c(0, 0, 0, 0, 0, 1))) {

# this program puts the transition probability matrix

# for an absorbing Markov chain into canonical form

# and computes expected time spent in transient state

# j starting from transient state i (N=[n_ij]),

# expected time until absorption starting from transient state i

# (tt=[t_i]), and probability of absorption in absorbing state j

# starting from transient state i (B=[b_ij])

#

   k <- dim(P)[1]

   i <- 1

   absorb <- c()

   for(i in 1:k) {

       if(P[i, i] == 1) {

#absorbing state

          absorb <- c(absorb, i)

       }

   }

#end for loop

   a <- length(absorb)

   for(i in 1:a) {

#move each absorbing state to the bottom

       Psave <- P

      P[k,  ] <- c(rep(0, k - 1), 1)

       P[1:(k - 1), 1:(k - 1)] <- Psave[ - absorb[i],  - absorb[i]]

       P[1:(k - 1), k] <- Psave[ - absorb[i], absorb[i]]

       absorb[absorb > absorb[i]] <- absorb[absorb > absorb[i]] - 1

   }

   transient <- k - a

   Q <- P[1:transient, 1:transient]

   R <- P[1:transient, (transient + 1):k]

   N <- solve(diag(rep(1, transient)) - Q)

   tt <- N %*% cbind(rep(1, transient))

   B <- N %*% R

   return(P, Q, R, N, tt, B) 

}

 

(a). Use the program to reproduce the analysis done in class about the Drunkard’s Walk Example.  What is the probability that the Drunkard will end up in state 0 (home) if he starts in state 1?  What is the expected number of steps in which he is in state 2?

(b). The mean first passage time is the expected number of steps to reach state j for the first time, starting from state i. Techniques for absorbing chains can be used to compute mean first passage times for non-absorbing chains by the following simple trick.  Make a new Markov chain by setting pjj=1.  Then the chain is absorbing, but before it gets absorbed in state j, it behaves exactly like the old Markov chain.  Use this idea to determine the expected number of dry days before the first rainy day in the Snoqualmie data. 

 

Task 3. Markov chains are often used to describe genome data, where the states are nucleotides (a,c,g,t).   The following program uses data for the genome for baker’s yeast (available at http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html).  The program estimates a transition probability matrix from these data. 

(a). Do the data look independent?  Is the Markov chain regular?

(b). What is the probability that g follows t?  What is the probability that a randomly selected nucleotide is g?

 

"markov.yeast"<-function() {

   x <- scan("g:/st321/bakers.yeast", list(x = ""))

   y <- x$x

   n <- length(y)

   z <- rep(1, n)

   z[y == "c"] <- 2

   z[y == "g"] <- 3

   z[y == "t"] <- 4

   plot(z[1:250])

   title(main = "First 250 bp from baker's yeast")

   zlead <- z[-1]

   zlag <- z[ - n]

   P1 <- table(zlead[zlag == 1])/length(zlead[zlag == 1])

   P2 <- table(zlead[zlag == 2])/length(zlead[zlag == 2])

   P3 <- table(zlead[zlag == 3])/length(zlead[zlag == 3])

   P4 <- table(zlead[zlag == 4])/length(zlead[zlag == 4])

   P <- rbind(P1, P2, P3, P4) #

   Pn <- P   # compute P^n as n gets large

   for(i in 1:1000) {

       Pn <- Pn %*% P

   }

   return(P, Pn)

}