ST 321                                Random Variables and Histograms                              March 5, 2004

 

Try the following R/Splus commands:

 

sample(1:6,8,replace=T) # generate 8 random numbers from discrete uniform 1 to 6

 

rbinom(8,15,0.4)        # generate 8 random numbers from binomial with n=15 and

                        # success probability 0.4

 

rgeom(8,0.4)+1          # generate 8 random numbers from geometric with success

                        # probability 0.4 (note: we need to add 1 to the results

                        # as Splus uses a different parametrization)

 

rpois(8,2.3)            # generate 8 random poisson random numbers with mean 2.3

 

runif(8,4,5.5)          # generate 8 random numbers from continuous uniform

    # ranged from 4 to 5.5

 

rexp(8,0.6)             # generate 8 random exponential numbers with rate 0.6

 

rnorm(8,3,1.3)          # generate 8 random normal numbers with mean 3 and  

                        # standard deviation 1.3

 

We can use the “hist” command to plot the histogram of any set of random numbers generated.  For examples,

 

x <- rnorm(1000,3,1.3)

hist(x,freq=T)

 

x <- rbinom(1000,20,0.3)

hist(x,freq=T)

 

x <- runif(1000,4,5.5)

hist(x,freq=T)

 

Now try plotting the histograms of other random numbers (e.g., geometric, exponential, etc).

 

Task 1. (a). Suppose we roll a die three times.  Let X be the minimum of the three numbers obtained.  The following R/Splus program estimates the probability density function (or probability mass function) of X :

 

estimatepdf1 <- function(N=10000)

{

  freq <- seq(0,0,length=6) #generate a vector of 6 zeros

  for (i in 1:N) {

    die1 <- sample(1:6,1,replace=T)

    die2 <- sample(1:6,1,replace=T)

    die3 <- sample(1:6,1,replace=T)

    result <- min(die1, die2, die3) #so "result" is a number between 1 to 6

    freq[result] <- freq[result]+1

  }

  freq <- freq/N

  cat(freq, "\n")

}

 

Run the above program. 

 

(b). Suppose now we roll a die four times and let Y be the second largest number obtained.  Modify the above program to estimate the probability function of Y.  The R/Splus function sort may be useful.

 

Task 2. (a). (Problem 5.1.31 of the online downloadable text.) “In one of the first studies of the Poisson distribution, von Bortkiewicz considered the frequency of deaths from [mule] kicks in the Prussian army corps.  From the study of 14 corps over a 20-year period, he obtained the data”:

 

# of deaths,x

0

1

2

3

4

# of corps with x deaths in a given year

144

91

32

11

2

 

Fit a Poisson distribution to these data and see if you think that the Poisson distribution is appropriate.

 

y <- c(rep(0,144),rep(1,91),rep(2,32),rep(3,11),rep(4,2))

observed <- table(y)

lambda <- mean(y)

expected <- c(dpois(0:3,lambda),1-sum(dpois(0:3,lambda)))*length(y)

plot(0:4,observed,type="h",xlab="Number of Deaths",ylab="Frequency")

lines(0:4,expected,col=8)

 

What do the following R/Splus functions do: rep, table, mean, dpois and length?  In R/Splus try “help(rep)”.

 

(b). (Problem 5.1.26 of the same text.) “Feller discusses the statistics of flying bomb hits in an area in the south of London during the Second World War.  The area in question was divided into 24x24=576 small areas.  The total number of hits was 537.”  Data are given in the following table:

 

# of hits, x

0

1

2

3

4

5 or more

# of squares with x hits

229

211

93

35

7

1

 

Fit a Poisson distribution to these data and see if you think that the Poisson distribution is appropriate.

 

Task 3. Write a computer algorithm that simulates m hypergeometric random variables with parameters N=total number of balls in urn, k=number of red (i.e., N-k blue), and n=number of balls drawn.  The R/Splus command for “n choose k” is “choose(n,k)”.