December 11, 2014

Why parallel?

We've already seen the doParallel package, which acts as an interface between foreach and parallel. Whereas foreach provides a parallel analogue of the for loop, the parallel package provides handy parallel analogues of the apply functions.

The parallel package merges the earlier multicore and snow packages:

  • Provides drop-in replacements for most of their functionality, with integrated handling of random-number generation
  • By default, uses multicore functionality on Unix-like systems and snow functionality on Windows (we can also use the snow functionality to execute on a cluster on both systems), as sketched below
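
As a rough illustration (my addition, not from the original notes), the two sides of this split correspond to the two cluster types that makeCluster can create:

library(parallel)

# snow-style socket cluster: works on all platforms, including Windows
cl.sock = makeCluster(2, type="PSOCK")
stopCluster(cl.sock)

# multicore-style fork cluster: Unix-like systems only
# cl.fork = makeCluster(2, type="FORK")
# stopCluster(cl.fork)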

Getting started

Parallel backend

To get started, we must first make the desired number of cores available to R by registering a parallel backend. We can do this with the doParallel package.

library(doParallel)

# Determine number of CPU cores in your machine
nCores = detectCores()

# Create cluster with desired number of cores
cl = makeCluster(nCores)

# Register cluster
registerDoParallel(cl)
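
As a quick sanity check (my addition), foreach (which doParallel loads) can report how many workers the registered backend exposes:

# Number of workers seen by the registered backend; should equal nCores
getDoParWorkers()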

Parallel apply

We've seen how we can move from a traditional for loop to foreach. Similarly, we can speed up traditional apply functions by using their parallel analogues.

N = 100
input = seq_len(N) # same as 1:N but more robust
input.list = as.list(input)

testFun = function(i){
  mu = mean(runif(1e+06, 0, i))
  return(mu)
}

system.time(sapply(input, testFun))
##    user  system elapsed 
##   5.216   0.175   5.422
system.time(parSapply(cl, input, testFun))
##    user  system elapsed 
##   0.004   0.000   2.455

system.time(lapply(input.list, testFun))
##    user  system elapsed 
##   4.937   0.150   5.116
system.time(parLapply(cl, input.list, testFun))
##    user  system elapsed 
##   0.002   0.000   2.363
system.time(mclapply(input.list, testFun, mc.cores=nCores))  # not available on Windows
##    user  system elapsed 
##   5.324   0.344   2.150
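
One caveat with the cluster-based functions (my addition; offset and addOffset are made-up names for illustration): parSapply and parLapply run the function in fresh worker sessions, so any global objects or packages it relies on must be shipped to the workers explicitly, e.g. with clusterExport and clusterEvalQ.

offset = 10                      # exists only in the master session
addOffset = function(i) i + offset

# parSapply(cl, 1:5, addOffset)  # would fail: workers can't see 'offset'

clusterExport(cl, "offset")      # copy 'offset' to each worker's global env
parSapply(cl, 1:5, addOffset)    # returns 11 12 13 14 15
# (similarly, clusterEvalQ(cl, library(boot)) loads a package on each worker)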

pvec

We can also parallelize a vectorized computation using pvec, which splits the vector into chunks and submits each chunk to a separate core. It is usually only worthwhile for very long vectors and computationally intensive calculations.

library(fields)
d = runif(5e+06, 0.1, 10)
system.time(Matern(d))
##    user  system elapsed 
##   3.439   0.103   3.603
system.time(pvec(d, Matern, mc.cores=nCores))
##    user  system elapsed 
##   3.950   0.349   1.451
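
Note that the function passed to pvec must itself be vectorized: pvec simply applies it to each chunk and concatenates the chunk results. A quick sanity check (my illustration; like mclapply, pvec is not available on Windows):

x = runif(1e+06)
all.equal(sqrt(x), pvec(x, sqrt, mc.cores=2))  # TRUE: same answer, chunked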

Random number generation

Once again, we need to be careful when parallelizing a process that generates (pseudo-)random numbers.

We want the different processes to use independent (and preferably reproducible) random-number streams. We can arrange this with clusterSetRNGStream.

Let's go back and fix our first example:

N = 100
input = seq_len(N)

testFun = function(i){
  mu = mean(runif(1e+06, 0, i))
  return(mu)
}

# RNGkind("L'Ecuyer-CMRG") # not needed: clusterSetRNGStream selects L'Ecuyer's algorithm automatically

clusterSetRNGStream(cl, iseed=0)
res1 = parSapply(cl, input, testFun)

clusterSetRNGStream(cl, iseed=0)
res2 = parSapply(cl, input, testFun)

identical(res1, res2)
## [1] TRUE
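
For the fork-based functions like mclapply, the analogous recipe (a sketch following the parallel package's documentation; res3 and res4 are new names) is to switch the master to the L'Ecuyer-CMRG generator and reseed before each call:

RNGkind("L'Ecuyer-CMRG")

set.seed(0)
mc.reset.stream()  # rewind the worker streams to the current master seed
res3 = mclapply(input.list, testFun, mc.cores=nCores)

set.seed(0)
mc.reset.stream()
res4 = mclapply(input.list, testFun, mc.cores=nCores)

identical(res3, res4)  # should be TRUE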

Bootstrapping

Bootstrapping is another example of an embarrassingly parallel task. In the example below, we use bootstrapping to generate a 95% confidence interval for the R-squared of a linear regression. Using parLapply, we can split nCores*1000 bootstrap replicates across the cores, with each core running one boot call of 1000 replicates.

library(boot)

run1 = function(...){
  library(boot)
  rsq = function(formula, data, indices){
    d = data[indices,]
    fit = lm(formula, data=d)
    return(summary(fit)$r.squared)
  }
  boot(data=mtcars, statistic=rsq, R=1000, formula=mpg~wt+disp)
}

system.time(car.boot <- do.call(c, lapply(seq_len(nCores), run1)))
##    user  system elapsed 
##   8.216   0.075   8.498
clusterSetRNGStream(cl, iseed=123)
system.time(
  car.boot2 <- do.call(c, parLapply(cl, seq_len(nCores), run1))
)
##    user  system elapsed 
##   0.003   0.000   4.204

boot.ci(car.boot2, type="bca")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 4000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = car.boot2, type = "bca")
## 
## Intervals : 
## Level       BCa          
## 95%   ( 0.6510,  0.8564 )  
## Calculations and Intervals on Original Scale

Note that the boot package has built-in parallel support, so we can also simply run the following:

rsq = function(formula, data, indices){
  d = data[indices,]
  fit = lm(formula, data=d)
  return(summary(fit)$r.squared)
}
nBoot = nCores*1000
set.seed(123, kind="L'Ecuyer-CMRG")
system.time(
  car.boot3 <- boot(data=mtcars, statistic=rsq, R=nBoot, 
              formula=mpg~wt+disp, parallel="multicore", ncpus=nCores)
)
)
##    user  system elapsed 
##  11.460   0.383   3.602
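
Since parallel="multicore" is not available on Windows, boot can instead reuse our existing socket cluster through its snow interface (an untested sketch; car.boot4 is a new name):

# Windows-friendly variant: hand boot the socket cluster via the cl argument
clusterSetRNGStream(cl, iseed=123)  # for reproducible streams on the cluster
car.boot4 = boot(data=mtcars, statistic=rsq, R=nBoot,
                 formula=mpg~wt+disp, parallel="snow", ncpus=nCores, cl=cl)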

Stop cluster

It's a good idea to stop your cluster when you are done using it.

stopCluster(cl)
