Parallel computing with WinBUGS/OpenBUGS by using the snowfall package in R: Part 1

My motivation for doing parallel computing with WinBUGS/OpenBUGS is that I’m trying to fit a set of Bayesian regression models to between 100 and 800 simulated datasets. My solution follows the partial example code originally provided by Josh Nowak and posted to the BUGS email list by Alan Kelley.

To speed up fitting one dataset, one solution is to distribute multiple MCMC chains (usually 3) across several CPU cores (am I using the right terms? I’m no longer as familiar with computer hardware as I was back in my college days).

To speed up fitting multiple datasets, the idea is to distribute them to several CPU cores.

I’ll post an example of how to distribute several chains to different cores here, and post another example of how to distribute several datasets later.

Here is the example: I simulated 100 datasets, and for each dataset I created a set of 8-fold cross-validation (CV) datasets. I want to fit a Bayesian regression to each CV dataset, that is, 800 (= 100 × 8) fits in total. I have a Dell PC with an Intel Core 2 Duo processor, so I basically have two cores (or processors?) for parallel computing. I’m gonna run two MCMC chains for each regression model and let them run on separate cores simultaneously.

I use R2OpenBUGS to call OpenBUGS from R.  Here is the beginning of the R code.

The rlecuyer package is used to give each processor its own random number stream, so that the two MCMC chains are independent of each other (e.g. any randomly generated initial parameter values differ between chains). The snowfall package is essentially a wrapper around the functions in the snow package that actually set up parallel computing.
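As a minimal sketch, assuming all three packages are installed from CRAN and OpenBUGS itself is installed on the machine, the opening lines of such a script might look like this:

```r
# Sketch only: assumes the packages below are installed from CRAN.
library(R2OpenBUGS)  # calls OpenBUGS from R
library(rlecuyer)    # independent random number streams for the processors
library(snowfall)    # wrapper around the snow package
```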

The next step is to set up some basic parameters for the data and MCMC. These parameters should be clear given the comments in my R code. I have a dual-core PC, so I set the number of processors to 2. I have 100 simulated datasets, each with 8 CV datasets. These two parameters control the loop. I’m gonna run 2 MCMC chains in total, one chain per processor, so here I set the number of chains run on each processor to 1. I run 50,000 iterations for each chain.
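A sketch of what this parameter block could look like is given here. The names nCPU and nChains are the ones used later in this post; nData, nFold, and nIter are hypothetical names chosen for illustration.

```r
# nCPU and nChains appear later in the post; the other names are hypothetical.
nCPU    <- 2      # number of processors (cores) to use
nData   <- 100    # number of simulated datasets
nFold   <- 8      # number of CV datasets per simulated dataset
nChains <- 1      # number of MCMC chains run on each processor
nIter   <- 50000  # number of iterations per chain

nCPU * nChains    # total number of chains: 2
nData * nFold     # total number of model fits: 800
```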

The next step is to start the loop that loads each CV dataset and sets things up for OpenBUGS. Basically, there is an individual-level predictor (x1) and a neighborhood-level predictor (z1). The dependent variable is saved in y. The code in lines 49-51 sets up the data list, parameter list, and initial values to be passed to OpenBUGS.
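To make this step concrete, here is a rough sketch of what the body of the loop might build for one CV dataset. A tiny made-up dataset stands in for a CV dataset loaded from disk. Only x1, z1, and y come from the description above; every other name (nObs, nNbhd, nbhd, bugsData, bugsParams, bugsInits, and the model parameters b0, b1, g1) is a hypothetical placeholder that would have to match the actual BUGS model.

```r
# Made-up data standing in for one CV dataset (all sizes are hypothetical).
set.seed(1)
nObs  <- 20                       # number of individuals
nNbhd <- 4                        # number of neighborhoods
nbhd  <- rep(1:nNbhd, each = 5)   # neighborhood index of each individual
x1    <- rnorm(nObs)              # individual-level predictor
z1    <- rnorm(nNbhd)             # neighborhood-level predictor
y     <- 1 + 0.5 * x1 + 0.3 * z1[nbhd] + rnorm(nObs)  # dependent variable

# Data list: a named list whose names must match those used in the model file.
bugsData <- list(nObs = nObs, nNbhd = nNbhd, nbhd = nbhd,
                 x1 = x1, z1 = z1, y = y)

# Parameter list: hypothetical parameter names to monitor.
bugsParams <- c("b0", "b1", "g1")

# Initial values: a function that draws random starting values for one chain.
bugsInits <- function() list(b0 = rnorm(1), b1 = rnorm(1), g1 = rnorm(1))
```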

Finally, do the parallel computing. The function sfInit() tells the computer that “I’m gonna use two processors to do something”. From this point on, you have to tell each processor what’s going on: what data, variables, etc. are defined or available, and what to do.

The function sfExportAll() exports all the variables defined in the R code above to each processor. For example, each processor has to know what data, parameters, and initial values are used to fit the model (as defined in lines 49-51).

The function sfLibrary(R2OpenBUGS) loads the R2OpenBUGS package on every processor.

The function sfClusterSetupRNG() sets up a different random number stream on each processor to make sure the MCMC chains are independent of one another.
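Put together, the four calls above might be sketched as follows. This assumes snowfall, rlecuyer, and R2OpenBUGS are all installed. The final sfStop() call is my addition for tidiness and is not mentioned above.

```r
library(snowfall)

nCPU <- 2                             # number of processors to use
sfInit(parallel = TRUE, cpus = nCPU)  # start two worker processes
sfExportAll()                         # copy all global variables to every worker
sfLibrary(R2OpenBUGS)                 # load R2OpenBUGS on every worker
sfClusterSetupRNG()                   # separate random number stream per worker

# ... the parallel work goes here ...

sfStop()                              # shut the workers down when finished
```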

The key to running each chain on a separate processor is the function sfLapply(), which is a parallel version of the lapply function in R. The code in line 62 basically says: apply a function with an argument k to the sequence 1:nCPU (i.e. 1:2 in this case). The argument k “loops through” the values in 1:nCPU. The function itself is basically a call to the bugs function from R2OpenBUGS, which takes the arguments defined in the previous step. Since n.chains = nChains = 1, two OpenBUGS instances will be launched, each running one MCMC chain on a separate processor.
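A sketch of this call is given here, not the exact code from my script. The function name runChain and the objects bugsData, bugsInits, bugsParams, modelFile, and nIter are hypothetical stand-ins for whatever was defined earlier. nCPU, nChains, work.dir, and the working.directory expression are as described in this post.

```r
# Worker function: fits the model with one chain on whichever processor
# receives the value k. All objects it refers to are assumed to exist on
# the workers already (e.g. after sfExportAll()).
runChain <- function(k) {
  R2OpenBUGS::bugs(
    data               = bugsData,    # hypothetical: data list
    inits              = bugsInits,   # hypothetical: initial values
    parameters.to.save = bugsParams,  # hypothetical: parameters to monitor
    model.file         = modelFile,   # hypothetical: path to the BUGS model file
    n.chains           = nChains,     # 1 chain per processor
    n.iter             = nIter,       # hypothetical: iterations per chain
    working.directory  = paste(work.dir, k, sep = "")  # one folder per processor
  )
}

# Usage, once sfInit() has been called and the objects above are exported:
# fits <- sfLapply(1:nCPU, runChain)   # a list with one bugs fit per processor
```

Wrapping the bugs call in a named function is only a readability choice; passing an anonymous function(k) directly to sfLapply works the same way.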

There is one trick here worth special attention. R2OpenBUGS will create several txt files, including one model file, files of initial values (as many as the number of chains), and a log file storing BUGS results. These are the actual files used to run OpenBUGS. Since we are calling two OpenBUGS programs simultaneously, and R2OpenBUGS will write a set of files with the same names (i.e. model.txt, inits1.txt, log.txt, etc.) to the hard drive, there will be a conflict between the two OpenBUGS programs. Therefore, I create two folders named “1” and “2” under the working directory (line 66: working.directory = paste(work.dir, k, sep="")). The R2OpenBUGS instance running on one processor will write its files to the folder named “1” and start one OpenBUGS run, while the R2OpenBUGS instance running on the second processor will write its files to the folder named “2” and start a second OpenBUGS run.
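The folders can also be created from R before the parallel call. Here is a small sketch, using a temporary directory as a hypothetical location for work.dir. Note that paste(work.dir, k, sep="") only yields a subfolder if work.dir ends with a path separator.

```r
nCPU <- 2

# Hypothetical base folder; the trailing slash matters because the folder
# name k is pasted directly onto work.dir with no separator.
work.dir <- paste(tempdir(), "/bugs_run/", sep = "")

# Create one folder per processor: <work.dir>1 and <work.dir>2.
for (k in 1:nCPU) {
  dir.create(paste(work.dir, k, sep = ""), recursive = TRUE, showWarnings = FALSE)
}

# Check that both folders now exist.
dir.exists(paste(work.dir, 1:nCPU, sep = ""))
```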

About Hongwei Xu

I'm a social demographer, a single-child, a husband, and a father.