# Copy and paste this into an R console and run it.
# changing N will make it run faster or slower...
N <- 200000;
n <- 10;
Result <- rep(0, n);
for( i in 1:n){
# Do something and put the answer in Result.
for(j in 1:N){ # this for loop is to slow down execution
Result[i] <- j+i;
}
}
The first thing you should do is add some sort of output so that you know where the computer is in
the calculations. There is nothing more frustrating than having no idea when your simulation is
going to finish. So let's add a line at the start of the loop stating what loop iteration we are on.
Again, run this in an R console.
# Copy and paste this into an R console and run it.
N <- 200000;
n <- 10;
Result <- rep(0, n);
for( i in 1:n){
# Write to the console what loop iteration we're on
cat( 'Iteration: ', i, '\n');
# Do something and put the answer in Result.
for(j in 1:N){ # this for loop is to slow down execution
Result[i] <- j+i;
}
}
Your program should not produce a huge amount of output. It doesn't make sense to have it output
something at every single step because you don't want to look at that much stuff. My simulations
usually output something every 5 or 10 minutes, not every 0.25 seconds. One reason to be careful
about this is that time spent writing output is time not doing your simulation.
The next thing to think about is how you want to access your results. Typically I will save a data structure with my results to a file. This way, when it comes time to look at the results, I'll just load the results file and continue.
# Copy and paste this into an R console and run it.
N <- 200000;
n <- 10;
Result <- rep(0, n);
for( i in 1:n){
# Write to the console what loop iteration we're on
cat( 'Iteration: ', i, '\n');
# Do something and put the answer in Result.
for(j in 1:N){ # this for loop is to slow down execution
Result[i] <- j+i;
}
}
save(Result, file='MyResultsFile.RData');
Save the above program in a file; I chose Sim.R. This program does
everything we want and we *could* just run it on a PC by copying and
pasting into an R console. While we would get regular updates on where
it was in the execution, we would have to remain logged onto a PC. On
Linux we'll be able to log on, start the process, log off, and then come
back a couple days later to see if it is done.
> R --save < Sim.R
This will invoke R, run whatever commands are in Sim.R, and save the R environment when it is done (because you told it to with --save). The only problem is that we are still stuck looking at the R output. It would be much better to capture all the stuff that R is spewing in a file. This is called redirecting the output. Notice that when you run the command below, the R output is no longer printed to the screen. Instead it goes into the file spew.txt.
> R --save < Sim.R > spew.txt
If there was already a file named spew.txt, one of two things might happen: either the shell will complain that spew.txt already exists, and you have to delete the file first and then try again, or it will just overwrite the file. Which one you get depends on whether the shell's noclobber option is set. Another option is to replace > with >>, which will append the output to the existing file if it exists. This has the potential to make a very large file, so make sure you delete it when you are finished with your simulation. Another choice is to replace > with >! (in csh and tcsh; bash and other POSIX shells spell it >|), which will force it to overwrite the file.
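The redirection variants can be tried safely without R. The following is a minimal sketch that assumes a POSIX-style shell such as sh or bash (csh and tcsh use different syntax); echo stands in for the R command, and the scratch directory is a hypothetical location chosen for the demo.

```shell
# Hypothetical scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d)
out="$demo_dir/spew.txt"

echo "first run"  >  "$out"    # creates the file
echo "second run" >  "$out"    # overwrites it: only "second run" remains
echo "third run"  >> "$out"    # appends: the file now has two lines

set -C                         # turn on noclobber: plain > refuses to overwrite
if (echo "fourth run" > "$out") 2>/dev/null; then
    clobbered=yes
else
    clobbered=no               # expected: the shell complained and wrote nothing
fi

echo "fifth run" >| "$out"     # >| forces the overwrite even with noclobber on
set +C                         # turn noclobber back off

cat "$out"
```

After this runs, the file holds only the line written by the forced overwrite.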
Since spew.txt is basically garbage that you don't want, but will check a few times, you might consider writing it to the local machine's /tmp directory. This will speed things up a tad because you won't be writing to a network drive. Zube has a nice description of why this is useful on the stat FAQ. It might even be appropriate and easy to run the entire simulation out of the /tmp directory, although you should double check that there is enough disk space using the command df -h /tmp, which will tell you how much space is used and how much is available.
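If you want the script to do the disk space check for you, something like the sketch below could work. The 100 MB threshold and the log file names are assumptions made up for illustration, so adjust them to your own simulation.

```shell
# Free space on /tmp in kilobytes. -P asks df for the portable layout,
# so the numbers always sit on the second line of its output.
avail_kb=$(df -Pk /tmp | awk 'NR==2 {print $4}')

# Hypothetical threshold: want roughly 100 MB free before logging to /tmp.
need_kb=102400

if [ "$avail_kb" -ge "$need_kb" ]; then
    log_file="/tmp/spew_$(id -un).txt"   # user name in the path avoids collisions
else
    log_file="./spew.txt"                # fall back to the current directory
fi

echo "logging to $log_file ($avail_kb KB free on /tmp)"
```

You would then redirect the R output to "$log_file" instead of spew.txt.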
Once the above command has run, examine the contents of spew.txt. There are several ways to do this, but the most useful in this case is the command tail, which shows only the last few lines of a file. One neat flag that you can give is tail -f, which keeps watching the file and prints new lines whenever it is appended to. This won't be important when you are checking the file every couple of hours, but is useful if you just want to sit and watch it for a few minutes without constantly typing the tail command. Another command that could be used is less. Amusingly, there is another command, more, that does just about the same thing as less but isn't as nice to use. This reminds me that Unix people have a sense of humor.
> tail spew.txt
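Here is a small sketch of what tail does, using a made-up log file in place of spew.txt so you can try it without waiting for a simulation. The file contents are an assumption chosen to look like the progress lines from Sim.R.

```shell
# Build a stand-in log with 25 progress lines.
demo_log=$(mktemp)
i=1
while [ "$i" -le 25 ]; do
    echo "Iteration: $i" >> "$demo_log"
    i=$((i + 1))
done

# With no options, tail prints the last 10 lines.
default_count=$(tail "$demo_log" | wc -l | tr -d ' ')

# -n picks how many lines to show.
last_three=$(tail -n 3 "$demo_log")
echo "$last_three"

# tail -f "$demo_log" would keep the file open and print new lines as they
# arrive; stop it with Ctrl-C. It is left commented out here because it
# never returns on its own.
```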
The next thing that would be nice is if we could kick this program off and not have to be logged in while it runs. To do this, we have to tell the operating system to run the job in the background so that it is not tied to our login session. All we have to do is put an & at the end of the command.
> R --save < Sim.R > spew.txt &
While this is running, try running the tail spew.txt command a few times. You should see the file being updated. At this point we could log off the machine, even if the simulation hasn't finished, and the program will continue to run. It will eventually finish its calculations, save the Result vector to the file that we told it, and quit. Because we called R with the --save option, when R finishes it will also save its workspace to a file named .RData in the current directory. If you start R up again in the same directory, it will try to reload that workspace.
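The sketch below shows the background mechanics with a short stand-in job instead of R; the long_job function and the scratch directory are made up for the demo. Two things in it go beyond the command above and should be read as suggestions rather than requirements: 2>&1 also sends error messages to the log, and the commented nohup line is a common precaution, since whether a background job survives logging off can depend on your shell and how it is configured.

```shell
work_dir=$(mktemp -d)

# Stand-in for "R --save < Sim.R": prints a few progress lines, then finishes.
long_job() {
    for i in 1 2 3; do
        echo "Iteration: $i"
    done
    sleep 1
}

# & puts the job in the background; the prompt comes back right away.
long_job > "$work_dir/spew.txt" 2>&1 &
job_pid=$!                      # $! holds the process ID of the background job
echo "started background job with PID $job_pid"

# In a real session you would log off here. For the demo, wait for the job.
wait "$job_pid"
status=$?
echo "job exited with status $status"

# With the real simulation, the nohup variant would look like this:
# nohup R --save < Sim.R > spew.txt 2>&1 &
```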
One final thing. If 5 people are hammering a machine with simulations, the computer will slow down and become hard to log into, which would be annoying for people who might want to check how their simulations are going. It would be neat if it were possible for the operating system to know that your process is not as important as Zube logging in and fixing something. That is exactly what the nice command does. All a user has to do is put nice before any command that he or she wants to have lower priority.
> nice R --save < Sim.R > spew.txt &
Then log off. You'll have lunch, go to class, go home, sleep, come back the next day, and log on again. Then go into the simulation directory and type
> tail spew.txt
which will tell you if the simulation is done or if you have to wait longer.
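By default nice lowers the priority by 10 steps, and the -n option lets you choose the amount yourself. A minimal sketch, again with a stand-in command and a hypothetical log file instead of the real simulation:

```shell
nice_log=$(mktemp)

# -n 19 asks for the lowest priority on Linux; plain "nice" would use 10.
# The sh -c command is only a placeholder for "R --save < Sim.R".
nice -n 19 sh -c 'echo "simulation finished"' > "$nice_log" &
wait $!

cat "$nice_log"
```

The job produces the same output either way; the only difference is that it gives way to other processes when the machine is busy.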
If it is done running, you'll probably copy the MyResultsFile.RData file back to the Windows side and write an R program to finish the analysis. To do that you'd write a program that does something like this:
# load the results of your simulation; load() restores the saved object
# under its original name (Result) and returns only that name as a string
load('MyResultsFile.RData');
# do something appropriate
mean(Result);