Sunday, June 10, 2012

How to Install rpy2 on Mac OS X 10.7 (Lion)

Python and R are powerful tools for machine learning and data analysis. Like superheroes in movies, they are even more powerful when combined. Python provides a richer set of scientific and data processing modules, while R makes plotting and analytic modeling easier.

To access R from Python, you need to install the rpy2* package. Usually, it's as easy as running Python's easy_install:

easy_install rpy2

However, I found I had to jump through a few hoops to get the rpy2 package compiled and installed on my Mac. The time I spent/wasted convinced me the info is worth sharing.

If you encounter errors while running easy_install on Mac OS X 10.7, try the following steps:

1. Install Xcode on your Mac.
You will need the gcc compiler to build rpy2. If Xcode is not installed, download and install it from the Mac App Store. (It's free.) Then install the command line tools from within Xcode: go to Preferences -> Downloads tab and click the "Install" button next to Command Line Tools. This is what the preferences pop-up looks like after installation.

    
    Note that if you upgraded to Mac OS X 10.7 (Lion) from 10.6 (Snow Leopard) and had Xcode installed before the upgrade, you still have to do this. The old Xcode tools were moved from /usr/bin/ to /Developer/usr/bin (it was a surprise to me), and the old binaries may not work properly.
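To confirm the compiler is actually reachable after installing the command line tools, a quick check like the one below can help. This is only a sketch: the function name find_build_tools is made up for illustration, and it assumes Python 3.3 or later (for shutil.which), which is newer than the interpreters discussed in this post.

```python
import shutil


def find_build_tools(tools=("gcc", "make")):
    """Map each tool name to its full path on PATH, or None if it is missing.

    find_build_tools is a hypothetical helper, not part of rpy2 or Xcode.
    """
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in find_build_tools().items():
        print("{}: {}".format(name, path or "NOT FOUND"))
```

If gcc shows up as NOT FOUND, or resolves to a path under /Developer/usr/bin, the command line tools are probably not installed the way this step expects.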

2. Make sure your R installation is built with the shared library enabled. If not, build it yourself.
You will need the header files from R to build rpy2. If your R was installed from a binary-only release (i.e., from the one-click Mac OS R package), you need to download the R source code and build it yourself. Here are the instructions from CRAN on how to build R from source: http://cran.r-project.org/doc/manuals/R-admin.html#Installing-R-under-_0028Mac_0029-OS-X
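One way to see whether an existing R installation already has what rpy2 needs is to look inside the R home directory, which the command "R RHOME" prints. The sketch below assumes the shared library lives at lib/libR.dylib (or lib/libR.so on other systems) and the headers at include/R.h under that directory. That layout is an assumption and may differ on your build, and the helper names find_r_home and check_r_home are hypothetical.

```python
import os
import shutil
import subprocess


def find_r_home():
    """Return the directory printed by `R RHOME`, or None if R is not on PATH."""
    if shutil.which("R") is None:
        return None
    output = subprocess.check_output(["R", "RHOME"])
    return output.decode().strip()


def check_r_home(r_home):
    """Report whether r_home contains the R shared library and R.h.

    Assumes the layout <r_home>/lib/libR.{dylib,so} and <r_home>/include/R.h.
    """
    lib_dir = os.path.join(r_home, "lib")
    shared_names = ("libR.dylib", "libR.so")
    has_shared = any(
        os.path.isfile(os.path.join(lib_dir, name)) for name in shared_names
    )
    has_headers = os.path.isfile(os.path.join(r_home, "include", "R.h"))
    return {"shared_library": has_shared, "headers": has_headers}


if __name__ == "__main__":
    home = find_r_home()
    if home is None:
        print("R was not found on PATH")
    else:
        print(home, check_r_home(home))
```

If either value comes back False, building R from source as described in this step is the likely fix.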

You may have to install gfortran to build R. Unfortunately, the gfortran link provided on the CRAN site does not work for OS X 10.7, so make sure you get the right version. You can find a good reference here:

3. Download and build rpy2. 
The rpy2 page on SourceForge (http://rpy.sourceforge.net/rpy2/doc-2.2/html/overview.html#installation) provides pretty good instructions on how to build and install rpy2.

Note that the default Python installation (/usr/bin/python) on Lion is Python 2.7. If you encounter version compatibility issues, you can still build rpy2 using Python 2.6:

export ARCHFLAGS='-arch i386 -arch x86_64'
/usr/bin/python2.6 setup.py build  # specify --r-home if R not in default location

4. Install and test.
After a successful build, install the package using the same Python version you used to build it:

python setup.py install

and verify your installation with the following:

import rpy2.robjects as robjects


If you don't see any error, congratulations, your rpy2 is ready to go.
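For a slightly stronger check than a bare import, you can also ask R to evaluate something. The sketch below relies on rpy2.robjects.r accepting a string of R code and returning an R vector; the wrapper name check_rpy2 is made up for this example, and it uses Python 3 syntax.

```python
import importlib


def check_rpy2():
    """Try to import rpy2 and evaluate 1 + 2 in R; return (ok, detail).

    check_rpy2 is a hypothetical helper written for this post.
    """
    try:
        robjects = importlib.import_module("rpy2.robjects")
    except Exception as exc:  # ImportError, or a failure while loading R
        return False, "import failed: {}".format(exc)
    try:
        # robjects.r evaluates R code; the result is an R vector of length 1.
        value = robjects.r("1 + 2")[0]
    except Exception as exc:
        return False, "R evaluation failed: {}".format(exc)
    return bool(value == 3), "R returned {}".format(value)


if __name__ == "__main__":
    ok, detail = check_rpy2()
    print("OK" if ok else "FAILED", "-", detail)
```

A FAILED result with an import error usually means the package was installed for a different Python version than the one you are running.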


*rpy2 is a redesign of the rpy module. Its interface to R is better designed, and it is recommended over the rpy module.