Showing posts with label R. Show all posts
Showing posts with label R. Show all posts

Sunday, June 10, 2012

How to Install rpy2 on Mac OS X 10.7 (Lion)

Python and R are powerful tools for machine learning and data analysis. Like super heroes in movies, their power can be unmatched when combined. Python provides a richer set of scientific and data processing modules, while R provides easier plotting and analytic modeling capabilities. 

To access R from python, you will need to install the rpy2* package. Usually, it's just as easy as running the python "easy_install":

easy_install rpy2

However, I found I had to jump through a few hoops to get the rpy2 package compiled and installed on my mac. The time I spent/wasted convinced me the info is worth sharing. 

If you encounter errors while running easy_install on your mac os 10.7, try the following steps: 

1. Install Xcode on your mac.
You will need the gcc compiler to build the rpy2. If Xcode is not installed, download and install it from the mac app store. (It's free.) Then install the command line tools from the xcode (go to preferences -> Downloads tab and click the "Install" button next to the Command Line Tools). This is what the preferences pop-up looks like after installation.

    
    Note if you upgraded your mac os to 10.7 (lion) from 1.6 (snow leopard) and had xcode installed before the upgrade, you still have to do this since the old xcode tools were moved from /usr/bin/ to /Developer/usr/bin (it was a surprise to me) and the old binaries may not work properly.

2. Make sure your R installation is shared library enabled. If not, build it yourself. 
You will need the header files from R to build rpy2. If your R is installed from a binary only release (i.e installed from the one click mac os R package), you need to download the R source code and build it yourself. Here is the instruction from CRAN on how to build R from the source: http://cran.r-project.org/doc/manuals/R-admin.html#Installing-R-under-_0028Mac_0029-OS-X

You may have to install the gfortran to build R. Unfortunately the gfortran link provided from CRAN site does not work for osx 10.7. Make sure you get the right version. You can find a good reference here:

3. Download and build rpy2. 
The rpy2 page on source forge (http://rpy.sourceforge.net/rpy2/doc-2.2/html/overview.html#installation) provides pretty good instructions on how to build and install rpy2.  

Notice the default python installation (/usr/bin/python) on Lion is python 2.7. If you encounter version compatibility issue, you can still build it using python 2.6:

export ARCHFLAGS='-arch i386 -arch x86_64'
/usr/bin/python2.6 setup.py build  # specify --r-home if R not in default location

4. Install and test.
After successfully building it, you can install the python package (to the same version you used to build the package):

python setup.py install

and verify your installation with the following:

import rpy2.robjects as robjects


If you don't see any error, congratulations, your rpy2 is ready to go.


*rpy2 is the redesign of the rpy module. It's interface to R is better designed and is recommended over the rpy module.


Friday, May 11, 2012

How to install R on Cloudera CDH3

I wanted to play with the RHadoop package to see how R worked with Hadoop. Since the demo CDH3 image I was using from Cloudera did not bundle R, the first thing I had to do was to install R. Easy, I thought, I just needed to install the 3 R rpms from CRAN and it would be done.

Turned out the R rpms had a lot of dependencies (about 20-30 of extra rpms required) and the easiest way to install them was to install the EPEL (extra package for enterprise linux) repo first. Unfortunately the repo location returned from the google search (http://download.fedora.redhat.com) didn't seem to be working any more. Finally, I found the right repo and everything was done in just 2 commands:


$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
$ sudo yum install R


*replace the x86_64 with i386 if you are installing on a 32-bit system.


Monday, April 2, 2012

A simple R script to find the Pi

Recently I am working on a data mining project and investigating on different visualization  and computation tools to help to analyze the data model. Both Octave and R with R Studio interest me.

I am eager to try out both tools and come across an interesting article about using Monte Carlo simulation to find out the pi. I know this is not exactly data mining topic. However, it is still interesting.

Here is the article by Niall O'Higgins.

http://niallohiggins.com/2007/07/05/monte-carlo-simulation-in-python-1/

The author uses python to demonstrate the concept. After learning R will couple hours, I rewrote code in R.

n <- 1000000
x <- runif(n, 0.0, 1.0)
y <- runif(n, 0.0, 1.0)
score <- ifelse((sqrt(x^2 + y^2) <=1.0), 1, 0)
hit <- sum(score)
myPi <- 4 * hit / n

I am surprisingly the result code can be pretty compact. R seems to have a better data import/export capability than Octave. I am very impressed with both Octave and R can do.