Showing posts with label rpy2. Show all posts

Sunday, July 15, 2012

Exciting Python Machine Learning Package (scikit-learn)

A while back, I blogged about using rpy2 to tap R's plotting power and its wide choice of models from Python. It's usable but still a bit cumbersome. It turns out there is an even easier way to do machine learning in Python: use scikit-learn.

Scikit-learn is another project born out of Google Summer of Code. It's currently only at version 0.11, but it has been around for 2+ years and supports many models for supervised and unsupervised learning. Its BSD license may be more attractive to people who are considering embedding a machine learning library in their own products. Overall it seems to be a very exciting new addition to Python's machine learning toolkit.
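To give a flavor of the API, here is a minimal sketch of fitting a supervised model. It assumes scikit-learn is installed and importable as `sklearn`; the bundled iris dataset, the k-nearest-neighbors classifier, and the helper name `iris_knn_accuracy` are my own choices for illustration, not anything the project prescribes.

```python
def iris_knn_accuracy(n_neighbors=3):
    """Fit k-nearest neighbors on the iris data and return training accuracy.

    Returns None when scikit-learn is not installed (assumption: it is
    importable under the name `sklearn`).
    """
    try:
        from sklearn.datasets import load_iris
        from sklearn.neighbors import KNeighborsClassifier
    except ImportError:
        return None

    iris = load_iris()
    X, y = iris.data, iris.target  # 150 samples, 4 features, 3 classes

    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X, y)
    # score() reports mean accuracy. It is evaluated on the training data
    # here, so it is optimistic and only demonstrates the fit/score pattern.
    return model.score(X, y)


if __name__ == "__main__":
    print(iris_knn_accuracy())
```

Most estimators in the library follow the same fit/predict/score pattern, so swapping in a different model is usually a one-line change.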

Their web site is full of useful info (docs, tutorials, and demo videos), so go check it out: scikit-learn.org

P.S. If you encounter problems installing scikit-learn on your Mac, here is a very useful page on installing all the required packages: http://kral4u.blogspot.com/2012/07/installing-numpy-scipy-matplotlib.html. I also highly recommend switching from easy_install to pip.
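After installing, a quick way to see what is still missing is to ask Python whether each package can be found. This is a small sketch, assuming Python 3.4 or later (for `importlib.util.find_spec`); the helper name `missing_packages` is hypothetical, and the default list simply names the packages mentioned above by their import names.

```python
import importlib.util


def missing_packages(names=("numpy", "scipy", "matplotlib", "sklearn")):
    """Return the subset of `names` that Python cannot locate for import."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still missing: " + ", ".join(missing))
    else:
        print("All required packages can be found.")
```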

Sunday, June 10, 2012

How to Install rpy2 on Mac OS X 10.7 (Lion)

Python and R are powerful tools for machine learning and data analysis. Like superheroes in movies, their power can be unmatched when combined. Python provides a richer set of scientific and data processing modules, while R provides easier plotting and analytic modeling capabilities.

To access R from Python, you will need to install the rpy2* package. Usually, it's as easy as running Python's easy_install:

easy_install rpy2

However, I found I had to jump through a few hoops to get the rpy2 package compiled and installed on my Mac. The time I spent (or wasted) convinced me the info is worth sharing.

If you encounter errors while running easy_install on Mac OS X 10.7, try the following steps:

1. Install Xcode on your Mac.
You will need the gcc compiler to build rpy2. If Xcode is not installed, download and install it from the Mac App Store. (It's free.) Then install the command line tools from within Xcode (go to Preferences -> Downloads tab and click the "Install" button next to Command Line Tools). This is what the preferences pop-up looks like after installation.

    
Note: if you upgraded to Mac OS X 10.7 (Lion) from 10.6 (Snow Leopard) and had Xcode installed before the upgrade, you still have to do this, since the old Xcode tools were moved from /usr/bin/ to /Developer/usr/bin (it was a surprise to me) and the old binaries may not work properly.
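To confirm that the compiler actually ended up on your PATH, a check along these lines can help. It is only a sketch, assuming Python 3.3 or later (for `shutil.which`); the helper name `find_tool` is hypothetical.

```python
import shutil


def find_tool(name="gcc"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("gcc")
    if path is None:
        print("gcc not found; install the Xcode Command Line Tools first.")
    else:
        print("gcc found at " + path)
```

If the path printed is under /Developer/usr/bin rather than /usr/bin, you may be picking up the old pre-upgrade tools described in the note above.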

2. Make sure your R installation was built with shared library support. If not, build it yourself.
You will need the header files from R to build rpy2. If your R was installed from a binary-only release (i.e., from the one-click Mac OS X R package), you need to download the R source code and build it yourself. Here are the instructions from CRAN on how to build R from source: http://cran.r-project.org/doc/manuals/R-admin.html#Installing-R-under-_0028Mac_0029-OS-X
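One rough way to tell whether an R installation has the shared library is to look for the library file under its home directory. The sketch below assumes the usual layout, where the library sits in the `lib` subdirectory of R's home as `libR.dylib` (Mac OS X) or `libR.so` (Linux); your build may differ, and the helper name `has_shared_libr` is hypothetical.

```python
import os

# File names assumed for R's shared library: .dylib on Mac OS X, .so on Linux.
SHARED_LIB_NAMES = ("libR.dylib", "libR.so")


def has_shared_libr(r_home):
    """Return True if a libR shared library exists under r_home/lib."""
    lib_dir = os.path.join(r_home, "lib")
    return any(
        os.path.isfile(os.path.join(lib_dir, name)) for name in SHARED_LIB_NAMES
    )
```

Pass it the directory printed by running `R RHOME` in a terminal. A result of False suggests R needs to be rebuilt with shared library support.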

You may have to install gfortran to build R. Unfortunately, the gfortran link provided on the CRAN site does not work for OS X 10.7, so make sure you get the right version. You can find a good reference here:

3. Download and build rpy2.
The rpy2 page on SourceForge (http://rpy.sourceforge.net/rpy2/doc-2.2/html/overview.html#installation) provides pretty good instructions on how to build and install rpy2.

Note that the default Python installation (/usr/bin/python) on Lion is Python 2.7. If you encounter a version compatibility issue, you can still build rpy2 using Python 2.6:

export ARCHFLAGS='-arch i386 -arch x86_64'
/usr/bin/python2.6 setup.py build  # specify --r-home if R not in default location
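If you prefer to script the build, the two lines above can be expressed roughly as follows. This is a sketch that only assembles the command and environment; the helper name `rpy2_build_command` is hypothetical, and placing `--r-home` right after `build` is my assumption about how the option mentioned in the comment is passed.

```python
import os


def rpy2_build_command(python="/usr/bin/python2.6", r_home=None):
    """Return (argv, env) for building rpy2 from its source directory."""
    env = dict(os.environ)
    # Build for both 32-bit and 64-bit Intel, as in the export line above.
    env["ARCHFLAGS"] = "-arch i386 -arch x86_64"

    argv = [python, "setup.py", "build"]
    if r_home is not None:
        # Assumption: --r-home takes the R home directory as its value.
        argv += ["--r-home", r_home]
    return argv, env


if __name__ == "__main__":
    argv, env = rpy2_build_command()
    print(" ".join(argv))
```

To actually run the build, change into the rpy2 source directory and pass the pair to `subprocess.check_call(argv, env=env)`.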

4. Install and test.
After successfully building it, you can install the Python package (using the same Python version you used to build it):

python setup.py install

and verify your installation with the following:

import rpy2.robjects as robjects


If you don't see any errors, congratulations, your rpy2 is ready to go.
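For a slightly stronger check than a bare import, you can ask R to evaluate something and read the result back. The following is a minimal sketch; the helper name `rpy2_smoke_test` is hypothetical, and it deliberately catches any exception on import because a missing or misconfigured R can surface as something other than ImportError.

```python
def rpy2_smoke_test():
    """Try to evaluate a trivial R expression through rpy2; return a status string."""
    try:
        import rpy2.robjects as robjects
    except Exception as exc:  # rpy2 missing, or R itself could not be loaded
        return "rpy2 is not usable: %s" % exc

    # robjects.r() evaluates a string of R code and returns an R vector.
    result = robjects.r("1 + 2")
    return "R evaluated 1 + 2 = %s" % result[0]


if __name__ == "__main__":
    print(rpy2_smoke_test())
```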


*rpy2 is a redesign of the rpy module. Its interface to R is better designed, and it is recommended over the rpy module.