Monday, May 19, 2014

OpenStack - OpenVSwitch Agent won't register

I hit a wall today when I tried to add another compute node to my OpenStack setup. Everything seemed to install fine; however, the Open vSwitch agent did not register itself. Of course, I am using Oracle Enterprise Linux, which is not officially supported by OpenStack yet, and that made me uneasy. During the installation process I hit a few problems with the wrong yum packages being loaded, and I corrected them manually.

First I checked neutron-openvswitch-agent.log on the compute node. There was no trace and no error. I checked neutron-server.log; no trace or error there either. I understand that the registration mechanism works by the agent sending a report_state message to the neutron controller.

I used rabbitmqctl to check for the existence of the queue, and it did exist. It really baffled me. So I compared the messages in neutron-server.log. To my surprise, the clock on the new compute node was not set correctly and the '_context_timestamp' was off in the debug message.

2014-05-19 15:13:33.923 21639 DEBUG neutron.db.agents_db [req-915d3718-8489-4bd1-905f-3a02959b2746 None] Message with invalid timestamp received report_state /opt/stack/neutron/neutron/db/agents_db.py:214

So the time was the problem. I adjusted the clock and, magically, the agent registered. I hope a clearer message gets added to make this problem easier to identify.
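
Looking at neutron/db/agents_db.py later, the rejection is essentially a clock-skew check on the message timestamp. Here is a rough sketch of the idea; the names and the skew window are my own simplification, not the actual Neutron code:

from datetime import datetime, timedelta

# Simplified sketch only: the real check lives in neutron/db/agents_db.py,
# and the names and window here are assumptions.
ALLOWED_SKEW = timedelta(minutes=5)

def timestamp_is_valid(context_timestamp):
    return abs(datetime.utcnow() - context_timestamp) <= ALLOWED_SKEW

def report_state(context_timestamp, agent_state):
    if not timestamp_is_valid(context_timestamp):
        # This is where the "Message with invalid timestamp received" debug
        # line gets logged and the report is dropped, so the agent never
        # shows up as registered.
        return
    # ... otherwise register or update the agent ...

With a skewed clock on the compute node, every report falls outside the window and is silently discarded.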

Friday, May 9, 2014

Adding your own configuration to OpenStack Neutron using oslo.config

If you are doing OpenStack development, eventually you will want to add your own configuration options for your own plugin/mechanism driver. The steps required are pretty simple.

Let's say my plugin/mechanism driver needs to talk to a web service to perform some operations, and I do not want to hard code the web service configuration.

OpenStack uses oslo.config to do the configuration parsing. 

First, you need to tell oslo.config what configuration options are needed.

Here is the sample code fragment to create my configuration parameter list. 

from oslo.config import cfg

# _ below is the translation helper that neutron already provides.
my_service_opts = [
    cfg.StrOpt('hostname', default=None,
               help=_("The hostname of my service")),
    cfg.IntOpt('port', default=8443,
               help=_("The port number of my service")),
    cfg.StrOpt('username', default=None,
               help=_("The username of my service")),
    cfg.StrOpt('password', default=None,
               help=_("The password of my service"))]

I specify the hostname, port, username and password, and set the default value of the port to 8443.

Then I register it by doing the following.

cfg.CONF.register_opts(my_service_opts, "my_service")

"my_service" is optional. If specified, it is the namespace I am going to use. 

To retrieve the configuration,

hostname = cfg.CONF.my_service.hostname
port = cfg.CONF.my_service.port
username = cfg.CONF.my_service.username
password = cfg.CONF.my_service.password

Now the coding is done. 

Let's put the parameters in the configuration file. The configuration file is specified by the --config-file argument when the Python service is launched. For example,

stack    17357  0.1  0.0 284908 52156 pts/9    S+   08:57   0:05 python /usr/bin/neutron-server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini

For the neutron-server, two configuration files are specified. One is /etc/neutron/neutron.conf and the other is /etc/neutron/plugins/ml2/ml2_conf.ini.

For the above example, I will put the following configuration in one of the configuration files listed above.

[my_service]
hostname=my_host
port=8443
username=guest
password=guest
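
To sanity-check the wiring outside of Neutron, here is a minimal standalone sketch; the script itself and the my_service.conf file name are mine, not part of Neutron, and on newer releases the import may be oslo_config instead of oslo.config:

import sys

from oslo.config import cfg

my_service_opts = [
    cfg.StrOpt('hostname', default=None, help="The hostname of my service"),
    cfg.IntOpt('port', default=8443, help="The port number of my service"),
]

cfg.CONF.register_opts(my_service_opts, "my_service")

if __name__ == '__main__':
    # run as: python check_conf.py --config-file my_service.conf
    cfg.CONF(sys.argv[1:])
    print("%s:%s" % (cfg.CONF.my_service.hostname, cfg.CONF.my_service.port))

Running it against the [my_service] section above should print the hostname and port you configured.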

With oslo.config, adding your own additional configuration is extremely simple.

Wednesday, April 9, 2014

Listen to OpenStack Neutron Messages from RabbitMQ using Kombu messaging library

As I continue to investigate how to write a plugin, or more precisely a mechanism driver for the Neutron ML2 plugin, I would like to look at the interaction among nova, neutron and its agents. I found out that some of the communication uses RabbitMQ messaging, and I understand that neutron uses the Python kombu messaging library. So I tried to write a few lines of code to listen to the messages.

I've modified the sample worker.py code from here to suit my needs. Here is my initial code.

from kombu.mixins import ConsumerMixin
from kombu.log import get_logger
from kombu import Queue, Exchange

logger = get_logger(__name__)


class Worker(ConsumerMixin):
    task_queue = Queue('notifications.info', Exchange('neutron', 'topic'))

    def __init__(self, connection):
        self.connection = connection

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=[self.task_queue],
                         accept=['json'],
                         callbacks=[self.process_task])]

    def process_task(self, body, message):
        print("RECEIVED MESSAGE: %r" % (body, ))
        message.ack()

if __name__ == '__main__':
    from kombu import Connection
    from kombu.utils.debug import setup_logging
    # setup root logger
    setup_logging(loglevel='DEBUG', loggers=[''])

    with Connection('amqp://guest:supersecrete@localhost:5672//') as conn:
        try:
            print(conn)
            worker = Worker(conn)
            worker.run()
        except KeyboardInterrupt:
            print('bye bye')


The changes from the original sample are the queue name and exchange on the task_queue line and the connection URL. Make sure you use the correct queue name and exchange, and the password of the 'guest' user from your setup.

To find out which queue names and exchanges are available, I use

sudo rabbitmqctl list_exchanges

and

sudo rabbitmqctl list_queues

To find out more info about rabbitmqctl,  read the man page here.

I picked the 'notifications.info' queue because I am interested in looking into the 'port.create.start' and 'port.create.end' events, which are useful for my current work.

Then I ran the above program,
<Connection: amqp://guest@localhost:5672// at 0x2396050>

Everything seemed fine but I did not receive any events when the port creation was triggered by instance creation.

So what went wrong?

After poking around in a few places, I saw the message in the rabbitmq log. The log is located in /var/log/rabbitmq.

=ERROR REPORT==== 8-Apr-2014::17:47:06 ===
connection <0.25614.29>, channel 1 - soft error:
{amqp_error,precondition_failed,
            "cannot redeclare exchange 'neutron' in vhost '/' with different type, durable, internal or autodelete value",
            'exchange.declare'}

So the default settings of the kombu Exchange class are different from the ones OpenStack Neutron uses. RabbitMQ thought I was trying to redeclare some attributes of the exchange. Since the sample code uses the default settings of the Exchange class, I checked the defaults in the API doc and the actual settings of the exchange using 'rabbitmqctl list_exchanges'.

Note: The default output of 'sudo rabbitmqctl list_exchanges' only shows the name and type attributes. To list additional attributes, you need to specify them as arguments. For example, 'sudo rabbitmqctl list_exchanges name type auto_delete' lists the name, type and auto_delete attributes. Please see the man page for details.

I found that the durable attribute for both the exchange and the queue needs to be set to False to match what Neutron declares. Here is the source code with the changes.

from kombu.mixins import ConsumerMixin
from kombu.log import get_logger
from kombu import Queue, Exchange

logger = get_logger(__name__)


class Worker(ConsumerMixin):
    task_queue = Queue('notifications.info',
                       Exchange('neutron', 'topic', durable=False),
                       durable=False)

    def __init__(self, connection):
        self.connection = connection

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=[self.task_queue],
                         accept=['json'],
                         callbacks=[self.process_task])]

    def process_task(self, body, message):
        print("RECEIVED MESSAGE: %r" % (body, ))
        message.ack()

if __name__ == '__main__':
    from kombu import Connection
    from kombu.utils.debug import setup_logging
    # setup root logger
    setup_logging(loglevel='DEBUG', loggers=[''])

    with Connection('amqp://guest:supersecrete@localhost:5672//') as conn:
        try:
            print(conn)
            worker = Worker(conn)
            worker.run()
        except KeyboardInterrupt:
            print('bye bye')

After making the changes, I can now receive the messages.
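
As a small follow-up, since I only care about the 'port.create.start' and 'port.create.end' events, the callback can be narrowed down. This is my own sketch of a drop-in replacement for process_task in the Worker class, and it assumes the standard OpenStack notification format where the body is a dict carrying 'event_type' and 'payload' keys:

    def process_task(self, body, message):
        # Notifications arrive as JSON dicts; filter on the event_type field.
        event_type = body.get('event_type') if isinstance(body, dict) else None
        if event_type in ('port.create.start', 'port.create.end'):
            print("PORT EVENT %s: %r" % (event_type, body.get('payload')))
        message.ack()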

Saturday, April 5, 2014

Devstack with Oracle Enterprise Linux

Currently I am working on the development of an OpenStack Neutron plugin for a network switch. One of the OSes I need to deploy on is Oracle Enterprise Linux. Oracle announced its support for OpenStack last December; the announcement is here. However, if you try to use devstack to set up your environment, it still complains that it is not a supported platform.

You can fix this by adding two lines to functions-common in devstack, as shown in the diff below.

--- a/functions-common
+++ b/functions-common
@@ -364,6 +364,8 @@ function GetOSVersion {
             fi
         elif [[ $os_VENDOR == "openSUSE project" ]]; then
             os_VENDOR="openSUSE"
+        elif [[ $os_VENDOR == "OracleServer" ]]; then
+            os_VENDOR="Red Hat"
         elif [[ $os_VENDOR =~ Red.*Hat ]]; then
             os_VENDOR="Red Hat"
         fi

Run stack.sh again and you can deploy without any issue.

In the next article, I will talk about using Vagrant, PyCharm and VirtualBox to set up your development environment.

Sunday, September 8, 2013

Prezi - What Power Point Should Be

I have always found PowerPoint limiting and wanted to find a presentation tool that feels more natural and provides better story telling capability. Last year I attended a meetup and noticed the tool used by the presenter was quite unusual. It zoomed in and out and worked like a mind map. I asked the presenter which tool he was using (ironically I found the tool more interesting than the topic he was presenting). He told me he was using a product called Prezi.

I made a mental note to check it out. It was indeed very cool. I signed up for a free account (like Evernote, the tool is cloud based and you can sign up for free) and played with it a bit. I liked it a lot. However, I didn't have any presentation I needed to prepare then, so I just put the idea back on the shelf.

Lately I was preparing a design doc and found Prezi's ability to drill down to details really a great help. I can present an overall system architecture first, then zoom in to the design of each module, then zoom in to even more details. Very intuitive. Also, because everything in Prezi stays on the same canvas (instead of being fragmented into slides), the story line is preserved visually, making it easier to show the big picture too.

Plus, it's really fun to create a Prezi. I felt like a kid in an elementary school art class again. (Don't let me mislead you: you can use a Prezi template and quickly create a nice presentation instead of starting from scratch.) To me, a tool that can make me feel playful is as good as it gets.


Sunday, June 30, 2013

A Few Tricks in Writing End-to-end JAX-RS Unit Tests with Mock Objects

Lately I was helping a team migrate their Spring MVC based web service to a JAX-RS based (more specifically, Jersey) implementation. When the team developed their current web services, they didn't have time to write end-to-end unit tests, so things could only be tested through the browser after a full deployment. That was time consuming, so the first thing I wanted to make sure of when I started the migration was that I could unit test things end-to-end without a full deployment*.

Jersey was designed with this in mind. You can extend your unit test from JerseyTest  and it can deploy the web resources to an embedded web container. Combined with tools like RestAssured and Mockito (or any other mocking framework), you can start an embedded grizzly container with your web resources (with JerseyTest), mock the service layer (using Mockito), and test it with real REST calls (with RestAssured).    

Things become slightly tricky when you are also using Spring. The Spring context is created and hidden inside the embedded web container**. If you need to access some Spring beans during testing, there is no easy way to get them because the Spring context is not available. One web post suggested modifying the JerseyTest and extending the container factory to provide a hook. Though this approach works***, it seems a bit too heavy to me. Is there a simpler way? 

The answer turns out to be extremely simple if you are using Spring 3. In Spring 3, you can annotate a class with @Configuration to indicate that the class provides context configuration and handles bean creation with the @Bean annotation on a static method. Since a bean can "capture" the context if it implements the ApplicationContextAware interface, we can "capture" the context easily by adding the @Configuration annotation to the test itself and creating a context holder bean, as shown in the following code snippet:



To demonstrate how an end-to-end unit test can be developed using this technique, I wrote a sample test case which combines all the techniques mentioned in this post. You can access the code on github (https://github.com/jiunjiunma/spring-jersey-test). If you ever need to unit test your RESTful web services, these can be pretty neat tricks to add to your toolbox.


*It also gave me more confidence when I did the refactoring.

**Note when you add the @ContextConfiguration annotation with JerseyTest, a new spring context is created but is not known to the embedded web container.

***The author had made the code available. The link on the original post is broken, but you can still do a google search on the class name and find the code.

Sunday, November 25, 2012

Babbage's Difference Engine in Action


I used the holiday season to visit the Computer History Museum in Mountain View. It was not the first time I had visited this wonderful museum (a must-see if you ever visit Silicon Valley), but it was the first time I saw their Babbage machine (Difference Engine #2, Serial 2) in action (it was under repair due to shipping damage the last time I visited). When I saw the double helix pattern appear as the machine turned to do a calculation, I was speechless. It was like seeing the birth of the modern computer!

I won't go into the details about the machine, for which I encourage you to visit the museum (or the Science Museum in London), but I found a few "design principles" Babbage applied to the machine quite interesting. For example, to improve the speed of the calculation, the machine was designed to calculate in parallel. Babbage also added a printing system, a design he borrowed from his Analytical Engine (which he never finished). This reusable module could even print different fonts! He also designed the wheels in such a way that if some mechanical parts were out of sync (and would therefore generate wrong results), they would jam the whole machine, so a properly functioning machine would never produce an incorrect calculation. It's fascinating to see those design ideas and safety mechanisms still applicable in modern software and hardware design.

One interesting anecdote about the machine: the restoration project was sponsored by Microsoft's ex-CTO Nathan Myhrvold. He ordered a duplicate to be made so he could display it in his living room. It was on loan to the Computer History Museum while he was getting his home ready. Well, let's hope his home remodeling takes a bit longer so more people can see this wonderful machine in its full glory.

Photo captions: One of the two fully restored Babbage difference engines in the world. Details of the difference engine; you can read the results of the calculation here or have them printed. The printing system, which supports multiple fonts.

Tuesday, September 25, 2012

Thought on AWS's Fast I/O Instance

A while back I wrote a blog post on how to use Delphix in the AWS cloud. It was sort of a thought experiment, because the performance of EBS was too slow for serious database usage. However, with the release of Amazon's latest high performance SSD EC2 instance, things have changed. Netflix has done a Cassandra benchmark with the new instance; they were able to use it to replace the m2.4xlarge instance with cache and cut about 60% of the cost.

To me, the fast I/O EC2 instance now makes AWS very attractive for data intensive analytics projects. It also means you can really run databases in the cloud without relying on heavy (sometimes custom) caching*. I can hardly wait to see if I can use it in a future project.

*See the very interesting five-minute rule on how SSD improves the disk performance.

Sunday, July 15, 2012

Exciting Python Machine Learning Package (scikit-learn)

A while back, I blogged about using rpy2 to leverage R's plotting power and rich selection of models from Python. It's usable but still a bit cumbersome. It turns out there is an even easier way to do machine learning in Python: use scikit-learn.

Scikit-learn is another project born out of Google's Summer of Code. It's currently only at version 0.11, but it has been around for 2+ years and supports many models for supervised and unsupervised learning. Its BSD license may be more attractive to people who are considering embedding a machine learning library in their own products. Overall it seems to be a very exciting module to add to Python's machine learning toolkit.
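
To give a feel for the API, here is a tiny sketch that fits a classifier on the bundled iris data set; the choice of estimator is just an example, not a recommendation:

from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()          # small sample data set that ships with scikit-learn
clf = SVC()                          # a support vector classifier with default settings
clf.fit(iris.data, iris.target)      # estimators share the same fit/predict interface
print(clf.predict(iris.data[:5]))    # predict labels for the first few samples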

Their web site is full of useful info (docs, tutorials, and demo videos), so go check it out: scikit-learn.org

P.S. If you encounter problems installing scikit-learn on your Mac, here is a very useful page on installing all the required packages: http://kral4u.blogspot.com/2012/07/installing-numpy-scipy-matplotlib.html. I also highly recommend upgrading from easy_install to pip.

Sunday, June 10, 2012

How to Install rpy2 on Mac OS X 10.7 (Lion)

Python and R are powerful tools for machine learning and data analysis. Like super heroes in movies, their power can be unmatched when combined. Python provides a richer set of scientific and data processing modules, while R provides easier plotting and analytic modeling capabilities. 

To access R from Python, you will need to install the rpy2* package. Usually, it's just as easy as running Python's "easy_install":

easy_install rpy2

However, I found I had to jump through a few hoops to get the rpy2 package compiled and installed on my mac. The time I spent/wasted convinced me the info is worth sharing. 

If you encounter errors while running easy_install on your Mac OS 10.7 system, try the following steps:

1. Install Xcode on your mac.
You will need the gcc compiler to build rpy2. If Xcode is not installed, download and install it from the Mac App Store. (It's free.) Then install the command line tools from Xcode (go to Preferences -> Downloads tab and click the "Install" button next to the Command Line Tools). This is what the Preferences pop-up looks like after installation.

    
Note: if you upgraded your Mac OS to 10.7 (Lion) from 10.6 (Snow Leopard) and had Xcode installed before the upgrade, you still have to do this, since the old Xcode tools were moved from /usr/bin/ to /Developer/usr/bin (it was a surprise to me) and the old binaries may not work properly.

2. Make sure your R installation is shared library enabled. If not, build it yourself. 
You will need the header files from R to build rpy2. If your R is installed from a binary-only release (i.e. installed from the one-click Mac OS R package), you need to download the R source code and build it yourself. Here are the instructions from CRAN on how to build R from source: http://cran.r-project.org/doc/manuals/R-admin.html#Installing-R-under-_0028Mac_0029-OS-X

You may have to install gfortran to build R. Unfortunately, the gfortran link provided on the CRAN site does not work for OS X 10.7. Make sure you get the right version. You can find a good reference here:

3. Download and build rpy2. 
The rpy2 page on source forge (http://rpy.sourceforge.net/rpy2/doc-2.2/html/overview.html#installation) provides pretty good instructions on how to build and install rpy2.  

Note that the default python installation (/usr/bin/python) on Lion is Python 2.7. If you encounter version compatibility issues, you can still build it using Python 2.6:

export ARCHFLAGS='-arch i386 -arch x86_64'
/usr/bin/python2.6 setup.py build  # specify --r-home if R not in default location

4. Install and test.
After successfully building it, you can install the Python package (with the same Python version you used to build it):

python setup.py install

and verify your installation with the following:

import rpy2.robjects as robjects


If you don't see any errors, congratulations, your rpy2 is ready to go.
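
To get a quick feel for what you just installed, here is a small sketch of calling into R from Python; this is my own example rather than anything from the rpy2 docs:

import rpy2.robjects as robjects

# Evaluate an R expression and pull the result back into Python.
result = robjects.r('mean(c(1, 2, 3, 4))')
print(result[0])  # 2.5

# Or pass a Python-built vector to an R function.
v = robjects.FloatVector([1.0, 2.0, 3.0])
print(robjects.r['sum'](v)[0])  # 6.0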


*rpy2 is a redesign of the rpy module. Its interface to R is better designed and it is recommended over the rpy module.


Wednesday, May 23, 2012

Mechanical Sympathy

I came across this article written by Martin Fowler

http://martinfowler.com/articles/lmax.html

Originally I was looking for more info about the LMAX Disruptor after Nathan Marz, at a meetup, talked about replacing the traditional queues with the LMAX Disruptor in Storm 0.8 to increase the overall performance.

In the second part of Martin Fowler's article, I came across the term "Mechanical Sympathy" for the first time. According to the article: "The term comes from race car driving and it reflects the driver having an innate feel for the car, so they are able to feel how to get the best out of it." Basically, you need to understand how modern hardware works in order to squeeze out the last drop of performance. These days it is no longer just disk access that is slow; relative to the CPU, even memory access is slow. You want to make sure that your code and data stay in the cache to get the performance.
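
A rough way to see this from Python (my own toy example; the exact numbers depend entirely on your hardware) is to traverse the same NumPy array along memory versus across it:

import time

import numpy as np

n = 4000
a = np.ones((n, n))  # C order: elements of a row are adjacent in memory

start = time.time()
row_sum = sum(a[i, :].sum() for i in range(n))   # walks along memory, cache friendly
print("row-wise:    %.3fs" % (time.time() - start))

start = time.time()
col_sum = sum(a[:, j].sum() for j in range(n))   # strides across rows, cache unfriendly
print("column-wise: %.3fs" % (time.time() - start))

Both loops add up the same numbers; the only difference is the order in which memory is touched.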


This reminds me of the old days when I worked on embedded systems: you needed to consider what both the software and the hardware could provide in order to come up with the best solution.







Friday, May 11, 2012

How to install R on Cloudera CDH3

I wanted to play with the RHadoop package to see how R worked with Hadoop. Since the demo CDH3 image I was using from Cloudera did not bundle R, the first thing I had to do was to install R. Easy, I thought, I just needed to install the 3 R rpms from CRAN and it would be done.

It turned out the R rpms had a lot of dependencies (about 20-30 extra rpms required) and the easiest way to install them was to install the EPEL (Extra Packages for Enterprise Linux) repo first. Unfortunately, the repo location returned by a Google search (http://download.fedora.redhat.com) didn't seem to be working any more. Finally, I found the right repo and everything was done in just 2 commands:


$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
$ sudo yum install R


*replace the x86_64 with i386 if you are installing on a 32-bit system.


Monday, April 30, 2012

HDFS - java.io.IOException: File XXX could only be replicated to 0 nodes, instead of 1

I have been playing with Cloudera Manager lately. After installing the hdfs service successfully using Cloudera Manager, I hit the following error when I tried to copy a file to hdfs.

Here was the message.

-bash-3.2$ hadoop dfs -copyFromLocal file01 file01
12/04/23 23:24:37 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/cloudera/file01 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1520)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:665)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)


at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3553)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3421)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2627)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2822)


It took me a while to figure out the problem. You can use the command line to report on the file system:

-bash-3.2$ hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 28672 (28 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 28672 (28 KB)
DFS Used%: 100%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0


-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)


Name: 10.132.169.81:50010
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 28672 (28 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Mon Apr 23 23:50:10 EDT 2012

From the above, you can see that the configured capacity is 0 KB. However, Cloudera Manager reported the file system as healthy.

Eventually I figured it out: even though I have 3 data nodes and each has 2 GB available, the default configuration requires each data node to have at least 10 GB of free space.

Since my Hadoop setup is running in a VM environment, I added an extra 20 GB virtual disk to each VM and the problem was solved.

Monday, April 2, 2012

A simple R script to find the Pi

Recently I have been working on a data mining project and investigating different visualization and computation tools to help analyze the data model. Both Octave and R (with RStudio) interest me.

I was eager to try out both tools and came across an interesting article about using Monte Carlo simulation to estimate pi. I know this is not exactly a data mining topic; however, it is still interesting.

Here is the article by Niall O'Higgins.

http://niallohiggins.com/2007/07/05/monte-carlo-simulation-in-python-1/

The author uses Python to demonstrate the concept. After learning R for a couple of hours, I rewrote the code in R.

n <- 1000000                                    # number of random points
x <- runif(n, 0.0, 1.0)                         # random x coordinates in the unit square
y <- runif(n, 0.0, 1.0)                         # random y coordinates
score <- ifelse(sqrt(x^2 + y^2) <= 1.0, 1, 0)   # 1 if the point falls inside the quarter circle
hit <- sum(score)                               # number of hits
myPi <- 4 * hit / n                             # hit ratio times 4 approximates pi

I am surprised the resulting code can be so compact. R seems to have better data import/export capabilities than Octave. I am very impressed with what both Octave and R can do.