Thursday, March 15, 2012

Highlights of SVForum's Big Data Analytics Conference

I attended SVForum's Big Data Analytics Conference today, held at the Microsoft conference center in Mountain View. There were 3 keynote sessions and 3-4 panel discussions. Here are some of the highlights:

Microsoft - Data Doesn't Matter Until...
Microsoft speaker Bruno Aziza kicked off the keynote sessions. Surprisingly, Microsoft supports the open-source big data community (Hadoop and others) and even contributes an "Excel module for Hive". (I find this combination a bit mind-boggling.) His talk focused on 4 big data trends:

  1. Data as a Service - Raw data is only the beginning; providing value-added data services is the key.
  2. Get(Data) - Collect as much data as you can, since storage costs are down and you can always analyze it later.
  3. Scarcity vs Accessibility - Demand for data analysts will be high in the future, so new tools are needed to make data more accessible and to leverage the global talent pool.
  4. Marketing matters - Use the results of data analytics to promote action. Case study: how Lego analyzes children's feedback to create new products and new markets.

IBM - Smarter Decision Making, Leveraging Big Data to Gain New Actionable Insights
I was surprised that IBM had already formed a big data division. The speaker from IBM was the VP of that division. Her talk started with where big data comes from (social media, RFID sensors, phones, cameras, GPS devices, smart meters, etc.), followed by a few case studies IBM had done in different segments (healthcare, green technology, traffic control). A few key takeaways:
  • Noise (as in signal/noise ratio) in the data stored in HDFS can be high.
  • Time to process data is important. Hadoop is good at batch processing unstructured data, but not as good at real-time, interactive data analytics.
  • A mix of Hadoop and the traditional data warehouse approach may bring the best of both worlds.
  • Big data analytics can provide value in 3 areas: strategic advantage, operational intelligence, and customer insight.

Greylock - Data Jujitsu, the art of turning data into product
Dr. DJ Patil (ex-LinkedIn and eBay) talked about how to form a good data science team, how LinkedIn promoted data analysis, and what makes a data project successful (having a diverse set of tools to use is very important).

Panel Discussions
The panel discussions were a bit free-form, so they are hard to summarize. Here are some insights I found interesting:
  • Hadoop's batch processing can be thought of as big data 1.0. A more interactive, real-time type of big data analytics will be big data 2.0.
  • A successful big data tool should be able to handle data both in the cloud and on-premises (security is not the concern; the logistics of moving large amounts of data is).
  • A good visualization tool is not enough. A good tool should provide actionable insights and recommendations (what we should do next).
  • Building a generic big data platform may not be a wise starting point for your big data project. Start with a vertical (specific) big data problem, then move horizontally.
  • Structured data and unstructured data will coexist. So will SQL and NoSQL.
  • Hadoop opens the door to new ETL (extract, transform, load) architectures. New ETL tools for real-time analysis and reentrant analysis are expected.
  • With the capabilities provided by Hadoop/HDFS, disk has become the new tape, and memory the new disk.
  • Big data analytics is still in the first half of the hype curve, but the future is very exciting.
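
To make the "big data 1.0 vs 2.0" point above a bit more concrete, here is a tiny Python sketch of my own (not something shown at the conference). It contrasts a batch-style count, which rescans all the data for every question, with an incremental count that stays up to date as each event arrives. The names batch_count and RunningCount and the sample events are made up for illustration, and real systems are of course far more involved.

```python
from collections import Counter


def batch_count(events):
    """Batch style: scan the whole data set each time a question is asked."""
    return Counter(events)


class RunningCount:
    """Incremental style: fold each event into the answer as it arrives."""

    def __init__(self):
        self.counts = Counter()

    def add(self, event):
        # Constant work per event, so the current answer is always ready.
        self.counts[event] += 1

    def top(self, n=1):
        return self.counts.most_common(n)


# Hypothetical event stream, purely for illustration.
events = ["click", "view", "click", "buy", "click"]

batch_result = batch_count(events)

running = RunningCount()
for event in events:
    running.add(event)

print(batch_result == running.counts)
print(running.top(1))
```

Both approaches end up with the same counts. The difference is when the work happens: the batch version pays for a full scan on every query, while the running version pays a little on every event and can answer right away.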

P.S. I asked the organizer whether the conference slides would be made available. He said he would ask the speakers. However, the conference was recorded on video and the videos should be available soon. So check out their website if you find the talks interesting.
