
Thursday, May 15, 2014

115 Projects and Mountains of Data

Have you heard of Hadoop?  Sure you have.  You're reading this, aren't you?

My colleague, who is going for every Hadoop certification available, has kindly provided some links to add to my 150MB OneNote notebook on the Hadoop and Apache ecosystems. That's a little more than one HDFS block's worth of data (stored as 2 blocks, each replicated 3 times) if you haven't changed the default 128MB block size in Hadoop 2 (the MR2/YARN release). I'll try to share some of those links on this site.
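For anyone brushing up on HDFS storage math for the exam, here is a minimal sketch of that block arithmetic. It assumes the notebook is treated as a single 150MB file (a real OneNote notebook is a folder of smaller files, and each file would get its own blocks), the Hadoop 2 default block size of 128MB, and the default replication factor of 3. The function name is hypothetical.

```python
import math

def hdfs_footprint(file_mb: float, block_mb: int = 128, replication: int = 3):
    """Rough HDFS storage math for a single file (hypothetical helper)."""
    # How many full blocks' worth of data the file represents
    block_equivalents = file_mb / block_mb
    # HDFS splits the file into fixed-size blocks; the last one may be partial
    blocks = math.ceil(file_mb / block_mb)
    # Every block is copied `replication` times across DataNodes
    replicas = blocks * replication
    # A partial last block only occupies the space it needs on disk,
    # so raw usage is file size times replication, not blocks * block size
    raw_mb = file_mb * replication
    return block_equivalents, blocks, replicas, raw_mb

equiv, blocks, replicas, raw = hdfs_footprint(150)
print(f"{equiv:.2f} block-equivalents, {blocks} blocks, {replicas} replicas, {raw:.0f} MB raw")
```

Passing `block_mb=64` shows the same file under the older Hadoop 1 default, where it would be split into 3 blocks instead of 2.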


The list of projects out there doesn't quite qualify as big data, but it is getting pretty unmanageable for me. Apache alone has 115 projects listed, though some are shelved and haven't been updated in a while, and only about 11 are categorized as "Big Data."


I'm pursuing one certification for now and focusing a bit more on some of the amazing tools out there that work with the core infrastructure. I will try to share my findings on this blog for anyone who might find them helpful.


If you're going to get certified in the core of Hadoop, you'll want to understand Java programming and MapReduce theory. This could change in the future, as MapReduce slowly gets relegated to the mines of Mordor, with YARN treating it as just one tenant in a larger domain of heterogeneous applications. The possibility of running different MR versions, or even doing away with MR and making one of the other 7 Dwarves of parallel computing (or perhaps 13) a core piece of the architecture, is a serious concern.
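For readers new to MapReduce theory, the model boils down to three steps: a map function emits key/value pairs, the framework shuffles (groups) those pairs by key, and a reduce function combines each group into a result. Below is a minimal single-process sketch of the classic word count. It only simulates the phases in memory with plain Python; it is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine every value emitted for a single key
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Sorting keys mimics the sorted input each Hadoop reducer receives
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

print(word_count(["the cute elephant", "the war elephant"]))
# prints {'cute': 1, 'elephant': 2, 'the': 2, 'war': 1}
```

In a real cluster the map calls run in parallel on the nodes holding each HDFS block, and the shuffle moves data across the network, which is where most of the cost (and most of the exam questions) live.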


Speaking of Mordor, an Oliphaunt is a large war elephant from The Lord of the Rings.



The New York Times has an article from 1984 called "The Mystery of Hannibal's Elephants." Hannibal had a 38-node cluster of war elephants and crossed the Alps with those elephants and 100,000 men (give or take 60,000 or so; Wikipedia has a different number).

There are currently 129 Apache committers who contribute to more than 10 projects each. That's roughly 3.7% of the 3,500 or so committers listed on the Apache site. The top two committers, Jim Jagielski and Dr. Chris Mattmann, have each contributed to at least 35 different projects. The Apache ecosystem is an amazing community with some very dedicated and passionate individuals. However, there is an even larger "dark pool" of talent branching and forking open-source code for their own needs within the silos of companies like Twitter, Intel, eBay, LinkedIn, IBM, Facebook, Google, Yahoo, and yes, even Microsoft.


The cute elephant in the room of 2006 is turning into a herd of war elephants that will crush relational database systems as we know them.

Or so they say...



I will either find a way, or make one.
-- Hannibal


