Saturday, December 16, 2017

IBM and Hortonworks Consolidate Offerings at DataWorks Summit

Several announcements came out of DataWorks Summit this year.  One in particular further consolidates the Hadoop distributions and makes Hortonworks Data Platform (HDP) an even more compelling offering.

https://hortonworks.com/press-releases/ibm-hortonworks-expand-partnership/

IBM and Hortonworks are both members of the ODPi, and now they are offering IBM Data Science Experience and IBM Big SQL as packaged offerings with HDP.

In addition, IBM is migrating BigInsights customers to HDP, consolidating IBM BigIntegrate, IBM BigQuality, and IBM Information Governance Catalog into Apache Atlas, and continuing to contribute to open source platforms including Apache Spark and SystemML.

IBM has at least four official Apache Spark committers, and Hortonworks has two.  When I looked at this list in April 2014, neither company had any.  The list of committers has almost doubled since then.  Mridul Muralidharan joined Hortonworks from Yahoo!, Nick Pentreath joined IBM from Mxit, and Prashant Sharma joined IBM from Databricks.

IBM, Databricks, and Hortonworks are by far the top contributing companies to PySpark 2.0.  Two years ago, IBM went all-in on Spark, calling it "Potentially the Most Significant Open Source Project of the Next Decade".

Another announcement was the inclusion of Hortonworks Registry for Kafka, Storm and NiFi.  Similar to https://github.com/confluentinc/schema-registry, it distinguishes itself from the competition by providing pluggable storage of schemas in MySQL or Postgres, a web-based UI, and search capabilities.
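To make "pluggable storage of schemas" concrete, here is a minimal sketch of the idea in Python.  It is not the Hortonworks Registry API: the class names (InMemoryStore, SqliteStore, SchemaRegistry) are hypothetical, and SQLite stands in for MySQL or Postgres so the example runs without a database server.

```python
import sqlite3


class InMemoryStore:
    """Hypothetical storage backend that keeps schemas in a dict."""

    def __init__(self):
        self._rows = {}

    def put(self, name, version, body):
        self._rows[(name, version)] = body

    def versions(self, name):
        return sorted(v for (n, v) in self._rows if n == name)

    def get(self, name, version):
        return self._rows[(name, version)]


class SqliteStore:
    """Hypothetical SQL backend; SQLite is a stand-in for MySQL/Postgres."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS schemas ("
            "name TEXT, version INTEGER, body TEXT, "
            "PRIMARY KEY (name, version))"
        )

    def put(self, name, version, body):
        self._conn.execute(
            "INSERT INTO schemas VALUES (?, ?, ?)", (name, version, body)
        )
        self._conn.commit()

    def versions(self, name):
        rows = self._conn.execute(
            "SELECT version FROM schemas WHERE name = ? ORDER BY version",
            (name,),
        ).fetchall()
        return [row[0] for row in rows]

    def get(self, name, version):
        row = self._conn.execute(
            "SELECT body FROM schemas WHERE name = ? AND version = ?",
            (name, version),
        ).fetchone()
        return row[0]


class SchemaRegistry:
    """Versioned schema lookup that works with any store offering put/versions/get."""

    def __init__(self, store):
        self._store = store

    def register(self, name, body):
        # Each new schema for a name gets the next version number, starting at 1.
        existing = self._store.versions(name)
        version = existing[-1] + 1 if existing else 1
        self._store.put(name, version, body)
        return version

    def latest(self, name):
        versions = self._store.versions(name)
        return self._store.get(name, versions[-1])
```

Swapping SchemaRegistry(InMemoryStore()) for SchemaRegistry(SqliteStore()) changes where schemas live without touching the code that registers or reads them, which is the appeal of a pluggable backend.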

The question that popped into my head right away was: why didn't they just extend the Hive metastore to become the Schema Registry for all things streaming, and provide tumbling windows on Kafka and Storm from Hive?  This would have been an awesome addition to the Hive StorageHandlers.
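For anyone unfamiliar with the term, a tumbling window is a fixed-size, non-overlapping time bucket: every event lands in exactly one window.  The sketch below shows only the bucketing logic in plain Python.  The function name and the (timestamp, value) event shape are assumptions for illustration, not anything from Hive, Kafka, or Storm.

```python
from collections import defaultdict


def tumbling_windows(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping windows."""
    windows = defaultdict(list)
    for timestamp, value in events:
        # Round the timestamp down to the start of its window, so each
        # event belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start].append(value)
    # Return the windows ordered by their start time.
    return dict(sorted(windows.items()))


# Hypothetical events: timestamps in seconds, values are arbitrary payloads.
events = [(0, "a"), (4, "b"), (10, "c"), (14, "d"), (21, "e")]
result = tumbling_windows(events, 10)
```

With a 10-second window, the events at 0 and 4 share the first window, 10 and 14 share the second, and 21 falls into a third.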

There's always HiveKa if anyone wants to pick it up...

The latest release, HDF 3.0, was also announced.  One component that brought some excitement was the generically named Streaming Analytics Manager.  Its GUI-based design is a bit similar to NiFi's, with the addition of dashboards, the aforementioned Schema Registry, and monitoring views.  This tool tries to democratize the creation and management of streaming data sources.

Data in motion is the story of 2017 and beyond.

