Cloudera: Paternal eyes on the Hadoop stack
Today, Hadoop already enables the management and querying of big data in industries such as healthcare, banking and telecommunications.
Cloudera was one of the first companies to take Hadoop out of Silicon Valley and into the mainstream among businesses and enterprises. Its Chief Architect is Doug Cutting, the founder (or father) of Hadoop, who saw the benefit of a data management framework that was open source rather than proprietary.
“That there is less support for open source solutions is not really true. There is support available and in a range of things, in fact for free from the community. In fact, people need less support when they have all the source code… software becomes more transparent and people can see and diagnose problems themselves,” Cutting explained.
Companies like Cloudera also offer support which companies can buy. “They end up happier in long term. License fees (which proprietary software incurs) are arbitrary, whereas (with open source) you only pay for support that you receive.”
In any case, Cutting’s main idea, to deliver data management at much lower cost and with much greater flexibility through open source, has borne fruit: Hadoop has exploded in popularity and usage.
What does its founder see coming next for the open source data management framework?
Eyes on the prize
Cutting said, “Our strategy is to try to encourage the community. We do not necessarily know what the next technology will be. We want to help encourage things that look promising.”
For example, he shared that Cloudera had not invented Kafka or Spark, two open source Apache projects that, respectively, handle real-time data feeds and process big data very quickly.
“Spark came from research out of [the] University of [California,] Berkeley, while Kafka came from LinkedIn. But, we saw that these two fulfill requirements that the industry has, so we adopted them and made them better integrated as well as supported and brought them to the industry at large.”
Cutting claims that having been around longer than its competitors has given Cloudera an advantage. “We have more experience training and supporting businesses. By far we are the largest vendor in this space and we have done a better job of curating the stack.”
That said, Cutting emphasises that Cloudera is not tied to any particular technology. “Instead, we are associated with the whole stack. And the stack is changing.”
Challenges of the Hadoop stack
Given Cloudera’s heft in the industry, the projects it picks to integrate and support could well end up becoming standards for the rest of the industry to follow.
Cutting admitted, “Picking projects (to invest in) is a challenge. We don’t want to be too late choosing technology, or be too early, because once we start supporting something, we have to do it for a long time.
“Every year, tens of projects are starting in this space. It is hard to know which is best till people try them. The market has to come to realise what works,” Cutting said. He cited the example of Cloudera investing early in Spark while rival Hortonworks was betting on something different.
As a result, Cloudera has gained more expertise with Spark and is better placed to offer quality support for it as Spark earns recognition and wider acceptance in the market.
Cutting concluded, “Our future roadmap is watching which new technologies are most useful. We don’t know what these will be, but we are picking and developing so that they are appropriate for enterprises in terms of integration with the rest of the Hadoop stack.”