Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data

The set of data storage and processing technologies that define the Apache Hadoop ecosystem is expansive and ever-improving, covering a diverse range of customer use cases in mission-critical enterprise applications. At Cloudera, we’re constantly pushing the boundaries of what’s possible with Hadoop: making it faster, easier to work with, and more secure.

Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™, today announced the public beta release of Kudu, a revolutionary new columnar store for Hadoop that enables the powerful combination of fast analytics on fast data.

With the existing storage options, developers had to choose an engine based on its capabilities. HDFS provides fast analytics: it scans large amounts of data very quickly. However, HDFS was not built to handle updates. If data changed, the changes had to be appended in bulk after a certain volume or time interval, which prevented real-time visibility into that data. HBase, on the other hand, complements HDFS’s capabilities by providing fast random reads and writes and by supporting updates to data. But this online access came at the cost of scan performance. While these two storage engines addressed many of the key needs of big data applications, a gap remained, especially for developers who wanted fast analytics on fast-changing data.
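To make the visibility gap concrete, here is a toy Python model (a hypothetical sketch, not HDFS code): writes are buffered and appended to immutable "files" only once a batch fills, so a scan cannot see buffered rows, and a change becomes a new appended record that readers must reconcile rather than an in-place update. The `AppendOnlyStore` class and its `batch_size` threshold are invented for illustration.

```python
from typing import Any


class AppendOnlyStore:
    """Toy model of batch-appended, immutable files (hypothetical, not HDFS)."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.files: list[list[tuple[Any, Any]]] = []  # each flushed "file" is never modified
        self.buffer: list[tuple[Any, Any]] = []       # rows waiting for the next bulk append

    def write(self, key: Any, value: Any) -> None:
        # Changes are never applied in place; they are queued for a bulk append.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.batch_size:
            self.files.append(list(self.buffer))
            self.buffer.clear()

    def scan(self) -> list[tuple[Any, Any]]:
        # Analytics only see flushed files; buffered rows are not yet visible.
        return [row for f in self.files for row in f]

    def latest(self) -> dict[Any, Any]:
        # Readers reconcile duplicates themselves: the last visible record per key wins.
        result: dict[Any, Any] = {}
        for key, value in self.scan():
            result[key] = value
        return result


store = AppendOnlyStore(batch_size=3)
store.write("sensor-1", 10)
store.write("sensor-2", 20)
print(store.scan())          # → [] (nothing is visible until the batch fills)
store.write("sensor-1", 15)  # an "update" is just another appended record
print(store.scan())          # → [('sensor-1', 10), ('sensor-2', 20), ('sensor-1', 15)]
print(store.latest())        # → {'sensor-1': 15, 'sensor-2': 20}
```

The delay between a write and its visibility is governed entirely by the batch threshold; shrinking it produces many small files, which is the trade-off that motivated a storage engine supporting both updates and fast scans.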

Introducing Kudu

To close this gap, the Kudu developers investigated two separate approaches: making incremental modifications to existing Hadoop tools, or building something entirely new. Their design goals were:

  • Strong performance for both scans and random access, to help customers simplify complex hybrid architectures
  • High CPU efficiency, to maximize the return on the investment our customers are making in modern processors
  • High I/O efficiency, to take full advantage of modern persistent storage
  • The ability to update data in place, avoiding extraneous processing and data movement
  • The ability to support active-active replicated clusters that span multiple, geographically distant data centers
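As a rough illustration of how the first and fourth goals fit together, the following Python sketch (a hypothetical toy, not Kudu's storage engine or client API) stores each column in its own list so a scan touches only the column it needs, keeps a primary-key index for random reads, and applies updates in place so the next scan sees them immediately. The `ColumnarTable` class and its `upsert`, `get`, and `scan` methods are invented for this example.

```python
from typing import Any, Callable, Iterator, Optional


class ColumnarTable:
    """Toy columnar table keyed by a primary key (hypothetical, not Kudu's API)."""

    def __init__(self, key_column: str, columns: list[str]) -> None:
        self.key_column = key_column
        # One list per column: a scan reads only the column it asks for.
        self.data: dict[str, list[Any]] = {c: [] for c in [key_column, *columns]}
        # Primary-key index: maps a key to its row position for random access.
        self.index: dict[Any, int] = {}

    def upsert(self, row: dict[str, Any]) -> None:
        key = row[self.key_column]
        pos = self.index.get(key)
        if pos is None:
            # New key: append one value to every column (missing columns get None).
            self.index[key] = len(self.data[self.key_column])
            for col, values in self.data.items():
                values.append(row.get(col))
        else:
            # Existing key: overwrite only the supplied columns, in place.
            for col, value in row.items():
                self.data[col][pos] = value

    def get(self, key: Any) -> Optional[dict[str, Any]]:
        # Random read: one index lookup, then one value from each column.
        pos = self.index.get(key)
        if pos is None:
            return None
        return {col: values[pos] for col, values in self.data.items()}

    def scan(self, column: str,
             predicate: Callable[[Any], bool] = lambda _: True) -> Iterator[Any]:
        # Columnar scan: iterate over a single column's values.
        return (v for v in self.data[column] if predicate(v))


t = ColumnarTable("host", ["cpu", "mem"])
t.upsert({"host": "a", "cpu": 50, "mem": 2})
t.upsert({"host": "b", "cpu": 90, "mem": 4})
t.upsert({"host": "a", "cpu": 70})          # in-place update of one column
print(t.get("a"))                           # → {'host': 'a', 'cpu': 70, 'mem': 2}
print(list(t.scan("cpu", lambda v: v > 60)))  # → [70, 90]
```

A real engine must also handle concurrency, durability, compression, and replication; this toy only shows why a column layout plus a key index lets the same table serve both scans and point updates.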

A beta download of Kudu is now available at cloudera.com/downloads, along with a tutorial to help you get started. Because Kudu is an Apache-licensed open source project (with the intent to donate it to the ASF Incubator), you can also start contributing to it at getkudu.io.
