The Importance of Apache Drill to the Big Data Ecosystem


You might be wondering what bearing a history lesson has on a technology project such as Apache Drill. To truly appreciate Apache Drill, it helps to understand the history of the projects in this space, as well as the design principles and goals behind its implementation. The lessons learned in this space directly explain why Apache Drill is a serious big data tool with zero barriers to entry, one that lets organizations use big data in ways that were not possible with other tools.

Inspired predominantly by Google’s Dremel, Apache Drill is an open source, low-latency SQL query engine for Hadoop and NoSQL that can query across data sources. It handles flat, fixed schemas and is purpose-built for semi-structured and nested data.

Why Drill is compelling for customers

  1. Drill provides SQL access on any type of data, with extreme flexibility and ease of use

With Drill, you can query data in files, a Hive data warehouse, HBase tables, or even non-Hadoop storage systems in just a few minutes, and you can combine data from these sources on the fly. There’s no need to define and maintain central metadata definitions. Drill also comes with ODBC/JDBC drivers, so it can be plugged into BI tools such as Tableau and MicroStrategy very easily for wide use across the organization.
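As a rough illustration of combining sources on the fly, the sketch below builds one SQL statement that joins a JSON file with a Hive table, then wraps it in the JSON body accepted by Drill's REST query endpoint (`POST /query.json`). The storage plugin names `dfs` and `hive` follow Drill's usual defaults. The file path, table, column names, and the `build_drill_request` helper are hypothetical, and nothing is actually sent over the network.

```python
import json

# Hypothetical query: join raw JSON click logs on the file system with a
# customer table in Hive. Paths, tables, and columns are made up.
CROSS_SOURCE_SQL = """
SELECT c.name, COUNT(*) AS clicks
FROM dfs.`/data/logs/clicks.json` AS l
JOIN hive.customers AS c
  ON l.customer_id = c.id
GROUP BY c.name
""".strip()


def build_drill_request(sql, host="localhost", port=8047):
    """Return (url, body) for Drill's REST query endpoint without sending it."""
    url = f"http://{host}:{port}/query.json"
    body = json.dumps({"queryType": "SQL", "query": sql})
    return url, body


url, body = build_drill_request(CROSS_SOURCE_SQL)
print(url)
print(body)
```

Posting `body` to `url` with any HTTP client, using a `Content-Type: application/json` header, would run the query against a Drillbit listening on that host and port, assuming both storage plugins are enabled there.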

  2. Drill provides low-latency performance at scale

Drill is a distributed, columnar SQL query engine built from the ground up for complex data. It doesn’t use MapReduce, Tez, or Spark. Drill can be deployed on a single node or scaled horizontally to tens, hundreds, or thousands of nodes, depending on the number of users to support, the performance SLAs to meet, and the amount of data that needs processing. Along with scale, Drill is built for performance. The in-memory columnar execution engine, designed for optimistic processing of short queries, is combined with advanced and pluggable optimizations, including partition pruning, pushdown operators, and rule-based and cost-based query rewrite capabilities. These capabilities make Drill a powerful interactive tool in the big data ecosystem.
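To make the idea of partition pruning concrete, here is a toy sketch (not Drill's optimizer) that assumes data is laid out in one subdirectory per year. A filter on the directory level lets the engine skip whole directories before reading any file. In Drill, a filter on a directory column such as `dir0` plays a comparable role. The file layout and the `prune_partitions` helper are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical layout: one subdirectory per year under a logs directory.
FILES = [
    "/data/logs/2013/part-0.parquet",
    "/data/logs/2014/part-0.parquet",
    "/data/logs/2014/part-1.parquet",
    "/data/logs/2015/part-0.parquet",
]


def prune_partitions(files, root, wanted):
    """Keep only files whose first directory level under root equals wanted."""
    kept = []
    for f in files:
        relative = PurePosixPath(f).relative_to(root)
        if relative.parts[0] == wanted:
            kept.append(f)
    return kept


to_scan = prune_partitions(FILES, "/data/logs", "2014")
print(f"scanning {len(to_scan)} of {len(FILES)} files")  # scanning 2 of 4 files
```

The point of the sketch is that the decision is made from path metadata alone, so the pruned files are never opened.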

  3. Drill provides a granular and de-centralized security model

Drill supports user impersonation, so views and the data behind them are accessed under the identity of the specific user rather than a system or process user, which is not acceptable in several user environments. Drill also offers powerful ownership-chaining capabilities that control how many levels of nested views a given user can access, so organizations can strike a balance between self-service data exploration and controlled governance.
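The ownership-chaining idea can be sketched with a small toy model. This is not Drill's implementation: it assumes that a "hop" is a change of owner identity while walking down a chain of nested views, and that a query is rejected once the number of hops exceeds a configured limit (loosely the kind of limit Drill's `max_chained_user_hops` impersonation setting expresses). The `View` class, user names, and view names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class View:
    name: str
    owner: str
    source: Optional["View"] = None  # None means the view reads a table directly


def count_user_hops(querying_user, view):
    """Count identity changes while walking from the user down the view chain."""
    hops = 0
    current_user = querying_user
    node = view
    while node is not None:
        if node.owner != current_user:
            hops += 1
            current_user = node.owner
        node = node.source
    return hops


def is_allowed(querying_user, view, max_hops):
    """Allow the query only if the chain stays within the hop limit."""
    return count_user_hops(querying_user, view) <= max_hops


base = View("raw_orders_v", owner="frank")
mid = View("clean_orders_v", owner="jim", source=base)
top = View("sales_summary_v", owner="jim", source=mid)

print(count_user_hops("jane", top))          # 2 (jane -> jim, jim -> frank)
print(is_allowed("jane", top, max_hops=1))   # False
```

Raising the limit permits deeper delegation for self-service exploration, while lowering it keeps access closer to the data owner.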

 

There is little doubt that SQL is here to stay. With Apache Drill, organizations now have a solution that lets them easily analyze complex data structures and datasets using well-known SQL semantics. In essence, Drill has taken the approach of learning from history instead of repeating it. By understanding the limitations of other tools in this space, Drill enables businesses to use big data in new and powerful ways that were not previously available within the big data ecosystem.
