Tag Archives: online big data hadoop training

Data-Management

What’s Next for Apache Hadoop Data Management and Governance

Hadoop – the data processing engine based on MapReduce – is being superseded by new processing engines: Apache Tez, Apache Storm, Apache Spark and others. YARN makes any data processing future possible. But Hadoop the platform – thanks to YARN as its architectural center – is the future for data management, with a selection of […]

The Importance of Apache Drill to the Big Data Ecosystem

You might be wondering what bearing a history lesson may have on a technology project such as Apache Drill. In order to truly appreciate Apache Drill, it is important to understand the history of the projects in this space, as well as the design principles and the goals of its implementation. The lessons that have been […]

SQOOP

How SQOOP-1272 Can Help You Move Big Data from Mainframe to Apache Hadoop

Apache Sqoop provides a framework to move data between HDFS and relational databases in parallel using Hadoop’s MapReduce framework. As Hadoop becomes more popular in enterprises, there is a growing need to move data from non-relational sources like mainframe datasets to Hadoop. The following are possible reasons for this: HDFS is used simply as an […]

Kudu

Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data

The set of data storage and processing technologies that define the Apache Hadoop ecosystem is expansive and ever-improving, covering a very diverse set of customer use cases in mission-critical enterprise applications. At Cloudera, we’re constantly pushing the boundaries of what’s possible with Hadoop, making it faster, easier to work with, and more secure. Cloudera, the […]

Introduction to HDFS Erasure Coding in Apache Hadoop

Hadoop is a popular open-source implementation of the MapReduce framework, designed to analyze large data sets. It has two parts: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS is the file system used by Hadoop to store its data. It has become popular due to its reliability, scalability, and low-cost storage capability. HDFS by default replicates […]

Drill into Your Big Data Today with Apache Drill

Big data techniques are becoming mainstream in an increasing number of businesses, but how do people get self-service, interactive access to their big data? And how do they do this without having to train their SQL-literate employees to be advanced developers? One solution is to take advantage of the rapidly maturing open source, open community […]

Hadoop-Cluster

How-to: Deploy Apache Hadoop Clusters Like a Boss

The HDFS docs have some information, and logically it makes sense to separate the network of the Hadoop nodes from a “management” network. However, in our experience, multi-homed networks can be tricky to configure and support. The pain stems from Hadoop integrating with a large ecosystem of components that all have their own network and […]