Big Data / Hadoop Blogs

A Definitive Guide on How to Use YARN In Hadoop

Hadoop is one of the most well-known and widely used open-source distributed frameworks for large-scale data processing. It is based on five main building blocks: the MapReduce framework, the YARN infrastructure, storage, HDFS Federation, and the cluster. This is a definitive guide on how to use YARN in Hadoop. YARN stands for Yet Another […]

What is HBase?

HBase, a NoSQL database also known as the Hadoop Database, is an open-source database management system. It is a distributed, non-relational (columnar) database that uses the Hadoop Distributed File System (HDFS) as its persistence store for big data projects. It is a top-level Apache project that started out as a project by Powerset out […]
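
HBase's columnar layout can be pictured as a sparse, sorted, multi-dimensional map: row key, then column family, then column qualifier, then timestamped versions of a value. A minimal Python sketch of that mental model (this mimics the idea only; it is not the HBase client API, and the names and data are hypothetical):

```python
# Illustrative model of HBase's storage layout: a sparse map of
# row key -> column family -> qualifier -> {timestamp: value}.
# This is NOT the HBase client API, just a mental model.
table = {}

def put(row, family, qualifier, value, timestamp):
    """Store a versioned cell, loosely mimicking an HBase Put."""
    table.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})[timestamp] = value

def get_latest(row, family, qualifier):
    """Return the newest version of a cell, loosely mimicking an HBase Get."""
    versions = table[row][family][qualifier]
    return versions[max(versions)]

put("user1", "info", "email", "old@example.com", 100)
put("user1", "info", "email", "new@example.com", 200)
print(get_latest("user1", "info", "email"))  # new@example.com
```

Because rows are sparse maps, two rows in the same table need not share any columns, which is what makes the model "non-relational."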

What is Hadoop?

The word “cloud” has become closely attached to the latest emerging technologies delivered in the business world, and the most familiar technology used for Big Data is Hadoop. Hadoop is a free, open-source, Java-based programming framework that supports the handling of large data sets through the use of […]

What is MongoDB?

In today’s digital generation, MongoDB is among the most sought-after database systems. It is very useful for the growth of your business and other data storage needs. So, what is MongoDB? You can find the answer to this question once you have read this article to the end. If you are […]

How to Prepare for a Hortonworks Hadoop Certification Exam?

By performing tasks on an actual Hadoop cluster instead of just guessing at multiple-choice questions (MCQs), Hortonworks Certified Professionals have proven competency and Big Data expertise. The HDP Certified Developer exam can be taken from any system, anywhere, at any time, and costs around USD 200 per attempt. Hortonworks Data Platform 2.4 is chosen as […]

Apache Spark: The Hot Kid on the Block

Spark was developed at the University of California, Berkeley, around 2009 and became open source in 2010. Like most Hadoop services it is open source, which makes it cost-effective, and it keeps growing with features driven by users’ requirements. Spark runs up to 100 times faster than Hadoop MapReduce in memory […]

Hadoop Ecosystem

When talking about the major components of the Hadoop ecosystem, the first name that comes to mind is MapReduce, as it is the base on which the complete Hadoop framework relies. Data processing in Hadoop is done using the MapReduce algorithm, which plays a major role in how Hadoop processes data. For writing this MapReduce algorithm, two […]
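
The MapReduce idea can be simulated in plain Python: a map step emits (word, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group. This is a toy sketch of the model only, not Hadoop's Java API; the input lines are made up for illustration:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"], counts["data"])  # 2 2
```

In a real cluster the map and reduce functions run in parallel on many nodes, and the shuffle moves data across the network between them.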

What is Oozie Workflow?

Apache Oozie is an open-source project, based on Java web application technology, that simplifies the process of creating workflows and manages coordination among jobs. In principle, Oozie can combine multiple jobs sequentially into a single logical unit of work. One of the advantages of the Oozie framework is that […]
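
To give a flavor of how Oozie chains jobs into one logical unit, a workflow is described in XML as a graph of actions with ok/error transitions. The sketch below is hypothetical (the workflow name, action names, and properties are illustrative, not from the article):

```xml
<!-- Hypothetical Oozie workflow: two MapReduce actions run sequentially
     as one logical unit of work; all names here are illustrative. -->
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="clean-data"/>
  <action name="clean-data">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="aggregate-data"/>
    <error to="fail"/>
  </action>
  <action name="aggregate-data">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

If the first action fails, control jumps to the kill node instead of the second action, which is how Oozie coordinates dependent jobs.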

What is HIVE?

Apache Hive is a data warehousing package built on top of Hadoop. It provides an SQL dialect, called Hive Query Language (HQL), for querying and summarizing data stored in a Hadoop cluster. Hive doesn’t support row-level inserts, updates, or deletes, nor does it support transactions. Hive allows extensions to provide better performance […]
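
For a flavor of HQL, here is a small hypothetical example; the `page_views` table and its columns are invented for illustration and do not come from the article:

```sql
-- Hypothetical HQL: define a table over data in HDFS, then
-- summarize page views per day. Names are illustrative.
CREATE TABLE page_views (user_id STRING, url STRING, view_time TIMESTAMP)
  STORED AS ORC;

SELECT to_date(view_time) AS view_day, COUNT(*) AS views
FROM page_views
GROUP BY to_date(view_time)
ORDER BY view_day;
```

Note how the query reads like standard SQL even though, under the hood, Hive compiles it into jobs that run on the cluster; this batch-oriented design is also why classic Hive lacks row-level updates and transactions.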

Hadoop Developer Responsibilities

A Hadoop Developer is a person comfortable with managing a team of developers and explaining design concepts to customers. A Hadoop Developer is responsible for the programming and coding of Hadoop applications and must have knowledge of SQL, Core Java, and other languages. The role of a Hadoop Developer is similar to that of a Software Developer. Hadoop developer […]

Big Data Certifications by Hortonworks

Hortonworks is a computer software company and a sponsor of the well-known Apache Software Foundation. The company focuses on the development of Apache Hadoop, a framework that allows processing of large data sets across clusters of computers using a simple programming model. The Hortonworks product named Hortonworks Data Platform (HDP) includes Apache Hadoop and is […]

Big Data Certifications by Cloudera

Cloudera is a widely recognized company that concentrates chiefly on very large data collections built on the Apache Hadoop platform. The company’s basic aim is to help create information-driven organizations; this type of organization requires access to all of its data in all formats, whether that data has been kept online for many years or […]

Apache Hadoop Certifications

Why go for Hadoop certification? Companies are struggling to hire Hadoop talent, and they want assurance that the candidates they hire can handle their petabytes of data. A certification is proof of this capability, making you a more responsible and reliable custodian of their data. Benefits of […]

Big Data Certifications

What is the benefit of getting Big Data certified? Big Data certification provides a foundation for starting a career on the Big Data Hadoop architect path. Pay packages are definitely higher for Big Data certified candidates than for other professionals. Job recruiters are looking for candidates with a Big Data Hadoop certification, which is an […]

Five Challenges of Big Data

Using data to create business value is now a reality in many IT and non-IT industries. With the introduction of the “Internet of Things,” enhanced analytics and improved connectivity through new technologies and applications bring significant prospects for industries. For example, at Siemens, Big Data is changing the way maintenance services are provided, from […]

What’s Next for Apache Hadoop Data Management and Governance

Hadoop – the data processing engine based on MapReduce – is being superseded by new processing engines: Apache Tez, Apache Storm, Apache Spark, and others. YARN makes any data processing future possible. But Hadoop the platform – thanks to YARN as its architectural center – is the future for data management, with a selection of […]

The Importance of Apache Drill to the Big Data Ecosystem

You might be wondering what bearing a history lesson may have on a technology project such as Apache Drill. In order to truly appreciate Apache Drill, it is important to understand the history of the projects in this space, as well as the design principles and the goals of its implementation. The lessons that have been […]

How SQOOP-1272 Can Help You Move Big Data from Mainframe to Apache Hadoop

Apache Sqoop provides a framework to move data between HDFS and relational databases in parallel using Hadoop’s MapReduce framework. As Hadoop becomes more popular in enterprises, there is a growing need to move data from non-relational sources, such as mainframe datasets, to Hadoop. Possible reasons for this include: HDFS is used simply as an […]
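
As a sketch of the kind of parallel transfer Sqoop performs, here is a hypothetical import of a relational table into HDFS; the connection string, credentials, table, and target directory are placeholders, not details from the article:

```shell
# Hypothetical Sqoop import: copy one relational table into HDFS
# using 4 parallel map tasks. All names here are placeholders.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```

Each of the four mappers reads a disjoint slice of the table, which is how Sqoop exploits Hadoop's MapReduce parallelism for bulk transfers.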

Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data

The set of data storage and processing technologies that define the Apache Hadoop ecosystem is expansive and ever-improving, covering a very diverse set of customer use cases in mission-critical enterprise applications. At Cloudera, we’re constantly pushing the boundaries of what’s possible with Hadoop—making it faster, easier to work with, and more secure. Cloudera, the […]

Introduction to HDFS Erasure Coding in Apache Hadoop

Hadoop is a popular open-source implementation of the MapReduce framework, designed to analyze large data sets. It has two parts: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS is the file system used by Hadoop to store its data, and it has become popular due to its reliability, scalability, and low-cost storage capability. By default, HDFS replicates […]
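
The storage saving that motivates erasure coding is easy to quantify: with 3x replication, every block is stored three times (200% extra storage), while a Reed-Solomon scheme with 6 data blocks and 3 parity blocks adds only 50% extra storage with comparable fault tolerance. A small illustrative calculation (the (6, 3) parameters are one common configuration, used here as an example):

```python
def replication_overhead(replicas):
    # Extra storage as a multiple of the raw data size:
    # 3 replicas of a block means 2 extra copies.
    return replicas - 1

def erasure_coding_overhead(data_blocks, parity_blocks):
    # Reed-Solomon (data, parity): extra storage is parity/data.
    return parity_blocks / data_blocks

print(replication_overhead(3))        # 2   -> 200% extra storage
print(erasure_coding_overhead(6, 3))  # 0.5 -> 50% extra storage
```

This fourfold reduction in storage overhead is the main argument for erasure coding on cold data, at the cost of more expensive reconstruction when a block is lost.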

Drill into Your Big Data Today with Apache Drill

Big data techniques are becoming mainstream in an increasing number of businesses, but how do people get self-service, interactive access to their big data? And how do they do this without having to train their SQL-literate employees to be advanced developers? One solution is to take advantage of the rapidly maturing open source, open community […]

How-to: Deploy Apache Hadoop Clusters Like a Boss

The HDFS docs have some information, and logically it makes sense to separate the network of the Hadoop nodes from a “management” network. However, in our experience, multi-homed networks can be tricky to configure and support. The pain stems from Hadoop integrating with a large ecosystem of components that all have their own network and […]

Hadoop advantages and disadvantages

Advantages of Hadoop: 1. Scalable. Hadoop is a highly scalable storage platform because it can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel. Unlike traditional relational database systems (RDBMS), which can’t scale to process large amounts of data, Hadoop enables businesses to run applications on thousands of nodes […]

How to install Hadoop?

Prerequisites. Supported platforms: GNU/Linux is supported as a development and production platform; Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes. Win32 is supported as a development platform, but distributed operation has not been well tested on Win32, so it is not supported as a production platform. Required software for both Linux and Windows includes: Java 1.6.x, […]

Hadoop Admin responsibilities

Hadoop Admin responsibilities: Responsible for the implementation and ongoing administration of the Hadoop infrastructure. Aligning with the systems engineering team to propose and deploy the new hardware and software environments required for Hadoop and to expand existing environments. Working with data delivery teams to set up new Hadoop users. This job includes setting up Linux users, setting up Kerberos […]

Comparison of Hadoop with SQL and Oracle database

Basically, the difference is that Hadoop is not a database at all. Hadoop is fundamentally a distributed file system (HDFS): it lets you store large amounts of file data across cloud machines, handling data redundancy and so on. Comparing SQL databases and Hadoop: Hadoop is a framework for processing data; what makes it better […]

Five Must Read Books on Hadoop

Looking for Hadoop books? We have shortlisted the best Hadoop books. 1. Hadoop: The Definitive Guide (by Tom White). This is the best book for Hadoop beginners and the best source to introduce you to the world of big data management. 2. Hadoop in Practice (by Alex Holmes). This book discusses the advanced […]

What are the pre-requisites for big data hadoop?

Working directly with the Java APIs can be tedious and error-prone, and it restricts the use of Hadoop to Java programmers. Hadoop offers two solutions for making Hadoop programming easier. Pig is a programming language that simplifies the common tasks of working with Hadoop: loading data, expressing transformations on the data, and storing the final results. […]
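
As an illustration of how Pig simplifies those tasks, a word count that would take dozens of lines of Java API code fits in a few lines of Pig Latin. The file paths and alias names below are hypothetical:

```pig
-- Hypothetical Pig Latin word count: load, transform, store.
lines   = LOAD '/data/input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
STORE counts INTO '/data/wordcount';
```

Each statement names one of the tasks the paragraph lists: LOAD brings data in, the FOREACH/GROUP statements express transformations, and STORE writes the final results.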

Who can become a hadoop professional?

System administrators can learn some Java skills as well as cloud services management skills to start working with Hadoop installation and operations. DBAs and ETL data architects can learn Apache Pig and related technologies to develop, operate, and optimize the massive data flows going into the Hadoop system. BI analysts and data analysts can learn SQL and Hive […]

Introduction to Big data Hadoop

“Big Data” is a concept that is crucial to driving growth for businesses and has also been a challenge for programmers to analyze. The solution to this challenge is achieved by a framework called “Hadoop”. The Hadoop framework has overcome the Big Data challenges with the help of a file system concept called the “Hadoop Distributed […]