Why choose Hadoop?
- Linear scalability. Hadoop's architecture allows it to operate as a distributed system across a cluster of nodes (servers), one of them acting as master and the rest as slaves. The simplicity and ease with which a node can be added to the cluster make Hadoop extremely flexible and scalable in the face of any change in the volume of data to be processed.
- High availability. Files are replicated as many times as specified by a configuration variable, giving the system high reliability.
- Fault tolerance. The failure of a node, or even a set of nodes, in the cluster does not prevent the system from functioning correctly.
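As a minimal sketch of the configuration variable mentioned above: in HDFS the replication factor is set with the `dfs.replication` property, normally placed in `hdfs-site.xml` (the default value is 3; the value shown here is illustrative):

```xml
<!-- hdfs-site.xml: each HDFS block is stored on this many DataNodes -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

With three copies of every block spread across different DataNodes, the loss of one or two nodes leaves at least one replica available, which is what gives HDFS both its high availability and its fault tolerance.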
MapReduce and HDFS.
The Hadoop ecosystem
New concepts, new tools.
Hadoop has done nothing but follow in the footsteps of a long journey: in its latest release, the package distributed by Cloudera, CDH4, introduces version 2.0 of Apache Hadoop. This new version adds concepts such as "NameNode High Availability", which solves the fragility of the NameNode in previous versions, where it was a single point of failure. MapReduce and HDFS become more robust and stable, and provide solutions for more complex systems.
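To make the MapReduce model concrete, here is a hedged sketch of the classic word-count job, written in plain Java so it runs locally without a cluster. Note that this is not the Hadoop API itself (a real job would extend `Mapper` and `Reducer` from `org.apache.hadoop.mapreduce`); it only illustrates the two phases that Hadoop distributes across the cluster's nodes:

```java
import java.util.*;
import java.util.stream.*;

// Local illustration of the MapReduce model: a map phase that emits
// (word, 1) pairs, and a shuffle/reduce phase that groups by key and sums.
public class WordCountSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phase: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("hadoop stores data", "hadoop processes data");
        List<Map.Entry<String, Integer>> mapped = input.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());
        System.out.println(reduce(mapped));
    }
}
```

In a real cluster, the map calls run in parallel on the nodes holding the HDFS blocks of the input, and the framework performs the shuffle before the reduce calls; the logic of the two functions is the same.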