
Hadoop (Big Data Analytics)

This tutorial is meant to make our readers aware of Apache Hadoop and Big Data. Many programmers don't know what exactly Hadoop is; some say it is a programming language like PHP or Java, and there are many such answers. This tutorial will clear up those doubts and introduce you to Hadoop, the future generation of Big Data. Our readers can also download a free ebook on Hadoop Big Data, which will surely guide you through Hadoop.

Q. What is Apache Hadoop?

Ans. Apache Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer.

Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.

Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant. The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop distributed file system (HDFS) and a number of related projects such as Apache Hive, HBase and Zookeeper.

The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving search engines and advertising. The preferred operating system is Linux, but Hadoop can also run on Windows, BSD and OS X.

The Apache Hadoop framework consists of the following modules:
  1. Hadoop Common – contains libraries and utilities needed by other Hadoop modules.
  2. Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
  3. Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
  4. Hadoop MapReduce – a programming model for large scale data processing (a word-count sketch follows this list).
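
To make the MapReduce module concrete, here is a minimal sketch of the classic word-count job in Java. This code is not from the post itself: the class name and argument handling are illustrative, and it assumes the Hadoop client libraries (org.apache.hadoop.*) are on the classpath.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: split each input line into words and emit (word, 1).
  // Fragments of the input are processed in parallel on the nodes
  // that hold the corresponding data blocks.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum all the counts emitted for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same driver runs unchanged on a single machine or a thousand-node cluster; YARN decides where the map and reduce tasks are placed.
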
Features of Hadoop:
  1. Scalable – New nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.
  2. Cost effective – Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
  3. Flexible – Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways enabling deeper analyses than any one system can provide.
  4. Fault tolerant – When you lose a node, the system redirects work to another copy of the data and continues processing without missing a beat (a short HDFS example follows this list).
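
As a rough illustration of how that fault tolerance is configured, the Java sketch below writes a file into HDFS and sets its replication factor; HDFS then keeps that many copies of each block on different nodes, and the NameNode re-replicates blocks from the surviving copies when a node fails. The file path and the factor of 3 are illustrative assumptions, and a reachable cluster (configured via core-site.xml on the classpath) is assumed.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);      // handle to the default filesystem (HDFS)

    Path file = new Path("/demo/hello.txt");   // hypothetical path
    try (FSDataOutputStream out = fs.create(file)) {
      out.writeUTF("hello, hadoop");           // write a small test file
    }

    // Keep 3 copies of every block of this file; if a node holding one
    // copy fails, HDFS re-replicates from the remaining copies.
    fs.setReplication(file, (short) 3);

    System.out.println("replication = " + fs.getFileStatus(file).getReplication());
    fs.close();
  }
}
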

Hadoop Big Data Users:

Yahoo!

On February 19, 2008, Yahoo! Inc. launched what it claimed was the world's largest Hadoop production application. The Yahoo! Search Webmap is a Hadoop application that runs on a Linux cluster with more than 10,000 cores and produces data used in every Yahoo! Web search query.
On June 10, 2009, Yahoo! made the source code of the version of Hadoop it runs in production available to the public. Yahoo! contributes all the work it does on Hadoop to the open-source community. The company's developers also fix bugs, make stability improvements internally, and release the patched source code so that other users may benefit from their effort.

Facebook

In 2010, Facebook claimed that it had the largest Hadoop cluster in the world, with 21 PB of storage. On June 13, 2012, they announced the data had grown to 100 PB. On November 8, 2012, they announced that the data gathered in the warehouse was growing by roughly half a PB per day.

Amazon

It is possible to run Hadoop on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). As an example, The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4 TB of raw TIFF image data (stored in S3) into 11 million finished PDFs in the space of 24 hours, at a computation cost of about $240 (not including bandwidth).
From a pure performance perspective, Hadoop on S3/EC2 is inefficient: the S3 file system is remote, and every write operation is delayed until the data is guaranteed not to be lost. This removes the locality advantage of Hadoop, which normally schedules work near the data to save network load.
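
Because Hadoop addresses storage through filesystem URIs, pointing a job at S3 is mostly a matter of using an S3 path instead of an HDFS one. The sketch below reuses the WordCount classes from the earlier example; the bucket name and credential values are hypothetical placeholders (the connector bundled with Hadoop at the time was s3n; newer releases use s3a), and in practice the credentials belong in core-site.xml, not in source code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountOnS3 {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Credentials for the S3-native connector (placeholders, not real keys).
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

    Job job = Job.getInstance(conf, "word count on S3");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(WordCount.TokenizerMapper.class);
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Input and output both live in S3, so every read and write crosses
    // the network -- the locality cost described above.
    FileInputFormat.addInputPath(job, new Path("s3n://my-bucket/input/"));
    FileOutputFormat.setOutputPath(job, new Path("s3n://my-bucket/output/"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
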
