Friday 3 October 2014

The Basics Of Hadoop



Basically, Hadoop is a framework of tools. It is not a single piece of software that you download onto your computer and then say you have Hadoop. We use Hadoop to support running applications on Big Data. And the best thing is that Hadoop is open source, distributed under the Apache Licence, so you don't need to pay for it. Being open source also guarantees that no particular company controls the direction of Hadoop; it is maintained by Apache. So we understand that Hadoop is a set of tools that supports running applications on Big Data, and the keyword behind Hadoop is Big Data. Big Data creates the challenges that Hadoop addresses, and these challenges arise at three levels.
  1. A lot of data is coming in at very high speed (Velocity)
  2. A big volume of data has been gathered and keeps on growing (Volume)
  3. The data comes in all sorts of varieties; it is not organized data, as it contains audio, video, log files etc. (Variety)
If we talk about the traditional enterprise approach to managing Big Data, an enterprise gets a very powerful computer to process the data, and this computer does a good job, but only up to a certain point. A point comes when even this powerful computer cannot do the processing any more, because it is not scalable and the Big Data keeps growing. So the traditional enterprise approach has its limitations when it comes to Big Data. Hadoop takes a very different approach: it breaks the data into smaller pieces, and that is why it is able to deal with Big Data. Breaking the data into small pieces is a good idea, but you then have to decide how to perform the computation, so Hadoop breaks the computation into smaller pieces as well. It sends each piece of computation to the node that holds a piece of the data. Because the data is broken into roughly equal pieces, the computation chunks can be performed in roughly equal amounts of time, and once all these computations are finished their results are combined together and the combined result is sent back to the application.
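As an illustration of that idea, here is a minimal word-count sketch written against Hadoop's Java MapReduce API (the class names are just placeholders, not part of any real application): each map task counts the words in one piece of the data, and the reduce step combines the partial counts into the final result.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Each map task runs on one piece (block) of the input data.
public class WordCountPieces {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (word, 1) for every word in this piece of the data.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Combine the partial counts produced by all the map tasks.
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
}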
At a very high level, Hadoop has a simple architecture. You can say Hadoop has two main components:
  1. MapReduce
  2. File System (HDFS) (see the small read sketch after this list)
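To give a feel for the file system component, below is a small sketch that reads a file from HDFS through Hadoop's Java FileSystem API. The path /user/demo/sample.txt is only an assumed example; you would replace it with a file that actually exists on your cluster.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
  public static void main(String[] args) throws Exception {
    // Picks up the cluster settings (e.g. fs.defaultFS) from the Hadoop config files.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical path used only for illustration.
    Path file = new Path("/user/demo/sample.txt");

    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(fs.open(file)))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}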
As we have discussed, Hadoop is a set of tools, and those tools are known as projects. There are numerous projects that have been started and are managed by Apache under the umbrella of Hadoop, and the objective of these projects is to provide assistance in the tasks that are related to Hadoop. So besides the two components defined above, there is also a third component, known as the projects. One of the important characteristics of Hadoop is that it works on a distributed model: we are not talking about a supercomputer, but about numerous low cost computers which are known as commodity hardware. Hadoop is a Linux based set of tools, so we will have Linux on all of these low cost computers. Each of these computers will have two components, and a sketch of how a job is submitted to such a cluster follows the list below.
  1. Task Tracker (to process the smaller pieces of the task)
  2. Data Node (to manage the piece of data that is given to that particular node)
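Here is a minimal driver sketch, assuming the word-count classes shown earlier are on the classpath, that shows how a job is handed to such a cluster: the framework splits the input, schedules one map task per piece on the nodes that hold the data, and then combines the map outputs in the reduce phase.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");

    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountPieces.TokenizerMapper.class);
    job.setCombinerClass(WordCountPieces.IntSumReducer.class);
    job.setReducerClass(WordCountPieces.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Input and output locations in HDFS, passed on the command line.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Submit the job and wait for the combined result.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would typically package this into a jar and run it with something like: hadoop jar wordcount.jar WordCountDriver <input dir> <output dir>.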
Here I want to list a few companies which are using Hadoop so that you can get an idea of what type of applications can use it. These companies include Yahoo, Facebook, Amazon, eBay, American Airlines, The New York Times, the Federal Reserve Board, Chevron, IBM etc.
