LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > General
05-03-2015, 11:45 PM #1
cyberdome
Member
 
Registered: Mar 2014
Distribution: Fedora 23 - MariaDB 10.1 -
Posts: 130
Blog Entries: 2

Rep: Reputation: 8
What is this Hadoop? Why is it getting so popular? What does Hadoop do?


Hello World,

Can someone please shed some light on this subject? What is this Hadoop, and why is it getting so popular?

What is this Big Data 2.0? What is it that Hadoop does?


http://www.sas.com/en_us/insights/big-data/hadoop.html


http://www-01.ibm.com/software/data/infosphere/hadoop/

http://en.wikipedia.org/wiki/Apache_Hadoop

http://www.informationweek.com/big-d.../d/d-id/899721


http://www.cloudera.com/content/clou...-big-data.html

http://hadoop.apache.org/ (added Hadoop main site)

https://developer.yahoo.com/hadoop/t...l/module3.html

https://www.digitalocean.com/communi...n-ubuntu-13-10

https://developer.yahoo.com/hadoop/tutorial/

http://wiki.apache.org/hadoop/HadoopIsNot


Thanks in advance

Last edited by cyberdome; 05-07-2015 at 11:10 PM.
 
05-04-2015, 01:20 AM #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by cyberdome View Post
What is this Hadoop and why is it getting so popular?
Hadoop is Elephant.
Elephants are symbol of Wisdom.
Everybody want Wisdom.


Quote:
Originally Posted by cyberdome View Post
What is this Big Data 2.0?
It is a buzzword.
For Consultant use in Buzzword Bingo.
Clients eyes glaze over you Win!!!


Quote:
Originally Posted by cyberdome View Post
What is it that Hadoop does?
A distributed file system (HDFS), distributed processing of chunks of data (MapReduce), and a framework to glue that functionality together.
BTW, the vendor's page isn't among the ones you listed; it's at http://hadoop.apache.org/, so best start there or with a "HOWTO" tutorial. (Or, yes, just try Cloudera.)
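To make "distributed processing of chunks of data" a little more concrete, here is a minimal single-process sketch of the MapReduce pattern (map, shuffle, reduce) applied to a word count. It is not Hadoop's Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the list of strings merely stands in for blocks of a file stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one chunk of input."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values that were emitted for one key."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # On a real cluster each map_phase call could run on a different
    # machine, next to the block it reads; here they simply run in sequence.
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["the elephant never forgets", "the elephant is big", "big data"]))
# {'big': 2, 'data': 1, 'elephant': 2, 'forgets': 1, 'is': 1, 'never': 1, 'the': 2}
```

The point of the split is that mappers never need to talk to each other; only the shuffle moves data between machines.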
 
05-07-2015, 07:07 AM #3
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,610
Blog Entries: 4

Rep: Reputation: 3905
As the well-written vendor page implies, Hadoop is one of several tools for implementing "massively parallel operations" across clusters of, perhaps, thousands of machines ... without having a "single point of failure." In fact, these technologies are built to assume that machines will fail, that power-cords will from time to time be accidentally kicked by someone who's working behind the frame in which all these machines (they're really "blades" ... circuit-boards ...) are mounted. Any machine might fail at any time, you don't know which one(s), and the processing needs to reliably continue.
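The "any machine might fail, and processing must continue" idea boils down to rescheduling work that was running on a dead machine. Hadoop's MapReduce does re-run failed task attempts on other nodes, but its scheduler is far more elaborate; this is only a toy sketch, and the names (`WorkerDown`, `execute`, `run_with_retries`) and the squaring "work" are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class WorkerDown(Exception):
    """Raised when a (simulated) machine dies while running a task."""


def execute(worker: str, payload: int, is_down: Callable[[str], bool]) -> int:
    # Stand-in for real work; fails if the chosen machine is down.
    if is_down(worker):
        raise WorkerDown(worker)
    return payload * payload


def run_with_retries(tasks: Dict[str, int], workers: List[str],
                     is_down: Callable[[str], bool],
                     max_attempts: int = 3) -> Dict[str, Tuple[str, int]]:
    """Run every task; if its worker fails, reschedule it on the next worker."""
    results: Dict[str, Tuple[str, int]] = {}
    for i, (task_id, payload) in enumerate(tasks.items()):
        for attempt in range(max_attempts):
            worker = workers[(i + attempt) % len(workers)]
            try:
                results[task_id] = (worker, execute(worker, payload, is_down))
                break
            except WorkerDown:
                continue  # any machine may fail; try a different one
        else:
            # Only reached if every attempt failed (the loop never hit break).
            raise RuntimeError(f"{task_id} failed on {max_attempts} attempts")
    return results


down = {"node2"}  # someone kicked node2's power cord
print(run_with_retries({"t0": 2, "t1": 3, "t2": 4},
                       ["node1", "node2", "node3"], down.__contains__))
# {'t0': ('node1', 4), 't1': ('node3', 9), 't2': ('node3', 16)}
```

The caller never needs to know which machine died; task t1 simply ends up on node3.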

The processing that is being done is of such a nature that each CPU in the cluster has its own share of the work and can do it more or less autonomously, with the CPUs sharing data among themselves as their results are combined. Several CPUs might be working on the same thing, or on parts of the same thing. Yes, we are "throwing silicon at it," because "chips are cheap (now)."
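One way to see "work autonomously, share only what's needed" is a global average over data split across machines: each node summarizes only its own partition, and only the small summaries travel. A minimal sketch, assuming the data is already partitioned; the names `summarize`, `merge`, and `global_mean` are hypothetical.

```python
from functools import reduce
from typing import List, Tuple

Partial = Tuple[float, int]  # (sum of values, number of values)


def summarize(partition: List[float]) -> Partial:
    """Work a node can do on its own: summarize only the data it holds."""
    return (sum(partition), len(partition))


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial results. Associative, so merge order doesn't matter."""
    return (a[0] + b[0], a[1] + b[1])


def global_mean(partitions: List[List[float]]) -> float:
    partials = [summarize(p) for p in partitions]     # independent; parallelizable
    total, count = reduce(merge, partials, (0.0, 0))  # the only data exchanged
    return total / count


print(global_mean([[1, 2, 3], [4], [5, 6]]))  # 3.5
```

Each node ships a (sum, count) pair rather than its own mean: averaging the three per-partition means here would give about 3.83, not 3.5.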

"Big Data" refers to what these machines are actually doing: trying to transform:
Quote:
Originally Posted by Be Afraid, be Very Afraid:
a minute-by-minute log of where every individual in New York City (and every other city in the world) was standing, plus-or-minus seven feet, dutifully collected by the "apps" on their cell-phone and transmitted ... with neither their consent nor their knowledge ... to a "Big Data" data-center ... somewhere ... where this most-intimate data is available to "someone." A person who might be wearing a clean "white" hat, or ... most-decidedly "not."
I put that in quotes to emphasize it, of course, but also as a lead-in to the observation that "this sort of thing, IMHO, is not likely to continue for much longer." Big Data is big business right now, but be aware that a lot of what's being done today might soon become illegal. All of this stuff is so new (literally, "to human history") that the jurisprudence doesn't exist yet. But it is coming.

Another equally interesting (to me) observation that is beginning to come to light is: "does 'all of this computing' really give us a business advantage in selling pretzels?" It's an axiom of psychology that the presence of the experimenter affects the experiment. (And that "the mouse will do as he damn well pleases ...") People are dimly becoming aware that their every move is being sliced and diced, that they start getting junk mail from funeral homes when they mention in a text message to someone else that a friend has just died, and so forth. They're throwing away, unread, more than 95% of the e-mails they receive, and two-thirds of the "targeted" letters that come in the mail. The more ubiquitous "big data" is, the more noticeable it is, and the less effective it is with regard to actual human populations.

Last edited by sundialsvcs; 05-07-2015 at 07:08 AM.
 
05-07-2015, 08:01 AM #4
maples
Member
 
Registered: Oct 2013
Location: IN, USA
Distribution: Arch, Debian Jessie
Posts: 814

Rep: Reputation: 265
Quote:
Originally Posted by sundialsvcs View Post
As the well-written vendor page implies, Hadoop is one of several tools for implementing "massively parallel operations" across clusters of, perhaps, thousands of machines ... without having a "single point of failure." In fact, these technologies are built to assume that machines will fail, that power-cords will from time to time be accidentally kicked by someone who's working behind the frame in which all these machines (they're really "blades" ... circuit-boards ...) are mounted. Any machine might fail at any time, you don't know which one(s), and the processing needs to reliably continue.
From your postings, it's easy to tell that you're fairly passionate about this topic.
However, making thousands of computers all work together on crunching numbers isn't in itself a bad thing. For example, there are a few distributed computing networks that simulate protein folding to give scientists ideas for trying to cure cancer. Other uses for such a system include a render farm (if you're a company like Disney), Bitcoin mining (though the power costs would probably not be worth it), or just seeing how many FPS you can get in a game (though network latency will become a problem).

So having a large amount of computing power at your fingertips isn't in itself a bad thing. And it's not as if the computing cluster itself is collecting GPS data about you.
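A render farm is the classic "embarrassingly parallel" job: every frame can be rendered independently, so any worker can take any frame. The sketch below uses Python's `ThreadPoolExecutor` purely as a stand-in for a pool of machines, and `render_frame` is a hypothetical placeholder; CPU-bound Python code would need separate processes or machines to actually run faster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def render_frame(frame_number: int) -> str:
    """Hypothetical stand-in for rendering one frame of an animation."""
    return f"frame_{frame_number:04d}.png"


def render_all(frame_count: int, workers: int = 4) -> List[str]:
    # Frames don't depend on each other, so any worker can take any frame.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever frame finishes first.
        return list(pool.map(render_frame, range(frame_count)))


print(render_all(3))  # ['frame_0000.png', 'frame_0001.png', 'frame_0002.png']
```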
 
05-07-2015, 09:27 PM #5
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,610
Blog Entries: 4

Rep: Reputation: 3905
Of course. Of course. Of course.

(Hey, I'm not that paranoid! Really. No, really.)

I am concerned ... profoundly concerned ... about the activities that I see "data-mining" being most-commonly used for.

Quote:
"We have thrown our societies headlong from the cliff of what is now possible, straight into the pit of the unforeseen."

Last edited by sundialsvcs; 05-08-2015 at 07:16 AM.
 
05-07-2015, 11:02 PM #6
cyberdome
Member
 
Registered: Mar 2014
Distribution: Fedora 23 - MariaDB 10.1 -
Posts: 130

Original Poster
Blog Entries: 2

Rep: Reputation: 8
Quote:
"Elephants are symbol of Wisdom."
Very well said. So, if I set up an Ubuntu LAMP server with Hadoop, I don't need a SQL database, Oracle Database, or any other type of database for large data? I can save a large 5-terabyte file inside a Hadoop directory?
Correct me if I am wrong: is Hadoop basically a way of saving extremely large files in a directory? Then how does a normal user retrieve information from Hadoop?
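On the 5 TB question: HDFS does present files in directories, but it splits each file into large blocks (the `dfs.blocksize` setting, 128 MB by default in Hadoop 2.x) and keeps several copies of each block (the `dfs.replication` setting, 3 by default) on different DataNodes. A user can copy data back out with `hdfs dfs -get` or `hdfs dfs -cat`, but the usual way to get information out is to run a job (MapReduce, or a higher-level tool such as Hive) against it. The arithmetic below treats 5 TB as 5 × 1024^4 bytes; `place_blocks` is a hypothetical round-robin toy, since real HDFS placement is rack-aware.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 ** 2  # 128 MB: the default dfs.blocksize in Hadoop 2.x
REPLICATION = 3               # the default dfs.replication


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of HDFS blocks a file of file_size bytes is split into."""
    return -(-file_size // block_size)  # ceiling division on integers


def place_blocks(n_blocks: int, datanodes: List[str],
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Toy placement: each block on `replication` distinct DataNodes, round-robin.
    Real HDFS placement is rack-aware; this sketch ignores racks entirely."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(n_blocks)}


five_tb = 5 * 1024 ** 4
print(block_count(five_tb))                # 40960 blocks
print(block_count(five_tb) * REPLICATION)  # 122880 block copies across the cluster
print(place_blocks(2, ["dn1", "dn2", "dn3", "dn4"]))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4']}
```

Tens of thousands of blocks spread over many machines is exactly what lets the processing side work on them in parallel.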
 
05-08-2015, 07:17 AM #7
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,610
Blog Entries: 4

Rep: Reputation: 3905
No, Hadoop is not "merely an enormous file-system." It is a massively parallel, fault-tolerant workload-management system. It is designed to process large amounts of data, and to do so using large clusters of computing engines. You wouldn't set up one server with Hadoop. You'd set up hundreds, or thousands.
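Part of that "workload management" is deciding which machine runs which piece of work. In Hadoop 2, YARN allocates cluster resources, and MapReduce prefers to run each map task on a node that already holds a copy of its input block ("move the computation to the data"). Below is a greedy toy sketch of that preference; `assign_tasks` is hypothetical, and real schedulers also weigh racks, queues, and container sizes.

```python
from typing import Dict, List, Set


def assign_tasks(block_locations: Dict[str, Set[str]],
                 nodes: List[str]) -> Dict[str, str]:
    """Give each block's task to a node, preferring nodes that already hold the block."""
    load = {node: 0 for node in nodes}
    assignment: Dict[str, str] = {}
    for block, holders in sorted(block_locations.items()):
        local = [n for n in nodes if n in holders]
        # Move the computation to the data when possible; otherwise fall
        # back to any node, which must then read the block over the network.
        candidates = local or nodes
        # Least-loaded candidate wins; min() keeps the earlier node on ties.
        chosen = min(candidates, key=lambda n: load[n])
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


print(assign_tasks({"b0": {"n1", "n2"}, "b1": {"n1", "n2"}, "b2": {"n9"}},
                   ["n1", "n2", "n3"]))
# {'b0': 'n1', 'b1': 'n2', 'b2': 'n3'}
```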
All times are GMT -5.