What is this Hadoop? Why is it getting so popular? What does Hadoop do?
Hello World,
Can someone please shed some light on this subject? What is this Hadoop, and why is it getting so popular? What is this "Big Data 2.0"? What is it that Hadoop does? Some links I've found so far: http://www.sas.com/en_us/insights/big-data/hadoop.html http://www-01.ibm.com/software/data/infosphere/hadoop/ http://en.wikipedia.org/wiki/Apache_Hadoop http://www.informationweek.com/big-d.../d/d-id/899721 http://www.cloudera.com/content/clou...-big-data.html http://hadoop.apache.org/ (added Hadoop main site) https://developer.yahoo.com/hadoop/t...l/module3.html https://www.digitalocean.com/communi...n-ubuntu-13-10 https://developer.yahoo.com/hadoop/tutorial/ http://wiki.apache.org/hadoop/HadoopIsNot Thanks in advance :D |
Quote:
Elephants are a symbol of wisdom. Everybody wants wisdom. Quote:
For consultant use in Buzzword Bingo. Clients' eyes glaze over; you win!!! Quote:
*BTW, the vendor's page isn't what you listed; it's at http://hadoop.apache.org/ so best to start there, or with a "HOWTO" tutorial. (Or, yes, just try Cloudera.) |
As the well-written vendor page implies, Hadoop is one of several tools for implementing "massively parallel operations" across clusters of, perhaps, thousands of machines ... without having a "single point of failure." In fact, these technologies are built on the assumption that machines will fail, that power cords will from time to time be accidentally kicked by someone working behind the frame in which all these machines (they're really "blades" ... circuit boards ...) are mounted. Any machine might fail at any time; you don't know which one(s), and the processing needs to continue reliably.
The processing that is being done is of such a nature that each CPU in the cluster has work to do and can do it more-or-less autonomously, while the various CPUs constantly share data among themselves. Several CPUs might be working on the same thing, or on parts of the same thing. Yes, we are "throwing silicon at it," because "chips are cheap (now)." "Big Data" refers to what these machines are actually doing: trying to transform: Quote:
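That "split the work across autonomous workers, then combine their answers" pattern is the MapReduce model that Hadoop popularized. Here's a minimal single-machine sketch in Python: no actual Hadoop involved, and the function names and sample data are just illustrative. On a real cluster, each map call would run on the machine that already holds that block of the file.

```python
from collections import Counter

def map_phase(chunk):
    # Each "node" independently counts words in its own slice of the data.
    return Counter(chunk.split())

def reduce_phase(partials):
    # Partial results from all nodes are merged into one final answer.
    total = Counter()
    for p in partials:
        total.update(p)
    return total

# Pretend each string is a block of one huge file, stored on a different machine.
chunks = ["big data big", "data big deal", "deal or no deal"]
partials = [map_phase(c) for c in chunks]   # on a cluster, these run in parallel
result = reduce_phase(partials)
print(result["big"], result["deal"])  # 3 3
```

The fault tolerance mentioned above falls out of this structure: if the machine running one map task dies, the framework just reruns that one chunk somewhere else, since no task depends on any other task's in-progress state.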
Another equally interesting (to me) observation that is beginning to come to light is: "does 'all of this computing' really give us a business advantage in selling pretzels?" It's an axiom of psychology that the presence of the experimenter affects the experiment. (And that "the mouse will do as he damn well pleases ...") People are dimly becoming aware that their every move is being sliced-and-diced, that they start getting junk mail from funeral homes when they mention in a text message that a friend has just died, and so forth. They're throwing away, unread, more than 95% of the e-mails they receive, and two-thirds of the "targeted" letters that come in the mail. The more ubiquitous "big data" is, the more noticeable it is, and the less effective it is with regard to actual human populations. |
Quote:
However, making thousands of computers crunch numbers together isn't itself a bad thing. For example, there are distributed computing networks (actually a few) that work on simulating protein folding to give scientists leads toward curing cancer. Other uses for such a system include a render farm (if you're a company like Disney), Bitcoin mining (though the power costs would probably not be worth it), or just seeing how many FPS you can get in a game (though network latency will become a problem). So having a large amount of computing power at your fingertips isn't a bad thing in itself. And it's not as if the computing cluster itself is collecting GPS data about you. |
Of course. Of course. Of course.
(Hey, I'm not that paranoid! Really. No, really.) ;) I am concerned ... profoundly concerned ... about the activities that I see "data-mining" being most-commonly used for. Quote:
|
Quote:
Correct me if I am wrong: is Hadoop basically saving extremely large files in a directory? And then how does a normal user retrieve information from Hadoop? |
No, Hadoop is not "merely an enormous file-system." It is a massively parallel, fault-tolerant workload management system. The file-system part (HDFS) spreads each file in blocks across many machines; the processing part (MapReduce) ships the computation to wherever the data lives. It is designed to process large amounts of data using large clusters of computing engines. You wouldn't set up one server with Hadoop. You'd set up hundreds, or thousands.
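As for how a "normal user" retrieves information: typically not by browsing files, but by submitting a job (or by using a higher-level tool like Hive or Pig that generates jobs). With Hadoop Streaming, for instance, a job is just a mapper and a reducer that read lines from stdin and emit tab-separated key/value pairs. A toy grep-and-count sketch in Python, run here against an in-memory string rather than a real cluster (the log lines and the search term are made up for illustration):

```python
import io

def mapper(lines, needle):
    # A Streaming mapper reads raw input lines and emits "key\tvalue" pairs.
    for line in lines:
        if needle in line:
            yield f"{needle}\t1"

def reducer(pairs):
    # The reducer receives pairs grouped by key and aggregates the values.
    total = 0
    for pair in pairs:
        _, val = pair.split("\t")
        total += int(val)
    return total

# Simulate the cluster: stream "file" lines through the mapper, then the reducer.
log = io.StringIO("GET /index\nGET /hadoop\nPOST /hadoop\n")
count = reducer(mapper(log, "hadoop"))
print(count)  # 2
```

On an actual cluster you'd submit roughly the same mapper/reducer scripts with the `hadoop jar hadoop-streaming.jar -mapper ... -reducer ... -input ... -output ...` command, and pull the result files back with the HDFS shell (`hdfs dfs -get`). The point is that Hadoop brings the computation to the data, rather than dragging terabytes across the network to one machine.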
|