Linux - Networking: This forum is for any issue related to networks or networking. Routing, network cards, OSI, etc. Anything is fair game.
02-16-2013, 09:31 AM | #1
Member | Registered: Mar 2010 | Location: California | Distribution: Slackware | Posts: 235
Need help with Parallel Filesystem
Hello,
I am about to replace HDFS with Lustre for Hadoop. I have 1 MDS machine, 1 OSS machine, and 2 Lustre clients, and I have installed Hadoop on the two Lustre client machines. I have pasted my details at http://paste.ubuntu.com/1661235/
All I need to know is how to go ahead and run wordcount as an example program. For HDFS we specify masters and slaves, but for Lustre the namenode and datanode are not required at all (right?).
02-16-2013, 12:05 PM | #2
LQ Guru | Registered: Jul 2003 | Location: Birmingham, Alabama | Distribution: SuSE, RedHat, Slack, CentOS | Posts: 27,406
Quote: Originally Posted by linuxunix
Hello,
I am about to replace HDFS with Lustre for Hadoop. I have 1 MDS machine, 1 OSS machine, and 2 Lustre clients, and I have installed Hadoop on the two Lustre client machines. I have pasted my details at http://paste.ubuntu.com/1661235/
All I need to know is how to go ahead and run wordcount as an example program. For HDFS we specify masters and slaves, but for Lustre the namenode and datanode are not required at all (right?).
Have you read the documentation on Lustre to see?
http://wiki.lustre.org/index.php/Use:Use
...or tried Google?
http://stackoverflow.com/questions/1...mple-in-hadoop
The best question would be: do a namenode and datanode even exist in your new configuration?
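(If you want to check that empirically on each box: jps, which ships with the JDK, lists the running Hadoop Java daemons. A generic sketch, not something from this thread:)
Code:
# List the Hadoop Java daemons on this host. On a stock HDFS
# setup you would expect NameNode/DataNode (plus JobTracker/
# TaskTracker); on a Lustre-backed setup only the MapReduce
# daemons should remain.
jps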
02-16-2013, 12:22 PM | #3
Member (Original Poster) | Registered: Mar 2010 | Location: California | Distribution: Slackware | Posts: 235
I read the Lustre link you suggested.
But I am still not sure which configuration files I need to change for Lustre.
Also, the link suggested that running wordcount just needs a file to collect the data, with no need for HDFS.
But how is Lustre going to do the analytic task?
02-16-2013, 06:01 PM | #4
LQ Guru | Registered: Jul 2003 | Location: Birmingham, Alabama | Distribution: SuSE, RedHat, Slack, CentOS | Posts: 27,406
Quote: Originally Posted by linuxunix
I read the Lustre link you suggested.
But I am still not sure which configuration files I need to change for Lustre.
Also, the link suggested that running wordcount just needs a file to collect the data, with no need for HDFS.
But how is Lustre going to do the analytic task?
Lustre isn't going to do ANY analytic tasks...you said you read the links; did you see on the Lustre front page where it tells you:
Quote: Originally Posted by Lustre Website
...the Lustre™ file system
Lustre is a file system...to put it simply, it provides a 'disk' for the operating system/computer to use, just like HDFS does. Your question is like saying "I'm going from ext2 to ext4 on my system...how do I run a bash script?"
02-16-2013, 08:45 PM | #5
Member (Original Poster) | Registered: Mar 2010 | Location: California | Distribution: Slackware | Posts: 235
I am well aware that Lustre is a file system, but my question is about the configuration files.
Just as we have core-site.xml, hdfs-site.xml, mapred-site.xml, etc., which files do I need to change?
Are the changes I made enough?
Also, I don't need to store anything in HDFS, which means I don't need to take care of the namenode and datanode, correct?
It would be great if you could share the steps I need to follow.
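(For reference, a minimal core-site.xml sketch for pointing Hadoop 1.x at a mounted POSIX filesystem. The /lustre mount path is an assumption; with a file:/// default filesystem, hdfs-site.xml is no longer consulted:)
Code:
<!-- core-site.xml: use the Lustre client mount as the default
     filesystem instead of an hdfs:// namenode URI.
     /lustre is a placeholder for the actual mount point. -->
<property>
  <name>fs.default.name</name>
  <value>file:///lustre</value>
</property>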
02-17-2013, 06:36 AM | #6
Senior Member | Registered: Jun 2008 | Location: Germany | Distribution: Slackware | Posts: 1,466
What error are you facing?
Last edited by your_shadow03; 02-17-2013 at 06:49 AM.
02-17-2013, 06:46 AM | #7
Senior Member | Registered: Jun 2008 | Location: Germany | Distribution: Slackware | Posts: 1,466
What configuration file changes have you made?
Last edited by your_shadow03; 02-17-2013 at 06:50 AM.
02-17-2013, 06:58 AM | #8
Member (Original Poster) | Registered: Mar 2010 | Location: California | Distribution: Slackware | Posts: 235
Hi TB0ne,
I have the following entries in the files:
File: core-site.xml
Code:
<property>
  <name>fs.default.name</name>
  <value>file:///lustre</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>${fs.default.name}/hadoop_tmp/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.
  </description>
</property>
File: mapred-site.xml
Code:
[root@alpha hadoop]# cat conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
</configuration>
[root@alpha hadoop]#
File: hdfs-site.xml <== I AM NOT SURE IF THIS IS REALLY NEEDED
Code:
[root@alpha hadoop]# cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
[root@alpha hadoop]#
All I did was run the command below, and it completed successfully:
Code:
[root@alpha hadoop]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /user/hadoop/hadoop/ /user/hadoop/hadoop/output
13/02/17 17:14:37 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/02/17 17:14:38 INFO input.FileInputFormat: Total input paths to process : 1
13/02/17 17:14:38 WARN snappy.LoadSnappy: Snappy native library not loaded
13/02/17 17:14:38 INFO mapred.JobClient: Running job: job_local_0001
13/02/17 17:14:38 INFO util.ProcessTree: setsid exited with exit code 0
13/02/17 17:14:38 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2f74219d
13/02/17 17:14:38 INFO mapred.MapTask: io.sort.mb = 100
13/02/17 17:14:38 INFO mapred.MapTask: data buffer = 79691776/99614720
13/02/17 17:14:38 INFO mapred.MapTask: record buffer = 262144/327680
13/02/17 17:14:38 INFO mapred.MapTask: Starting flush of map output
13/02/17 17:14:39 INFO mapred.JobClient: map 0% reduce 0%
13/02/17 17:14:39 INFO mapred.MapTask: Finished spill 0
13/02/17 17:14:39 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/02/17 17:14:39 INFO mapred.LocalJobRunner:
13/02/17 17:14:39 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/02/17 17:14:39 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6d79953c
13/02/17 17:14:39 INFO mapred.LocalJobRunner:
13/02/17 17:14:39 INFO mapred.Merger: Merging 1 sorted segments
13/02/17 17:14:39 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79496 bytes
13/02/17 17:14:39 INFO mapred.LocalJobRunner:
13/02/17 17:14:39 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/02/17 17:14:39 INFO mapred.LocalJobRunner:
13/02/17 17:14:39 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/02/17 17:14:39 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to /user/hadoop/hadoop/output
13/02/17 17:14:39 INFO mapred.LocalJobRunner: reduce > reduce
13/02/17 17:14:39 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/02/17 17:14:40 INFO mapred.JobClient: map 100% reduce 100%
13/02/17 17:14:40 INFO mapred.JobClient: Job complete: job_local_0001
13/02/17 17:14:40 INFO mapred.JobClient: Counters: 20
13/02/17 17:14:40 INFO mapred.JobClient:   File Output Format Counters
13/02/17 17:14:40 INFO mapred.JobClient:     Bytes Written=57885
13/02/17 17:14:40 INFO mapred.JobClient:   FileSystemCounters
13/02/17 17:14:40 INFO mapred.JobClient:     FILE_BYTES_READ=643420
13/02/17 17:14:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=574349
13/02/17 17:14:40 INFO mapred.JobClient:   File Input Format Counters
13/02/17 17:14:40 INFO mapred.JobClient:     Bytes Read=139351
13/02/17 17:14:40 INFO mapred.JobClient:   Map-Reduce Framework
13/02/17 17:14:40 INFO mapred.JobClient:     Map output materialized bytes=79500
13/02/17 17:14:40 INFO mapred.JobClient:     Map input records=2932
13/02/17 17:14:40 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/02/17 17:14:40 INFO mapred.JobClient:     Spilled Records=11180
13/02/17 17:14:40 INFO mapred.JobClient:     Map output bytes=212823
13/02/17 17:14:40 INFO mapred.JobClient:     Total committed heap usage (bytes)=500432896
13/02/17 17:14:40 INFO mapred.JobClient:     CPU time spent (ms)=0
13/02/17 17:14:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=99
13/02/17 17:14:40 INFO mapred.JobClient:     Combine input records=21582
13/02/17 17:14:40 INFO mapred.JobClient:     Reduce input records=5590
13/02/17 17:14:40 INFO mapred.JobClient:     Reduce input groups=5590
13/02/17 17:14:40 INFO mapred.JobClient:     Combine output records=5590
13/02/17 17:14:40 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/02/17 17:14:40 INFO mapred.JobClient:     Reduce output records=5590
13/02/17 17:14:40 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/02/17 17:14:40 INFO mapred.JobClient:     Map output records=21582
Does that mean Lustre is working for Hadoop? Please suggest.
How shall I know if the complete Lustre setup is working?
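(As for checking the Lustre setup itself, a few standard commands; the /lustre client mount point is an assumption here:)
Code:
# On a client: show Lustre mounts and free space per MDT/OST
mount -t lustre
lfs df -h /lustre
# On the MDS/OSS: list the configured Lustre devices and their state
lctl dl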
02-17-2013, 09:48 AM | #9
LQ Guru | Registered: Jul 2003 | Location: Birmingham, Alabama | Distribution: SuSE, RedHat, Slack, CentOS | Posts: 27,406
Quote: Originally Posted by linuxunix
Hi TB0ne,
I have the following entries in the files: ... All I did was run the command below, and it completed successfully.
Does that mean Lustre is working for Hadoop? Please suggest. How shall I know if the complete Lustre setup is working?
You said before that you read the Lustre docs...if you did, you'd have seen the configuration/installation pages, which tell you how to install, configure, and test things.
http://wiki.lustre.org/index.php/Con...re_File_System
http://wiki.lustre.org/index.php/Lustre_FAQ
And if you ran something and it WORKED, wouldn't that tell you that it's running correctly?
02-17-2013, 10:55 AM | #10
Member (Original Poster) | Registered: Mar 2010 | Location: California | Distribution: Slackware | Posts: 235
I have Lustre configured properly, with the MDS, OSS, and client all set up.
But there is no document that talks about Hadoop over Lustre.
I ran the wordcount example, and it worked without any namenode/datanode concept; I could see Hadoop did well counting the words. I opened http://localhost:50030 but couldn't find any activity there.
1. I am surprised! Why is there no JobTracker and TaskTracker activity?
2. How can I confirm that the MDS and OSS really did any work during the wordcount?
Last edited by linuxunix; 02-17-2013 at 10:58 AM.
02-17-2013, 11:23 AM | #11
LQ Guru | Registered: Jul 2003 | Location: Birmingham, Alabama | Distribution: SuSE, RedHat, Slack, CentOS | Posts: 27,406
Quote: Originally Posted by linuxunix
I have Lustre configured properly, with the MDS, OSS, and client all set up.
But there is no document that talks about Hadoop over Lustre.
AGAIN...it's just a file system. As long as you configure Hadoop correctly, it works.
Quote:
I ran the wordcount example, and it worked without any namenode/datanode concept; I could see Hadoop did well counting the words. I opened http://localhost:50030 but couldn't find any activity there.
1. I am surprised! Why is there no JobTracker and TaskTracker activity?
2. How can I confirm that the MDS and OSS really did any work during the wordcount?
...and if you removed nodes from the cluster, how are they going to be tracked? AGAIN...Lustre is a FILE SYSTEM. Using it instead of HDFS doesn't change the Hadoop setup, just where the data lives. You say you've read the docs and know the differences, so that should be an obvious thing.
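(One concrete hint from the log itself: the job ID is job_local_0001 and the task lines mention LocalJobRunner. In Hadoop 1.x, when mapred.job.tracker is left unset it defaults to "local", so the whole job runs inside the client JVM and never contacts the JobTracker, which would explain the empty page at :50030. A sketch of the property, with "alpha:9001" as a placeholder host:port:)
Code:
<!-- mapred-site.xml: without this property, mapred.job.tracker
     defaults to "local" and jobs run in-process via the
     LocalJobRunner, so nothing appears in the JobTracker web UI.
     "alpha:9001" is a placeholder host:port. -->
<property>
  <name>mapred.job.tracker</name>
  <value>alpha:9001</value>
</property>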
02-17-2013, 12:28 PM | #12
Member (Original Poster) | Registered: Mar 2010 | Location: California | Distribution: Slackware | Posts: 235
TB0ne,
If you look at the wordcount output I posted in #8 above (the same run),
what in it says that the Lustre filesystem was actually used?
In my setup I have 2 Lustre clients, 1 MDS/MDT, and 1 OSS/OST (details in the paste.ubuntu.com link in my first post).
I am trying to understand: given that I pointed fs.default.name at file:///lustre in core-site.xml, how did the job actually use Lustre?
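(An empirical way to check is to see which filesystem actually backs the output directory. Note that with a file:// default filesystem, an absolute path like /user/hadoop/hadoop/output resolves to that literal local path, not to a path under /lustre, so the job may not have touched the Lustre mount at all unless /user/hadoop itself lives on it. The part-r-00000 name is the usual wordcount output file and is an assumption here:)
Code:
# Show which filesystem (and type) holds the job output;
# a Lustre-backed path reports type "lustre".
df -hT /user/hadoop/hadoop/output
# lfs getstripe only succeeds for files stored on Lustre;
# on a local ext3/ext4 path it fails.
lfs getstripe /user/hadoop/hadoop/output/part-r-00000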
02-22-2013, 10:25 AM | #13
LQ Newbie | Registered: Feb 2013 | Posts: 4
Hey guys, I think the Parallel File System being referred to here is a high-performance distributed file system developed by IBM. It is used by many of the biggest commercial companies, as well as some supercomputers, and it provides high-speed parallel file access to programs executing on several nodes of a cluster. Thanks a lot!!
02-26-2013, 03:46 AM | #14
Member (Original Poster) | Registered: Mar 2010 | Location: California | Distribution: Slackware | Posts: 235
Yes, but what I am using is Lustre, not the IBM filesystem.
03-18-2013, 07:11 AM | #15
Senior Member | Registered: Jun 2008 | Location: Germany | Distribution: Slackware | Posts: 1,466
TB0ne,
I have tried a lot, but I couldn't really work out whether HDFS or Lustre is actually being used here.
I have no idea if this file:
Code:
[root@alpha hadoop]# cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
[root@alpha hadoop]#
is actually enabling HDFS.
Please suggest.
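(For what it's worth: hdfs-site.xml on its own does not enable HDFS. HDFS is only in play when fs.default.name is an hdfs:// URI and the NameNode/DataNode daemons are started; with file:///lustre as the default filesystem, dfs.replication is simply ignored. A quick check, assuming the stock Hadoop 1.x layout:)
Code:
# Lists the root of whatever fs.default.name points at:
# a local/Lustre path for file://, or HDFS for hdfs://
bin/hadoop fs -ls /
# Shows whether any HDFS daemons are even running
jps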