Old 02-16-2013, 09:31 AM   #1
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Rep: Reputation: 18
Need help with Parallel Filesystem


Hello,

I am in the process of replacing HDFS with Lustre for Hadoop. I have 1 MDS machine, 1 OSS machine, and 2 Lustre clients. I have installed Hadoop on the two Lustre client machines. I have pasted my details at http://paste.ubuntu.com/1661235/

All I need to know is how to go ahead and run wordcount as an example program. For HDFS we specify masters and slaves, but for Lustre a namenode and datanode are not required at all (right?).
 
Old 02-16-2013, 12:05 PM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 27,406

Rep: Reputation: 8111
Quote:
Originally Posted by linuxunix View Post
Hello,
I am in the process of replacing HDFS with Lustre for Hadoop. I have 1 MDS machine, 1 OSS machine, and 2 Lustre clients. I have installed Hadoop on the two Lustre client machines. I have pasted my details at http://paste.ubuntu.com/1661235/

All I need to know is how to go ahead and run wordcount as an example program. For HDFS we specify masters and slaves, but for Lustre a namenode and datanode are not required at all (right?).
Have you read the documentation on lustre to see?
http://wiki.lustre.org/index.php/Use:Use
..or tried Google?
http://stackoverflow.com/questions/1...mple-in-hadoop

The best question would be: do a namenode and a datanode exist in your new configuration?
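An easy way to answer that for yourself is jps, which ships with the JDK and lists the Hadoop daemons running on a node. A sketch of what a classic HDFS setup would show (the PIDs and hostname here are obviously made up):

Code:
[root@yourbox ~]# jps
12034 NameNode
12167 DataNode
12301 JobTracker
12420 TaskTracker
12533 Jps
If NameNode and DataNode don't show up in that list, HDFS isn't part of your setup.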
 
Old 02-16-2013, 12:22 PM   #3
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Original Poster
Rep: Reputation: 18
I read the Lustre link you suggested, but I am still not sure which configuration files I need to change for Lustre.
Also, the link suggested that running wordcount just needs a file to collect the data and does not need HDFS.
But how is Lustre going to do the analytic task?
 
Old 02-16-2013, 06:01 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 27,406

Rep: Reputation: 8111
Quote:
Originally Posted by linuxunix View Post
I read the Lustre link you suggested, but I am still not sure which configuration files I need to change for Lustre.
Also, the link suggested that running wordcount just needs a file to collect the data and does not need HDFS.
But how is Lustre going to do the analytic task?
Lustre isn't going to do ANY analytic tasks...you said you read the links; did you see on the Lustre front page where it tells you:
Quote:
Originally Posted by Lustre Website
...the Lustre™ file system
Lustre is a file system...to put it simply, it provides a 'disk' for the operating system/computer to use. Just like HDFS. Your question is like saying "I'm going from ext2 to ext4 on my system...how do I run a bash script?"
 
Old 02-16-2013, 08:45 PM   #5
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Original Poster
Rep: Reputation: 18
I am well aware that Lustre is a file system, but my question is about the configuration files.
Just as we have core-site.xml, hdfs-site.xml, mapred-site.xml, etc., which files do I need to change?

Are the changes I made enough?
Also, I don't need to store the data in HDFS, which means I don't need to take care of a namenode and datanode, correct?
It would be great if you could share the steps I need to follow.
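From the reading I have done so far, my understanding is that the main change is pointing the default filesystem at the Lustre mount in core-site.xml. Is a minimal sketch like this on the right track (assuming the Lustre client is mounted at /lustre on every Hadoop node)?

Code:
<?xml version="1.0"?>
<configuration>
  <!-- point Hadoop at the local Lustre mount instead of an hdfs:// namenode URI -->
  <property>
    <name>fs.default.name</name>
    <value>file:///lustre</value>
  </property>
</configuration>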
 
Old 02-17-2013, 06:36 AM   #6
your_shadow03
Senior Member
 
Registered: Jun 2008
Location: Germany
Distribution: Slackware
Posts: 1,466
Blog Entries: 6

Rep: Reputation: 51
What error are you facing?

Last edited by your_shadow03; 02-17-2013 at 06:49 AM.
 
Old 02-17-2013, 06:46 AM   #7
your_shadow03
Senior Member
 
Registered: Jun 2008
Location: Germany
Distribution: Slackware
Posts: 1,466
Blog Entries: 6

Rep: Reputation: 51
What configuration file changes have you made?

Last edited by your_shadow03; 02-17-2013 at 06:50 AM.
 
Old 02-17-2013, 06:58 AM   #8
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Original Poster
Rep: Reputation: 18
Hi TB0ne,

I have the following entries in the files:

File: core-site.xml

Code:
<property>
  <name>fs.default.name</name>
  <value>file:///lustre</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>${fs.default.name}/hadoop_tmp/mapred/system</value>
  <description>The shared directory where MapReduce stores control
  files.</description>
</property>
File: mapred-site.xml
Code:
[root@alpha hadoop]# cat conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
</configuration>
[root@alpha hadoop]#
File: hdfs-site.xml <== I AM NOT SURE IF THIS IS REALLY NEEDED
Code:
[root@alpha hadoop]# cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
[root@alpha hadoop]#
All I did was run the command below, and it completed successfully:

Code:
[root@alpha hadoop]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /user/hadoop/hadoop/ /user/hadoop/hadoop/output
13/02/17 17:14:37 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/02/17 17:14:38 INFO input.FileInputFormat: Total input paths to process : 1
13/02/17 17:14:38 WARN snappy.LoadSnappy: Snappy native library not loaded
13/02/17 17:14:38 INFO mapred.JobClient: Running job: job_local_0001
13/02/17 17:14:38 INFO util.ProcessTree: setsid exited with exit code 0
13/02/17 17:14:38 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2f74219d
13/02/17 17:14:38 INFO mapred.MapTask: io.sort.mb = 100
13/02/17 17:14:38 INFO mapred.MapTask: data buffer = 79691776/99614720
13/02/17 17:14:38 INFO mapred.MapTask: record buffer = 262144/327680
13/02/17 17:14:38 INFO mapred.MapTask: Starting flush of map output
13/02/17 17:14:39 INFO mapred.JobClient:  map 0% reduce 0%
13/02/17 17:14:39 INFO mapred.MapTask: Finished spill 0
13/02/17 17:14:39 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/02/17 17:14:39 INFO mapred.LocalJobRunner:
13/02/17 17:14:39 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/02/17 17:14:39 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6d79953c
13/02/17 17:14:39 INFO mapred.LocalJobRunner:
13/02/17 17:14:39 INFO mapred.Merger: Merging 1 sorted segments
13/02/17 17:14:39 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79496 bytes
13/02/17 17:14:39 INFO mapred.LocalJobRunner:
13/02/17 17:14:39 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/02/17 17:14:39 INFO mapred.LocalJobRunner:
13/02/17 17:14:39 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/02/17 17:14:39 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to /user/hadoop/hadoop/output
13/02/17 17:14:39 INFO mapred.LocalJobRunner: reduce > reduce
13/02/17 17:14:39 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/02/17 17:14:40 INFO mapred.JobClient:  map 100% reduce 100%
13/02/17 17:14:40 INFO mapred.JobClient: Job complete: job_local_0001
13/02/17 17:14:40 INFO mapred.JobClient: Counters: 20
13/02/17 17:14:40 INFO mapred.JobClient:   File Output Format Counters
13/02/17 17:14:40 INFO mapred.JobClient:     Bytes Written=57885
13/02/17 17:14:40 INFO mapred.JobClient:   FileSystemCounters
13/02/17 17:14:40 INFO mapred.JobClient:     FILE_BYTES_READ=643420
13/02/17 17:14:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=574349
13/02/17 17:14:40 INFO mapred.JobClient:   File Input Format Counters
13/02/17 17:14:40 INFO mapred.JobClient:     Bytes Read=139351
13/02/17 17:14:40 INFO mapred.JobClient:   Map-Reduce Framework
13/02/17 17:14:40 INFO mapred.JobClient:     Map output materialized bytes=79500
13/02/17 17:14:40 INFO mapred.JobClient:     Map input records=2932
13/02/17 17:14:40 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/02/17 17:14:40 INFO mapred.JobClient:     Spilled Records=11180
13/02/17 17:14:40 INFO mapred.JobClient:     Map output bytes=212823
13/02/17 17:14:40 INFO mapred.JobClient:     Total committed heap usage (bytes)=500432896
13/02/17 17:14:40 INFO mapred.JobClient:     CPU time spent (ms)=0
13/02/17 17:14:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=99
13/02/17 17:14:40 INFO mapred.JobClient:     Combine input records=21582
13/02/17 17:14:40 INFO mapred.JobClient:     Reduce input records=5590
13/02/17 17:14:40 INFO mapred.JobClient:     Reduce input groups=5590
13/02/17 17:14:40 INFO mapred.JobClient:     Combine output records=5590
13/02/17 17:14:40 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/02/17 17:14:40 INFO mapred.JobClient:     Reduce output records=5590
13/02/17 17:14:40 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/02/17 17:14:40 INFO mapred.JobClient:     Map output records=21582
Does that mean Lustre is working for Hadoop?
Please suggest.
How shall I know whether the complete Lustre setup is working?
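Also, would something like this be the right way to verify the Lustre side on its own? This is just my sketch, assuming the Lustre client tools are installed and the client mount is /lustre:

Code:
# confirm the kernel really has a Lustre mount
[root@alpha hadoop]# mount -t lustre
# show space on each MDT/OST as this client sees it
[root@alpha hadoop]# lfs df -h /lustre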
 
Old 02-17-2013, 09:48 AM   #9
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 27,406

Rep: Reputation: 8111
Quote:
Originally Posted by linuxunix View Post
Hi TB0ne,
I have the following entries in the files. All I did was run the command below, and it completed successfully.

Does that mean Lustre is working for Hadoop? Please suggest. How shall I know whether the complete Lustre setup is working?
You said before that you read the Lustre docs...if you had, you'd have seen the configuration/installation pages, which tell you how to install, configure, and test things.

http://wiki.lustre.org/index.php/Con...re_File_System
http://wiki.lustre.org/index.php/Lustre_FAQ

And if you ran something and it WORKED, wouldn't that tell you that it's running correctly?
 
Old 02-17-2013, 10:55 AM   #10
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Original Poster
Rep: Reputation: 18
I have configured Lustre properly, with the MDS, OSS, and client all set up.
But there is no document that talks about Hadoop over Lustre.
I ran the wordcount example, and it worked without any namenode/datanode concept. I could see Hadoop counted the words fine. I opened http://localhost:50030 but couldn't find any activity there.

1. I am surprised! How come no jobtracker and tasktracker activity is happening?
2. How do I confirm whether the MDS and OSS actually did any work during the wordcount?
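For question 2, would something like this be a sane check? This is my guess from the Lustre manual, and the parameter names may vary by version. The idea is that if the client-side counters move while the job runs, the MDS/OSS must have served the I/O (the /lustre/input and /lustre/output paths are just illustrative):

Code:
# client-side I/O counters, before the job...
[root@alpha hadoop]# lctl get_param llite.*.stats
# ...run the job against paths on the Lustre mount...
[root@alpha hadoop]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /lustre/input /lustre/output
# ...and after: compare the read/write counts
[root@alpha hadoop]# lctl get_param llite.*.stats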

Last edited by linuxunix; 02-17-2013 at 10:58 AM.
 
Old 02-17-2013, 11:23 AM   #11
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 27,406

Rep: Reputation: 8111
Quote:
Originally Posted by linuxunix View Post
I have configured Lustre properly, with the MDS, OSS, and client all set up.
But there is no document that talks about Hadoop over Lustre.
AGAIN...it's just a file system. As long as you configure Hadoop correctly, it works.
Quote:
I ran the wordcount example, and it worked without any namenode/datanode concept. I could see Hadoop counted the words fine. I opened http://localhost:50030 but couldn't find any activity there.

1. I am surprised! How come no jobtracker and tasktracker activity is happening?
2. How do I confirm whether the MDS and OSS actually did any work during the wordcount?
..and if you removed nodes from the cluster, how are they going to be tracked? AGAIN....Lustre is a FILE SYSTEM. Using it vs. HDFS doesn't change the Hadoop setup, just the disk format. You say you've read the docs and know the differences, so that should be an obvious thing.
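And your own output already tells you why the :50030 page is empty: "Running job: job_local_0001" and all those LocalJobRunner lines mean the whole job ran inside one local JVM, because mapred.job.tracker defaults to "local". No JobTracker or TaskTracker ever touched it. If you want jobs to show up on the JobTracker UI, point that property at a real JobTracker in mapred-site.xml; a sketch (the host and port here are illustrative, use your own):

Code:
<configuration>
  <!-- "local" (the default) = run jobs in-process via LocalJobRunner -->
  <property>
    <name>mapred.job.tracker</name>
    <value>alpha:9001</value>
  </property>
</configuration>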
 
Old 02-17-2013, 12:28 PM   #12
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Original Poster
Rep: Reputation: 18
TB0ne,

If you look again at the wordcount output I posted in post #8 above:
What in that output indicates that the Lustre filesystem was actually used?
In my setup I have 2 Lustre clients, 1 MDS/MDT, and 1 OSS/OST (details in the paste.ubuntu.com link in my first post).
I am trying to understand how the job used Lustre, given that all I did was point fs.default.name at file:///lustre in core-site.xml.
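If I understand fs.default.name correctly, file:///lustre mostly just selects the local file:// scheme, so an absolute path like /user/hadoop/hadoop resolves on whatever filesystem holds /user, which may not be Lustre at all. Would this be a valid way to check where the data actually landed? Just my sketch:

Code:
# which filesystem type holds the job output?
[root@alpha hadoop]# df -T /user/hadoop/hadoop/output
# lfs getstripe only succeeds on files that really live on Lustre
[root@alpha hadoop]# lfs getstripe /lustre/hadoop_tmp/mapred/system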
 
Old 02-22-2013, 10:25 AM   #13
markspend1
LQ Newbie
 
Registered: Feb 2013
Posts: 4

Rep: Reputation: Disabled
Hey guys, I think the parallel file system you may be thinking of is GPFS, a high-performance distributed file system developed by IBM. It is used by many of the biggest commercial companies, as well as some supercomputers, and it provides high-speed file access to programs running on several nodes of a cluster. Thanks a lot!!
 
Old 02-26-2013, 03:46 AM   #14
linuxunix
Member
 
Registered: Mar 2010
Location: California
Distribution: Slackware
Posts: 235

Original Poster
Rep: Reputation: 18
Yes, but what I am using is Lustre, not the IBM filesystem.
 
Old 03-18-2013, 07:11 AM   #15
your_shadow03
Senior Member
 
Registered: Jun 2008
Location: Germany
Distribution: Slackware
Posts: 1,466
Blog Entries: 6

Rep: Reputation: 51
TB0ne,

I have tried a lot, but I still can't tell whether HDFS or Lustre is actually being used here.
I have no idea whether this file:

Code:
[root@alpha hadoop]# cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
[root@alpha hadoop]#
is actually enabling HDFS.
Please suggest.
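My current understanding is that hdfs-site.xml only matters if the default filesystem is an hdfs:// URI and the HDFS daemons are actually running. Would checking it like this be sound? Just a sketch:

Code:
# HDFS is only in play if the default FS uses the hdfs:// scheme...
[root@alpha hadoop]# grep -A 1 'fs.default.name' conf/core-site.xml
# ...and if the daemons are actually up
[root@alpha hadoop]# jps | egrep 'NameNode|DataNode'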
 
  

