LinuxQuestions.org
03-14-2013, 02:51 AM   #1
Cheah Boon Huat
LQ Newbie
 
Registered: Feb 2013
Posts: 10

Rep: Reputation: Disabled
linux join command for multiple files


Hi,
I am trying to join the following two files with the Linux join command, but I cannot get the output I need. Any advice or help would be appreciated.

Input 1:
A
B
C
D

Input 2:
A
C
E
F

Code:
Desired output:
A  A
B
C  C
D
   E
   F
Thank you.
 
03-14-2013, 02:55 AM   #2
NevemTeve
Senior Member

Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,856
Blog Entries: 1

Rep: Reputation: 1869
Here is an example:

Code:
#!/bin/sh

cat >/tmp/jointest.1 <<DONE
A 1-A
B 1-B
C 1-C
D 1-D
DONE

cat >/tmp/jointest.2 <<DONE
A 2-A
C 2-C
E 2-E
F 2-F
DONE

join -a 1 -a 2 -j 1 -o 0,1.2,2.2 -e '###' -- /tmp/jointest.1 /tmp/jointest.2

Last edited by NevemTeve; 03-14-2013 at 03:04 AM.
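For readers who want to see what this prints, the sketch below reruns the same join on the same data. The temporary directory, the `result` variable and the LC_ALL=C setting are additions for the demo and not part of the original post. With GNU join the output should match the trailing comments.

```shell
# Demo only: same data and same join options as the post above,
# written to a throwaway directory instead of /tmp/jointest.*
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)

cat >"$tmp/jointest.1" <<'DONE'
A 1-A
B 1-B
C 1-C
D 1-D
DONE

cat >"$tmp/jointest.2" <<'DONE'
A 2-A
C 2-C
E 2-E
F 2-F
DONE

# -a 1 -a 2 : also print lines that have no partner in the other file
# -e '###'  : text to print in place of a missing field
result=$(join -a 1 -a 2 -j 1 -o 0,1.2,2.2 -e '###' -- \
    "$tmp/jointest.1" "$tmp/jointest.2")
printf '%s\n' "$result"
# A 1-A 2-A
# B 1-B ###
# C 1-C 2-C
# D 1-D ###
# E ### 2-E
# F ### 2-F

rm -rf "$tmp"
```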
 
03-14-2013, 03:52 AM   #3
whizje
Member

Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 594

Rep: Reputation: 141
This might be easier to accomplish with diff:
Code:
diff -y f1 f2|grep -Eo "[^<|> ]*"
A                                                               A
B
C                                                               C
D
        E

        F
It is not exactly what you want, but it is a step in that direction.
 
03-14-2013, 04:06 AM   #4
Cheah Boon Huat
LQ Newbie
 
Registered: Feb 2013
Posts: 10

Original Poster
Rep: Reputation: Disabled
Hi,
Thank you very much, it works very well. May I know the logic behind the -o FORMAT, that is, 0,1.2,2.2?

Actually, I have a total of 12 files to join. After the first two files are joined, a third file is joined to the result, and so on.

Code:
results from joining 2 files:
A   A
B   
C   C
D
    E
    F
H
3rd file:
B
E
G

Code:
Desired output:
A   A
B       B
C   C
D
    E   E
    F
        G
H
Many thanks.
 
03-14-2013, 09:39 AM   #5
NevemTeve
Senior Member

Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,856
Blog Entries: 1

Rep: Reputation: 1869
> May i know the logic behind the -o FORMAT, that is 0,1.2,2.2 ?

That's what the manual is good for (man 1 join):
0 is the key field,
1.n is the n-th field from file #1,
2.m is the m-th field from file #2
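
The same field list can be grown one column at a time to fold in a third, fourth, ... file. What follows is only a sketch under some assumptions: every input has one word per line with no whitespace in it, `-` is used as the placeholder for a missing entry, and the file names, the `acc` work file and the `fmt` and `n` variables are made up for the demo. Each input is first turned into "key value" pairs so that join has a field to print besides the key.

```shell
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
printf '%s\n' A B C D >"$tmp/in1"
printf '%s\n' A C E F >"$tmp/in2"
printf '%s\n' B E G   >"$tmp/in3"

# Accumulator starts as file 1 in "key value" form (value = key).
sort "$tmp/in1" | sed 's/.*/& &/' >"$tmp/acc"

fmt='0,1.2'   # key, then every value column collected so far
n=2           # number of fields currently in the accumulator

for f in "$tmp/in2" "$tmp/in3"; do
    sort "$f" | sed 's/.*/& &/' >"$tmp/next"
    # Keep unpaired lines from both sides and append the new file's value.
    join -a 1 -a 2 -e '-' -o "$fmt,2.2" "$tmp/acc" "$tmp/next" >"$tmp/acc.new"
    mv "$tmp/acc.new" "$tmp/acc"
    n=$((n + 1))
    fmt="$fmt,1.$n"   # the accumulator gained one more field
done

# Drop the key column; what remains is one column per input file.
matrix=$(cut -d' ' -f2- "$tmp/acc")
printf '%s\n' "$matrix"
# A A -
# B - B
# C C -
# D - -
# - E E
# - F -
# - - G

rm -rf "$tmp"
```

For 12 files the loop would simply list all the remaining inputs after the first one.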
 
03-14-2013, 11:13 AM   #6
danielbmartin
Senior Member

Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
I generalized this problem to accommodate "n" input files.
In this proposed solution there are 5 input files.

InFile1 ...
Code:
apple
cherry
fig
lemon
mango
orange
InFile2 ...
Code:
banana
cherry
orange
peach
InFile3 ...
Code:
cherry
fig
grape
InFile4 ...
Code:
apple
fig
grape
lemon
mango
peach
InFile5 ...
Code:
banana
cherry
lemon
mango
peach
With these inputs, this code ...
Code:
# File identification
   Path=$(cut -d'.' -f1 <<< ${0})
OutFile=$Path"out.txt"

 InFile1=$Path"inp1.txt"
 InFile2=$Path"inp2.txt"
 InFile3=$Path"inp3.txt"
 InFile4=$Path"inp4.txt"
 InFile5=$Path"inp5.txt"
   Work1=$Path"w1.txt"
   Work2=$Path"w2.txt"
   Work3=$Path"w3.txt"
   Work4=$Path"w4.txt"
   Work5=$Path"w5.txt"
   Ustrs=$Path"ustrs.txt"


# Make a file of all unique strings (Ustrs)
sort $Path'inp'*'.txt' -u > $Ustrs

# nif = number of input files
nif=$(ls $Path'inp'*'.txt' | wc -l)

# Make working copies of each input file with
#  blank lines inserted for "missing" words.
for (( j=1;j<=nif;j++ )) ;
  do
    sort $Path'inp'$j'.txt' $Ustrs  \
    |uniq -c                        \
    |sed 's/ 1 .*/ /'               \
    |cut -c9-                       \
    >$Path'w'$j'.txt'
  done
 
# Paste all the work files together 
#  to form finished matrix.
paste $Path'w'*'.txt' >$OutFile
... produced this output matrix ...
Code:
apple			apple	
	banana			banana
cherry	cherry	cherry		cherry
fig		fig	fig	
		grape	grape	
lemon			lemon	lemon
mango			mango	mango
orange	orange			
	peach		peach	peach
Daniel B. Martin

Last edited by danielbmartin; 03-14-2013 at 03:37 PM. Reason: Cosmetic improvement
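
To see why the sort | uniq -c step works, it may help to run it on a single small file. The sketch below uses two tiny made-up inputs in a temporary directory and swaps the sed/cut pair for awk, so that it does not depend on the column width that uniq -c happens to use. A `-` marks a missing word instead of a blank line. It assumes one word per line, no whitespace inside a word, and no word repeated within a file.

```shell
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
printf '%s\n' apple cherry fig >"$tmp/inp1.txt"
printf '%s\n' banana cherry    >"$tmp/inp2.txt"

# All distinct words across the inputs, sorted.
sort -u "$tmp"/inp*.txt >"$tmp/ustrs.txt"

# Merge one input with the list of all words. A word that is present in
# the input now occurs twice, a word that is absent occurs only once.
column=$(sort "$tmp/inp2.txt" "$tmp/ustrs.txt" |
         uniq -c |
         awk '{ if ($1 > 1) print $2; else print "-" }')
printf '%s\n' "$column"
# -
# banana
# cherry
# -

rm -rf "$tmp"
```

Because every column is built against the same sorted word list, all columns have the same number of rows and can be handed to paste.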
 
03-14-2013, 09:58 PM   #7
Cheah Boon Huat
LQ Newbie
 
Registered: Feb 2013
Posts: 10

Original Poster
Rep: Reputation: Disabled
Many thanks, Daniel, for coming up with the script. I think I am almost there. I saved your script as script.sh and edited it to run on my 12 files as follows:
Code:
# File identification
   Path=$(cut -d'.' -f1 <<< ${0})
OutFile=$Path"out.txt"

 filename1.txt=$Path"inp1.txt"
 filename2.txt=$Path"inp2.txt"
 filename3.txt=$Path"inp3.txt"
 filename4.txt=$Path"inp4.txt"
 filename5.txt=$Path"inp5.txt"
 filename6.txt=$Path"inp6.txt"
 filename7.txt=$Path"inp7.txt"
 filename8.txt=$Path"inp8.txt"
 filename9.txt=$Path"inp9.txt"
 filename10.txt=$Path"inp10.txt"
 filename11.txt=$Path"inp11.txt"
 filename12.txt=$Path"inp12.txt"
   Work1=$Path"w1.txt"
   Work2=$Path"w2.txt"
   Work3=$Path"w3.txt"
   Work4=$Path"w4.txt"
   Work5=$Path"w5.txt"
   Work6=$Path"w6.txt"
   Work7=$Path"w7.txt"
   Work8=$Path"w8.txt"
   Work9=$Path"w9.txt"
   Work10=$Path"w10.txt"
   Work11=$Path"w11.txt"
   Work12=$Path"w12.txt"
   Ustrs=$Path"ustrs.txt"


# Make a file of all unique strings (Ustrs)
sort $Path'inp'*'.txt' -u > $Ustrs

# nif = number of input files
nif=$(ls $Path'inp'*'.txt' | wc -l)

# Make working copies of each input file with
#  blank lines inserted for "missing" words.
for (( j=1;j<=nif;j++ )) ;
  do
    sort $Path'inp'$j'.txt' $Ustrs  \
    |uniq -c                        \
    |sed 's/ 1 .*/ /'               \
    |cut -c9-                       \
    >$Path'w'$j'.txt'
  done
 
# Paste all the work files together 
#  to form finished matrix.
paste $Path'w'*'.txt' >$OutFile
But the terminal showed me "Syntax error: redirection unexpected". What did I do wrong when editing the script?
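
One thing worth checking independently of that error message: a shell variable name may contain only letters, digits and underscores, so a word such as `filename1.txt=...` is not an assignment. The shell looks for a command with that name instead. A small sketch (the names `status_dotted` and `status_plain` are just for the demo):

```shell
# A name with a "." in it is not a valid variable name, so the whole word
# is looked up as a command and the shell reports a failure.
if sh -c 'filename1.txt=inp1.txt' 2>/dev/null; then
    status_dotted=accepted
else
    status_dotted=rejected
fi

# The original spelling is a plain assignment and succeeds.
if sh -c 'InFile1=inp1.txt' 2>/dev/null; then
    status_plain=accepted
else
    status_plain=rejected
fi

echo "filename1.txt=... : $status_dotted"
echo "InFile1=...       : $status_plain"
```

The InFile and Work variables are never referenced later in the script, because the loop finds its inputs through the inp*.txt pattern. The simplest edit is therefore to leave those lines as they were or to delete them.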
 
03-14-2013, 10:31 PM   #8
danielbmartin
Senior Member

Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by Cheah Boon Huat View Post
... the terminal showed me "Syntax error: redirection unexpected". What did I do wrong when editing the script?
There is not enough information here to give an informed answer. I see nothing wrong with the changes you made.

1) Are all 12 input files and the script located in the same folder?

2) Have you tried to execute the script with only five input files?

3) Are the input files small enough to post here? If not, can they be put on a web page which I may access?

4) You said "I saved your script into script.sh". I don't know what that means. My program is named
/home/daniel/Desktop/LQfiles/dbm678.bin
The entire program is given below...

Daniel B. Martin

Code:
#!/bin/bash     Daniel B. Martin   Mar13
#
#   To execute this program, launch a terminal session and enter:
#   bash /home/daniel/Desktop/LQfiles/dbm678.bin
#
#  This program inspired by:
#  http://www.linuxquestions.org/questions/programming-9/
#    linux-join-command-for-multiple-files-4175454012/

# File identification
   Path=$(cut -d'.' -f1 <<< ${0})
OutFile=$Path"out.txt"

InFile1=$Path"inp1.txt"
InFile2=$Path"inp2.txt"
InFile3=$Path"inp3.txt"
InFile4=$Path"inp4.txt"
InFile5=$Path"inp5.txt"
  Work1=$Path"w1.txt"
  Work2=$Path"w2.txt"
  Work3=$Path"w3.txt"
  Work4=$Path"w4.txt"
  Work5=$Path"w5.txt"
  Ustrs=$Path"ustrs.txt"

echo
echo "Method of LQ member danielbmartin #1"
# Make a file of all unique strings (Ustrs)
sort $Path'inp'*'.txt' -u > $Ustrs

# nif = number of input files
nif=$(ls $Path'inp'*'.txt' | wc -l)

# Make working copies of each input file with
#  blank lines inserted for "missing" words.
for (( j=1;j<=nif;j++ )) ;
  do
     sort $Path'inp'$j'.txt' $Ustrs  \
    |uniq -c                         \
    |sed 's/ 1 .*/ /'                \
    |cut -c9-                        \
    >$Path'w'$j'.txt'
  done
 
# Paste all the work files together 
#  to form the finished product.
paste $Path'w'*'.txt' >$OutFile

echo; echo "OutFile ..."; cat $OutFile

echo; echo "Normal end of job."; echo
exit
 
03-14-2013, 11:03 PM   #9
Cheah Boon Huat
LQ Newbie
 
Registered: Feb 2013
Posts: 10

Original Poster
Rep: Reputation: Disabled
Code:
1) Are all 12 input files and the script located in the same folder?

2) Have you tried to execute the script with only five input files?

3) Are the input files small enough to post here? If not, can they be put on a web page which I may access?

4) You said "I saved your script into script.sh" I don't know what that means. My program is named
/home/daniel/Desktop/LQfiles/dbm678.bin
The entire program is given below...
Yes, I have checked points 1 and 2, but I get the same result.

For question 3, I cannot post my files because they are confidential.

For question 4, I meant that I saved your program, edited as shown above, into a file named 'script.sh'.

To run the program in the terminal I just type:
Code:
sh script.sh
Anyway, thank you very much for your help, Daniel. I will use another approach to solve this problem.
 
03-15-2013, 02:49 AM   #10
whizje
Member

Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 594

Rep: Reputation: 141
@OP: Which OS and shell are you using, and how big are the data files?
 
03-15-2013, 03:27 AM   #11
whizje
Member

Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 594

Rep: Reputation: 141
Does your script reference /bin/bash or /bin/sh in its shebang (#!) line? The default system shell in Ubuntu is dash, not bash, so if you have #!/bin/sh then your script will be using a different shell than you expect. Dash does not have the <<< redirection operator.
Or do it the simpler way:

Code:
Path=$(pwd)/

Or use the shell

Path=$PWD/
instead of 
Path=$(cut -d'.' -f1 <<< ${0})

Last edited by whizje; 03-15-2013 at 03:33 AM. Reason: typo
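
If the script has to stay runnable under a plain POSIX sh such as dash, the here-string can also be replaced without changing what Path ends up holding. A sketch, assuming the goal is still "everything before the first dot in $0". Here `script_path` stands in for $0, and the helper name `prefix_before_dot` is made up for the demo.

```shell
script_path=/home/daniel/Desktop/LQfiles/dbm678.bin

# Option 1: feed cut through a pipe instead of a here-string.
path_via_cut=$(printf '%s\n' "$script_path" | cut -d'.' -f1)

# Option 2: parameter expansion. %%.* removes the longest suffix that
# starts with a dot, i.e. everything from the first dot onwards.
prefix_before_dot() {
    printf '%s\n' "${1%%.*}"
}
path_via_expansion=$(prefix_before_dot "$script_path")

echo "$path_via_cut"         # /home/daniel/Desktop/LQfiles/dbm678
echo "$path_via_expansion"   # /home/daniel/Desktop/LQfiles/dbm678
```

Either form, like the original cut, stops at the first dot anywhere in the path, so a directory name containing a dot would truncate the result early.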
 
03-15-2013, 09:13 AM   #12
danielbmartin
Senior Member

Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by whizje View Post
Or do it the simpler,

Code:
Path=$(pwd)/

Or use the shell

Path=$PWD/
instead of 
Path=$(cut -d'.' -f1 <<< ${0})
My bash is self-taught and admittedly weak, mostly picked up by copying code found on this forum.
This code ...
Code:
   Path=$(pwd)/
echo "Path="$Path
   Path=$PWD/
echo "Path="$Path
   Path=$(cut -d'.' -f1 <<< ${0})
echo "Path="$Path
... produced this on-screen display ...
Code:
Path=/home/daniel/
Path=/home/daniel/
Path=/home/daniel/Desktop/LQfiles/dbm678
... so your Path= and mine are not equivalent.

Please explain.

Daniel B. Martin
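
The difference comes from what each expression looks at: $(pwd) and $PWD give the directory the shell was in when the script was started, while $0 is the path that was used to name the script itself. A sketch that shows the two side by side follows. The directory names and the where.sh script are invented for the demo, and dirname is used instead of cut so that the demo does not trip over dots in the temporary path.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/scripts" "$tmp/elsewhere"

cat >"$tmp/scripts/where.sh" <<'EOF'
echo "PWD=$PWD"
echo "dir=$(dirname "$0")"
EOF

# Start the script from a directory other than the one it lives in.
report=$(cd "$tmp/elsewhere" && sh "$tmp/scripts/where.sh")
printf '%s\n' "$report"
# The PWD= line ends in /elsewhere, the dir= line ends in /scripts.

rm -rf "$tmp"
```

In the display above, /home/daniel is where the terminal session was, and the third Path is the script's own name minus .bin, which the program then uses as a prefix for file names such as dbm678inp1.txt.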
 
03-15-2013, 09:24 AM   #13
danielbmartin
Senior Member

Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by whizje View Post
Does your script reference /bin/bash or /bin/sh in its shebang (#!) line? The default system shell in Ubuntu is dash, not bash, so if you have #!/bin/sh then your script will be using a different shell than you expect. Dash does not have the <<< redirection operator.
My bash is self-taught and admittedly weak, mostly picked up by copying code found on this forum.
I code #!/bin/bash in the first line of my bash programs without understanding the reason.
Pure mimicry.

This is the first line of my program dbm678.bin ...
Code:
#!/bin/bash     Daniel B. Martin   Mar13
... but if I change it to this ...
Code:
#               Daniel B. Martin   Mar13
... the program still runs correctly.

My computer runs Ubuntu 10.04 LTS.

Please explain.

Daniel B. Martin
 
03-15-2013, 11:05 AM   #14
ntubski
Senior Member

Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Code:
#   To execute this program, launch a terminal session and enter:
#   bash /home/daniel/Desktop/LQfiles/dbm678.bin
When you run a script this way, the #! line is ignored as a comment. It would only be used if you did:
Code:
/home/daniel/Desktop/LQfiles/dbm678.bin
i.e., execute the script directly without explicitly calling bash.

Quote:
Originally Posted by Cheah Boon Huat View Post
When i run your program in the terminal i just key in:
Code:
sh script.sh
This also bypasses the #! line, since you invoked sh directly.
 
03-15-2013, 12:09 PM   #15
danielbmartin
Senior Member

Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by ntubski View Post
Code:
#   To execute this program, launch a terminal session and enter:
#   bash /home/daniel/Desktop/LQfiles/dbm678.bin
When you run a script this way, the #! line is ignored as a comment. It would only be used if you did:
Code:
/home/daniel/Desktop/LQfiles/dbm678.bin
ie, execute the script directly without explicitly calling bash.

This also bypasses the #! line, since you invoked sh directly.
The first line in program dbm684 is #!/bin/bash Daniel B. Martin Mar13

When I enter bash /home/daniel/Desktop/LQfiles/dbm684.bin the program executes normally.
When I enter /home/daniel/Desktop/LQfiles/dbm684.bin the result is bash: /home/daniel/Desktop/LQfiles/dbm684.bin: Permission denied

It's unclear what benefit is obtained by the #!/bin/bash

Daniel B. Martin
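
The "Permission denied" message is about the file's mode rather than its first line: running a script by its path requires the execute bit, and only then does the kernel read the #! line to pick the interpreter. A sketch with a throwaway script follows. The name demo.sh and the variable names are made up. The demo uses #!/bin/sh so that it runs anywhere, whereas a script that relies on bash-only syntax such as <<< would name /bin/bash there.

```shell
tmp=$(mktemp -d)
script="$tmp/demo.sh"

cat >"$script" <<'EOF'
#!/bin/sh
echo "ran via the #! line"
EOF

# Freshly created files are not executable, so running the file by its
# path would fail with "Permission denied" at this point.
if [ -x "$script" ]; then before=executable; else before=not-executable; fi

chmod +x "$script"       # grant execute permission
after=$("$script")       # now the kernel starts /bin/sh for us

echo "before chmod: $before"
echo "after chmod:  $after"

rm -rf "$tmp"
```

With the execute bit set and #!/bin/bash on the first line, the script runs under bash no matter which shell the path is typed into, so constructs like <<< keep working without anyone having to remember to type bash in front.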
 
  


All times are GMT -5.