LinuxQuestions.org
Register a domain and help support LQ
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 12-20-2010, 06:19 AM   #1
vijay_babu1981
Member
 
Registered: Nov 2004
Location: india
Distribution: fedora 10
Posts: 82

Rep: Reputation: 19
merge columns from multiple files


Hi all

I want to merge columns (selectively) from several files and create a new file with the merge output.

I saw some suggestions to use pr/paste to join the columns and then awk to pick-up the columns.
Code:
pr -m -t -s\  file1 file2 | gawk '{print $4,$5,$6,$1}'
But I have hundreds of files and I cannot manually pick up columns using awk as given in above solution. I want to pick up every 2nd column of each of the source file.

Can someone suggest a automated way?

Regards
 
Old 12-20-2010, 06:33 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195
Depending on what you mean by column, colrm might be useful.
 
Old 12-20-2010, 06:44 AM   #3
vijay_babu1981
Member
 
Registered: Nov 2004
Location: india
Distribution: fedora 10
Posts: 82

Original Poster
Rep: Reputation: 19
Hi

I can extract the required column from each file using awk in a for loop (or colrm). But how can I join the result of each of the for loop iteration columnwise and write to a file?

Regards
Vijay
 
Old 12-20-2010, 06:49 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,246

Rep: Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684
Well as usual an example would always help as we are shooting in the dark here, but how about just:
Code:
awk '{print $2}' file1 file2 file3 ...
 
Old 12-20-2010, 06:55 AM   #5
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195
Got it! The code in your OP isn't an attempt to solve the problem, it's colucix's illustration of a general technique, copied from his post.

Do I now rightly understand that you want to copy the "second space-separated column"s from a set of hundreds of files into a the output file? Do all the files have the same number of lines?
 
Old 12-20-2010, 07:05 AM   #6
vijay_babu1981
Member
 
Registered: Nov 2004
Location: india
Distribution: fedora 10
Posts: 82

Original Poster
Rep: Reputation: 19
Quote:
Originally Posted by grail View Post
Well as usual an example would always help as we are shooting in the dark here, but how about just:
Code:
awk '{print $2}' file1 file2 file3 ...
Hello grail

Code:
awk '{print $2}' file1 file2 file3 ...
does not concatenate the output columns columnwise. One column is printed "below" the others so it is just one vertical line of columns.

I want the final output to be
A1 B1 C1...
A2 B2 C2...
A3 B3 C3...

If the original files had
x A1 y z
x A2 y z
x A3 y z

x B1 y z
x B2 y z
x B3 y z

x C1 y z
x C2 y z
x C3 y z

and so on...
 
Old 12-20-2010, 07:07 AM   #7
vijay_babu1981
Member
 
Registered: Nov 2004
Location: india
Distribution: fedora 10
Posts: 82

Original Poster
Rep: Reputation: 19
Quote:
Originally Posted by catkin View Post
Got it! The code in your OP isn't an attempt to solve the problem, it's colucix's illustration of a general technique, copied from his post.

Do I now rightly understand that you want to copy the "second space-separated column"s from a set of hundreds of files into a the output file? Do all the files have the same number of lines?
yes the number of lines are same right now.

But I may have to deal with variable number of lines at a later stage.

Thanks.
 
Old 12-20-2010, 07:09 AM   #8
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195
How many lines in the hundreds of files?
 
Old 12-20-2010, 07:10 AM   #9
vijay_babu1981
Member
 
Registered: Nov 2004
Location: india
Distribution: fedora 10
Posts: 82

Original Poster
Rep: Reputation: 19
I am considering this

This may work...
 
Old 12-20-2010, 07:12 AM   #10
vijay_babu1981
Member
 
Registered: Nov 2004
Location: india
Distribution: fedora 10
Posts: 82

Original Poster
Rep: Reputation: 19
I am also considering writing rows instead of columns and then transposing the whole thing later. writing rows is easy
 
Old 12-20-2010, 07:26 AM   #11
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976
If it does not require too much memory (I cannot test with hundreds of files, right now), what about something like this?
Code:
awk '{_[FNR]=(_[FNR] OFS $2)}END{for (i=1; i<=FNR; i++) {sub(/^ /,"",_[i]); print _[i]}}' file*
The sub function removes the extra OFS (space) at the beginning of each row.
 
Old 12-21-2010, 01:08 AM   #12
vijay_babu1981
Member
 
Registered: Nov 2004
Location: india
Distribution: fedora 10
Posts: 82

Original Poster
Rep: Reputation: 19
Hi colucix

This works fine, thanks. It is easy to understand also.

Now when I try to apply this to files with unequal number of rows I am facing problem.

If original files were:
x A1 y z
x A2 y z

x B1 y z
x B2 y z
x B3 y z

Output becomes
A1 B1
A2 B2
B3

I would prefer
A1 B1
A2 B2
[space]B3

I think this will be difficult as 'awk print' will not know how many blanks spaces to put before actually printing the variable. Anyway please let me know if you great guys have a solution for this also.

Regards

Last edited by vijay_babu1981; 12-21-2010 at 01:13 AM.
 
Old 12-21-2010, 04:34 AM   #13
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976
This is more complex. The basic idea is to count how many fields have been previously added for each row number, and while the count of fields does not match the count of files (minus one) add spaces accordingly.

However, in order to have a correctly formatted table, I assumed the maximum length of the string in the second field to be 8 characters and every blank space added will be 8 characters length (see the format in the sprintf statements).

Here we go:
Code:
BEGIN {
  count = 1
}

FNR == 1 && FNR < NR {
  count++
}

{
  if ( _[FNR] ) {
    while ( c[FNR] != count-1 ) {
      _[FNR] = (_[FNR] " " sprintf("%8s",""))
      c[FNR]++
    }
    _[FNR] = (_[FNR] " " sprintf("%-8s",$2))
    c[FNR]++
  }
  else if ( count == 1 ) {
    _[FNR] = sprintf("%-8s",$2)
    c[FNR]++
  }
  else {
    while ( c[FNR] != count-1 ) {
      if ( _[FNR] )
        _[FNR] = (_[FNR] " " sprintf("%8s",""))
      else
        _[FNR] = sprintf("%8s","")
      c[FNR]++
    }
    _[FNR] = (_[FNR] " " sprintf("%-8s",$2))
    c[FNR]++
  }
}

END {
  for (i=1; i<=length(_); i++)
    print _[i]
}
If I have some files with a different number of rows, the output will be something like this:
Code:
$ awk -f test.awk file*
A1       B1       C1       D1       E1       F1       G1      
A2       B2       C2       D2       E2       F2       G2      
         B3                D3       E3       F3      
                           D4                F4      
                                             F5      
                                             F6
Feel free to ask for any clarification.

Last edited by colucix; 12-21-2010 at 07:52 AM. Reason: Added output example
 
Old 12-21-2010, 06:14 AM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,246

Rep: Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684
Hey colucix, you inspired me to have a look at this one
Code:
{
    while( length(_[FNR]) < (ARGIND * 8))
        _[FNR]=_[FNR] sprintf("%8s","")

    _[FNR]=_[FNR] sprintf("%-8s",$2)
}

END{
    for(x=1; x < ARGC; x++)
        print _[x]
}
 
Old 12-21-2010, 06:35 AM   #15
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976
Wonderful, grail! It is the more compact version I was looking for, but I lost myself in the if/else condition approach. Just a little add-on to your code, to remove the extra spaces at the beginning of each line:
Code:
{
    while( length(_[FNR]) < (ARGIND * 8))
        _[FNR]=_[FNR] sprintf("%8s","")

    _[FNR]=_[FNR] sprintf("%-8s",$2)
}

END{
    for(x=1; x < ARGC; x++) {
        sub(/^ {8}/,"",_[x])
        print _[x]
    }
}
To enable the use of interval expressions we have to run awk with --re-interval, otherwise we can simply explicit the eight-spaces string in the regular expression:
Code:
sub(/^        /,"",_[x])
In alternative we might add a condition to check if _[FNR] already exist, but it will complicate the code making it similar to mine.

Well done, grail! You're awk skills have been improving a lot since I know you. +10 from me.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
extracting columns from multiple files with awk orcaja Linux - Newbie 7 02-14-2012 11:24 PM
merge multiple pdf files esteeven Linux - Software 8 11-21-2007 02:59 PM
awk command to merge columns from two separate files into single file? johnpaulodonnell Linux - Newbie 4 01-23-2007 11:10 AM
How To Merge multiple files into a single PDF ? kkempter Linux - Software 1 10-28-2005 02:02 PM
Is there a command to merge two files as two columns of one file? davee Linux - General 2 07-19-2005 11:52 AM


All times are GMT -5. The time now is 10:08 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration