LinuxQuestions.org
Linux - Newbie: This Linux forum is for members who are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's, this is the place!

11-25-2018, 08:14 PM #1
Aemm
LQ Newbie
Registered: Oct 2018
Posts: 11
Reputation: Disabled
Appending columns from multiple files using awk


Hi all,
I have multiple files with the .profile extension, each with six columns, of which the first three are the same in every file. I want to make a single file that contains those three shared columns once, then extract the last column from each ".profile" file and append it to the output file. I also want to name each appended column after the file it came from. For example, I have two files named "Artery_Aorta.ENSG00000000460.12.wgt.RDat.txt.profile" and "Artery_Aorta.ENSG00000000971.11.wgt.RDat.txt.profile". Their contents are:
Code:
FID       IID  PHENO    CNT   CNT2    SCORE
  00010   0001002      2     28      9 -0.00843036
  00017   0001702      2     28      9 0.00710286
Code:
FID       IID  PHENO    CNT   CNT2    SCORE
  00010   0001002      2     12      2 -0.00285
  00017   0001702      2     12      2 -0.00285
I want my output file to look like this:
Code:
FID       IID  PHENO   ENSG00000000460.12 ENSG00000000971.11
  00010   0001002      2     -0.00843036      -0.00285
  00017   0001702      2      0.00710286       -0.00285
In short, my output file should contain the first three columns (once), which are the same in every file, followed by the last "SCORE" column of each original file, renamed after that file. For example, for the file "Artery_Aorta.ENSG00000000460.12.wgt.RDat.txt.profile", the column name for its score in the output file should be ENSG00000000460.12.

I have tried the awk print command, but instead of appending the columns to the output file, it prints the score on a new line.
 
11-25-2018, 08:47 PM #2
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,133
Reputation: 2925
That's what "print" is defined to do - look at "printf".
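A minimal sketch of that difference, using the header row from the first post (the variable names are only for illustration). `print` ends every call with the output record separator ORS, a newline by default, while `printf` writes only what its format string says:

```shell
row='FID IID PHENO'

# print appends ORS (a newline by default) after every call,
# so printing the fields in a loop puts each one on its own line.
by_print=$(echo "$row" | awk '{ for (i = 1; i <= NF; i++) print $i }')

# printf writes only what its format says; OFS is printed between fields
# and ORS only after the last one, so the row stays on a single line.
by_printf=$(echo "$row" | awk '{ for (i = 1; i <= NF; i++) printf "%s%s", $i, (i < NF ? OFS : ORS) }')

echo "$by_print"
echo "$by_printf"
```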
 
11-25-2018, 11:43 PM #3
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,719
Reputation: 3034
Please show the code you have tried so far. You can use the print or printf commands; the only tricky bit would be extracting the correct part of the filename.
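One way to extract that part, sketched on the first sample filename: either split it on "." in awk (inside a real script the name would come from the built-in FILENAME variable; here it is passed in with -v as a hypothetical variable `f`), or strip the prefix and suffix with POSIX shell parameter expansion. Both assume the ID always sits in the second and third dot-separated pieces and that the name has no directory prefix containing dots.

```shell
f='Artery_Aorta.ENSG00000000460.12.wgt.RDat.txt.profile'

# awk: split() on the regex [.] (a literal dot) gives
# part[1]="Artery_Aorta", part[2]="ENSG00000000460", part[3]="12", ...
from_awk=$(awk -v f="$f" 'BEGIN { split(f, part, "[.]"); print part[2] "." part[3] }')

# POSIX shell: drop everything up to the first ".",
# then drop the longest suffix that starts with ".wgt".
rest=${f#*.}
from_sh=${rest%%.wgt*}

echo "$from_awk"
echo "$from_sh"
```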
 
11-26-2018, 12:40 AM #4
Aemm
LQ Newbie
Registered: Oct 2018
Posts: 11
Original Poster
Reputation: Disabled
So far I have tried:

Code:
paste *.profile | awk '{print $1, $2, $3, $6, $6 + 6}'
The output is
Code:
FID IID PHENO SCORE 6
00010 0001002 2 -0.00843036 5.99157
00017 0001702 2 0.00710286 6.0071
00028 0002801 2 -0.00125893 5.99874
00033 0003301 1 0.0006 6.0006
As the first three columns are the same in each file, I want to print the sixth column of each of the files combined by "paste *.profile", but in this case it is adding 6 to the value of the sixth-column score. Moreover, I also want to rename the score column after the file name, as described in my original question.

Last edited by Aemm; 11-26-2018 at 03:34 AM.
 
11-26-2018, 05:31 AM #5
l0f4r0
Member
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 854
Reputation: 286
Here is one solution:
Code:
counter=1
for file in *.profile; do
  if (( counter == 1 )); then awk 'NR==1{split(FILENAME,file_name,"."); print $1,$2,$3,file_name[2]"."file_name[3]; next} {print $1,$2,$3,$6}' "$file" > "${file}_awkout"; (( counter++ ))
  else awk 'NR==1{split(FILENAME,file_name,"."); print file_name[2]"."file_name[3]; next} {print $6}' "$file" > "${file}_awkout"; fi
done
paste *.profile_awkout

Last edited by l0f4r0; 11-26-2018 at 05:38 AM.
 
11-26-2018, 05:49 AM #6
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,719
Reputation: 3034
I'm not sure about your logic. The output is exactly what you asked for:

For every line, return fields 1, 2, 3 and 6, and then field 6 again with the number 6 added to it as the last column.

So your header line returns the required fields, and because the string 'SCORE' is not a number, it evaluates to zero and has 6 added to it, hence the number 6 in the last field (consider investigating the NR / FNR values to fix this).

As for the lines with values, I think you will find that the first four fields returned are the correct ones and the last field has had 6 added to it:
Code:
-0.00843036 + 6 = 5.99157

0.00710286  + 6 = 6.0071

-0.00125893 + 6 = 5.99874

0.0006      + 6 = 6.0006
You would need to explain how this differs from what you want.
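The underlying point is operator precedence: in awk the field operator `$` binds more tightly than `+`, so `$6 + 6` is field 6 plus six, whereas `$(6 + 6)` is field 12. A small sketch on one line built the way `paste` would join the two sample files from the first post (the variable names are illustrative):

```shell
# One pasted line: the first data row of each sample file, joined by a tab.
line=$(printf '%s\t%s' '00010 0001002 2 28 9 -0.00843036' '00010 0001002 2 12 2 -0.00285')

# $6 + 6: "$" binds tighter than "+", so this is (field 6) + 6.
plus_six=$(printf '%s\n' "$line" | awk '{ print $6 + 6 }')

# $(6 + 6): the parentheses make awk compute 12 first, then fetch field 12,
# which is the SCORE column of the second file.
field_twelve=$(printf '%s\n' "$line" | awk '{ print $(6 + 6) }')

echo "$plus_six"
echo "$field_twelve"
```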
 
11-26-2018, 06:24 AM #7
l0f4r0
Member
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 854
Reputation: 286
Quote:
Originally Posted by Aemm
So far I have tried:
Code:
paste *.profile | awk '{print $1, $2, $3, $6, $6 + 6}'
I have already provided you with a solution, but considering your post above, I think your logic calls for something like the following instead:
Code:
for file in *.profile; do colTempName=${file#*\.}; sed -ibak "1s/SCORE/${colTempName%%\.wgt*}/" $file; done;
paste *.profile | awk '{ printf "%s\t%s\t%s\t",$1,$2,$3; for(i=6;i<=NF;i+=6) {printf "%s\t",$i} printf "\n"}'
PS: It would be simpler if one could select the $columns to display in awk with an arithmetic range (the bash equivalent of {6..NF..6}), but I don't know whether that is possible. If somebody has the answer, I'm really interested!

Last edited by l0f4r0; 11-26-2018 at 06:37 AM.
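On the PS: standard awk has no range syntax for fields, but `$` accepts any numeric expression, so a loop over `$i` like the one in the command above is the usual way. If that reads too noisily, it can be wrapped in a user-defined function. A sketch, where `every` is a hypothetical helper name rather than anything built into awk:

```shell
# One pasted line: the first data row of each sample file, joined by a tab.
line=$(printf '%s\t%s' '00010 0001002 2 28 9 -0.00843036' '00010 0001002 2 12 2 -0.00285')

# every(start, step) joins fields start, start+step, start+2*step, ... up to NF
# with OFS, roughly what bash {6..NF..6} would select.
scores=$(printf '%s\n' "$line" | awk '
  function every(start, step,    i, out) {
    out = $start
    for (i = start + step; i <= NF; i += step)
      out = out OFS $i
    return out
  }
  { print $1, $2, $3, every(6, 6) }')

echo "$scores"
```

The extra parameters after the real ones (`i` and `out` here) are the conventional way to declare local variables in an awk function.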
 
11-26-2018, 07:49 PM #8
Aemm
LQ Newbie
Registered: Oct 2018
Posts: 11
Original Poster
Reputation: Disabled
Code:
for file in *.profile; do colTempName=${file#*\.}; sed -ibak "1s/SCORE/${colTempName%%\.wgt*}/" $file; done;
paste *.profile | awk '{ printf "%s\t%s\t%s\t",$1,$2,$3; for(i=6;i<=NF;i+=6) {printf "%s\t",$i} printf "\n"}'
The above code works perfectly, but there is one problem: it generates a copy of each input file with the extension .profilebak. I don't want this, as I have thousands of files on which I will be running the code.

Quote:
You would need to explain how this differs from what you want.
Thank you, Grail, for the reply. What I actually wanted was every sixth column of the pasted input to be printed, not 6 added to the values in the sixth column.

Last edited by Aemm; 11-26-2018 at 08:07 PM.
 
11-27-2018, 01:42 AM #9
l0f4r0
Member
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 854
Reputation: 286
Quote:
Originally Posted by Aemm
Code:
for file in *.profile; do colTempName=${file#*\.}; sed -ibak "1s/SCORE/${colTempName%%\.wgt*}/" $file; done;
paste *.profile | awk '{ printf "%s\t%s\t%s\t",$1,$2,$3; for(i=6;i<=NF;i+=6) {printf "%s\t",$i} printf "\n"}'
The above code works perfectly, but there is one problem: it generates a copy of each input file with the extension .profilebak. I don't want this, as I have thousands of files on which I will be running the code.
Yes, indeed, that was intended: since I decided to modify the input files (which was not explicitly part of your requirement) and didn't know whether that would be a problem for you, I created backups with the -i switch. From the sed man page:
Quote:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied). The default operation mode is to break symbolic and hard links. This can be changed with --follow-symlinks and --copy.
If you are OK with changing the original files without keeping a backup, then delete "bak" from the sed command (just keep "-i").

Last edited by l0f4r0; 11-27-2018 at 01:44 AM.
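If you would rather not modify (or back up) thousands of input files at all, the column names can be derived in the shell and written as a header line, followed by the pasted data without its original header. A sketch under these assumptions: every file has the six-column layout from the first post, with the same rows in the same order, and sits in the current directory; `merge_profiles` and `merged.txt` are hypothetical names. Note that `paste` keeps all its input files open at once, so with thousands of files it may run into the per-process open-file limit (see `ulimit -n`).

```shell
# merge_profiles (hypothetical name): writes the merged table to stdout.
# The .profile files are only read, never edited, so no backups are needed.
merge_profiles() {
  # Header: the three shared column names, then one name per file, e.g.
  # Artery_Aorta.ENSG00000000460.12.wgt.RDat.txt.profile -> ENSG00000000460.12
  printf 'FID\tIID\tPHENO'
  for f in "$@"; do
    name=${f#*.}          # drop everything up to the first "."
    name=${name%%.wgt*}   # drop ".wgt" and everything after it
    printf '\t%s' "$name"
  done
  printf '\n'

  # Data: paste joins the files line by line; each file adds six fields,
  # so the SCORE columns are fields 6, 12, 18, ... Skip the old header row.
  paste "$@" | awk 'NR > 1 {
    printf "%s\t%s\t%s", $1, $2, $3
    for (i = 6; i <= NF; i += 6) printf "\t%s", $i
    printf "\n"
  }'
}
```

Usage: `merge_profiles *.profile > merged.txt`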
 
11-27-2018, 06:05 AM #10
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,719
Reputation: 3034
Assuming gawk is your variant of awk, you could do something like:
Code:
#!/usr/bin/gawk -f

BEGIN{ newfile = "newfile" }

BEGINFILE{
  split(FILENAME, arr, ".")
  file = arr[2]"."arr[3]
  
  getline line < newfile
}

FNR == 1{ $6 = file }

{
  if(line){
    print line, $6 > newfile
    getline line < newfile
  }
  else
    print $1,$2,$3,$6 > newfile
  
}

ENDFILE{ close(newfile) }
Obviously, you can use a better name for the output file.

Once the script is saved in a file and made executable (chmod +x script_name.awk), you would run it as:
Code:
./script_name.awk *.profile
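One caveat with reading `newfile` via getline while the `>` redirection rewrites it: `>` truncates the file when it is opened, so the approach relies on the remaining lines already being in awk's input buffer. A sketch of a single-pass alternative that keeps the merged rows in memory instead, reads the files one at a time (so they are never all open at once), and uses only POSIX awk features (no BEGINFILE/ENDFILE). It assumes every file has the six-column layout from the first post, with the same rows in the same order; `merge_profiles_awk` is a hypothetical name.

```shell
# merge_profiles_awk (hypothetical name): one awk pass over all files,
# merged rows kept in memory and printed at the end.
merge_profiles_awk() {
  awk -v OFS='\t' '
    FNR == 1 {                        # header line of each input file
      nfile++
      split(FILENAME, part, "[.]")    # part[2] "." part[3] is the column name
      if (nfile == 1) head = $1 OFS $2 OFS $3
      head = head OFS part[2] "." part[3]
      next
    }
    {
      # The three shared columns are taken from the first file only.
      if (nfile == 1) row[FNR] = $1 OFS $2 OFS $3
      row[FNR] = row[FNR] OFS $6      # append the SCORE of the current file
      if (FNR > last) last = FNR
    }
    END {
      print head
      for (i = 2; i <= last; i++) print row[i]
    }
  ' "$@"
}
```

Usage: `merge_profiles_awk *.profile > merged.txt`. The whole table is held in memory until the END block runs.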
 
11-29-2018, 07:47 PM #11
Aemm
LQ Newbie
Registered: Oct 2018
Posts: 11
Original Poster
Reputation: Disabled
Thank you, Grail and l0f4r0. I got my job done. Much appreciated!
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration