LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 05-13-2016, 10:17 AM   #1
lharrisl
LQ Newbie
 
Registered: May 2016
Posts: 3

Rep: Reputation: Disabled
Merging columns from different files and renaming columns


Hi all,

I have ~ 10 files containing ~ 5000 rows of values (these are under tracking_id) for 2 samples (q1_0 and q2_0) i.e. data for 2 samples in each file - a total of 20 samples. If I use the head command for one of these files you can get an idea of what they look like:

tracking_id q1_0 q2_0
PF3D7_0100100 20.0472 12.8313
PF3D7_0100200 31.0172 56.0184
PF3D7_0100300 0.598181 174.531
PF3D7_0100400 1.47331 84.0354
PF3D7_0100500 0 0
PF3D7_0100600 0 0.549631
PF3D7_0100700 0 0
PF3D7_0100800 0 0.145775
PF3D7_0100900 0.582414 2.27694

Now I have 10 files each file containing these 5000 values for 2 samples so I have 20 samples (columns) in total. What I want to be able to do is merge all these 10 files together into one file so that the file looks like this:

tracking_id q1_0(from file1) q2_0(from file1) q1_0(from file2) q2_0(from file2) ......
PF3D7_0100100
PF3D7_0100200
PF3D7_0100300
PF3D7_0100400
PF3D7_0100500
PF3D7_0100600
PF3D7_0100700
PF3D7_0100800
PF3D7_0100900

iI want the tracking_id for the 5000 values to only appear once in the first column then I want the corresponding values for each of the 20 samples (which are currently split across 10 files). At the moment every file says q1 and q2 at the top of the columns and I have them stored in different subdirectories which list what the actual samples are. What I would like to do is rename the column headers as well so that the file looks like this:

tracking_id Sample1 Sample2 Sample3 Sample4.....Sample 20
PF3D7_0100100
PF3D7_0100200
PF3D7_0100300
PF3D7_0100400
PF3D7_0100500
PF3D7_0100600
PF3D7_0100700
PF3D7_0100800
PF3D7_0100900

Could someone please tell me how to achieve this? I can use the command paste to append all the columns together but then I end up with the tracking_id column repeated many times in the file.

Many thanks for any help!

Lynne
 
Old 05-13-2016, 10:29 AM   #2
beroal
Member
 
Registered: Feb 2016
Location: Ukraine
Distribution: ArchLinux
Posts: 49

Rep: Reputation: Disabled
What you are trying to achieve sounds like a task for a SQL database or the R language.
 
Old 05-13-2016, 11:14 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,506
Blog Entries: 3

Rep: Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814
Quote:
Originally Posted by lharrisl View Post
... merge all these 10 files together into one file so that the file looks like this...
The examples look incomplete. Do you mean that each line should start with the tracking id (e.g. PF3D7_0100100) and then be followed with the corresponding 20 additional columns each containing data?

Code:
tracking_id             q1_0(from file1)         q2_0(from file1)  q1_0(from file2)         q2_0(from file2) ......
PF3D7_0100100   20.0472 12.8313  30.0533 13.888 ... and so on
If so, then that is very easy in perl with a hash of arrays. To do that, I'd read the files in order and use the tracking id as the hash key then the push() the q1_0 and q2_0 onto it, for each file, then print the whole thing out.
 
Old 05-13-2016, 12:02 PM   #4
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,506
Blog Entries: 3

Rep: Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814Reputation: 3814
Quote:
Originally Posted by lharrisl View Post
I can use the command paste to append all the columns together but then I end up with the tracking_id column repeated many times in the file.
On more close reading, if 'paste' does most of what you need, then perl would be too much and you can use 'awk' to skip the unnecessary columns.

Another option is to read the result of 'paste' into LibreOffice Calc, delete the unnecessary columns and then re-export it to Text CSV. Just be sure to check all the boxes for possible separator options, including space, when importing.
 
Old 05-13-2016, 12:20 PM   #5
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,444

Rep: Reputation: 2786Reputation: 2786Reputation: 2786Reputation: 2786Reputation: 2786Reputation: 2786Reputation: 2786Reputation: 2786Reputation: 2786Reputation: 2786Reputation: 2786
You could simply cut the unwanted columns from the result of paste.
Code:
paste -d " " <input files> | cut -d " " --complement -f $(seq -s "," 4 3 28) > <output file>
This assumes that the input files are delimited by a space character.
You would still need to change your column headings.

Last edited by allend; 05-13-2016 at 12:23 PM.
 
Old 05-13-2016, 05:29 PM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,017

Rep: Reputation: 3197Reputation: 3197Reputation: 3197Reputation: 3197Reputation: 3197Reputation: 3197Reputation: 3197Reputation: 3197Reputation: 3197Reputation: 3197Reputation: 3197
If it is using columns I would probably just use awk.
 
Old 05-13-2016, 10:12 PM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,237

Rep: Reputation: 4150Reputation: 4150Reputation: 4150Reputation: 4150Reputation: 4150Reputation: 4150Reputation: 4150Reputation: 4150Reputation: 4150Reputation: 4150Reputation: 4150
Try join instead of paste - in your case won't even need any flags. It's even possible to pipe the output to another join
Code:
join file1 file2 | join - file3
KISS
 
2 members found this post helpful.
Old 05-23-2016, 07:36 AM   #8
lharrisl
LQ Newbie
 
Registered: May 2016
Posts: 3

Original Poster
Rep: Reputation: Disabled
The join command worked very well and then I used R to rename the columns. Thank you everyone for your help!
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
merge columns from multiple files in a directory based on match of two columns prasanthi yanamala Linux - Newbie 2 11-12-2015 10:11 AM
[SOLVED] bash suggestions to convert horizontal columns to vertical columns Guyverix Programming 14 01-24-2013 11:03 AM
SQL statements howto -- 3 columns input but 2 columns output fhleung Programming 3 11-29-2012 10:45 AM
[SOLVED] AWK: add columns while keep format for other columns cristalp Programming 3 10-13-2011 06:14 AM
[SOLVED] merging columns from different files leonardo2887 Linux - Newbie 4 10-11-2010 02:28 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

All times are GMT -5. The time now is 10:45 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration