Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place! |
Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
|
|
05-13-2016, 10:17 AM
|
#1
|
LQ Newbie
Registered: May 2016
Posts: 3
Rep:
|
Merging columns from different files and renaming columns
Hi all,
I have ~ 10 files containing ~ 5000 rows of values (these are under tracking_id) for 2 samples (q1_0 and q2_0) i.e. data for 2 samples in each file - a total of 20 samples. If I use the head command for one of these files you can get an idea of what they look like:
tracking_id q1_0 q2_0
PF3D7_0100100 20.0472 12.8313
PF3D7_0100200 31.0172 56.0184
PF3D7_0100300 0.598181 174.531
PF3D7_0100400 1.47331 84.0354
PF3D7_0100500 0 0
PF3D7_0100600 0 0.549631
PF3D7_0100700 0 0
PF3D7_0100800 0 0.145775
PF3D7_0100900 0.582414 2.27694
Now I have 10 files each file containing these 5000 values for 2 samples so I have 20 samples (columns) in total. What I want to be able to do is merge all these 10 files together into one file so that the file looks like this:
tracking_id q1_0(from file1) q2_0(from file1) q1_0(from file2) q2_0(from file2) ......
PF3D7_0100100
PF3D7_0100200
PF3D7_0100300
PF3D7_0100400
PF3D7_0100500
PF3D7_0100600
PF3D7_0100700
PF3D7_0100800
PF3D7_0100900
iI want the tracking_id for the 5000 values to only appear once in the first column then I want the corresponding values for each of the 20 samples (which are currently split across 10 files). At the moment every file says q1 and q2 at the top of the columns and I have them stored in different subdirectories which list what the actual samples are. What I would like to do is rename the column headers as well so that the file looks like this:
tracking_id Sample1 Sample2 Sample3 Sample4.....Sample 20
PF3D7_0100100
PF3D7_0100200
PF3D7_0100300
PF3D7_0100400
PF3D7_0100500
PF3D7_0100600
PF3D7_0100700
PF3D7_0100800
PF3D7_0100900
Could someone please tell me how to achieve this? I can use the command paste to append all the columns together but then I end up with the tracking_id column repeated many times in the file.
Many thanks for any help!
Lynne
|
|
|
05-13-2016, 10:29 AM
|
#2
|
Member
Registered: Feb 2016
Location: Ukraine
Distribution: ArchLinux
Posts: 49
Rep:
|
What you are trying to achieve sounds like a task for a SQL database or the R language.
|
|
|
05-13-2016, 11:14 AM
|
#3
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,506
|
Quote:
Originally Posted by lharrisl
... merge all these 10 files together into one file so that the file looks like this...
|
The examples look incomplete. Do you mean that each line should start with the tracking id (e.g. PF3D7_0100100) and then be followed with the corresponding 20 additional columns each containing data?
Code:
tracking_id q1_0(from file1) q2_0(from file1) q1_0(from file2) q2_0(from file2) ......
PF3D7_0100100 20.0472 12.8313 30.0533 13.888 ... and so on
If so, then that is very easy in perl with a hash of arrays. To do that, I'd read the files in order and use the tracking id as the hash key then the push() the q1_0 and q2_0 onto it, for each file, then print the whole thing out.
|
|
|
05-13-2016, 12:02 PM
|
#4
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,506
|
Quote:
Originally Posted by lharrisl
I can use the command paste to append all the columns together but then I end up with the tracking_id column repeated many times in the file.
|
On more close reading, if 'paste' does most of what you need, then perl would be too much and you can use 'awk' to skip the unnecessary columns.
Another option is to read the result of 'paste' into LibreOffice Calc, delete the unnecessary columns and then re-export it to Text CSV. Just be sure to check all the boxes for possible separator options, including space, when importing.
|
|
|
05-13-2016, 12:20 PM
|
#5
|
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,444
|
You could simply cut the unwanted columns from the result of paste.
Code:
paste -d " " <input files> | cut -d " " --complement -f $(seq -s "," 4 3 28) > <output file>
This assumes that the input files are delimited by a space character.
You would still need to change your column headings.
Last edited by allend; 05-13-2016 at 12:23 PM.
|
|
|
05-13-2016, 05:29 PM
|
#6
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,017
|
If it is using columns I would probably just use awk.
|
|
|
05-13-2016, 10:12 PM
|
#7
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,237
|
Try join instead of paste - in your case won't even need any flags. It's even possible to pipe the output to another join
Code:
join file1 file2 | join - file3
KISS
|
|
2 members found this post helpful.
|
05-23-2016, 07:36 AM
|
#8
|
LQ Newbie
Registered: May 2016
Posts: 3
Original Poster
Rep:
|
The join command worked very well and then I used R to rename the columns. Thank you everyone for your help!
|
|
|
All times are GMT -5. The time now is 10:45 AM.
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|