Merging columns from different files and renaming columns
Hi all,
I have ~ 10 files containing ~ 5000 rows of values (these are under tracking_id) for 2 samples (q1_0 and q2_0) i.e. data for 2 samples in each file - a total of 20 samples. If I use the head command for one of these files you can get an idea of what they look like: tracking_id q1_0 q2_0 PF3D7_0100100 20.0472 12.8313 PF3D7_0100200 31.0172 56.0184 PF3D7_0100300 0.598181 174.531 PF3D7_0100400 1.47331 84.0354 PF3D7_0100500 0 0 PF3D7_0100600 0 0.549631 PF3D7_0100700 0 0 PF3D7_0100800 0 0.145775 PF3D7_0100900 0.582414 2.27694 Now I have 10 files each file containing these 5000 values for 2 samples so I have 20 samples (columns) in total. What I want to be able to do is merge all these 10 files together into one file so that the file looks like this: tracking_id q1_0(from file1) q2_0(from file1) q1_0(from file2) q2_0(from file2) ...... PF3D7_0100100 PF3D7_0100200 PF3D7_0100300 PF3D7_0100400 PF3D7_0100500 PF3D7_0100600 PF3D7_0100700 PF3D7_0100800 PF3D7_0100900 iI want the tracking_id for the 5000 values to only appear once in the first column then I want the corresponding values for each of the 20 samples (which are currently split across 10 files). At the moment every file says q1 and q2 at the top of the columns and I have them stored in different subdirectories which list what the actual samples are. What I would like to do is rename the column headers as well so that the file looks like this: tracking_id Sample1 Sample2 Sample3 Sample4.....Sample 20 PF3D7_0100100 PF3D7_0100200 PF3D7_0100300 PF3D7_0100400 PF3D7_0100500 PF3D7_0100600 PF3D7_0100700 PF3D7_0100800 PF3D7_0100900 Could someone please tell me how to achieve this? I can use the command paste to append all the columns together but then I end up with the tracking_id column repeated many times in the file. Many thanks for any help! Lynne |
What you are trying to achieve sounds like a task for a SQL database or the R language.
|
Quote:
Code:
tracking_id q1_0(from file1) q2_0(from file1) q1_0(from file2) q2_0(from file2) ...... |
Quote:
Another option is to read the result of 'paste' into LibreOffice Calc, delete the unnecessary columns and then re-export it to Text CSV. Just be sure to check all the boxes for possible separator options, including space, when importing. |
You could simply cut the unwanted columns from the result of paste.
Code:
paste -d " " <input files> | cut -d " " --complement -f $(seq -s "," 4 3 28) > <output file> You would still need to change your column headings. |
If it is using columns I would probably just use awk.
|
Try join instead of paste - in your case won't even need any flags. It's even possible to pipe the output to another join
Code:
join file1 file2 | join - file3 |
The join command worked very well and then I used R to rename the columns. Thank you everyone for your help!
|
All times are GMT -5. The time now is 04:51 PM. |