Copying and replacing specific lines from file1 to file2, line by line
I have two files, file1.traj and file2.traj. Both files contain the same kind of data, arranged in the same format. The first line of both files is a comment.
At line 7843 of both files there is a Cartesian coordinate: X, Y and Z (three numbers). At line 15685 there is another set of three numbers. There are 7841 lines in between two coordinate lines, and a file contains a few hundred thousand lines. What I need to do is copy the X Y Z coordinates (three numbers) from line 7843 of file1.traj and paste them into file2.traj at the same line number. The next coordinate line, 15685 of file1.traj, should replace line 15685 of file2.traj, and so on until the end of the file. No other lines (data) in file2.traj should be altered. In other words, I want to copy the selected lines from file1.traj and substitute them into file2.traj. I tried the paste command, but I cannot make it act on specific lines only. Here is the data format in the file; I used the line numbers for clarity. Code:
line.1 trajectory generated by ptraj |
If the lines containing the XYZ coordinates are the only ones with three numbers, you can try to retrieve them along with the line number using grep:
Code:
grep -En '^[ ]*[0-9.]+[ ]+[0-9.]+[ ]+[0-9.]+[ ]*$' file1.traj
Once you've retrieved this information you can easily use sed with the c command to replace a specific line. Putting it all together in a loop: Code:
while IFS=": " read number line
do
    sed -i "${number}c $line" file2.traj
done < <(grep -En '^[ ]*[0-9.]+[ ]+[0-9.]+[ ]+[0-9.]+[ ]*$' file1.traj)
|
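To make the grep-plus-sed idea concrete, here is a toy run on two tiny files. The file names demo1.traj and demo2.traj are invented for this sketch, and GNU sed is assumed for the in-place one-line c command:

```shell
# Two tiny demo files: a comment on line 1, a coordinate line on line 3
# (the names demo1.traj / demo2.traj are made up for this illustration).
printf 'comment A\natom 1\n1.100 2.200 3.300\natom 2\n' > demo1.traj
printf 'comment B\natom 1\n9.900 8.800 7.700\natom 2\n' > demo2.traj

# Find the lines holding exactly three numbers, with their line numbers
grep -En '^[ ]*[0-9.]+[ ]+[0-9.]+[ ]+[0-9.]+[ ]*$' demo1.traj |
while IFS=": " read -r number line
do
    # GNU sed: "Nc text" replaces line N with the given text in place
    sed -i "${number}c $line" demo2.traj
done

cat demo2.traj
```

After the run, line 3 of demo2.traj holds the coordinates from demo1.traj while every other line (including the comment) is untouched.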
Thanks so much.
This script works fine, just as I wanted. But there is one thing lacking, which is the position of the data: the decimal points are not aligned. The replaced data should be pushed two spaces to the right-hand side. I am trying to figure this out, but in vain. Regards Vijay |
Correct. The read statement uses white space as the field delimiter, so any leading space is removed from the line. If I interpret things correctly, the problem is to retain the original format of the XYZ line, with its leading blank spaces (if any), right?
In this case you have to change the IFS variable (see man bash for details), that is, the Internal Field Separator. This is actually mandatory to get correct results: if the XYZ line does not contain leading spaces, the line number is not read properly from grep's output (I just didn't notice this before). In other words, suppose the grep command gives something like: Code:
7843:104.140 159.533 88.303
With the default IFS, read splits on white space, so you would get: Code:
number="7843:104.140" line="159.533 88.303"
whereas with IFS set to ":" you get the intended: Code:
number="7843" line="104.140 159.533 88.303"
Sorry for the confusion. It's not easy to explain clearly. Anyway, this is the code: Code:
OLD_IFS="$IFS"
IFS=":"
while read number line
do
    sed -i "${number}c\\
$line" file2.traj
done < <(grep -En '^[ ]*[0-9.]+[ ]+[0-9.]+[ ]+[0-9.]+[ ]*$' file1.traj)
IFS="$OLD_IFS"
|
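A quick way to see the difference IFS makes is to feed read a sample of grep's output. The sample line below is invented for illustration, and bash is assumed:

```shell
# A sample "lineno:content" line as produced by grep -n
sample='7843:104.140 159.533 88.303'

# Default IFS (white space): the colon is not a separator, so the line
# number and the first coordinate stick together in the first field.
read -r number line <<< "$sample"
echo "default: number=[$number] line=[$line]"

# IFS set to ":": the line number is isolated, and since a space is no
# longer a separator, the rest of the line (leading blanks included)
# lands in $line untouched.
IFS=":" read -r number line <<< "$sample"
echo "colon:   number=[$number] line=[$line]"
```

With the colon as the separator, $number is a clean line number that sed can use as an address, and $line carries the data exactly as it appeared after the colon.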
Dear Colucix,
Your solution is perfect. It is working exactly how I want it. Thanks so much for your kindness. Cheers |
Dear Sir,
I have an additional question related to the code above. The file that I am operating on has nearly 7,842,000 lines. That means the replacement has to take place every 7842 lines, and it should happen 1000 times. When I calculate the time taken to do this job, it is around 17 hours on a supercomputer. So I wonder if it is possible to alter this code to speed up the process. Would it be possible to extract only the coordinates (lines with three numbers) from file1.traj into a separate file (let's say coordinate.txt) and use the data from this file to substitute the same lines in file2.traj? |
Awk should be much faster than a shell loop. Could you check (with shorter files, say just 78421 lines) whether this does what you want? (I tested it with a dictionary file, so I do believe it works correctly.)
Code:
awk -v "other=file1.traj" '
    { getline repl < other }                    # advance file1.traj in lockstep
    FNR > 1 && FNR % 7842 == 1 { print repl; next }   # coordinate line: take it from file1.traj
    { print }                                   # every other line comes from file2.traj
' file2.traj > new.traj
|
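The core trick awk enables here is reading the other file with getline in lockstep while the main loop scans file2.traj, so each file is read exactly once. A toy sketch of that idea, using a period of 3 lines instead of 7842 and invented demo file names:

```shell
# Demo of the lockstep-getline idea (demo1.traj / demo2.traj are made-up
# names; the real files would use a period of 7842 instead of 3).
printf 'header A\nA2\nA3\nA4-coord\nA5\nA6\nA7-coord\n' > demo1.traj
printf 'header B\nB2\nB3\nB4\nB5\nB6\nB7\n'             > demo2.traj

awk -v "other=demo1.traj" '
    { getline repl < other }                 # keep demo1.traj in step with FNR
    FNR > 1 && FNR % 3 == 1 { print repl; next }  # periodic position: use the other file
    { print }                                # otherwise pass demo2.traj through
' demo2.traj > new.traj

cat new.traj
```

Lines 4 and 7 of new.traj come from demo1.traj; everything else, including the header, is demo2.traj's original content. Because there is no per-line process spawning, this scales linearly to millions of lines.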
And here's a C program you can use, if you really have such large input files. It's probably faster than any scripting version. (It reads the input files in parallel, too, so there is no delay in the output.)
Code:
#include <stdio.h>
Compile it with: Code:
gcc -Wall -O3 -o mergelines mergelines.c
and run it like this: Code:
./mergelines 1 7841 1 file2.traj file1.traj > new.traj |
Thank you so much for your kindness.
I tried the awk script and it worked wonderfully. It took only about 10 minutes to convert 10 files, each with around 7 million lines of data. It seems awk is very powerful. How can I get a grip on this language? Could you suggest any website which gives a good explanation of awk? Regards Vijay |
I personally use The GNU Awk User Manual a lot when writing awk scripts. I'd recommend first reading the Getting started section, then starting by writing some test scripts or scripts you already need or use for your data manipulation, and looking at the manual for interesting functions to use. I've especially found the Built-in variables section and the String functions section quite informative. Also, picking apart the awk scripts you find here might be fun. Note that GNU awk (gawk) is more powerful than most other awk implementations, since it contains additional functions for e.g. sorting which other awks do not have. (If you read the GNU awk manual carefully, it does say which features are standard and which are gawk extensions.) Then, when you feel a bit more comfortable, start looking at the examples in the manual. They are well explained, although a bit complex. I'd say they are more useful when you already are comfortable with writing simple awk scripts that modify or create data files. Hope this helps. |
If in India, you can buy "The UNIX programming environment" by Brian Kernighan and Rob Pike. It has got a very good general introduction to Unix (and awk) and is available in book stores.
|