LinuxQuestions.org
Old 03-22-2011, 03:10 AM   #1
vjramana
Member
 
Registered: Sep 2009
Posts: 89

Rep: Reputation: 0
Copy and replacing specific line from file1 to file2 line by line


I have two files, file1.traj and file2.traj. Both files contain identical data, arranged in the same format. The first line of each file is a comment.

At line 7843 of both files there is a Cartesian coordinate X, Y and Z (three numbers), and at line 15685 there is another set of three numbers. There are 7841 lines between two consecutive coordinate lines, and each file has a few hundred thousand lines.

What I need to do is copy the X Y Z coordinates (three numbers) from line 7843 of file1.traj and paste them into file2.traj at the same line number. The next one is line 15685 of file1.traj, which should replace line 15685 of file2.traj. I don't want the other lines (data) in file2.traj to be altered. This sequence should continue until the end of the file; that is, copy the selected lines from file1.traj and substitute them into file2.traj.

I tried to use the paste command, but I can't make it work on specific lines only.

Here I show the data format in the file. The line numbers are there for clarity only.

Code:
line.1    trajectory generated by ptraj
line.2       5.844   4.178   7.821   6.423   4.054   8.578   6.606   4.907   6.827   7.557
line.3       4.385   6.722   6.877   6.384   7.283   5.950   6.884   7.565   7.668   6.282
line.4       8.474   7.721   7.127   8.928   7.628   7.205   6.259   8.589   6.712   6.110
line.5       7.712   8.602   6.643   8.151   8.654   7.495   6.940   7.183   4.871   6.108
line.6       7.887   4.864   7.755   7.814   3.754   8.697   7.267   3.724   7.081   7.633
line.7       2.478   6.246   8.089   2.604   8.026   8.853   3.943   6.623   5.754   4.529
    .
    .
    .
    .           1.516  41.749  54.260   0.108  41.176  54.536  -0.626  40.627  53.818  -0.303
    .          41.920  42.179   3.556   3.251  41.623   3.530   2.472  42.558   2.678   3.304
    .          44.723   1.496   5.937  44.339   1.355   6.803  44.866   0.614   5.593  52.401
line.7842      86.323   2.974  52.385  85.816   3.785  51.879  85.808   2.359
line.7843     104.140 159.533  88.303
line.7844       4.792   5.052   8.317   5.279   4.463   8.898   5.663   5.341   7.220   6.267
line.7845       4.438   7.137   6.477   6.566   7.627   5.857   7.407   7.936   7.301   6.170
    .           8.741   7.647   7.020   9.023   7.315   7.107   6.475   8.171   6.435   6.413
    .           7.823   8.416   6.704   8.208   8.473   7.582   6.560   7.126   5.141   5.816
    .
    .
    .
    .          52.050   7.905  42.026  38.561   1.747  39.847  39.375   2.235  39.972  38.634
    .           1.382  38.965   0.810   0.477  39.394   0.717  -0.349  39.867   0.222   1.081
    .          39.847  43.073   5.033   2.756  43.387   5.428   1.942  42.256   4.598   2.511
line.15683     47.302   4.261   7.071  47.801   4.632   7.799  47.256   4.968   6.428  54.279
line.15684      0.498   3.477  53.964   0.612   2.580  53.500   0.612   4.021
line.15685    104.140 159.533  88.303
line.15686      4.970   4.868   7.979   5.342   4.250   8.612   5.988   5.450   7.184   6.903
line.15687      4.861   7.246   6.381   6.921   7.550   5.526   7.597   7.536   6.953   7.009
    .
    .
 
Old 03-22-2011, 03:36 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
If the lines containing the XYZ coordinates are the only ones with three numbers, you can try to retrieve them along with the line number using grep:
Code:
grep -En '^[ ]*[0-9.]+[ ]+[0-9.]+[ ]+[0-9.]+[ ]*$' file1.traj
This takes into account leading and trailing spaces (if any) and any number of spaces between the numbers. It assumes there are no tabs in place of spaces; otherwise use the generic character class [[:space:]] instead of [ ].
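A quick way to check the pattern is to run it on a tiny made-up file first. The sketch below is only an illustration: the sample contents are invented, and it uses the [[:space:]] class so that tabs would match as well.

```shell
# Build a small, made-up sample (hypothetical data, not the real trajectory).
sample=$(mktemp)
printf '%s\n' \
  'trajectory generated by ptraj' \
  '   5.844   4.178   7.821   6.423' \
  ' 104.140 159.533  88.303' \
  '   4.792   5.052   8.317   5.279' \
  '104.140 159.533  88.303' > "$sample"

# -E: extended regular expressions, -n: prefix each match with "<line number>:".
# Only lines made of exactly three numbers match; [[:space:]] covers spaces and tabs.
matches=$(grep -En '^[[:space:]]*[0-9.]+[[:space:]]+[0-9.]+[[:space:]]+[0-9.]+[[:space:]]*$' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Only lines 3 and 5 of the sample are reported, one with a leading space and one without.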

Once you've retrieved this information you can easily use sed with the c command to replace a specific line. Putting it all together in a loop:
Code:
while read number line
do
  number=${number/:/}
  echo sed -i "${number}c ${line}" file2.traj
done < <(grep -En '^[ ]*[0-9.]+[ ]+[0-9.]+[ ]+[0-9.]+[ ]*$' file1.traj)
The echo statement is just for testing purposes, whereas the -i option of sed will edit the file in place. Once you have checked that the loop works properly, remove the echo and run it again. In any case keep a backup copy of the original file. You never know...!
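Before touching the real file, the replacement itself can be tried on a small scratch file without the -i option, so that sed only prints the result. This is just a sketch under assumptions: the file contents are made up, and the one-line form of the c command used here is a GNU sed extension.

```shell
# Scratch file standing in for file2.traj (made-up contents).
scratch=$(mktemp)
printf '%s\n' 'header' 'aaa' 'old old old' 'bbb' > "$scratch"

number=3                          # line number as reported by grep -n
line='104.140 159.533  88.303'    # replacement text taken from file1.traj

# Without -i, sed writes the edited text to stdout and leaves the file alone.
result=$(sed "${number}c ${line}" "$scratch")
printf '%s\n' "$result"

# Keep a copy of the file contents to show that nothing was modified.
unchanged=$(cat "$scratch")
rm -f "$scratch"
```

Only the third line differs in the printed output; the scratch file keeps its original contents until -i is added.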
 
Old 03-25-2011, 05:02 AM   #3
vjramana
Member
 
Registered: Sep 2009
Posts: 89

Original Poster
Rep: Reputation: 0
Thanks so much.
This script works fine, just as I wanted.
There is only one thing lacking, and that is the position of the data.

The decimal points are not aligned. The replaced data should be shifted two spaces to the right.

I have been trying to figure this out, but in vain.

Regards
Vijay
 
Old 03-25-2011, 06:11 AM   #4
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Correct. The read statement uses white space as the field delimiter, so any leading space is removed from the line. If I interpret things correctly, the problem is to retain the original format of the XYZ line with its leading blank spaces (if any), right?

In this case you have to change the IFS variable (see man bash for details), that is, the Internal Field Separator. This is actually mandatory to get correct results, since if the XYZ line does not contain leading spaces, the line number is not read properly from grep's output (I just didn't notice this before). In other words, suppose the grep command gives something like:
Code:
7843:104.140 159.533  88.303
15685:  104.140 159.533  88.303
the read statement that uses blank space as delimiter will get:
Code:
number="7843:104.140" line="159.533  88.303"
number="15685:"       line="104.140 159.533  88.303"
respectively. Instead if the delimiter is : (colon) it will get:
Code:
number="7843"         line="104.140 159.533  88.303"
number="15685"        line="  104.140 159.533  88.303"
which is what we want. Another problem arises: the c command of sed removes the leading spaces unless you put a backslash in front of the text. But the text is referenced as a shell variable, so the first character after \ would be $. Inside double quotes that $ would then be escaped and interpreted literally, resulting in a wrong substitution. For this reason you have to escape the backslash with another backslash.

Sorry for the confusion. It's not easy to explain clearly. Anyway, this is the code:
Code:
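# Setting IFS only for the read command splits grep's "number:line" output
# at the colon and leaves the shell's global IFS untouched.
while IFS=":" read -r number line
do
  sed -i "${number}c \\${line}" file2.traj
done < <(grep -En '^[ ]*[0-9.]+[ ]+[0-9.]+[ ]+[0-9.]+[ ]*$' file1.traj)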
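The effect of the delimiter on read can also be checked in isolation. In the sketch below, split_default and split_colon are names made up for this illustration; each one feeds a single line of grep-style output to read and prints the two resulting fields in brackets.

```shell
# Default IFS: split on blanks, so leading blanks never reach "line".
split_default() {
  printf '%s\n' "$1" | { read -r number line; printf '[%s] [%s]\n' "$number" "$line"; }
}

# IFS set to a colon for this read only: split at the colon, keep the blanks.
split_colon() {
  printf '%s\n' "$1" | { IFS=: read -r number line; printf '[%s] [%s]\n' "$number" "$line"; }
}

split_default '7843:104.140 159.533  88.303'
split_default '15685:  104.140 159.533  88.303'
split_colon   '7843:104.140 159.533  88.303'
split_colon   '15685:  104.140 159.533  88.303'
```

With the colon as delimiter, the first bracket holds only the line number and the second keeps the coordinate line exactly as it was, leading blanks included.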
Cheers!
 
2 members found this post helpful.
Old 03-26-2011, 12:59 AM   #5
vjramana
Member
 
Registered: Sep 2009
Posts: 89

Original Poster
Rep: Reputation: 0
Dear Colucix,
Your solution is perfect. It works exactly how I want it.
Thanks so much for your kindness.

Cheers
 
Old 03-26-2011, 06:09 PM   #6
vjramana
Member
 
Registered: Sep 2009
Posts: 89

Original Poster
Rep: Reputation: 0
Dear Sir,

I have an additional question related to the code above.

The file that I am working on has nearly 7,842,000 lines. This means a replacement has to take place every 7842 lines, 1000 times in total. When I estimate the time taken for this job, it is around 17 hours on a supercomputer.

So I wonder whether it is possible to alter this code to speed up the process.

Is it possible to extract only the coordinates (lines with three numbers) from file1.traj into a separate file (let's say coordinate.txt) and use the data from this file to substitute the same lines in file2.traj?
 
Old 03-26-2011, 06:37 PM   #7
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Awk should be much faster than a shell loop. Could you check (with a shorter file, say just 78421 lines) whether this does what you want? (I tested it with a dictionary file, so I do believe it should work correctly.)
Code:
awk -v "other=file1.traj" '
    BEGIN {
        split("", replacement)
        r = 0
        while ((getline line < other) > 0) {
            r++
            if ((r > 1) && (r % 7842 == 1))
                replacement[r] = line
        }
    }

    {
        if ((NR > 1) && (NR % 7842 == 1))
            print replacement[NR]
        else
            print
    }
' file2.traj > new.traj
Note that this script will first read through the entire file1.traj file (keeping only the replacement lines in memory). When using huge data files, it'll take a while before it starts saving data to new.traj.
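A scaled-down check of the same logic might look like the sketch below. Everything in it is made up for illustration: the scratch files, their contents, and the period variable, which is set to 4 (three data lines plus one coordinate line) instead of 7842.

```shell
# Scratch files: 1 header line, then blocks of 3 data lines + 1 coordinate line.
dir=$(mktemp -d)
printf '%s\n' 'header A' a1 a2 a3 'XYZ-A1' a4 a5 a6 'XYZ-A2' > "$dir/file1.traj"
printf '%s\n' 'header B' b1 b2 b3 'XYZ-B1' b4 b5 b6 'XYZ-B2' > "$dir/file2.traj"

merged=$(awk -v other="$dir/file1.traj" -v period=4 '
    BEGIN {
        # First pass: remember only the coordinate lines of the other file.
        r = 0
        while ((getline line < other) > 0) {
            r++
            if (r > 1 && r % period == 1)
                replacement[r] = line
        }
    }
    {
        # Second pass: swap in the remembered line at the same line number.
        if (NR > 1 && NR % period == 1)
            print replacement[NR]
        else
            print
    }
' "$dir/file2.traj")
printf '%s\n' "$merged"

rm -rf "$dir"
```

Lines 5 and 9 of the output come from file1.traj, and every other line, including the header, comes from file2.traj.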

Last edited by Nominal Animal; 03-26-2011 at 06:41 PM. Reason: Added missing parentheses around the modulus comparisons.
 
1 members found this post helpful.
Old 03-26-2011, 07:38 PM   #8
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
And here's a C program you can use, if your input files really are that large. It's probably faster than any scripting version. (It reads the input files in parallel, too, so there is no delay in output.)
Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>

int main(int argc, char *argv[])
{
    char     buffer1[65536];
    char     buffer2[65536];
    char    *line1 = buffer1;   /* non-NULL, so the main loop also runs when <header> is 0 */
    char    *line2 = buffer2;
    FILE    *in1;
    FILE    *in2;

    long     headerlines, lines1, lines2;
    long     lines, total;

    int      status = 0;
    char     dummy;

    if (argc != 6) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
        fprintf(stderr, "       %s <header> <copy1> <copy2> file1 file2 [ > output ]\n", argv[0]);
        fprintf(stderr, "Where\n");
        fprintf(stderr, "       <header>    is the number of initial header lines,\n");
        fprintf(stderr, "       <copy1>     is the number of lines to copy from file1,\n");
        fprintf(stderr, "       <copy2>     is the number of lines to copy from file2.\n");
        fprintf(stderr, "This program reads lines from file1 and file2 in parallel.\n");
        fprintf(stderr, "First, <header> lines are copied from file1 to output.\n");
        fprintf(stderr, "Then, for as long as there is input in file1 and file2,\n");
        fprintf(stderr, "<copy1> lines are copied from file1, then <copy2> lines from file2.\n");
        fprintf(stderr, "The output will end whenever either file1 or file2 runs out.\n");
        fprintf(stderr, "If there are still lines in one but not the other, a warning is printed.\n");
        fprintf(stderr, "\n");
        return 1;
    }

    if (sscanf(argv[1], "%ld %c", &headerlines, &dummy) != 1) {
        fprintf(stderr, "%s: Invalid number of initial header lines.\n", argv[1]);
        return 1;
    }
    if (headerlines < 0L) {
        fprintf(stderr, "%s: Invalid number of initial header lines.\n", argv[1]);
        return 1;
    }

    if (sscanf(argv[2], "%ld %c", &lines1, &dummy) != 1) {
        fprintf(stderr, "%s: Invalid number of lines from %s.\n", argv[2], argv[4]); 
        return 1;
    }
    if (lines1 < 1L) {
        fprintf(stderr, "%s: Invalid number of lines from %s.\n", argv[2], argv[4]); 
        return 1;
    }

    if (sscanf(argv[3], "%ld %c", &lines2, &dummy) != 1) {
        fprintf(stderr, "%s: Invalid number of lines from %s.\n", argv[3], argv[5]); 
        return 1;
    }
    if (lines2 < 1L) {
        fprintf(stderr, "%s: Invalid number of lines from %s.\n", argv[3], argv[5]); 
        return 1;
    }

    in1 = fopen(argv[4], "rb");
    if (!in1) {
        char const *const error = strerror(errno);
        fprintf(stderr, "%s: %s.\n", argv[4], error);
        return 1;
    }

    in2 = fopen(argv[5], "rb");
    if (!in2) {
        char const *const error = strerror(errno);
        fprintf(stderr, "%s: %s.\n", argv[5], error);
        fclose(in1);
        return 1;
    }

    total = 0L;

    lines = headerlines;
    while (lines > 0L) {
        line1 = fgets(buffer1, sizeof(buffer1), in1);
        line2 = fgets(buffer2, sizeof(buffer2), in2);
        if (!line1 || !line2)
            break;

        lines--;
        total++;
        fputs(line1, stdout);
    }

    while (line1 && line2) {

        lines = lines1;
        while (line1 && line2 && lines > 0L) {
            line1 = fgets(buffer1, sizeof(buffer1), in1);
            line2 = fgets(buffer2, sizeof(buffer2), in2);
            if (!line1 || !line2)
                break;

            lines--;
            total++;
            fputs(line1, stdout);
        }
        if (lines != 0L || !line1 || !line2)
            break;

        lines = lines2;
        while (line1 && line2 && lines > 0L) {
            line1 = fgets(buffer1, sizeof(buffer1), in1);
            line2 = fgets(buffer2, sizeof(buffer2), in2);
            if (!line1 || !line2)
                break;

            lines--;
            total++;
            fputs(line2, stdout);
        }
        if (lines != 0L || !line1 || !line2)
            break;
    }

    if (ferror(in1))
        fprintf(stderr, "%s: Read error.\n", argv[4]);
    if (ferror(in2))
        fprintf(stderr, "%s: Read error.\n", argv[5]);
    if (ferror(in1) || ferror(in2)) {
        fclose(in1);
        fclose(in2);
        return 1;
    }

    if (line1 || !feof(in1)) {
        fprintf(stderr, "Warning: %s had excess lines.\n", argv[4]);
        status |= 2;
    }
    if (fclose(in1)) {
        char const *const error = strerror(errno);        
        fprintf(stderr, "Warning: %s: %s.\n", argv[4], error);
        status |= 4;
    }

    if (line2 || !feof(in2)) {
        fprintf(stderr, "Warning: %s had excess lines.\n", argv[5]);
        status |= 8;
    }
    if (fclose(in2)) {
        char const *const error = strerror(errno);
        fprintf(stderr, "Warning: %s: %s.\n", argv[5], error);
        status |= 16;
    }

    return status;
}
Save the above code as e.g. mergelines.c, and compile it using e.g.
Code:
gcc -Wall -O3 -o mergelines mergelines.c
Run ./mergelines to see the usage. To duplicate the function of my awk script, run
Code:
./mergelines 1 7841 1 file2.traj file1.traj > new.traj
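For comparison, the parallel-read idea can also be sketched in awk, reading one line from each file per step so that nothing is held in memory. This is only an illustration, not a replacement for the program above: the scratch files are made up, and period is a hypothetical variable set to 4 instead of 7842.

```shell
# Scratch files: 1 header line, then blocks of 3 data lines + 1 coordinate line.
dir=$(mktemp -d)
printf '%s\n' 'header A' a1 a2 a3 'XYZ-A1' a4 a5 a6 'XYZ-A2' > "$dir/file1.traj"
printf '%s\n' 'header B' b1 b2 b3 'XYZ-B1' b4 b5 b6 'XYZ-B2' > "$dir/file2.traj"

merged=$(awk -v other="$dir/file1.traj" -v period=4 '
    {
        # Read the matching line of the other file in step with this one;
        # stop as soon as the other file runs out.
        if ((getline repl < other) <= 0)
            exit
        if (NR > 1 && NR % period == 1)
            print repl
        else
            print
    }
' "$dir/file2.traj")
printf '%s\n' "$merged"

rm -rf "$dir"
```

As with mergelines, the output stops when either input ends, and it starts flowing immediately instead of waiting for a full pass over file1.traj.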
Hope you find this useful.
 
Old 03-28-2011, 02:56 AM   #9
vjramana
Member
 
Registered: Sep 2009
Posts: 89

Original Poster
Rep: Reputation: 0
Thank you so much for your kindness.

I tried the awk script and it was very helpful. It took only about 10 minutes to convert 10 files, each with around 7 million lines of data.
Awk seems so powerful. How could I get a grip on this language? Could you suggest any website that gives a good explanation of awk?

Regards

Vijay
 
Old 03-28-2011, 07:44 AM   #10
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by vjramana View Post
Thank you so much for your kindness.

I tried the awk script and it was very helpful. It took only about 10 minutes to convert 10 files, each with around 7 million lines of data.
Awk seems so powerful. How could I get a grip on this language? Could you suggest any website that gives a good explanation of awk?
You're welcome!

I personally use The GNU Awk User Manual a lot when writing awk scripts.
I'd recommend first reading the Getting started section, then starting by writing some test scripts or scripts you already need or use for your data manipulation, and looking at the manual for interesting functions to use. I've especially found the Built-in variables section and the String functions section quite informative. Also, picking apart the awk scripts you find here might be fun.

Note that GNU awk (gawk) is more powerful than most other awk implementations, since it contains additional functions for e.g. sorting which other awks do not have. (If you read the GNU awk manual carefully, it does say which features are standard and which are gawk extensions.)

Then, when you feel a bit more comfortable, start looking at the examples in the manual. They are well explained, although a bit complex. I'd say they are more useful when you already are comfortable with writing simple awk scripts that modify or create data files.

Hope this helps.
 
Old 03-28-2011, 07:49 AM   #11
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Posts: 952

Rep: Reputation: 217
If in India, you can buy "The UNIX Programming Environment" by Brian Kernighan and Rob Pike. It has a very good general introduction to Unix (and awk) and is available in book stores.
 
  

