Linux - Newbie: This Linux forum is for members who are new to Linux.
file1 and file2 do not have the same number of lines.
I want to produce an output file, file3, with 6 fields per line, where the first two fields are exactly the same as the first two fields in file1. For example, the line in file1 whose first two fields are "1 123" has a match in file2, "1 123 0 1 0 0", so the whole line from file2, "1 123 0 1 0 0", is printed to file3. If a line in file1 has no match in file2, e.g. "1 125", then "1 125 0 0 0 0" is printed to file3.
I am wondering whether this can be done using awk, join, or any other tool in Linux. Since the files are very large, I really want it to be fast. Thanks a lot!
Note: Field 2 in both file1 and file2 contains only numeric values, but field 1 in both files may contain letters too. Both files are sorted on these two fields, and neither file contains duplicates.
Also, we do not need to consider lines in file2 that have no match in file1. For example, "1 126 2 1 0 0" has no match in file1, so this line should not be added to file3.
If it were me, I'd write a script to leapfrog through the files. Since you know the two files are sorted and contain no duplicates, this would be ideal in my opinion.
The basic process would be:
0) Read a line from file1
1) Read through file2 until you match or exceed the last read value from file1
2) If the first two fields in the two lines match, then copy the line from file2 into file3
3) Read through file1 until you match or exceed the last read value from file2 (assuming #2 failed)
4) If the first two fields in the two lines match, then copy the line from file2 into file3
5) Go back to #1
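The steps above can be sketched in Python (used here as a stand-in for the awk or Perl discussed in this thread). This is only a sketch under a few assumptions: the fields are whitespace-separated, field 1 is ordered as a plain string and field 2 as an integer (so Python's tuple comparison agrees with how the files were sorted), and the names `key_of`, `merge_lookup`, and `pad` are made up for illustration. Unlike the outline, it also writes the zero-padded line that the original question asks for when a file1 key has no match.

```python
import io


def key_of(line):
    """Return the (field1, int(field2)) key of a whitespace-separated line."""
    fields = line.split()
    return (fields[0], int(fields[1]))


def merge_lookup(file1, file2, out, pad="0 0 0 0"):
    """Write one output line per file1 line, reading both inputs only once.

    Assumes both inputs are sorted by (field1 as string, field2 as integer)
    and contain no duplicate keys.
    """
    line2 = file2.readline()
    for line1 in file1:
        if not line1.strip():
            continue  # ignore blank lines in file1
        k1 = key_of(line1)
        # Advance file2 until its key matches or exceeds the file1 key.
        while line2 and (not line2.strip() or key_of(line2) < k1):
            line2 = file2.readline()
        if line2 and key_of(line2) == k1:
            # Match: copy the whole file2 line.
            out.write(line2.rstrip("\n") + "\n")
        else:
            # No match: keep the file1 key and pad the remaining fields.
            f1 = line1.split()
            out.write(f"{f1[0]} {f1[1]} {pad}\n")


if __name__ == "__main__":
    # Sample lines taken from the question.
    f1 = io.StringIO("1 123\n1 125\n")
    f2 = io.StringIO("1 123 0 1 0 0\n1 126 2 1 0 0\n")
    result = io.StringIO()
    merge_lookup(f1, f2, result)
    print(result.getvalue(), end="")
    # prints:
    # 1 123 0 1 0 0
    # 1 125 0 0 0 0
```

With real files, the three arguments would be file objects from `open(...)`. Memory use stays constant because only one line from each file is held at a time. If the files were sorted with a different collation (for example a locale-aware `sort`), `key_of` would need to be adapted to match it.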
If the files aren't too large to be held in RAM, then you can simply load one file into an array, using fields one and two as the index values, then use that to print out the matching lines from the other file.
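A minimal Python sketch of this in-memory approach, for comparison (the thread itself has awk arrays in mind; a Python dict plays the same role here). It assumes whitespace-separated fields, loads file2 as the lookup table so that the output follows the order of file1, and matches keys by their exact text. The names `dict_lookup` and `pad` are hypothetical.

```python
import io


def dict_lookup(file1, file2, out, pad="0 0 0 0"):
    """Load file2 into a dict keyed on its first two fields, then scan file1."""
    table = {}
    for line in file2:
        fields = line.split()
        if len(fields) >= 2:
            table[(fields[0], fields[1])] = line.rstrip("\n")

    for line in file1:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1])
        # Use the stored file2 line if present, else the zero-padded key.
        out.write(table.get(key, f"{fields[0]} {fields[1]} {pad}") + "\n")


if __name__ == "__main__":
    # Sample lines taken from the question.
    f1 = io.StringIO("1 123\n1 125\n")
    f2 = io.StringIO("1 123 0 1 0 0\n1 126 2 1 0 0\n")
    result = io.StringIO()
    dict_lookup(f1, f2, result)
    print(result.getvalue(), end="")
    # prints:
    # 1 123 0 1 0 0
    # 1 125 0 0 0 0
```

This version does not depend on the files being sorted, but the whole of file2 has to fit in memory.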
No, probably not, since it has to store the whole file in memory first. You'll have to use an algorithm like the one suicidaleggroll posted.
Unfortunately, though, that's getting a bit beyond my ability in awk. I'm not that experienced in multi-file processing with it. Perhaps grail will come along soon and show us how it's done.
It would probably be even better to use a language like Perl, but again that's something I don't know much about. I'm mostly just a bash person at this stage. I could write it up as a shell script, but that would be dog slow and would probably take hours to process.