The numbers of lines in file1 and file2 are not equal.
I want to get an output file, file3, with 6 fields per line, where the first two fields are exactly the same as the first two fields in file1. For example, the line in file1 whose first two fields are "1 123" has a match in file2, "1 123 0 1 0 0", so the whole file2 line "1 123 0 1 0 0" should be printed to file3. If a line in file1 does not have a match in file2, e.g. "1 125", then "1 125 0 0 0 0" should be printed to file3.
I am wondering whether this can be done using awk, join, or any other Linux tool. Since the files are very large, I really want it to be fast. Thanks a lot!
Note: Field 2 in both file1 and file2 contains only numeric values, but field 1 in both files may contain other characters too. Both files are sorted on these two fields, and neither file contains duplicates.
Also, we do not need to consider lines in file2 that have no match in file1. For example, "1 126 2 1 0 0" has no match in file1, so this line should not be added to file3.
If it were me, I'd write a script to leapfrog through the files. Since you know the two files are sorted and uniqued, this would be ideal in my opinion.
The basic process would be:
0) Read a line from file1
1) Read through file2 until you match or exceed the last read value from file1
2) If the first two fields in the two lines match, then copy the line from file2 into file3
3) Read through file1 until you match or exceed the last read value from file2 (assuming #2 failed)
4) If the first two fields in the two lines match, then copy the line from file2 into file3
5) Go back to #1
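The steps above can be sketched in Python as a single pass over both files. This is only a rough sketch, not a tested solution for the real data: the function names (`records`, `sort_key`, `merge_sorted`) are made up for illustration, and it assumes the files are ordered by field 1 as text and then by field 2 as an integer, which may not match how the files were actually sorted. It also adds the padding the original question asks for: a file1 line with no match is written out followed by "0 0 0 0".

```python
def records(lines):
    """Yield non-blank lines with the trailing newline removed."""
    for line in lines:
        if line.strip():
            yield line.rstrip("\n")


def sort_key(line):
    """Key built from the first two fields.

    Assumption: the files are ordered by field 1 as text,
    then by field 2 as an integer.
    """
    f1, f2 = line.split()[:2]
    return (f1, int(f2))


def merge_sorted(file1_lines, file2_lines, pad="0 0 0 0"):
    """Yield one output line per file1 line, reading each input once."""
    it2 = records(file2_lines)
    line2 = next(it2, None)
    for line1 in records(file1_lines):
        k1 = sort_key(line1)
        # Advance file2 until its key matches or exceeds the file1 key.
        while line2 is not None and sort_key(line2) < k1:
            line2 = next(it2, None)
        if line2 is not None and sort_key(line2) == k1:
            # Match: copy the whole file2 line.
            yield line2
        else:
            # No match: keep the file1 key and pad the remaining fields.
            f1, f2 = line1.split()[:2]
            yield f"{f1} {f2} {pad}"
```

To use it on real files, open file1 and file2 for reading, pass the two file objects to `merge_sorted`, and write each yielded line plus a newline to file3. Only one line from each file is held in memory at a time, so file size should not matter much.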
If the files aren't too large to be held in RAM, then you can simply load one file into an array, using fields one and two as the index values, then use that to print out the matching lines from the other file.
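As a rough illustration of that in-memory approach, here is a Python sketch that loads file2 into a dictionary keyed on the first two fields and then walks file1. The name `lookup_merge` is hypothetical, and the sketch assumes file2 fits in RAM; it does not rely on either file being sorted.

```python
def lookup_merge(file1_lines, file2_lines, pad="0 0 0 0"):
    """Yield one output line per file1 line, using file2 as a lookup table."""
    # Load file2 into memory, indexed by (field 1, field 2).
    table = {}
    for line in file2_lines:
        parts = line.split()
        if len(parts) >= 2:
            table[(parts[0], parts[1])] = line.rstrip("\n")

    for line in file1_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        key = (parts[0], parts[1])
        # Matching file2 line if there is one, otherwise the padded key.
        yield table.get(key, f"{key[0]} {key[1]} {pad}")
```

The trade-off is memory: the whole of file2 sits in the dictionary, so for very large files a single sorted pass over both inputs is likely the safer choice.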
No, probably not, since it has to store the whole file in memory first. You'll have to use an algorithm like the one suicidaleggroll posted.
Unfortunately though that's getting a bit beyond my ability in awk. I'm not that experienced in multi-file processing with it. Perhaps grail will come along soon and show us how it's done.
It would probably be even better to use a language like Perl, but again that's something I don't know much about. I'm mostly just a bash person at this stage. I could write it up as a shell script, but that would be dog slow, and would probably take hours to process.