Linux - Newbie (LinuxQuestions.org): This Linux forum is for members that are new to Linux.
file1 and file2 do not have the same number of lines.
I want to produce an output file, file3, with 6 fields, whose first two fields are exactly the same as the first two fields in file1. For example, the line in file1 with the first two fields "1 123" has a match in file2, "1 123 0 1 0 0", so the whole file2 line "1 123 0 1 0 0" should be printed to file3. If a line in file1 has no match in file2, e.g. "1 125", then "1 125 0 0 0 0" should be printed to file3.
Can this be done with awk, join, or any other Linux tool? The files are very large, so I really want it to be fast. Thanks a lot!
Note: field 2 in both file1 and file2 contains only numbers, but field 1 in both files may also contain letters. Both files are sorted on these two fields, and neither file contains duplicates, so a situation like this will never happen:
Code:
1 123
1 123
...
Also, lines in file2 that have no match in file1 should be ignored. For example, "1 126 2 1 0 0" has no match in file1, so it should not be added to file3.
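Since the question asks about join: join(1) compares a single field, so one way to use it here is to glue fields 1 and 2 into one key first, join on that, and split the key again afterwards. This is a hedged sketch, not code from this thread. The function name join_on_two_fields and the ":" key separator are assumptions, and it assumes field 2 is a non-negative integer. The output comes out in byte-wise (C locale) order of the combined key, which may differ from file1's order.

```bash
#!/bin/bash
# Sketch: match file1 against file2 on fields 1 and 2 using join(1).
# join_on_two_fields is a hypothetical name.
# Usage: join_on_two_fields file1 file2 > file3
join_on_two_fields() {
    local keys=$1 recs=$2 tmp
    tmp=$(mktemp -d) || return 1
    # join compares only one field, so glue fields 1 and 2 into a single
    # key "f1:f2". Re-sort in the C locale: join needs its input sorted in
    # the same byte-wise order it compares in, not in numeric order.
    awk '{ print $1 ":" $2 }' "$keys" | LC_ALL=C sort > "$tmp/k1"
    awk '{ print $1 ":" $2, $3, $4, $5, $6 }' "$recs" |
        LC_ALL=C sort -k1,1 > "$tmp/k2"
    # -a 1 keeps file1 keys that have no partner in file2; -e 0 fills
    # their four missing fields with 0; -o prints the key, then fields
    # 2-5 of the file2 record.
    LC_ALL=C join -a 1 -e 0 -o 0,2.2,2.3,2.4,2.5 "$tmp/k1" "$tmp/k2" |
        # Split the key again: field 2 is assumed to be digits only, so
        # ":digits " can only occur where the key ends.
        sed 's/:\([0-9][0-9]*\) / \1 /'
    rm -rf "$tmp"
}
```

The two extra sort passes cost O(n log n) time and some temporary disk space, but sort and join both work on disk-sized inputs without loading everything into RAM.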
If it were me, I'd write a script to leapfrog through the files. Since you know both files are sorted and free of duplicates, this would be ideal, in my opinion.
The basic process would be:
0) Read a line from file1
1) Read through file2 until you match or exceed the last value read from file1
2) If the first two fields in the two lines match, then copy the line from file2 into file3
3) If step 2 found no match, read through file1 until you match or exceed the last value read from file2
4) If the first two fields in the two lines match, then copy the line from file2 into file3
5) Go back to #1
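The leapfrog steps above can be sketched in awk, which reads file1 as its main input and pulls file2 one line at a time with getline. This is an illustration under stated assumptions, not the poster's code: the function name merge_walk is hypothetical, both files are assumed to be sorted by field 1 in byte order and then by field 2 numerically (for example with LC_ALL=C sort -k1,1 -k2,2n), and the zero-filled line for unmatched file1 keys is added to meet the original requirement. Each file is read once and nothing is stored, so memory use stays constant however large the files are.

```bash
#!/bin/bash
# Sketch of the leapfrog (merge) walk. merge_walk is a hypothetical name.
# Assumes both files are sorted with: LC_ALL=C sort -k1,1 -k2,2n
# Usage: merge_walk file1 file2 > file3
merge_walk() {
    LC_ALL=C awk -v recs="$2" '
    # Return -1, 0 or 1 as key (a1, a2) sorts before, equal to or after
    # key (b1, b2): field 1 is compared as a string, field 2 as a number.
    function cmp(a1, a2, b1, b2) {
        if ((a1 "") < (b1 "")) return -1
        if ((a1 "") > (b1 "")) return 1
        if (a2 + 0 < b2 + 0) return -1
        if (a2 + 0 > b2 + 0) return 1
        return 0
    }
    # Read the next file2 line into line2 and its fields into r[].
    function next_rec() {
        have = ((getline line2 < recs) > 0)
        if (have) split(line2, r)
    }
    BEGIN { next_rec() }
    {
        # Skip file2 lines whose key sorts before the current file1 key.
        while (have && cmp(r[1], r[2], $1, $2) < 0) next_rec()
        # On a match copy the file2 line, otherwise write a zero-filled one.
        if (have && cmp(r[1], r[2], $1, $2) == 0) print line2
        else print $1, $2, 0, 0, 0, 0
    }' "$1"
}
```

If the files are sorted in some other order, the cmp function has to be changed to match it; otherwise matches will silently be missed.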
If the files aren't too large to be held in RAM, then you can simply load one file into an array, using fields one and two as the index values, then use that to print out the matching lines from the other file.
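A minimal awk sketch of this in-memory approach, assuming file2 fits in RAM: it loads file2 into an array indexed by fields 1 and 2 (joined with awk's built-in SUBSEP), then streams file1, so the output follows file1's order. The function name hash_lookup is hypothetical. It does not depend on either file being sorted, but its memory use grows with the size of file2.

```bash
#!/bin/bash
# Sketch of the array (hash) lookup. hash_lookup is a hypothetical name.
# Usage: hash_lookup file1 file2 > file3
hash_lookup() {
    awk '
    # First file on the command line (file2): remember each whole line.
    FILENAME == ARGV[1] { rec[$1 SUBSEP $2] = $0; next }
    # Second file (file1): print the stored line or a zero-filled one.
    {
        k = $1 SUBSEP $2
        print ((k in rec) ? rec[k] : $1 " " $2 " 0 0 0 0")
    }' "$2" "$1"
}
```

Testing FILENAME against ARGV[1], rather than the common NR == FNR idiom, keeps the script correct even when file2 is empty.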
No, probably not, since it has to store the whole file in memory first. You'll have to use an algorithm like the one suicidaleggroll posted.
Unfortunately, that's getting a bit beyond my ability in awk. I'm not that experienced in multi-file processing with it. Perhaps grail will come along soon and show us how it's done.
It would probably be even better to use a language like Perl, but again, that's something I don't know much about. I'm mostly just a bash person at this stage. I could write it up as a shell script, but that would be dog slow and would probably take hours to run.
Thank you for the scripts, but they add all the lines from file1 and file2 together. I only want the lines that are in file1. Is there a way to do this?
Please do this:
1) Carefully proofread your problem statement. Did you use the word "field" anywhere you meant "file"?
2) Extend your sample input files and the corresponding output file.
This is my bash solution that seems to do what you want. It will not be the fastest solution, but would complete in less time than this thread has been running.
Code:
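#!/bin/bash
# One grep over the whole of file2 per line of file1: simple, but slow
# on very large files.
if1="tf1.txt"   # keys (fields 1 and 2)
if2="tf2.txt"   # full 6-field records
of1="tf3.txt"   # output
while read -r f1 f2 _ ; do
    # Anchor the pattern at the start of the line so that "1 123" does not
    # also match "11 123 ..." or "1 1234 ...". Assumes single-space field
    # separators and no regex metacharacters in field 1.
    line2=$(grep -m 1 "^$f1 $f2 " "$if2")
    if [[ -n $line2 ]] ; then
        echo "$line2"
    else
        echo "$f1 $f2 0 0 0 0"
    fi
done < "$if1" > "$of1"   # overwrite tf3.txt instead of appending to it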