It's been a very long time since I've mucked around in Bash or Python so I've pretty much forgotten most of it. I've run into a problem I need to resolve at work. I can do it by hand, but it would take me hours upon hours to do it. I'd like to let the computer do the work for me, if at all possible...I'm just not sure how.
You see, I have two text files, "all.txt" and "address.txt". In the "all.txt" file I have email addresses, first name, and last name (approximately 10,000 lines) like so:
I need to write a script that will read each line of the "address.txt" file, find its corresponding match in the "all.txt" file, and print the whole line (email address, first name, and last name) into a file called "matched.txt". If a line in "address.txt" fails to match any line in "all.txt", then I need it to be printed to a file called "no-match.txt".
Hope this makes sense.
What is the best way to go about this, speed, resource, and accuracy wise?
I tried a few things in Bash and Python, but it isn't working out well. I'm back to being a newbie again.
Any help or advice would be sincerely appreciated!
Code:
#!/usr/bin/env python
from collections import defaultdict
h = defaultdict(str)
addr = open("address.txt").read().split()
for line in open("all.txt"):
    s=line.rstrip().split(" ",1)
    h[s[0]] = line
keys = h.keys()
same = set(addr) and set(keys)
diff = set(addr) - set(keys)
match = open("matched.txt","w")
for found in same:
    match.write( h[found] )
match.close()
nomatch = open("no-match.txt","w")
for no in diff:
    nomatch.write(no)
nomatch.close()
I came back because I found that my bash script didn't actually work 100%.
I saw your post and got excited and tried it out. It was so much faster than mine, but unfortunately it didn't work. I tried it with sample files.
all.txt = 3,102 lines
address.txt = 906 lines
After using your script:
matched.txt = 3,102 lines
no-match.txt = 1 line with many addresses (no new lines)
I skimmed over the files and counted at least 25 "no matches" so the matched.txt file should not equal the number of lines in all.txt.
Thanks though!
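For what it's worth, both symptoms reported here are consistent with two details of the Python script as posted. `set(addr) and set(keys)` is a boolean `and`, which evaluates to its second operand whenever the first is non-empty, so `same` ends up holding every key from all.txt; a set intersection needs `&`. And `nomatch.write(no)` never writes a newline, so all unmatched addresses land on one line. Below is a minimal sketch of a corrected version, not the original poster's code. It assumes one address per line in address.txt and that the address is the first whitespace-separated field of each line in all.txt (the sample data is not shown in the thread). The function names `split_matches` and `match_files` are made up for illustration.

```python
def split_matches(addresses, all_lines):
    """Return (matched_lines, unmatched_addresses), keeping address order."""
    by_address = {}
    for line in all_lines:
        fields = line.split(None, 1)      # assumed layout: address first
        if fields:                        # skip blank lines
            by_address[fields[0]] = line.rstrip("\n")
    matched, unmatched = [], []
    for addr in addresses:
        if addr in by_address:            # exact match on the whole address
            matched.append(by_address[addr])
        else:
            unmatched.append(addr)
    return matched, unmatched


def match_files(address_path="address.txt", all_path="all.txt",
                matched_path="matched.txt", unmatched_path="no-match.txt"):
    """Read both files and write the two result files, one entry per line."""
    with open(address_path) as f:
        addresses = f.read().split()
    with open(all_path) as f:
        matched, unmatched = split_matches(addresses, f)
    with open(matched_path, "w") as out:
        out.writelines(line + "\n" for line in matched)
    with open(unmatched_path, "w") as out:
        out.writelines(addr + "\n" for addr in unmatched)
    return len(matched), len(unmatched)
```

Calling `match_files()` in the directory that holds both input files should produce the two output files. Unlike the set-based script, this keeps the order of address.txt and does not collapse duplicate addresses.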
My bash script was far less elegant, but it almost worked:
Code:
#!/bin/bash
FILE1=address.txt
FILE2=all.txt
while read line; do
    if grep $line $FILE2; then
        echo $line >> matched.txt
    else
        echo $line >> no-matches.txt
    fi
done < $FILE1
With this script I got:
890 matches
15 no matches
A total of 905 out of 906 lines in address.txt ... strange.
I'll have to fiddle more with this. I like your script though; it was much faster and probably lighter on resources, but it was inaccurate.
You only provided a small sample of the files to work with, and my code does work with it.
Why don't you provide more samples of both files? Do all lines have the same structure? Show your expected output as well, if possible. My script is much faster than your bash script because yours needs to call grep for EACH line of address.txt, and every call rescans all.txt (roughly O(n*m) work for n addresses and m lines).
Last edited by ghostdog74; 01-24-2011 at 10:35 PM.
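To make the cost argument concrete, here is a small sketch comparing the two lookup strategies in Python. It is only an illustration under assumptions: `scan_lookup` imitates the grep-per-address loop with a plain substring test (real grep also treats the address as a regular expression and starts a new process each time), and `dict_lookup` assumes the address is the first whitespace-separated field of each line. Both function names are hypothetical.

```python
def scan_lookup(addresses, all_lines):
    """Rescan every line for each address, like calling grep in a loop."""
    found, comparisons = [], 0
    for addr in addresses:
        for line in all_lines:
            comparisons += 1
            if addr in line:              # substring test, not an exact match
                found.append(addr)
                break
    return found, comparisons


def dict_lookup(addresses, all_lines):
    """Build the key set once, then do one hash lookup per address."""
    keys = {line.split(None, 1)[0] for line in all_lines if line.strip()}
    return [addr for addr in addresses if addr in keys]


lines = ["joann@x.com Jo Ann", "bob@x.com Bob Ray"]
addresses = ["ann@x.com", "bob@x.com"]
print(scan_lookup(addresses, lines))   # substring scan also accepts ann@x.com
print(dict_lookup(addresses, lines))   # exact lookup accepts only bob@x.com
```

Besides the extra work, the substring test shows one way a grep loop can report a match that an exact key lookup would not: "ann@x.com" is found inside "joann@x.com".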
There is also the join utility, usually installed as part of the GNU text tools, which combines the lines of two files that share a common field. Note that join expects both files to be sorted on that field. Other useful text tools are presented here: GNU text utilities.