Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-tos, this is the place!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
Is it possible that there are duplicates in one or both of the files? How many lines are in each file exactly?
Code:
wc -l < file
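To check both things at once, here is a minimal sketch that counts a file's lines and prints any line that appears more than once. It assumes each title sits on its own line; the function names `count_lines` and `list_dupes` are made up for illustration.

```shell
#!/bin/bash
# Sketch: count the lines in a file and list any line that occurs more than once.
count_lines() {
    # Reading from stdin keeps the file name out of wc's output;
    # tr strips the leading padding some wc implementations (e.g. BSD) add.
    wc -l < "$1" | tr -d ' '
}

list_dupes() {
    # uniq only collapses *adjacent* repeats, so sort first;
    # -d prints one copy of each line that appears more than once.
    sort "$1" | uniq -d
}

# Uncomment to run against the two files from this thread:
# for f in Backlist HQNlist; do
#     echo "$f: $(count_lines "$f") lines"
#     list_dupes "$f"
# done
```

Note that an exact duplicate only shows up here if the whole line is identical; the same title listed with different ISBNs on the same line would not be caught.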
Plus, for example, let's say a title in file #1 is "The Road", but there is also a title like "The Road Home". The search for the first title will also produce a false positive when it reaches the second title.
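The false positive is easy to reproduce with two made-up lines. The `grep -x` variant below only helps if each line of the searched file contains the title and nothing else, which is an assumption here: a line that also carries an ISBN would then not match at all.

```shell
#!/bin/bash
# Sketch: a substring search for "The Road" also hits "The Road Home".
sample=$(printf 'The Road\nThe Road Home')

# Plain grep matches any line *containing* the pattern: both lines.
printf '%s\n' "$sample" | grep -c "The Road"      # prints 2

# -F: fixed string (no regex); -x: the pattern must match the whole line.
# Only useful if each line holds nothing but the title (an assumption).
printf '%s\n' "$sample" | grep -cFx "The Road"    # prints 1
```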
@colucix: a few days ago, a couple of others and I ran into similar weirdness; it may have been due to different versions of `grep`, but I can't recall exactly.
Would using a sed command to get rid of all possible unwanted characters and spaces in both files make it easier?
My opinion? No, it shouldn't be necessary. When a string variable is quoted, all of its characters *should* be treated as part of a single string.
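One caveat worth knowing: quoting `"$line"` keeps the *shell* from splitting or globbing the title, but grep still interprets the string as a regular expression, so characters like `.`, `*` and `[` are not taken literally. A small demonstration using a made-up title, `M*A*S*H`:

```shell
#!/bin/bash
# Sketch: quoting protects the title from the shell, not from grep's regex rules.
line='M*A*S*H'
titles=$(printf 'M*A*S*H\nHASH')

# As a basic regex, M*A*S*H means "zero or more M, zero or more A,
# zero or more S, then H", so "HASH" matches too.
printf '%s\n' "$titles" | grep -c "$line"     # prints 2

# -F makes grep treat the pattern as a literal string.
printf '%s\n' "$titles" | grep -cF "$line"    # prints 1
```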
Characters such as double quotes might cause problems. If you want to see whether it makes a difference, here's a sed command that removes spaces, empty lines, single and double quotes, and colons, and writes the result to a new file. You could run it on both files and then try the grep again to see if anything changes.
Code:
sed "s/ //g;s/'//g;s/://g;s/\"//g;/^$/d" file > newfile
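To see what that sed does before running it on the real lists, here it is wrapped in a hypothetical `clean` function and fed a few made-up lines:

```shell
#!/bin/bash
# Sketch: the same sed expression applied to sample input.
# It deletes spaces, single quotes, colons and double quotes, then drops
# any line left empty (including lines that held only spaces).
clean() {
    sed "s/ //g;s/'//g;s/://g;s/\"//g;/^$/d"
}

printf '%s\n' "The Road: A Novel" "" "Ender's \"Game\"" | clean
# prints:
# TheRoadANovel
# EndersGame
```

Because spaces are removed as well, the cleaned titles only match other cleaned titles, so both files need the same treatment before comparing them.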
I have to say that the grep works fine for me too, using the dummy data in colucix's post.
Also, I prefer the following format for the while loop, because piping into it creates a subshell, which has given me grief in the past:
Code:
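#!/bin/bash
# Read Backlist one title per line and search HQNlist for each.
# IFS= keeps leading/trailing spaces; -r stops read from eating backslashes.
# Redirecting Backlist into the loop keeps it in the current shell.
while IFS= read -r line
do
    grep -- "$line" HQNlist
done < Backlist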
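The subshell issue is easy to demonstrate with a counter. A minimal sketch, using a throwaway temporary file in place of Backlist:

```shell
#!/bin/bash
# Sketch: the two loop styles side by side on a three-line file.
tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"

# Piped version: in bash the loop runs in a subshell (unless the lastpipe
# option is on), so the increments are lost when the pipeline ends.
count=0
cat "$tmp" | while IFS= read -r line; do
    count=$((count + 1))
done
after_pipe=$count
echo "after pipe: $after_pipe"            # prints "after pipe: 0"

# Redirected version: the loop runs in the current shell, so count survives.
count=0
while IFS= read -r line; do
    count=$((count + 1))
done < "$tmp"
after_redirect=$count
echo "after redirect: $after_redirect"    # prints "after redirect: 3"

rm -f "$tmp"
```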
I didn't think there were any duplicates, but I just caught one while scrolling through HQNlist. What's worse, there are three entries for the same title, each with a different ISBN.
smh
I want to strangle my boss.
...and there's another duplicate.
grep won't return multiple lines if there are duplicates?
Yes, it *will* return multiple lines: every line that matches the regex is returned. I thought that was the "problem", i.e. why there appeared to be more lines returned than were actually present in the HQN file.
But there would need to be a WHACK of dupes in one or both files, or a WHACK of cases like my "The Road Home" example, for many more results to be returned than there are titles in the HQN file.
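Since grep prints every matching line, one way to see where the extra results come from is to count matches per title. A sketch under the assumption that Backlist holds one title per line; the function name `noisy_titles` is hypothetical:

```shell
#!/bin/bash
# Sketch: print "count<TAB>title" for every title in the first file that
# matches more than one line of the second file.
noisy_titles() {
    local title n
    while IFS= read -r title; do
        # An empty pattern matches every line, so skip blank lines.
        [ -z "$title" ] && continue
        # -F: literal match; -c: count matching lines.
        # grep exits 1 when the count is 0, hence "|| true".
        n=$(grep -cF -- "$title" "$2") || true
        if [ "$n" -gt 1 ]; then
            printf '%s\t%s\n' "$n" "$title"
        fi
    done < "$1"
}

# noisy_titles Backlist HQNlist
```

A blank line in Backlist is worth checking for in its own right: with the original loop, `grep ""` would print every line of HQNlist.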
OIC.
If there is a title "The Road", it will return both "The Road" and "The Road Home".
I'm thinking I need a new approach to extracting these numbers. Or I could just find each one manually; I'd probably be half done by now.
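One possible new approach, sketched under a loudly hypothetical assumption about the file layout (HQNlist as tab-separated `title<TAB>ISBN` records, Backlist as one bare title per line; the function name is also made up): let awk load Backlist into an array, then print only the HQNlist records whose title field is exactly one of those titles.

```shell
#!/bin/bash
# Sketch: exact title matching with awk instead of one grep per title.
# ASSUMPTION (hypothetical layout): HQNlist lines look like "title<TAB>ISBN"
# and Backlist has one bare title per line. Backlist must not be empty,
# or the NR == FNR test would also be true for HQNlist.
isbns_for_titles() {
    awk 'BEGIN { FS = "\t" }
         NR == FNR { if ($0 != "") want[$0] = 1; next }  # first file: titles to look for
         ($1 in want) { print $1 FS $2 }                  # second file: exact title match only
    ' "$1" "$2"
}

# isbns_for_titles Backlist HQNlist > matches.txt
```

Because the comparison is exact rather than a substring search, "The Road" no longer pulls in "The Road Home", and every entry of a duplicated title is printed with its ISBN so the duplicates can be checked by hand.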