LinuxQuestions.org
Old 07-23-2010, 09:33 AM   #16
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983

Quote:
Originally Posted by hsp40oz View Post
This returned 5795 lines, which is kind of strange, since that's a few hundred more than the HQNlist contains.
You sure there aren't duplicate titles in HQNlist?
 
Old 07-23-2010, 09:33 AM   #17
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 555
Is it possible that there are duplicates in one or both of the files? How many lines are in each file exactly?

Code:
cat file | wc -l
Plus, for example, let's say a title in file #1 is "The Road", but there is also a title like "The Road Home". The search for the 1st title will also get a false positive when it encounters the second title.
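
A minimal sketch of that false positive, using two made-up titles in a throwaway file (the real HQNlist layout is not shown in this thread). The -F and -x options make grep compare the title as a fixed string against the whole line, which only helps if each line holds nothing but the title:

```shell
#!/bin/bash
# Hypothetical sample data, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' 'The Road' 'The Road Home' > "$tmp/HQNlist"

# Plain grep treats the title as a pattern and matches it anywhere in a line.
loose=$(grep -c 'The Road' "$tmp/HQNlist")

# -F: fixed string, not a regex; -x: the whole line must match.
exact=$(grep -c -F -x 'The Road' "$tmp/HQNlist")

echo "substring matches: $loose"
echo "whole-line matches: $exact"
rm -rf "$tmp"
```

Here the plain search counts both lines, while the -F -x search counts only the exact title.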
 
Old 07-23-2010, 09:35 AM   #18
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 555
@ colucix - a few days ago, a couple of others and I found similar weirdness, and it may have been due to different versions of `grep`, but I cannot recall exactly.
 
Old 07-23-2010, 09:35 AM   #19
hsp40oz
LQ Newbie
 
Registered: Jul 2010
Posts: 9

Original Poster
Rep: Reputation: 0
Would using a sed command to get rid of all possible unwanted characters and spaces in both files make it easier?
 
Old 07-23-2010, 09:40 AM   #20
hsp40oz
LQ Newbie
 
Registered: Jul 2010
Posts: 9

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by GrapefruiTgirl View Post
Is it possible that there are duplicates in one or both of the files? How many lines are in each file exactly?

Code:
cat file | wc -l
Plus, for example, let's say a title in file #1 is "The Road", but there is also a title like "The Road Home". The search for the 1st title will also get a false positive when it encounters the second title.
$ cat Backlist | wc -l
855
$ cat HQNlist | wc -l
5345
 
Old 07-23-2010, 09:49 AM   #21
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 555
Quote:
Originally Posted by hsp40oz View Post
Would using a sed command to get rid of all possible unwanted characters and spaces in both files make it easier?
My opinion? No. It shouldn't be necessary. When a variable is double-quoted, the shell passes its whole value, spaces included, to grep as a single argument. grep still reads that argument as a regex, though, so characters like . * [ ] keep their special meaning unless you add -F.

Characters such as double quotes might cause problems. If you want to see whether there's a difference, here's a sed command that removes spaces, empty lines, single and double quotes, and colons, and writes the results to a new file. You could run it on each file (with a different output name for each), and then try the grep again to see if there's any difference.

Code:
sed "s/ //g;s/'//g;s/://g;s/\"//g;/^$/d" file > newfile
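
To see what that sed does before running it on the real lists, here is a small sketch on made-up titles (the file names under the temporary directory are hypothetical). It writes with > rather than >> so a second run does not append to the output of the first:

```shell
#!/bin/bash
# Made-up sample titles, only to show what each sed expression strips.
tmp=$(mktemp -d)
printf '%s\n' "The Road: A Novel" "" "It's \"Home\"" > "$tmp/titles"

# Same expressions as above: drop spaces, single quotes, colons and
# double quotes, then delete lines that are empty.
sed "s/ //g;s/'//g;s/://g;s/\"//g;/^$/d" "$tmp/titles" > "$tmp/titles.clean"

cleaned=$(cat "$tmp/titles.clean")
printf '%s\n' "$cleaned"
rm -rf "$tmp"
```

The three input lines come out as two: TheRoadANovel and ItsHome. Note that squeezing the titles together like this does nothing about substring matches such as "The Road" inside "The Road Home".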
 
Old 07-23-2010, 09:54 AM   #22
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
Have to say that the grep works just fine for me too, using the dummy data in colucix's post.
Also, I prefer the following format for the while loop, since piping into the loop runs it in another shell (a subshell), which has given me grief in the past:
Code:
#!/bin/bash

while read line
do
	grep "$line" HQNlist
done< Backlist
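
The kind of grief a piped loop can cause is easy to reproduce. A minimal sketch, assuming bash with default settings and a made-up three-line input file: a counter incremented inside a piped loop is lost when the loop ends, while the redirected form keeps it.

```shell
#!/bin/bash
# Hypothetical three-line input file.
tmp=$(mktemp -d)
printf '%s\n' one two three > "$tmp/Backlist"

# Pipe: the while loop runs in a subshell, so its counter is lost afterwards.
piped=0
cat "$tmp/Backlist" | while read -r line; do
    piped=$((piped + 1))
done

# Redirect: the loop runs in the current shell and the counter survives.
redirected=0
while read -r line; do
    redirected=$((redirected + 1))
done < "$tmp/Backlist"

echo "after pipe: $piped"
echo "after redirect: $redirected"
rm -rf "$tmp"
```

With bash defaults the first counter is still 0 after the loop and the second is 3. If the loop only prints or appends to a file, as the grep loop here does, the pipe form behaves the same as the redirect.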
 
Old 07-23-2010, 09:55 AM   #23
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 555
@ grail - that's a good idea, and one I should get into the habit of using rather than the pipe. Sometimes the pipe works, and sometimes not!
 
Old 07-23-2010, 10:15 AM   #24
hsp40oz
LQ Newbie
 
Registered: Jul 2010
Posts: 9

Original Poster
Rep: Reputation: 0
I did not think there were any duplicates, but I just caught one while scrolling down through the HQNlist. What's worse, there are 3 entries of the same title, each with a different ISBN.

smh


I want to strangle my boss.


...and there's another duplicate.

grep won't return multiple lines if there are duplicates?
 
Old 07-23-2010, 10:17 AM   #25
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
here's my stab at it:
Code:
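#sort Backlist | uniq > Backlist.sort; mv Backlist.sort Backlist #thanks sasha
# count every line (not just non-empty ones) so "$i"p below reaches the last line
size=$(wc -l < Backlist)
i=1
while [ $i -le $size ]
do
 # pull out line number $i of Backlist
 title=$(sed -n "${i}p" Backlist)
 # skip empty lines: an empty pattern would match every line of HQNlist
 [ -n "$title" ] && grep "$title" HQNlist
 # echo `grep "$title" HQNlist` - count: `grep -c "$title" HQNlist` #thanks sasha
 i=$((i + 1))
done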

Last edited by schneidz; 07-23-2010 at 10:52 AM.
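
The same lookup can also be done in one pass, sketched here under the assumption of a POSIX-style grep: -f reads one pattern per line from a file and -F treats each as a fixed string. One thing to watch, and a possible source of an inflated result count, is that an empty pattern matches every line, so blank lines are filtered out of the pattern list first. The file contents and the Backlist.noblank name are made up for the example.

```shell
#!/bin/bash
tmp=$(mktemp -d)
# Hypothetical data; note the blank line in the middle of Backlist.
printf '%s\n' 'The Road' '' 'Home Again' > "$tmp/Backlist"
printf '%s\n' '111 The Road' '222 Something Else' '333 Home Again' > "$tmp/HQNlist"

# An empty pattern matches every line, so this counts all of HQNlist.
all=$(grep -c '' "$tmp/HQNlist")

# Drop blank lines from the pattern list, then match all titles in one pass.
grep . "$tmp/Backlist" > "$tmp/Backlist.noblank"
hits=$(grep -F -f "$tmp/Backlist.noblank" "$tmp/HQNlist")

echo "lines matched by an empty pattern: $all"
printf '%s\n' "$hits"
rm -rf "$tmp"
```

Each matching HQNlist line is printed once, however many titles it matches. It is still a substring match, so "The Road" would also pull in "The Road Home".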
 
Old 07-23-2010, 10:18 AM   #26
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by GrapefruiTgirl View Post
Code:
echo "$(cat Backlist)" | while read line; do
   grep -e "${line}" HQNlist >> results.omfg
done
You don't have to create a subshell with a pipe; redirect the file into the loop instead.

Code:
while read -r line
do
  ......
done <Backlist
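
For what -r buys you, here is a small sketch with one made-up title containing a leading space and a backslash. Plain read trims the whitespace and swallows the backslash; adding IFS= and -r hands the line over untouched.

```shell
#!/bin/bash
tmp=$(mktemp -d)
# One hypothetical title with a leading space and a backslash.
printf '%s\n' ' Back\slash Road' > "$tmp/Backlist"

# Plain read: strips leading/trailing whitespace and eats the backslash.
read line < "$tmp/Backlist"
plain=$line

# IFS= keeps the whitespace, -r keeps the backslash.
IFS= read -r line < "$tmp/Backlist"
raw=$line

printf 'plain: [%s]\n' "$plain"
printf 'raw:   [%s]\n' "$raw"
rm -rf "$tmp"
```

The plain read yields "Backslash Road", while the raw read keeps the line exactly as it is in the file.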
 
Old 07-23-2010, 10:21 AM   #27
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 555
Quote:
Originally Posted by hsp40oz View Post
grep won't return multiple lines if there are duplicates?
Yes, it *will* return the multiple lines -- every line that matches the regex will be returned -- which I thought was the "problem", i.e. why there appeared to be more lines returned than were actually present in the HQN file.

But, there'd need to be a WHACK of dupes in one or both files, or a WHACK of examples like I gave about "The Road Home", for there to be many more results returned than titles present in the HQN file..
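
A quick way to check for dupes, sketched on a made-up list: compare the total line count with the unique count, and let uniq -d name the repeated lines. This only catches lines that are identical in full, so the same title listed under different ISBNs would not show up.

```shell
#!/bin/bash
tmp=$(mktemp -d)
# Hypothetical list with one repeated title.
printf '%s\n' 'The Road' 'Home' 'The Road' > "$tmp/Backlist"

total=$(wc -l < "$tmp/Backlist")
unique=$(sort -u "$tmp/Backlist" | wc -l)

# uniq only compares adjacent lines, so sort first; -d prints repeated lines once.
dupes=$(sort "$tmp/Backlist" | uniq -d)

echo "total lines: $((total)), unique lines: $((unique))"
echo "repeated: $dupes"
rm -rf "$tmp"
```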
 
Old 07-23-2010, 10:34 AM   #28
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
^ fyi: i added some comments to my reply #25 to help mitigate duplicates.
 
Old 07-23-2010, 10:38 AM   #29
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 555
EgaDs schneidz, I wonder if there is a more horrid color you could have chosen for those comments?
 
Old 07-23-2010, 10:43 AM   #30
hsp40oz
LQ Newbie
 
Registered: Jul 2010
Posts: 9

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by GrapefruiTgirl View Post
Yes, it *will* return the multiple lines -- every line that matches the regex will be returned -- which I thought was the "problem", i.e. why there appeared to be more lines returned than were actually present in the HQN file.

But, there'd need to be a WHACK of dupes in one or both files, or a WHACK of examples like I gave about "The Road Home", for there to be many more results returned than titles present in the HQN file..
OIC.

If there is a title "The Road", it will return "The Road" and also "The Road Home".

I'm thinking that I need to figure out a new approach to extracting these numbers. Or just find each manually, I'd probably be half done by now.
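
One possible new approach, sketched under a big assumption: that each HQNlist line is laid out as a number, a tab, then the title. The thread never shows the real layout, so the separator, the field number and the placeholder numbers below (which are not real ISBNs) are all hypothetical. The idea is to compare the title field for exact equality instead of searching for it as a substring.

```shell
#!/bin/bash
# ASSUMPTION: HQNlist lines look like "NUMBER<TAB>Title"; adjust -F and $2
# to whatever the real file uses.
tmp=$(mktemp -d)
printf '%s\n' 'The Road' > "$tmp/Backlist"
printf '%s\t%s\n' \
    '111' 'The Road' \
    '222' 'The Road Home' \
    '333' 'The Road' > "$tmp/HQNlist"

# First file (NR == FNR): remember every Backlist title.
# Second file: print lines whose title field is exactly one of them.
matches=$(awk -F '\t' 'NR == FNR { want[$0]; next } $2 in want' \
    "$tmp/Backlist" "$tmp/HQNlist")

printf '%s\n' "$matches"
rm -rf "$tmp"
```

This prints the 111 and 333 lines and skips "The Road Home". Duplicate entries with different numbers are all kept, so they can be reviewed by hand.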
 
  

