LinuxQuestions.org
06-17-2004, 12:03 AM   #1
funkymunky
Member
 
Registered: Jun 2003
Location: Austin, Texas
Distribution: Fedora Core 8, 32-bit
Posts: 126

Rep: Reputation: 15
Shell Script: Searching for words in PDF files


Hi!

I've stumbled upon a treasure trove of ebooks, 5 GB in all, on a local hard drive. Since they are not arranged by topic, I want to write a shell script that will sort them into topical folders.
For this I need a utility that reads a PDF file and prints its text, so that I can grep the output for keywords and move the file to the appropriate folder.
Does anyone here know of such a utility? Something like a "cat" for PDF?

Thanks in anticipation,

Mayank

Last edited by funkymunky; 06-17-2004 at 12:49 AM.
 
06-17-2004, 12:22 AM   #2
320mb
Senior Member
 
Registered: Nov 2002
Location: pikes peak
Distribution: Slackware, LFS
Posts: 2,577

Rep: Reputation: 47
http://man.linuxquestions.org/index....ction=0&type=2

Code:
man awk   # or: man gawk
 
06-17-2004, 12:36 AM   #3
funkymunky
Original Poster
Thanks, that's one way. Any other options?
 
06-17-2004, 01:08 AM   #4
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris10, Solaris 11, Mint, OL
Posts: 9,482

Rep: Reputation: 354
pdftotext should help (http://www.foolabs.com/xpdf/about.html)
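
A minimal sketch of how pdftotext could serve as the "cat for PDF" asked about above, assuming the pdftotext command from xpdf (or poppler-utils) is installed. The function name pdf_grep and the file name book.pdf are made up for illustration.

```shell
#!/bin/sh
# pdf_grep FILE PATTERN
# Print the lines of FILE's extracted text that match PATTERN, ignoring case.
# Assumes pdftotext (xpdf or poppler-utils) is on the PATH.
pdf_grep() {
    # "-" as the output name makes pdftotext write the text to stdout.
    pdftotext "$1" - 2>/dev/null | grep -i -- "$2"
}

# Example (book.pdf is a placeholder file name):
#   pdf_grep book.pdf "linux kernel"
```

The function returns grep's exit status, so it can also be used directly as the condition of an if statement.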
 
06-17-2004, 02:57 PM   #5
keefaz
Senior Member
 
Registered: Mar 2004
Distribution: Slackware
Posts: 4,338

Rep: Reputation: 73
grep can also search a PDF file directly:
grep -i stringToFind docToSearch.pdf
 
06-17-2004, 07:04 PM   #6
jlliagre
Keefaz,

PDF files are binary files in which the strings are encoded, so grep is unlikely to find anything but gibberish in them ...
 
06-17-2004, 10:25 PM   #7
funkymunky
Original Poster
Yes, that's why I was looking for a utility. pdftotext is fine for my use; it takes the starting and ending pages as arguments. I can write the text to a temporary file and then grep that file for strings.

Thanks, jlliagre.
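
The plan described above (convert a page range to a temporary text file, grep it, move the PDF) might be sketched as follows. This is only an illustration: the topic/keyword pairs, the function names topic_for and sort_pdfs, the PAGES default of 10 and the example path are all assumptions, and only the -f and -l page options of pdftotext are relied on.

```shell
#!/bin/sh
# Number of leading pages to convert (an arbitrary default for this sketch).
PAGES=${PAGES:-10}

# topic_for FILE: print the first topic whose keyword occurs in the first
# PAGES pages of FILE, or "unsorted" when no keyword matches.
topic_for() {
    tmp=$(mktemp) || return 1
    # -f and -l select the first and last page to convert.
    # A failed conversion leaves the temp file empty, giving "unsorted".
    pdftotext -f 1 -l "$PAGES" "$1" "$tmp" 2>/dev/null || :
    topic=unsorted
    # Hypothetical topic:keyword pairs; edit them to suit the collection.
    while IFS=: read -r name keyword; do
        if grep -qi -- "$keyword" "$tmp"; then
            topic=$name
            break
        fi
    done <<EOF
networking:tcp/ip
programming:compiler
databases:sql
EOF
    rm -f "$tmp"
    printf '%s\n' "$topic"
}

# sort_pdfs DIR: move every PDF in DIR into DIR/<topic>/.
sort_pdfs() {
    for f in "$1"/*.pdf; do
        [ -e "$f" ] || continue
        t=$(topic_for "$f")
        mkdir -p "$1/$t" && mv -- "$f" "$1/$t/"
    done
}

# Example (the path is a placeholder):
#   sort_pdfs /path/to/ebooks
```

Reading only the first few pages keeps the run short on a large collection, at the cost of missing keywords that appear later in a book.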
 
06-18-2004, 02:28 PM   #8
keefaz
jlliagre,
I know that PDF files are binary files, lol, but grep can still test whether a string is in a PDF file. Try it yourself:

grep -i "a string" yourfile.pdf

Look at the output; grep can do the test.
[edit]
If nothing is printed, run echo $?: a nonzero exit status means the test is false and the string was not found.

Last edited by keefaz; 06-18-2004 at 02:31 PM.
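
The exit-status test mentioned in this post can be sketched as below. grep exits with 0 when it finds a match, 1 when it finds none, and 2 on an error. The function name contains_string and the file name notes.txt are placeholders, and the check only looks at the bytes as stored in the file, so on a PDF a miss does not prove the text is absent.

```shell
#!/bin/sh
# contains_string FILE STRING
# Succeeds (exit status 0) when FILE contains STRING, ignoring case.
# -q suppresses output; -F treats STRING as literal text, not a regex.
contains_string() {
    grep -qiF -- "$2" "$1"
}

# Example (notes.txt is a placeholder file name):
#   if contains_string notes.txt "a string"; then
#       echo "found"
#   else
#       echo "not found (grep exit status $?)"
#   fi
```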
 
06-18-2004, 03:34 PM   #9
jlliagre
Keefaz,

I understand you happened to test grep successfully on a file that, luckily, was neither encoded nor compressed, but that doesn't make your suggestion correct.
Even in your case, strings embedded in PDFs can be split at arbitrary positions, so you are likely to miss many hits.
Moreover, eBooks are usually compressed, for obvious reasons, so grep is not an option there.

Thanks for trying.
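
The compression point can be illustrated without any PDF tooling. In the rough sketch below, gzip stands in for the Flate compression commonly used for streams inside PDF files; that substitution is an assumption made for the demonstration, and a real PDF wraps its compressed streams in further structure. The variable and function names are made up.

```shell
#!/bin/sh
# Show why grep on raw bytes misses text stored in a compressed stream.
# gzip is a stand-in here for the Flate compression used inside many PDFs.
raw=$(mktemp)
packed=$(mktemp)

# Write 200 lines that all contain the word "shell".
i=1
while [ "$i" -le 200 ]; do
    printf 'Chapter %d: shell scripting basics\n' "$i"
    i=$((i + 1))
done > "$raw"
gzip -c < "$raw" > "$packed"

# count_hits FILE: number of lines in FILE matching "shell" as raw bytes.
# -a makes grep treat binary data as text so both counts are comparable.
count_hits() {
    LC_ALL=C grep -aci -- "shell" "$1"
}

# grep exits 1 when the count is 0, so "|| true" keeps the script
# alive if it is run under "set -e".
plain_hits=$(count_hits "$raw" || true)
packed_hits=$(count_hits "$packed" || true)
unpacked_hits=$(gzip -dc "$packed" | grep -ci -- "shell" || true)

echo "plain file:          $plain_hits"
echo "compressed file:     $packed_hits"
echo "after decompressing: $unpacked_hits"
rm -f "$raw" "$packed"
```

The text only becomes searchable again after decompression, which is the job a converter such as pdftotext does for PDF content.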
 
  


All times are GMT -5.
