LinuxQuestions.org
04-28-2013, 08:02 AM   #1
Hoxygen232
LQ Newbie
 
Registered: Jan 2013
Posts: 28

Rep: Reputation: Disabled
How to search contents of multiple pdf files and return the pdf's file name?


Hi,

I tried this:
Code:
PDF=$(find /"$DIRECTORY"/ -name '*.pdf' -exec pdftotext {} - \; | grep 'palindrom')
With this, echo "$PDF" prints only the matching lines of text from the .pdf files that contain my word "palindrom", but I also want to know the name of the .pdf file in which the word was found.

In /"$DIRECTORY"/ there are many folders, .pdf files and .txt files, so I need to return only the .pdf files whose text conversion matches my word "palindrom".
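
A small sketch of why the file name goes missing in the command above: find runs pdftotext once per file, but all of that output is merged into a single pipe, so grep only ever sees one anonymous stream. In the sketch, plain text files stand in for the output of pdftotext, and the file names are made up for illustration. -H and --label are GNU grep options, so this assumes GNU grep.

```shell
#!/bin/sh
# Plain text files stand in for the output of "pdftotext FILE -" (assumption for the demo).
demo=$(mktemp -d)
printf 'nothing to see\n' > "$demo/a.txt"
printf 'a palindrom reads the same both ways\n' > "$demo/b.txt"

# One merged stream, as in the original command:
# grep cannot tell which file a matching line came from.
merged=$(cat "$demo/a.txt" "$demo/b.txt" | grep 'palindrom')

# One grep per file: -H prints a file name with each match,
# and --label tells grep what to call its standard input.
labeled=$(for f in "$demo"/*.txt; do
    # grep exits with status 1 when nothing matches; that is not an error here.
    cat "$f" | grep -H --label="$f" 'palindrom' || true
done)

printf 'merged:  %s\n' "$merged"
printf 'labeled: %s\n' "$labeled"
rm -rf "$demo"
```

The second form only works because grep is started once per file, which is the part the single pipeline cannot do.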


Thanks

Last edited by Hoxygen232; 04-28-2013 at 08:04 AM.
 
04-28-2013, 08:16 AM   #2
pan64
Senior Member
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 4,953

Rep: Reputation: 1309
Actually, you can use grep -R or grep -r without find, and it will print all the file names for you.
 
04-28-2013, 08:55 AM   #3
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 1,511

Rep: Reputation: 626
@pan64: Have you ever looked inside a PDF file? The text that gets displayed is not there in plain ASCII.
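
A rough analogy for the point above, not a real PDF: many PDF writers compress the page content, so the visible words are not stored as plain bytes. Here gzip stands in for that encoding step (an assumption made purely for the demo), and the temporary file names are made up.

```shell
#!/bin/sh
# Analogy only: gzip stands in for the compression a PDF writer may apply to page content.
tmpdir=$(mktemp -d)

# Build a small text file that repeats the search word 50 times.
i=0
while [ "$i" -lt 50 ]; do
    echo 'a palindrom reads the same both ways'
    i=$((i + 1))
done > "$tmpdir/plain.txt"

gzip -c < "$tmpdir/plain.txt" > "$tmpdir/packed.bin"

# grep on the raw bytes: the word is not there to be found (grep exits 1, hence "|| true").
raw_hits=$(grep -c 'palindrom' "$tmpdir/packed.bin") || true

# Decode first, then grep: every line matches.
decoded_hits=$(gzip -dc < "$tmpdir/packed.bin" | grep -c 'palindrom')

echo "raw: $raw_hits, decoded: $decoded_hits"
rm -rf "$tmpdir"
```

pdftotext plays the role of the decoding step for PDF files, which is why grep -r on the .pdf files themselves is not enough.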

@Hoxygen232: You will need to write a short script to do what you want:
Code:
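#!/bin/bash
# Usage: pdfgrep PATTERN FILE.pdf...
String="$1"
shift
for F in "$@"; do
    # Write the extracted text to stdout ("-"); grep labels its stdin with the PDF's name.
    pdftotext "$F" - | grep -H --label="$F" -- "$String"
done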
Put that in a file named "pdfgrep" in some directory in your $PATH and make it executable. Then you can do:
Code:
find /"$DIRECTORY"/ -name '*.pdf' -exec pdfgrep 'palindrom' {} +
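
If only the file names are wanted (the original question asked for the matching .pdf files rather than the matching lines), a variant along these lines may help. It is a sketch under assumptions: list_pdfs_matching is a made-up helper name, pdftotext (shipped with Poppler or Xpdf) has to be installed, and grep -q is used so that only the exit status decides whether a path is printed.

```shell
#!/bin/sh
# list_pdfs_matching PATTERN DIR   (hypothetical helper name)
# Print the path of every *.pdf under DIR whose extracted text matches PATTERN.
list_pdfs_matching() {
    # find hands batches of PDF paths to a small inline sh script.
    # Inside that script, $1 is the pattern and the remaining arguments are PDF paths.
    find "$2" -type f -name '*.pdf' -exec sh -c '
        pattern=$1
        shift
        for f in "$@"; do
            # grep -q prints nothing; its exit status says whether the text matched.
            if pdftotext "$f" - | grep -q -- "$pattern"; then
                printf "%s\n" "$f"
            fi
        done
    ' sh "$1" {} +
}
```

After sourcing or pasting the function, the variable from the first post could be filled with PDF=$(list_pdfs_matching 'palindrom' "$DIRECTORY"), which leaves one matching path per line in "$PDF".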
 
04-28-2013, 08:59 AM   #4
Hoxygen232
LQ Newbie
 
Registered: Jan 2013
Posts: 28

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64
actually you can use grep -R or grep -r without find and it will print all the filenames for you
OK, but how? Besides, I need to convert every PDF to text before using grep; that's why I used find.
 
04-28-2013, 09:39 AM   #5
Hoxygen232
LQ Newbie
 
Registered: Jan 2013
Posts: 28

Original Poster
Rep: Reputation: Disabled
@rknichols: Perfect, it works.
 
  


All times are GMT -5.