[SOLVED] is there a way to grep thru *.pdf

atjurhs · 01-15-2016, 11:11 AM

hey guys, I have a directory with 20+ pdf files and I need to find info within 1 or more of them. the info will contain a specific string, let's say aBcDeFg. I know

[code\] grep -isnr aBcDeFg *.pdf [code]

won't work. is there something that will grep thru a bunch of pdf files?

thanks!

tabby

schneidz · 01-15-2016, 11:13 AM

maybe strings can help?

Tonus · 01-15-2016, 12:11 PM

Would it be OK to convert them to text first?

TB0ne · 01-15-2016, 12:15 PM

Quote:

Originally Posted by atjurhs

hey guys, I have a directory with 20+ pdf files and I need to find info within 1 or more of them. the info will contain a specific string, let's say aBcDeFg. I know

[code\] grep -isnr aBcDeFg *.pdf [code]

won't work. is there something that will grep thru a bunch of pdf files?

thanks!

tabby

First, the CODE tags are [ CODE] to start, and [/ CODE] to stop.

As far as your issue goes, try pdftotext (part of poppler-utils on most distros), which will convert a PDF file into plain text...which you can then pump through grep for a string. A simple loop:

Code:

 for file in /pdf/path/*.pdf; do pdftotext "$file"; done

..will convert them all into text, writing a .txt file next to each PDF. Then change the 'pdftotext' to be your grep, and loop over the .txt files instead:

Code:

 for file in /pdf/path/*.txt; do grep -H aBcDeFg "$file"; done
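
If you'd rather not leave .txt files lying around, the two loops can probably be folded into one. This is only a rough sketch: it assumes pdftotext is installed and relies on its "-" output name to send the text to stdout. The function name search_pdfs is made up for illustration, and /pdf/path is the same placeholder as above.

```shell
#!/bin/sh
# Sketch: search every PDF in a directory without writing .txt files.
# Assumes pdftotext (poppler-utils) is available; search_pdfs is a
# hypothetical helper name, not part of any package.

search_pdfs() {
    pattern=$1
    dir=$2
    for file in "$dir"/*.pdf; do
        # If the glob matched nothing, "$file" is the literal pattern; skip it.
        [ -e "$file" ] || continue
        # "-" as the output file makes pdftotext print the text to stdout.
        # grep -i ignores case, -n adds the line number within the text.
        pdftotext "$file" - 2>/dev/null |
            grep -i -n -- "$pattern" |
            while IFS= read -r line; do
                # Prefix each hit with the PDF it came from.
                printf '%s:%s\n' "$file" "$line"
            done
    done
}

# Example call (placeholder path):
#   search_pdfs aBcDeFg /pdf/path
```

Each hit comes out as file:line:text, where the line number refers to the extracted text, not to a page in the PDF. It still converts every file on every search, so if you search the same PDFs often, keeping the .txt files around is likely the faster option.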

atjurhs · 01-15-2016, 12:50 PM

TB0ne, thanks for your help! I'll give it a try...

and thanks for the code tags, I always forget them, i'll put them on a sticky...

tabby

atjurhs · 01-15-2016, 01:12 PM

yep, that worked, took a little while to convert them, thanks

tabby