LinuxQuestions.org

bran 10-07-2004 10:53 AM

searching content of pdf documents
 
Hi there

I was wondering if anyone knows of software that lets me search for keywords across PDF, OpenOffice and text documents at the same time. In other words, something like Google for the contents of my local disk.

Thanks in advance,

bran

kaise_sose 10-08-2004 01:41 AM

You can use the grep command:

grep [options] "thing you are looking for" "file(s) to look in"

e.g.

> grep -r "asdf" /home/

would look for the string "asdf" in /home/ and all of its subdirectories.

AFAIK this will only work on plain text files. grep will still read every other file, but if the contents aren't plain text it just sees garbage.
I don't think it will work on PDFs, since their text is usually stored compressed.

It won't pull anything out of OpenOffice docs directly either: each one is a zip archive, so the text has to be unpacked before grep can match it.
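
For OpenOffice files, a rough sketch along these lines might do the job. It assumes the unzip tool is installed and that each document keeps its text in content.xml inside the archive (the case for .sxw and .odt files as far as I know). The function name search_ooo is made up for this example, and file names containing newlines are not handled.

```shell
# search_ooo WORD [DIR]
# Print every OpenOffice document under DIR (default: current directory)
# whose text contains WORD, ignoring case.
# Assumption: each document is a zip archive holding its text in content.xml.
search_ooo() {
    word=$1
    dir=${2:-.}
    find "$dir" -type f \( -name '*.sxw' -o -name '*.odt' \) |
    while IFS= read -r doc; do
        # unzip -p writes the named archive member to stdout;
        # sed swaps XML tags for spaces so grep only sees the text.
        if unzip -p "$doc" content.xml 2>/dev/null |
            sed 's/<[^>]*>/ /g' | grep -qi -- "$word"; then
            printf '%s\n' "$doc"
        fi
    done
}
```

Calling search_ooo someword /home/ would then list the matching documents. Swapping tags for spaces is crude: a word that is split by formatting tags in the middle will not match.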

maroonbaboon 10-08-2004 07:36 AM

For PDF there is a tool called 'pdftotext' which extracts the text from a PDF file. Then you can use grep as already described, e.g.

pdftotext somefile.pdf - | grep -i someword

Not sure what you can do with an OpenOffice file.
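
To run the pdftotext approach over a whole directory tree instead of a single file, something like the sketch below could work. It assumes pdftotext is installed and that file names contain no newlines; the function name search_pdfs is hypothetical.

```shell
# search_pdfs WORD [DIR]
# Print every PDF under DIR (default: current directory)
# whose extracted text contains WORD, ignoring case.
search_pdfs() {
    word=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.pdf' |
    while IFS= read -r pdf; do
        # "-" tells pdftotext to write the text to stdout instead of a file.
        if pdftotext "$pdf" - 2>/dev/null | grep -qi -- "$word"; then
            printf '%s\n' "$pdf"
        fi
    done
}
```

For example, search_pdfs someword /home/ prints the path of each PDF that mentions the word. Every file is converted again on each search, so this will be slow on a large collection.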

justin_p 10-08-2004 08:35 AM

The problem with PDFs is that they are often scanned in, so each page is just an image and there is no text in it to extract or search. I'll have to check out the pdftotext thing.
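
One way to tell whether a given PDF is a scan of that kind is to check whether pdftotext gets any letters or digits out of it at all. This is only a heuristic sketch, assuming pdftotext is installed; the function name has_text_layer is made up here.

```shell
# has_text_layer FILE
# Succeeds (exit status 0) if pdftotext extracts at least one letter or
# digit from FILE, and fails otherwise. An image-only scan usually yields
# nothing but whitespace and page breaks.
has_text_layer() {
    pdftotext "$1" - 2>/dev/null | grep -q '[[:alnum:]]'
}
```

Used as: has_text_layer somefile.pdf || echo "somefile.pdf looks like a scan"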


All times are GMT -5.