[SOLVED] grep search does not find Libre files

byebyemrgates · 10-18-2018, 10:38 PM

Can someone please explain to this newbie why I can't find an .odt file by using grep to search its contents? For example:

Code:

  ~/Desktop $ grep -iRl "dear" ./
./TA response
./Installers/Acrobat_2015_Web_WWMUI.exe

As you can see, I found two files: the text file "TA response" on the Desktop and something inside the .exe installer.
However, I have a couple of .odt files on the Desktop which contain the word "dear". Why are they not found?
(I also can't find .odt files using the SearchMonkey or Catfish GUIs.)
Thanks!

Turbocapitalist · 10-18-2018, 10:52 PM

grep searches only text. Files in the OpenDocument Format are XML, which is text, but they are packed into a compressed Zip archive. So if you would like to search OpenDocument Format files using grep, put unzip in front of it.

Code:

unzip -c filename.odt | grep -a whateverpattern

However, that probably won't be of much use, because there are few line breaks in the XML contained in OpenDocument Format files, so a match prints one enormous line. I'd say you are better off setting up Recoll; it's not that difficult.

syg00 · 10-18-2018, 10:53 PM

Did you even bother to search for why? Read the first sentence of this article.
The tools recommended in your similar thread are aware of these issues.

Turbocapitalist · 10-18-2018, 10:56 PM

Quote:

Originally Posted by syg00

Read the first sentence of this article.

Wow. Wikipedia has really, finally cleaned up all that sabotage of the ODF pages. Ten years ago, Microsofters were camping all over that page, doing their best to delete any useful information and fill it with disinformation and redundancies.

Field95 · 10-18-2018, 11:24 PM

Taking a test.odt file whose contents are "Hello World":

Code:

$ unzip test.odt
 extracting: mimetype                
 extracting: Thumbnails/thumbnail.png  
   creating: Configurations2/accelerator/
   creating: Configurations2/popupmenu/
   creating: Configurations2/toolpanel/
   creating: Configurations2/menubar/
   creating: Configurations2/images/Bitmaps/
   creating: Configurations2/toolbar/
   creating: Configurations2/floater/
   creating: Configurations2/statusbar/
   creating: Configurations2/progressbar/
  inflating: content.xml             
  inflating: meta.xml                
  inflating: manifest.rdf            
  inflating: settings.xml            
  inflating: styles.xml              
  inflating: META-INF/manifest.xml 

$ grep "Hello World" content.xml
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" 
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" 
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" 
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" 
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" 
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" 
xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" 
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" 
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" 
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" 
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" 
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" 
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" 
xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" 
xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" 
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" 
xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" 
xmlns:officeooo="http://openoffice.org/2009/office" xmlns:tableooo="http://openoffice.org/2009/table" 
xmlns:drawooo="http://openoffice.org/2010/draw" 
xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" 
xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" 
xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" 
xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0"
xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2"><office:scripts/><office:font-face-decls>
<style:font-face style:name="Lohit Devanagari1" svg:font-family="&apos;Lohit Devanagari&apos;"/>
<style:font-face style:name="Liberation Serif" svg:font-family="&apos;Liberation Serif&apos;"
style:font-family-generic="roman" style:font-pitch="variable"/>
<style:font-face style:name="Liberation Sans" svg:font-family="&apos;Liberation Sans&apos;"
style:font-family-generic="swiss" style:font-pitch="variable"/>
<style:font-face style:name="Lohit Devanagari" svg:font-family="&apos;Lohit Devanagari&apos;"
style:font-family-generic="system" style:font-pitch="variable"/>
<style:font-face style:name="WenQuanYi Zen Hei" svg:font-family="&apos;WenQuanYi Zen Hei&apos;"
style:font-family-generic="system" style:font-pitch="variable"/></office:font-face-decls>
<office:automatic-styles><style:style style:name="P1" style:family="paragraph"
style:parent-style-name="Standard"><style:text-properties officeooo:rsid="0013bb02"
officeooo:paragraph-rsid="0013bb02"/></style:style></office:automatic-styles><office:body><office:text>
<text:sequence-decls><text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/></text:sequence-decls>
<text:p text:style-name="P1">Hello World</text:p></office:text></office:body>
</office:document-content>

Combining some steps could look like this:

Code:

unzip -c ../test.odt content.xml | grep "Hello World"

No guarantee this works the same across different versions.

Investigate the file type you're using and go through it step by step. Then combine the steps into one command.

nodir · 10-18-2018, 11:34 PM

There is zgrep too, but I don't know whether it helps here.

Turbocapitalist · 10-18-2018, 11:35 PM

Quote:

Originally Posted by nodir

There is zgrep too, but I don't know whether it helps here.

zgrep uses gzip, not zip.

DavidMcCann · 10-19-2018, 10:22 AM

Field95 has the basic answer. It can be extended to search several files in one folder and to give you just a list of the files containing the sought term, rather than every paragraph containing it.

Code:

for file in path/*.odt; do unzip -c "$file" | grep -iq searchtext && echo "$file"; done

The loop takes each .odt file in the folder path. Each one is unzipped and passed to grep. The -i option makes the match case-insensitive, and -q means that grep doesn't output anything but just stops the moment it finds searchtext. If it's successful, echo prints the file name.

I confess I didn't invent this one myself, and I'm eternally grateful to the one who did!

Field95 · 10-20-2018, 06:24 PM

Quote:

Originally Posted by DavidMcCann

Field95 has the basic answer. It can be extended to search several files in one folder and to give you just a list of the files containing the sought term, rather than every paragraph containing it.

Code:

for file in path/*.odt; do unzip -c "$file" | grep -iq searchtext && echo "$file"; done

The loop takes each .odt file in the folder path. Each one is unzipped and passed to grep. The -i option makes the match case-insensitive, and -q means that grep doesn't output anything but just stops the moment it finds searchtext. If it's successful, echo prints the file name.

I confess I didn't invent this one myself, and I'm eternally grateful to the one who did!

This is one of the few times --label seems useful:

Code:

for i in *.odt; do unzip -c "$i" | grep --label="$i" -l "searchtext"; done

I might also include -I to ignore binary files that can crop up among the multiple files packed inside these .odt files:

Code:

       --label=LABEL
              Display input actually coming from standard input as input coming from file LABEL.  This is especially useful when implementing tools like zgrep, e.g., gzip  -cd  foo.gz  |
              grep --label=foo -H something.  See also the -H option.
       -l, --files-with-matches
              Suppress normal output; instead print the name of each input file from which output would normally have been printed.  The scanning will stop on the first match.

       -I     Process a binary file as if it did not contain matching data; this is equivalent to the --binary-files=without-match option.