LinuxQuestions.org
Old 10-18-2018, 10:38 PM   #1
byebyemrgates
Member
 
Registered: Nov 2017
Location: Blue Mountains, Australia!
Distribution: Mint 20, Ubuntu 20
Posts: 164

Rep: Reputation: Disabled
Question grep search does not find Libre files


Can someone please explain to this newbie why I can't find a .odt file with grep when searching for its contents:
as an example,
Code:
  ~/Desktop $ grep -iRl "dear" ./
./TA response
./Installers/Acrobat_2015_Web_WWMUI.exe
As you can see, I found two files on the Desktop: "TA response.txt" and something in the .exe file.
However, I have a couple of .odt files on the Desktop which contain the word "dear". Why are they not found?
(I also can't find .odt files using the SearchMonkey or Catfish GUIs.)
Thanks!
 
Old 10-18-2018, 10:52 PM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,295
Blog Entries: 3

Rep: Reputation: 3719
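grep searches only the bytes as they are stored in the file. Files in the OpenDocument Format are XML, which is text, but that XML is stored compressed inside a Zip archive, so the words of the document never appear literally on disk. If you would like to search OpenDocument Format files using grep, run unzip in front of it: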

Code:
unzip -c filename.odt | grep -a whateverpattern
However, that probably won't be of much use, because there are few line breaks in the XML contained in OpenDocument Format files, so a match prints nearly the whole document. I'd say you are better off setting up recoll; it's not that difficult.
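One possible way around the missing line breaks is grep's -o option, which prints only the part of the line that matched. This is a minimal sketch, assuming GNU or BSD grep; xml_grep is a made-up helper name, not an existing tool, and it will also report hits inside tag attributes if the pattern happens to occur there.

```shell
# xml_grep PATTERN
# Hypothetical helper: reads XML on standard input and prints only the
# fragments between angle brackets that contain PATTERN (case-insensitive),
# instead of the whole one-line document.
xml_grep() {
    # [^<>]* on both sides keeps each match inside a single text node
    # (or inside a single tag, if the pattern occurs in an attribute).
    grep -io "[^<>]*$1[^<>]*"
}
```

It could be used as `unzip -p filename.odt content.xml | xml_grep whateverpattern`, where -p writes only the file data to standard output, without the headers that -c adds.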

Last edited by Turbocapitalist; 10-18-2018 at 11:34 PM. Reason: missed the -c
 
2 members found this post helpful.
Old 10-18-2018, 10:53 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
Did you even bother to search for why? Read the first sentence of this article.
The tools recommended in your similar thread are aware of these issues.
 
1 members found this post helpful.
Old 10-18-2018, 10:56 PM   #4
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,295
Blog Entries: 3

Rep: Reputation: 3719
Quote:
Originally Posted by syg00 View Post
Read the first sentence of this article.
Wow. Wikipedia really finally cleaned up all that sabotage of the ODF pages. Ten years ago Microsofters were camping all over that page, doing their best to delete any useful information and fill it with disinformation and redundancies.
 
1 members found this post helpful.
Old 10-18-2018, 11:24 PM   #5
Field95
LQ Newbie
 
Registered: Sep 2018
Location: xmpp:zemri@dismail.de
Posts: 13

Rep: Reputation: Disabled
Taking a test.odt file with contents of "Hello World"

Code:
$ unzip test.odt
 extracting: mimetype                
 extracting: Thumbnails/thumbnail.png  
   creating: Configurations2/accelerator/
   creating: Configurations2/popupmenu/
   creating: Configurations2/toolpanel/
   creating: Configurations2/menubar/
   creating: Configurations2/images/Bitmaps/
   creating: Configurations2/toolbar/
   creating: Configurations2/floater/
   creating: Configurations2/statusbar/
   creating: Configurations2/progressbar/
  inflating: content.xml             
  inflating: meta.xml                
  inflating: manifest.rdf            
  inflating: settings.xml            
  inflating: styles.xml              
  inflating: META-INF/manifest.xml 

$ grep "Hello World" content.xml
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" 
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" 
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" 
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" 
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" 
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" 
xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" 
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" 
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" 
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" 
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" 
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" 
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" 
xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" 
xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" 
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" 
xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" 
xmlns:officeooo="http://openoffice.org/2009/office" xmlns:tableooo="http://openoffice.org/2009/table" 
xmlns:drawooo="http://openoffice.org/2010/draw" 
xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" 
xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" 
xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" 
xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:css3t="http://www.w3.org
/TR/css3-text/" office:version="1.2"><office:scripts/><office:font-face-decls><style:font-face style:name="Lohit 
Devanagari1" svg:font-family="&apos;Lohit Devanagari&apos;"/><style:font-face style:name="Liberation Serif" 
svg:font-family="&apos;Liberation Serif&apos;" style:font-family-generic="roman" style:font-pitch="variable"/>
<style:font-face style:name="Liberation Sans" svg:font-family="&apos;Liberation Sans&apos;" style:font-family-
generic="swiss" style:font-pitch="variable"/><style:font-face style:name="Lohit Devanagari" svg:font-family="&
apos;Lohit Devanagari&apos;" style:font-family-generic="system" style:font-pitch="variable"/><style:font-face 
style:name="WenQuanYi Zen Hei" svg:font-family="&apos;WenQuanYi Zen Hei&apos;" style:font-family-generic="system" 
style:font-pitch="variable"/></office:font-face-decls><office:automatic-styles><style:style style:name="P1" 
style:family="paragraph" style:parent-style-name="Standard"><style:text-properties officeooo:rsid="0013bb02" 
officeooo:paragraph-rsid="0013bb02"/></style:style></office:automatic-styles><office:body><office:text>
<text:sequence-decls><text:sequence-decl text:display-outline-level="0" text:name="Illustration"/><text:sequence-
decl text:display-outline-level="0" text:name="Table"/><text:sequence-decl text:display-outline-level="0" 
text:name="Text"/><text:sequence-decl text:display-outline-level="0" text:name="Drawing"/></text:sequence-decls>
<text:p text:style-name="P1">Hello World</text:p></office:text></office:body>
</office:document-content>
Combining some steps could look like this:

Code:
unzip -c ../test.odt content.xml | grep "Hello World"
No guarantee that this works with different versions of the format.

Investigate the file type you're using and go through it step by step. Then combine the steps into one.
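The step-by-step approach above can be folded into a small shell function. This is only a sketch: odt_contains is a hypothetical name, and it assumes Info-ZIP unzip and the standard ODF package layout, in which a member called mimetype holds a string starting with application/vnd.oasis.opendocument. and the body text lives in content.xml.

```shell
# odt_contains FILE PATTERN
# Hypothetical helper: succeeds if content.xml inside FILE matches PATTERN.
odt_contains() {
    # Step 1: confirm the archive declares itself as an OpenDocument file.
    case $(unzip -p "$1" mimetype 2>/dev/null) in
        application/vnd.oasis.opendocument.*) ;;
        *) echo "$1: not an OpenDocument file" >&2; return 2 ;;
    esac
    # Step 2: search only the document body; -q prints nothing and
    # reports the result through the exit status.
    unzip -p "$1" content.xml | grep -q -- "$2"
}
```

For example, `odt_contains test.odt "Hello World" && echo found`.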
 
2 members found this post helpful.
Old 10-18-2018, 11:34 PM   #6
nodir
Member
 
Registered: May 2016
Posts: 222

Rep: Reputation: Disabled
there is zgrep too, but i wouldn't know if it helps here.
 
1 members found this post helpful.
Old 10-18-2018, 11:35 PM   #7
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,295
Blog Entries: 3

Rep: Reputation: 3719
Quote:
Originally Posted by nodir View Post
there is zgrep too, but i wouldn't know if it helps here.
zgrep uses gzip, not zip, so it cannot read the Zip container that an .odt file is.
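If in doubt about which of the two a file is, the first two bytes tell them apart: Zip archives begin with "PK" (hex 50 4b) and gzip streams with hex 1f 8b. A small sketch using od; archive_kind is an invented name for illustration.

```shell
# archive_kind FILE
# Hypothetical helper: prints "zip", "gzip" or "other" based on the
# first two bytes of FILE.
archive_kind() {
    # od -An -tx1 -N2 dumps the first two bytes as hex without offsets;
    # tr strips the whitespace od puts around them.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \t\n')
    case $magic in
        504b) echo zip ;;
        1f8b) echo gzip ;;
        *)    echo other ;;
    esac
}
```

Running `archive_kind somefile.odt` on a normal OpenDocument file should print zip, which is why zgrep is the wrong tool here.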
 
2 members found this post helpful.
Old 10-19-2018, 10:22 AM   #8
DavidMcCann
LQ Veteran
 
Registered: Jul 2006
Location: London
Distribution: PCLinuxOS, Debian
Posts: 6,137

Rep: Reputation: 2314
Field95 has the basic answer. It can be extended to search several files in one folder and just to give you a list of the files with the sought term, rather than every paragraph containing it.
Code:
for file in path/*.odt; do unzip -c "$file" | grep -iq searchtext && echo "$file"; done
The loop takes each .odt file in the folder path. Each is then unzipped and passed to grep. The -q parameter with grep means that it doesn't output anything, but just stops the moment it finds searchtext. If it's successful, the file name is printed by echo.

I confess I didn't invent this one myself, and I'm eternally grateful to the one who did!
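Since the original question used grep -R, the same loop can be made recursive with find. This is a rough sketch rather than a polished tool: odt_grep_r is a made-up name, it assumes Info-ZIP unzip, and it will not cope with file names that contain newlines.

```shell
# odt_grep_r PATTERN [DIR]
# Hypothetical helper: lists every .odt file under DIR (default: the
# current directory) whose content.xml matches PATTERN, ignoring case.
odt_grep_r() {
    find "${2:-.}" -type f -name '*.odt' | while IFS= read -r f; do
        # Quoting "$f" keeps names with spaces intact.
        unzip -p "$f" content.xml 2>/dev/null | grep -iq -- "$1" &&
            printf '%s\n' "$f"
    done
}
```

Something like `odt_grep_r dear ~/Desktop` would then print one matching file per line.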
 
1 members found this post helpful.
Old 10-20-2018, 06:24 PM   #9
Field95
LQ Newbie
 
Registered: Sep 2018
Location: xmpp:zemri@dismail.de
Posts: 13

Rep: Reputation: Disabled
Quote:
Originally Posted by DavidMcCann View Post
Field95 has the basic answer. It can be extended to search several files in one folder and just to give you a list of the files with the sought term, rather than every paragraph containing it.
Code:
for file in path/*.odt; do unzip -c "$file" | grep -iq searchtext && echo "$file"; done
The loop takes each .odt file in the folder path. Each is then unzipped and passed to grep. The -q parameter with grep means that it doesn't output anything, but just stops the moment it finds searchtext. If it's successful, the file name is printed by echo.

I confess I didn't invent this one myself, and I'm eternally grateful to the one who did!

This is one of the few times --label seems useful.

Code:
for i in *.odt; do unzip -c "$i" | grep --label="$i" -l "searchtext"; done
I might include -I to ignore binary data that might crop up among the several files packed inside each .odt file. The relevant options from the grep man page:

Code:
       --label=LABEL
              Display input actually coming from standard input as input coming from file LABEL.  This is especially useful when implementing tools like zgrep, e.g., gzip  -cd  foo.gz  |
              grep --label=foo -H something.  See also the -H option.
       -l, --files-with-matches
              Suppress normal output; instead print the name of each input file from which output would normally have been printed.  The scanning will stop on the first match.

       -I     Process a binary file as if it did not contain matching data; this is equivalent to the --binary-files=without-match option.
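One caveat worth checking on your own system: unzip -c sends every member of the archive, including Thumbnails/thumbnail.png, down the same pipe, so grep may treat the whole stream as binary. A possible way to sidestep that, sketched here under the assumption of Info-ZIP unzip and a grep that supports --label, is to extract only content.xml; odt_label_grep is an invented name.

```shell
# odt_label_grep PATTERN FILE...
# Hypothetical helper: prints the name of each FILE whose content.xml
# matches PATTERN (case-insensitive).
odt_label_grep() {
    pattern=$1
    shift
    for f in "$@"; do
        # Only content.xml is extracted, so the PNG thumbnail never
        # reaches grep; --label makes -l print the .odt name instead
        # of "(standard input)".
        unzip -p "$f" content.xml | grep -il --label="$f" -- "$pattern"
    done
}
```

For example, `odt_label_grep searchtext *.odt`.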
 
1 members found this post helpful.
  

