LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 03-01-2021, 02:08 AM   #1
Alok Rai
Member
 
Registered: Aug 2015
Posts: 247

Rep: Reputation: Disabled
string search in .odt document directories


Is it possible to use some grep-like command to search for a particular string in a whole directory of .odt documents?

I imagine that one could search individual documents using "search", one by painful one - but zapping a whole directory?

I am using Linux Mint 19, XFCE variant.
 
Old 03-01-2021, 02:24 AM   #2
lvm_
Member
 
Registered: Jul 2020
Posts: 970

Rep: Reputation: 344
The file indexers that come with the major desktop environments (baloo for KDE, tracker for GNOME) can do it, but they are CPU and disk hogs. I'm not sure whether XFCE has a similar tool. An .odt file is actually just a bunch of zipped XML files, so a quick and dirty way would be to write a script that unzips them and greps inside the XML, starting with content.xml, or maybe to use a more advanced XML grep tool such as xml_grep.
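A minimal sketch of the unzip-and-grep idea for a single file, assuming the body text sits in content.xml as described above. The function name odt_text is made up for illustration. It puts each paragraph or heading on its own line and strips the remaining tags, so grep prints only the matching paragraph rather than the whole document; XML entities such as &amp; are left undecoded.

```shell
# Print the text of one .odt file, one paragraph or heading per line.
# Assumption: the body text is stored in content.xml inside the archive.
odt_text() {
    # unzip -p writes the named archive member to stdout.
    # First sed expression: break the line after every closing </text:p> or </text:h>.
    # Second sed expression: drop all remaining XML tags.
    unzip -p "$1" content.xml | sed -e 's/<\/text:[ph]>/\
/g' -e 's/<[^>]*>//g'
}

# Usage (file name is only an example):
#   odt_text report.odt | grep -i 'search string'
```

Removing the tags before grepping also helps when a phrase is interrupted by inline formatting, because the markup between the words is gone by the time grep sees the text.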
 
Old 03-01-2021, 02:43 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,147

Rep: Reputation: 4124
When looking for text I don't care about the xml - just unzip and pipe to normal grep. Stick it in a loop of choice.
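One way such a loop might look, as a sketch rather than a finished tool. The function name odt_search is invented here; it walks a directory tree with find, unzips each .odt to stdout and lets grep decide whether the file matches. File names containing newlines are not handled.

```shell
# List every .odt file below DIR whose unzipped contents match PATTERN.
# Usage: odt_search DIR PATTERN
odt_search() {
    find "$1" -type f -name '*.odt' | while IFS= read -r f; do
        # unzip -p sends all archive members to stdout without touching the file;
        # grep -q prints nothing and only sets the exit status, -i ignores case.
        if unzip -p "$f" 2>/dev/null </dev/null | grep -qi -e "$2"; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (directory and search string are only examples):
#   odt_search ~/Documents 'invoice'
```

Nothing is extracted to disk, so the original documents are only ever read.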
 
Old 03-01-2021, 03:22 AM   #4
Alok Rai
Member
 
Registered: Aug 2015
Posts: 247

Original Poster
Rep: Reputation: Disabled
I fear I might have sounded more capable than I am! But - how would I unzip all the .odt files in a particular directory (and sub-directories) and then pipe them to grep?

And yes, thanks for the warning - I will take a careful backup of everything before I try any operation on them.
 
Old 03-01-2021, 03:26 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,147

Rep: Reputation: 4124
Quick search found this - have a look at the script. Shouldn't affect any of your current files adversely.
 
Old 03-01-2021, 03:38 AM   #6
Alok Rai
Member
 
Registered: Aug 2015
Posts: 247

Original Poster
Rep: Reputation: Disabled
I probably need a Bash tutorial in order to make sense of this -

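#!/bin/bash

# Stop early if no search string was given.
[ "$1" ] || { echo "You forgot search string!" ; exit 1 ; }

# Check every OpenDocument file (*.odt, *.ods, ...) below the current directory.
find . -type f -name "*.od*" | while IFS= read -r i ; do
# unzip -c prints the archive contents to stdout; grep -q only reports whether it matched.
if unzip -ca "$i" 2>/dev/null | grep -iq -- "$*" ; then
echo "string found in $i"
fi
done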

- but thanks! Will work at it.
 
Old 03-01-2021, 06:11 AM   #7
hish2021
Member
 
Registered: Jan 2021
Posts: 117

Rep: Reputation: Disabled
Another tool is recoll. It's also an indexer. You can set it to run when you want it to and it's very customizable. It should be in your distro's repos.
 
Old 03-01-2021, 10:33 AM   #8
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,378

Rep: Reputation: 2757
As an alternative, use libreoffice to convert the .odt files to .txt files:
Code:
#!/bin/bash

# Script to search all .odt files in top directory specified in $1 for string specified in $2

topdir=$1
shopt -s globstar

for f in "$topdir"/**/*.odt; do
  libreoffice --convert-to "txt:Text (encoded):UTF8" --outdir "/tmp" "$f" 1>/dev/null
  tmpfile=${f/%odt/txt}
  tmpfile=/tmp/${tmpfile##*/}
  if grep -iq -- "$2" "$tmpfile"; then
    echo "Found $2 in $f"
  fi
  rm "$tmpfile"
done

shopt -u globstar
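A possible variation on the script above, sketched under a few assumptions. It converts each file into its own scratch directory created with mktemp, so two documents with the same name in different sub-directories cannot overwrite each other's .txt output in /tmp. The function name odt_search_lo is made up for this example, and --headless is added so that no window is opened. Starting libreoffice once per file is slow.

```shell
# Search all .odt files below TOPDIR for PATTERN by converting them to text.
# Usage: odt_search_lo TOPDIR PATTERN
odt_search_lo() {
    find "$1" -type f -name '*.odt' | while IFS= read -r f; do
        # A private scratch directory per file avoids name clashes in /tmp.
        outdir=$(mktemp -d) || continue
        libreoffice --headless --convert-to "txt:Text (encoded):UTF8" \
            --outdir "$outdir" "$f" >/dev/null 2>&1 </dev/null
        # The converted file keeps the base name, with .txt instead of .odt.
        base=$(basename "$f" .odt)
        if grep -qi -e "$2" "$outdir/$base.txt" 2>/dev/null; then
            printf 'Found %s in %s\n' "$2" "$f"
        fi
        rm -rf "$outdir"
    done
}

# Usage (directory and search string are only examples):
#   odt_search_lo ~/Documents 'invoice'
```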
 
Old 03-01-2021, 10:44 AM   #9
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,677

Rep: Reputation: Disabled
Actually, the package unzip also provides the zipgrep command.

There's also odt2txt which is much faster than converting with the libreoffice command line (I mean the standalone package of that name, not the symbolic link /usr/bin/odt2txt provided by unoconv which is more or less the same as the headless libreoffice).
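A rough sketch of how odt2txt could be combined with grep, assuming the standalone odt2txt is installed and prints the converted text to standard output when given a file name. The function name odt2txt_search is hypothetical. It prefixes every matching line with the document it came from.

```shell
# Print matching lines from all .odt files below DIR, prefixed with the file name.
# Usage: odt2txt_search DIR PATTERN
odt2txt_search() {
    find "$1" -type f -name '*.odt' | while IFS= read -r f; do
        # Assumption: odt2txt FILE writes plain text to stdout.
        odt2txt "$f" 2>/dev/null </dev/null | grep -i -e "$2" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$f" "$line"
            done
    done
}

# Usage (directory and search string are only examples):
#   odt2txt_search ~/Documents 'invoice'
```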

Last edited by shruggy; 03-01-2021 at 10:57 AM.
 
Old 03-01-2021, 11:14 AM   #10
Alok Rai
Member
 
Registered: Aug 2015
Posts: 247

Original Poster
Rep: Reputation: Disabled
Thank you, wonderful community! I have several options now: I can convert .odt files to .txt files, and then just run a search using grep. Or, I could experiment with using recoll. Or ack.

Thanks once again.
 
Old 03-01-2021, 11:36 AM   #11
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,677

Rep: Reputation: Disabled
ack? I don't think ack can search inside compressed files.
 
Old 03-02-2021, 12:18 AM   #12
Alok Rai
Member
 
Registered: Aug 2015
Posts: 247

Original Poster
Rep: Reputation: Disabled
Thanks, shruggy. ack can't, can it? I believe recoll can.
 
Old 03-02-2021, 12:21 AM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,147

Rep: Reputation: 4124
I reckon shruggy's suggestion of zipgrep is a better option. I have had some very bad experiences with indexers - to the extent that I disable them at every system install.
 
Old 03-02-2021, 01:53 AM   #14
Alok Rai
Member
 
Registered: Aug 2015
Posts: 247

Original Poster
Rep: Reputation: Disabled
I just looked up zipgrep, and it sounds like the answer to my prayers! One further question, though - is it possible to use zipgrep to search through a whole directory containing .odt files? Or would I first have to zip up the directory, so that I have a file called, say, documents.zip - and then run zipgrep on it?
 
Old 03-02-2021, 04:49 AM   #15
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,677

Rep: Reputation: Disabled
/usr/bin/zipgrep is just a shell script, and AFAICS, it only works with one zip file. But you can invoke it in a loop:
Code:
for f in documents/*.odt
do zipgrep pattern "$f"
done
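The same loop can be extended to sub-directories. The following is only a sketch: the function name zipgrep_tree is invented, and it assumes zipgrep accepts grep options before the pattern and member names after the archive, as its man page describes. The search is limited to content.xml, and only the name of each matching document is printed, because content.xml usually holds the whole text on very few lines. zipgrep hands the pattern to egrep, so it is treated as an extended regular expression.

```shell
# List every .odt file below DIR whose content.xml matches PATTERN.
# Usage: zipgrep_tree DIR PATTERN
zipgrep_tree() {
    find "$1" -type f -name '*.odt' | while IFS= read -r f; do
        # Any output from zipgrep means at least one line matched.
        if [ -n "$(zipgrep -i "$2" "$f" content.xml 2>/dev/null </dev/null)" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (directory and search string are only examples):
#   zipgrep_tree documents 'pattern'
```

There is no need to zip the directory first: each .odt is already a zip archive, so zipgrep can be pointed at the files directly.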
 
  

