LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 02-27-2019, 03:27 PM   #1
Duncan49
Member
 
Registered: Oct 2018
Posts: 43

Rep: Reputation: Disabled
FIND all files that contain specific text - grep cmd


Using Ubuntu-18.04 / Linux terminal:

I'm sorry - I'm having great difficulty using the 'grep' command effectively. I have spent countless hours searching the internet and trying this and that with no satisfaction.

My problem would be solved if this could be done for me:

Please provide the command line that would find for me all the files that contain the word 'Duplex' in the file (i.e. not in the file title necessarily, but in the body of the text in the file). These files happen to be *.odt files but I would also like to be able to include *.doc and *.docx files. The files are in this directory:

/home/duncan/Documents

If you could provide the specific / exact (grep-based) command line to help me with this I would be ever so grateful. I have spent months and months on this, without satisfaction - - for example, this:

find /home/duncan/Documents -type f -exec grep -l 'Duplex' {} \;

does not work - i.e. it does not find the files that I know the word searched for exists in - it will find the word searched for in *.txt files, but not the [*.odt / *.ott / *.doc / *.docx] files.

Thank you very much for your time and expertise.
Duncan (UK)

Last edited by Duncan49; 02-28-2019 at 11:21 AM.
 
Old 02-27-2019, 03:40 PM   #2
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,708

Rep: Reputation: 2210
Code:
find /home/duncan/Documents -type f -exec grep -l 'Duplex' {} \;
That command says:
Look for the string 'Duplex' in the list of file names resulting from the find command...which is not, apparently, what you want. To verify this, just do the find command without the -exec to see what ends up in the {}.
Code:
find /home/duncan/Documents -type f
Note that the command will include the names of files in subdirectories of Documents as well

If the files you want to search are all in that directory, you don't really need to find them at all...just grep the files in the directory.
Code:
cd /home/duncan/Documents
grep -l 'Duplex' *
If you are only looking for one word, quotes are not necessary, but it never hurts to use them.
You can always do
Code:
grep -l 'Duplex' /home/duncan/Documents/*
if you don't want to cd to the directory.

I presume that you understand that the -l will return only a list of file names that contain the string 'Duplex' and not the lines themselves. See man grep
 
3 members found this post helpful.
Old 02-27-2019, 05:01 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,103

Rep: Reputation: 4117
Quote:
Originally Posted by Duncan49 View Post
I have spent countless hours searching the internet and trying this and that with no satisfaction.
In which case your time would be better spent honing your search skills. A simple "grep text in odt files" returned, as the very first hit, the information that odt (and all office files) are compressed - this is generally known in the wider community. grep is a text (only) searching tool.
Simply unzipping and piping to grep generally works, but there are occurrences of the text being split that grep (and you as the user) will be unaware of.
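
A minimal sketch of the "unzip and pipe to grep" approach, assuming the unzip utility is installed. It relies on .odt/.ott files being zip archives that keep their body text in content.xml, and .docx files keeping theirs in word/document.xml. The function name search_office_docs is made up for illustration. Stripping the XML tags with sed is a crude way to rejoin words that formatting markup has split, so some matches may still be missed. Old binary .doc files are not zip archives and are not handled here.

```shell
#!/bin/sh
# search_office_docs DIR PATTERN
# Print the names of .odt/.ott/.docx files under DIR whose body text
# contains PATTERN. Hypothetical helper, not a standard command.
search_office_docs() {
    dir=$1
    pattern=$2
    find "$dir" -type f \( -name '*.odt' -o -name '*.ott' -o -name '*.docx' \) |
    while IFS= read -r f; do
        # Pick the archive member that holds the document body.
        case $f in
            *.docx) member=word/document.xml ;;
            *)      member=content.xml ;;
        esac
        # unzip -p writes that member to stdout; sed drops the XML tags so
        # a word split across formatting runs is joined again before grep.
        if unzip -p "$f" "$member" 2>/dev/null |
           sed 's/<[^>]*>//g' |
           grep -q -- "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For the directory in the original question this would be called as: search_office_docs /home/duncan/Documents Duplex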
 
2 members found this post helpful.
Old 02-27-2019, 06:53 PM   #4
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,708

Rep: Reputation: 2210
Quote:
Originally Posted by syg00 View Post
In which case your time would be better spent honing your search skills. A simple "grep text in odt files" returned, as the very first hit, the information that odt (and all office files) are compressed - this is generally known in the wider community. grep is a text (only) searching tool.
Simply unzipping and piping to grep generally works, but there are occurrences of the text being split that grep (and you as the user) will be unaware of.
I missed that in the OP (not sure I even knew that...I've gotten away from using "*Office" software much at all anymore). I am aware that searching for text in .doc/.docx files is sometimes inaccurate for the "split text" reason.

I got caught up in the apparent mis-use of the find command...and now that I think about it, I may have been incorrect about that, as well. find ... -exec grep -l {list of file names} should work as the OP wanted, with the caveat you've already pointed out...and the fact that the result will just be a list of file names ('cause of the -l), so how would you know if it wasn't working...
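
A small demonstration of that point, using a throwaway directory created with mktemp (the file names are invented for the example). It suggests that find ... -exec grep -l searches file contents rather than file names, and shows one way to see the matching lines so you can tell whether the search is working.

```shell
#!/bin/sh
# Build a throwaway directory: one file has "Duplex" only in its name,
# the other has it only in its contents.
demo=$(mktemp -d)
printf 'plain text\n' > "$demo/Duplex-in-name.txt"
printf 'Duplex printing\n' > "$demo/notes.txt"

# grep -l receives each path from find and reads that file's contents,
# so only notes.txt is listed.
find "$demo" -type f -exec grep -l 'Duplex' {} \;

# Replace -l with -Hn to print the file name, line number and matching
# line, which makes it easier to confirm what was actually found.
find "$demo" -type f -exec grep -Hn 'Duplex' {} \;
```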
 
2 members found this post helpful.
Old 02-28-2019, 04:05 AM   #5
Duncan49
Member
 
Registered: Oct 2018
Posts: 43

Original Poster
Rep: Reputation: Disabled
Angry

Quote:
Originally Posted by syg00 View Post
In which case your time would be better spent honing your search skills. A simple "grep text in odt files" returned, as the very first hit, the information that odt (and all office files) are compressed - this is generally known in the wider community. grep is a text (only) searching tool.
Simply unzipping and piping to grep generally works, but there are occurrences of the text being split that grep (and you as the user) will be unaware of.
Your response is noted. Perhaps you are correct about my search skills - it is not for lack of trying that I have not been able to find a solution. You will, I suppose, have noted that this is the "newbie" section - perhaps you don't have the patience for newbies and should not stoop to our level? You say "this is generally known in the wider community. grep is a text (only) searching tool" - well, that is why we have a newbie section - for problems that may appear simple to very, very clever people like you. My advice to you, stay away from the newbie section - you are more of a hindrance than a help.
 
Old 02-28-2019, 04:09 AM   #6
Duncan49
Member
 
Registered: Oct 2018
Posts: 43

Original Poster
Rep: Reputation: Disabled
Smile

Quote:
Originally Posted by scasey View Post
Code:
find /home/duncan/Documents -type f -exec grep -l 'Duplex' {} \;
That command says:
Look for the string 'Duplex' in the list of file names resulting from the find command...which is not, apparently, what you want. To verify this, just do the find command without the -exec to see what ends up in the {}.
Code:
find /home/duncan/Documents -type f
Note that the command will include the names of files in subdirectories of Documents as well

If the files you want to search are all in that directory, you don't really need to find them at all...just grep the files in the directory.
Code:
cd /home/duncan/Documents
grep -l 'Duplex' *
If you are only looking for one word, quotes are not necessary, but it never hurts to use them.
You can always do
Code:
grep -l 'Duplex' /home/duncan/Documents/*
if you don't want to cd to the directory.

I presume that you understand that the -l will return only a list of file names that contain the string 'Duplex' and not the lines themselves. See man grep
===================================
Thank you very much for your patience (with a newbie) and for supplying such a comprehensive answer. I do appreciate it - especially after the rudeness of one of the other respondents. Once again, thank you.
 
Old 02-28-2019, 07:44 AM   #7
Honest Abe
Member
 
Registered: May 2018
Distribution: CentOS 7, OpenSUSE 15
Posts: 415
Blog Entries: 1

Rep: Reputation: 202
Find and grep for a string? Why not
Code:
grep -nrw "STRING" /PATH
?

considering all are document/text files?
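
To illustrate what the three flags in that command do, here is a sketch on a throwaway directory (the file names and contents are invented for the example): -r recurses into subdirectories, -n prints line numbers, and -w matches the string only as a whole word. It works on plain-text files only.

```shell
#!/bin/sh
# Throwaway directory with one match at the top level and a
# near-match ("Duplexer") in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'Duplex printing\n' > "$demo/a.txt"
printf 'line one\nDuplexer unit\n' > "$demo/sub/b.txt"

# -r descends into sub/, -n prefixes each hit with its line number,
# -w matches "Duplex" only as a whole word, so "Duplexer" is skipped.
grep -nrw 'Duplex' "$demo"

# Without -w the substring inside "Duplexer" matches as well.
grep -nr 'Duplex' "$demo"
```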
 
2 members found this post helpful.
Old 02-28-2019, 10:01 AM   #8
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,708

Rep: Reputation: 2210
Quote:
Originally Posted by Duncan49 View Post
===================================
Thank you very much for your patience (with a newbie) and for supplying such a comprehensive answer. I do appreciate it - especially after the rudeness of one of the other respondents. Once again, thank you.
You are most welcome. Please note that the LQ rules prohibit personal attacks. I apply that with a lesson my Daddy taught me more than 50 years ago: "If you can't say anything nice, don't say anything at all."

syg00 was not "rude." They provided important information to you (and me!) and shared (read taught) you a little about the effective use of search engines. I'm not sure what about their post you thought was rude.

This is how the site works. Those who've been around awhile help those who are new. Again, you are welcome.
 
1 members found this post helpful.
Old 02-28-2019, 10:02 AM   #9
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,877
Blog Entries: 13

Rep: Reputation: 4930
Quote:
Originally Posted by Duncan49 View Post
Your response is noted. Perhaps you are correct about my search skills - it is not for lack of trying that I have not been able to find a solution. You will, I suppose, have noted that this is the "newbie" section - perhaps you don't have the patience for newbies and should not stoop to our level? You say "this is generally known in the wider community. grep is a text (only) searching tool" - well, that is why we have a newbie section - for problems that may appear simple to very, very clever people like you. My advice to you, stay away from the newbie section - you are more of a hindrance than a help.
Quote:
Originally Posted by Duncan49 View Post
===================================
Thank you very much for your patience (with a newbie) and for supplying such a comprehensive answer. I do appreciate it - especially after the rudeness of one of the other respondents. Once again, thank you.
@Duncan49,

While I can understand that you may not appreciate the content of responses from all members, and I do see that you've shared your opinion directly in reply, I feel that persisting with commenting about this situation in additional posts is not appropriate behavior for anyone, including yourself.

Further to this point, while you may not appreciate syg00's reply, you should note that they identified a very important detail related to searching MS Office file types.

From our Site Rules, which also contain good posting behavior guidelines:
Quote:
Challenge others' points of view and opinions, but do so respectfully and thoughtfully ... without insult and personal attack. Differing opinions is one of the things that make this site great
Therefore please keep it civil as you continue with your posts and please avoid offering continued comments about other members.

Based on your original post: you stated your intentions and described the command that you tried.

In reviewing that command, it seems reasonable to me; therefore my initial question would have been, "What output were you seeing and what evidence led you to conclude that this form of the command was not working?"

The further truth is, any modifications to your find command that I could offer, would similarly ignore the detail about Office file types, hence my initial attempts might not have aided you any better.

In the meantime, scasey has offered explanations and tips on improving the command, along the lines which I or other members might have offered.

You indicated that you appreciated this. Excellent news.

The question now is whether or not these combined replies from LQ members have solved your question? If so, you can use the Thread Tools shown at the top of the form and mark this thread as Solved.
 
1 members found this post helpful.
Old 02-28-2019, 10:36 AM   #10
Duncan49
Member
 
Registered: Oct 2018
Posts: 43

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Honest Abe View Post
Find and grep for a string? Why not
Code:
grep -nrw "STRING" /PATH
?

considering all are document/text files?
Hi - many thanks for your suggestion. It found *.txt files but not [*.ott or *.doc / *.docx] files. Is there any way one can do this - i.e. find the text in [ *.ott / *.odt / *.doc / *.docx ] files?

Last edited by Duncan49; 02-28-2019 at 11:18 AM.
 
Old 02-28-2019, 10:42 AM   #11
Duncan49
Member
 
Registered: Oct 2018
Posts: 43

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by rtmistler View Post
@Duncan49,

While I can understand that you may not appreciate the content of responses from all members, and I do see that you've shared your opinion directly in reply, I feel that persisting with commenting about this situation in additional posts is not appropriate behavior for anyone, including yourself.

Further to this point, while you may not appreciate syg00's reply, you should note that they identified a very important detail related to searching MS Office file types.

From our Site Rules, which also contain good posting behavior guidelines:
Therefore please keep it civil as you continue with your posts and please avoid offering continued comments about other members.

Based on your original post: you stated your intentions and described the command that you tried.

In reviewing that command, it seems reasonable to me; therefore my initial question would have been, "What output were you seeing and what evidence led you to conclude that this form of the command was not working?"

The further truth is, any modifications to your find command that I could offer, would similarly ignore the detail about Office file types, hence my initial attempts might not have aided you any better.

In the meantime, scasey has offered explanations and tips on improving the command, along the lines which I or other members might have offered.

You indicated that you appreciated this. Excellent news.


The question now is whether or not these combined replies from LQ members have solved your question? If so, you can use the Thread Tools shown at the top of the form and mark this thread as Solved.
=====================
Thank you for your response.
1) In reply to this: "What output you were seeing and what evidence led you to conclude that this form of the command was not working?" - my input / command line did not find any files. It now appears this may be because it will only search *.txt files when I was looking for the word in *.ott (i.e. Libre Office) files.

2) I look forward to posting a full explanation of what works when I find the solution - as you see, I still can't find the way in which to [search / find-the-word] in the documents I work with - i.e. Libre Office.

Thank you, DB

Last edited by Duncan49; 02-28-2019 at 10:44 AM.
 
Old 02-28-2019, 10:53 AM   #12
Honest Abe
Member
 
Registered: May 2018
Distribution: CentOS 7, OpenSUSE 15
Posts: 415
Blog Entries: 1

Rep: Reputation: 202
We would like to see some effort. A simple web search took me to links that concur with what others have already explained in more detail, especially about the office files.
The files are accessible to YOU and we can't do your task (searching) for you.
My suggestion would be to do a little more digging, try things out and post some results.
 
Old 02-28-2019, 11:09 AM   #13
Duncan49
Member
 
Registered: Oct 2018
Posts: 43

Original Poster
Rep: Reputation: Disabled
Unhappy

Quote:
Originally Posted by Honest Abe View Post
We would like to see some effort. A simple web search took me to links that concur with what others have already explained in more detail, especially about the office files.
The files are accessible to YOU and we can't do your task (searching) for you.
My suggestion would be to do a little more digging, try things out and post some results.
================
I have spent many months trying to solve this problem - I wonder if you can understand how frustrating it is to spend so many hours (nay, months) and be none the wiser at the end of it. It may be easy for you to understand the Linux terminal, but it is not for me - I am 70 years old and probably stupid to boot - this is not my fault - it is what it is - and that is why I thought I would find a little understanding in a "Newbee" forum - and I am so grateful to have found some are most helpful - others less so. If you had wanted to see the effort you are welcome to come here to my home in Scotland and check for yourself all the fruitless work I have done. None of the "simple web searches" helped me with my problem - I must've typed - literally - thousands of command lines, to no avail - and it now appears that this may be because only *.txt files are searched - I did not know this and can't find anywhere I have read that indicates this. I don't use a text editor - I use Libre Office writer. I am so tired of clever people calling me a fool - I can't help it - and that is why I ask for help. It is an unkind act to push a fool away who is trying to learn. Thank you.

Last edited by Duncan49; 02-28-2019 at 11:17 AM.
 
Old 02-28-2019, 11:16 AM   #14
Duncan49
Member
 
Registered: Oct 2018
Posts: 43

Original Poster
Rep: Reputation: Disabled
I have found that this:

grep -rwi '/path/to/somewhere/' -e 'pattern'

finds the 'pattern' I search for ONLY IN *.txt files - it does not find it in Libre Office files (which I use mostly).

I will let you know as soon as I find out how this can be done - i.e. searching in *.ott and *.odt files.

Thank you
DB

Last edited by Duncan49; 02-28-2019 at 11:17 AM.
 
Old 02-28-2019, 11:20 AM   #15
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,708

Rep: Reputation: 2210
Quote:
Originally Posted by Duncan49 View Post
1) In reply to this: "What output you were seeing and what evidence led you to conclude that this form of the command was not working?" - my input / command line did not find any files. It now appears this may be because it will only search *.txt files when I was looking for the word in *.ott (i.e. Libre Office) files.
(Emphasis added). There is nothing in the command you posted that would limit the search to .txt files. I suspect the problem is what syg00 pointed out...that LibreOffice files are compressed (do a less on one of them to see what they look like; that is, what grep is seeing), so that the string you're searching for is not in the file as plain text.

As for the .doc(x) files, the issue is that they are "encoded" (not the right word; maybe "formatted" is better) by MS Word, such that single words are often "split." Sometimes there's formatting between every letter of a word. Again, try less to see what grep is seeing.
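
As a rough way to check what grep is being handed without paging through a file in less, here is a sketch that tests whether a file starts with the two bytes "PK", the usual signature at the start of a zip archive. .odt, .ott and .docx files are zip containers, so the text inside them is normally stored compressed rather than as readable characters. The function name is_zip_container is made up for illustration, and head -c is assumed to be available (it is on Ubuntu).

```shell
#!/bin/sh
# is_zip_container FILE
# Succeeds if FILE begins with the two-byte zip signature "PK".
# Hypothetical helper name, used here only for illustration.
is_zip_container() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Example use: report every zip-based document in the current directory.
for f in ./*; do
    if [ -f "$f" ] && is_zip_container "$f"; then
        printf '%s looks like a zip container; plain grep will not see its text\n' "$f"
    fi
done
```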

grep is not suited to search "office" documents, as has been stated.

Is it possible to use LibreOffice at the command line to do the search instead of grep?
Maybe...
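
One possible way to do that, offered as a sketch rather than a tested recipe: LibreOffice's soffice command has a headless mode that can convert documents to plain text, and the text copies can then be searched with grep. The function name search_via_libreoffice is made up for illustration, and the sketch assumes soffice is on the PATH and that no two documents share the same base name (their converted copies would overwrite each other).

```shell
#!/bin/sh
# search_via_libreoffice DIR PATTERN
# Convert office documents under DIR to plain text with LibreOffice,
# then grep the text copies. Hypothetical helper, not a standard command.
search_via_libreoffice() {
    dir=$1
    pattern=$2
    outdir=$(mktemp -d) || return 1

    # soffice writes NAME.txt into $outdir for each NAME.odt, NAME.docx, ...
    find "$dir" -type f \
        \( -name '*.odt' -o -name '*.ott' -o -name '*.doc' -o -name '*.docx' \) \
        -exec soffice --headless --convert-to txt:Text --outdir "$outdir" {} + \
        >/dev/null 2>&1

    # List the converted copies that contain PATTERN, without the temp path.
    grep -l -- "$pattern" "$outdir"/*.txt 2>/dev/null | sed 's|.*/||'
    rm -rf "$outdir"
}
```

Called as search_via_libreoffice /home/duncan/Documents Duplex, it prints names such as letter.txt for a matching letter.odt. Converting every document on each search is slow, so this is probably only practical for a modest number of files.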

PS: No one is calling you a fool...we're just trying to help, youngster

Last edited by scasey; 02-28-2019 at 11:54 AM.
 
1 members found this post helpful.
  



Tags
grep



All times are GMT -5. The time now is 09:57 AM.
