[SOLVED] FIND all files that contain specific text - grep cmd
Linux - Newbie
Using Ubuntu-18.04 / Linux terminal:
I'm sorry - I'm having great difficulty using the 'grep' command effectively. I have spent countless hours searching the internet and trying this and that with no satisfaction.
My problem would be solved if this could be done for me:
Please provide the command line that would find for me all the files that contain the word 'Duplex' in the file (i.e. not in the file title necessarily, but in the body of the text in the file). These files happen to be *.odt files but I would also like to be able to include *.doc and *.docx files. The files are in this directory:
/home/duncan/Documents
If you could provide the specific / exact (grep-based) command line to help me with this I would be ever so grateful. I have spent months and months on this, without satisfaction - for example, this:
find /home/duncan/Documents -type f -exec grep -l 'Duplex' {} \;
does not work - i.e. it does not find the files that I know the word searched for exists in - it will find the word searched for in *.txt files, but not the [*.odt / *.ott / *.doc / *.docx] files.
Thank you very much for your time and expertise.
Duncan (UK)
find /home/duncan/Documents -type f -exec grep -l 'Duplex' {} \;
That command says:
Look for the string 'Duplex' in the list of file names resulting from the find command...which is not, apparently, what you want. To verify this, just do the find command without the -exec to see what ends up in the {}.
Code:
find /home/duncan/Documents -type f
Note that the command will include the names of files in subdirectories of Documents as well.
If the files you want to search are all in that directory, you don't really need to find them at all...just grep the files in the directory.
Code:
cd /home/duncan/Documents
grep -l 'Duplex' *
If you are only looking for one word, quotes are not necessary, but it never hurts to use them.
You can always do
Code:
grep -l 'Duplex' /home/duncan/Documents/*
if you don't want to cd to the directory.
I presume that you understand that the -l will return only a list of the names of files that contain the string 'Duplex' and not the matching lines themselves. See man grep.
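To make the behaviour of these variants concrete, here is a small sketch that builds a throwaway directory and runs each form against it. The directory and file names are made up for illustration. It shows that -l prints only file names, that the * glob stops at the top level, and that the find form reads the contents of every file it finds, subdirectories included.

```shell
# Scratch directory with invented plain-text files (names are illustrative only).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
printf 'Duplex printing notes\n' > "$demo_dir/top.txt"
printf 'nothing relevant\n'      > "$demo_dir/other.txt"
printf 'Duplex again\n'          > "$demo_dir/sub/nested.txt"

# 1) Glob form: only entries directly inside the directory are searched.
#    grep complains that sub is a directory, so that message is discarded.
glob_hits=$(grep -l 'Duplex' "$demo_dir"/* 2>/dev/null || true)

# 2) find form: every regular file is searched, including sub/nested.txt.
find_hits=$(find "$demo_dir" -type f -exec grep -l 'Duplex' {} \;)

# 3) Without -l, grep prints the matching lines (here with line numbers).
line_hits=$(grep -n 'Duplex' "$demo_dir/top.txt")

printf 'glob form:\n%s\n\nfind form:\n%s\n\nmatching lines:\n%s\n' \
    "$glob_hits" "$find_hits" "$line_hits"
```

All three files here are plain text, which is the case grep is built for.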
I have spent countless hours searching the internet and trying this and that with no satisfaction.
In which case your time would be better spent honing your search skills. A simple search for "grep text in odt files" returned, as the very first hit, the information that odt files (like most modern office formats, including docx) are compressed zip archives - this is generally known in the wider community. grep is a text (only) searching tool.
Simply unzipping and piping to grep generally works, but there are occurrences of the text being split that grep (and you as the user) will be unaware of.
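A minimal sketch of that unzip-and-grep idea, assuming the unzip tool is installed. The function name search_office_docs is made up for this example. It relies on .odt and .ott files keeping their body text in content.xml and .docx files keeping it in word/document.xml. Legacy .doc files are not zip archives and are not covered here, and a word that is split by markup inside the XML will still be missed.

```shell
# search_office_docs PATTERN DIR
# Hypothetical helper: prints the names of .odt/.ott/.docx files under DIR
# whose main XML part contains PATTERN. Requires unzip.
# Assumes file names do not contain newlines.
search_office_docs() {
    pattern=$1
    dir=$2
    find "$dir" -type f \( -name '*.odt' -o -name '*.ott' -o -name '*.docx' \) -print |
    while IFS= read -r file; do
        # Pick the archive member that holds the document body.
        case $file in
            *.docx) part='word/document.xml' ;;
            *)      part='content.xml' ;;
        esac
        # unzip -p writes the named member to stdout;
        # grep -q only reports whether a match exists.
        if unzip -p "$file" "$part" 2>/dev/null | grep -q -- "$pattern"; then
            printf '%s\n' "$file"
        fi
    done
}
```

With the function defined in the current shell, the search from the original post would look like: search_office_docs 'Duplex' /home/duncan/Documents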
I missed that in the OP (not sure I even knew that...I've gotten away from using "*Office" software much at all anymore). I am aware that searching for text in .doc/.docx files is sometimes inaccurate for the "split text" reason.
I got caught up in the apparent misuse of the find command...and now that I think about it, I may have been incorrect about that as well. find ... -exec grep -l {list of file names} should work as the OP wanted, with the caveat you've already pointed out...and the fact that the result will just be a list of file names ('cause of the -l), so how would you know if it wasn't working...
Your response is noted. Perhaps you are correct about my search skills - it is not for lack of trying that I have not been able to find a solution. You will, I suppose, have noted that this is the "newbie" section - perhaps you don't have the patience for newbies and should not stoop to our level? You say "this is generally known in the wider community. grep is a text (only) searching tool" - well, that is why we have a newbie section - for problems that may appear simple to very, very clever people like you. My advice to you: stay away from the newbie section - you are more of a hindrance than a help.
===================================
Thank you very much for your patience (with a newbie) and for supplying such a comprehensive answer. I do appreciate it - especially after the rudeness of one of the other respondents. Once again, thank you.
You are most welcome. Please note that the LQ rules prohibit personal attacks. I apply that with a lesson my Daddy taught me more than 50 years ago: "If you can't say anything nice, don't say anything at all."
syg00 was not "rude." They provided important information to you (and me!) and shared (read: taught) a little about the effective use of search engines. I'm not sure what about their post you thought was rude.
This is how the site works. Those who've been around awhile help those who are new. Again, you are welcome.
@Duncan49,
While I can understand that you may not appreciate the content of responses from all members, and do see that you've shared your opinion directly in reply, I feel that persisting with commenting about this situation in additional posts is not appropriate behavior for anyone, including yourself.
Further to this point, while you may not appreciate syg00's reply, you should note that they identified a very important detail related to searching MS Office file types.
From our Site Rules, which also contain good posting behavior guidelines:
Quote:
Challenge others' points of view and opinions, but do so respectfully and thoughtfully ... without insult and personal attack. Differing opinions is one of the things that make this site great
Therefore please keep it civil as you continue with your posts and please avoid offering continued comments about other members.
Based on your original post: you stated your intentions and described the command that you tried.
In reviewing that command, it seems reasonable to me, so my initial question would have been, "What output were you seeing and what evidence led you to conclude that this form of the command was not working?"
The further truth is that any modifications to your find command that I could offer would similarly ignore the detail about Office file types, hence my initial attempts might not have aided you any better.
In the meantime, scasey has offered explanations and tips on improving the command, along the lines which I or other members might have offered.
You indicated that you appreciated this. Excellent news.
The question now is whether or not these combined replies from LQ members have solved your question? If so, you can use the Thread Tools shown at the top of the form and mark this thread as Solved.
Hi - many thanks for your suggestion. It found *.txt files but not [*.ott or *.doc / *.docx] files. Is there any way one can do this - i.e. find the text in [ *.ott / *.odt / *.doc / *.docx ] files?
=====================
Thank you for your response.
1) In reply to this: "What output were you seeing and what evidence led you to conclude that this form of the command was not working?" - my input / command line did not find any files. It now appears this may be because it will only search *.txt files when I was looking for the word in *.ott (i.e. Libre Office) files.
2) I look forward to posting a full explanation of what works when I find the solution - as you see, I still can't find the way in which to [search / find-the-word] in the documents I work with - i.e. Libre Office.
We would like to see some effort. A simple web search took me to links that concur with what others have already explained in more detail, especially about the office files.
The files are accessible to YOU and we can't do your task (searching) for you.
My suggestion would be to do a little more digging, try things out and post some results.
================
I have spent many months trying to solve this problem - I wonder if you can understand how frustrating it is to spend so many hours (nay, months) and be none the wiser at the end of it. It may be easy for you to understand the Linux terminal, but it is not for me - I am 70 years old and probably stupid to boot - this is not my fault - it is what it is - and that is why I thought I would find a little understanding in a "Newbie" forum - and I am so grateful to have found that some are most helpful - others less so. If you had wanted to see the effort you are welcome to come here to my home in Scotland and check for yourself all the fruitless work I have done. None of the "simple web searches" helped me with my problem - I must've typed - literally - thousands of command lines, to no avail - and it now appears that this may be because only *.txt files are searched - I did not know this and can't find anything I have read that indicates this. I don't use a text editor - I use Libre Office Writer. I am so tired of clever people calling me a fool - I can't help it - and that is why I ask for help. It is an unkind act to push a fool away who is trying to learn. Thank you.
1) In reply to this: "What output were you seeing and what evidence led you to conclude that this form of the command was not working?" - my input / command line did not find any files. It now appears this may be because it will only search *.txt files when I was looking for the word in *.ott (i.e. Libre Office) files.
(Emphasis added). There is nothing in the command you posted that would limit the search to .txt files. I suspect the problem is what syg00 pointed out...that LibreOffice files are compressed (do a less on one of them to see what they look like; that is, what grep is seeing), so that the string you're searching for is not in the file as plain text.
As for the .doc(x) files, the issue is that they are "encoded" (not the right word; maybe "formatted" is better) by MS Word, such that single words are often "split": sometimes there's formatting between every letter of a word. Again, try less to see what grep is seeing.
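The split-word effect can be reproduced without any office file at all. In the sketch below the XML fragment is invented for illustration and only loosely resembles what a word processor writes; real documents vary. Stripping everything between angle brackets with sed before searching is one rough workaround, assuming that losing the markup does not matter for the search.

```shell
# Invented fragment: the word "Duplex" is split across two formatting runs.
fragment='<w:r><w:t>Du</w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t>plex</w:t></w:r>'

# Searching the raw markup finds nothing, because the letters are not adjacent.
# grep -c prints 0 and exits non-zero on no match, hence the "|| true".
raw_count=$(printf '%s\n' "$fragment" | grep -c 'Duplex' || true)

# Removing the tags first joins the pieces back together.
stripped=$(printf '%s\n' "$fragment" | sed 's/<[^>]*>//g')
stripped_count=$(printf '%s\n' "$stripped" | grep -c 'Duplex' || true)

printf 'raw matches: %s, matches after stripping tags: %s\n' \
    "$raw_count" "$stripped_count"
# prints: raw matches: 0, matches after stripping tags: 1
```

In a real pipeline the sed stage would sit between whatever extracts the XML (for example unzip -p) and grep.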
grep is not suited to search "office" documents, as has been stated.
Is it possible to use LibreOffice at the command line to do the search instead of grep? Maybe...
PS: No one is calling you a fool...we're just trying to help, youngster.