LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Software
12-08-2023, 08:49 AM   #1
chiendarret
Member
 
Registered: Mar 2007
Posts: 307

Rep: Reputation: 16
grep both boolean and recursive


I would like to use grep to find occurrences of two words in a directory containing sub-directories and their regular files. Example:

francesco@vaio:~/softw/CHARMM_FF$ grep -E 'OC2D1.*NONBONDED | NONBONDED.*OC2D1' ~/softw/CHARMM_FF

where CHARMM_FF is such a directory. Could '-r' be added somewhere, or is something else needed?
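A minimal sketch of that command with -r added, assuming GNU grep. The spaces around | in the original pattern are literal characters inside an ERE (so it would only match "NONBONDED " with a trailing space or " NONBONDED" with a leading one), so they are dropped here. The temporary directory and its two files are invented for illustration; only the file with both words on one line is listed, because grep matches line by line.

```shell
#!/bin/sh
dir=$(mktemp -d)                 # hypothetical stand-in for ~/softw/CHARMM_FF
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/sub"
printf 'OC2D1 NONBONDED on one line\n' > "$dir/sub/same_line.prm"
printf 'OC2D1\nNONBONDED\n'            > "$dir/split.prm"

# -r descends into sub-directories, -l prints only the names of matching files.
# Both word orders are covered, with no spaces around | (an ERE would treat them literally).
matches=$(grep -r -l -E 'OC2D1.*NONBONDED|NONBONDED.*OC2D1' "$dir")
printf '%s\n' "$matches"
```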
Thanks
chiendarret
 
12-08-2023, 08:57 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,310
Blog Entries: 3

Rep: Reputation: 3722
Do you want a logical AND or a logical OR?

You can do an AND with xargs:

Code:
grep -r -l -E 'FirstPattern' ~/softw/CHARMM_FF/ | xargs grep -H -E 'SecondPattern'
You can do an OR with an operator:

Code:
grep -r -l -E 'FirstPattern|SecondPattern' ~/softw/CHARMM_FF/
Not sure about XOR though.
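One possible per-file XOR, sketched under the assumption of GNU grep plus standard sort and comm: list the files matching each pattern, sort both lists, and let comm -3 keep only the names that appear in exactly one of them. The sample files are made up, and file names containing newlines or tabs would break this line-based comparison.

```shell
#!/bin/sh
dir=$(mktemp -d)     # hypothetical sample tree
lists=$(mktemp -d)   # scratch space for the two sorted file lists
trap 'rm -rf "$dir" "$lists"' EXIT
printf 'FirstPattern\n'                > "$dir/first_only.txt"
printf 'SecondPattern\n'               > "$dir/second_only.txt"
printf 'FirstPattern\nSecondPattern\n' > "$dir/both.txt"

# comm needs sorted input; both lists are sorted with the same locale.
grep -rlE 'FirstPattern'  "$dir" | sort > "$lists/first"
grep -rlE 'SecondPattern' "$dir" | sort > "$lists/second"

# -3 drops names common to both lists; names only in the second list
# come out tab-indented, so the tab is stripped.
xor=$(comm -3 "$lists/first" "$lists/second" | tr -d '\t')
printf '%s\n' "$xor"
```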
 
1 member found this post helpful.
12-08-2023, 09:59 AM   #3
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,601

Rep: Reputation: 2546

The regex pattern "first.*second|second.*first" will do what you ask if the words are on the same line, OR if the regex engine has been told that "." should also match newlines.

To do the latter with grep, you can use the -z flag, e.g.:
Code:
grep -Erlz 'first.*second|second.*first' directory
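A small demonstration with invented files, assuming GNU grep: without -z no file is listed, because the two words never share a line; with -z a text file containing no NUL bytes is read as a single record, so "." can cross the newline.

```shell
#!/bin/sh
dir=$(mktemp -d)   # hypothetical sample tree
trap 'rm -rf "$dir"' EXIT
printf 'first line\nsecond line\n' > "$dir/split.txt"
printf 'no match here\n'           > "$dir/other.txt"

# Line by line, "first" and "second" never share a line: nothing listed.
per_line=$(grep -Erl 'first.*second|second.*first' "$dir")

# With -z the whole file is one record, so '.' also matches the newline.
# tr is only a safeguard in case the listed names come out NUL-terminated.
whole_file=$(grep -Erlz 'first.*second|second.*first' "$dir" | tr '\0' '\n')

printf 'per-line: [%s]\nwhole-file: [%s]\n' "$per_line" "$whole_file"
```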
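Depending on the size of the files and where the words are likely to be, it might be more efficient to use a lazy ".*?" instead of ".*" (lazy quantifiers are a PCRE feature, so this needs -P rather than -E, and with -P "." only matches newlines after a (?s) modifier), or to set a maximum distance, e.g. with ".{0,1000}", or indeed to use multiple greps. Also, the example in post #2 should probably use "-Z" (uppercase) for grep and "-0" (zero) for xargs, so that file names containing spaces or other special characters are handled reliably: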
Code:
grep -rlZ 'first' directory | xargs -0 grep -l 'second'
(And if matching individual words - without any regex syntax - one could also use "-Fw" in the above example, so -FwrlZ and -Fwl respectively.)
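A sketch of why -Z and -0 matter, assuming GNU grep and an xargs that supports -0; the file name containing a space is hypothetical. Plain xargs splits that name into two arguments, so the second grep never reads the file, while the NUL-separated version passes it through intact.

```shell
#!/bin/sh
dir=$(mktemp -d)   # hypothetical sample tree
trap 'rm -rf "$dir"' EXIT
printf 'first\nsecond\n' > "$dir/with space.txt"
printf 'first only\n'    > "$dir/plain.txt"

# Naive: xargs splits on blanks, so ".../with" and "space.txt" are
# looked up as two (nonexistent) files; the errors are discarded here.
naive=$(grep -rl 'first' "$dir" | xargs grep -l 'second' 2>/dev/null)

# Safe: -Z ends each listed name with a NUL byte and xargs -0 splits on NUL only.
safe=$(grep -rlZ 'first' "$dir" | xargs -0 grep -l 'second')

printf 'naive: [%s]\nsafe: [%s]\n' "$naive" "$safe"
```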

 
12-08-2023, 10:09 AM   #4
chiendarret
Member
 
Registered: Mar 2007
Posts: 307

Original Poster
Rep: Reputation: 16
Thanks, it saved me a lot of time, but it does not work in all cases.

Quote:
$ grep -r -l -E 'OC2D1' ~/softw/CHARMM_FF/ | xargs grep -H -E 'NONBONDED'
correctly found the file containing NONBONDED data for atom type OC2D1. Great!
....................

Quote:
$ grep -r -l -E 'OBL' ~/softw/CHARMM_FF/ | xargs grep -H -E 'NONBONDED'
did not find NONBONDED data for atom type OBL, while


Quote:
$ grep -r -l -E 'NONBONDED' ~/softw/CHARMM_FF/ | xargs grep -H -E 'OBL'
found NONBONDED section of the file lacking data for any atom type. That is, it does not seem to have acted as 'AND'.

In saying that, I assume that the commands above search in the given order, i.e., first the part before the pipe and then the part after it.

But I am probably using your code wrongly in some way. My aim is to find data for atom type OBL within NONBONDED.

Keep in mind that the NONBONDED section is always the last one in any file.

Thanks

Last edited by chiendarret; 12-08-2023 at 10:11 AM.
 
12-08-2023, 10:18 AM   #5
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,310
Blog Entries: 3

Rep: Reputation: 3722
Some sample data would be needed then, sanitized if necessary. Please show a few lines that include stuff that won't be found, along with several permutations of stuff that should be found.
 
12-08-2023, 10:24 AM   #6
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,310
Blog Entries: 3

Rep: Reputation: 3722
Or do you mean per line rather than per file?

Code:
grep -r -H -E 'FirstPattern' ~/softw/CHARMM_FF/ | grep -E 'SecondPattern'
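One caveat with this per-line pipeline, shown with made-up files and assuming GNU grep: -H prefixes each line with its file name, so the second grep also searches that prefix. A file whose name matches SecondPattern therefore lets every FirstPattern line through, whereas a single regex covering both word orders only looks at the line content.

```shell
#!/bin/sh
dir=$(mktemp -d)   # hypothetical sample tree
trap 'rm -rf "$dir"' EXIT
printf 'FirstPattern alone\n'             > "$dir/SecondPattern_notes.txt"
printf 'FirstPattern and SecondPattern\n' > "$dir/data.txt"

# The second grep sees "file:line", so the file name itself can satisfy it.
piped=$(grep -r -H -E 'FirstPattern' "$dir" | grep -E 'SecondPattern')

# One regex covering both orders only tests the line content.
single=$(grep -r -H -E 'FirstPattern.*SecondPattern|SecondPattern.*FirstPattern' "$dir")

printf 'piped:\n%s\nsingle:\n%s\n' "$piped" "$single"
```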
 
12-09-2023, 08:23 AM   #7
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,371

Rep: Reputation: 2750
Given the occurrence of CHARMM and
Quote:
My aim is to find data for atom type OBL within NONBONDED

Keep in mind that the NONBONDED section is always the last one in any file.
then the file format is probably like this.

As grep is line oriented, I would not consider it to be the right tool for the OP to be using.
 
1 member found this post helpful.
12-09-2023, 08:43 AM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,310
Blog Entries: 3

Rep: Reputation: 3722
Quote:
Originally Posted by allend
then the file format is probably like this.
It looks difficult to reproduce as a data structure, and thus to search. Maybe try an associative array of associative arrays holding lists. I'd try Perl, but perhaps YottaDB or a similar key-value system might be in order?

I would guess there are already Perl modules or sample programs out there, and maybe some in Python too.
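As a rough, non-authoritative illustration of the associative-array idea, here using awk rather than Perl, the sketch below indexes NONBONDED entries by atom type and then reports where one type is defined. It assumes, without having seen a real file, that the section header line starts with the word NONBONDED, that the section runs to the end of each file, and that each entry begins with the atom type; the two sample files are invented.

```shell
#!/bin/sh
dir=$(mktemp -d)   # hypothetical stand-in for ~/softw/CHARMM_FF
trap 'rm -rf "$dir"' EXIT
printf 'NONBONDED\nOBL 0.0 -0.2 1.8\nOC2D1 0.0 -0.1 1.7\n'      > "$dir/a.prm"
printf 'BONDS\nOBL CT1 1.0 1.0\nNONBONDED\nOBL 0.0 -0.3 1.9\n' > "$dir/b.prm"

# n[t] counts NONBONDED entries per atom type t; where[t] collects file:line.
# With a very large tree, find may start awk more than once, and each
# run would then print its own partial report.
report=$(find "$dir" -type f -exec awk -v want=OBL '
    FNR == 1            { in_nb = 0 }          # new file: not in NONBONDED yet
    $1 == "NONBONDED"   { in_nb = 1; next }    # assumed section header
    in_nb && NF > 0     { n[$1]++; where[$1] = where[$1] " " FILENAME ":" FNR }
    END                 { printf "%s: %d NONBONDED entries:%s\n", want, n[want], where[want] }
' {} +)
printf '%s\n' "$report"
```

The order of the listed locations follows find's traversal order, which is not guaranteed to be sorted.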
 
12-09-2023, 02:45 PM   #9
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
Why not try
Code:
grep -ir 'OC2D1' ~/softw/CHARMM_FF | grep -i 'NONBONDED'
That should list every line in the entire directory tree that contains both terms, ignoring case.
 
12-10-2023, 05:12 AM   #10
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
Quote:
Originally Posted by allend
Given the occurrence of CHARMM and

then the file format is probably like this.

As grep is line oriented, I would not consider it to be the right tool for the OP to be using.
Yes, I would go with awk, Perl or Python. But I'm not really familiar with this format, so I don't know what the right approach would be.
 
12-10-2023, 05:53 AM   #11
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,794

Rep: Reputation: 1201
Quote:
found NONBONDED section of the file lacking data for any atom type.
There seems to be some structure...
In contrast to grep, awk (or perl or python) can parse a structure.
Please provide an input sample!
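Pending a real input sample, here is a minimal awk sketch of such section-aware parsing. It assumes (unconfirmed) that the section header line starts with the word NONBONDED, that the section runs to the end of the file as the OP described, and that each data line starts with the atom type; the sample file is invented. Unlike a plain grep, it skips the OBL line in the BONDS section.

```shell
#!/bin/sh
dir=$(mktemp -d)   # hypothetical stand-in for ~/softw/CHARMM_FF
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/sub"
# Invented sample: OBL also appears in BONDS, which grep would report too.
cat > "$dir/sub/sample.prm" <<'EOF'
BONDS
OBL CT1 1.0 1.0
NONBONDED
OC2D1 0.0 -0.1 1.7
OBL 0.0 -0.2 1.8
EOF

# Print file:line:text for lines whose first field is the wanted atom
# type, but only after a NONBONDED header has been seen in that file.
hits=$(find "$dir" -type f -exec awk -v want=OBL '
    FNR == 1            { in_nb = 0 }          # new file: not in NONBONDED yet
    $1 == "NONBONDED"   { in_nb = 1; next }    # assumed section header
    in_nb && $1 == want { print FILENAME ":" FNR ":" $0 }
' {} +)
printf '%s\n' "$hits"
```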
 
  


All times are GMT -5.
