LinuxQuestions.org

Identify words having repeated character strings (https://www.linuxquestions.org/questions/programming-9/identify-words-having-repeated-character-strings-930345/)

danielbmartin 02-21-2012 03:30 PM

Quote:

Originally Posted by Cedrik (Post 4608579)
Code:

grep -P '(?=(....))(.).*\1'

Perfect! Thank you!

Daniel B. Martin

Cedrik 02-21-2012 03:42 PM

You can remove the last pair of parentheses (as in my edited previous post).
They are not needed and slow down performance (they were left over from my trial-and-error attempts).
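
A quick way to check that dropping the parentheses changes nothing in the result is to run both patterns over the same input. This is a minimal sketch, assuming GNU grep built with PCRE support (-P); the function names and the sample words are made up for illustration.

```shell
# Pattern as first posted: the single character after the lookahead
# is wrapped in a capture group that is never used.
with_group() {
    grep -P '(?=(....))(.).*\1'
}

# Edited pattern: same match, without the extra capture group.
without_group() {
    grep -P '(?=(....))..*\1'
}

# Hypothetical sample words, one per line.
sample_words() {
    printf '%s\n' ratatat banana abcdabcd linux
}

sample_words | with_group
sample_words | without_group
```

Both commands should print the same words for this input (ratatat and abcdabcd).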

PS: Is it just me, or is there a forum bug on this page?

(edit: fixed now :))

danielbmartin 02-21-2012 04:22 PM

Quote:

Originally Posted by Cedrik (Post 4608597)
... Is it just me, or is there a forum bug on this page?

Something is broken.

Daniel B. Martin

danielbmartin 02-22-2012 07:22 AM

Quote:

Originally Posted by danielbmartin (Post 4608626)
Something is broken.

... and has been fixed. Good.

Daniel B. Martin

uhelp 02-22-2012 07:42 AM

Quote:

Originally Posted by danielbmartin (Post 4607498)
I have not ignored your advice. To my eyes, the cat improves code readability. Since it performs no logic I assume the cost of the cat is negligible, particularly in code where it is the first of a long string of piped commands. I have not tested this assumption and may be wrong.
Daniel B. Martin

Well, "grep 'something' inputFile" doesn't perform any logic to get its input either.

And you are spawning a subprocess.
This script might stop if no more process IDs are available, while the script without the useless use of cat would still work.
Admittedly, a very rare condition.

But imagine running the cat construct in a long-running loop: you WILL see a heavy performance penalty.

I can't see the point about readability.
The "grep what where" pattern is too basic to be unreadable, imho.

The more you use, the more errors can arise.
Are you aware of the buffering mechanism which bash does when it comes to piping?
It is a source of subtle failures.
Good luck debugging!

Adhere to the KISS (Keep It Simple, Stupid) pattern.
Do only what is really required.
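
For illustration, here is the kind of construct under discussion next to two cat-free equivalents. It is only a sketch: the function names, the pattern and the sample words are made up, and the temporary file stands in for a real input file.

```shell
# With cat: an extra process plus a pipe, only to hand the file to grep.
count_with_cat() {
    cat "$1" | grep -c 'ata'
}

# Without cat: grep opens the file itself.
count_with_arg() {
    grep -c 'ata' "$1"
}

# Without cat, but still reading "input first" thanks to a redirection.
count_with_redirect() {
    < "$1" grep -c 'ata'
}

# Hypothetical input file with a few sample words.
words_file=$(mktemp)
printf '%s\n' ratatat banana linux > "$words_file"

count_with_cat "$words_file"
count_with_arg "$words_file"
count_with_redirect "$words_file"

rm -f "$words_file"
```

All three calls count the same matching lines. The redirection form may be a middle ground for those who like to read the input file before the command.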

Cedrik 02-22-2012 07:45 AM

Trying to explain the reg exp...
Code:

(?=(....))..*\1
With a plain (....) match, the regexp captures 4 chars in the buffer and advances the position in the string by 4 chars.
So with the 'ratatat' example:
Code:

echo ratatat | perl -ne 'print "$1\n" while /(....)/g'
rata

With (?=(....)), the regexp captures 4 chars in the buffer but the position stays in place, so the next search advances the position by only one char:
Code:

echo ratatat | perl -ne 'print "$1\n" while /(?=(....))/g'
rata
atat
tata
atat

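The dot following (?=(....)) forces at least one char to be consumed before the backreference \1 is tried, so the repeat has to start later in the word than the captured 4 chars. With only .*\1, the .* could match nothing and \1 would match the captured chars themselves, so any word of 4 or more chars would match.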
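
To see the effect of that extra dot, the pattern can be compared with and without it. This is only a sketch, assuming GNU grep with -P support; the function names and the word list are invented for the example.

```shell
# With the dot: the repeat must start at least one character later.
repeat_required() {
    grep -P '(?=(....))..*\1'
}

# Without the dot: .* can match nothing, so \1 matches the very same
# four characters that the lookahead just captured.
repeat_not_required() {
    grep -P '(?=(....)).*\1'
}

# Hypothetical test words, one per line.
test_words() {
    printf '%s\n' ratatat linux cat
}

echo "with dot:"
test_words | repeat_required
echo "without dot:"
test_words | repeat_not_required
```

With the dot, only ratatat is printed. Without it, linux is printed too, although nothing in it repeats. The word cat is too short for the lookahead in either case.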

danielbmartin 02-22-2012 09:35 AM

Quote:

Originally Posted by uhelp (Post 4609165)
I can't get the point of readability.
The "grep what where" pattern is too basic to be unreadable, imho.

Coding style is a matter of personal preference. Readable to you might not be to me, and vice-versa. In some workplaces a particular style may be mandated for the sake of uniformity. My programming is purely recreational so I am free to suit my own preferences.

Quote:

Are you aware of the buffering mechanism which bash does when it comes to piping?

Most of my Linux pipes are embedded in REXX programs, so I assume bash is not involved.

Quote:

Do only what is really required.

Agreed, but your idea of "required" may be different from mine. I prize code readability. As a Linux newbie, I may value readable code more than you or others do.

Daniel B. Martin

danielbmartin 02-23-2012 11:51 AM

Thanks to everyone who contributed to this thread.
Having received several excellent solutions, I will mark this problem SOLVED!

Daniel B. Martin


All times are GMT -5.