LinuxQuestions.org


_Linux_Learner 04-23-2010 09:58 AM

Problems with regex in sed command
 
Hi all,

I am going through regex concepts these days, but it's really confusing... So forgive me if I ask some childish questions here...

I have the following file named Test.txt
Code:

Harry boss linux is a tuff job but u can get it only with consistency.... so Harry make sure that u r doing it...

Now look at the amazing regex expressions Harry.... they are tolling me badly...

Now I am running the following commands and the results I am getting are really hard for me to predict....

Command 1:
Code:

sed 's/Hary*/ hahahaha /g' Test.txt
Result 1:
Code:

hahahaha ry boss linux is a tuff job but u can get it only with consistency.... so  hahahaha ry make sure that u r doing it...

Now look at the amazing regex expressions  hahahaha ry.... they are tolling me badly...

Command 2:
Code:

sed 's/Hary?/ hahahaha /g' Test.txt
Result 2:
Code:

Harry boss linux is a tuff job but u can get it only with consistency.... so Harry make sure that u r doing it...

Now look at the amazing regex expressions Harry.... they are tolling me badly...

Command 3:
Code:

sed 's/\(Hary\)*/ # /g' Test.txt
Result 3:
Code:

# H # a # r # r # y #  # b # o # s # s #  # l # i # n # u # x #  # i # s #  # a #  # t # u # f # f #  # j # o # b #  # b # u # t #  # u #  # c # a # n #  # g # e # t #  # i # t #  # o # n # l # y #  # w # i # t # h #  # c # o # n # s # i # s # t # e # n # c # y # . # . # . # . #  # s # o #  # H # a # r # r # y #  # m # a # k # e #  # s # u # r # e #  # t # h # a # t #  # u #  # r #  # d # o # i # n # g #  # i # t # . # . # . #
 #
 # N # o # w #  # l # o # o # k #  # a # t #  # t # h # e #  # a # m # a # z # i # n # g #  # r # e # g # e # x #  # e # x # p # r # e # s # s # i # o # n # s #  # H # a # r # r # y # . # . # . # . #  # t # h # e # y #  # a # r # e #  # t # o # l # l # i # n # g #  # m # e #  # b # a # d # l # y # . # . # . #  #  #

Command 4:
Code:

sed 's/\(Hary\)?/ # /g' Test.txt
Result 4:
Code:

Harry boss linux is a tuff job but u can get it only with consistency.... so Harry make sure that u r doing it...

Now look at the amazing regex expressions Harry.... they are tolling me badly...

Command 5:
Code:

sed 's/\(Hary\)\*/ # /g' Test.txt
Result 5:
Code:

Harry boss linux is a tuff job but u can get it only with consistency.... so Harry make sure that u r doing it...

Now look at the amazing regex expressions Harry.... they are tolling me badly...

Command 6:
Code:

sed 's/\(Hary\)\?/ # /g' Test.txt
Result 6:
Code:

# H # a # r # r # y #  # b # o # s # s #  # l # i # n # u # x #  # i # s #  # a #  # t # u # f # f #  # j # o # b #  # b # u # t #  # u #  # c # a # n #  # g # e # t #  # i # t #  # o # n # l # y #  # w # i # t # h #  # c # o # n # s # i # s # t # e # n # c # y # . # . # . # . #  # s # o #  # H # a # r # r # y #  # m # a # k # e #  # s # u # r # e #  # t # h # a # t #  # u #  # r #  # d # o # i # n # g #  # i # t # . # . # . #
 #
 # N # o # w #  # l # o # o # k #  # a # t #  # t # h # e #  # a # m # a # z # i # n # g #  # r # e # g # e # x #  # e # x # p # r # e # s # s # i # o # n # s #  # H # a # r # r # y # . # . # . # . #  # t # h # e # y #  # a # r # e #  # t # o # l # l # i # n # g #  # m # e #  # b # a # d # l # y # . # . # . #  #  #

I have gone through a number of regex tutorials on Google, but I am still unclear about the above results...

Please help me...
Thanks in advance..

Regards
_Linux_Learner

nuwen52 04-23-2010 11:54 AM

Okay, it might be useful to know what you intended to do with each of the commands. But, for now, I can give an understanding of the first one.
Code:

sed 's/Hary*/ hahahaha /g' Test.txt
The regex "Hary*" means: match "Har" followed by zero or more "y"s. So when it finds the name "Harry", it matches the first three letters "Har" and then zero "y"s, because the next character is "r". That match is replaced with " hahahaha ", which gives " hahahaha ry" as the final string. The /g just means do this to every occurrence on a line.

Did you perhaps mean for the regex string to be "Harry*"?
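
To see the two patterns side by side, here is a minimal sketch. The sample line and the "[X]" replacement marker are made up for illustration; only the two patterns come from the posts above.

```shell
line='so Harry make sure'

# "Hary*" is "Har" plus zero or more "y": in "Harry" it matches only "Har"
out1=$(printf '%s\n' "$line" | sed 's/Hary*/[X]/g')

# "Harry*" is "Harr" plus zero or more "y": it matches all of "Harry"
out2=$(printf '%s\n' "$line" | sed 's/Harry*/[X]/g')

printf '%s\n' "$out1" "$out2"
```

The first command leaves the trailing "ry" behind, as in Result 1; the second replaces the whole name.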

pixellany 04-23-2010 01:51 PM

*Learner;

With regard to your original post: Too much information!! It does not take that many examples to illustrate a question or a problem.

The problem **might** be as simple as recognizing that "*" inside of a SED regex is NOT a wildcard.

If this and the post above don't solve the issue, then post a SHORT before and after example of the results you are looking for.

_Linux_Learner 04-23-2010 06:37 PM

Quote:

Originally Posted by pixellany (Post 3945556)
*Learner;

With regard to your original post: Too much information!! It does not take that many examples to illustrate a question or a problem.

The problem **might** be as simple as recognizing that "*" inside of a SED regex is NOT a wildcard.

If this and the post above don't solve the issue, then post a SHORT before and after example of the results you are looking for.

I understood none of the above results.. so please give a clear explanation.. I will be grateful to all of you..

Thanks in advance..

Regards
_Linux_Learner

MTK358 04-23-2010 07:36 PM

"*" does NOT mean "any character". It means 0 or more of the previous character.

So "Hary*" means "Har" followed by 0 or more "y"s.

Then sed replaces whatever matched with the second string.

So "Hary*" matches the "Har" at the start of "Harry boss linux ..." (the next character is "r", so zero "y"s are matched), and because sed then swaps out the match with the replacement text, it becomes " hahahaha ry boss linux ..."

_Linux_Learner 04-23-2010 10:23 PM

Quote:

Originally Posted by MTK358 (Post 3945869)
"*" does NOT mean "any character". It means 0 or more of the previous character.

So "Hary*" means "Har" followed by 0 or more "y"s.

Then sed replaces whatever matched with the second string.

So "Hary*" matches the "Har" at the start of "Harry boss linux ..." (the next character is "r", so zero "y"s are matched), and because sed then swaps out the match with the replacement text, it becomes " hahahaha ry boss linux ..."

Hi,

The above explanation is OK. But please explain the results of the other 5 commands that I posted...

If * is checking for 0 or more occurrences of y, then why is ? not checking for 0 or 1 occurrence of y? It should also give the same result...

Thanks in advance..

Regards
_Linux_Learner

pixellany 04-23-2010 11:37 PM

Please don't expect us to explain every detail..... Sometimes the best advice you get is when people help you to understand basic concepts.

What I have found is that I often have to run tests until I understand what is going on.

Look up the definitions of the various "metacharacters" and try things until the basic concepts become clear.

After posting this, I'll double-check, but:
* means any number (zero or more) of the previous regex
? means that the previous regex either occurs once or does not occur at all, but never more than once.

Simple tests will illustrate the difference.

grail 04-24-2010 02:31 AM

@pixellany - did some searching as it was baffling me a little why it wasn't working with "?":

Quote:

\? - As *, but only matches zero or one. It is a GNU extension.
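
A small sketch of how the two quantifiers differ, assuming GNU sed (since \? is a GNU extension). The input string and the "[X]" marker are invented for the example.

```shell
text='Har Hary Haryyy'

# "*" takes as many "y"s as are available (zero or more)
star=$(printf '%s\n' "$text" | sed 's/Hary*/[X]/g')

# "\?" takes at most one "y", so extra "y"s are left in place
opt=$(printf '%s\n' "$text" | sed 's/Hary\?/[X]/g')

printf '%s\n' "$star" "$opt"
```

With "*" all three words are replaced completely; with "\?" the last word keeps its two surplus "y"s.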

pixellany 04-24-2010 06:25 AM

grail;

I think we're saying the same thing....

"?", meaning "optional" (AKA "occurring zero times or once"), is part of extended regexes. To use it in sed, you have to escape it, as you have shown, or turn on extended regexes using the sed -r flag.

We seem to have lost our OP
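
The two ways of making the "y" optional described in this post can be compared directly. This is only a sketch and assumes GNU sed, where both \? and -r are available; the sample text and the "[X]" marker are made up.

```shell
text='so Harry make sure'

# Basic regex (the default): a bare ? is a literal question mark, so nothing matches
bre_plain=$(printf '%s\n' "$text" | sed 's/Hary?/[X]/g')

# Basic regex with the escaped \? : the "y" becomes optional
bre_escaped=$(printf '%s\n' "$text" | sed 's/Hary\?/[X]/g')

# Extended regex via -r: a bare ? is already the "optional" operator
ere_plain=$(printf '%s\n' "$text" | sed -r 's/Hary?/[X]/g')

printf '%s\n' "$bre_plain" "$bre_escaped" "$ere_plain"
```

The first result is the unchanged input, which is what Result 2 in the original post shows; the other two agree with each other.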

MTK358 04-24-2010 06:29 AM

I almost always use extended regular expressions (sed -r). They are more regular and expressive ;)

grail 04-24-2010 08:58 AM

pixellany, What I was trying to point out is that sed requires slosh (\) or escape prior to the use of ? in a regex.

So the OP's example, sed 's/Hary?/ hahahaha /g' Test.txt, correctly returns no change, as the literal string "Hary?" is not in the text.

However, this does work:
Code:

sed 's/Hary\?/ hahahaha /g' Test.txt
And it will replace the "Har" of every "Harry" with " hahahaha ".
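
The same reasoning seems to account for the grouped commands 3 to 6 from the first post. The sketch below assumes GNU sed and uses a one-word input and a bare "#" replacement (both chosen here for readability) instead of the full Test.txt.

```shell
text='Harry'

# Command 3: \(Hary\)* is zero or more copies of "Hary". "Harry" contains no
# "Hary", but the pattern also matches the empty string at every position.
c3=$(printf '%s\n' "$text" | sed 's/\(Hary\)*/#/g')

# Command 4: unescaped ? is a literal character, so this looks for "Hary?"
c4=$(printf '%s\n' "$text" | sed 's/\(Hary\)?/#/g')

# Command 5: \* is a literal asterisk, so this looks for "Hary*"
c5=$(printf '%s\n' "$text" | sed 's/\(Hary\)\*/#/g')

# Command 6: \? makes the group optional, so empty matches occur as in command 3
c6=$(printf '%s\n' "$text" | sed 's/\(Hary\)\?/#/g')

# Command 4 does match when the literal text "Hary?" is present
lit4=$(printf '%s\n' 'Hary?' | sed 's/\(Hary\)?/#/g')

printf '%s\n' "$c3" "$c4" "$c5" "$c6" "$lit4"
```

Commands 3 and 6 insert the replacement between every character because a pattern that may match nothing matches everywhere; commands 4 and 5 change nothing because they search for text that is not in the file.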

pixellany 04-24-2010 09:01 AM

Quote:

Originally Posted by grail (Post 3946283)
pixellany, What I was trying to point out is that sed requires slosh (\) or escape prior to the use of ? in a regex

Not if you use the -r flag......

grail 04-24-2010 09:22 AM

Sorry I seem to have suffered from RTFA (A for answer) :redface:

Also, I did not know that previously, still sharpening my sedjitsu

pixellany 04-24-2010 09:28 AM

As part of the *n(i|u)x master plan for obfuscation (MPO), various utilities have different ways of turning on extended regexes (EREs)
sed -r
grep -E
egrep
awk (uses ERE by default)

other examples?
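
As a rough illustration of the list above, the same extended regex can be handed to three of these tools. The sample names and the pattern are invented for this sketch, and sed -r assumes GNU sed.

```shell
names=$(printf '%s\n' Har Hary Harry)
pattern='^Harr?y$'   # ERE: "Har", an optional second "r", then "y"

# grep needs -E to treat ? as an operator
with_grep=$(printf '%s\n' "$names" | grep -E "$pattern")

# sed needs -r; -n plus the p command prints only the matching lines
with_sed=$(printf '%s\n' "$names" | sed -r -n "/$pattern/p")

# awk uses extended regexes without any flag
with_awk=$(printf '%s\n' "$names" | awk -v re="$pattern" '$0 ~ re')

printf '%s\n' "$with_grep" "$with_sed" "$with_awk"
```

All three select "Hary" and "Harry" and skip "Har".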

