LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

04-23-2010, 09:58 AM   #1
_Linux_Learner
Member
Registered: Feb 2010
Distribution: Ubuntu

Problems with regex in sed command

Hi all,

I am working through regex concepts these days, but it's really confusing... So forgive me if I ask some childish questions here...

I have the following file named Test.txt:
Code:
Harry boss linux is a tuff job but u can get it only with consistency.... so Harry make sure that u r doing it...

Now look at the amazing regex expressions Harry.... they are tolling me badly...
Now I am running the following commands and the results I am getting are really hard for me to predict....

Command 1:
Code:
sed 's/Hary*/ hahahaha /g' Test.txt
Result 1:
Code:
 hahahaha ry boss linux is a tuff job but u can get it only with consistency.... so  hahahaha ry make sure that u r doing it...

Now look at the amazing regex expressions  hahahaha ry.... they are tolling me badly...
Command 2:
Code:
sed 's/Hary?/ hahahaha /g' Test.txt
Result 2:
Code:
Harry boss linux is a tuff job but u can get it only with consistency.... so Harry make sure that u r doing it...

Now look at the amazing regex expressions Harry.... they are tolling me badly...
Command 3:
Code:
sed 's/\(Hary\)*/ # /g' Test.txt
Result 3:
Code:
 # H # a # r # r # y #   # b # o # s # s #   # l # i # n # u # x #   # i # s #   # a #   # t # u # f # f #   # j # o # b #   # b # u # t #   # u #   # c # a # n #   # g # e # t #   # i # t #   # o # n # l # y #   # w # i # t # h #   # c # o # n # s # i # s # t # e # n # c # y # . # . # . # . #   # s # o #   # H # a # r # r # y #   # m # a # k # e #   # s # u # r # e #   # t # h # a # t #   # u #   # r #   # d # o # i # n # g #   # i # t # . # . # . # 
 # 
 # N # o # w #   # l # o # o # k #   # a # t #   # t # h # e #   # a # m # a # z # i # n # g #   # r # e # g # e # x #   # e # x # p # r # e # s # s # i # o # n # s #   # H # a # r # r # y # . # . # . # . #   # t # h # e # y #   # a # r # e #   # t # o # l # l # i # n # g #   # m # e #   # b # a # d # l # y # . # . # . #   #   #
Command 4:
Code:
sed 's/\(Hary\)?/ # /g' Test.txt
Result 4:
Code:
Harry boss linux is a tuff job but u can get it only with consistency.... so Harry make sure that u r doing it...

Now look at the amazing regex expressions Harry.... they are tolling me badly...
Command 5:
Code:
sed 's/\(Hary\)\*/ # /g' Test.txt
Result 5:
Code:
Harry boss linux is a tuff job but u can get it only with consistency.... so Harry make sure that u r doing it...

Now look at the amazing regex expressions Harry.... they are tolling me badly...
Command 6:
Code:
sed 's/\(Hary\)\?/ # /g' Test.txt
Result 6:
Code:
 # H # a # r # r # y #   # b # o # s # s #   # l # i # n # u # x #   # i # s #   # a #   # t # u # f # f #   # j # o # b #   # b # u # t #   # u #   # c # a # n #   # g # e # t #   # i # t #   # o # n # l # y #   # w # i # t # h #   # c # o # n # s # i # s # t # e # n # c # y # . # . # . # . #   # s # o #   # H # a # r # r # y #   # m # a # k # e #   # s # u # r # e #   # t # h # a # t #   # u #   # r #   # d # o # i # n # g #   # i # t # . # . # . # 
 # 
 # N # o # w #   # l # o # o # k #   # a # t #   # t # h # e #   # a # m # a # z # i # n # g #   # r # e # g # e # x #   # e # x # p # r # e # s # s # i # o # n # s #   # H # a # r # r # y # . # . # . # . #   # t # h # e # y #   # a # r # e #   # t # o # l # l # i # n # g #   # m # e #   # b # a # d # l # y # . # . # . #   #   #
I have gone through a number of regex tutorials on Google, but I am still unclear about the above results...

Please help me...
Thanks in advance..

Regards
_Linux_Learner
 
04-23-2010, 11:54 AM   #2
nuwen52
Member
Registered: Feb 2009
Distribution: Debian, CentOS 5, Gentoo, FreeBSD, Fedora, Mint, Slackware64

Okay, it would help to know what you intended each of the commands to do. But for now, I can explain the first one.
Code:
sed 's/Hary*/ hahahaha /g' Test.txt
The regex "Hary*" matches "Har" followed by zero or more "y"s. So, when it finds the name "Harry", it matches the first three letters "Har" and then zero or more "y"s after them (in this case zero, because the next letter is "r"). It then replaces that match with " hahahaha ", which gives " hahahaha ry" as the final string. The /g just means do this to all occurrences on each line.

Did you perhaps mean for the regex string to be "Harry*"?
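
To see the difference between the two patterns, here is a small sketch you can paste into a shell. The sample line and the "[X]" marker are stand-ins chosen for readability, not the exact text or replacement from the original post:

```shell
# Hypothetical sample line; [X] is just a visible marker for the replacement.
line='Harry boss linux'

# "Hary*" = "Har" followed by zero or more "y": only "Har" matches here.
one_r=$(printf '%s\n' "$line" | sed 's/Hary*/[X]/g')
echo "$one_r"    # [X]ry boss linux

# "Harry*" = "Harr" followed by zero or more "y": the whole name matches.
two_r=$(printf '%s\n' "$line" | sed 's/Harry*/[X]/g')
echo "$two_r"    # [X] boss linux
```

Note that "Harry*" would also match "Harr" and "Harryyy", since the * only applies to the final "y".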
 
04-23-2010, 01:51 PM   #3
pixellany
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint

*Learner;

With regard to your original post: Too much information!! It does not take that many examples to illustrate a question or a problem.

The problem **might** be as simple as recognizing that "*" inside of a SED regex is NOT a wildcard.

If this and the post above don't solve the issue, then post a SHORT before and after example of the results you are looking for.
 
04-23-2010, 06:37 PM   #4
_Linux_Learner
Member
Registered: Feb 2010
Distribution: Ubuntu
Original Poster

Quote:
Originally Posted by pixellany
*Learner;

With regard to your original post: Too much information!! It does not take that many examples to illustrate a question or a problem.

The problem **might** be as simple as recognizing that "*" inside of a SED regex is NOT a wildcard.

If this and the post above don't solve the issue, then post a SHORT before and after example of the results you are looking for.

I understood none of the above results, so please give a clear explanation. I would be grateful to all of you.

Thanks in advance..

Regards
_Linux_Learner
 
04-23-2010, 07:36 PM   #5
MTK358
LQ 5k Club
Registered: Sep 2009

"*" does NOT mean "any character". It means 0 or more of the previous character.

So "Hary*" means "Har" followed by 0 or more "y"s.

Then sed replaces whatever matched with the second string.

So "Hary*" matches just the "Har" in "Harry boss linux ..." (the next letter is "r", so zero "y"s are matched), and because sed then swaps out the match with the replacement text, it becomes " hahahaha ry boss linux ..."
 
04-23-2010, 10:23 PM   #6
_Linux_Learner
Member
Registered: Feb 2010
Distribution: Ubuntu
Original Poster

Quote:
Originally Posted by MTK358
"*" does NOT mean "any character". It means 0 or more of the previous character.

So "Hary*" means "Har" followed by 0 or more "y"s.

Then sed replaces whatever matched with the second string.

So "Hary*" matches just the "Har" in "Harry boss linux ...", and because sed then swaps out the match with the replacement text, it becomes " hahahaha ry boss linux ..."

Hi,

The above explanation is OK. But please explain the results of the other 5 commands that I posted...

If * is checking for 0 or more occurrences of y, then why is ? not checking for 0 or 1 occurrences of y? It should give the same result...

Thanks in advance..

Regards
_Linux_Learner
 
04-23-2010, 11:37 PM   #7
pixellany
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint

Please don't expect us to explain every detail..... Sometimes the best advice you get is when people help you to understand basic concepts.

What I have found is that I often have to run tests until I understand what is going on.

Look up the definitions of the various "metacharacters" and try things until the basic concepts become clear.

After posting this, I'll double-check, but:
* means any number (including zero) of the previous regex
? means that the previous regex occurs once or not at all, but never more than once.

Simple tests will illustrate the difference.
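
One such simple test, as a sketch: it assumes GNU sed (where the optional quantifier is written \? in the default basic-regex mode), and the input "Haryyy" and the "[X]" marker are made up for illustration.

```shell
# Assumes GNU sed: \? is a GNU extension in basic regexes.
# "Haryyy" is a made-up input with several trailing "y"s.
star=$(printf 'Haryyy\n' | sed 's/Hary*/[X]/')
echo "$star"     # [X]      (* consumed all three "y"s)

opt=$(printf 'Haryyy\n' | sed 's/Hary\?/[X]/')
echo "$opt"      # [X]yy    (\? consumed at most one "y")
```

On an input like "Harry" the two behave the same, because there are zero "y"s right after "Har" either way.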

Last edited by pixellany; 04-23-2010 at 11:41 PM.
 
04-24-2010, 02:31 AM   #8
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro

@pixellany - did some searching as it was baffling me a little why it wasn't working with "?":

Quote:
\? - As *, but only matches zero or one. It is a GNU extension.
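
This also seems to account for Results 3, 5 and 6 in the first post. The sketch below assumes GNU sed and shortens the " # " replacement to "#" and the file to the single word "Harry"; the variable names are just for illustration.

```shell
# Assumes GNU sed (for \?). " # " from the thread is shortened to "#".
# \(Hary\)* means zero or more copies of the whole group "Hary".
# "Harry" contains no "Hary", so the only matches are empty strings,
# one at every position, and each gets a "#".
grp_star=$(printf 'Harry\n' | sed 's/\(Hary\)*/#/g')
echo "$grp_star"   # #H#a#r#r#y#

# \(Hary\)\? (zero or one copy) can also match the empty string.
grp_opt=$(printf 'Harry\n' | sed 's/\(Hary\)\?/#/g')
echo "$grp_opt"    # #H#a#r#r#y#

# \* is a literal asterisk, so this looks for the text "Hary*".
lit_star=$(printf 'Harry\n' | sed 's/\(Hary\)\*/#/g')
echo "$lit_star"   # Harry (unchanged)
```

In other words, a quantified group that is allowed to match "nothing" will match nothing between every pair of characters, which is why the output is peppered with the replacement text.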
 
04-24-2010, 06:25 AM   #9
pixellany
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint

grail;

I think we're saying the same thing....

"?", meaning "optional" (AKA "occurring zero times or once"), is part of extended regexes. To use it in sed, you have to escape it, as you have shown, or turn on extended regexes using the sed -r flag.

We seem to have lost our OP

Last edited by pixellany; 04-24-2010 at 09:02 AM. Reason: typo
 
04-24-2010, 06:29 AM   #10
MTK358
LQ 5k Club
Registered: Sep 2009

I almost always use extended regular expressions (sed -r). They are more regular and expressive.
 
04-24-2010, 08:58 AM   #11
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro

pixellany, what I was trying to point out is that sed requires a backslash (\) escape before ? in a regex.

So the OP's example, sed 's/Hary?/ hahahaha /g' Test.txt,
correctly returns no change, as the text "Hary?" is not in the file.

However, this does work:
Code:
sed 's/Hary\?/ hahahaha /g' Test.txt
And it will replace "Har" with " hahahaha " for all occurrences.
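
A quick way to convince yourself of the literal-versus-escaped behaviour is to feed sed a line that really does contain a question mark. This is only a sketch: the input line and the "[X]" marker are invented, and the second command assumes GNU sed.

```shell
# Made-up input that contains a real question mark.
line='Harry asked: Hary?'

# Basic regex (sed default): a bare ? is an ordinary character.
bare=$(printf '%s\n' "$line" | sed 's/Hary?/[X]/g')
echo "$bare"      # Harry asked: [X]

# GNU sed: \? makes the "y" optional.
escaped=$(printf '%s\n' "$line" | sed 's/Hary\?/[X]/g')
echo "$escaped"   # [X]ry asked: [X]?
```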
 
04-24-2010, 09:01 AM   #12
pixellany
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint

Quote:
Originally Posted by grail
pixellany, what I was trying to point out is that sed requires a backslash (\) escape before ? in a regex

Not if you use the -r flag......
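
For example (a sketch assuming GNU sed, which accepts -E as a synonym for -r; the inputs and the "[X]" marker are made up):

```shell
# Assumes GNU sed; -E is accepted as a synonym for the -r used in this thread.
ere=$(printf 'Harry\n' | sed -E 's/Hary?/[X]/g')
echo "$ere"       # [X]ry

# In extended mode the roles flip: \? is now a literal question mark.
ere_lit=$(printf 'Hary?\n' | sed -E 's/Hary\?/[X]/g')
echo "$ere_lit"   # [X]
```

So the backslash toggles the meaning of ? in both modes; which form is "special" depends on whether extended regexes are switched on.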
 
04-24-2010, 09:22 AM   #13
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro

Sorry, I seem to have suffered from RTFA (A for answer).

Also, I did not know that previously; still sharpening my sedjitsu.
 
04-24-2010, 09:28 AM   #14
pixellany
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint

As part of the *n(i|u)x master plan for obfuscation (MPO), various utilities have different ways of turning on extended regexes (EREs):
sed -r
grep -E
egrep
awk (uses ERE by default)

other examples?
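
As a rough side-by-side, the same extended pattern can be run through three of the tools listed above. This is a sketch with a made-up three-name list and illustrative variable names; it uses sed -E, which GNU sed treats the same as -r, and it leaves out egrep because recent GNU grep releases warn that it is obsolescent in favour of grep -E.

```shell
# Made-up input: three names, one per line.
names=$(printf 'Harry\nHary\nHay\n')

# Extended regex "Harr?y": "Har", an optional second "r", then "y".
ere_grep=$(printf '%s\n' "$names" | grep -E 'Harr?y')
ere_awk=$(printf '%s\n' "$names" | awk '/Harr?y/')
ere_sed=$(printf '%s\n' "$names" | sed -E -n '/Harr?y/p')
echo "$ere_grep"   # prints Harry and Hary, one per line

# Without -E, grep uses basic regexes and treats ? as a literal character,
# so nothing matches (|| true swallows grep's "no match" exit status).
bre_count=$(printf '%s\n' "$names" | grep -c 'Harr?y' || true)
echo "$bre_count"  # 0
```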

Last edited by pixellany; 04-24-2010 at 09:29 AM.
 
  

