LinuxQuestions.org


blueskynet 04-29-2012 02:30 AM

Need help with regular expression, /[a-z][\.\?!]\s+[A-Z]/
 
Hi there,
This is my first time here, so I hope I'm not posting this thread in the wrong place.
Can someone help me with this regular expression, /[a-z][\.\?!]\s+[A-Z]/ ? I don't really understand how this part, [\.\?!], works.

Thank you.

druuna 04-29-2012 02:36 AM

Hi and welcome to LQ!

Certain characters are special and need to be escaped when you want the literal meaning.

A . (dot) tells a regular expression to look for any single character; \. on the other hand means a literal dot. The ? is also special (zero or one of the preceding item); \? means a literal question mark.

[\.\?!] => look for a dot or a question mark or an exclamation point. (see post #4)
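
A quick way to see the escaping rules outside of brackets on the command line. This is only a sketch, assuming a grep that supports -E (extended regular expressions); the sample strings and variable names are made up for illustration:

```shell
# Outside brackets: unescaped . matches any character, so both lines match.
any=$(printf 'a.c\nabc\n' | grep -c -E 'a.c')
# Escaped \. matches only a literal dot, so only 'a.c' matches.
literal=$(printf 'a.c\nabc\n' | grep -c -E 'a\.c')
# Unescaped ? makes the preceding 'a' optional: 'ab' and 'b' match.
optional=$(printf 'ab\nb\na?b\n' | grep -c -E '^a?b$')
# Escaped \? is a literal question mark: only 'a?b' matches.
question=$(printf 'ab\nb\na?b\n' | grep -c -E '^a\?b$')
# Print the four line counts in order.
echo "$any $literal $optional $question"
```

Each count is the number of input lines that matched, so the unescaped forms match more lines than the escaped ones.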

Also have a look here: Regular-Expressions.info

Hope this helps.

grail 04-29-2012 03:21 AM

I would add that the user is being overcautious, since inside a character list, [], these characters lose their special meaning :0
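
This can be checked with a small test, sketched here assuming a grep that supports -E; the input strings are invented for the example:

```shell
# Inside [], an unescaped dot is already literal: 'a.c' matches, 'abc' does not.
dot_count=$(printf 'a.c\nabc\n' | grep -c -E 'a[.]c')
# Likewise, [?] matches a literal question mark rather than acting as a quantifier.
q_count=$(printf 'a?c\nac\n' | grep -c -E 'a[?]c')
# Print both line counts.
echo "$dot_count $q_count"
```

In both cases only the line containing the literal punctuation character is counted.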

David the H. 04-29-2012 03:35 AM

More than just being over-cautious, in a POSIX bracket list (as used by sed and grep) it actually adds a literal "\" to the characters in the list. Depending on the input, it could even change your results.

I'd like to know what the purpose is for this regex anyway. It appears to be designed to match sentence breaks: a lowercase letter, followed by punctuation, spaces, and a capital letter. But since the match also consumes the letters and the punctuation, it would have to be applied with care.
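
To see exactly what each match consumes, something like the following can help. It is a sketch that assumes a grep with the -o option (print only the matched text, available in GNU and BSD grep) and uses [[:space:]] as the POSIX spelling of \s; the sample sentence is made up:

```shell
text='It works. Now what? Try it! Done'
# -o prints each match on its own line, showing that the last letter of one
# sentence and the first letter of the next are part of every match.
matches=$(echo "$text" | grep -o -E '[a-z][.?!][[:space:]]+[A-Z]')
echo "$matches"
```

Every match includes one letter from each side of the break, which is why a plain substitution would eat those letters.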

syg00 04-29-2012 03:40 AM

Whoa - all the heavy-hitters.

Time for me to exit stage-left .... :D

David the H. 04-29-2012 03:55 AM

Some sed examples to point out the difference.

Code:


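# -r is GNU sed's option for extended regex; -E is the more portable spelling.
# Unescaped list: matches only ., ? and !, so 'g\ H' is left alone.
$ echo 'a. B  c? D  e! F  g\ H' | sed -r 's/[a-z][.?!]\s+[A-Z]/---/g'
---  ---  ---  g\ H

# Escaped list: the backslash itself joins the list, so 'g\ H' now matches too.
$ echo 'a. B  c? D  e! F  g\ H' | sed -r 's/[a-z][\.\?!]\s+[A-Z]/---/g'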
---  ---  ---  ---

If you want to preserve any part of the match for the output, you have to use backreferences.

Code:


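# \1 and \2 put back the letters captured by the two parenthesized groups.
$ echo 'a. B  c? D  e! F  g\ H' | sed -r 's/([a-z])[.?!]\s+([A-Z])/\1---\2/g'
a---B  c---D  e---F  g\ H

# With the escaped list, the backslash in 'g\ H' is matched and replaced as well.
$ echo 'a. B  c? D  e! F  g\ H' | sed -r 's/([a-z])[\.\?!]\s+([A-Z])/\1---\2/g'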
a---B  c---D  e---F  g---H
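
The examples above drop the punctuation, since it sits outside the groups. If the punctuation should survive too, it can go inside the first group. A minimal sketch, assuming a sed that accepts -E, with [[:space:]] as the POSIX spelling of \s; the sample sentence and the " | " marker are arbitrary choices for illustration:

```shell
text='It works. Now what? Try it! Done'
# Group 1 holds the letter plus its punctuation, group 2 the capital letter;
# only the whitespace between them is replaced by the marker.
marked=$(echo "$text" | sed -E 's/([a-z][.?!])[[:space:]]+([A-Z])/\1 | \2/g')
echo "$marked"
```

Only the whitespace at each sentence break changes; the letters and punctuation are put back by \1 and \2.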


David the H. 04-29-2012 03:58 AM

Quote:

Originally Posted by syg00 (Post 4665823)
Whoa - all the heavy-hitters.

Time for me to exit stage-left .... :D

...says the person with by far the highest post count in the thread. ;)

grail 04-29-2012 04:46 AM

Yeah I'm with David ... I feel we are still the juniors :) (maybe not by age?)

syg00 04-29-2012 05:54 AM

lol - post count is such a mindlessly inane metric.
I keep telling jeremy to delete it altogether. Post quality is what elevates all the (other) responders to this (and other) threads.

Quality, not quantity. I've learnt heaps from all of you.

blueskynet 04-29-2012 01:41 PM

Thank you very much. This is actually one of my homework questions. The info you guys shared is very helpful. I will keep trying things out to learn more about this 'ugly' stuff :)
Best to all!


All times are GMT -5.