LinuxQuestions.org


huahd 01-14-2008 08:43 AM

Some questions on regular expression(shouldn't be hard,so please help)
 
Hi!
I'm reading the book Linux: The Textbook by Syed Sarwar. Following the examples in the book, I tried the grep/egrep commands in bash on Fedora 8 and ran into some problems.
All the grep/egrep commands below search the text file students, whose content is:

$ cat students
John Johnsen john.johnsen@tp.com 503.555.1111
Hassaan Sarwar hsarwar@k12.st.or 503.444.2132
David Kendall d_kendall@msnbc.org 229.111.2013
John Johnsen jjohnsen@psu.net 301.999.8888
Kelly Kimberly kellyk@umich.gov 555.123.9999
Maham Sarwar msarwark@k12.st.or 713.888.0000
Jamie Davidson j.davidson@uet.edu 515.001.2932
Nabeel Sarwar nsarwar@xyz.net 434.555.1212

OK! question #1:
Should all regular expressions be quoted using single or double quotes?
If I type $ grep ^[A-H] students, does grep simply go through the file looking for lines that contain the string "^[A-H]", rather than treating it as a regular expression?

question #2:
If the answer to #1 is yes, do '' and "" mean the same thing?

question #3:
look at the result:

$ grep '[a-z]\{4\}' students
John Johnsen john.johnsen@tp.com 503.555.1111
Hassaan Sarwar hsarwar@k12.st.or 503.444.2132
David Kendall d_kendall@msnbc.org 229.111.2013
John Johnsen jjohnsen@psu.net 301.999.8888
Kelly Kimberly kellyk@umich.gov 555.123.9999
Maham Sarwar msarwark@k12.st.or 713.888.0000
Jamie Davidson j.davidson@uet.edu 515.001.2932
Nabeel Sarwar nsarwar@xyz.net 434.555.1212

Why are all of the lines printed? I thought the command meant to print only those lines containing exactly 4 consecutive lowercase letters. That's what {n} means according to my man page GREP(1).

question #4:
Again, the command $ grep -n '[a-z]\{4\}' students
Why are there backslashes?
I executed $ grep -n '[a-z]{4}' students and saw nothing.
I checked my man page GREP(1), which says: in basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
So does that mean my grep only supports basic regular expressions? However, GREP(1) also says: grep understands two different versions of regular expression syntax: "basic" and "extended." In GNU grep, there is no difference in available functionality using either syntax. In other implementations, basic regular expressions are less powerful. So it seems that my grep is not GNU grep, but some other implementation.
Is that correct? If so, what implementation is my grep? And what is GNU grep anyway?

Thanks!

Telemachos 01-14-2008 11:33 AM

You should probably spend a little time reading through a good tutorial on grep itself. I like this one: http://www.panix.com/~elflord/unix/grep.html Here are some brief answers though:

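1 & 2: You need to put quote marks around all but the simplest grep patterns, because otherwise the shell may interpret characters such as [, *, and $ before grep ever sees them. Unquoted, ^[A-H] is still treated as a regular expression if it reaches grep, but the shell can replace it with a matching file name first. Single and double quote marks have slightly different meanings: the shell still expands variables and a few other things inside double quotes, but leaves everything inside single quotes alone. You will generally want the single ones. For more, see the section in that tutorial on quoting.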
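
To see what the shell actually hands to grep, here is a small sketch that uses printf in place of grep. It assumes bash (or another POSIX shell) with default globbing settings and a working mktemp; the scratch directory, the file name ^A, and the variable names are made up for the illustration.

```shell
# Work in an empty scratch directory so no existing file names interfere.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Unquoted: the shell treats [A-H] as a file name glob. Nothing matches here,
# so with default settings the word is passed through unchanged.
unquoted_before=$(printf '%s\n' ^[A-H])

# Now create a file whose name happens to match the glob and try again.
touch '^A'
unquoted_after=$(printf '%s\n' ^[A-H])

# Single quotes: the shell leaves the pattern alone, whatever files exist.
single_quoted=$(printf '%s\n' '^[A-H]')

# Double quotes expand variables; single quotes do not.
name=Sarwar
double=$(printf '%s\n' "^$name")
single=$(printf '%s\n' '^$name')

printf '%s\n' "$unquoted_before" "$unquoted_after" "$single_quoted" "$double" "$single"
```

The second printf is the case that bites people: once a file matching the glob exists, grep would receive the file name instead of the pattern, which is why quoting is the safe habit.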

3: All the lines match because each of those lines does have 4 consecutive lower-case letters. You are probably thinking that they shouldn't match if there are more than four letters, but the regex matches as soon as it finds four lower-case letters in a row, wherever they are. Change the expression to this
Code:

grep '[a-z]\{8\}' students
and far fewer lines match: only those with a run of at least eight lower-case letters, such as jjohnsen. If you want exactly four letters followed by the end of the word, you need to specify that in your search.
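
As a rough sketch of the difference, the commands below count matching lines for each pattern against a copy of the students file from the question. Two assumptions: LC_ALL=C is set so that [a-z] means plain ASCII lower-case letters, and the \< and \> word-boundary anchors are available (they are a GNU grep extension, not part of basic POSIX syntax). The temporary file and variable names are only for the illustration.

```shell
# Force the C locale so [a-z] covers exactly the ASCII lower-case letters.
export LC_ALL=C

# Recreate the students file from the question in a temporary file.
students=$(mktemp)
cat > "$students" <<'EOF'
John Johnsen john.johnsen@tp.com 503.555.1111
Hassaan Sarwar hsarwar@k12.st.or 503.444.2132
David Kendall d_kendall@msnbc.org 229.111.2013
John Johnsen jjohnsen@psu.net 301.999.8888
Kelly Kimberly kellyk@umich.gov 555.123.9999
Maham Sarwar msarwark@k12.st.or 713.888.0000
Jamie Davidson j.davidson@uet.edu 515.001.2932
Nabeel Sarwar nsarwar@xyz.net 434.555.1212
EOF

# Lines containing a run of at least four lower-case letters.
at_least_4=$(grep -c '[a-z]\{4\}' "$students")

# Lines containing a run of at least eight lower-case letters.
at_least_8=$(grep -c '[a-z]\{8\}' "$students")

# Lines containing a whole word made of exactly four lower-case letters.
exactly_4=$(grep -c '\<[a-z]\{4\}\>' "$students")

echo "at least 4: $at_least_4, at least 8: $at_least_8, exactly 4: $exactly_4"
```

With the word anchors in place, only the first line should survive, thanks to the "john" in john.johnsen@tp.com.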

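4: Various "magic" characters have to have a backslash in front of them in order to activate (or sometimes to turn off) their magic. In basic regular expressions, which plain grep uses by default, { and } are ordinary characters unless you backslash them, so \{4\} means "four times" while {4} is just the literal text {4}. In extended regular expressions (egrep, or grep -E) it is the other way round. Again, the tutorial will tell you more. GNU grep is the GNU project's version of grep, and it is the one Fedora ships, so you are almost certainly already using it. The man page only means that in GNU grep both syntaxes can express the same things, not that they are written the same way. There are many other versions of grep, and they sometimes have different default behaviors about important things like backslashes and magic characters.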
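
Here is a minimal sketch of the basic versus extended difference, again counting matching lines in a copy of the students file. It assumes LC_ALL=C and a grep that accepts -E (standard in POSIX); the --version check at the end is how GNU grep identifies itself, and other implementations may not support that option. The variable names are made up for the example.

```shell
# Force the C locale so [a-z] covers exactly the ASCII lower-case letters.
export LC_ALL=C

# Recreate the students file from the question in a temporary file.
students=$(mktemp)
cat > "$students" <<'EOF'
John Johnsen john.johnsen@tp.com 503.555.1111
Hassaan Sarwar hsarwar@k12.st.or 503.444.2132
David Kendall d_kendall@msnbc.org 229.111.2013
John Johnsen jjohnsen@psu.net 301.999.8888
Kelly Kimberly kellyk@umich.gov 555.123.9999
Maham Sarwar msarwark@k12.st.or 713.888.0000
Jamie Davidson j.davidson@uet.edu 515.001.2932
Nabeel Sarwar nsarwar@xyz.net 434.555.1212
EOF

# Basic syntax with backslashed braces: a repetition count.
bre_escaped=$(grep -c '[a-z]\{4\}' "$students")

# Basic syntax with bare braces: looks for a lower-case letter followed by
# the literal text {4}. grep exits non-zero when nothing matches, hence || true.
bre_plain=$(grep -c '[a-z]{4}' "$students" || true)

# Extended syntax (-E): bare braces are the repetition count.
ere_plain=$(grep -E -c '[a-z]{4}' "$students")

echo "basic \\{4\\}: $bre_escaped, basic {4}: $bre_plain, extended {4}: $ere_plain"

# Show which grep this is; GNU grep prints a line naming itself.
grep --version 2>/dev/null | head -n 1 || true
```

The zero in the middle is the "saw nothing" from the question: the same grep, just reading {4} as literal text.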

huahd 01-15-2008 07:04 AM

Thanks, Telemachos. I've read the grep tutorial,
and it helped me a lot :)

