I always get confused when using regular expressions. I'm new to shell scripting, and any time I think I have a reasonable understanding of shell scripts, regex confuses me and I forget everything.
Now I am back to the basics of regex AGAIN...
Here is the search text (from Unix Shells by Example, 4th Ed.):
$ cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
I want to match just the word which starts with 's' and ends with 'n'. In this example it is 'southern'.
You are mixing matching and output.
Regular expressions are greedy, so the regular expression s.*n matches southern SO Suan Chin on that line, not just southern. A line such as southern foo bar would have matched as well.
With \bs.*n\b you pin both ends of the match to word boundaries, but .* is still greedy, so on this line the match still runs from southern through Chin rather than stopping at southern.
Either way it doesn't make any difference to the output, because grep prints the entire line where a match is found. By default you will always get the entire line, not the matching word.
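To see the difference between what matches and what gets printed, here is a small sketch using one line from the datafile. The variable names are only for illustration, and the -o option (print only the matched text) is assumed to be available, as it is in GNU grep.

```shell
# One line from the thread's datafile, kept in a variable for the demo.
line='southern SO Suan Chin 5.1 .95 4 15'

# Without -o, grep prints the whole line that contains a match.
whole=$(printf '%s\n' "$line" | grep 's.*n')

# With -o, only the matched text is printed; the greedy .* runs to the last n.
greedy=$(printf '%s\n' "$line" | grep -o 's.*n')

printf 'whole line: %s\n' "$whole"
printf 'match only: %s\n' "$greedy"
```

The second result stops after Chin because that is the last n on the line.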
You would be better off using sed for this (sed can read the file itself, so cat is not needed):
sed -n 's/\(^s[a-z]*n\).*/\1/p' file.txt
What it does:
-n: suppress the automatic printing of every line
\(...\): capture the enclosed match so it can be referred to as \1
^: anchor the match at the start of the line, so only the first word is tested
s[a-z]*n: an 's', then an arbitrary number of characters a-z, then an 'n'
.*: match the rest of the line, so the substitution replaces the whole line
\1: replace it with the captured text only
p: print the line, but only if the substitution was made
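The breakdown above can be tried on a reduced copy of the datafile. This is only a sketch: the three-line temp file and the variable names are made up for the demo, and the sed expression is the one from this post.

```shell
# Reduced copy of the thread's datafile (three lines), written to a temp file.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
EOF

# -n suppresses automatic printing; the trailing p prints only the lines
# where the substitution succeeded, reduced to the captured first word.
first_word=$(sed -n 's/\(^s[a-z]*n\).*/\1/p' "$datafile")
printf '%s\n' "$first_word"

rm -f "$datafile"
```

Only the southern line survives, because southwest and southeast contain no n. Note that the expression does not require the word to end at the n, so a line starting with sunny would print sunn.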
Please remember that I am not a sed expert; there are members here who can output a Shakespeare sonnet with 20 characters of sed code.
Incidentally, GNU grep also offers the -P option, which allows the use of Perl-Compatible Regular Expressions. And in PCRE, you can turn off greediness by following the normally-greedy token with a question mark. Then it will return the shortest possible match instead of the longest. Combine it with -o, which prints only the matched part of the line instead of the whole line:
grep -oP '\bs.*?n\b' infile
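A quick side-by-side of the greedy and non-greedy forms on one line of the datafile, as a sketch. It assumes a GNU grep built with PCRE support, since -P is not available everywhere; the variable names are just for the demo.

```shell
# One line from the thread's datafile.
line='southern SO Suan Chin 5.1 .95 4 15'

# Greedy: .* takes as much as it can, so the match ends at the n of "Chin".
greedy=$(printf '%s\n' "$line" | grep -oP '\bs.*n\b')

# Non-greedy: .*? takes as little as it can, so the match ends at the first
# n that is followed by a word boundary.
lazy=$(printf '%s\n' "$line" | grep -oP '\bs.*?n\b')

printf 'greedy: %s\n' "$greedy"
printf 'lazy:   %s\n' "$lazy"
```

So the \b anchors alone do not shorten the match; it is the question mark that makes it stop at southern.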
And here's a version you can use that will only match whole words that start with s and end with n, even if there's another n inside them (e.g. 'sundown').
grep -oE '\bs\w*n\b' infile
'\w' is a synonym for a "word" character, that is, [a-zA-Z0-9_]. So it will fail to match if there are any non-word characters between the s and the n.
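Here is a small sketch of which words that expression accepts. The word list is invented for the demo, and \b and \w with -E are assumed to be supported, as they are in GNU grep.

```shell
# One candidate word per line; only some start with a lowercase s and end with n.
words='sundown
sunny
southern
Susan
snow'

# \w* may contain further n characters, but the word must end in n.
matches=$(printf '%s\n' "$words" | grep -oE '\bs\w*n\b')
printf '%s\n' "$matches"
```

sundown and southern are printed. sunny and snow do not end in n, and Susan starts with an uppercase S, which grep treats as a different character unless -i is given.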
You could also do some clever stuff with \B, which is the inverse of \b, and will only match the zero-width space between two word (or two non-word) characters.
Finally, though, since grep operates in a line-wise fashion, even with the -o option it will still output every matching substring on the line, if there are multiples. You can't make it stop at the first instance (unless the expression is anchored with ^/$), although you can make it stop after the first line, with the -m option.
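A sketch of that behaviour with two made-up input lines follows. The -m 1 form is the option mentioned above; piping through head -n 1 is an extra workaround that is not from the thread, added here only as one possible way to keep just the first match.

```shell
# Two lines, each with two words that start with s and end with n.
text='sun and satin
southern skin'

# -o prints every match, one per output line.
all=$(printf '%s\n' "$text" | grep -oE '\bs\w*n\b')

# -m 1 stops after the first matching line, but still prints all matches on it.
first_line=$(printf '%s\n' "$text" | grep -m 1 -oE '\bs\w*n\b')

# Assumption: head -n 1 is used here to keep only the very first match.
first_only=$(printf '%s\n' "$text" | grep -oE '\bs\w*n\b' | head -n 1)

printf 'all:\n%s\n' "$all"
printf 'first line:\n%s\n' "$first_line"
printf 'first only: %s\n' "$first_only"
```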
For finer control when working with column-delimited text like this, you'll probably want to use awk instead.
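As a rough idea of what the awk approach could look like, here is a sketch that tests only the first column. The reduced temp file and the variable names are invented for the demo; the thread itself does not give an awk command.

```shell
# Reduced copy of the thread's datafile (three lines), written to a temp file.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
EOF

# awk splits each line into whitespace-separated fields; test field 1 only
# and print just that field when it starts with s and ends with n.
region=$(awk '$1 ~ /^s.*n$/ { print $1 }' "$datafile")
printf '%s\n' "$region"

rm -f "$datafile"
```

Because the pattern is anchored to a single field with ^ and $, greediness is no longer a problem: .* cannot run past the end of the field.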
I found '\bs.*?n\b' very helpful, as it returns everything between 's' and 'n', like 'southern', 'south9ern' and 'south#ern', whereas '\bs\w*n\b' will only return 'southern' and 'south9ern', which is natural since \w = [a-zA-Z0-9_].
Also, -E and -P both return the same results, and since the man page says -P is experimental, I think -E is the safer one to use.
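The difference between the two expressions on those three words can be checked with a sketch like this one. It assumes GNU grep with PCRE support for the -P line, and the variable names are only for illustration.

```shell
# The three test words mentioned above, one per line.
words='southern
south9ern
south#ern'

# .*? accepts any character between s and n, including '#'.
lazy=$(printf '%s\n' "$words" | grep -oP '\bs.*?n\b')

# \w* accepts only letters, digits and underscore, so 'south#ern' is skipped.
wordy=$(printf '%s\n' "$words" | grep -oE '\bs\w*n\b')

printf 'lazy:\n%s\n' "$lazy"
printf 'word characters only:\n%s\n' "$wordy"
```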