Regular expression help. "?"

matthewg42 · 12-08-2006, 09:27 AM

When you have a list of RE elements in [square brackets], the square-brackets part matches any one of the characters between the brackets. So [(a.*b)] matches any line containing a, b, ., *, ( or ): inside the brackets the parentheses, dot and star are all literal characters.

A good way to test your expressions is to turn on colourised output:

Code:

export GREP_OPTIONS=--colour=auto

And then to type in the grep command with no input file specified and nothing piped in. Then just start typing. When you hit return, if your input line matches you'll see it repeated with the matching parts coloured in:

Code:

$ export GREP_OPTIONS=--colour=auto
$ egrep '[(a.*b)(b.*a)]'
abcdefg.* (we like it)
abcdefg.* (we like it)
nothing works here

Note that the "nothing works here" line was not echoed back - it did not match the expression.

When you're done, press control-d (at the start of a line) to finish your test.
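
The same experiment can also be run without typing at the terminal by piping the test lines in. This is only a sketch: the variable names are made up for illustration, and the -o option (print each matching part on its own line) is assumed to be available, as it is in GNU and BSD grep.

```shell
# Feed the two sample lines to grep on standard input instead of typing them.
# -E selects extended regular expressions, which is what egrep uses.
matched=$(printf '%s\n' 'abcdefg.* (we like it)' 'nothing works here' \
  | grep -E '[(a.*b)(b.*a)]')
printf '%s\n' "$matched"

# -o prints every matching part on its own line, which shows that the
# bracket expression matches single characters, not the sequence "a.*b".
parts=$(printf '%s\n' 'abcdefg.* (we like it)' | grep -o '[(a.*b)]')
printf '%s\n' "$parts"
```

Only the first sample line is printed back, and the -o output lists the individual characters a, b, ., *, ( and ), one per line.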

There's probably a better way to do this, but here's how I'd make an expression meaning "match any line that contains an a but no b, OR a b but no a":

Code:

^[^b]*a[^b]*$|^[^a]*b[^a]*$

Looks ugly, but here's how it works:

The expression is split in two parts with the | character, which means that the whole expression matches if either sub-expression matches. For example a|b|c is the same as [abc].
OK the first of the two sub-expressions is:
Code:
```
^[^b]*a[^b]*$
```
Let's break it down. The ^ at the start of the expression means "only match this expression if it is anchored to the start of a line".
The [^b] part means "match any character which is not b" (yes, it's a little confusing that the ^ means "not" here and "start of line" outside [square brackets]).
The * means "the previous bit 0 or more times" (the previous bit being the [^b] - any non-b character).
The a is treated as a literal "a" character.
The [^b]* as before.
The $ means "end of line".
So that whole thing ^[^b]*a[^b]*$ means "a whole line containing an a, but no b's".
Now you should be able to see that the whole expression says "a line containing an a but no b's or a line containing a b but no a's".
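
A quick way to check the expression is to run it over a few sample words. This is a minimal sketch: the four words and the variable names are invented for the test, and -E is used because | is an extended regular expression operator.

```shell
# The expression from the post, kept in single quotes so the shell
# leaves ^, $, * and | alone.
regex='^[^b]*a[^b]*$|^[^a]*b[^a]*$'

# cat has an a but no b, bed has a b but no a,
# cab has both, dog has neither.
result=$(printf '%s\n' 'cat' 'bed' 'cab' 'dog' | grep -E "$regex")
printf '%s\n' "$result"
```

Only cat and bed come back; cab fails both halves of the expression and dog contains neither letter.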

Enjoy

zetabill · 12-08-2006, 10:58 AM

Interesting thread. It's been a while since I took my bash class but I thought that

Code:

ls | grep a*

is only going to match any line with zero or more instances of "a"... and that's it. From the way I understood, on the bash command line the "|" (pipe) means take the output from pre-pipe and feed, as a file, the output to post-pipe. So why does grep even have the opportunity to match the glob because it already has the output from ls? I never knew that but it's good to know that now. I'll be sure to put "quotes" around my regular expressions from now on. I also didn't know that there is more than one type of regular expression.

Good stuff.

I do know this, though:
grep = Global Regular Expression Parser

matthewg42 · 12-08-2006, 01:02 PM

Quote:

Originally Posted by zetabill

Interesting thread. It's been a while since I took my bash class but I thought that

Code:

ls | grep a*

is only going to match any line with zero or more instances of "a"... and that's it.

Please re-read post #8. It will usually work, but not for the reason you might think, and when you have more than one file matching a* in the current working directory it won't work at all.

Quote:

Originally Posted by zetabill

From the way I understood, on the bash command line the "|" (pipe) means take the output from pre-pipe and feed, as a file, the output to post-pipe.

That's a little ambiguous because some programs don't take their input from standard input, or don't write their output to standard output. For example, most of the GNU core utilities (grep included) will take their input from standard input only if no files are specified as part of the command.

It is more precise to say that the pipe connects the standard output file handle of the command to the left of the pipe symbol to the standard input file handle of the command to the right of the pipe. Thus pipes are only useful for programs which read from stdin and/or write to stdout.
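
The point about grep reading standard input only when no files are named can be seen with a throwaway file. This is a sketch under assumptions: the temporary file and the variable names are invented for the example, and mktemp is assumed to be available.

```shell
# Create a scratch file with one line in it.
tmpfile=$(mktemp)
printf 'from file\n' > "$tmpfile"

# No file argument: grep reads what the pipe delivers on standard input.
from_pipe=$(printf 'from pipe\n' | grep 'from')

# A file argument: grep reads the file and never looks at the pipe.
from_file=$(printf 'from pipe\n' | grep 'from' "$tmpfile")

printf '%s\n%s\n' "$from_pipe" "$from_file"
rm -f "$tmpfile"
```

The first grep prints the piped line, the second prints the line from the file even though the same data was piped in.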

Quote:

Originally Posted by zetabill

So why does grep even have the opportunity to match the glob because it already has the output from ls?

The shell tries to expand glob patterns before passing that argument list to programs. So in the example above, if a* matches any files, that list of files is passed to grep, and grep will not see the pattern. Again, I refer you to post #8, which has a detailed example of what is going on.
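
Putting echo in front of a command is a simple way to see what the shell actually hands over. The sketch below assumes a scratch directory holding two made-up files, apple and avocado; the variable names are also only for illustration.

```shell
# Scratch directory with two files whose names match a*.
workdir=$(mktemp -d)
touch "$workdir/apple" "$workdir/avocado"

# Unquoted: the shell replaces a* with the matching file names,
# so grep would get "apple" as its pattern and "avocado" as a file.
unquoted=$(cd "$workdir" && echo grep a*)

# Quoted: the pattern reaches grep untouched.
quoted=$(cd "$workdir" && echo grep 'a*')

printf '%s\n%s\n' "$unquoted" "$quoted"
rm -rf "$workdir"
```

The first line printed is "grep apple avocado" and the second is "grep a*", which is why quoting the expression matters.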

When the shell interprets a command with pipes in it, it doesn't execute the processes in the order they appear on the command line, and then move the output through them. The pipes are connected together before any of the processes create any output and then the data flows through the pipeline.

Quote:

Originally Posted by zetabill

I never knew that but it's good to know that now. I'll be sure to put "quotes" around my regular expressions from now on. I also didn't know that there is more than one type of regular expression.

Good stuff.

I do know this, though:
grep = Global Regular Expression Parser

Another nice one. I wonder how many different "reasons" for the name grep there are?

cdex · 12-08-2006, 01:59 PM

Regarding bash, I just want to mention the Advanced Bash-Scripting Guide. It saved me a lot of headaches (next to the man page) -> some things I find faster in the man page, some are better explained in the guide ...

makyo · 12-08-2006, 02:26 PM

Hi.

Quote:

Originally Posted by matthewg42

... I wonder how many different "reasons" for the name grep there are?

Like the Highlander: There Can Be Only One.

... cheers, makyo

Quote:

Originally Posted by Brian Kernighan

The name comes from the ed command g/regular-expression/p ...
The UNIX Programming Environment, Kernighan and Pike, 1984, Prentice-Hall, page 18

Also

Quote:

From Wikipedia, the free encyclopedia

grep is a command line utility that was originally written for use with the Unix operating system. The default behavior of grep takes a regular expression on the command line, reads standard input or a list of files, and outputs the lines containing matches for the regular expression.

The name comes from a command in the Unix text editor ed that takes the form:

g/re/p

which means "search globally for lines matching the regular expression, and print them". There are various command line switches available when using grep that modify the default behavior.

Other (incorrect) backronyms of the name exist, including: General Regular Expression Parser, General Regular Expression Print, Global Regular Expression Parser, and Global Regular Expression Print, though the last example is not entirely wrong.
http://en.wikipedia.org/wiki/Grep

BWK's credentials include:

Quote:

Brian Kernighan
From Wikipedia, the free encyclopedia

Brian Wilson Kernighan, (IPA pronunciation: ['kɛrnɪˌhæn], the 'g' is silent); born 1942, is a computer scientist who worked at the Bell Labs and contributed to the design of the pioneering AWK and AMPL programming languages. He is also the author of the famous Hello, world program.

Kernighan's name became widely known through co-authorship of the first book on the C programming language with Dennis Ritchie. Kernighan has said that he had no part in the design of the C language: "It's entirely Dennis Ritchie's work". He authored many Unix programs, including ditroff. -- more at:
http://en.wikipedia.org/wiki/Kernighan

( edit 1: correct quote )

m4a1rifle · 12-09-2006, 12:26 AM

I'm using Ubuntu in VMware.
When using grep '^[a-z]$' dict-file
the output is ABCDEFGHJIKLMOOPQRSTUVWXYabcdefghijklmnopqrstuvwxyz
That's capital A-Y, then lowercase a-z.

Is it VMware's fault?
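
One possible explanation, offered as an assumption since nobody in the thread confirms it: in some locales a range such as [a-z] follows the locale's collation order rather than plain ASCII order, so it can take in uppercase letters as well. The sketch below shows two common workarounds on four made-up input lines; the variable names are only for illustration.

```shell
# Forcing the C locale for this one command makes [a-z] mean
# exactly the ASCII lowercase letters.
lower_only=$(printf '%s\n' A a B b | LC_ALL=C grep '^[a-z]$')

# The [[:lower:]] class names lowercase letters directly,
# so it does not depend on how a range is ordered.
lower_class=$(printf '%s\n' A a B b | grep '^[[:lower:]]$')

printf '%s\n' "$lower_only" "$lower_class"
```

Both commands print only a and b for this input.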

Mr. ameya sathe · 03-05-2008, 05:05 AM

Given:
Regular expression
[[:alpha:]][0-9]*[[:alpha:]]

Using the above expression, I grep on the file
/usr/share/dict/linux.words
on the bash prompt.
i.e.

Code:

 grep '[[:alpha:]][0-9]*[[:alpha:]]' /usr/share/dict/linux.words

Then, the following output was observed-

10-point
10th
11-point
12-point

etc..

How come this output is seen?
Isn't the output supposed to be similar to

ab
a1b

i.e. a letter followed by a digit followed by a letter
OR
a letter followed by a letter.

acid_kewpie · 03-05-2008, 05:07 AM

please don't drag up dead threads. new question = new thread.

kees-jan · 03-05-2008, 05:15 AM

Grep searches for substring. You'll note that all strings found contain two consecutive alphabet characters (not unexpected when searching a dictionary).

Use ^ to match the beginning of a line, and $ to match the end of the line. Note that $ also has special meaning to the shell, so you might want to add some quotes.

Groetjes,

Kees-Jan
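
The difference the anchors make can be seen on a handful of sample words. This is a small sketch: the four words stand in for the dictionary file and the variable names are invented for the example.

```shell
# Two of the words from the observed output plus the two expected ones.
words=$(printf '%s\n' '10-point' '10th' 'ab' 'a1b')

# Unanchored: any line with two letters side by side somewhere matches,
# e.g. the "th" in 10th or the "po" in 10-point.
unanchored=$(printf '%s\n' "$words" | grep '[[:alpha:]][0-9]*[[:alpha:]]')

# Anchored with ^ and $: the whole line has to fit the expression.
# Single quotes stop the shell from touching the $.
anchored=$(printf '%s\n' "$words" | grep '^[[:alpha:]][0-9]*[[:alpha:]]$')

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored expression returns all four words, while the anchored one returns only ab and a1b.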

jukebox55 · 03-05-2008, 07:46 AM

yes its interesting stuff, i first learned of regular expressions when i read 'oreilly - bash in a nutshell', i knew it was some way of using patterns to match text, but i didnt really take it all in.

the term 'regular expression' didnt help either!

thanks to Matthew et al ive got a better handle on it (although i still dont fully understand it, and i suppose its what matthew was saying, you have to practice and read, practice and re-read.)

a big problem for me was realizing that the shell works in its own way and may misinterpret the regular expression if the user was not careful, hence the need for quoting etc.

the other problem i have is difficult to explain, but its regarding structure of commandlines etc, its hazy at the moment, but with each good explanation i understand a little more

Mr. ameya sathe · 03-05-2008, 11:09 AM

oh ho!!! While searching through man pages, theory & linuxquestions.org
I have realised that there are things like
1. Wildcard characters, which have been popularly used in DOS.
2. Regular expressions as taught in subjects like Theory of Computational Science.
3. Extended regular expressions (R.E.).
4. Pattern matching, which looks very similar to R.E. but is not.
5. And, finally, Perl-style regular expressions.

Oh GOD!! Save Me. What should I learn??