LinuxQuestions.org


asherbarasher 05-17-2013 05:29 PM

simple regex question
 
Hi everyone.
I am very new to Linux, so I think this is the best forum to post such a question.

I'm trying to construct a regex to use with sed. I need it to match words whose third character is z.
I don't understand why this doesn't work:
ls | sed -n '/..z*$/p'
A dot should stand for any single character, but this matches just about everything.
Also, if you can point me to any good regex tutorial for beginners,
it would be very helpful.
Thank you in advance.

chrism01 05-17-2013 07:04 PM

Start here http://www.grymoire.com/Unix/Sed.html#uh-0.
Watch that use of '*': in a regex it means "zero or more of the preceding character", so 'z*' can also match nothing at all.

Diantre 05-18-2013 01:30 AM

Quote:

Originally Posted by asherbarasher (Post 4953390)
I don't understand why this doesn't work:
ls | sed -n '/..z*$/p'

It's because 'z*' matches either no 'z', one 'z' or more than one 'z'. Maybe this will work better:

Code:

ls | sed -n '/..z.*$/p'
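
To see the "zero or more" behaviour in isolation, here is a small sketch that feeds sed a fixed list instead of ls output. The sample names are made up for illustration, and '..zz*$' is only there as a contrast ("one or more z at the end"), not as the answer to the original question.

```shell
# Hypothetical sample names, one per line.
names='a
ab
fiz
gunzip'

# '..z*$' = any two characters, then zero or more 'z', then end of line.
# Because zero 'z' is allowed, every line with at least two characters matches.
loose=$(printf '%s\n' "$names" | sed -n '/..z*$/p')

# 'zz*' = one 'z' followed by zero or more 'z', i.e. at least one 'z'.
# Now the line must end in 'z' with two characters before it.
strict=$(printf '%s\n' "$names" | sed -n '/..zz*$/p')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Only the single-character line "a" is rejected by the first pattern, which is why the original command appeared to list everything.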

syg00 05-18-2013 02:33 AM

Most (all ?) regex questions are simple. It's the resolution that gets complex ... :eek:

asherbarasher 05-18-2013 04:30 AM

Quote:

Originally Posted by chrism01 (Post 4953411)
Start here http://www.grymoire.com/Unix/Sed.html#uh-0.
Watch that use of '*': in a regex it means "zero or more of the preceding character", so 'z*' can also match nothing at all.

Thanks for this link. I don't know how I didn't find it, but it's a great tutorial.

Quote:

It's because 'z*' matches either no 'z', one 'z' or more than one 'z'. Maybe this will work better:

No, it doesn't work either; it just lists everything in the directory.

syg00 05-18-2013 04:37 AM

In which case you should provide a sample of your data, so we can evaluate what is happening.

I was about to comment that last I looked the grymoire site was somewhat dated, but I see a recent attribution. Goodness.

asherbarasher 05-18-2013 04:51 AM

Quote:

Originally Posted by syg00 (Post 4953604)
In which case you should provide a sample of your data, so we can evaluate what is happening.

I was about to comment that last I looked the grymoire site was somewhat dated, but I see a recent attribution. Goodness.

Actually, I don't need it for a work task; I just wonder how to write this kind of regular expression.
I thought it would be pretty easy, but I got stuck unexpectedly. :)

Diantre 05-18-2013 03:48 PM

Quote:

Originally Posted by asherbarasher (Post 4953600)
No, it doesn't work either; it just lists everything in the directory.

Then samples of your data would be useful, as syg00 points out.

The regex actually works: on my system it lists .tar.gz, .zip, and a couple of files ending in 'z', in a directory containing several types of files.

David the H. 05-19-2013 11:59 AM

You shouldn't be parsing ls for filenames anyway. For simple name pattern matching you can almost always use simple globbing patterns.

Code:

printf '%s\n' ??z*

For matching by more advanced criteria, use find.
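
As a rough sketch of what "more advanced criteria" can look like, the example below combines a name pattern with a file-type test. The directory and file names are made up, and -maxdepth is a GNU/BSD find option rather than a POSIX one, so treat it as an assumption about your system.

```shell
# Throwaway directory with made-up names, so the result is predictable.
demo=$(mktemp -d)
touch "$demo/lrzip" "$demo/gunzip" "$demo/size"
mkdir "$demo/mkzdir"

# -name takes a glob (not a regex) and is matched against the base name.
# -type f adds a criterion a plain glob cannot express: regular files only.
# -maxdepth 1 keeps find from descending into subdirectories.
found=$(find "$demo" -maxdepth 1 -type f -name '??z*' | sort)
printf '%s\n' "$found"

rm -r "$demo"
```

This prints the paths of lrzip and size; gunzip has 'n' in third position, and mkzdir is skipped because it is a directory.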

asherbarasher 05-19-2013 12:09 PM

Quote:

Then samples of your data would be useful, as syg00 points out.
The regex actually works: on my system it lists .tar.gz, .zip, and a couple of files ending in 'z', in a directory containing several types of files.

I have no problems with the first and last characters; I use the ^ and $ operators respectively. The problem comes when I try to point to the second or third character.
Here's an example:
[root@lab2 bin]# cd /usr/bin
[root@lab2 bin]# ls | sed -n '/..z.*$/p'
abrt-action-analyze-backtrace
abrt-action-analyze-c
abrt-action-analyze-core
abrt-action-analyze-oops
abrt-action-analyze-python
bluetooth-wizard
bunzip2
compiz
compiz-gtk
egroupwarewizard
eu-size
funzip
gettextize
gpg-zip
groupwarewizard
groupwisewizard
gunzip
hg-viz
htfuzzy
--omitted--

And it's the same with grep.
As you can see, it lists everything in the directory. My question is: how can I filter the output based on what the second, third, or any other character is?

David the H. 05-19-2013 01:10 PM

It's simple. You use shell globbing, as I said before, or find.

Tools like grep/sed/awk are designed for text processing, not filename matching. Do not try to filter the output of ls for names or metadata.
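
One way to see why filtering ls output is fragile: file names may legally contain newlines, and a line-based tool cannot tell where one name ends. The sketch below uses a made-up file name in a temporary directory and assumes ls writes names unmodified when its output is a pipe, which is the usual behaviour.

```shell
demo=$(mktemp -d)
# A legal but awkward name: "a", a newline, then "bcz" (made up for this demo).
touch "$demo/$(printf 'a\nbcz')"

# ls prints that single name across two lines, so grep sees a bogus line "bcz".
from_ls=$(ls "$demo" | grep -c '^..z')

# The glob tests the real name, whose third character is 'b', so nothing matches.
from_glob=0
for f in "$demo"/??z*; do
    # An unmatched glob stays as literal text, so check that the file exists.
    if [ -e "$f" ]; then
        from_glob=$((from_glob + 1))
    fi
done

echo "ls|grep matches: $from_ls, glob matches: $from_glob"
rm -r "$demo"
```

The pipeline reports one match for a file that does not exist, while the glob correctly reports none.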


One thing to remember about regex, by the way, is that it's unanchored by default. You do not need to use ^/$ unless you specifically need to match at those positions, and you don't need to give any more of the pattern than is necessary to uniquely match the string. (e.g. '^..z' will return any string with 'z' as the third character.)

Globbing is more limited though, in that the pattern must match the entire string, usually with the use of "*" wildcards.
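
The difference between the two matching models can be sketched on a fixed list of names (made up here), using grep for the regex side and a case statement for the glob side, since case uses the same pattern syntax as filename globbing.

```shell
# Hypothetical sample names, one per line.
names='gunzip
lrzip
size'

# Unanchored: '..z' may match anywhere in the line, so "gunzip"
# (z in fourth position) passes along with the others.
unanchored=$(printf '%s\n' "$names" | grep '..z')

# Anchored: '^' pins the two dots to the start of the line.
anchored=$(printf '%s\n' "$names" | grep '^..z')

# A glob must cover the whole string, hence the trailing '*'.
globbed=''
for n in $names; do
    case $n in
        ??z*) globbed="$globbed$n " ;;
    esac
done

printf 'unanchored:\n%s\n\nanchored:\n%s\n\nglob: %s\n' "$unanchored" "$anchored" "$globbed"
```

The unanchored regex keeps all three names, while the anchored regex and the glob both keep only lrzip and size.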

divyashree 05-19-2013 01:29 PM

This should definitely work the way you want.

With grep:

Code:

ls | egrep '^..z.*$'

With sed:

Code:

ls | sed -n '/^..z.*$/p'

Diantre 05-19-2013 01:31 PM

Quote:

Originally Posted by asherbarasher (Post 4954338)
As you can see, it lists everything in the directory.

I honestly don't know why it's not working for you. On my system, the following commands show exactly the same output (in /usr/bin):

Code:

$ ls | sed -n '/^..z/p'
$ ls | grep '^..z'
$ printf '%s\n' ??z*
$ find . -maxdepth 1 -name '??z*' -printf '%f\n' | sort

The four commands show all entries where the third letter is a 'z':

Code:

bzz
fiz
lrz
lrzip
lrztar
lrzuntar
lsz
maze
mkzftree
mozilla
p7zip
size
size86
unzip
unzipsfx

And as David the H. points out, it's better not to use grep and sed for this purpose.

rabirk 05-19-2013 06:59 PM

I can't help with giving you the regex, but for a reference, try Regular-Expressions.info.

chrism01 05-19-2013 08:46 PM

This is the book on regex http://regex.info/book.html :)


All times are GMT -5.