LinuxQuestions.org
Linux - Software This forum is for Software issues.

07-30-2005, 04:55 PM   #1
mhoch3
LQ Newbie
Registered: Jul 2005
Posts: 7

sed to extract multiple matches in a line?


Dear all,

How can I extract all occurrences of a given expression from a line?

So, for instance, from:

then the three of them went swimming until they were quite tired

and the regexp

the[a-z]

I would like to get

thenthemthey

I'm not too worried about overlapping matches: I'm extracting entries from a list, and the delimiters never appear in the entries, not even in escaped form. Lucky me!

Regards,

Mydrofiol
 
07-30-2005, 05:17 PM   #2
jonaskoelker
Senior Member
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524

Off the top of my head: how about replacing everything that doesn't match (that is, everything that matches the `inverse' regexp) with an empty string?

hth --Jonas
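Jonas's suggestion can be carried out without ever writing the inverse regexp: wrap every match in newlines, then delete whatever lies outside a wrapped match. This is a minimal sketch, assuming GNU sed (the `\n` escape in the replacement and inside a bracket expression is a GNU extension that BSD/macOS sed does not share). `extract_matches` is a hypothetical helper name; its argument is an extended regexp that must not contain `/` or match the empty string.

```shell
# Hypothetical helper; assumes GNU sed (-E, and \n in replacements/brackets).
# $1 is an extended regexp that must not contain "/" or match the empty string.
extract_matches() {
  # 1. Wrap every match in newlines.
  # 2. "t found" jumps ahead if step 1 changed the line; otherwise the line
  #    had no match, so empty it and jump to the end of the script ("b").
  # 3. Delete the text before the first match, after the last match, and
  #    between consecutive matches (each piece sits between two newlines).
  sed -E "
    s/$1/\n&\n/g
    t found
    s/.*//
    b
    :found
    s/^[^\n]*\n//
    s/\n[^\n]*\$//
    s/\n[^\n]*\n//g
  "
}

echo 'then the three of them went swimming until they were quite tired' |
  extract_matches 'the[a-z]'
# prints: thenthemthey
```

Because sed works one input line at a time, each input line yields exactly one output line (empty when there is no match).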
 
07-30-2005, 05:37 PM   #3
mhoch3
LQ Newbie
Registered: Jul 2005
Posts: 7
Original Poster

How do you match the inverse of a regexp?

I'm familiar with using [^q] to match any character except q, but I've never come across (though I've often wished for!) a "not" that works on longer strings. The problem, I always supposed, is that practically everything fails to match the regexp.

E.g. in "the boy and his dog", the substring " an" doesn't match "and". And I always believed that sed doesn't cope with overlapping matches, by which I mean that if you look for b.. in baboon, you will catch bab but not boo.

(And indeed, I've just checked:

echo baboon | sed 's/b..//g'
returns
oon
which is presumably a stifled attempt at muttering the name of G. Hoon, General in charge of the British armed forces. I wonder what my machine knows about him...
)

I know there's a theorem stating that the complement (the "inverse") of every regular language is also a regular language, but that theorem never said the complement can be written down in the concise and elegant way I expect from sed.
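The baboon test above shows that s///g resumes scanning after the end of each match, so overlapping matches are never seen (grep -o behaves the same way). If overlapping matches were ever needed, one workaround is to retry an anchored match at every starting offset. This sketch does that in plain POSIX sh using expr's ":" operator, which anchors a basic regexp at the start of the string and prints what the \( \) group matched; `find_overlapping` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: print every match of the basic regexp $2 in the
# string $1, including overlapping ones, by retrying at each offset.
find_overlapping() {
  s=$1
  while [ -n "$s" ]; do
    # The leading "x" keeps expr from mistaking the string for an operator;
    # expr exits non-zero when the \( \) group matched nothing (or just "0").
    if m=$(expr "x$s" : "x\\($2\\)"); then
      printf '%s\n' "$m"
    fi
    s=${s#?}    # drop the first character and try again
  done
}

echo baboon | grep -o 'b..'     # non-overlapping: prints bab
find_overlapping baboon 'b..'   # prints bab, then boo
```

Restarting at every offset costs one expr call per character, so this is only sensible for short strings.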
 
07-31-2005, 09:16 AM   #4
mhoch3
LQ Newbie
Registered: Jul 2005
Posts: 7
Original Poster

To be fair, though, I did once have a situation where the regexps for the parts of each line I didn't want were nice, so I could do what you suggest. Unfortunately that isn't straightforward this time, and surely there has to be a better way? If not, it should be added to sed. sed extract. sedex. Bound to be good.
 
07-31-2005, 10:13 AM   #5
jonaskoelker
Senior Member
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524

Well, the inverse regular *expression* may be hairy, but inverting a complete deterministic finite automaton is easy: AcceptStates' = AllStates - AcceptStates.

So, write a regexp package.

Otherwise, try reading the manual (RTM): I seem to recall that this should be reasonably easy with sed (but not how).

Anyone else?

hth --Jonas
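Spelled out in standard DFA notation (Q states, Σ alphabet, δ transition function, q0 start state, F accepting states), the automaton inversion in the post above is the following. The swap is only correct when δ is total, i.e. the automaton is deterministic and has a transition for every state and symbol; an NFA has to be determinized first, and the subset construction can blow up the number of states exponentially. Note also that this complements acceptance of whole strings, which is not quite what "delete everything that doesn't match" needs inside a line.

```latex
A = (Q, \Sigma, \delta, q_0, F), \qquad \delta : Q \times \Sigma \to Q \ \text{(total)}
\overline{A} = (Q, \Sigma, \delta, q_0, Q \setminus F)
L(\overline{A}) = \Sigma^{*} \setminus L(A)
```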
 
07-31-2005, 11:28 AM   #6
twsnnva
Member
Registered: Oct 2003
Location: Newport News, Va
Distribution: Debian
Posts: 246

Can you use 'grep -o'?

Code:
Thomas@lightning:~$ cat test.txt
then the three of them went swimming until they were quite tired
Thomas@lightning:~$ grep -o 'the[a-z]' test.txt
then
them
they
Thomas@lightning:~$
That gives you one match per line; if that's not the format you want, you could run the matches through a for loop.

Code:
Thomas@lightning:~$ for i in $(grep -o 'the[a-z]' test.txt); do printf '%s' "$i"; done; echo
thenthemthey
Thomas@lightning:~$
 
08-01-2005, 05:47 AM   #7
jonaskoelker
Senior Member
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524

twsnnva:

Of course--grep -o is *the* easy way to do that. One thing, though: if test.txt were multi-line, something would need to be done to mark the line endings (as far as I can see).

Anyway, that's for the OP to decide.

OP: I suggest you read the `smart questions' FAQ, at least the part about describing the goal rather than the step.

--Jonas
 
08-01-2005, 08:18 AM   #8
twsnnva
Member
Registered: Oct 2003
Location: Newport News, Va
Distribution: Debian
Posts: 246

Quote:
Of course--grep -o is *the* easy way to do that.
You got me there. Though I'm sure we can think of a more complicated way if we put our heads together.

Quote:
One thing, though: if test.txt were multi-line, something would need to be done to mark the line endings (as far as I can see).
Put everything in a loop that processes each line individually.
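A minimal sketch of that per-line loop, assuming a grep with -o (GNU and BSD grep both have it, although POSIX does not specify it). `extract_per_line` is a hypothetical helper name: it reads standard input and, for each input line, prints all matches of the basic regexp given as its argument run together, or an empty line when there is none, so the line structure survives.

```shell
# Hypothetical helper; assumes grep supports -o (GNU and BSD grep do).
extract_per_line() {
  # The "|| [ -n ... ]" test also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # grep -o prints one match per line; tr joins them back together.
    printf '%s\n' "$line" | grep -o -- "$1" | tr -d '\n'
    echo    # terminate the output line (empty if there was no match)
  done
}

printf 'then the three of them\nno match\nthey were tired\n' | extract_per_line 'the[a-z]'
# prints: thenthem, an empty line, then they
```

With the file from the thread this would be `extract_per_line 'the[a-z]' < test.txt`. It starts a grep and a tr for every input line, so on large files it will be noticeably slower than a single grep -o.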
 
08-01-2005, 03:32 PM   #9
jonaskoelker
Senior Member
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524

Quote:
Originally posted by twsnnva
Put everything in a loop that processes each line individually.
D'oh!

I would think it takes a slight speed hit, though.

"Premature optimization is the root of all evil."

Knuth, right?

--Jonas
 
  



All times are GMT -5.
