07-30-2005, 04:55 PM
|
#1
|
LQ Newbie
Registered: Jul 2005
Posts: 7
Rep:
|
sed to extract multiple matches in a line?
Dear all,
How can I extract all occurrences of a given expression from a line?
so for instance from:
then the three of them went swimming until they were quite tired
and the regexp
the[a-z]
I would like to get
thenthemthey
I'm not too worried about overlapping matches - I'm extracting entries from a list and there are delimiters which never appear in the entries, not even in escaped form. Lucky me!
Regards,
Mydrofiol
|
|
|
07-30-2005, 05:17 PM
|
#2
|
Senior Member
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524
Rep:
|
Off the top of my head: how about substituting everything that doesn't match (that is, matches the `inverse' regexp) with (empty)?
hth --Jonas
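One way to make this idea concrete without writing the inverse regexp by hand is to mark each match with newlines and then delete everything that is not marked. The following is only a sketch: it assumes GNU sed (for `\n` in the replacement and inside bracket expressions), and `sed_extract` is a made-up helper name, not a standard tool.

```shell
# Hypothetical helper: for each input line, print the concatenation of all
# non-overlapping matches of a basic regexp.
# Assumptions: GNU sed; the regexp in $1 contains no "/" and never matches
# the empty string.
sed_extract() {
    # 1) wrap every match in newlines (a line read by sed never contains one)
    # 2) drop whatever follows the last marker (or the whole line if no match)
    # 3) for each "junk \n match \n" group, keep only the match
    sed -e "s/$1/\\n&\\n/g" \
        -e 's/[^\n]*$//' \
        -e 's/[^\n]*\n\([^\n]*\)\n/\1/g'
}

echo 'then the three of them went swimming until they were quite tired' |
    sed_extract 'the[a-z]'
# prints: thenthemthey
```

A line with no match comes out empty rather than being skipped, so the output keeps one line per input line.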
|
|
|
07-30-2005, 05:37 PM
|
#3
|
LQ Newbie
Registered: Jul 2005
Posts: 7
Original Poster
Rep:
|
How do you match the inverse of a regexp?
I'm familiar with using [^q] to match any character except q, but I've never encountered (yet often wished for!) a negation that works on longer strings. The problem, I've always supposed, is that practically everything fails to match a given regexp.
e.g. in "the boy and his dog", the substring " an" doesn't match the regexp "and",
and I always believed that sed doesn't cope with overlapping matches, by which I mean that if you look for
b..
in
baboon
you will catch bab but not boo
(and indeed I've just checked..:
echo baboon | sed 's/b..//g'
returns
oon
which is presumably a stifled attempt at muttering the name of G. Hoon, General in charge of the British armed forces. I wonder what my machine knows about him...
)
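The same left-to-right, non-overlapping behaviour can be seen with a tool that prints matches instead of deleting them. A small sketch, assuming a grep that supports the -o option (GNU grep does):

```shell
# grep -o prints each non-overlapping match on its own line.
# After "bab" is consumed, the scan resumes at "oon", so "boo" is never seen.
echo baboon | grep -o 'b..'
# prints: bab
```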
I know there's a theorem stating that the complement of every regular language is itself a regular language, but that theorem never promised that the complement can be written in the concise and elegant way I expect from sed.
|
|
|
07-31-2005, 09:16 AM
|
#4
|
LQ Newbie
Registered: Jul 2005
Posts: 7
Original Poster
Rep:
|
Although, to be fair, I did once have a situation where the regexps for the bits I didn't want in each line were simple, so I could do what you suggest. Unfortunately that isn't straightforward this time, and surely there has to be a better way? If not, it should be added to sed. sed extract. sedex. Bound to be good.
|
|
|
07-31-2005, 10:13 AM
|
#5
|
Senior Member
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524
Rep:
|
Well, the inverse regular *expression* may be hairy, but inverting the Finite Automaton is easy, at least for a deterministic one: AcceptStates = AllStates - AcceptStates
So, write a regexp package
Otherwise, try reading the manual--I seem to recall that it should be reasonably easy with sed (but I don't recall how)
anyone else?
hth --Jonas
|
|
|
07-31-2005, 11:28 AM
|
#6
|
Member
Registered: Oct 2003
Location: Newport News, Va
Distribution: Debian
Posts: 246
Rep:
|
Can you use 'grep -o'?
Code:
Thomas@lightning:~$ cat test.txt
then the three of them went swimming until they were quite tired
Thomas@lightning:~$ grep -o 'the[a-z]' test.txt
then
them
they
Thomas@lightning:~$
That will give you a list; if that's not the format you want, you could run it through a for loop.
Code:
Thomas@lightning:~$ for i in $(grep -o 'the[a-z]' test.txt); do printf '%s' "$i"; done; echo
thenthemthey
Thomas@lightning:~$
|
|
|
08-01-2005, 05:47 AM
|
#7
|
Senior Member
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524
Rep:
|
twsnnva:
Of course--grep -o is *the* easy way to do that. One thing though: if test.txt were multi-line, something would need to be done to mark line endings (as far as I can see).
Anyways, that's for the OP to decide;
OP: I suggest you read the `smart questions' faq--at least the part about describing the goal, not the step.
--Jonas
|
|
|
08-01-2005, 08:18 AM
|
#8
|
Member
Registered: Oct 2003
Location: Newport News, Va
Distribution: Debian
Posts: 246
Rep:
|
Quote:
Of course--grep -o is *the* easy way to do that.
|
You got me there. Though I'm sure we can think of a more complicated way if we put our heads together.
Quote:
One thing though: if test.txt were multi-line, something would need to be done to mark line endings (as far as I can see).
|
Put everything in a loop that processes each line individually.
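A rough sketch of such a loop, assuming a grep that supports -o; `extract_per_line` is a hypothetical helper name, and it reads the text from standard input:

```shell
# Hypothetical helper: for every input line, print all matches of the
# pattern in $1 joined together, followed by a single newline.
extract_per_line() {
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        # grep -o emits one match per line; tr glues them back together
        printf '%s\n' "$line" | grep -o -e "$1" | tr -d '\n'
        printf '\n'   # marks the end of the original line
    done
}

printf 'then the three of them\nuntil they were quite tired\n' |
    extract_per_line 'the[a-z]'
# prints:
# thenthem
# they
```

Lines without any match produce an empty output line, which keeps the output aligned with the input line by line.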
|
|
|
08-01-2005, 03:32 PM
|
#9
|
Senior Member
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524
Rep:
|
Quote:
Originally posted by twsnnva
Put everything in a loop that processes each line individually.
|
D'oh!
I would think it takes a slight speed hit though.
"Premature optimization is the root of all evil."
Knuth, right?
--Jonas
|
|
|
All times are GMT -5.