How can I extract all occurrences of a given expression from a line?
so for instance from:
then the three of them went swimming until they were quite tired
and the regexp
the[a-z]
I would like to get
thenthemthey
I'm not too worried about overlapping matches - I'm extracting entries from a list and there are delimiters which never appear in the entries, not even in escaped form. Lucky me!
I'm familiar with using [^q] to match any character except q, but I've never encountered (yet often wished for!) a "not" that works on longer strings. The problem, I've always supposed, is that practically everything fails to match the regexp.
e.g. in "the boy and his dog", the substring " an" doesn't match "and"
and I always believed that sed doesn't cope with overlapping matches, by which I mean that if you look for
b..
in
baboon
you will catch bab but not boo
(and indeed I've just checked..:
echo baboon | sed 's/b..//g'
returns
oon
which is presumably a stifled attempt at muttering the name of G. Hoon, General in charge of the British armed forces. I wonder what my machine knows about him...
)
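The same non-overlapping behaviour can be seen when extracting rather than deleting. This is a minimal sketch, assuming a grep that supports the -o option (GNU and BSD grep do); list_matches is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# list_matches REGEX STRING (hypothetical helper, not part of any tool):
# print each non-overlapping match of REGEX in STRING, one per line.
# Assumes a grep with -o support (GNU or BSD grep).
list_matches() {
    printf '%s\n' "$2" | grep -o -- "$1"
}

# Scanning resumes after the end of each match, so "boo" is never reported.
list_matches 'b..' baboon
# prints: bab
```

Since the entries in a delimited list never overlap, this limitation should not matter for that use.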
I know there's a theorem that states that the complement of every regular language is a regular language, but that theorem never said that the complement can be represented in the concise and elegant way I expect from sed.
Although to be fair, I did once have a situation where the regexps for the bits that I didn't want in each line were nice and so I could do what you suggest. This unfortunately isn't straightforward this time, and surely there has to be a better way? If not it should be added to sed. sed extract. sedex. Bound to be good.
Thomas@lightning:~$ cat test.txt
then the three of them went swimming until they were quite tired
Thomas@lightning:~$ grep -o 'the[a-z]' test.txt
then
them
they
Thomas@lightning:~$
That will give you a list; if that's not the format you want, you could run it through a for loop.
Code:
Thomas@lightning:~$ for i in $(grep -o 'the[a-z]' test.txt); do printf '%s' "$i"; done; echo
thenthemthey
Thomas@lightning:~$
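As a possible alternative to the loop, the newlines that grep -o puts between matches can simply be deleted with tr. This is only a sketch, assuming a grep that supports -o; join_matches is a hypothetical helper name.

```shell
#!/bin/sh
# join_matches REGEX FILE (hypothetical helper):
# print every match of REGEX found in FILE, concatenated on a single line.
# Assumes a grep with -o support (GNU or BSD grep).
join_matches() {
    grep -o -- "$1" "$2" | tr -d '\n'   # drop the newline after each match
    echo                                # end the output with one newline
}

# Example, with test.txt as shown above:
#   join_matches 'the[a-z]' test.txt
```

Quoting the regexp keeps the shell from treating the[a-z] as a filename glob, which could otherwise match a file such as "then" in the current directory.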
Of course, grep -o is *the* easy way to do that. One thing though: if test.txt were multi-line, something would need to be done to mark line endings (as far as I can see).
Anyways, that's for the OP to decide;
OP: I suggest you read the `smart questions' faq--at least the part about describing the goal, not the step.
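For the multi-line case mentioned above, one way to keep line endings is to run grep -o on each input line separately. This is a rough sketch under the assumption that one output line per input line is wanted (empty when a line has no match); join_matches_per_line is a hypothetical name.

```shell
#!/bin/sh
# join_matches_per_line REGEX FILE (hypothetical helper):
# for each line of FILE, print its matches of REGEX concatenated together.
# A line with no match produces an empty output line, so line numbers
# in the output correspond to line numbers in the input.
# Assumes a grep with -o support (GNU or BSD grep).
join_matches_per_line() {
    # The "|| [ -n "$line" ]" part handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" | grep -o -- "$1" | tr -d '\n'
        echo
    done < "$2"
}

# Example:
#   join_matches_per_line 'the[a-z]' test.txt
```

This starts one grep per input line, which is fine for small files but slow for large ones.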