LinuxQuestions.org - regex substring

Hi
I have a string: stuff_bitmore_needed_stuffnotneeded_083.txt

The objective is to extract the 'needed' part of the string. However, it may
not always be 6 characters; it could be more or fewer.

Is there a regex solution to my problem? I want to use it in a bash script.

Your help is appreciated
Thank you

Certainly - several no doubt.
But to use regex you have to be able to define precisely what to keep and/or what to discard. Precisely.

There is probably a regex solution, but you will need to supply a better description of the string and the part you wish to extract.

The basic problem is to work out a regular expression which describes the needed part and how it may be recognized within the entire string.

For example, if, as in your string, the needed part is always after the second underscore and contains no underscores itself, you might use something like this:

Code:

s/^[^_]+_[^_]+_([^_]+).*/\1/
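
Since the question was about using this in a bash script, here is a minimal sketch of one way to wrap that substitution. It assumes a sed that accepts -E for extended regular expressions (GNU sed also accepts -r), and the function name extract_needed is only an illustrative choice, not something from this thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the third underscore-delimited field of its argument.
extract_needed() {
    # -E turns on extended regular expressions, so + and ( ) need no backslashes.
    printf '%s\n' "$1" | sed -E 's/^[^_]+_[^_]+_([^_]+).*/\1/'
}

str="stuff_bitmore_needed_stuffnotneeded_083.txt"
needed=$(extract_needed "$str")
echo "$needed"    # prints: needed
```

The trailing .* matters here: it makes the match cover the whole line, so the replacement \1 leaves only the captured field.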

If you are not familiar with regular expressions you can find many resources using your search engine of choice.

To get help here you will need to provide a few real examples of the strings you want to extract from, along with the results you would expect from each. If there is a pattern to the strings then describing that pattern will lead most directly to the solution.

Umm, it's more complicated than I thought. All the strings follow the pattern above; all have the underscores in the same positions.

This is the closest I have got: [^_][\w][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]
this gives me: e_needed

I know this is very poor; I am experimenting with it on https://regexr.com/

Hope that helps

Regex always exposes corner cases, so be very careful what you choose to use. "\w" is usually defined (e.g. in those engines that choose to follow perlre) to include the underscore character ...
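
A quick way to see the "\w" point from the command line, as a sketch assuming a grep that supports -o and the \w extension (GNU grep does):

```shell
str="stuff_bitmore_needed_stuffnotneeded_083.txt"

# \w matches letters, digits AND the underscore, so the first match runs
# straight through every underscore and stops only at the dot.
with_w=$(printf '%s\n' "$str" | grep -oE '\w+' | head -n 1)

# Excluding the underscore explicitly stops at the first delimiter instead.
without_us=$(printf '%s\n' "$str" | grep -oE '[^_]+' | head -n 1)

echo "$with_w"       # stuff_bitmore_needed_stuffnotneeded_083
echo "$without_us"   # stuff
```

So when the underscore is the delimiter, a negated class such as [^_] is usually the safer building block.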

Quote:

Originally Posted by Zero4 (Post 5796687)

If they follow the same pattern then it should be easy enough.

I could not get the substitutions to work at the site you linked, but found this site which does seem to work: regex101.com.

In the expression you show above, you can replace the repeated [a-z]'s with [a-z]+ (one or more characters in range a-z), but I don't think that will do what you want.
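
To illustrate why [a-z]+ on its own is not enough, here is a small sketch (assuming a grep with -o; LC_ALL=C is set only so that [a-z] means plain ASCII lowercase). It matches every run of lowercase letters, not just the one you are after:

```shell
str="stuff_bitmore_needed_stuffnotneeded_083.txt"

# -o prints each match on its own line; [a-z]+ is one or more lowercase letters.
runs=$(printf '%s\n' "$str" | LC_ALL=C grep -oE '[a-z]+')
echo "$runs"
# stuff
# bitmore
# needed
# stuffnotneeded
# txt
```

Nothing in that expression says "the third field", which is why the pattern has to describe the surrounding underscores as well.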

The example I gave above does work at the URL I have linked, but you need to enter the match and substitution patterns in separate input elements, like so...

Code:

Regular Expression:
^[^_]+_[^_]+_([^_]+).*

Test String:
stuff_bitmore_needed_stuffnotneeded_083.txt

Substitution:
\1

Result:
needed

Do you see why that works?
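
If it helps, the same pattern can be read piece by piece. The sketch below uses bash's own [[ =~ ]] operator rather than a web tool (assuming bash 3.2 or later; the variable names re and needed are illustrative). The captured group ends up in BASH_REMATCH[1], so no substitution step and no trailing .* are needed:

```shell
#!/usr/bin/env bash
str="stuff_bitmore_needed_stuffnotneeded_083.txt"

# ^[^_]+_   first field: one or more non-underscores, then an underscore
# [^_]+_    second field, same idea
# ([^_]+)   third field, captured as group 1; it stops at the next underscore
re='^[^_]+_[^_]+_([^_]+)'

if [[ $str =~ $re ]]; then
    needed=${BASH_REMATCH[1]}
    echo "$needed"    # prints: needed
else
    echo "no match: $str" >&2
fi
```

Keeping the expression in a variable avoids quoting problems on the right-hand side of =~.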

I would encourage you to open a terminal on your GNU/Linux machine and learn by using grep and sed from the command line. It will teach you the skills without the quirks that web applications sometimes have, and in the text environment where regular expressions natively exist! Plus you will have all the native documentation available at the same time: man regex, man pcre, man pcresyntax and man pcrepattern, and more!

For example, putting your test patterns in a file named 'infile' and using sed to match/replace (again with the above sample):

Code:

cat infile
stuff_bitmore_needed_stuffnotneeded_083.txt
stuff_junk_wanted_morestuff_ABC.txt
books_worms_desired_trailingjunk_xxx.txt
first_leading_soughtfor_following_ZZZ.txt

sed -r 's/^[^_]+_[^_]+_([^_]+).*/\1/' infile
needed
wanted
desired
soughtfor
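
One thing to watch for with this approach: sed prints lines that do not match the pattern unchanged. If only matching lines are wanted, the -n option together with the p flag on the substitution is one way to get that. A sketch, where the second input line (README.txt) is a made-up example of a name that does not follow the pattern:

```shell
# Two lines: one follows the pattern, one (hypothetical) does not.
input='stuff_bitmore_needed_stuffnotneeded_083.txt
README.txt'

# Without -n, the non-matching line passes through untouched.
all=$(printf '%s\n' "$input" | sed -E 's/^[^_]+_[^_]+_([^_]+).*/\1/')

# With -n and the p flag, only lines where the substitution succeeded are printed.
only=$(printf '%s\n' "$input" | sed -nE 's/^[^_]+_[^_]+_([^_]+).*/\1/p')

echo "$all"     # needed, then README.txt
echo "$only"    # needed
```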

Or, if regexp is not a religion:

Code:

str="stuff_bitmore_needed_stuffnotneeded_083.txt"
echo "$str" | cut -d_ -f3

If _ is the delimiter you can do it easily (but the OP should tell us whether that is the case):

Code:

P=( ${str//_/ } )
echo "${P[2]}"

...to avoid the pipe and external tools.
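
Note that the array form relies on word splitting, so a string containing spaces or glob characters such as * could be split or expanded in unexpected ways. As a possible alternative that also needs no pipe or external tool, here is a sketch using only prefix and suffix removal in parameter expansion (the variable names rest and needed are illustrative):

```shell
str="stuff_bitmore_needed_stuffnotneeded_083.txt"

rest=${str#*_}       # drop the first field:  bitmore_needed_stuffnotneeded_083.txt
rest=${rest#*_}      # drop the second field: needed_stuffnotneeded_083.txt
needed=${rest%%_*}   # keep everything before the next underscore

echo "$needed"       # prints: needed
```

These expansions are plain POSIX shell, so the same lines should also work outside bash.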

Thank you to everyone who contributed. It seems I have a lot to learn.

You are welcome.
If you think your problem is solved, please mark the thread solved. If you have any additional questions, do not hesitate, just ask.
And if you really want to say thanks, just click on yes.
(And obviously every one of us has a lot to learn.)