LinuxQuestions.org


Zero4 12-22-2017 07:51 PM

regex substring
 
Hi
I have a string: stuff_bitmore_needed_stuffnotneeded_083.txt

The objective is to extract the 'needed' part of the string. However, it may
not always be 6 characters; it could be more or fewer.

Is there a regex solution to my problem? I want to use it in a bash script.

Your help is appreciated
Thank you

syg00 12-22-2017 08:06 PM

Certainly - several no doubt.
But to use regex you have to be able to precisely define what to keep and/or what to discard. Precisely.

astrogeek 12-22-2017 08:07 PM

There is probably a regex solution, but you will need to supply a better description of the string and the part you wish to extract.

The basic problem is to work out a regular expression which describes the needed part and how it may be recognized within the entire string.

For example, if as in your string the needed part is always after the second underscore and contains no underscores itself, you might use something like this:

Code:

s/^[^_]+_[^_]+_([^_]+).*/\1/
If you are not familiar with regular expressions you can find many resources using your search engine of choice.

To get help here you will need to provide a few real examples of the strings you want to extract from, along with the results you would expect from each. If there is a pattern to the strings then describing that pattern will lead most directly to the solution.
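
Since the goal is a bash script, the same idea can also be sketched with bash's built-in [[ =~ ]] operator, which stores capture groups in the BASH_REMATCH array. This assumes the needed part is always the third underscore-separated field; the variable names str, re and needed are only illustrative.

```shell
#!/usr/bin/env bash
str="stuff_bitmore_needed_stuffnotneeded_083.txt"
# Two underscore-free fields, each followed by "_", then capture the third field.
re='^[^_]+_[^_]+_([^_]+)'
if [[ $str =~ $re ]]; then
    needed=${BASH_REMATCH[1]}
    echo "$needed"
else
    echo "no match: $str" >&2
fi
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of =~ avoids quoting surprises inside [[ ]].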

Zero4 12-22-2017 10:13 PM

Umm, it's more complicated than I thought. All the strings follow the pattern above; all have the underscores in the same positions.

This is the closest I have got: [^_][\w][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]
This gives me: e_needed

I know this is very poor. I am experimenting with it on https://regexr.com/

Hope that helps

syg00 12-22-2017 11:26 PM

Regex always exposes corner cases - be very careful what you choose to use. "\w" is usually defined (e.g. in those engines that choose to follow perlre) to include the underscore character ....
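
A quick way to see this from the command line, as a sketch assuming GNU grep (where \w is an extension matching letters, digits and the underscore). The variable names are made up for the example.

```shell
#!/usr/bin/env bash
str="stuff_bitmore_needed"
# \w+ also matches underscores, so the whole string comes back as one match.
with_w=$(printf '%s\n' "$str" | grep -oE '\w+')
# [[:alnum:]]+ stops at each underscore, giving one match per field.
with_alnum=$(printf '%s\n' "$str" | grep -oE '[[:alnum:]]+')
echo "$with_w"
echo "$with_alnum"
```

The first echo prints the string unchanged; the second prints the three fields on separate lines.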

astrogeek 12-22-2017 11:54 PM

Quote:

Originally Posted by Zero4 (Post 5796687)
Umm its more complicated than I thought. All the strings follow the pattern above, all have the underscores in the same positions.

This is the closest i have got: [^_][\w][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]
this gives me: e_needed

I know this very poor, I am experimenting with it on https://regexr.com/

Hope that help

If they follow the same pattern then it should be easy enough.

I could not get the substitutions to work at the site you linked, but found this site which does seem to work: regex101.com.

In the expression you show above, you can replace the repeated [a-z]'s with [a-z]+ (one or more characters in range a-z), but I don't think that will do what you want.

The example I gave above does work at the URL I have linked, but you need to enter the match and substitution patterns in separate input elements, like so...

Code:

Regular Expression:
^[^_]+_[^_]+_([^_]+).*

Test String:
stuff_bitmore_needed_stuffnotneeded_083.txt

Substitution:
\1

Result
needed

Do you see why that works?

I would encourage you to open a terminal on your GNU/Linux machine and learn by using grep and sed from the command line. It will teach you the skills without any quirks which web-applications sometimes have, and in the text environment where regular expressions natively exist! Plus you will have all the native documentation available at the same time: man regex, man pcre, man pcresyntax and man pcrepattern, and more!

For example, putting your test patterns in a file named 'infile' and using sed to match/replace (again with the above sample):

Code:

cat infile
stuff_bitmore_needed_stuffnotneeded_083.txt
stuff_junk_wanted_morestuff_ABC.txt
books_worms_desired_trailingjunk_xxx.txt
first_leading_soughtfor_following_ZZZ.txt

sed -r 's/^[^_]+_[^_]+_([^_]+).*/\1/' infile
needed
wanted
desired
soughtfor
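
One thing to watch for in a script: when a line does not match, a plain s/// leaves it unchanged and sed still prints it. The following is a sketch that prints only the lines where the substitution happened, using -n together with the p flag. The sample input and variable names are invented for illustration, and -E is used here as the equivalent of -r.

```shell
#!/usr/bin/env bash
# Sample input: the second line has no underscores, so the pattern cannot match it.
input=$'stuff_bitmore_needed_stuffnotneeded_083.txt\nnounderscores.txt'
# -n suppresses automatic printing; the trailing p prints only substituted lines.
result=$(printf '%s\n' "$input" | sed -n -E 's/^[^_]+_[^_]+_([^_]+).*/\1/p')
echo "$result"
```

Only the extracted field from the first line is printed; the non-matching line is dropped instead of passing through.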


keefaz 12-23-2017 06:51 AM

Or, if regexp is not a religion:
Code:

str="stuff_bitmore_needed_stuffnotneeded_083.txt"
echo "$str" | cut -d_ -f3
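
A related detail, sketched under the same assumption that _ is the delimiter: cut prints a line unchanged when it contains no delimiter at all, unless -s is given. The sample input and variable names are illustrative only.

```shell
#!/usr/bin/env bash
input=$'stuff_bitmore_needed_stuffnotneeded_083.txt\nnounderscores.txt'
# Without -s, a line that has no "_" is printed whole.
plain=$(printf '%s\n' "$input" | cut -d_ -f3)
# With -s, lines that contain no delimiter are skipped.
strict=$(printf '%s\n' "$input" | cut -s -d_ -f3)
echo "$strict"
```

Here plain holds two lines (the extracted field and the untouched second line), while strict holds only the extracted field.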


pan64 12-23-2017 09:49 AM

If _ is the delimiter you can do it easily (but the OP should tell us whether that is the case):
Code:

P=( ${str//_/ } )
echo "${P[2]}"

This avoids the pipe and external tools.
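
Note that the unquoted expansion in the array assignment also goes through whitespace splitting and filename globbing, so a field containing a space or a * could give surprising results. Two alternatives that stay inside bash are sketched here, again assuming _ is the delimiter; the names parts, rest and third are illustrative.

```shell
#!/usr/bin/env bash
str="stuff_bitmore_needed_stuffnotneeded_083.txt"
# Split on "_" into an array without globbing; IFS is changed only for this read.
IFS=_ read -r -a parts <<< "$str"
echo "${parts[2]}"

# Or use parameter expansion alone: drop the first two fields, then the tail.
rest=${str#*_*_}      # needed_stuffnotneeded_083.txt
third=${rest%%_*}     # needed
echo "$third"
```

Both print the third field; the second form needs no array at all.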

Zero4 12-23-2017 06:28 PM

Thank you everyone that contributed. It seems I have a lot to learn.

pan64 12-24-2017 03:46 AM

You are welcome.
If you think your problem is solved, please mark the thread solved. If you have any additional questions, do not hesitate, just ask.
And if you really want to say thanks, just click on yes.
(And obviously every one of us has a lot to learn.)


All times are GMT -5.