LinuxQuestions.org
Old 12-22-2017, 07:51 PM   #1
Zero4
Member
 
Registered: Jun 2007
Posts: 67

Rep: Reputation: 0
regex substring


Hi,
I have a string: stuff_bitmore_needed_stuffnotneeded_083.txt

The objective is to extract the 'needed' part of the string. However, it may
not always be 6 characters; it could be more or less.

Is there a regex solution to my problem? I want to use it in a bash script.

Your help is appreciated.
Thank you
 
Old 12-22-2017, 08:06 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 16,653

Rep: Reputation: 2453
Certainly - several no doubt.
But to use regex you have to be able to precisely define what to keep and/or what to discard. Precisely.
 
Old 12-22-2017, 08:07 PM   #3
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_10{.0|.1|.2}
Posts: 4,664
Blog Entries: 6

Rep: Reputation: 2525
There is probably a regex solution, but you will need to supply a better description of the string and the part you wish to extract.

The basic problem is to work out a regular expression which describes the needed part and how it may be recognized within the entire string.

For example, if as in your string the needed part is always after the second underscore and contains no underscores itself, you might use something like this:

Code:
s/^[^_]+_[^_]+_([^_]+).*/\1/
If you are not familiar with regular expressions you can find many resources using your search engine of choice.

To get help here you will need to provide a few real examples of the strings you want to extract from, along with the results you would expect from each. If there is a pattern to the strings then describing that pattern will lead most directly to the solution.
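Since the goal is a bash script, the expression above can also be tried with bash's built-in `[[ string =~ regex ]]` test, which stores capture groups in the `BASH_REMATCH` array. This is only a minimal sketch: the function name `extract_third` is made up for illustration, and it assumes the wanted part always follows the second underscore and contains no underscore itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name invented for this sketch): print the third
# underscore-separated field of $1, or fail if the shape does not match.
extract_third() {
    # Same idea as the sed pattern above, kept in a variable so that
    # bash treats it as a regex rather than a quoted literal.
    local re='^[^_]+_[^_]+_([^_]+)'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # first capture group
    else
        return 1                             # input had an unexpected shape
    fi
}

extract_third "stuff_bitmore_needed_stuffnotneeded_083.txt"   # prints: needed
```

Unlike a plain substitution, this version reports failure through its exit status when the string does not contain at least two underscores, which can be checked in a script.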
 
Old 12-22-2017, 10:13 PM   #4
Zero4
Member
 
Registered: Jun 2007
Posts: 67

Original Poster
Rep: Reputation: 0
Umm, it's more complicated than I thought. All the strings follow the pattern above; all have the underscores in the same positions.

This is the closest I have got: [^_][\w][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]
this gives me: e_needed

I know this is very poor. I am experimenting with it on https://regexr.com/

Hope that helps
 
Old 12-22-2017, 11:26 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 16,653

Rep: Reputation: 2453
Regex always exposes corner cases, so be very careful what you choose to use. "\w" is usually defined (e.g. in those engines that choose to follow perlre) to include the underscore character.
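The point about "\w" can be checked from a terminal. The sketch below assumes a grep that supports the `-o` option and the `\w` shorthand (GNU grep does; neither is required by POSIX), and the sample string is just a shortened version of the one in this thread.

```shell
#!/usr/bin/env bash
str="stuff_bitmore_needed"

# \w matches letters, digits AND the underscore, so the whole string
# comes back as a single match: stuff_bitmore_needed
printf '%s\n' "$str" | grep -oE '\w+'

# [a-z] does not include the underscore, so each field is matched
# separately and printed on its own line: stuff, bitmore, needed
printf '%s\n' "$str" | grep -oE '[a-z]+'
```

This is why a class such as `[^_]` or `[a-z]` is the safer choice when the underscore is the field separator.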
 
Old 12-22-2017, 11:54 PM   #6
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_10{.0|.1|.2}
Posts: 4,664
Blog Entries: 6

Rep: Reputation: 2525
Quote:
Originally Posted by Zero4
Umm, it's more complicated than I thought. All the strings follow the pattern above; all have the underscores in the same positions.

This is the closest I have got: [^_][\w][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]
this gives me: e_needed

I know this is very poor. I am experimenting with it on https://regexr.com/

Hope that helps
If they follow the same pattern then it should be easy enough.

I could not get the substitutions to work at the site you linked, but found this site which does seem to work: regex101.com.

In the expression you show above, you can replace the repeated [a-z]'s with [a-z]+ (one or more characters in range a-z), but I don't think that will do what you want.

The example I gave above does work at the URL I have linked, but you need to enter the match and substitution patterns in separate input elements, like so...

Code:
Regular Expression:
^[^_]+_[^_]+_([^_]+).*

Test String:
stuff_bitmore_needed_stuffnotneeded_083.txt

Substitution:
\1

Result
needed
Do you see why that works?

I would encourage you to open a terminal on your GNU/Linux machine and learn by using grep and sed from the command line. It will teach you the skills without any quirks which web-applications sometimes have, and in the text environment where regular expressions natively exist! Plus you will have all the native documentation available at the same time: man regex, man pcre, man pcresyntax and man pcrepattern, and more!

For example, putting your test patterns in a file named 'infile' and using sed to match/replace (again with the above sample):

Code:
cat infile
stuff_bitmore_needed_stuffnotneeded_083.txt
stuff_junk_wanted_morestuff_ABC.txt
books_worms_desired_trailingjunk_xxx.txt
first_leading_soughtfor_following_ZZZ.txt

sed -r 's/^[^_]+_[^_]+_([^_]+).*/\1/' infile
needed
wanted
desired
soughtfor
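To use the same sed command inside a script, its output can be captured with command substitution. The following is a sketch rather than a finished solution: `third_field` is a name invented here, the sample names are fed in through a here-document instead of 'infile' so the snippet runs on its own, and `-E` is used as the more widely supported spelling of GNU sed's `-r`.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the sed substitution shown above.
# Note: if the input does not match, sed prints the line unchanged.
third_field() {
    printf '%s\n' "$1" | sed -E 's/^[^_]+_[^_]+_([^_]+).*/\1/'
}

# Read one name per line and store the extracted part in a variable.
while IFS= read -r name; do
    needed=$(third_field "$name")
    printf '%s -> %s\n' "$name" "$needed"
done <<'EOF'
stuff_bitmore_needed_stuffnotneeded_083.txt
stuff_junk_wanted_morestuff_ABC.txt
EOF
```

In a real script the here-document would be replaced by `done < infile` or by a loop over actual file names.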

Last edited by astrogeek; 12-23-2017 at 12:17 AM.
 
Old 12-23-2017, 06:51 AM   #7
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,156

Rep: Reputation: 698
Or if regexp is not a religion
Code:
str="stuff_bitmore_needed_stuffnotneeded_083.txt"
echo "$str" | cut -d_ -f3
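If starting an external program is a concern, the same field can probably be obtained with plain shell parameter expansion. This sketch assumes, as above, that the wanted part is the third underscore-separated field; the variable names `rest` and `needed` are only illustrative.

```shell
#!/usr/bin/env bash
str="stuff_bitmore_needed_stuffnotneeded_083.txt"

rest=${str#*_}       # drop the first field:  bitmore_needed_stuffnotneeded_083.txt
rest=${rest#*_}      # drop the second field: needed_stuffnotneeded_083.txt
needed=${rest%%_*}   # keep everything before the next underscore

printf '%s\n' "$needed"   # prints: needed
```

`${var#pattern}` removes the shortest matching prefix and `${var%%pattern}` removes the longest matching suffix, so no regex engine is involved at all.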
 
Old 12-23-2017, 09:49 AM   #8
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 10,898

Rep: Reputation: 3241
If _ is the delimiter you can do it easily (but the OP should tell us whether that is the case):
Code:
P=( ${str//_/ } )
echo "${P[2]}"
This avoids a pipe and external tools.
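One caveat with the unquoted expansion is that any spaces or glob characters in the string would also be split or expanded. A possible variant, sketched here with the assumed array name `parts`, lets `read` do the splitting on underscores only:

```shell
#!/usr/bin/env bash
str="stuff_bitmore_needed_stuffnotneeded_083.txt"

# Split on underscores into the array 'parts'. IFS is changed only for
# this one read command, and the quoted here-string keeps spaces and
# glob characters in the input from being touched.
IFS=_ read -r -a parts <<< "$str"

printf '%s\n' "${parts[2]}"   # prints: needed
```

This still uses only bash built-ins, so it keeps the advantage of avoiding a pipe and external tools.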
 
Old 12-23-2017, 06:28 PM   #9
Zero4
Member
 
Registered: Jun 2007
Posts: 67

Original Poster
Rep: Reputation: 0
Thank you to everyone who contributed. It seems I have a lot to learn.
 
Old 12-24-2017, 03:46 AM   #10
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 10,898

Rep: Reputation: 3241
You are welcome.
If you think your problem is solved, please mark the thread solved. If you have any additional questions, do not hesitate, just ask.
And if you really want to say thanks, just click on yes.
(And obviously every one of us has a lot to learn.)
 
  



Tags
bash, regex


All times are GMT -5.
