Not to mention there is a meta character being deleted unannounced. Who knows what other requirements have been omitted.
A quick hack simply doing multiple passes would probably be my solution - trying to do it as one convoluted regex is just an academic pursuit IMHO.
How many regexes do you have to deal with that makes this not easier to do by hand?
What engine are the regexes for? If there is a dash in a character class (e.g. "[\w-]") how should it be handled?
There are a few dozen regexes. I wouldn't edit them by hand because I need to grep for the original regexes first, and then grep for the 'no-dashes' regexes if no matches are found with the original ones.
The dash removal rule should be: 'remove dashes outside square brackets, together with any metacharacter (?*+) that immediately follows them'
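The two-stage search described above could be sketched roughly like this. It assumes the patterns are EREs run through grep -E (the thread has not settled which engine is in use), and the function name grep_with_fallback is made up for illustration. The dash-free pattern is passed in ready-made here; producing it is the actual question of this thread.

```shell
#!/bin/bash
# Hypothetical helper: grep with the original ERE, and only if nothing
# matches (exit status 1) retry with the dash-free variant.
# Usage: grep_with_fallback ORIGINAL_ERE NODASH_ERE FILE...
grep_with_fallback() {
    local original=$1 nodash=$2 rc=0
    shift 2
    grep -E -e "$original" -- "$@" || rc=$?
    # 0 = matched, 2 or more = error: either way there is nothing to retry.
    if [ "$rc" -ne 1 ]; then
        return "$rc"
    fi
    grep -E -e "$nodash" -- "$@"
}

# Example with a throwaway file: the original regex finds nothing,
# so the dash-free one is tried.
tmp=$(mktemp)
printf '%s\n' 'part 1345b3' 'part 9999' > "$tmp"
grep_with_fallback '13-45[a-z][3-4]' '1345[a-z][3-4]' "$tmp"
rm -f "$tmp"
```

Checking for exit status 1 specifically keeps a grep error (status 2, e.g. an unreadable file) from silently triggering the fallback.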
If all the data are of exactly that structure, a reasonably straightforward ERE with sed will do the job. I didn't attempt to do it in one compound expression, but it can be done in a single pass.
Perl will of course do it, but that is a given when regex is mentioned.
Doing it in pure bash is not something I would ever contemplate.
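For the simple structure in the examples so far, the sed idea might look something like the sketch below. It is not the script referred to above, which was not posted. It assumes ERE input with no escaped characters, no ] inside a class and no nested [:class:] forms, and it loops with t rather than using one compound expression. The name strip_dashes_simple is made up.

```shell
#!/bin/bash
# Hypothetical helper; only valid under the assumptions stated above
# (no escapes, no nested classes).
# The prefix group matches plain characters or whole [...] blocks, so the
# dash that follows it is always outside brackets. That dash is dropped
# together with an optional ?, * or + right after it.
strip_dashes_simple() {
    sed -E '
:again
s/^(([^][-]|\[[^]]*])*)-[?*+]?/\1/
t again
'
}

printf '%s\n' '13-45[a-z][3-4]-?buburr-ex' | strip_dashes_simple
# prints: 1345[a-z][3-4]buburrex
```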
A dash is a special character, normally indicating a range, in all the regexps I have worked with, but only inside a bracket expression such as [a-z]. Outside brackets it is an ordinary character, so 13-14 already matches the character sequence one three dash one four and is not a range of 13 to 14. To match a literal dash inside brackets, put it first or last in the class (e.g. [a-z-]). Engines such as Perl and PCRE also accept an escaped \- there, whereas POSIX bracket expressions treat \ as a literal character.
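For what it's worth, in POSIX ERE the dash only acts as a range operator inside a bracket expression. A quick way to check is bash's own =~ operator, which uses ERE; the helper name matches is made up for this sketch.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if string $2 matches ERE $1.
matches() {
    [[ $2 =~ $1 ]]
}

# Outside brackets the dash is literal: 13-14 matches the text "13-14".
matches '^13-14$' '13-14' && echo 'literal dash outside brackets'

# Inside brackets it forms a range: [1-3] matches "2".
matches '^[1-3]$' '2' && echo 'range inside brackets'

# Placed last in the class it is literal again: [13-] matches "-" but not "2".
matches '^[13-]$' '-' && echo 'trailing dash is literal'
matches '^[13-]$' '2' || echo 'no range when the dash is last'
```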
That was actually quite a challenge when you start looking at all the ways the characters can occur within a regex, escapes, and so on. Can't promise this is perfect, but it's a start:
Code:
$ cat /tmp/regex
#!/bin/bash
# ERE's:
#
# match bracket expression ([]) block: (\[\^?]?[^]]*])
# match characters until escaped character, hyphen, or start of brackets block: ([^[\-]+)
# match escaped char: (\\.)
# match hyphens: (-+)
# rest of line: (.*)
sed -E -e '
:again
s/^(((\[\^?]?[^]]*])|(\\.)|([^[\-]+))*)(-+)(.*)$/\1\7/
t again
'
$ echo '13-45[a-z][3-4]-?buburr-ex' | /tmp/regex
1345[a-z][3-4]?buburrex
$ echo '13-45\[a-z][3-4]-?buburr-ex' | /tmp/regex
1345\[az][3-4]?buburrex
$
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski
Last edited by GazL; 02-25-2020 at 10:15 AM.
Reason: fixed comments
Quote:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski
Some people, when confronted with a regex, insist on posting a throw-away response to a misuse of Perl like it's a piece of inspired wisdom (instead of the useless meme it actually was).
Quote:
That was actually quite a challenge when you start looking at all the ways the characters can occur within a regex, escapes, and so on. Can't promise this is perfect, but it's a start:
A lot of people think regex is complicated, but it's actually a pretty simple language. There are only two fairly basic syntaxes (inside character classes and outside character classes), with a pretty limited set of rules of what goes where - and in this specific situation only a handful of characters that need to be cared about.
But that doesn't necessarily mean parsing regex with regex is the best approach...
The sed solution you posted checks for character classes before escaped characters, which means it doesn't handle [\]-] correctly.
That'll be fixable, but if you also want to handle nested classes like [[:alpha:]-] then the simplest approach is probably to count unescaped brackets to know when you're outside again. (This assumes a regex flavour where brackets must be escaped in character classes, which isn't guaranteed.)
Here's the dreaded pure Bash solution addressing that issue:
Code:
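#!/bin/bash
input='13-45[a-z][3-4]-?buburr-ex'
output=''
inclass=0   # bracket nesting depth; 0 means outside any character class
for (( i=0 ; i<${#input} ; ++i ))
do
    c=${input:$i:1}
    case "$c" in
        # escaped character: copy it together with its backslash
        '\') output+="${input:$i:2}" && ((++i)) && continue ;;
        '[') ((++inclass));;
        # a stray ']' outside a class (e.g. after '\[') must not push the count below zero
        ']') if [ $inclass -gt 0 ]; then ((--inclass)); fi ;;
        '-')
            if [ $inclass -eq 0 ]
            then
                # drop the dash and any ?, * or + that follows it
                case "${input:$i+1:1}" in ('*'|'+'|'?')
                    ((++i))
                esac
                continue
            fi
    esac
    output+="$c"
done
echo "$output"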
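To try the edge cases raised in this thread, the same loop can be wrapped in a function and fed several inputs. This is only a sketch: the name strip_dashes is made up, it assumes a flavour where brackets inside classes are escaped, and it guards the counter so that an unmatched ] outside a class (as in the \[a-z] test) cannot drive it negative.

```shell
#!/bin/bash
# Hypothetical wrapper around the bracket-counting loop.
strip_dashes() {
    local input=$1 output='' inclass=0 i c
    for (( i = 0; i < ${#input}; ++i )); do
        c=${input:i:1}
        case "$c" in
            '\')
                # Escaped character: copy the pair verbatim and skip ahead.
                output+=${input:i:2}
                i=$(( i + 1 ))
                continue ;;
            '[') inclass=$(( inclass + 1 )) ;;
            ']')
                # Guard against a stray ']' outside any class.
                if (( inclass > 0 )); then inclass=$(( inclass - 1 )); fi ;;
            '-')
                if (( inclass == 0 )); then
                    # Drop the dash plus a directly following ?, * or +.
                    case "${input:i+1:1}" in
                        '*'|'+'|'?') i=$(( i + 1 )) ;;
                    esac
                    continue
                fi ;;
        esac
        output+=$c
    done
    printf '%s\n' "$output"
}

for re in '13-45[a-z][3-4]-?buburr-ex' '13-45\[a-z][3-4]-?buburr-ex' \
          '[\]-]x-y' '[[:alpha:]-]x-+y'; do
    printf '%-30s -> %s\n' "$re" "$(strip_dashes "$re")"
done
```

The last two inputs are the [\]-] and [[:alpha:]-] cases: in both, the dash inside the class survives and only the dashes outside are removed.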
Thanks for pointing that out. Clearly I missed the [:class:] cases. Oh well, I said it likely wasn't perfect.
If I were approaching this as a real problem to be solved I would have reached for C. I was using a regex here purely for the fun of it and to see if I could do it that way. Apparently I only got part way there.
However memey it has become, I still like that Jamie Zawinski quote: it's humorous, and it's also a caution that if you attempt to do something too complicated with a regex you're going to fail, or cause yourself problems in the future. Nothing is absolute though, and regexes do have their place. It just seems this wasn't one of them.