LinuxQuestions.org

sharky 04-03-2024 08:17 PM

Remove trailing characters while adding leading characters
 
The text file contains numerous strings with a trailing sub-string.

Example, where _xx is the trailing sub-string:

Quote:

"m1_xx" some other text "m2_xx"
"p2_xx" yet more text "p2_xx" extra text
change is good "hello_xx"

Desired output:

Quote:

"yy_m1" some other text "yy_m2"
"yy_p2" yet more text "yy_p2" extra text
change is good "yy_hello"

I found ways to make the substitution. However, my method loses the existing spacing: all the strings in the output end up separated by a single space.

syg00 04-03-2024 08:56 PM

sed is your friend - use regex and capture groups. Do-able in a single invocation.

grail 04-03-2024 10:56 PM

Please provide what you have tried so we may assist?

sharky 04-04-2024 01:05 PM

Quote:

Originally Posted by syg00 (Post 6493963)
sed is your friend - use regex and capture groups. Do-able in a single invocation.

What is a 'capture group'?

sharky 04-04-2024 01:32 PM

Quote:

Originally Posted by grail (Post 6493973)
Please provide what you have tried so we may assist?

Code:

#!/usr/bin/env python3

def changeXXToYY():

  # read lines into list
  with open("testText") as fp:
    mapList = fp.readlines()

  # strip leading/trailing whitespace, including the line feed
  mapList = [x.strip() for x in mapList]

  toRemove = '_xx"'
  toAdd = '"yy_'

  for elem in mapList:
    # split() breaks the line on whitespace, so the original spacing is lost
    elem = elem.split()
    for item in elem:
      if toRemove in item:
        # rebinding item only changes the local name, not the line in mapList
        item = toAdd + item.split(toRemove)[0].split('"')[-1] + '"'
        print(item)

changeXXToYY()

This prints out the desired new string but the original line remains unchanged.

MadeInGermany 04-04-2024 02:37 PM

With sed:
Code:

sed 's/"\([^"]*\)_xx"/"yy_\1"/g' testText

The text matched by the \( \) group is referenced as \1 in the replacement string.
[^"] matches a single character that is not a ".
[^"]* matches any number of such characters, including none.
The /g modifier repeats the substitution for every match in the line; each search resumes to the right of the previous match.
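
For anyone following along in Python, the language of the original attempt, the same substitution can be sketched with re.sub. This is only an illustration under assumptions: the name swap_suffix is made up here, and the pattern is a direct translation of the sed expression above. Because only the matched quoted strings are replaced, the spacing between them is left untouched.

```python
import re

# Direct translation of the sed expression: a quote, a captured run of
# non-quote characters, the literal suffix _xx, and the closing quote.
QUOTED_XX = re.compile(r'"([^"]*)_xx"')


def swap_suffix(line):
    """Hypothetical helper: turn "name_xx" into "yy_name" within one line."""
    # \1 in the replacement refers to what the ( ) group captured.
    return QUOTED_XX.sub(r'"yy_\1"', line)


if __name__ == "__main__":
    sample = [
        '"m1_xx" some other text   "m2_xx"',
        '"p2_xx" yet more text "p2_xx" extra text',
        'change is good "hello_xx"',
    ]
    for line in sample:
        print(swap_suffix(line))
```

Reading the file line by line and writing swap_suffix(line) back out would give the modified file, with the run of spaces in the first sample line preserved.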

sharky 04-04-2024 03:12 PM

Quote:

Originally Posted by MadeInGermany (Post 6494118)
With sed:
Code:

sed 's/"\([^"]*\)_xx"/"yy_\1"/g' testText

The text matched by the \( \) group is referenced as \1 in the replacement string.
[^"] matches a single character that is not a ".
[^"]* matches any number of such characters, including none.
The /g modifier repeats the substitution for every match in the line; each search resumes to the right of the previous match.

It works. Thanks for the explanation also.

sharky 04-04-2024 03:25 PM

Quote:

Originally Posted by MadeInGermany (Post 6494118)
With sed:
Code:

sed 's/"\([^"]*\)_xx"/"yy_\1"/g' testText

The text matched by the \( \) group is referenced as \1 in the replacement string.
[^"] matches a single character that is not a ".
[^"]* matches any number of such characters, including none.
The /g modifier repeats the substitution for every match in the line; each search resumes to the right of the previous match.

My apologies, but I noticed that my input file will also have cases where the original string is not within double quotes.

How should this sed command be modified to work in such cases? I've tried a few things but nothing changed.

MadeInGermany 04-04-2024 08:50 PM

If " anchors cannot be used, you can try \b anchors ("word boundaries"):
Code:

sed 's/\b\([^" ]*\)_xx\b/yy_\1/g' testText

[^" ]* is a string of characters that are not " or space.
The pre-defined "word boundary" is a zero-width marker, not a character, so there is nothing to re-insert in the replacement. It is less precise, though: a boundary also occurs next to a - character, for example.

The following uses Extended Regular Expressions and three ( ) groups:
Code:

sed -E 's/(^|[" ])([^" ]*)_xx([" ]|$)/\1yy_\2\3/g' testText

The 1st group is the beginning-of-line marker or a " or space character.
The 2nd group is a string of characters that are not " or space.
The 3rd group is a " or space character or the end-of-line marker.
\1 \2 \3 are what the respective groups matched.
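
One corner case with the three-group form: the trailing delimiter is consumed as part of the match, so a second target that follows after a single space (for example a_xx b_xx) is skipped even with /g. The sketch below reproduces that in Python and shows one possible way around it with lookaround assertions, which only check the delimiters without consuming them. The function names and test strings are illustrative assumptions, not part of the thread.

```python
import re

# Same three groups as the sed -E command; the delimiters are consumed.
CONSUMING = re.compile(r'(^|[" ])([^" ]*)_xx([" ]|$)')

# Lookbehind/lookahead assert "start, end, quote or space" on either side
# without making the delimiter part of the match.
LOOKAROUND = re.compile(r'(?<![^" ])([^" ]*)_xx(?![^" ])')


def with_groups(line):
    """Hypothetical helper mirroring the three-group substitution."""
    return CONSUMING.sub(r'\1yy_\2\3', line)


def with_lookaround(line):
    """Hypothetical helper: same idea, delimiters left unconsumed."""
    return LOOKAROUND.sub(r'yy_\1', line)


if __name__ == "__main__":
    print(with_groups('a_xx b_xx'))       # second item is left unchanged
    print(with_lookaround('a_xx b_xx'))   # both items are rewritten
```

For input where the targets are never adjacent, such as the sample lines in this thread, both forms give the same result.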

syg00 04-04-2024 09:15 PM

An alternative approach is to specify what you are looking for, rather than what you are not looking for. It also guards against overlooked corner cases (what if one of those blanks is a tab?).
Code:

sed -r 's/([[:alnum:]]+)_xx/yy_\1/g' input.file
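
A rough Python counterpart of this positive-match approach, as a sketch only: Python's re module has no [[:alnum:]], so the ASCII class [A-Za-z0-9] is assumed as a stand-in, and swap_alnum is a made-up name. It copes with tabs and quotes alike. Note that when a name itself contains underscores, only its last alphanumeric run is moved.

```python
import re

# [A-Za-z0-9]+ stands in for the POSIX class [[:alnum:]]+, which
# Python's re module does not support (ASCII-only approximation).
ALNUM_XX = re.compile(r'([A-Za-z0-9]+)_xx')


def swap_alnum(line):
    # Hypothetical helper: move the _xx suffix to a yy_ prefix.
    return ALNUM_XX.sub(r'yy_\1', line)


if __name__ == "__main__":
    print(swap_alnum('m1_xx\tsome text "m2_xx"'))  # tab and quotes both fine
    print(swap_alnum('my_sig_xx'))                 # only "sig" is moved
```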

sundialsvcs 04-05-2024 09:01 AM

Just to clarify: a “regular expression” (regex) can not only match a string pattern (“yes or no, does it match?”) but also return to you specified pieces of the matching string, such as “some other text” and “more text.” These pieces are numbered left to right and are immediately available for producing output: as “\1” and “\2” in sed, or as “$1” and “$2” in languages such as Perl.

Also: These days, “regex support” is universal, and the syntax has become standardized. Implementations now vary only in the details. Every language has it. Therefore, understanding this very important power-tool is definitely “an essential life skill.” (Like knowing how to use a life jacket …) If you need to “tear apart a text string,” (and who doesn’t?), regex has your back.

There are “esoteric fee-churs” in regexes that you can learn about if and when you actually need them, and others that you might use every day.
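
As a small illustration of a regex "returning pieces" of the matched string, here is a sketch in Python using one of the sample lines from this thread. The pattern with two groups is an assumption made for the example, not something posted above.

```python
import re

# Two capture groups: the name before the underscore, and the suffix.
pattern = re.compile(r'"([^"]*)_(xx)"')

match = pattern.search('change is good "hello_xx"')
if match:
    whole = match.group(0)   # the entire matched text
    name = match.group(1)    # first group, counting left to right
    suffix = match.group(2)  # second group
    print(whole, name, suffix)
```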

sharky 04-05-2024 05:59 PM

Quote:

Originally Posted by sundialsvcs (Post 6494235)
Just to clarify: a “regular expression” (regex) can not only match a string pattern (“yes or no, does it match?”) but also return to you specified pieces of the matching string, such as “some other text” and “more text.” These pieces are numbered left to right and are immediately available for producing output: as “\1” and “\2” in sed, or as “$1” and “$2” in languages such as Perl.

Also: These days, “regex support” is universal, and the syntax has become standardized. Implementations now vary only in the details. Every language has it. Therefore, understanding this very important power-tool is definitely “an essential life skill.” (Like knowing how to use a life jacket …) If you need to “tear apart a text string,” (and who doesn’t?), regex has your back.

There are “esoteric fee-churs” in regexes that you can learn about if and when you actually need them, and others that you might use every day.

I do coding in Cadence SKILL language for design automation in a Linux environment (analog IC design). However, to my complete and utter shame, I have never gotten past a few rudimentary regular expression usages. The fact is, despite working in a Linux environment, I don't often have much need for regular expressions and have never taken that deep dive. I blame it on linuxquestions - you guys spoil me with amazing solutions. :)

sharky 04-05-2024 05:59 PM

Quote:

Originally Posted by syg00 (Post 6494165)
An alternative approach is to specify what you are looking for, rather than what you are not looking for. It also guards against overlooked corner cases (what if one of those blanks is a tab?).
Code:

sed -r 's/([[:alnum:]]+)_xx/yy_\1/g' input.file

This worked perfectly.

Thanks!

syg00 04-06-2024 05:40 AM

You need to take that "deep dive" - regex is a powerful and useful tool. MadeInGermany has given you good pointers to get you started.

danielbmartin 04-06-2024 11:53 AM

Please forgive me if this is obvious to LQ regulars.

The excellent solution posted by syg00 may be generalized.
xx and yy could be shell variables instead of literal strings.

With this InFile ...
Code:

m1_SALT some other text m2_SALT
p2_SALT yet more text p2_SALT extra text
change is good hello_SALT
m1_HAM some other text m2_HAM
p2_HAM yet more text p2_HAM extra text
change is good hello_HAM

... this code ...
Code:

xx='SALT'
yy='SUGAR'
# InFile and OutFile are assumed to hold the input and output file names
sed -r "s/([[:alnum:]]+)_${xx}/${yy}_\1/g" <"$InFile" >"$OutFile"

... produces this OutFile ...
Code:

SUGAR_m1 some other text SUGAR_m2
SUGAR_p2 yet more text SUGAR_p2 extra text
change is good SUGAR_hello
m1_HAM some other text m2_HAM
p2_HAM yet more text p2_HAM extra text
change is good hello_HAM

Which shows how we may change SALT into SUGAR
but not HAM into CHEESE.

Daniel B. Martin
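
The same generalization can be sketched in Python, for readers who would rather stay in the language of the original attempt. The function name swap_affix and its parameters are assumptions made for this example. re.escape keeps any regex metacharacters in the suffix literal, and a function replacement avoids backslash processing in the new prefix.

```python
import re


def swap_affix(line, old_suffix, new_prefix):
    """Hypothetical helper: turn name_<old_suffix> into <new_prefix>_name."""
    # re.escape keeps regex metacharacters in old_suffix literal.
    pattern = re.compile(r'([A-Za-z0-9]+)_' + re.escape(old_suffix))
    # A function replacement avoids backslash processing in new_prefix.
    return pattern.sub(lambda m: new_prefix + '_' + m.group(1), line)


if __name__ == "__main__":
    for text in ('m1_SALT some other text m2_SALT',
                 'change is good hello_HAM'):
        print(swap_affix(text, 'SALT', 'SUGAR'))
```

As with the sed version, lines carrying a different suffix (HAM here) pass through unchanged.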

All times are GMT -5.