LinuxQuestions.org - Replace part of string-delete last letter of it


Hi to all

I would like to delete (replace with nothing) the last letter of the strings that have 2 or 3 digits between "_P" and "X", "Y" or "Z", as in the following pattern.

Code:

YAN_P83Y 

YEN_P123Z

YIN_P91X 

YON_P77Z 

YUN_P240Y

I was trying the following, but it doesn't seem to work.

Code:

sed '/P[0-9][0-9]|[0-9][X-Z]/ s/[X-Z]//g' inputfile

the desired output would be:

Code:

YAN_P83 

YEN_P123

YIN_P91 

YON_P77 

YUN_P240

Any help would be much appreciated

Thanks in advance

Why do you think that regex will match 2 or 3? Why not 1 (also), in that case?
That says match exactly 3 digits. Try

Code:

sed -r '/P[0-9]{2,3}[X-Z]/ s/[X-Z]//g' inputfile

Probably better would be

sed -r '/P[0-9]{2,3}[X-Z]/ s/[X-Z]$//' inputfile
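A quick way to see which digit counts the interval `{2,3}` accepts is to try the address pattern on a few sample strings. The sketch below uses Python's `re` module as a stand-in for `sed -r` (an assumption made only for illustration; the bracket and interval syntax used here is written the same way in both). `YEN_P2X` and `YEN_P1234Z` are made-up extra cases, not part of the original input.

```python
import re

# Same pattern as the sed -r address: "P", two or three digits, then X, Y or Z.
pattern = re.compile(r"P[0-9]{2,3}[X-Z]")

# The last two samples are hypothetical edge cases (1 digit and 4 digits).
samples = ["YAN_P83Y", "YEN_P123Z", "YEN_P2X", "YEN_P1234Z"]

# Record, for each sample, whether the pattern is found anywhere in it.
matches = {s: bool(pattern.search(s)) for s in samples}

for s in samples:
    print(s, matches[s])
```

Only the strings with two or three digits between "P" and the final letter are selected; one digit or four digits do not match.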

First, you need extended regexes to use the "alternation" operator, e.g. sed -r 'stuff' <file>

Second, I'm not sure how this: "P[0-9][0-9]|[0-9][X-Z]" gets interpreted. I think it reads as: "P", followed by any digit, then any digit or any digit, then "X", "Y", or "Z".

Try this:
sed '/P[0-9]{2}[0-9]?[X-Z]/s/[X-Z]//g' filename

("P", followed by 2 digits, then a third optional digit, then "X", "Y", or "Z".)

But this still removes the leading "Y", so some more work is needed; e.g., "X$" matches an "X" only at the end of the line.
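The difference between the unanchored and the anchored substitution can be sketched as follows. This uses Python's `re.sub` in place of sed's `s///` purely as an illustration; it is not the command from the post, and the variable names are made up for the example.

```python
import re

line = "YAN_P83Y"

# Equivalent of s/[X-Z]//g: every X, Y or Z is removed, including the leading "Y".
unanchored = re.sub(r"[X-Z]", "", line)

# Equivalent of s/[X-Z]$//: only an X, Y or Z at the end of the line is removed.
anchored = re.sub(r"[X-Z]$", "", line)

print(unanchored)  # AN_P83
print(anchored)    # YAN_P83
```

The `$` anchor only helps when the letter really is the last character of the line.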

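If you have Python:

Code:

#!/usr/bin/env python

for line in open("file"):
    line = line.strip()
    # skip empty lines; only lines ending in X, Y or Z that contain "_P" qualify
    if line and line[-1] in ("X", "Y", "Z") and "_P" in line:
        ind = line.index("_P")
        # the characters between "_P" and the last letter
        number = line[ind+2:-1]
        # print the line without its last letter if those characters are 2 or more digits
        if number.isdigit() and len(number) >= 2:
            print(line[:-1])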

output

Code:

# more file

YAN_P83Y

YEN_P123Z

YIN_P91X

YON_P77Z

YUN_P240Y

YEN_P2X

# ./test.py

YAN_P83

YEN_P123

YIN_P91

YON_P77

YUN_P240

In Perl, chop deletes the last character of a string.

Cgcamal didn't specify all the details. E.g., must "_" be present? Is X,Y,Z the last character of the line? Should it be retained after 1 or 4 digits? Must "sed" be used?
If Perl is allowed, it's easy:

Code:

perl -wpe 's/(_P\d{2,3})[XYZ]/$1/'

This is quite careful.

Thangappan is right, "chop" drops the last character, which at first is the newline. You could do this:

Code:

perl -wne 'chop;chop;print $_,"\n"'

But that's dangerous because it drops the last character of any line, regardless of preceding _P and digits.

/Quigi

Why [XYZ] and not just '.' ? Rather

Code:

perl -wpe 's/(_P\d{2,3}.\n/$1\n/'

- dot ('.') is any character.

Quote:

Originally Posted by Sergei Steshenko (Post 3554664)

Why [XYZ] and not just '.' ?

Just being careful. Yes, your suggestion would work in the 5 examples, and it's simpler. But maybe there are cases where the last character should not be deleted, e.g., if it's "5" or "U", or following a different number of digits? We'd need more input from the original poster.

The examples would be done right even by s/.$// -- simply dropping the last character.
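To make the trade-off concrete, here is a small sketch comparing the three variants discussed so far. It uses Python's `re.sub` as a stand-in for the Perl and sed substitutions (an assumption for illustration only), the function names are invented for this example, and `YEN_P1234Z` and `YEN_P12U` are hypothetical inputs that do not appear in the original file.

```python
import re

# The first line is from the thread; the other two are made-up edge cases.
lines = ["YAN_P83Y", "YEN_P1234Z", "YEN_P12U"]

def careful(s):
    # Like s/(_P\d{2,3})[XYZ]/$1/ : only an X, Y or Z after 2-3 digits is dropped.
    return re.sub(r"(_P\d{2,3})[XYZ]", r"\1", s)

def any_char(s):
    # Like s/(_P\d{2,3})./$1/ : whatever character follows the digits is dropped.
    return re.sub(r"(_P\d{2,3}).", r"\1", s)

def drop_last(s):
    # Like s/.$// : the last character of the line is always dropped.
    return re.sub(r".$", "", s)

for s in lines:
    print(s, careful(s), any_char(s), drop_last(s))
```

All three agree on the sample from the thread; they only differ on inputs such as four digits or a trailing letter other than X, Y or Z, which is why more information about the real data matters.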

That seems to work as it's supposed to:

Code:

sed '/P[0-9]\{2,3\}[X-Z]/ s/\(P[0-9]*\)\([X-Z]\)/\1/g' filename

Hi again, many thanks to all for helping me with this question. Sorry for not giving more details about the sample input file before.
Now I am back with a more detailed input file, which looks as follows:

# The wanted strings are surrounded by other columns on the left and on the right.

#inputfile

Code:

Some other text

Text YAN_P83Y Another text in this column

Text YEN_P123Z

Text YIN_P91X Another text in this column

Text YON_P77Z Another text in this column 

Text YUN_P240Y



Some other text

Some other text

Step by step:

syg00 (Thanks):

Code:

sed '/P[0-9]{2}[0-9]?[X-Z]/ s/[X-Z]//g' inputfile # This script doesn't change anything in the input file

sed -r '/P[0-9]{2,3}[X-Z]/ s/[X-Z]$//' inputfile # This script doesn't delete the last letter when the line has more than one column

pixellany (Thanks):

Code:

sed '/P[0-9]{2}[0-9]?[X-Z]/s/[X-Z]//g' inputfile # This script doesn't change anything in the input file

sed '/P[0-9]{2}[0-9]?[X-Z]/s/[X-Z]$//g' inputfile # This script doesn't change anything in the input file

ghostdog74 (Thanks):

Code:

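With the script below, many thanks for your help, but I don't have Python:

#!/usr/bin/env python

for line in open("file"):
    line = line.strip()
    # skip empty lines; only lines ending in X, Y or Z that contain "_P" qualify
    if line and line[-1] in ("X", "Y", "Z") and "_P" in line:
        ind = line.index("_P")
        # the characters between "_P" and the last letter
        number = line[ind+2:-1]
        # print the line without its last letter if those characters are 2 or more digits
        if number.isdigit() and len(number) >= 2:
            print(line[:-1])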

thangappan (Thanks):

Quigi (Thanks):

Code:

perl -wpe 's/(_P\d{2,3})[XYZ]/$1/' inputfile # This script works!

perl -wne 'chop;chop;print $_,"\n"' inputfile # This script deletes the last letter of every line, not only the required ones

sed 's/.$//g' inputfile # This script deletes the last letter of every line, but not the last letter of column 1.

Sergei Steshenko (Thanks):

Code:

$ perl -wpe 's/(_P\d{2,3}.\n/$1\n/' inputfile # With this script I get the following error

Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE _P\d{2,3}.\n/ at -e lin

sycamorex (Thanks):

Code:

sed '/P[0-9]\{2,3\}[X-Z]/ s/\(P[0-9]*\)\([X-Z]\)/\1/g' inputfile # This script works on both Cygwin and UWIN, but on UWIN it ran with a minor warning at line 9.



Some other text

YAN_P83 Another text in this column

YEN_P123

YIN_P91 Another text in this column

YON_P77 Another text in this column

YUN_P240



Some other text

sed: "inputfile", line 9: warning: newline appended

Some other text

Questions:

sycamorex or someone else:

1- How does the regex you used work? Could you explain it a little?

Code:

sed '/P[0-9]\{2,3\}[X-Z]/ s/\(P[0-9]*\)\([X-Z]\)/\1/g' filename

2- In what cases do we need to use "\", as in "\{2,3\}" or "\(P[0-9]*\)"?

3- What does "\1" mean in .../\1/g' inputfile?

Thanks again to all,

Best regards.

If you want to understand sed, start here: http://www.grymoire.com/Unix/Sed.html#uh-0

Sorry, I was modifying somebody else's regular expression - this form:

Code:

perl -wpe 's/(_P\d{2,3}).\n/$1\n/' input_file

compiles OK and appears to work.

Quote:

1- How does the regex you used work? Could you explain it a little?

Code:
sed '/P[0-9]\{2,3\}[X-Z]/ s/\(P[0-9]*\)\([X-Z]\)/\1/g' filename

I don't have much time as I'm at work typing on Windows XP :), but in a few words:
Apply the substitution only to lines matching the following pattern:

Quote:

/P[0-9]\{2,3\}[X-Z]/

'P' followed by 2 or 3 digits, followed in turn by either 'X', 'Y' or 'Z'.
The substitution:

Quote:

s/\(P[0-9]*\)\([X-Z]\)/\1/g

It's called backreferencing. Basically you create 'containers' surrounded by \( ... \)
1) The first container (1) is:

Quote:

\(P[0-9]*\)

'P' followed by 0 or more digits.
2) The second container (2) is:

Quote:

\([X-Z]\)

A letter: X, Y, or Z.

Then we replace what has been matched in both containers with the first container only: \1 (omitting whatever was matched in the second container). If you put \2 instead, it would leave only what was matched in the second container.

I hope it makes sense. If not you can read something about back referencing in sed.
Sorry, need to go back to work - big boss is watching.
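The container idea can be tried out with a short sketch. It uses Python's `re.sub` instead of sed, so the groups are written `( ... )` rather than `\( ... \)`, and it leaves out the line-selecting address; both are simplifications assumed here for illustration. The sample line is taken from the input file posted earlier in the thread.

```python
import re

line = "Text YAN_P83Y Another text in this column"

# Group 1 is "P" plus any digits; group 2 is the single letter X, Y or Z.
pattern = r"(P[0-9]*)([X-Z])"

# Replacing the whole match with \1 keeps the first container only.
keep_first = re.sub(pattern, r"\1", line)

# Replacing the whole match with \2 keeps the second container only.
keep_second = re.sub(pattern, r"\2", line)

print(keep_first)   # Text YAN_P83 Another text in this column
print(keep_second)  # Text YAN_Y Another text in this column
```

Because the pattern is not anchored to the end of the line, it also works when the string sits in the middle of other columns.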

SED has a marvelous ability to give you totally inscrutable code; the Grymoire site (linked earlier) saved me. It is missing a few nuances, so read the official manual also.

Hi guys, many thanks to all. I've learned a lot from your different solutions, and your references will without doubt be useful to study.

sycamorex, the "container" trick is 90% of what I needed to get the desired result using sed in the way I was originally trying.

Many thanks to all again.

best regards