LinuxQuestions.org


cgcamal 05-26-2009 11:55 PM

Replace part of string-delete last letter of it
 
Hi all,


I would like to delete (replace with nothing) the last letter of strings that have 2 or 3 digits between "_P" and a trailing "X", "Y" or "Z", as in the following pattern.


Code:

YAN_P83Y
YEN_P123Z
YIN_P91X
YON_P77Z
YUN_P240Y

I tried the following, but it doesn't seem to work:
Code:

sed '/P[0-9][0-9]|[0-9][X-Z]/ s/[X-Z]//g' inputfile
The desired output would be:

Code:

YAN_P83
YEN_P123
YIN_P91
YON_P77
YUN_P240

Any help would be much appreciated.

Thanks in advance

syg00 05-27-2009 12:48 AM

Why do you think that regex will match 2 or 3? Why not 1 (also) in that case?
That says match exactly 3 digits. Try
Code:

sed -r '/P[0-9]{2,3}[X-Z]/ s/[X-Z]//g' inputfile
Probably better would be
sed -r '/P[0-9]{2,3}[X-Z]/ s/[X-Z]$//' inputfile
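
As a side note, here is a rough sketch of how the original address pattern P[0-9][0-9]|[0-9][X-Z] is read. It uses Python's re module as a stand-in for sed, which is an assumption: Python always treats "|" as alternation (like sed -r), so the literal-pipe reading of plain sed is emulated by escaping the pipe. The sample list is made up from strings in this thread.

```python
import re

samples = ["YAN_P83Y", "YEN_P123Z", "YEN_P2X"]

# Extended syntax (sed -r): '|' splits the pattern into two alternatives,
# "P[0-9][0-9]" OR "[0-9][X-Z]".
extended = re.compile(r"P[0-9][0-9]|[0-9][X-Z]")

# Basic syntax (plain sed): an unescaped '|' is an ordinary character,
# emulated here by escaping it for Python.
basic = re.compile(r"P[0-9][0-9]\|[0-9][X-Z]")

for s in samples:
    m = extended.search(s)
    print(s, "extended:", m.group(0) if m else None,
          "basic:", bool(basic.search(s)))
```

Under the extended reading the single-digit line YEN_P2X is still selected (through the "2X" branch); under the basic reading none of the samples match, because none contains a literal "|".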


pixellany 05-27-2009 01:01 AM

First, you need extended regexes to use the "alternation" operator, e.g. sed -r 'stuff' <file>

Second, I'm not sure how this: "P[0-9][0-9]|[0-9][X-Z]" gets interpreted. I think it reads as: "P", followed by any digit, then any digit or any digit, then "X", "Y", or "Z".

Try this:
sed '/P[0-9]{2}[0-9]?[X-Z]/s/[X-Z]//g' filename

("P", followed by 2 digits, then a third optional digit, then "X", "Y", or "Z".)

But this still replaces the leading "Y", so some more work is needed. For example, "X$" means an "X" at the end of the line.
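
The difference between the global and the anchored substitution can be sketched in Python (used here only as a stand-in for sed; the variable names are illustrative):

```python
import re

line = "YAN_P83Y"

# s/[X-Z]//g : every X, Y or Z on the line is removed, including the first letter.
all_removed = re.sub(r"[X-Z]", "", line)

# s/[X-Z]$// : only an X, Y or Z at the very end of the line is removed.
end_removed = re.sub(r"[X-Z]$", "", line)

print(all_removed)   # AN_P83
print(end_removed)   # YAN_P83
```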

ghostdog74 05-27-2009 01:03 AM

if you have Python
Code:

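#!/usr/bin/env python3
# Print each line without its trailing X/Y/Z when "_P" is followed by 2+ digits.
with open("file") as f:
    for line in f:
        line = line.strip()
        # Skip blank lines and lines that lack the "_P" marker.
        if not line or "_P" not in line:
            continue
        if line[-1] in ("X", "Y", "Z"):
            ind = line.index("_P")
            number = line[ind+2:-1]   # text between "_P" and the last letter
            if number.isdigit() and len(number) >= 2:
                print(line[:-1])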

output
Code:

# more file
YAN_P83Y
YEN_P123Z
YIN_P91X
YON_P77Z
YUN_P240Y
YEN_P2X
# ./test.py
YAN_P83
YEN_P123
YIN_P91
YON_P77
YUN_P240


thangappan 05-27-2009 04:33 AM

In perl
 
In Perl, chop deletes the last character from a string.

Quigi 05-27-2009 02:08 PM

Cgcamal didn't specify all the details. E.g., must "_" be present? Is X,Y,Z the last character of the line? Should it be retained after 1 or 4 digits? Must "sed" be used?
If Perl is allowed, it's easy:
Code:

perl -wpe 's/(_P\d{2,3})[XYZ]/$1/'
This is quite careful.

Thangappan is right: "chop" drops the last character, which at first is the newline. You could do this:
Code:

perl -wne 'chop;chop;print $_,"\n"'
But that's dangerous because it drops the last character of any line, regardless of preceding _P and digits.

/Quigi
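
For readers without Perl, the two approaches can be sketched in Python. The function names careful and chop_last are hypothetical, and the chop emulation assumes the newline has already been stripped:

```python
import re

def careful(line):
    # Python rendering of s/(_P\d{2,3})[XYZ]/$1/ : first match only, like Perl without /g.
    return re.sub(r"(_P\d{2,3})[XYZ]", r"\1", line, count=1)

def chop_last(line):
    # Rough equivalent of the chop;chop one-liner on a newline-terminated line:
    # the last visible character goes, whatever it is.
    return line[:-1]

for text in ["YAN_P83Y", "YEN_P2X", "YEN_P1234X", "Some other text"]:
    print(text, "->", careful(text), "|", chop_last(text))
```

The careful version leaves strings with 1 or 4 digits and unrelated lines alone, while the chop version shortens every line.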

Sergei Steshenko 05-27-2009 02:34 PM

Quote:

Originally Posted by Quigi (Post 3554629)
If Perl is allowed, it's easy:
Code:

perl -wpe 's/(_P\d{2,3})[XYZ]/$1/'

Why [XYZ] and not just '.' ? Rather

Code:

perl -wpe 's/(_P\d{2,3}.\n/$1\n/'
- dot ('.') is any character.

Quigi 05-27-2009 02:43 PM

Quote:

Originally Posted by Sergei Steshenko (Post 3554664)
Why [XYZ] and not just '.' ?

Just being careful. Yes, your suggestion would work in the 5 examples, and it's simpler. But maybe there are cases where the last character should not be deleted, e.g., if it's "5" or "U", or following a different number of digits? We'd need more input from the original poster.

The examples would be done right even by s/.$// -- simply dropping the last character.
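
A small sketch of the trade-off, again in Python as a stand-in. Both patterns are anchored at the end of the string here to make them comparable, which is an assumption, and the "U" and "5" inputs are invented test strings:

```python
import re

with_class = re.compile(r"(_P\d{2,3})[XYZ]$")   # only a final X, Y or Z is dropped
with_dot = re.compile(r"(_P\d{2,3}).$")          # any final character is dropped

for text in ["YAN_P83Y", "YAN_P83U", "YAN_P835"]:
    print(text, with_class.sub(r"\1", text), with_dot.sub(r"\1", text))
```

With the character class, "YAN_P83U" and "YAN_P835" come back unchanged; with the dot, both are cut to "YAN_P83".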

sycamorex 05-27-2009 03:57 PM

That seems to work as it's supposed to:
Code:

sed '/P[0-9]\{2,3\}[X-Z]/ s/\(P[0-9]*\)\([X-Z]\)/\1/g' filename

cgcamal 05-28-2009 12:26 AM

Hi again, many thanks to all for helping me with this question. Sorry for not giving more details about the sample inputfile before.
I am back with a more detailed inputfile, which looks as follows:

# The wanted strings are surrounded by other columns on the left and on the right.

#inputfile
Code:

Some other text
Text YAN_P83Y Another text in this column
Text YEN_P123Z
Text YIN_P91X Another text in this column
Text YON_P77Z Another text in this column
Text YUN_P240Y

Some other text
Some other text

Step by step:

syg00 (Thanks):
Code:

sed '/P[0-9]{2}[0-9]?[X-Z]/ s/[X-Z]//g' inputfile # This script doesn't change anything in the inputfile
sed -r '/P[0-9]{2,3}[X-Z]/ s/[X-Z]$//' inputfile # This script doesn't delete the last letter when the line has more than one column

pixellany (Thanks):
Code:

sed '/P[0-9]{2}[0-9]?[X-Z]/s/[X-Z]//g' inputfile # This script doesn't change anything in the inputfile
sed '/P[0-9]{2}[0-9]?[X-Z]/s/[X-Z]$//g' inputfile # This script doesn't change anything in the inputfile

ghostdog74 (Thanks):
Many thanks for your help with the Python script, but I don't have Python.

thangappan (Thanks):

Quigi (Thanks):
Code:

perl -wpe 's/(_P\d{2,3})[XYZ]/$1/' inputfile # This script works!

perl -wne 'chop;chop;print $_,"\n"' inputfile # This script deletes the last letter of every line, not only the required ones

sed 's/.$//g' inputfile # This script deletes the last letter of every line, but not the last letter of column 1.

Sergei Steshenko (Thanks):
Code:

$ perl -wpe 's/(_P\d{2,3}.\n/$1\n/' inputfile # With this script I get the following error
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE _P\d{2,3}.\n/ at -e lin

sycamorex (Thanks):

Code:

sed '/P[0-9]\{2,3\}[X-Z]/ s/\(P[0-9]*\)\([X-Z]\)/\1/g' inputfile # This script works on both Cygwin and UWIN,
# but on UWIN it runs with a minor warning at line 9.

Some other text
YAN_P83 Another text in this column
YEN_P123
YIN_P91 Another text in this column
YON_P77 Another text in this column
YUN_P240

Some other text
sed: "inputfile", line 9: warning: newline appended
Some other text

Questions:

sycamorex or someone else:

1- How does the regex you used work? Could you explain it a little?
Code:

sed '/P[0-9]\{2,3\}[X-Z]/ s/\(P[0-9]*\)\([X-Z]\)/\1/g' filename
2- In what cases do we need to use "\", as in "\{2,3\}" or "\(P[0-9]*\)"?
3- What does "\1" mean in .../\1/g' inputfile?
Thanks again to all,

Best regards.
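
A possible explanation for the commands that changed nothing, sketched in Python. It assumes sed's basic syntax, in which an unescaped "{2}" and "?" are ordinary characters unless -r is given; the escaped pattern below emulates that reading, and the two lines are taken from the inputfile above.

```python
import re

lines = ["Text YAN_P83Y Another text in this column", "Text YEN_P123Z"]

# Without -r, sed reads "{2}" and "?" as literal characters; the escaped
# braces and question mark below emulate that reading in Python.
literal_reading = re.compile(r"P[0-9]\{2\}[0-9]\?[X-Z]")

# With -r they are repetition operators, as in Python.
extended_reading = re.compile(r"P[0-9]{2}[0-9]?[X-Z]")

for line in lines:
    print(bool(literal_reading.search(line)),    # address never matches
          bool(extended_reading.search(line)))   # address matches
```

Since the address never matches under the literal reading, the substitution is never attempted and the file is printed unchanged.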

chrism01 05-28-2009 12:29 AM

If you want to understand sed, start here: http://www.grymoire.com/Unix/Sed.html#uh-0

Sergei Steshenko 05-28-2009 01:27 AM

Quote:

Originally Posted by cgcamal (Post 3555050)
Sergei Steshenko (Thanks):
Code:

$ perl -wpe 's/(_P\d{2,3}.\n/$1\n/' inputfile # With this script I get the following error
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE _P\d{2,3}.\n/ at -e lin

Sorry, I was modifying somebody else's regular expression - this form:

Code:

perl -wpe 's/(_P\d{2,3}).\n/$1\n/' input_file
compiles OK and appears to work.
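
One caveat, sketched in Python as a stand-in for Perl: because this form requires the newline right after the letter, it only changes lines where the string is the last column. The helper name end_of_line_only is hypothetical; the lines keep their newline, as they do under perl -p.

```python
import re

def end_of_line_only(line):
    # Python rendering of s/(_P\d{2,3}).\n/$1\n/ on a newline-terminated line.
    return re.sub(r"(_P\d{2,3}).\n", r"\1\n", line, count=1)

print(repr(end_of_line_only("Text YEN_P123Z\n")))
print(repr(end_of_line_only("Text YAN_P83Y Another text in this column\n")))
```

The first line loses its "Z"; the second is returned unchanged because other text follows the string.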

sycamorex 05-28-2009 05:19 AM

Quote:

1- How does the regex you used work? Could you explain it a little?

Code:
sed '/P[0-9]\{2,3\}[X-Z]/ s/\(P[0-9]*\)\([X-Z]\)/\1/g' filename
I don't have much time as I'm at work typing on Windows XP:), but in a few words:
Apply the substitutions only for lines matching the following pattern:
Quote:

'/P[0-9]\{2,3\}[X-Z]/
'P' followed by 2 or 3 digits, followed in turn by either 'X', 'Y' or 'Z'.
The substitutions:

Quote:

s/\(P[0-9]*\)\([X-Z]\)/\1/g'
It's called backreferencing. Basically you create 'containers' surrounded by \( ......\)
1) The first container (\1) is:
Quote:

\(P[0-9]*\)
'P' followed by 0 or more digits.
2) The second container (\2) is:
Quote:

\([X-Z]\)
A letter: X, Y, or Z.

Then we replace what has been matched in both containers with the first container only: \1 (omitting whatever was matched in the second container). If you put \2 instead, it would leave only what was matched in the second container.

I hope it makes sense. If not you can read something about back referencing in sed.
Sorry, need to go back to work - big boss is watching.
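
The explanation above can be tried out with a short Python sketch. Python's re is only a stand-in for sed here, and the variable names are illustrative; the sed spelling of each pattern is given in the comments.

```python
import re

line = "Text YAN_P83Y Another text in this column"

address = re.compile(r"P[0-9]{2,3}[X-Z]")        # /P[0-9]\{2,3\}[X-Z]/
containers = re.compile(r"(P[0-9]*)([X-Z])")     # \(P[0-9]*\)\([X-Z]\)

if address.search(line):                          # substitute only on selected lines
    keep_first = containers.sub(r"\1", line)      # .../\1/g
    keep_second = containers.sub(r"\2", line)     # .../\2/g
    print(keep_first)    # Text YAN_P83 Another text in this column
    print(keep_second)   # Text YAN_Y Another text in this column
```

On the backslash question: in sed's basic syntax the grouping and interval operators are written \( \) and \{ \}, while with -r (and in Python) they are written bare. Note also that P[0-9]* allows zero digits, so on a selected line a bare "PX" elsewhere would lose its "X" as well.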

pixellany 05-28-2009 07:27 AM

sed has a marvelous ability to give you totally inscrutable code. The Grymoire site (linked earlier) saved me. It is missing a few nuances, so read the official manual as well.

cgcamal 05-28-2009 11:35 PM

Hi guys, many thanks to all. I've learned a lot from your different solutions, and the references you gave will no doubt be useful to study.

sycamorex, the "container" trick is 90% of what I needed to get the desired result using sed in the way I was originally trying.

Many thanks to all again.

best regards


All times are GMT -5.