sed : substitute without displacing columns

billywayne · 05-27-2010, 04:56 PM

Hello.

I am looking to use `sed' to substitute one string for another within a file.

My issue is that the new string is not always the same length as the old string. When this is the case, the other characters on the line are displaced.

For example, I have the following line.

Code:

  9 H      1    1    8    Y.YY       7  109.416000   6   65.783000        0

My goal is to replace the "Y.YY" (4 characters) with "10.01" (5 characters).

But if I simply use

Code:

s/Y.YY/10.01/

then all of the characters following the "10.01" will be moved to the right, which will cause an input error in the program into which I feed this input (the program is coded in FORTRAN and is inflexible as to where the input parameters are positioned).

How can I replace the Y.YY with 10.01 without causing the rest of the characters to be shifted?

Thanks!

BW

GazL · 05-27-2010, 05:10 PM

How about:

Code:

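s/Y\.YY /10.01/    # value stays left-aligned: absorb one trailing space

s/ Y\.YY/10.01/    # value stays right-aligned: absorb one leading space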

depending on exactly how you want it to line up.

billywayne · 05-27-2010, 05:18 PM

Well, manually changing the command would work, and I'm thankful for the suggestion.

However, this sed command is part of a larger shell script which will operate on an array of files for a sequence of values, say 1.00 to 10.50, or so. So sometimes Y.YY is being replaced by a value with the same number of characters, like 1.11, but sometimes with more characters, like 10.01.

I'm looking to develop a robust way of substituting the pattern no matter how many characters need to be substituted in. I perform this operation several dozens of times, so being able to make the substitution independent of character string length is highly valuable to me.

syg00 · 05-27-2010, 05:32 PM

Fix your Fortran parsing.
Else I'd reckon you're up for something like perl or awk to add some logic to figure the length of the substitute and resolve the correct offset to replace.
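
A minimal sketch of the awk route, assuming the columns immediately after the placeholder are blank and may be overwritten. `overwrite_field` is a hypothetical helper name, not something from the thread. It locates the placeholder with `index()` and writes the new value over the same columns, so nothing after the field moves.

```shell
#!/bin/bash
# overwrite_field (hypothetical helper): overwrite OLD with NEW in place,
# letting a longer NEW spill into the columns that follow OLD.
# Assumption: those following columns hold only blanks.
overwrite_field() {
    awk -v old="$1" -v new="$2" '
    {
        pos = index($0, old)            # 1-based offset of OLD, 0 if absent
        if (pos > 0) {
            width = (length(new) > length(old)) ? length(new) : length(old)
            # text before the field + NEW padded to width + rest of the line
            $0 = substr($0, 1, pos - 1) sprintf("%-" width "s", new) substr($0, pos + width)
        }
        print
    }'
}

line='  9 H      1    1    8    Y.YY       7  109.416000   6   65.783000        0'
printf '%s\n' "$line" | overwrite_field Y.YY 10.01
```

Because the helper works by offset rather than by pattern, it does not care how many blanks follow the placeholder, only that there are enough of them.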

pixellany · 05-27-2010, 05:35 PM

Normally, you would use tabs to accomplish something like this. If that's not possible, then you can first count the number of characters in the target string, and then use that to adjust the replacement.

so, maybe like this?? (pseudocode---not tested):

Code:

set maxcount to appropriate value**
set rmspace to appropriate value
while IFS= read -r line; do    # IFS= and -r keep leading spaces and backslashes intact
    count=$(printf '%s\n' "$line" | grep -o 'Y\.Y*' | wc -m)
    subtract count from maxcount to get # of spaces to be added
    create fillstring with the right number of spaces
    # quote "$line" so the shell does not collapse the column spacing
    printf '%s\n' "$line" | sed -r "s/Y\.Y* {$rmspace}/10.01$fillstring/"
done <filename >newfilename

**Get the appropriate value for maxcount by determining the maximum total of characters to be replaced and adjusting for the size of the new string to be added.
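
One way the counting idea might be made concrete, as a sketch rather than a tested solution. `replace_fixed` is a hypothetical helper. It assumes the placeholder is followed by enough spaces to absorb the extra characters and that the new value contains no sed metacharacters.

```shell
#!/bin/bash
# replace_fixed (hypothetical helper): swap OLD for NEW on stdin, consuming
# one trailing space per extra character so later columns stay put.
replace_fixed() {
    local old=$1 new=$2
    local extra=$(( ${#new} - ${#old} ))
    if (( extra < 0 )); then
        # NEW is shorter: pad it with trailing spaces up to OLD's width.
        new=$(printf '%-*s' "${#old}" "$new")
        extra=0
    fi
    local old_re=${old//./\\.}          # escape dots so they match literally
    sed -E "s/${old_re} {${extra}}/${new}/"
}

line='  9 H      1    1    8    Y.YY       7  109.416000   6   65.783000        0'
printf '%s\n' "$line" | replace_fixed Y.YY 10.01
```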

GazL · 05-27-2010, 05:59 PM

Quote:

Originally Posted by billywayne

Well, manually changing the command would work, and I'm thankful for the suggestion.

However, this sed command is part of a larger shell script which will operate on an array of files for a sequence of values, say 1.00 to 10.50, or so. So sometimes Y.YY is being replaced by a value with the same number of characters, like 1.11, but sometimes with more characters, like 10.01.

I'm looking to develop a robust way of substituting the pattern no matter how many characters need to be substituted in. I perform this operation several dozens of times, so being able to make the substitution independent of character string length is highly valuable to me.

The trick is to work on the whole field and pad the value you're substituting appropriately. So for a 5 character right aligned field:

Code:

sed -e "s/ Y\.YY/$(printf '%5s' "$value")/"

and for a left aligned field:

Code:

sed -e "s/Y\.YY /$(printf '%-5s' "$value")/"
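
As a quick check of the left-aligned variant, here is a small sketch run against the sample line from the first post. The loop and the variable names `line` and `new` are only for illustration.

```shell
#!/bin/bash
# Sample line from the first post; Y.YY is followed by at least one space.
line='  9 H      1    1    8    Y.YY       7  109.416000   6   65.783000        0'

for value in 1.11 10.01; do
    # Consume "Y.YY" plus one trailing space, emit a 5-wide left-aligned field.
    new=$(printf '%s\n' "$line" | sed -e "s/Y\.YY /$(printf '%-5s' "$value")/")
    printf '%s\n' "$new"
    # The two lengths should be identical for every value that fits the field.
    printf 'old=%d new=%d\n' "${#line}" "${#new}"
done
```

A value wider than the field (six characters or more here) would still shift the line, so the field width has to be chosen for the longest value expected.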

colucix · 05-27-2010, 06:07 PM

Following the previous suggestions, I would end up with something like this:

Code:

#!/bin/bash
number=10.01
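# Integer part of log10(number). Divide at bc -l's default precision first and
# truncate afterwards: with scale=0 set up front, l() itself is truncated and
# values such as 9.99 would come out as 1.
digits=$(echo "x = l($number)/l(10); scale = 0; x/1" | bc -l)
sed -i.bck "s/ \{$digits\}Y\.YY/$number/" file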

The integer part of the base-10 logarithm of the number counts the additional digits. This count is the number of spaces to match in front of the Y.YY string and replace together with it.
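
For values of at least 1 written without leading zeros, the same count can also be had from the string itself, without bc. The sketch below is one such variant. `extra_digits` is a hypothetical helper, and the substitution assumes enough spaces precede the placeholder.

```shell
#!/bin/bash
# extra_digits (hypothetical helper): integer digits beyond the first,
# i.e. floor(log10(n)) for n >= 1 written without leading zeros.
extra_digits() {
    local int=${1%%.*}          # part before the decimal point
    echo $(( ${#int} - 1 ))
}

line='  9 H      1    1    8    Y.YY       7  109.416000   6   65.783000        0'
number=10.01
digits=$(extra_digits "$number")

# Match that many spaces in front of Y.YY and replace them along with it.
new=$(printf '%s\n' "$line" | sed "s/ \{$digits\}Y\.YY/$number/")
printf '%s\n' "$new"
```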

billywayne · 05-27-2010, 06:38 PM

Very good input, you guys.

I've taken your ideas and mixed them up and threw in one of my own. Here's the result.

Your posts made me realize that what I really want to do isn't to replace one string of a given length with another string of, perhaps, a different length.

Actually, both strings need to be 11 characters in order for everything to be just right. Here's my first approximation:

Code:

#!/bin/bash

VALUE="10.01"     # the value I want to replace Y.YY
PATTERN="$( printf "%-11s" Y.YY )"    # Y.YY expressed as an 11 character string, the `-' left justifies it.
REPLACE="$( printf "%-11s" ${VALUE} )"  # the value of $VALUE expressed as a left justified 11 character string
YLINE="$( grep -n Y.YY *.z | cut -d ":" -f 1 )"  # the line on which Y.YY may be found

sed "${YLINE}s/${PATTERN}/${REPLACE}/" *.z

This works like a charm. I'm totally open to suggestions for improving it though.

Thanks again for all the input.

GazL · 05-27-2010, 06:54 PM

Obviously, I don't have full view of exactly what you're doing but the grep and $YLINE look unnecessary.
sed won't replace if it doesn't match and dropping the grep will save you an extra pass through the file.

billywayne · 05-27-2010, 07:00 PM

Quote:

Originally Posted by GazL

Obviously, I don't have full view of exactly what you're doing but the grep and $YLINE look unnecessary.
sed won't replace if it doesn't match and dropping the grep will save you an extra pass through the file.

Very true. It's something I started doing and now I can't even remember why.

billywayne · 05-27-2010, 07:12 PM

Quote:

Originally Posted by billywayne

Very true. It's something I started doing and now I can't even remember why.

Oh yeah. Now I remember.

Once I've replaced the Y.YY with the value (VALUE_1), I feed the input file into the main program.

The main program then produces a file exactly like the one I gave it, with certain other values updated, but not VALUE_1.

I then have to replace VALUE_1 with another value (VALUE_2).

It's likely that VALUE_1 may appear somewhere else in the input file, so a global sed substitution may produce results I don't want.

I keep track of where Y.YY was so that when it comes time to replace VALUE_1 with VALUE_2, I know exactly where to look for it.
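
A small sketch of that bookkeeping, using a made-up two-line input in which the value 1.11 already occurs on another line. The variable names and sample data are illustrative only.

```shell
#!/bin/bash
# Hypothetical input: 1.11 already occurs on line 1, the placeholder is on line 2.
input=$(printf '%s\n' 'A  1.11   5' 'B  Y.YY   7')

# Remember the placeholder's line number before overwriting it.
yline=$(printf '%s\n' "$input" | grep -n 'Y\.YY' | cut -d: -f1)

# First pass: placeholder -> VALUE_1.
step1=$(printf '%s\n' "$input" | sed "${yline}s/Y\.YY/1.11/")

# Later pass: VALUE_1 -> VALUE_2, restricted to the remembered line,
# so the 1.11 on line 1 is left alone.
step2=$(printf '%s\n' "$step1" | sed "${yline}s/1\.11/2.22/")
printf '%s\n' "$step2"
```

The line address in front of `s` is what limits the second substitution. Without it, sed would change the first line as well.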

GazL · 05-27-2010, 07:33 PM

Yikes. That sounds messy. Best of luck.

grail · 05-27-2010, 07:38 PM

Not to be a party pooper (or maybe I just misunderstand), but the following doesn't make sense:

Quote:

It's likely that VALUE_1 may appear somewhere else in the input file, so a global sed substitution may produce results I don't want.
I keep track of where Y.YY was so that when it comes time to replace VALUE_1 with VALUE_2, I know exactly where to look for it.

Reason being is that grep would return all lines with VALUE_1 (would it not??) and so there would be multiple line numbers in YLINE.
Or did I miss something?
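
The concern can be made concrete with a guard that refuses to proceed unless exactly one line matches. This is a sketch only, and `find_unique_line` is a hypothetical helper, not part of anyone's script in this thread.

```shell
#!/bin/bash
# find_unique_line (hypothetical helper): print the line number of the single
# line on stdin matching the given regex; fail if there are zero or several.
find_unique_line() {
    local matches
    matches=$(grep -n -- "$1" | cut -d: -f1)
    # A lone line number is all digits; a newline-separated list is not.
    [[ $matches =~ ^[0-9]+$ ]] || return 1
    printf '%s\n' "$matches"
}

# The placeholder is unique, so this prints its line number.
printf '%s\n' 'A  1.11   5' 'B  Y.YY   7' | find_unique_line 'Y\.YY'

# Once the value is written in, a search for the value itself is ambiguous.
printf '%s\n' 'A  1.11   5' 'B  1.11   7' | find_unique_line '1\.11' ||
    echo 'ambiguous or missing' >&2
```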

billywayne · 05-27-2010, 08:33 PM

Quote:

Originally Posted by grail

Not to be a party pooper (or maybe I just misunderstand), but the following doesn't make sense:

Reason being is that grep would return all lines with VALUE_1 (would it not??) and so there would be multiple line numbers in YLINE.
Or did I miss something?

I think GazL's suggestion was that the entire YLINE procedure could be omitted. Not to remove the grep from the YLINE, but to get rid of YLINE altogether.

I could indeed remove the YLINE stuff if my script were only replacing Y.YY with a value. I could tell sed to perform a global substitution and be done with it. But I need to know where Y.YY was originally in order to perform subsequent substitutions.

Something like this

Code:

YLINE=$( grep -n Y.YY file.z | cut -d ":" -f 1 )
SEQUENCE=$( seq -w 1.00 0.01 10.00 )

for VALUE in ${SEQUENCE} ; do
    sed -i.bak "${YLINE}s/Y.YY/${VALUE}/" file.z
    submit_to_program  # produces an output.z file
    sed -i.bak "${YLINE}s/${VALUE}/Y.YY/" output.z
    cp output.z file.z
done

Not exactly but close.

I have some additional steps (initializing and incrementing a counter, then cp output.z output${COUNTER}.z) that create temporary files so as not to clobber intermediate output.z files, but that's the essence of what I'm doing. I do this more times than I'd like to think of, so I'd like the script to be as general as possible, handling three-digit or four-digit numbers without my having to think about the length of the value being substituted. Going in and replacing Y.YY with YY.YY in all of its occurrences every time I needed to submit an input file would be repetitive and tedious. And isn't that what computers were made for, doing the repetitive tedious stuff so I don't have to? I guess I could just s/Y.YY/YY.YY/g on the script every time, but what's the fun in that?

I want to store the script in my local bin directory and call it whenever I need it without having to worry about it.

$VALUE may appear somewhere else in the file. Having $YLINE assures me that everything is happening on the correct line. Granted, I don't need it for the initial sed substitution, but I kind of like the feeling that sed isn't looking through the entire file when I can tell it exactly which line it's on.

See what I'm saying?
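
Putting the pieces from the thread together, the round trip might look like the sketch below. It assumes the 11-character left-justified field worked out earlier, and the names `FIELD_WIDTH`, `pad` and `set_field` are hypothetical. The real program run is left out, so the demo only swaps the value in and back out on a scratch file.

```shell
#!/bin/bash
FIELD_WIDTH=11   # assumption: the left-justified field width found earlier in the thread

# pad (hypothetical helper): left-justify $1 in FIELD_WIDTH columns.
pad() { printf '%-*s' "$FIELD_WIDTH" "$1"; }

# set_field (hypothetical helper): on line $1 of file $2, swap field $3 for $4.
set_field() {
    local old new
    old=$(pad "$3"); old=${old//./\\.}   # escape dots so they match literally
    new=$(pad "$4")
    sed -i.bak "${1}s/${old}/${new}/" "$2"
}

# Demo on a scratch copy of the sample line from the first post.
work=$(mktemp -d)
printf '%s\n' '  9 H      1    1    8    Y.YY       7  109.416000   6   65.783000        0' > "$work/file.z"
yline=$(grep -n 'Y\.YY' "$work/file.z" | cut -d: -f1)

set_field "$yline" "$work/file.z" Y.YY 10.01    # placeholder -> value
after_set=$(cat "$work/file.z")
set_field "$yline" "$work/file.z" 10.01 Y.YY    # value -> placeholder
after_reset=$(cat "$work/file.z")
printf '%s\n%s\n' "$after_set" "$after_reset"
```

Since both the old and the new text are padded to the same width, the substitution can never change the line length, whichever direction it runs in.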