LinuxQuestions.org

jdoginky 02-13-2017 02:40 PM

Text conversion help
 
Hi, new to the group, and first time poster.
I'm trying to convert "abc" to "def" in a file, but only when found in positions 3,4,5 and 32,33,34.

Example:
I want to convert:
3 abc eelee ref: OAK ARR: ONT abc 0236
to:
3 def eelee ref: OAK ARR: ONT def 0236

but not change:
3 lmn eelee abc: OAK ARR: ONT lmn 0400 abc

I thought sed would be my best option, but I'm having trouble figuring out how.

pan64 02-13-2017 03:08 PM

If position is important, awk is probably a better tool.
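
A minimal sketch of what a position-based awk approach could look like, using the positions from the first post (3-5 and 32-34). The function name `replace_at_positions` is made up for illustration, and the sketch assumes the old and new strings have the same length so that later positions do not shift.

```shell
# Hypothetical helper: replace OLD with NEW only when OLD sits at
# 1-based character positions 3-5 and/or 32-34 of a line on stdin.
# Assumes OLD and NEW are both exactly three characters long.
replace_at_positions() {
    awk -v old="$1" -v new="$2" '
    {
        # substr(s, start, len) counts characters from 1 in awk.
        if (substr($0, 3, 3) == old)
            $0 = substr($0, 1, 2) new substr($0, 6)
        if (substr($0, 32, 3) == old)
            $0 = substr($0, 1, 31) new substr($0, 35)
        print
    }'
}
```

Usage would be along the lines of `replace_at_positions abc def < "$source_file"`. Because the test is on character offsets rather than on fields, it does not matter whether neighbouring fields run together.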

szboardstretcher 02-13-2017 03:13 PM

You say '3-5' and '32-34', which are character positions. Those can be very difficult to nail down, especially if there are any length differences or spaces that shift your characters over by one or two.

Would it be better to look at it like 'column number'? Like for your example:

Code:

1 2   3     4    5   6    7   8   9
3 abc eelee ref: OAK ARR: ONT abc 0236


jdoginky 02-17-2017 01:01 PM

Sorry, I didn't provide a very good example. There are some columns (fields) that run together, which would rule out your suggestion.

I want to convert:
3 abc eelee ref: OAK ARR: ONT abc0236
to:
3 def eelee ref: OAK ARR: ONT def0236

but not change:
3 lmn eelee abc: OAK ARR: ONT lmn0400 abc

szboardstretcher 02-17-2017 01:13 PM

That shouldn't matter programmatically. For example, would this pseudo-code definition do what you expect?

Code:

if data in col 1,2,3 = 3,abc,eelee
then
change data in col2=def AND col8=def0236


rtmistler 02-17-2017 01:14 PM

Quote:

Originally Posted by jdoginky (Post 5669957)
Hi, new to the group, and first time poster.
I'm trying to convert "abc" to "def" in a file, but only when found in positions 3,4,5 and 32,33,34.

Example:
I want to convert:
3 abc eelee ref: OAK ARR: ONT abc 0236
to:
3 def eelee ref: OAK ARR: ONT def 0236

but not change:
3 lmn eelee abc: OAK ARR: ONT lmn 0400 abc

I thought sed would be my best option, but I'm having trouble figuring out how.

Hi,

While I tend to agree with pan64 that awk may be better, you should post your attempts with sed or awk to show what you have tried.

LQ members are happy to help you; however, they are volunteers, and they are also here to help you learn how to do it yourself. It's therefore best to see your earlier attempts, so members can see how you approached the problem and offer refinements. Please post some of your attempts and describe where the outcomes were incorrect, or what you wanted to do but could not because of inexperience with sed, awk, or some other tool.

grail 02-17-2017 02:39 PM

sed does this fairly easily :) Happy to show example once you show your attempts ;)

jdoginky 03-02-2017 09:09 AM

Thanks all. As I’ve looked more closely at this, I think I have a clear picture of what the original VB code was doing.

If char 3-5 is in the ALL_WIDGETS file
    If char 3-5 and 138-140 match
        convert both to alt_widget
    else
        convert char 3-5 to alt_widget
        replace char 138-140 with spaces

My problem (the placeholder test in the script below) is what logic to use to compare char 3-5 with 138-140 on each line, to determine whether to convert both occurrences or, if they differ, to convert 3-5 and blank out 138-140.
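
One way the per-line comparison could be expressed is with awk's `substr`, which reads fixed character positions directly. This is only a sketch under assumptions: the function name `convert_widgets` and the choice to pad the replacement to three characters are illustrative, the mapping file is assumed to hold `old,new` pairs like ALL_WIDGETS, and lines shorter than 140 characters only get positions 3-5 converted.

```shell
# Hypothetical helper: convert_widgets MAPFILE < input
# MAPFILE holds "old,new" pairs, one per line (same layout as ALL_WIDGETS).
convert_widgets() {
    awk -F, '
    NR == FNR { alt[$1] = $2; next }       # first file: load the old -> new map
    {
        head = substr($0, 3, 3)            # chars 3-5
        tail = substr($0, 138, 3)          # chars 138-140
        if (head in alt) {
            new = sprintf("%-3s", alt[head])   # pad to 3 chars to keep the width
            # Same widget in both places: convert both. Otherwise blank 138-140.
            repl = (tail == head) ? new : "   "
            if (length($0) >= 140)
                $0 = substr($0, 1, 137) repl substr($0, 141)
            $0 = substr($0, 1, 2) new substr($0, 6)
        }
        print
    }' "$1" -
}
```

It would be called as `convert_widgets ALL_WIDGETS < "$source_file" > "$source_file.tmp"`. The whole file is handled in a single pass, so there is no need to loop over the widgets and rewrite the file once per widget.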

> cat ALL_WIDGETS
abc,hij
def,klm
ghi,nop



> cat convert_widgets
for WIDGETs in `cat ALL_WIDGETS`
do
    WIDGET=`echo $WIDGETs | cut -d, -f1`
    grep -q "$WIDGET" "$source_file"
    if [ $? = 0 ]; then
        alt_WIDGET=`echo $WIDGETs | cut -d, -f2`

        ##### If char 3-5 and char 138-140 match, convert both
        ##### (the test below is a placeholder, not real shell; it is the part I can't work out)
        if [ char 138-140 = char 3-5 ]; then
            echo "Converting $WIDGET to $alt_WIDGET @ 3-5"
            sed -E "s/^(.{2})$WIDGET/\1$alt_WIDGET /" "$source_file" > "$source_file.tmp"
            mv "$source_file.tmp" "$source_file"

            echo "Converting $WIDGET to $alt_WIDGET @ 138-140"
            sed -E "s/^(.{137})$WIDGET/\1$alt_WIDGET /" "$source_file" > "$source_file.tmp"
            mv "$source_file.tmp" "$source_file"

        else #### Otherwise, convert 3-5, and replace 138-140 with spaces
            echo "Converting $WIDGET to $alt_WIDGET @ 3-5"
            sed -E "s/^(.{2})$WIDGET/\1$alt_WIDGET /" "$source_file" > "$source_file.tmp"
            mv "$source_file.tmp" "$source_file"
            # blank out 138-140 (three spaces, to keep the line width)
            echo "Blanking out $WIDGET @ 138-140"
            sed -E "s/^(.{137})$WIDGET/\1   /" "$source_file" > "$source_file.tmp"
            mv "$source_file.tmp" "$source_file"
        fi
    fi
done



Thanks for your help.

Turbocapitalist 03-02-2017 09:31 AM

The script will be more readable if you enclose it in [code] [/code] tags.

I would go with awk as suggested earlier, if the data is in space-delimited columns like this:

Code:

3 abc eelee ref: OAK ARR: ONT abc 0236
3 def eelee ref: OAK ARR: ONT def 0236
3 lmn eelee abc: OAK ARR: ONT lmn 0400

Then you can do the substitution in one line:

Code:

awk '$2 == "abc" && $2 == $8 { $2 = $8 = "def"; } { print; }' $source_file >> $temp_file;
You can even pass variables to awk.

Code:

awk --assign widget='abc' --assign altwidget='def' '$2 == widget && $2 == $8 { $2 = $8 = altwidget; } { print; }' $source_file >> $temp_file;

jdoginky 03-02-2017 10:20 AM

Thanks Turbocapitalist. Unfortunately, my file is not space-delimited; however, every occurrence of 'widget' that I want to convert is at char 3-5 and 138-140 on the lines where 'widget' is found.

grail 03-02-2017 10:31 AM

As the fields may run together, awk may be a little harder to use for the solution. Here is the sed I was thinking of (2 leading characters, 3 for the widget and 132 in between put the second match at 138-140):
Code:

sed -r 's/^(.{2})(abc)(.{132})\2(.*)$/\1def\3def\4/'
So you can simply loop over your ALL_WIDGETS file and use the sed on the source file as required, something like:
Code:

while IFS=, read -r current new
do
  sed -r -i "s/^(.{2})($current)(.{132})\2(.*)$/\1$new\3$new\4/" "$source_file"
done<ALL_WIDGETS

You might need to tweak it, but you get the idea. (The $ terminator needs no escaping here: inside double quotes the shell leaves a $ that is followed by / alone.)
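
The sed above only touches lines where both positions hold the same widget. A minimal sketch that also covers the "convert 3-5 and blank out 138-140" case from the earlier pseudo-logic might look like the following. The function name `convert_pair` is hypothetical, the widget is assumed to contain no regex metacharacters, and the replacement is padded to three characters on the assumption that the record width must not change.

```shell
# Hypothetical helper: convert_pair OLD NEW < input
# OLD is assumed to be three plain characters (no regex metacharacters).
convert_pair() {
    old=$1
    new=$(printf '%-3s' "$2")    # pad the replacement to three characters
    # 2 chars + 3 (widget) + 132 chars = 137 chars before position 138.
    # A bare "t" jumps to the end of the script once a substitution succeeds.
    sed -E \
        -e "s/^(.{2})${old}(.{132})${old}/\1${new}\2${new}/" -e t \
        -e "s/^(.{2})${old}(.{132}).{3}/\1${new}\2   /" -e t \
        -e "s/^(.{2})${old}/\1${new}/"
}
```

The first expression converts both positions when they match, the second converts 3-5 and blanks 138-140 when they differ, and the third handles lines too short to have a position 138. It could be dropped into the `while IFS=, read -r current new` loop as `convert_pair "$current" "$new" < "$source_file" > "$source_file.tmp"`.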

rtmistler 03-02-2017 10:40 AM

I'm not attempting to write a script for this. However, my tack when I do something like this in sed, or with emacs search and replace, is to find a unique property of what I want to change and use that as my search spec.

From all of your examples, you wish to change all [SPACE]abc[SPACE] to another pattern.

So code for that. Your contrary examples show either abc[COLON] or abc[other characters]

My point is that you bring up character position repeatedly, but I have yet to see an example showing that the simple search won't work. I daresay (sorry) that you can find or invent more examples to contradict that. But consider: (1) if you have more examples, you really should show them all now, not incrementally; (2) if you're inventing contrary examples because you wish to stick with this restriction, then I can't help you much, except to say that once you find <pattern> you can evaluate its position to determine whether it meets the further criteria for substitution.

My next point is about universal behavior. To wit: sed and emacs search and replace are universal in that I give them a search string and a replacement string. A highly specific edit requirement is fine, but if I'm fixing one thing once, I do that and move on. If I know I will be fixing something many times over, I write a reusable program or script, one that takes arguments and options rather than being coded to one particular string and range of columns.

jdoginky 03-02-2017 02:34 PM

rtmistler, here is a before-and-after sample of an actual file, showing what I am trying to achieve. The values will change, but the changes will always be made to columns 3-5 and 138-140 (if applicable).
(note: XX in the AFTER sample represents spaces)

BEFORE:
2UBBD 0008W16 01DEC1630DEC1602DEC16 02DEC16C BatchName 2015.2.4 1948000880
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
3 BBD77920101A01DEC1623DEC16 2345 AAA01500150+0100 ARN03450345+0100 71P TIC BBD7793 S F123VVFDAC 000881
3 BBD77920201A04DEC1618DEC16 2345 AAA23402340+0100 BRB00450045+0000 71P TIC BBD7793 S F123VVFDAC 000882
3 BBD77930101A01DEC1622DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC BRA7797 S F123VVFDAC 000883
3 BBD77930102A01DEC1622DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BBD7792 S F123VVFDAC 000884
3 BBD77930201A02DEC1616DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC BBD77921 S F123VVFDAC 000885
3 BBD77930202A02DEC1616DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BBD77922 S F123VVFDAC 000886
3 BBD77930301A05DEC1619DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BBD7792 S F123VVFDAC 000887
3 BBD77930401A23DEC1623DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC BRA7753 S F123VVFDAC 000888
3 BBD77930402A23DEC1623DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BBD0000 S F123VVFDAC 000889


AFTER:
2UBF 0008W16 01DEC1630DEC1602DEC16 02DEC16C BatchName 2015.2.4 1948000880
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
3 BF 77920101A01DEC1623DEC16 2345 AAA01500150+0100 ARN03450345+0100 71P TIC BF 7793 S F123VVFDAC 000881
3 BF 77920201A04DEC1618DEC16 2345 AAA23402340+0100 BRB00450045+0000 71P TIC BF 7793 S F123VVFDAC 000882
3 BF 77930101A01DEC1622DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC XX 7797 S F123VVFDAC 000883
3 BF 77930102A01DEC1622DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BF 7792 S F123VVFDAC 000884
3 BF 77930201A02DEC1616DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC BF 7795 S F123VVFDAC 000885
3 BF 77930202A02DEC1616DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BF 7798 S F123VVFDAC 000886
3 BF 77930301A05DEC1619DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BF 7792 S F123VVFDAC 000887
3 BF 77930401A23DEC1623DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC XX 7753 S F123VVFDAC 000888
3 BF 77930402A23DEC1623DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BF 0000 S F123VVFDAC 000889

rtmistler 03-02-2017 02:47 PM

In that example case you can globally replace "BBD" with "BF " and "BRA" with three space characters.

I would do that in two passes with sed.
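
A rough sketch of that two-pass idea, written here as two expressions in one sed invocation (piping one sed into another would be equivalent). The function name `two_pass` is made up for illustration. Note that this replaces every occurrence on every line and ignores character positions, so it is only safe if BBD and BRA never appear anywhere they should be left alone.

```shell
# Hypothetical helper: global, position-independent replacement on stdin.
two_pass() {
    sed -e 's/BBD/BF /g' \
        -e 's/BRA/   /g'     # BRA becomes three spaces
}
```

For example, `two_pass < "$source_file" > "$source_file.tmp"` would write the converted copy to a temporary file for inspection before replacing the original.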

Besides examples, how about a more complete summary of your requirements? For instance, I can see BBD in the header; perhaps that helps you determine which string to change later in the file. BRA appears only sparingly, so what would tell someone passing the filename and search strings to a script as arguments that they need to specify that string?


All times are GMT -5.