Text conversion help
Hi, new to the group, and first time poster.
I'm trying to convert "abc" to "def" in a file, but only when it is found in positions 3-5 and 32-34. Example: I want to convert:
3 abc eelee ref: OAK ARR: ONT abc 0236
to:
3 def eelee ref: OAK ARR: ONT def 0236
but not change:
3 lmn eelee abc: OAK ARR: ONT lmn 0400 abc
I thought sed would be my best option, but I'm having a problem figuring out how. |
If position is important, awk is probably a better tool.
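A minimal sketch of the positional approach with awk, assuming the target really does sit at fixed character offsets 3-5 and 32-34 (the example lines in the question lost their original spacing when posted, so they will not line up as shown). The function name `replace_at_positions` and the `old`/`new` variables are placeholders for illustration only:

```shell
# Replace "abc" with "def" only when it occupies characters 3-5 or 32-34.
# Hypothetical helper; the strings and offsets are taken from the question.
replace_at_positions() {
    awk -v old="abc" -v new="def" '
    {
        # substr() is 1-based: substr($0, 3, 3) is characters 3-5
        if (substr($0, 3, 3) == old)
            $0 = substr($0, 1, 2) new substr($0, 6)
        if (substr($0, 32, 3) == old)
            $0 = substr($0, 1, 31) new substr($0, 35)
        print
    }' "$@"
}
```

It can be called as `replace_at_positions infile > outfile`; with no file argument it reads standard input. Occurrences of "abc" anywhere else on the line are left alone.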
|
You say '3-5' and '32-34' which are character numbers. That can be very difficult to nail down, especially if there are any length differences or spaces that push your characters over 1 or 2.
Would it be better to look at it by 'column number'? Like this, for your example:
Code:
1 2 3 4 5 6 7 8 9 |
Sorry, I didn't provide a very good example. There are some columns (fields) that run together, which would prevent your suggestion.
I want to convert:
3 abc eelee ref: OAK ARR: ONT abc0236
to:
3 def eelee ref: OAK ARR: ONT def0236
but not change:
3 lmn eelee abc: OAK ARR: ONT lmn0400 abc |
That shouldn't matter programmatically. For example, would this pseudo-code definition do what you expect?
Code:
if data in col 1,2,3 = 3,abc,eelee |
Quote:
While I tend to agree with pan64 that awk may be better, you should post your attempts with sed or awk to show what you have tried. LQ members are happy to help; however, they are volunteers, and the aim is also to help you learn how to do this yourself. It's therefore best to see your earlier attempts and how you approach a solution, so that members can offer refinements. Please post some of the attempts and describe where the outcomes were not correct, or what you wished to do but could not because of your inexperience with sed, awk, or some other tool. |
sed does this fairly easily :) Happy to show example once you show your attempts ;)
|
Thanks all. As I’ve looked more closely at this, I think I have a clear picture of what the original VB code was doing.
If char 3-5 is in the ALL_WIDGETS file
  If char 3-5 and 138-140 match
    convert both to alt_widget
  else
    convert char 3-5 to alt_widget
    replace char 138-140 with spaces

My problem (in red) is what logic to use to compare char 3-5 with 138-140 on each line to determine whether to convert both occurrences, or whether to convert 3-5 and blank out 138-140 if they differ.

> cat ALL_WIDGETS
abc,hij
def,klm
ghi,nop

> cat convert_widgets
for WIDGETs in `cat ALL_WIDGETS`
do
  WIDGET=`echo $WIDGETs | cut -d\, -f1`
  grep -q $WIDGET $source_file
  if [ $? = 0 ];then
    alt_WIDGET=`echo $WIDGETs | cut -d, -f2`
    ##### If char 3-5 and char 138-140 match, convert both
    if [ char 138-140 = char 3-5 ];then
      echo "Converting $WIDGET to $alt_WIDGET @ 3-5"
      sed -E "s/^(.{2})$WIDGET/\1$alt_WIDGET /" $source_file > $source_file.tmp
      mv $source_file.tmp $source_file
      echo "Converting $WIDGET to $alt_WIDGET @ 138-140"
      sed -E "s/^(.{137})$WIDGET/\1$alt_WIDGET /" $source_file > $source_file.tmp
      mv $source_file.tmp $source_file
    else
      #### Otherwise, convert 3-5, and replace 138-140 with spaces
      echo "Converting $WIDGET to $alt_WIDGET @ 3-5"
      sed -E "s/^(.{2})$WIDGET/\1$alt_WIDGET /" $source_file > $source_file.tmp
      mv $source_file.tmp $source_file
      #blank out 138-140
      echo "Blanking out $WIDGET @ 138-140"
      sed -E "s/^(.{137})$WIDGET/\1 /" $Dest/PLEG > $Dest/PLEG.tmp
      mv $source_file.tmp $source_file
    fi
  fi
done

Thanks for your help. |
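One way the per-line comparison described above could be sketched is to let awk look at both positions on the same line, instead of deciding once per file in the shell. This is only an illustration under stated assumptions: the function name `convert_widgets` is made up, the mapping file has one `old,new` pair per line as in ALL_WIDGETS, widgets are exactly 3 characters, and replacements are at most 3 characters (shorter ones are padded with spaces so later columns do not shift):

```shell
# Hypothetical helper: $1 = mapping file (old,new per line), $2 = data file.
# Prints the converted data to standard output.
convert_widgets() {
    awk -F, '
    # First file: load the mapping, padding each replacement to width 3
    NR == FNR { if ($1 != "") alt[$1] = sprintf("%-3s", $2); next }
    {
        key = substr($0, 3, 3)              # characters 3-5
        if (key in alt) {
            if (length($0) >= 140) {
                # characters 138-140: convert if equal to 3-5, else blank out
                second = (substr($0, 138, 3) == key) ? alt[key] : "   "
                $0 = substr($0, 1, 137) second substr($0, 141)
            }
            $0 = substr($0, 1, 2) alt[key] substr($0, 6)
        }
        print
    }' "$1" "$2"
}
```

A call such as `convert_widgets ALL_WIDGETS "$source_file" > "$source_file.tmp"` would process every line in one pass. Lines shorter than 140 characters only get the change at 3-5.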
The script will be more readable if you enclose it in [code] [/code] tags.
I would go with awk as suggested earlier, if the data is in space-delimited columns like this:
Code:
3 abc eelee ref: OAK ARR: ONT abc 0236
Code:
awk '$2 == "abc" && $2 == $8 { $2 = $8 = "def"; } { print; }' $source_file >> $temp_file;
Code:
awk --assign widget='abc' --assign altwidget='def' '$2 == widget && $2 == $8 { $2 = $8 = altwidget; } { print; }' $source_file >> $temp_file; |
Thanks Turbocapitalist. Unfortunately, my file is not space-delimited; however, every occurrence of 'widget' that I want to convert is at char 3-5 and 138-140 on the lines where 'widget' is found.
|
As the fields may be run together, awk may be a little harder to use for the solution. Here is the sed I was thinking of:
Code:
sed -r 's/^(.{2})(abc)(.{132})\2(.*)$/\1def\3def\4/'
Code:
while IFS=, read -r current new |
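As a rough sketch of how one widget pair might be handled with sed alone, covering both the "match" and the "differ" case: the function name `convert_pair` is hypothetical, the second field is assumed to start at character 138 (so 132 characters separate it from the first), and the widget strings are assumed to be plain alphanumerics with no regex metacharacters:

```shell
# Hypothetical helper: $1 = current widget, $2 = replacement, rest = files.
convert_pair() {
    cur=$1
    new=$(printf '%-3s' "$2")   # pad replacement to 3 chars to keep columns fixed
    shift 2
    # Rule 1: same widget at 3-5 and 138-140, convert both.
    # Rule 2: widget at 3-5, something else at 138-140, convert and blank out.
    # Rule 3: line too short to have 138-140, convert 3-5 only.
    # "t" skips the remaining rules once a substitution has been made.
    sed -E \
        -e "s/^(.{2})$cur(.{132})$cur/\1$new\2$new/;t" \
        -e "s/^(.{2})$cur(.{132}).{3}/\1$new\2   /;t" \
        -e "s/^(.{2})$cur/\1$new/" "$@"
}
```

For example, `convert_pair BBD BF "$source_file"` would write the converted lines to standard output; a loop over the pairs in ALL_WIDGETS could call it once per pair.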
I'm not attempting to write a script for this; however, my tack when I do something like this in sed, or just with emacs search and replace, is to find a unique property of what I wish to change and use that as my search spec.
From all of your examples, you wish to change all [SPACE]abc[SPACE] to another pattern. So code for that. Your contrary examples show either abc[COLON] or abc[other characters].
My point there is that you bring up character position repeatedly, yet I see no examples so far showing that the simple search won't work. I daresay (sorry) that you can find or invent more examples to contradict that. But consider: (1) if you have more examples, then you really should show all of them now, not incrementally; (2) if you're inventing contrary examples because you wish to stick with this adamant restriction, then I really can't help you much, except to say that once you find <pattern> you can then evaluate the position to determine if it meets the further criteria for substitution.
My next point is about universal behavior. To wit, my example of either sed or emacs search and replace: those are universal in that I have to give a search string and a replacement string. If you have a highly specific edit requirement, that is fine, but if I'm fixing one thing once, I do that and move on. If I know I will be fixing something many times over, then I will write re-usable code or a script to do so, allowing for arguments and options rather than coding to one very particular string and range of columns. |
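The idea of finding the pattern first and then evaluating its position could be tried out with a small awk helper that lists every place a string occurs. This is only a sketch; the name `report_positions` and the `line:column` output format are made up for illustration, and the pattern is treated as a literal string, not a regex:

```shell
# Hypothetical helper: $1 = literal string to look for, rest = files (or stdin).
# Prints "line:column" for every occurrence, with 1-based columns.
report_positions() {
    pat=$1
    shift
    awk -v pat="$pat" '
    {
        line = $0
        offset = 0
        while ((i = index(line, pat)) > 0) {
            printf "%d:%d\n", NR, offset + i
            offset += i                 # characters consumed so far
            line = substr(line, i + 1)  # continue just past this match start
        }
    }' "$@"
}
```

Running `report_positions BBD "$source_file"` and checking that the reported columns are always 3 and 138 would show whether a plain search is enough or the position test is really needed.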
rtmistler, here is a before and after sample of an actual file, and what I am trying to achieve. The values will change, but the changes will always be made to columns 3-5 and 138-140 (if applicable).
(note: the red XX represent spaces) BEFORE: 2UBBD 0008W16 01DEC1630DEC1602DEC16 02DEC16C BatchName 2015.2.4 1948000880 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 3 BBD77920101A01DEC1623DEC16 2345 AAA01500150+0100 ARN03450345+0100 71P TIC BBD7793 S F123VVFDAC 000881 3 BBD77920201A04DEC1618DEC16 2345 AAA23402340+0100 BRB00450045+0000 71P TIC BBD7793 S F123VVFDAC 000882 3 BBD77930101A01DEC1622DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC BRA7797 S F123VVFDAC 000883 3 BBD77930102A01DEC1622DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BBD7792 S F123VVFDAC 000884 3 BBD77930201A02DEC1616DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC BBD77921 S F123VVFDAC 000885 3 BBD77930202A02DEC1616DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BBD77922 S F123VVFDAC 000886 3 BBD77930301A05DEC1619DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BBD7792 S F123VVFDAC 000887 3 BBD77930401A23DEC1623DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC BRA7753 S F123VVFDAC 000888 3 BBD77930402A23DEC1623DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BBD0000 S F123VVFDAC 000889 AFTER: 2UBF 0008W16 01DEC1630DEC1602DEC16 02DEC16C BatchName 2015.2.4 1948000880 
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 3 BF 77920101A01DEC1623DEC16 2345 AAA01500150+0100 ARN03450345+0100 71P TIC BF 7793 S F123VVFDAC 000881 3 BF 77920201A04DEC1618DEC16 2345 AAA23402340+0100 BRB00450045+0000 71P TIC BF 7793 S F123VVFDAC 000882 3 BF 77930101A01DEC1622DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC XX 7797 S F123VVFDAC 000883 3 BF 77930102A01DEC1622DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BF 7792 S F123VVFDAC 000884 3 BF 77930201A02DEC1616DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC BF 7795 S F123VVFDAC 000885 3 BF 77930202A02DEC1616DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BF 7798 S F123VVFDAC 000886 3 BF 77930301A05DEC1619DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BF 7792 S F123VVFDAC 000887 3 BF 77930401A23DEC1623DEC16 2345 ARN18451845+0100 BRB20352035+0000 71P TIC XX 7753 S F123VVFDAC 000888 3 BF 77930402A23DEC1623DEC16 2345 BRB21452145+0000 AAA22552255+0100 71P TIC BF 0000 S F123VVFDAC 000889 |
In that example case you can globally replace "BBD" with "BF " and "BRA" with three space characters.
I would do that in two passes using sed. Besides examples, how about a more inclusive summary of your requirements? For instance, I can see BBD in the header, and perhaps that helps you determine what string to change later in the file. The BRA appears sparingly, so what would tell someone who is entering the filename and search strings into a script as arguments that they also need to specify that string? |
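For the sample shown, the global replacement suggested above might look like the sketch below. It is position-blind, so it assumes "BBD" and "BRA" never occur anywhere they should be kept; the function name `global_replace` is only for illustration:

```shell
# Hypothetical helper: replace every "BBD" with "BF " and every "BRA" with
# three spaces, wherever they occur on the line.
global_replace() {
    sed -e 's/BBD/BF /g' -e 's/BRA/   /g' "$@"
}
```

Both replacements are the same width as the text they replace, so the fixed column layout of the file is preserved.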