LinuxQuestions.org


schneidz 08-09-2012 01:37 PM

substitute part of a regex
 
[aix] hi, i would like to modify portions of a file that look like this, so that the month is incremented by 1:
Code:

DTP*434*RD8*20110714-20110716
DTP*434*RD8*20110714-20110718
DTP*435*DT*201106100600
DTP*435*DT*201107110500
DTP*472*D8*20110712
DTP*472*RD8*20110718-20110718
DTP*472*RD8*20110719-20110719
DTP*472*RD8*20110720-20110720
DTP*472*RD8*20110721-20110721
DTP*472*RD8*20110722-20110722
DTP*472*RD8*20110723-20110723
DTP*472*RD8*20110724-20110724
DTP*573*D8*20110803

i tried this in sed but no go
Code:

sed s/DTP.472.RD8.2011..../DTP*472*RD8*201108../ date.txt
DTP*472*RD8*201108..-20110724

thanks,
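The attempt above fails because the dots in the replacement half of s/// are not wildcards: sed copies them literally, which is why the output contains `201108..`. To keep part of what the regex matched, wrap that part in `\(...\)` and put it back with `\1`, `\2`. A minimal sketch that handles just one hard-coded month (July to August 2011) on the DTP*472*RD8 lines; `bump_july` is an illustrative name, not anything from the thread:

```shell
#!/bin/sh
# Turn July 2011 dates on DTP*472*RD8 lines into August 2011.
# /^DTP\*472\*RD8\*/ restricts the edit to those lines; the * are escaped
# because an unescaped * is a regex operator.
# \([*-]\) captures the separator and \([0-9][0-9]\) the day, so \1 and \2
# put them back around the new year and month; g handles both dates of a range.
bump_july() {
    sed '/^DTP\*472\*RD8\*/s/\([*-]\)201107\([0-9][0-9]\)/\1201108\2/g' "$@"
}
```

Usage: `bump_july date.txt` or `bump_july < date.txt`. Note that `\1201108` is the backreference `\1` followed by the literal text `201108`, since sed backreferences are single digits. Quoting the script also stops the shell from treating the `*` characters as filename globs, which the unquoted original was exposed to.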

linuxxer 08-09-2012 03:29 PM

It is difficult to understand; please give the expected output.

pixellany 08-09-2012 04:54 PM

If I understand correctly, you want to modify date codes---example:

OLD: 20110723
NEW: 20110823

SED cannot do arithmetic, so for a completely general solution you cannot do that directly with SED unless you plan to enter each substitution by hand**.

For a general solution, I would do something like this (pseudocode):
Code:

In a loop:
    read in a line
    use SED, Grep, Awk to extract the month into a variable
    increment the variable by 1
    insert the new value using SED
(return to the beginning of the loop)

**If you want to do each substitution by hand, you might as well just use a text editor


A less robust solution:
Code:

sed 's/201107/201108/g' oldfile > newfile
# repeat as required for other date codes, latest month first so no date is bumped twice
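The loop in pixellany's pseudocode can be sketched in bash without calling date at all, which matters on AIX, whose date lacks -d. This is a minimal sketch, assuming bash (for `[[ =~ ]]` and `BASH_REMATCH`); `bump_months` is a hypothetical name. It adds one month to every YYYYMM that directly follows a `*` or `-`, rolls December over to January of the next year, and copies the day unchanged:

```shell
#!/bin/bash
# Add one month to every YYYYMM that directly follows a '*' or '-' on each line.
# The day (and any HHMM suffix) is copied unchanged.
bump_months() {
    local re='([*-])([0-9]{4})([0-9]{2})'
    local line rest out match y m
    while IFS= read -r line || [[ -n $line ]]; do
        rest=$line out=
        while [[ $rest =~ $re ]]; do
            match=${BASH_REMATCH[0]}
            # 10# forces base 10, so months like 08 and 09 are not read as octal
            y=$((10#${BASH_REMATCH[2]}))
            m=$((10#${BASH_REMATCH[3]}))
            if (( m == 12 )); then
                y=$((y + 1)); m=1
            else
                m=$((m + 1))
            fi
            # Keep the text before the match, then the separator and the new YYYYMM
            out+=${rest%%"$match"*}${BASH_REMATCH[1]}$(printf '%04d%02d' "$y" "$m")
            # Continue scanning after the first occurrence of the match
            rest=${rest#*"$match"}
        done
        printf '%s\n' "$out$rest"
    done
}
```

Usage: `bump_months < date.txt > newdate.txt`. Because the day is copied as-is, 20110131 would become 20110231, so dates at the end of a month may need separate handling.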

danielbmartin 08-09-2012 04:57 PM

Brute force ...
Code:

sed  's/201112/201201/g' < $InFile  \
|sed 's/201111/201112/g' \
|sed 's/201110/201111/g' \
|sed 's/201109/201110/g' \
|sed 's/201108/201109/g' \
|sed 's/201107/201108/g' \
|sed 's/201106/201107/g' \
|sed 's/201105/201106/g' \
|sed 's/201104/201105/g' \
|sed 's/201103/201104/g' \
|sed 's/201102/201103/g' \
|sed 's/201101/201102/g' \
> $OutFile

Daniel B. Martin
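The order of that pipeline matters: each sed sees the output of the previous one, so working from December down to January means no date is bumped twice. A small sketch of what goes wrong with the opposite order (`descending` and `ascending` are illustrative names):

```shell
#!/bin/sh
# Highest month first: 201107 becomes 201108, and no later step touches it.
descending() { sed 's/201108/201109/g' | sed 's/201107/201108/g'; }
# Lowest month first: 201107 becomes 201108, which the next step bumps again.
ascending()  { sed 's/201107/201108/g' | sed 's/201108/201109/g'; }
```

With input 20110714, `descending` gives 20110814 while `ascending` gives 20110914.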

firstfire 08-09-2012 11:14 PM

Hi.

Here is another sed solution:
Code:

#!/bin/sed -rf
s/$/ 01020304050607080910111201/
:a
s/2011(..)(.*\1(..)(..)*)/%\3\2/g
ta
s/ [^ ]*$//
s/%/2011/g

In the first line of the script we append the lookup table, separated from the rest of the text by a space. :a is a label to jump back to later. In the third line we take the two characters after 2011, find them in the lookup table, and replace them with the next two characters of the table. I also replace the year with a % sign to mark dates that are already processed. If the last substitution was successful, ta jumps back to label a. Otherwise (all dates on the line are processed) we remove the lookup table and substitute 2011 for every %.

@Daniel: quite clever and simple! It would probably be faster to make all substitutions by single sed process than using pipes. Anyway it should be much faster than mine.

EDIT: I just realized that my script increments only the month, so 201112 becomes 201101 instead of 201201... This is a bug :)
To solve this, one may add
Code:

s/%01/201201/g
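as the penultimate line of the script, just before s/%/2011/g. This is safe because %01 can only come from a December date; a January date becomes %02.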
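For reference, the script with that December line folded in can be wrapped in a shell function and run inline. This sketch assumes GNU sed (for -r), and like the original it only rewrites dates in 2011; `bump_2011` is an illustrative name:

```shell
#!/bin/sh
# firstfire's lookup-table sed script plus the December fix, run inline.
bump_2011() {
    sed -r '
        # append the month lookup table, separated by a space
        s/$/ 01020304050607080910111201/
        :a
        # replace 2011MM with %NN, where NN follows MM in the table
        s/2011(..)(.*\1(..)(..)*)/%\3\2/g
        ta
        # drop the table, fix December rollovers, then restore the year
        s/ [^ ]*$//
        s/%01/201201/g
        s/%/2011/g
    ' "$@"
}
```

Usage: `bump_2011 date.txt > newdate.txt`.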

grail 08-10-2012 05:36 AM

How about:
Code:

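#!/bin/bash
# Requires GNU date: -d is a GNU extension (AIX's native date lacks it).

while read -r line
do
        IFS='*-' && set -- $line && unset IFS

        # An RD8 range has 5 fields; its end date ($5) is the last thing on the line
        if (( $# == 5 ))
        then
                line=${line/%$5/$(date -d"$5 +1 month" "+%Y%m%d")}
        fi
        # $4 is the first date; a DT value also carries HHMM, which is kept as-is
        if (( $# >= 4 ))
        then
                line=${line/$4/$(date -d"${4:0:8} +1 month" "+%Y%m%d")${4:8}}
        fi
        echo "$line"
done<date.txt >temp.txt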

if ! diff -q date.txt temp.txt &>/dev/null
then
    mv temp.txt date.txt
fi

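This also rolls the year over when required (201112 becomes 201201) without any special case :) Note that GNU date normalizes days that overflow, so 20110131 +1 month gives 20110303.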

firstfire 08-10-2012 06:01 AM

Yes, I was also thinking about using date:
Code:

$ sed -r 's/^/echo /; s/([*-])([0-9]{8})/\1$(date -d "\2 +1 month" "+%Y%m%d")/eg' infile
DTP*434*RD8*20110814-20110816
DTP*434*RD8*20110814-20110818
DTP*435*DT*201107100600
DTP*435*DT*201108110500
DTP*472*D8*20110812
DTP*472*RD8*20110818-20110818
DTP*472*RD8*20110819-20110819
DTP*472*RD8*20110820-20110820
DTP*472*RD8*20110821-20110821
DTP*472*RD8*20110822-20110822
DTP*472*RD8*20110823-20110823
DTP*472*RD8*20110824-20110824
DTP*573*D8*20110903

But this will work only with GNU sed, because the e (evaluate) modifier is used.

schneidz 08-10-2012 06:02 AM

hi, i ended up piping about 10 sed substitutions together (there was one date from 2010). i figured changing the month was easier than incrementing the day because i wouldn't have to worry about the 30th/31st (or 28th/29th) as much.

also, aix's version of date does not have the -d parameter.

thanks, i'm sure i'll have to revisit this in the future, so now i have a reference.

Reuti 08-10-2012 12:45 PM

NB: Instead of compiling the GNU coreutils on AIX yourself, you can check whether the admin has already installed them at a location like /opt/freeware/bin, since IBM offers them as part of the AIX Toolbox.
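Once the coreutils are installed, scripts still need to call the GNU date rather than AIX's own. A minimal sketch of one way to pick it up, assuming the toolbox copy lives at /opt/freeware/bin/date (check your installation); `find_gnu_date` is a hypothetical helper that keeps the first candidate whose -d handles a known date correctly:

```shell
#!/bin/sh
# Hypothetical helper: print the first date(1) that understands GNU-style -d,
# trying the assumed AIX Toolbox location before the date found on $PATH.
find_gnu_date() {
    for d in /opt/freeware/bin/date date; do
        # A GNU date turns "20110714 +1 month" into 20110814; errors are discarded.
        if [ "$("$d" -d '20110714 +1 month' +%Y%m%d 2>/dev/null)" = 20110814 ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}
```

Usage: `GDATE=$(find_gnu_date) || exit 1`, then `"$GDATE" -d "20110714 +1 month" +%Y%m%d` in place of plain `date` in scripts such as grail's.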

danielbmartin 08-10-2012 01:18 PM

Quote:

Originally Posted by firstfire (Post 4750520)
It would probably be faster to make all substitutions by single sed process than using pipes.

Yes, like so:
Code:

sed  's/201112/201201/g;
      s/201111/201112/g;
      s/201110/201111/g;
      s/201109/201110/g;
      s/201108/201109/g;
      s/201107/201108/g;
      s/201106/201107/g;
      s/201105/201106/g;
      s/201104/201105/g;
      s/201103/201104/g;
      s/201102/201103/g;
      s/201101/201102/g' < $InFile \
> $OutFile

Daniel B. Martin

schneidz 08-10-2012 01:21 PM

different question but why is this:
Code:

sed s/hello/world/ < file.txt
preferred over this:
Code:

sed s/hello/world/ file.txt
?

danielbmartin 08-10-2012 02:18 PM

Quote:

Originally Posted by schneidz (Post 4751078)
different question but why is this:
Code:

sed s/hello/world/ < file.txt
preferred over this:
Code:

sed s/hello/world/ file.txt
?

I prefer the former construct because it is more readable. Of course, readability is subjective. Others will disagree.

Daniel B. Martin

