[SOLVED] replacing list of numbers from a file

Jykke · 06-24-2012, 02:41 AM

I have a rather long file of integer ids. Unless I do some processing first, they appear as comma-separated lists,
about 10 per line, for example:
1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010

Now I want to replace these ids one by one with new ones. I can put
the new ids into a separate list, for example:
1001, 2001
1002, 2002
etc.
The real numbers are not consecutive and not in any particular order, unlike in these examples.

The question is how I can perform the replacement in file 1, taking the new ids from file 2 one by one. There is a chance that an id may occur more than once in file 1, but it is highly unlikely, so it could be limited to replacing the first occurrence.

How should I proceed? Can I pipe it through sed somehow, or should I write a short program? I can still change the format of file 2 to, for example:
/1001/,/2001/
Maybe I could feed this to sed as a variable, but at the moment my brain fails me, so help would be appreciated.

pixellany · 06-24-2012, 03:14 AM

It is not clear to me exactly what you need to do. Please post a sample of file 1, a sample of file 2, and a sample of what the final result should look like.

Nominal Animal · 06-24-2012, 03:30 AM

If the file format is not too sensitive, you could get by with a pretty simple awk script.

Code:

#!/usr/bin/awk -f
BEGIN {
    # Accept any newline convention; ignore leading and trailing whitespace.
    # (A regular-expression RS works in gawk and mawk; POSIX leaves it unspecified.)
    RS = "[\t\v\f ]*(\r\n|\n\r|\r|\n)[\t\v\f ]*"

    # Fields are separated by commas, with optional whitespace.
    FS = "[\t\v\f ]*,[\t\v\f ]*"

    # For output, use newlines and commas only.
    ORS = "\n"
    OFS = ","

    # First file specifies replacements.
    file = 0
}

# Increase file number when the first record is seen.
(FNR == 1) { file++ }

# Record replacements only from the first file.
(file == 1 && NF >= 2) { replace[$1] = $2 }

# All other files:
(file > 1 && NF > 0) {
    for (i = 1; i <= NF; i++) {
        value = $i
        if (value in replace)
            value = replace[value]
        if (i < NF)
            printf("%s%s", value, OFS)
        else
            printf("%s%s", value, ORS)
    }
}

Pass two or more input files to the script. The first one contains the replacements: old values in the first column, and replacement values in the second column.

The rest of the input files will be split at commas (with whitespace around each comma removed). A value is replaced only if the entire field matches an old value. On output, the script uses commas (OFS) between fields, and a newline (ORS) to end each line.
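The whole-field rule is what separates this from a plain sed substitution. Here is a minimal sketch of the difference, using a made-up input line and a single hard-coded replacement (both chosen only for illustration, not taken from the real files):

```shell
#!/bin/sh
# Made-up sample line: the old id 1001 also appears inside two longer ids.
line='1001, 11001, 10010'

# Substring replacement: sed rewrites every place the digits 1001 occur.
naive=$(printf '%s\n' "$line" | sed 's/1001/2001/g')

# Whole-field comparison: only a field that is exactly 1001 is replaced.
exact=$(printf '%s\n' "$line" | awk -F '[ \t]*,[ \t]*' -v OFS=',' '
    {
        for (i = 1; i <= NF; i++)
            if ($i == "1001") $i = "2001"
        $1 = $1          # rebuild the record so OFS is used throughout
        print
    }')

printf 'naive: %s\nexact: %s\n' "$naive" "$exact"
```

The sed version also changes 11001 and 10010, while the awk version leaves them alone.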

Given a first input file containing

Code:

1001, 2001
1002, 2002

and a second input file containing

Code:

1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010

the script will output

Code:

2001,2002,1003,1004,1005,1006,1007,1008,1009,1010
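If you want to try the example without saving the script to a file first, something like the following should reproduce it. This is a compact variant, not the script above: it relies on awk's default newline record separator and just strips an optional trailing carriage return. The file names map.txt and ids.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical file names, created in a scratch directory.
workdir=$(mktemp -d)
printf '1001, 2001\n1002, 2002\n' > "$workdir/map.txt"
printf '1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010\n' > "$workdir/ids.txt"

result=$(awk -F '[ \t]*,[ \t]*' -v OFS=',' '
    # Drop a trailing carriage return and any leading or trailing blanks.
    { sub(/\r$/, ""); gsub(/^[ \t]+|[ \t]+$/, "") }

    # First file (assumed non-empty): remember old id -> new id.
    NR == FNR { if (NF >= 2) map[$1] = $2; next }

    # Remaining files: replace whole fields that are known old ids.
    NF > 0 {
        for (i = 1; i <= NF; i++)
            if ($i in map) $i = map[$i]
        $1 = $1          # rebuild the record so OFS is used throughout
        print
    }
' "$workdir/map.txt" "$workdir/ids.txt")

echo "$result"
rm -rf "$workdir"
```

As with the script above, the replacement file goes first on the command line, followed by the file or files to rewrite.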

Jykke · 06-24-2012, 04:38 AM

Thanks, that is exactly what I need. Well, I have to try it out first, but according to your output it
seems to do the job!

Thanks!