LinuxQuestions.org - [SOLVED] mass substitutions

Quote:

Originally Posted by ntubski (Post 4883803)

More improvement is possible by using the right algorithm, ...
I can't quite figure out what trs does because all the comments and names in the source are Polish.

I did some tests and it looks like trs isn't using a fast algorithm. trs is faster than sed, but only by a constant factor. For both methods, doubling input quadruples runtime.

Since it appears that there is no good solution for this, I think I'll try writing one using MultiFast: Aho-Corasick C Library.

Code:

~/tmp/replacing$ for size in 10 100 1000 2000 ; do ./replace.sh sed $size ; done

0.01 real        0.00 user        -        sed (10) 

0.05 real        0.04 user        -        sed (100) 

2.93 real        2.86 user        -        sed (1000) 

10.67 real        10.57 user        -        sed (2000) 

~/tmp/replacing$ for size in 10 100 1000 2000 4000 8000 ; do ./replace.sh trs $size ; done

0.00 real        0.00 user        -        trs (10) 

0.01 real        0.00 user        -        trs (100) 

0.24 real        0.23 user        -        trs (1000) 

0.83 real        0.82 user        -        trs (2000) 

3.15 real        3.12 user        -        trs (4000) 

12.44 real        12.34 user        -        trs (8000)

Code:

#!/bin/bash

# replace.sh



method=$1

size=$2



TIMEFORMAT=$'%2R real\t%2U user\t-\t'"$method ($size) "



case "$method" in

    sed)

        time sed -r 's|(^.*) (.*)|s/\1\/\2/g|' "$size-replacements.txt" | \

            sed -f - "$size-unreplaced.txt" > /dev/null

        ;;



    trs)

        time trs -f "$size-replacements.txt" < "$size-unreplaced.txt" > /dev/null

        ;;



    *) echo "I don't know how to do $method" ; exit 1

        ;;

esac

Code:

#!/bin/sh



# Some test inputs: replace a bunch numbers to xxxxxxx. The number of

# replacement pairs is equal to the size of input, so a naive

# algorithm will be show quadratic running times.



makeit()

{

    seq -f'%09.0f xxxxxxxxx' $1 > "$1-replacements.txt"

    seq -f'%09.0f' $1 > "$1-unreplaced.txt"

}



rm *-replacements.txt *-unreplaced.txt



makeit 10

makeit 100

makeit 1000

makeit 2000

makeit 4000

makeit 8000