Shell script: substitute a file's content according to a map?

Chowroc · 11-02-2005, 08:02 PM

I want to replace some strings in a file with other strings according to a map. For example, the map is:

Code:

A    STR1
B    STR2
C    STR3
...

and the file is:

Code:

......<$A>....<$C>....
...<$C>....<$B>.....<$A>
............<$B>......

At the moment I can only use a shell script that fetches one line of the map at a time and does the substitution:

Code:

_map_num=`wc -l $_map | cut -d" " -f1`
i=1
while [ $i -le $_map_num ]; do
   line=`sed -n "$i p" $_map`
   # <do the substitution described by $line>
   i=`expr $i + 1`
done

But I think this is inefficient, because sed has to process the $_map file again for every line it fetches. Is there a way to process both files in a single pass using just awk? Or is there some other way to make this more efficient?

Thanks.
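One way to read both files in a single awk run is the common `NR == FNR` idiom: awk fills an array while it reads the first file (the map), then rewrites each line of the second. The following is only a sketch. It assumes the two-column, whitespace-separated map shown above, takes the second column as a literal replacement, and assumes the map is not empty. The function name `substitute_with_map` is made up for illustration.

```shell
#!/bin/sh
# Sketch: apply every "KEY VALUE" pair from a map file to a template in one awk pass.
# substitute_with_map is a hypothetical helper name.
substitute_with_map() {
    # $1 = map file ("KEY VALUE" per line), $2 = file containing <$KEY> tokens
    awk '
        NR == FNR {                 # still reading the first file: remember the map
            if (NF >= 2) map[$1] = $2
            next
        }
        {                           # second file: replace each <$KEY> literally
            for (key in map) {
                token = "<$" key ">"
                out = ""
                rest = $0
                while ((pos = index(rest, token)) > 0) {
                    out = out substr(rest, 1, pos - 1) map[key]
                    rest = substr(rest, pos + length(token))
                }
                $0 = out rest
            }
            print
        }
    ' "$1" "$2"
}
```

Using index() and substr() instead of gsub() means characters such as `$`, `&` or `/` in the keys and values are not treated specially.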

chrism01 · 11-02-2005, 10:09 PM

Here's a sed example that replaces every occurrence of the source string in all target files:

Code:

sed -i -e 's/Internalitem_code/internal_item_code/g' *.sql
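The one-liner above could be scaled to a whole map by turning the map into a sed script first and then running sed once over the file. A rough sketch, assuming the two-column map from the first post and values that contain no `/`, `&` or backslash; `map_to_sed` is a hypothetical name, and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: turn "KEY VALUE" lines into sed commands of the form s/<\$KEY>/VALUE/g
map_to_sed() {
    # $1 = map file; prints one sed substitution command per map line
    awk 'NF >= 2 { printf "s/<\\$%s>/%s/g\n", $1, $2 }' "$1"
}

# Usage: build the sed script once, then apply all substitutions in one sed run.
#   map_to_sed map.txt > map.sed
#   sed -f map.sed template.txt
```

This way the map is read once and the template is read once, however many entries the map has.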

HTH

Chowroc · 11-03-2005, 07:46 PM

Look at this code:

Code:

_map_num=`wc -l $_map_file | cut -d" " -f1`
_map=`cat $_map_file`

if [ -z $_result ]; then _result=`cat $_tpl`; fi

i=1
while [ $i -le $_map_num ]; do
     line=`echo "$_map" | sed -n "$i p"`

 # echo "$_map" | while read line; do

The "sed" version works, but I don't think it is efficient enough. The "read" version, however, has no effect.

What's wrong?
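A likely cause, assuming a shell such as bash or dash: each part of a pipeline runs in a subshell, so variables assigned inside `echo "$_map" | while read line` are lost when the loop ends. Redirecting a file into the loop keeps the loop in the current shell. A small sketch of the difference (the function names and the counter are illustrative only):

```shell
#!/bin/sh
# Sketch: why variables set inside "cmd | while read" can appear to have no effect.

count_in_pipeline() {
    n=0
    # The loop is on the right-hand side of a pipe; in bash and dash it runs
    # in a subshell, so the increments are gone once the loop finishes.
    printf 'a\nb\nc\n' | while read -r line; do n=$((n + 1)); done
    echo "$n"
}

count_with_redirect() {
    n=0
    tmp=$(mktemp)
    printf 'a\nb\nc\n' > "$tmp"
    # Reading through a redirection keeps the loop in the current shell,
    # so n is still set after "done".
    while read -r line; do n=$((n + 1)); done < "$tmp"
    rm -f "$tmp"
    echo "$n"
}
```

Applied to the script, that would mean ending the loop with `done < "$_map_file"` instead of piping into `while`.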

Chowroc · 11-03-2005, 08:02 PM

In more detail: I have written a script that generates SQL from a template and a string map. (I will use it to generate SQL automatically and to query a MySQL database continuously with "mysql" for testing in my project.) That way, the next time I test another database I don't need to change the code; I just supply a different template and map.

The template looks like this:

Code:

BEGIN;
SET INSERT_ID=SELECT MAX(UID) FROM ACCSTORE<$NUM> + 1;
# SET INSERT_ID=<$INSERT_ID>;
INSERT INTO `ACCSTORE<$NUM>` ( ACCOUNT , PASSWD , STATE , TYPE) VALUES( '<$ACCOUNT>' , '<$PASSWD>' , '0' , '0');
INSERT INTO `BASEINFO<$NUM>` ( ACCOUNT , ADDRESS , BIRTH , CREATETIME , EMAIL , IDCARD , MPHONE , MPTYPE , NICKNAME , PHONE , POSTNUM , SEX , SUPERPASSWD , TNAME , TOKENRING , UID) VALUES( '<$ACCOUNT>' , '' , '' , '' , '<$NAME>@<$DOMAIN>' , '' , '' , '0' , '' , '' , '' , '0' , '<$NAME>' , '' , '<$MD5>' , 'SELECT MAX(UID) FROM ACCSTORE<$NUM> + 1');
INSERT INTO `POINTBONUS<$NUM>` ( ACCOUNT , UID) VALUES( '<$ACCOUNT>' , 'SELECT MAX(UID) FROM ACCSTORE<$NUM> + 1');
COMMIT;

This database has 7 types of tables, and every type has 31 tables. A hash program determines which table to insert into.

And the map file looks like this:

Code:

NUM:            cmd(random_int 0 32 %04d)
ACCOUNT:        cmd(random_str "a b c d e f g h i j k l m n o p q r s t u v w x y z - _ @ 1 2 3 4 5 6 7 8 9 0")
PASSWD:         cmd(random_str "a b c d e f g h i j k l m n o p q r s t u v w x y z 1 2 3 4 5 6 7 8 9 0 - = _ + @ % , . ; A B C D E J G H I J K L M N O P Q R S T U V W X Y Z")
NAME:           user
DOMAIN:         example.com.cn
MD5:            cmd(random_str "a b c d e f g h i j k l m n o p q r s t u v w x y z 1 2 3 4 5 6 7 8 9 0 - = _ + @ % , . ;" | md5sum | cut -d" " -f1)
STR:            cmd(random_str "a b c d e f g h i j k l m n o p q r s t u v w x y z - _ @ 1 2 3 4 5 6 7 8 9 0 - = + % , . 周 鹏 中 文 ; 青 箬 笠 绿 蓑 衣 斜 风 细 雨 不 须 归 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z")
DATE:           cmd(date -d "`random_str 0 $RANDOM` days ago" -I)
NUSER:          new_user_name
NDOMAIN:        example.com
IDNUM:          cmd(expr $(random_int 0 1000000))

The script looks up the map and finds the strings to substitute. If an entry has the form cmd(.*), the script executes it as a shell command with "eval", and the result becomes the replacement string. The script looks like this:

Code:

#!/bin/sh

# This script uses a SQL template and a map for that template to generate SQL query statements for testing.
# The template contains strings of the form <$STR>, which are replaced by looking up the corresponding STR in the map file.
# This uses the principle of a code generator.

# Zhou Peng, Chowroc at atgame, 20051102

_tpl=""
_map_file="map.txt"
_map=""
_result=""

# The following function generates a random integer within the specified range
random_int()  {
    start=$1
    range=`expr $2 - $1`
    if [ $# -eq 3 ]; then 
        format=$3; 
    else
        format="%d"
    fi

    num=`echo "" | awk "{srand(); print int(rand()*$range)+$start; }"`
    printf "$format" $num 
}

# Use the following function to generate a random variable-length string
random_str()  {
    len="length()"
    if [ $# -eq 2 ]; then len=$2; fi
    echo $1 \
        | sed 's/ /\n/'g  \
        | while read L; do echo "$L $RANDOM"; done  \
        | sort -k2n  \
        | cut -d" " -f1  \
        | while read L; do echo -n $L; done  \
        | awk "{srand(); num=int(rand()*$len)+1; print substr(\$0,0,num)}"
        # | awk "{srand(); num=int(rand()*length())+1; print substr(\$0,0,num)}"
        # | awk '{srand(); num=int(rand()*length())+1; print substr($0,0,num)}'
    echo
}

usage()  {
    echo "usage: $0 -f template [-m map]"
}

while [ $# -ge 1 ]; do
    if [ "$1" = "-f" ]; then
        _tpl=$2
        # If $2 is empty, the wc and cat below will make the program stop and wait for standard input
        shift
    elif [ "$1" = "-m" ]; then
        _map_file=$2
        shift
    # elif [ $1 == "-" ]; then
    #   _result=``
    fi
    shift
done

# if [ $# -eq 0 ]; then
if [ -z "$_tpl" ] || [ -z "$_map_file" ]; then usage; exit 0; fi

if ! [ -f $_tpl ]; then
    echo "No such template file."
    exit 1
fi

if ! [ -f $_map_file ]; then
    echo "No such map file."
    exit 1
fi

_map_num=`wc -l $_map_file | cut -d" " -f1`
_map=`cat $_map_file`

if [ -z "$_result" ]; then _result=`cat "$_tpl"`; fi
# Should try reading from standard input to see whether that improves execution efficiency!

i=1
while [ $i -le $_map_num ]; do
    # line=`sed -n "$i p" $_map_file` 
    line=`echo "$_map" | sed -n "$i p"`
    # Read one line of the map each time

# echo "$_map" | while read line; do
# cat $_map_file | while read line; do
    # echo "$line"

    if echo "$line" | grep "cmd(.*)" >/dev/null 2>&1; then
        cmd=`echo $line | sed "s/^.*cmd(\(.*\))$/\1/g"` 
        src=`echo $line | awk -F: '{print $1}'`
        dst=`eval $cmd`
    else
        src=`echo $line | awk -F: '{print $1}'`
        dst=`echo $line | awk -F: '{print $2}' | sed 's/^ *//g'`
        # Here sed is used to strip the leading spaces. Is there a better way?
    fi
    # Depending on the case: if there is a cmd(.*), execute it as a shell command and use the returned result as the replacement string
    # echo "<\$$src> --> $dst"
    echo "<\$$src>"
    echo $dst
    # _result=`echo "$_result" | sed "s/<\$$src>/$dst/g"`
    _result=$(echo "$_result" | sed "s/<\$$src>/$dst/g")
    # Perform the substitution
    i=`expr $i + 1`
done

echo "$_result"

Any suggestions? As I said, the problem now is that "sed" is not efficient, but "read" has no effect.

Thanks

bigearsbilly · 11-04-2005, 03:20 AM

I like it

great idea.

I think maybe you are stretching the limits of shell programming here.

When you are starting to use lots of pipes to sed, awk, cut, grep
it starts getting very inefficient and messy.
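To illustrate the cost: the script above starts several processes (echo, grep, sed, awk) for every map line. Much of that parsing can be done with shell built-ins instead. A sketch, assuming the `KEY: value` and `KEY: cmd(...)` map format shown earlier; `parse_map_line` is a made-up name, and it still relies on `eval` as the original does, so it should only be given trusted map files.

```shell
#!/bin/sh
# Sketch: split a "KEY:   value" or "KEY:   cmd(shell command)" line using only
# POSIX parameter expansion. Sets the variables src and dst.
parse_map_line() {
    line=$1
    src=${line%%:*}                 # text before the first colon
    dst=${line#*:}                  # text after the first colon
    dst=${dst#"${dst%%[! ]*}"}      # strip leading spaces from the value
    case $dst in
        "cmd("*")")
            cmd=${dst#"cmd("}       # drop the leading "cmd("
            cmd=${cmd%")"}          # drop the final ")"
            dst=$(eval "$cmd")      # run it, as the original script does
            ;;
    esac
}

# Usage:
#   parse_map_line 'NAME:           user'
#   echo "$src -> $dst"
```

Only the `cmd(...)` entries still need a new process, for the command substitution itself.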

maybe you should consider using perl?

OR (not so good but did you know?)

You can use m4 for simple substitution.
maybe first pass substitutions second pass evals?

you can substitute with the m4 command line like so:

Code:

m4 -DA=STR1 -DB=STR2 -DC=STR3  template

or put the definitions in a file.

bigearsbilly · 11-04-2005, 03:26 AM

suggestion:

how about getting random words from a dictionary
( /usr/dict/words ? ) instead of just random letters?
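A sketch of that idea, assuming a word list with one word per line. The path differs between systems, so it is passed as an argument; `random_word` and its optional seed argument are hypothetical.

```shell
#!/bin/sh
# Sketch: pick one random line from a word list in a single awk pass.
random_word() {
    # $1 = word list file; optional $2 = seed, for repeatable runs
    awk -v seed="${2:-}" '
        BEGIN { if (seed != "") srand(seed); else srand() }
        rand() * NR < 1 { word = $0 }   # keep line NR with probability 1/NR
        END { if (NR > 0) print word }
    ' "$1"
}

# Usage (path is an example only):
#   random_word /usr/share/dict/words
```

Without a seed, awk's srand() uses the time of day, so two calls within the same second will normally return the same word.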

Chowroc · 11-06-2005, 05:09 AM

I tried getting random words from /usr/share/dict/linux.words before; it was also slow, for the same reason.

Now it generates about 1 SQL transaction per second (90 s to 2 min for 105 transactions). I tested "while read", and it only made the run about 10 to 20 seconds faster. If I use:
echo "$_result"
inside the "while read" loop, I get the right result on every iteration, but I don't know how to get it out of the loop efficiently!

I think another problem is that the _result variable has to be reassigned on every iteration, and this causes the low efficiency.

Maybe I will rewrite it in Python, which I'm learning. But before that, I want to test concurrency: if I run 20 copies of the script at the same time, what will the result be?

Thanks for your help. :-)
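A minimal way to try that from the shell is to start the copies in the background and wait for all of them. This is only a sketch: `run_parallel` is a hypothetical helper, and the script name and options in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: run N copies of a command at the same time and wait for all of them.
run_parallel() {
    # $1 = number of copies; remaining arguments = command to run
    copies=$1
    shift
    i=0
    while [ "$i" -lt "$copies" ]; do
        "$@" &                      # start one copy in the background
        i=$((i + 1))
    done
    wait                            # block until every copy has exited
}

# Usage (in bash, "time" can wrap the function call):
#   time run_parallel 20 ./gen_sql.sh -f template.sql -m map.txt > /dev/null
```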

bigearsbilly · 11-07-2005, 05:14 AM

Quote:

I tried getting random words from /usr/share/dict/linux.words before; it was also slow, for the same reason.

you could always try converting it to a dbm database if you have time, it's very simple and very quick.

Chowroc · 11-07-2005, 07:41 AM

Quote:

Originally posted by bigearsbilly
you could always try converting it to a dbm database if you have time, it's very simple and very quick.

Could you give me some details?

I have tried concurrency, and it had no effect, even when I wrote a simple C program that forks child processes to do the task!

In fact, I found that if I fork 5 child processes, the program waits about 5 seconds, and about 20 seconds for 20 children. The execution became uneven, and the total time was close to the earlier results.

I don't know the exact reason.

Thank you very much.

bigearsbilly · 11-08-2005, 03:56 AM

example? for /usr/dict/words into DBM database?

I have some dbm code somewhere, It could be changed to make a simple
random word generator. Is this what you mean?

Chowroc · 11-14-2005, 08:45 PM

Quote:

Originally posted by bigearsbilly
example? for /usr/dict/words into DBM database?

I have some dbm code somewhere, It could be changed to make a simple
random word generator. Is this what you mean?

Yes, that's what I mean. Thank you.

bigearsbilly · 11-15-2005, 04:08 AM

ok I will dig it out.
it's at home so laters.