LinuxQuestions.org
Old 04-14-2011, 01:04 PM   #1
corone
Member
 
Registered: Jul 2009
Posts: 46

Rep: Reputation: 1
[SOLVED] How to parse and modify these keywords using shell script?


Hello,

Code:
system information

model = xxx
specs = yyy
mode = zzz

model = iii
specs = jjj
mode = kkk

system information

model = aaa
specs = bbb
mode = ccc

model = ddd
specs = eee
mode = fff
There is a file in this format that holds each model's information.
I don't think it is a good format, but I cannot change it.

I need to change the model name from 'model = xxx' to 'model = abc'.
So I tried the following.
Code:
sed -i "/system information/,/model = /s/model = .\+/model = abc/" filename
But this command changed not only 'model = xxx' but also 'model = aaa' to 'model = abc'.

I also don't know how to find and modify 'model = iii' and 'model = ddd'.

The only clue to parse 'model = ddd' is the second 'model = ' after the second 'system information'. But how to parse the second keyword?
Is it possible with 'sed'?

I sometimes have to modify the information in the file,
using a shell script if possible.
Python is OK, but a shell script is better for me.

Thank you.

Last edited by corone; 04-26-2011 at 09:09 AM.
 
Old 04-14-2011, 07:34 PM   #2
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Quote:
Originally Posted by corone
I don't think that's good format, but I cannot change that format.

I needed to modify the model name, 'model = xxx' as 'model = abc'.
If that's all, then
Code:
sed -e 's|^\([\t ]*model[\t ]*=[\t ]*\)xxx[\t ]*$|\1abc|' -i file
should work (I added the patterns for optional horizontal whitespace), but you'll have quite a hard time getting the xxx and abc strings embedded correctly in the expression. You could use awk instead:
Code:
awk -v "old=xxx" -v "new=abc" '($2 == "=" && tolower($1) == "model" && tolower($3) == tolower(old)) { $3 = new } { print $0 }' infile
With awk, you'll have to redirect the output to a new file, and replace the original file if successful, though. The tolower()s make the comparisons case insensitive. Put that into a shell script, with proper temporary file handling, usage, and so on:
Code:
#!/bin/bash
if [ $# -ne 3 ]; then
    echo "" >&2
    echo "Usage: $0 [ -h | --help ]" >&2
    echo "       $0 old-model new-model datafile" >&2
    echo "" >&2

    [ $# -eq 0 ] && exit 0
    [ "$1" == "-h" ] && exit 0
    [ "$1" == "--help" ] && exit 0
    exit 1
fi

# Create a safe autodeleted temporary directory.
WORK="`mktemp -d`" || exit $?
trap "rm -rf '$WORK'" EXIT

# Run the awk command.
awk -v "oldmodel=$1" -v "newmodel=$2" '
    BEGIN {
        # Set record separator to any newline convention.
        RS = "(\r\n|\n\r|\r|\n)"

        # Old model is case insensitive.
        oldmodel = tolower(oldmodel)
    }

    ($2=="=" && tolower($1)=="model" && tolower($3) == oldmodel) {
        $3 = newmodel
    }

    {   print $0
    }

' "$3" > "$WORK/file" || exit $?

# Success. Clone the original file mode, if possible.
chmod --reference="$3" "$WORK/file" &>/dev/null

# Replace the original file.
mv -f "$WORK/file" "$3" || exit $?

# Done.
exit 0
However, the above awk expression will not handle values with whitespace in them. It also ignores the section header you might have in your file.
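To make the whitespace limitation concrete, here is a minimal sketch using a made-up input line (the value "big iron" is only an illustration). With default field splitting, $3 holds just the first word of the value, so a comparison against the full old value cannot succeed. Splitting on the = sign instead keeps the whole value in one field:

```shell
# Hypothetical sample line whose value contains a space.
line='model = big iron'

# Default field splitting: $3 is only the first word of the value.
third=$(printf '%s\n' "$line" | awk '{ print $3 }')
echo "$third"

# Splitting on "=" and the blanks around it keeps the whole value in $2.
whole=$(printf '%s\n' "$line" | awk -F '[ \t]*=[ \t]*' '{ print $2 }')
echo "$whole"
```

The first command prints only "big"; the second prints "big iron".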

If you need to limit the replacement to under a specific header, and/or your value strings may contain whitespace, I'd use a bit more complex awk part in the bash + awk script:
Code:
#!/bin/bash
if [ $# -ne 4 ]; then
    echo "" >&2
    echo "Usage: $0 header oldmodel newmodel file" >&2
    echo "" >&2
    [ $# -eq 0 ] && exit 0
    [ "$1" == "-h" ] && exit 0
    [ "$1" == "--help" ] && exit 0
    exit 1
fi
HEADER="$1"
OLD="$2"
NEW="$3"
FILE="$4"

# Create an automatically removed temporary directory.
WORK="`mktemp -d`" || exit $?
trap "rm -rf '$WORK'" EXIT

# Run awk with the necessary parameters, saving the output to a temporary file.
awk -v "header=$HEADER" -v "old=$OLD" -v "new=$NEW" '
    BEGIN {
        # Record separator is a newline, in any convention.
        RS="(\r\n|\n\r|\r|\n)"

        # Field separator is =, including any whitespace around it.
        FS="[\t\v\f ]*=[\t\v\f ]*"

        # header and old are case insensitive; convert to lower case.
        header = tolower(header)
        old = tolower(old)

        # Not within the correct header section.
        active=0
    }

    (NF == 1) {
        # Trim out whitespace and comments from the header string.
        value = tolower($0)
        gsub(/[\t\n\v\f\r ]+/, " ", value)
        sub(/^ +/, "", value)
        sub(/[#;].*$/, "", value)
        sub(/ +$/, "", value)

        # Set active nonzero if this is the correct header.
        active = (value == header)
    }

    (NF >= 2 && active && $1 ~ /^[\t\n\v\f\r ]*[Mm][Oo][Dd][Ee][Ll]$/) {

        # Trim out whitespace from the model string.
        value = tolower($2)
        gsub(/[\t\n\v\f\r ]+/, " ", value)
        sub(/ +$/, "", value)
        sub(/^ +/, "", value)

        # If matches, replace the old value but retain whitespace.
        if (value == old)
            $0 = gensub(/(=[\t\v\f ]*).*/, "\\1" new, 1, $0)
    }

    {   print $0 }

    ' "$FILE" > "$WORK/file" || exit $?

# Copy the access mode from the original file.
chmod --reference="$FILE" "$WORK/file" &>/dev/null

# Replace the original with the temporary file.
mv -f "$WORK/file" "$FILE" || exit $?

# All done.
exit 0
The latter one also tries very hard to keep whitespace intact in the file. Neither of these is the best solution in any way, but they should help you in writing your own.

Note that the way I redefine RS in the awk scripts means that they accept the input file using any newline convention, converting them to standard Unix newlines (\n).
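As a small illustration of that conversion, the sketch below feeds awk a made-up input that mixes DOS, old Mac and Unix line endings. It assumes an awk that accepts a regular expression in RS (gawk and mawk do); the sample data is invented for the example.

```shell
# Hypothetical input mixing DOS (\r\n), old Mac (\r) and Unix (\n) line endings.
# Assumes an awk that accepts a regular expression in RS (gawk and mawk do).
normalized=$(printf 'model = a\r\nspecs = b\rmode = c\n' |
    awk 'BEGIN { RS = "(\r\n|\n\r|\r|\n)" } { print $0 }')

# Each record was printed with the default ORS, a plain \n.
printf '%s\n' "$normalized"
```

Because print uses the default ORS, every line in the output ends with a plain \n, whatever the input used.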

Hope this helps.
 
Old 04-14-2011, 08:50 PM   #3
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,635

Rep: Reputation: 1958
I must have missed something here?? Is there a reason something simple like:
Code:
sed -i '/model/s/xxx/abc/' file
won't work?

Based on the description I presume that the model name (i.e. xxx) is unique, i.e. no two models will have xxx as a name?

Also, @ Nominal, just a query, as I have seen you use it before, what is the difference between:
Code:
RS="(\r\n|\n\r|\r|\n)"

# and

RS="[\n\r]*" # or maybe a '+' if we assume we want all not including any after the last line
Sorry for small hijack ... just curious
 
Old 04-14-2011, 08:55 PM   #4
kurumi
Member
 
Registered: Apr 2010
Posts: 223

Rep: Reputation: 45
Code:
$ ruby -ane '$F[-1]="abc" if /^model.*=.*xxx$/;puts $F.join("\s")' file
 
Old 04-14-2011, 09:02 PM   #5
kurumi
Member
 
Registered: Apr 2010
Posts: 223

Rep: Reputation: 45
Quote:
Originally Posted by grail
Also, @ Nominal, just a query, as I have seen you use it before, what is the difference between:
Code:
RS="(\r\n|\n\r|\r|\n)"

# and

RS="[\n\r]*" # or maybe a '+' if we assume we want all not including any after the last line
Sorry for small hijack ... just curious
Your version, RS="[\n\r]*" (or +), behaves much like RS="" on this input, while Nominal's version treats every single newline as a separator, so empty lines become empty records. A simple test shows this:
Code:
$ cat file
1

2



3

4

$ awk 'BEGIN{ RS=""}{print "->"$0}' file
->1
->2
->3
->4
$ awk 'BEGIN{ RS="(\r\n|\n\r|\r|\n)"}{print "->"$0}' file
->1
->
->2
->
->
->
->3
->
->4

$ awk 'BEGIN{ RS="[\n\r]+"}{print "->"$0}' file
->1
->2
->3
->4
I believe awk is able to take care of universal record separator so there's actually no need to set RS to RS="(\r\n|\n\r|\r|\n)".

Last edited by kurumi; 04-14-2011 at 09:04 PM.
 
Old 04-15-2011, 12:01 AM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,635

Rep: Reputation: 1958
@kurumi - thanks for the explanation. It makes sense, but I believe the idea was to catch the scenario where the file was written on DOS:
Quote:
Originally Posted by Nominal Animal
Note that the way I redefine RS in the awk scripts mean that it accepts the input file using any newline convention, converting them to standard Unix newlines (\n).
 
Old 04-15-2011, 12:57 AM   #7
corone
Member
 
Registered: Jul 2009
Posts: 46

Original Poster
Rep: Reputation: 1
Thank you for your answers.

I am so happy that you gave me many answers.
Thank you.
But I should have told you more.

The above information is just an example.
Code:
system information

model = 
specs = 
mode = 

model = 
specs = 
mode = 

system information

model = 
specs = 
mode = 

model = 
specs = 
mode =
I don't know the previous data at all.
I just know the format.

So the script cannot rely on the existing values, such as 'xxx' or 'yyy', in the file.
That's why I mentioned 'the second model' and 'the second system information'.

Please, give me more help.
Thank you.

Last edited by corone; 04-15-2011 at 01:02 AM.
 
Old 04-15-2011, 02:15 AM   #8
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Quote:
Originally Posted by grail
Also, @ Nominal, just a query, as I have seen you use it before, what is the difference between:
RS="(\r\n|\n\r|\r|\n)" and RS="[\n\r]*"
RS="(\r\n|\n\r|\r|\n)" matches all newline conventions, and counts empty records, whereas RS="[\n\r]*" does not count empty records.

Quote:
Originally Posted by kurumi
I believe awk is able to take care of universal record separator so there's actually no need to set RS to RS="(\r\n|\n\r|\r|\n)".
Unfortunately, GNU awk 3.1.6 at least does not consider \r by itself a newline; the default RS seems to be equivalent to RS="\r?\n". Mac OS prior to X used \r as a newline, and I still sometimes run into such files.
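A quick way to check how a particular awk treats a bare \r is to count records. This is only a sketch with made-up input; the regex RS assumes an awk that accepts a regular expression there (gawk and mawk do).

```shell
# Hypothetical three-line input using bare \r (old Mac OS) line endings.
# With the default RS, awk finds no \n and reads everything as one record.
default_count=$(printf 'one\rtwo\rthree\r' | awk 'END { print NR }')

# With \r accepted as a separator, the three lines become three records.
regex_count=$(printf 'one\rtwo\rthree\r' |
    awk 'BEGIN { RS = "(\r\n|\n\r|\r|\n)" } END { print NR }')

echo "default RS: $default_count record(s), regex RS: $regex_count record(s)"
```

With the default RS the whole input should come out as one record, and with the regex RS as three.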
_ _ _

As you might guess, I thought a bit further about processing configuration files, and reread the relevant sections in the GNU Awk User Manual. I realized there is a very simple way to set up RS and FS, and use RT and OFS to handle all this stuff.

Here is my skeleton awk script for parsing name=value type configuration files. It automatically retains all whitespace, including newline conventions, even when replacing the value. Furthermore, it fully supports shell-type comments, as long as they either start at the beginning of the line, or are preceded by whitespace. The comments are not shown in $0, and your rules need not worry about them at all.
Code:
BEGIN {
    RS = "[\t\n\v\f\r ]*[\r\n]+[\t\n\v\f\r ]*"
    FS = "[\t\v\f ]*=[\t\v\f ]*"
}

# Retain newline convention and whitespace.
{
    ORS = RT
    if (match($0, /[\t\v\f ]*=[\t\v\f ]*/, recs))
        OFS = recs[0]
    else
        OFS = " = "
}

# Handle comment lines.
($1 ~ /^[\t\v\f ]*#/) {
    print $0
    next
}
($0 ~ /[\t\v\f ]#/) {
    rec = $0
    sub(/[\t\v\f ]#.*$/, "", $0)
    ORS = substr(rec, 1 + length($0)) RT
}

# Do your normal processing here.
# If (NF==2), name is in $1 and value in $2.
# If (NF==1) you have a header record.
# You can replace $1 or $2, and whitespace (and comment)
# will still be retained intact.
# If you want to delete the record, use
#     print "" ; next

# This here line prints the current record to output.
{ print $0 }
 
Old 04-15-2011, 02:25 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,635

Rep: Reputation: 1958
Well, whether you use Nominal's code (looks cool btw), sed or ruby, you are going to need to know something about what you need changed, otherwise everywhere that 'model' appears the corresponding name will be changed.
I am not saying you need to know the exact name, but you would have to know at least which section and which entry, i.e. the first system information section and the second model entry.

So I think you will need to know your data a little better before we can truly help you further.
 
Old 04-15-2011, 02:44 AM   #10
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Quote:
Originally Posted by corone
So the script should not parse the value, 'xxx' or 'yyy' from the file.
That's why I mentioned 'the second model' and 'the second system information'.
Here's a bash + awk script which takes four parameters: the section number, model number, new model value, and the file name. The script retains whitespaces and shell-type comments (beginning with #). Any line which does not have a = and is not empty or a comment is a header line, and starts a new section. The part before the first header line in the file is section 0. Run the script without parameters to see the usage.
Code:
#!/bin/bash

usage () {
    exec >&2
    echo ""
    echo "Usage: $0 [ -h | --help]"
    echo "       $0 section model value file"
    echo ""
    echo "This script replaces the model'th model line"
    echo "under the section'th header in file 'file' with 'value'."
    echo ""
    exit $1
}

[ $# -eq 0 ] && usage 0
[ "$1" == "-h" ] && usage 0
[ "$1" == "--help" ] && usage 0
[ $# -ne 4 ] && usage 1

SECTION="$1"
INDEX="$2"
NEWMODEL="$3"
FILE="$4"

WORK="`mktemp -d`" || exit $?
trap "rm -rf '$WORK'" EXIT

awk -v "model=$NEWMODEL" -v "section=$SECTION" -v "occurrence=$INDEX" '

    BEGIN {
        RS = "[\t\n\v\f\r ]*[\r\n]+[\t\n\v\f\r ]*"
        FS = "[\t\v\f ]*=[\t\v\f ]*"

        section = int(section)
        currsect = 0
        currmodel = 0
    }

    {
        ORS = RT
        if (match($0, /[\t\v\f ]*=[\t\v\f ]*/, recs))
            OFS = recs[0]
        else
            OFS = " = "
    }
    ($1 ~ /^[\t\v\f ]*#/) {
        print $0
        next
    }
    ($0 ~ /[\t\v\f ]#/) {
        rec = $0
        sub(/[\t\v\f ]#.*$/, "", $0)
        ORS = substr(rec, 1 + length($0)) RT
    }

    (NF == 1) {
        currsect++
        currmodel = 0
    }

    (NF == 2 && currsect == section && tolower($1) == "model") {
        currmodel++
        if (currmodel == occurrence)
            $2 = model
    }

    {
        print $0
    }

' "$FILE" > "$WORK/file" || exit $?

if cmp -s "$FILE" "$WORK/file" ; then
    echo "$FILE: No changes." >&2
    exit 0
fi

chmod --reference="$FILE" "$WORK/file" &>/dev/null
mv -f "$WORK/file" "$FILE" || exit $?

echo "$FILE: Modified successfully." >&2
exit 0
If you save the above script as change-model.bash, then
Code:
./change-model.bash 2 3 'The New Model' models.txt
will change the third model=anything line under the second header to model=The New Model, in file models.txt. The script will tell if the file was modified or not, too.
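Stripped of the whitespace and comment handling, the counting logic at the heart of the script can be sketched in portable awk. This is a simplified illustration, not the script above: the function name replace_model is made up, it reads stdin and writes stdout instead of editing a file, it treats any non-empty line without = as a header, and it rewrites the matching line as "model = value" without preserving the original spacing.

```shell
# replace_model SECTION OCCURRENCE VALUE  (hypothetical helper name)
# Reads the file on stdin, writes the modified file to stdout.
replace_model() {
    awk -v section="$1" -v occurrence="$2" -v value="$3" '
        # A non-empty line without "=" is a header: start a new section.
        NF > 0 && $0 !~ /=/ { currsect++; currmodel = 0 }

        # Count model lines only inside the wanted section.
        currsect == section && /^[ \t]*model[ \t]*=/ {
            if (++currmodel == occurrence)
                $0 = "model = " value
        }

        { print }
    '
}

# Sample data in the format from the first post.
sample='system information

model = xxx
specs = yyy

model = iii
specs = jjj

system information

model = aaa
specs = bbb

model = ddd
specs = eee'

# Change the second model under the second header.
result=$(printf '%s\n' "$sample" | replace_model 2 2 'The New Model')
printf '%s\n' "$result"
```

Only the 'model = ddd' line changes. Note that awk -v interprets backslash escapes in the value, so a new model name containing backslashes would need extra care.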
 
Old 04-15-2011, 05:10 AM   #11
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,635

Rep: Reputation: 1958
Nominal's new script only goes further to prove my point: without knowing something about the data, i.e. second section and third model along, all the solutions are kind of void until you bed down some more particulars.
 
Old 04-15-2011, 05:04 PM   #12
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Quote:
Originally Posted by grail
Nominal's new script goes only further to proving my point that without knowing something about the data, ie second section and third model along, that all the solutions are kind of void until you bed down some more particulars.
I agree.

corone, could you elaborate a bit on exactly what you are doing?

For example, if we knew that you need to modify, say, a hardware manifest, we could tell you that you'd save a lot of time by splitting the original manifest(s) into multiple files of manageable size (by, say, the header line, or by identifier in the section). Then it'd be much easier to modify each part separately. If you use a safe empty temporary directory to work in, you can name the parts as 001.part-identifier, 002.part-identifier, and so on. Then you can specify the part file name as *.part-identifier for each modification, to only modify specific parts, regardless of their order in the original manifest. Finally, merging the parts back to a single manifest is trivial: cat *.* (in the otherwise empty temporary directory).
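A minimal sketch of that split, modify and merge cycle follows. Everything specific in it is an assumption made for illustration: the two-section sample manifest, splitting on header lines (non-empty lines without =), and the simplified part names 001.part, 002.part with no identifier suffix.

```shell
# Work in a fresh, empty temporary directory (removed at the end).
work=$(mktemp -d) || exit 1

# Hypothetical manifest with two sections.
printf '%s\n' 'system information' 'model = xxx' \
              'system information' 'model = aaa' > "$work/manifest"

# Split: every header line (a non-empty line without "=") starts a new part.
mkdir "$work/parts"
awk -v dir="$work/parts" '
    NF > 0 && $0 !~ /=/ {
        if (out != "") close(out)
        out = sprintf("%s/%03d.part", dir, ++n)
    }
    out != "" { print > out }
' "$work/manifest"

# Modify only the second part.
sed 's/^model = .*/model = abc/' "$work/parts/002.part" > "$work/tmp" &&
    mv "$work/tmp" "$work/parts/002.part"

# Merge: the zero-padded names make the glob expand in the original order.
merged=$(cat "$work/parts"/*.part)
printf '%s\n' "$merged"

rm -rf "$work"
```

The zero-padded numbers are what keep the merge order stable, so the parts can be edited in any order.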

The key is that the more complete picture of the problem we have, the better the solution.

It is always a good idea to describe what you've already tried. However, usually that's not enough. Your approach may be inefficient, for example. Therefore, also telling us the entire task you wish to accomplish, not just the tricky bit you're having a problem with, is important for you to get a good solution. It also makes it much easier for others to give you advice. Sure, you'll probably also get advice that is not suitable for you for various reasons, but it never hurts to see how those solutions tick. You may be able to use some nugget in them to improve your solution.
 
Old 04-20-2011, 10:23 AM   #13
corone
Member
 
Registered: Jul 2009
Posts: 46

Original Poster
Rep: Reputation: 1
Thank you, Nominal Animal!!
Code:
./change-model.bash 2 3 'The New Model' models.txt
It works very very well.
That is exactly what I want.

I can never thank you enough.
I really, really appreciate your helping me out.

I hope you see my thanks.
And I wish I worked with you in the same office. =]

Last edited by corone; 04-20-2011 at 10:56 AM.
 
Old 04-20-2011, 10:54 AM   #14
corone
Member
 
Registered: Jul 2009
Posts: 46

Original Poster
Rep: Reputation: 1
Thank you for your advice, grail.
Here is a little more explanation.

There are four devices.
Code:
┌────┐
│Device1│
├────┤
│Device2│
└────┘
┌────┐
│Device3│
├────┤
│Device4│
└────┘
And this is a format for the devices.
Code:
system information

model = 
specs = 

model = 
specs = 

system information

model = 
specs = 

model = 
specs =
The first 'model =' is the model name of the Device1.
The second 'model =' is the model name of the Device2.
The third 'model =' is the model name of the Device3.
The fourth 'model =' is the model name of the Device4.

When the device is changed, the file should be modified automatically using a script.

I don't know the previous model name for the Device1.
I only know which device is changed to which model.

I think the format is very stupid.
The following would be much better.
Code:
system information

Device 1 model = 
specs = 

Device 2 model = 
specs =
Or at least,
Code:
system information

model 1 = 
specs = 

model 2 = 
specs =
Anyway, I solved the problem with Nominal's kind help.
 
Old 04-20-2011, 11:06 AM   #15
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,635

Rep: Reputation: 1958
In a way you have demonstrated that there is more information, in that the devices run from top to bottom in line with the model and spec information.
This now means that your script simply needs to know which occurrence of model it is to change. Hence if you pass the number 2, it will wait until it finds the second occurrence of model before initiating a change, and exit immediately after.

So using Nominal's script you really only need the INDEX and NEWMODEL variables.
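That simplification could look roughly like the sketch below. It is not Nominal's script: the helper name set_nth_model and the sample data are made up, it works as a stdin-to-stdout filter, and it rewrites the line as "model = value" without preserving the original spacing. It also keeps reading after the change instead of exiting, because the remaining lines still have to be printed.

```shell
# set_nth_model N VALUE  (hypothetical helper name)
# Replaces the value on the N-th "model =" line, counted from the top of
# the file regardless of section headers. Reads stdin, writes stdout.
set_nth_model() {
    awk -v n="$1" -v value="$2" '
        /^[ \t]*model[ \t]*=/ && ++count == n { $0 = "model = " value }
        { print }
    '
}

# Device3 is the third "model =" line in the four-device layout.
devices='system information
model = one
model = two
system information
model = three
model = four'

updated=$(printf '%s\n' "$devices" | set_nth_model 3 'abc')
printf '%s\n' "$updated"
```

Here only the line for Device3 ('model = three') is replaced.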

Glad you got it sorted
 
  


All times are GMT -5.
