LinuxQuestions.org
Old 08-30-2007, 10:01 AM   #1
dwynter
Member
 
Registered: Jun 2002
Distribution: Centos 4.4
Posts: 82

Rep: Reputation: 15
regex with sed to process file, need help on regex


Hi,

I have a large log file. It contains several sections, each delineated by a header line and a footer line. I want to process all lines between each header/footer pair, keep just the end of each matching line, and create a new file for each header/footer pair.

Example

blah blah creating lock file XYZ.v
blah blah no identifier found for value = 345.O
random content line
blah blah no identifier found for value = 123.O
blah blah deleting lock file XYZ.v
random content line
random content line
blah blah creating lock file ABC.v
blah blah no identifier found for value = ABC.O
random content line
blah blah no identifier found for value = DEF.O
blah blah deleting lock file ABC.v

Here "blah blah " can be any text, "random content line" is any line that does not contain the pattern and should be ignored, and "no identifier found for value = " is the text used to recognise a line of interest.

"creating lock file " marks the start of a section (the header), and the text to the right of it is used as the filename for the output. The output is everything to the right of "no identifier found for value = " up to the end of the line. The values are written without line breaks, separated by commas, and each is wrapped in single quotes like '<text found goes here>'. "deleting lock file " then marks the end of that section (the footer). For example:

'345.O','123.O' would go into a file called XYZ.v.txt

I think I could loop over the file with for and cat, run sed -e on each line, and append the results with >>, but putting it all together is rather daunting. The regex is also beyond me. Any experts who do this in their sleep?

Thanks,

David
 
Old 08-30-2007, 10:33 AM   #2
dwynter
Here is what I think the script should look like in theory:

for i in (cat /var/myapp/logfile)
do
$arg1 = sed -e '<header regex>'
$arg2 = sed -e '<footer regex>'
$arg3 = sed -e '<header regex>'
if [ $arg1 != "" ]
cat > $arg1
fi
if [ $arg2 != "" ]
^D > $arg1
fi
if [ $arg1 != "" ]
"'$arg3',"
fi
arg1=""
arg2=""
arg3=""
done

But of course it does not work: cat /var/myapp/logfile does not output one line at a time. I am also not sure how to output a ^D to close the file.
 
Old 08-30-2007, 10:35 AM   #3
dwynter
With code tags this time, dooh.

Code:
for i in (cat /var/myapp/logfile)
    do
        $arg1 = sed -e '<header regex>'
        $arg2 = sed -e '<footer regex>'
        $arg3 = sed -e '<header regex>'
        if [ $arg1 != "" ]
            cat > $arg1
        fi
        if [ $arg2 != "" ]
            ^D > $arg1
        fi
        if [ $arg1 != "" ]
            "'$arg3',"
        fi
        arg1=""
        arg2=""
        arg3=""
done
 
Old 08-30-2007, 10:50 AM   #4
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 66
I'd put something like this in a script file:
Code:
#!/bin/bash

sed -n '
  s/^.*creating lock file \(.*\)$/FILE \1/p
  t file
  d
  : file
  n
  /deleting lock file/ d
  s/^.*no identifier found for value = \(.*\)$/= \1/p
  b file
' | awk '
  function new() {
    if (content != "" && out != "") { print content >out; close(out); }
    out=$2 ".txt"
    content=""
  }
  $1=="FILE" { new(); }
  $1=="=" {
    if (content != "") content=content ","
    content=content "'"'"'" substr($0, 3) "'"'"'"
  }
  END { new(); }
'
You run this with ./script.sh <logfile
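To see what the sed stage hands over to awk, here is a minimal sketch that runs only the sed part on the sample lines from post #1. The variable names log and marked exist only for this illustration, and I am assuming GNU sed.

```shell
#!/bin/bash
# Sample log lines copied from post #1 (held in a variable for the demo).
log='blah blah creating lock file XYZ.v
blah blah no identifier found for value = 345.O
random content line
blah blah no identifier found for value = 123.O
blah blah deleting lock file XYZ.v
random content line
random content line
blah blah creating lock file ABC.v
blah blah no identifier found for value = ABC.O
random content line
blah blah no identifier found for value = DEF.O
blah blah deleting lock file ABC.v'

# Run only the sed stage of the script and capture what it prints.
marked=$(printf '%s\n' "$log" | sed -n '
  s/^.*creating lock file \(.*\)$/FILE \1/p
  t file
  d
  : file
  n
  /deleting lock file/ d
  s/^.*no identifier found for value = \(.*\)$/= \1/p
  b file
')

# Show the intermediate stream that awk would receive.
printf '%s\n' "$marked"
```

With this sample it should print FILE XYZ.v, = 345.O, = 123.O, FILE ABC.v, = ABC.O and = DEF.O, one per line. Each FILE line starts a new output file and each = line carries one value, which is what the awk part relies on.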

Yves.
 
Old 08-31-2007, 04:23 AM   #5
dwynter
Member
 
Registered: Jun 2002
Distribution: Centos 4.4
Posts: 82

Original Poster
Rep: Reputation: 15
Clarifications on some parts of the script

Hi,

I'd like to understand how this works, step by step.

I understand the sed part that processes the stream and outputs the required substrings. But I cannot find in the sed tutorial how this part works:

Code:
  t file
  d
  : file
  n
...
  b file
The other part that I think will not work is this:

Code:
    content=content "'"'"'" substr($0, 3) "'"'"'"
This is because the argument $0 is everything from "no identifier found for value = " to the end of the line, and that varies in length.

I did run it and got no output at all, so something is not right.

If I run it in my home dir, can I use a full path to the log file? Like so:
./script.sh </var/myapp/logslogfile

Thx.

David
 
Old 08-31-2007, 05:10 AM   #6
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 66
Yes, that's it, and there is indeed no output on the terminal, because everything is written to files with the extension ".txt".
Here's what it does:
Code:
sed -n '
    | FOR EACH LOG LINE...
        | Output "FILE <name of file>" if it is a file name
  s/^.*creating lock file \(.*\)$/FILE \1/p
        | If it was indeed so (t), then branch to label "file"
  t file
        | Else delete line and start at beginning with next
  d
        | From this label on, we know we've found a file name
  : file
        | Read next line
  n
        | Until we find the closing of the block
  /deleting lock file/ d
        | Meanwhile output all not-found values: "= <value>"
  s/^.*no identifier found for value = \(.*\)$/= \1/p
        | And so on...
  b file
        | Now, all this output is worked on with awk:
' | awk '
        | This function creates a new <name of file>.txt and reinit variables
  function new() {
                | Create file only if there is content
    if (content != "" && out != "") { print content >out; close(out); }
                | Then empty variables
    out=$2 ".txt"
    content=""
  }
    | FOR EACH LINE FROM SED...
        | If it is a file name, call the function
  $1=="FILE" { new(); }
        | Else append the unfound value to the content
  $1=="=" {
    if (content != "") content=content ","
    content=content "'"'"'" substr($0, 3) "'"'"'"
  }
        | At the end, we don't forget to create the last file
  END { new(); }
'
The "'"'"'" you see above is, character by character:
" begin awk string (enclosed between ")
' close awk script (enclosed between ')
" begin bash string (enclosed between ")
' this string's content is a single quote (the actual content of the awk string)
" close bash string (enclosed between ")
' reopen awk script (enclosed between ')
" close awk string (enclosed between ")
All that is so that we can put a single quote inside the awk script without having to enclose the script within double quotes instead of single quotes, which would force us to escape ("\" character) each and every ", \, or $ inside the script.
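To check the quoting and the substr($0, 3) call in isolation, here is a small sketch. The variable names line, quoted1 and quoted2 are made up for the example. The second variant, which passes the quote character in with awk -v, is an alternative and not what the script above uses.

```shell
#!/bin/bash
# One marked line, in the form the sed stage produces it.
line='= 345.O'

# Variant 1: the "'"'"'" quoting used in the script.
# substr($0, 3) skips the leading "= " and keeps the rest of the line.
quoted1=$(printf '%s\n' "$line" | awk '{ print "'"'"'" substr($0, 3) "'"'"'" }')

# Variant 2 (alternative, not from the script): hand the single quote
# to awk as a variable named q, so the awk program needs no shell quoting tricks.
quoted2=$(printf '%s\n' "$line" | awk -v q="'" '{ print q substr($0, 3) q }')

# Both variants should print the value wrapped in single quotes.
printf '%s\n%s\n' "$quoted1" "$quoted2"
```

Both lines of output should read '345.O'. Since substr($0, 3) is called without a length argument, it returns everything from the third character to the end of the line, so the length of the value does not matter.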

Yves.

Last edited by theYinYeti; 08-31-2007 at 05:15 AM.
 
  

