
climber75 05-07-2008 03:49 AM

Modify a text file with awk/sed/perl

I have a huge text file from which I have to extract various command lines, each of which starts with the same characters.

So here's a little example:

This is the first line with no information
This is the second important line with the following Code in the middle RC0xxx Command = Important code"

This is another line with no important information
This is another line with no important information

This is a further important line in the middle with RC0xxx Command = Important code"

and so on

From this code "RC0xxx Command = Important code" I need the following structure:

RC0xxx;Important Code
RC0xxx;Important Code

This output can be written to a separate file. I also have to make sure there are no duplicates in the new file.

It doesn't matter whether it's written in shell/perl/awk/sed or mixed code.

Thanks for your help :-)

colucix 05-07-2008 04:22 AM

What have you tried so far? Anyway, here is a little awk code:

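/RC0xxx Command = / {
      ix = index($0,"RC0xxx Command = ")   # position where the marker starts
      st = substr($0,ix + 17)              # text after the 17-character marker
      split(st,array,"\"")                 # cut at the closing double quote
      printf "RC0xxx;%s\n",array[1]
}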

First it finds the index at which the string "RC0xxx Command = " starts, then extracts the content of the line after that string, splits it using the double quote as separator and finally prints out the requested line. I have assumed that the double quote at the end of the command is always there; otherwise you have to find some other criterion to establish where the command terminates.
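
For comparison, the same three steps (locate the marker, take the rest of the line, cut at the closing double quote) can be sketched in Python. This is only an illustration under the same assumption that the closing quote is always present; the function name extract_command and the sample line are made up for the example.

```python
MARKER = "RC0xxx Command = "

def extract_command(line):
    """Return 'RC0xxx;<command>' for a line containing MARKER, else None."""
    ix = line.find(MARKER)          # like awk's index(), but 0-based and -1 when absent
    if ix == -1:
        return None
    rest = line[ix + len(MARKER):]  # text after the marker, like substr($0, ix + 17)
    command = rest.split('"')[0]    # keep everything before the closing double quote
    return "RC0xxx;" + command

if __name__ == "__main__":
    sample = 'This is the second important line with RC0xxx Command = Important code"'
    print(extract_command(sample))
```

Lines without the marker give None, so the caller decides whether to skip them or report them.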

climber75 05-07-2008 05:47 AM

Thanks for your really quick answer!!! I tried it with awk but I'm not so familiar with it. So I'm amused that you use it :-)

Maybe I should post a few lines of the real text file so it becomes clearer what I want.


patchinstall.exe /g:144 /n /z:s /f /c:90 /p /t:30 /m:"patchauthorize.xml" changed to PatchInstall.exe /g:168 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
Microsoft Update - Mandatory November - Ran at - 21.11.2006 11:29:00 Program Name = Microsoft Update - Mandatory November ID = RC020376 Commandline = PatchInstall.exe /g:168 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"

Ending process 21.11.2006 11:30:01

21.11.2006 13:00:01

LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP ID = RC020232 Commandline = wscript.exe ResetPassword.vbs Current
RC_RealPlayer_10.5 - Ran at - 13.01.2007 18:00:00 Program Name = Remove_RC_RealPlayer_8.0 ID = RC020382 Commandline = wscript.exe Remove_Legacy_Realplayer_SMS.vbs
RC_RealPlayer_10.5 - Ran at - 13.01.2007 18:00:00 Program Name = RC_Real_Player_10.5_Upgrade_3 ID = RC020383 Commandline = wscript.exe \\ccanet\approot\installs\apps\RC_RealPlayer_10.5\RC_RealPlayer_10.5_push_2.vbs
Microsoft Update - Critical January 2007 - Ran at - 13.01.2007 18:00:00 Program Name = Microsoft Update - Critical January 2007 ID = RC020384 Commandline = PatchInstall.exe /g:312 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
Microsoft Update - Critical January 2007 - Ran at - 13.01.2007 18:00:00 Program Name = Microsoft Update - Critical January 2007 LoggedOff ID = RC020385 Commandline = PatchInstall.exe /g:312 /n /z:s /f /c:5 /t:30 /m:"PatchAuthorize.xml"

Ending process 13.01.2007 19:00:02

Microsoft Update - Critical January 2007 - Ran at - 13.01.2007 18:00:00 Program Name = Microsoft Update - Critical January 2007 LoggedOff ID = RC020385 Commandline = PatchInstall.exe /g:312 /n /z:s /f /c:5 /t:30 /m:"PatchAuthorize.xml"
13.01.2007 20:30:00

Ending process 13.01.2007 20:30:00

13.01.2007 22:00:00
and so on


Here's the output how it should be:

RC020376;PatchInstall.exe /g:168 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
RC020232;wscript.exe ResetPassword.vbs Current
RC020382;wscript.exe Remove_Legacy_Realplayer_SMS.vbs
RC020383;wscript.exe \\ccanet\approot\installs\apps\RC_RealPlayer_10.5\RC_RealPlayer_10.5_push_2.vbs
RC020384;PatchInstall.exe /g:312 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
RC020385;PatchInstall.exe /g:312 /n /z:s /f /c:5 /t:30 /m:"PatchAuthorize.xml"

All entries have the following in common:
They begin with RC0 and end with CRLF (Carriage Return, Line Feed)

I hope this helps
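
The requirements stated so far (take the RC0 ID and everything after "Commandline = ", drop the CRLF, and write no duplicates) could be sketched in Python roughly as follows. It assumes every wanted line has the "ID = RC0... Commandline = ..." layout shown in the sample; the names PATTERN and extract_unique are invented for this sketch.

```python
import re

# Assumed layout: "ID = RC0<digits> Commandline = <command up to end of line>"
PATTERN = re.compile(r"ID = (RC0\d+) Commandline = (.*)")

def extract_unique(lines):
    """Return 'ID;command' strings in first-seen order, without duplicates."""
    seen = set()
    result = []
    for line in lines:
        match = PATTERN.search(line.rstrip("\r\n"))  # drop the CRLF line ending
        if match is None:
            continue                                 # not an RC0 entry
        entry = match.group(1) + ";" + match.group(2)
        if entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result

if __name__ == "__main__":
    sample = [
        "LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP ID = RC020232 Commandline = wscript.exe ResetPassword.vbs Current\r\n",
        "Ending process 13.01.2007 19:00:02\r\n",
        "LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP ID = RC020232 Commandline = wscript.exe ResetPassword.vbs Current\r\n",
    ]
    for entry in extract_unique(sample):
        print(entry)
```

A set remembers what has already been emitted, while the list keeps the entries in the order they first appear in the log.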


syg00 05-07-2008 06:37 AM

Looks pretty easy to parse with any of them if, as appears likely, the wanted data extends to eol.
What have you tried ??? - better to get help with specific problems than to expect others to do your work for you.

colucix 05-07-2008 07:59 AM


Originally Posted by climber75 (Post 3145470)
I tried it with awk but I'm not so familiar with it.

You cited awk among your options, I thought you were familiar with it. What scripting language are you used to? And what have you tried till now?

pixellany 05-07-2008 08:05 AM

You can use SED to strip out the desired patterns.....something like this:
sed -n 's/.*\(pattern\)/\1/p' filename > newfilename

To this you simply add another SED command (using -e) to replace the "Commandline = " with ";"

Alternatively, it can all be done inside one SED "s" command.
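
To illustrate the single-substitution idea outside of SED, here is a rough Python equivalent. It is a sketch, not pixellany's command: the pattern assumes the ID is immediately followed by " Commandline = ", and the names RULE and rewrite are invented for the example.

```python
import re

# One substitution, in the spirit of a single sed "s" command:
# drop everything before the ID and turn " Commandline = " into ";".
RULE = re.compile(r"^.*(RC0[0-9]+) Commandline = ")

def rewrite(line):
    """Return the rewritten line, or None when the pattern is absent (like sed -n with the p flag)."""
    new_line, count = RULE.subn(r"\1;", line)
    return new_line if count else None

if __name__ == "__main__":
    sample = "Program Name = LAP ID = RC020232 Commandline = wscript.exe ResetPassword.vbs Current"
    print(rewrite(sample))
```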

Really good SED tutorial here:

climber75 05-07-2008 08:56 AM

Normally I use shell scripting, but in this case I thought awk would be helpful. So I bought an awk book and am going through it step by step. I still prefer awk to solve this problem because of the learning effect.

So if you still have time to help me, that would be nice!

Why this script:
Every RCxxx entry of this text file triggers an e-mail once I put it in the required format. That way we are always informed of automatic installations that happen in the background.


colucix 05-07-2008 12:25 PM

A very good awk book is the official guide, here. Anyway, following my previous post, you can try something like

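/RC0..... Commandline =/ {
      ix = index($0,"RC0")       # position where the ID starts
      rc = substr($0,ix,8)       # the 8-character ID, e.g. RC020376
      st = substr($0,ix+23)      # everything after "RC0xxxxx Commandline = "
      printf "%s;%s\n",rc,st
}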

You can try to understand by yourself what exactly this code does (indeed, it's not difficult at all). There are other ways to do the same thing in awk, for example by parsing fields instead of extracting substrings, but I prefer this method. Cheers!

chrism01 05-07-2008 07:29 PM


#!/usr/bin/perl -w

use strict;            # Enforce declarations

my (
    @arr1, $file, $rec, $var2, $var3
);

# Input file name; taken from the first argument here (an assumption)
$file = shift @ARGV;

open( TXT_FILE, "<$file" ) or
            die "Can't open txt file: $file: $!\n";
@arr1 = <TXT_FILE>;
# remove newline endings
chomp(@arr1);
close(TXT_FILE) or
            die "Can't close txt file: $file: $!\n";
for $rec (@arr1)
{
    if( $rec =~ /RC[0-9]{6}/ )
    {
        # Split rec on '=', get fields we want
        ($var2, $var3) = (split(/=/, $rec))[2,3];

        # Remove unwanted string, replace with ';'
        $var2 =~ s/ Commandline /;/;
        # remove leading spaces
        $var2 =~ s/^\s+//;
        $var3 =~ s/^\s+//;

        # concat & print to stdout
        print "${var2}${var3}\n";
    }
}

Assumes field layout is constant as implied by examples above.

See f you want more perl explanations, or ask again here.

Markcore 05-08-2008 12:33 AM

a way in bash


while read -r line; do                  # -r keeps the backslashes in UNC paths
        filt="${line#*ID = }"
        if [ "${filt}" = "${line}" ]; then
                continue                # no "ID = " on this line
        fi
        id="${filt%% *}"
        code="${filt#*Commandline = }"
        echo "${id};${code}"
done < blah.txt

or i guess...

while read -r line; do
        filt="${line#*ID = }"
        [ "${filt}" = "${line}" ] && continue;
        echo "${filt%% *};${filt#*Commandline = }"
done < blah.txt

or i suppose


while read -r line; do
        filt="${line#*ID = }"
        [ "${filt}" != "${line}" ] && echo "${filt%% *};${filt#*Commandline = }"
done < blah.txt


awk -F"=" '(NF) && $(NF-1)~/^ RC0/ {split($(NF-1),b," "); print b[1]";"$NF}' blah.txt
sorry, I'm bored.

theNbomr 05-08-2008 11:55 AM

Obligatory perl one-liner offering:

perl -e 'while(<>){if($_ =~ m/(RC0[0-9]+)\s*Commandline\s*=\s*(.+$)/){ print "$1;$2\n";}}'
Give the text file as the argument. Redirect output to file.
--- rod.

colucix 05-08-2008 11:59 AM

I love these multiple contributions using different languages! :)

PS - Waiting for a sed and/or python solution...

ghostdog74 05-09-2008 01:54 AM


awk 'match($0, /RC0[0-9]+/) {
 print substr($0,RSTART)
}' file


for n,line in enumerate(open("file")):
    line = line.split()                 # break the line into words
    for m,j in enumerate(line):
        if "RC" in j and j[2:].isdigit():
            print(' '.join(line[m:]))

syg00 05-09-2008 09:47 PM

I see a sed contribution is still missing. Try this (note I just stole theNbomr's regex, and told sed to use regex-extended)

sed -nr 's:.*(RC0[0-9]+)\s*Commandline\s*=\s*(.+$):\1;\2:p' testreg.txt

dfezz1 08-05-2008 04:15 PM

OK OK I am SUCH a newbie that I don't get it. I admit it!

So I need to do something similar, but my example is easier, and your answer to my question might help me understand the replies above:

Thank You in advance.

Here is the situation.

Linux RHEL4.6

I want to disable "CTRL-ALT-DEL" in the /etc/inittab

I want to replace:
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

with:

# Changed 8-5-08 -dfezz1 (disabling ctrl-alt-del at console)
ca:12345:ctrlaltdel:/bin/echo "CTRL-ALT-DEL is disabled"

I have tried the simplest SED I know:
$ sed 's/replace_please/REPLACED_THX/g' /tmp/dummy
$ sed 's/ca::ctrlaltdel:/sbin/shutdown -t3 -r now/ca:12345:ctrlaltdel:/bin/echo "CTRL-ALT-DEL is disabled"/g' /tmp/dummy

As you can tell from my feeble attempt, it didn't work, spaces and quotes seem to be the main reason.
Any help???


My /etc/inittab:
For Ref.

[root@myserver Project_Server_Files]# cat /etc/inittab
# inittab This file describes how the INIT process should set up
# the system in a certain run-level.
# Author: Miquel van Smoorenburg, <>
# Modified for RHS Linux by Marc Ewing and Donnie Barnes

# Default runlevel. The runlevels used by RHS are:
# 0 - halt (Do NOT set initdefault to this)
# 1 - Single user mode
# 2 - Multiuser, without NFS (The same as 3, if you do not have networking)
# 3 - Full multiuser mode
# 4 - unused
# 5 - X11
# 6 - reboot (Do NOT set initdefault to this)

# System initialization.

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

ca::ctrlaltdel:/sbin/shutdown -t3 -r now

# When our UPS tells us power has failed, assume we have a few minutes
# of power left. Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"

# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
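
One likely culprit in the SED attempt above is the "/" characters in the paths: sed reads them as delimiters of the "s" command, so the expression ends early. Spaces and double quotes inside a single-quoted expression are not special to the shell. As a sketch that avoids delimiter escaping altogether, the same replacement can be done by comparing whole lines in Python. The function name disable_ctrl_alt_del is invented for the example, and the sample text stands in for a scratch copy of the file such as /tmp/dummy.

```python
OLD = "ca::ctrlaltdel:/sbin/shutdown -t3 -r now"
NEW = (
    "# Changed 8-5-08 -dfezz1 (disabling ctrl-alt-del at console)\n"
    'ca:12345:ctrlaltdel:/bin/echo "CTRL-ALT-DEL is disabled"'
)

def disable_ctrl_alt_del(text):
    """Replace the exact ctrlaltdel line; all other lines pass through unchanged."""
    out = []
    for line in text.splitlines():
        out.append(NEW if line == OLD else line)
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    # Two lines standing in for the contents of the file being edited.
    sample = "l6:6:wait:/etc/rc.d/rc 6\nca::ctrlaltdel:/sbin/shutdown -t3 -r now\n"
    print(disable_ctrl_alt_del(sample), end="")
```

Trying the result on a copy first, rather than on the live /etc/inittab, keeps a mistake from affecting the running system.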

All times are GMT -5.