LinuxQuestions.org


climber75 05-07-2008 02:49 AM

Modify a text files with awk/sed/perl
 
Hi,

I have a huge text file from which I have to filter out different command lines that always start with the same characters.

So here's a little example

------------------------------
This is the first line with no information
This is the second important line with the following Code in the middle RC0xxx Command = Important code"

This is another line with no important information
This is another line with no important information

This is a further important line in the middle with RC0xxx Command = Important code"

..
...
....
and so on

------------------------------------
From these "RC0xxx Command = Important code" entries I need the following structure:

RC0xxx;Important Code
RC0xxx;Important Code


This output can be written to a separate file. I also have to make sure that there are no duplicates in the new file.

It doesn't matter whether it's written in shell/perl/awk/sed or mixed code.

Thanks for your help :-)
climber75

colucix 05-07-2008 03:22 AM

What have you tried so far? Anyway, here is a little awk code:
Code:

/RC0xxx Command = / {
      ix = index($0,"RC0xxx Command = ")
      st = substr($0,ix + 17)
      split(st,array,"\"")
      printf "RC0xxx;%s\n",array[1]
}

First it finds the index at which the string "RC0xxx Command = " starts, then extracts the rest of the line after that string (which is 17 characters long), splits it using the double quote as separator, and finally prints the requested line. I have assumed that the double quote at the end of the command is always there; otherwise you have to find some other criterion to establish where the command ends.

climber75 05-07-2008 04:47 AM

Thanks for your really quick answer!!! I tried it with awk but I'm not so familiar with it. So I'm amused that you use it :-)

Maybe I should post a few lines of the real text file so it becomes clearer what
I want.

-------------------------------------------------------------------------------------------------------------------

COMMANDLINE CHANGE FROM APPROVED LIST
patchinstall.exe /g:144 /n /z:s /f /c:90 /p /t:30 /m:"patchauthorize.xml" changed to PatchInstall.exe /g:168 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
Microsoft Update - Mandatory November - Ran at - 21.11.2006 11:29:00 Program Name = Microsoft Update - Mandatory November ID = RC020376 Commandline = PatchInstall.exe /g:168 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"

Ending process 21.11.2006 11:30:01

21.11.2006 13:00:01

NEW ADVERTISEMENT
LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP ID = RC020232 Commandline = wscript.exe ResetPassword.vbs Current
NEW ADVERTISEMENT
RC_RealPlayer_10.5 - Ran at - 13.01.2007 18:00:00 Program Name = Remove_RC_RealPlayer_8.0 ID = RC020382 Commandline = wscript.exe Remove_Legacy_Realplayer_SMS.vbs
NEW ADVERTISEMENT
RC_RealPlayer_10.5 - Ran at - 13.01.2007 18:00:00 Program Name = RC_Real_Player_10.5_Upgrade_3 ID = RC020383 Commandline = wscript.exe \\ccanet\approot\installs\apps\RC_RealPlayer_10.5\RC_RealPlayer_10.5_push_2.vbs
NEW ADVERTISEMENT
Microsoft Update - Critical January 2007 - Ran at - 13.01.2007 18:00:00 Program Name = Microsoft Update - Critical January 2007 ID = RC020384 Commandline = PatchInstall.exe /g:312 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
NEW ADVERTISEMENT
Microsoft Update - Critical January 2007 - Ran at - 13.01.2007 18:00:00 Program Name = Microsoft Update - Critical January 2007 LoggedOff ID = RC020385 Commandline = PatchInstall.exe /g:312 /n /z:s /f /c:5 /t:30 /m:"PatchAuthorize.xml"

Ending process 13.01.2007 19:00:02

NEW ADVERTISEMENT
Microsoft Update - Critical January 2007 - Ran at - 13.01.2007 18:00:00 Program Name = Microsoft Update - Critical January 2007 LoggedOff ID = RC020385 Commandline = PatchInstall.exe /g:312 /n /z:s /f /c:5 /t:30 /m:"PatchAuthorize.xml"
13.01.2007 20:30:00


Ending process 13.01.2007 20:30:00

13.01.2007 22:00:00
...
...
and so on

------------------------------------------------------------------------------------------------------------


Here's how the output should look:

RC020376;PatchInstall.exe /g:168 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
RC020232;wscript.exe ResetPassword.vbs Current
RC020382;wscript.exe Remove_Legacy_Realplayer_SMS.vbs
RC020383;wscript.exe \\ccanet\approot\installs\apps\RC_RealPlayer_10.5\RC_RealPlayer_10.5_push_2.vbs
RC020384;PatchInstall.exe /g:312 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
RC020385;PatchInstall.exe /g:312 /n /z:s /f /c:5 /t:30 /m:"PatchAuthorize.xml"



All the wanted entries have the following in common:
They begin with RC0 and end with CRLF (Carriage Return, Line Feed)

I hope this helps

Thanks
climber75

syg00 05-07-2008 05:37 AM

Looks pretty easy to parse with any of them if, as appears likely, the wanted data extends to eol.
What have you tried ??? - better to get help with specific problems than to expect others to do your work for you.

colucix 05-07-2008 06:59 AM

Quote:

Originally Posted by climber75 (Post 3145470)
I tried it with awk but I'm not so familiar with it.

You cited awk among your options, so I thought you were familiar with it. Which scripting language are you used to? And what have you tried so far?

pixellany 05-07-2008 07:05 AM

You can use sed to strip out the desired patterns, something like this:
sed -n 's/.*\(pattern\)/\1/p' filename > newfilename

To this you simply add another sed command (using -e) to replace the "Commandline = " with ";"

Alternatively, it can all be done inside one sed "s" command.

Really good sed tutorial here: http://www.grymoire.com/Unix/Sed.html

climber75 05-07-2008 07:56 AM

Normally I use shell scripting, but in this case I thought awk would be helpful. So I bought an awk book
and am going through it step by step. I would still prefer awk to solve this problem because of the learning effect.

So if you still have time to help me, that would be nice!

Why this script:
Every RCxxx entry in this text file triggers an e-mail, as long as I put it in the required format. That way we are always informed of automatic installations that happen in the background.


Regards
climber75

colucix 05-07-2008 11:25 AM

A very good awk book is the official guide, here. Anyway, following my previous post, you can try something like
Code:

/RC0..... Commandline =/ {
      ix = index($0,"RC0")
      rc = substr($0,ix,8)
      st = substr($0,ix+23)
      printf "%s;%s\n",rc,st
}

You can try to understand by yourself what exactly this code does (indeed, it's not difficult at all). There are other ways to do the same thing in awk, for example by parsing fields instead of extracting substrings, but I prefer this method. Cheers!
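
For readers more at home in Python, the offset arithmetic in the awk snippet above can be sketched roughly as follows. This is only an illustration, not colucix's code: the function name extract is made up here, and it assumes the ID is always 8 characters long ("RC0" plus five more) and is followed by exactly " Commandline = " (15 characters, hence 8 + 15 = 23). awk's index() and substr() count from 1 while Python counts from 0, but the offsets relative to the start of the ID are the same.

```python
def extract(line):
    """Return 'ID;command' for a log line, or None if it has no ID entry.

    Hypothetical helper mirroring the awk snippet: it assumes an 8-character
    ID starting with "RC0", followed by exactly " Commandline = ".
    """
    ix = line.find("RC0")          # awk: index($0, "RC0"), but 0-based
    if ix == -1 or line[ix + 8:ix + 23] != " Commandline = ":
        return None
    rc = line[ix:ix + 8]           # awk: substr($0, ix, 8)
    st = line[ix + 23:]            # awk: substr($0, ix + 23)
    return f"{rc};{st}"


sample = ("LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP "
          "ID = RC020232 Commandline = wscript.exe ResetPassword.vbs Current")
print(extract(sample))
```

Like the awk version, this looks only at the first "RC0" in the line, so a line with an earlier "RC0" (for example in a program name) would need a different approach.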

chrism01 05-07-2008 06:29 PM

Perl:
Code:

#!/usr/bin/perl -w

use strict;            # Enforce declarations

my (
    @arr1, $file, $rec, $var2, $var3
  );

$file="test.txt";
open( TXT_FILE, "<$file" ) or
            die "Can't open txt file: $file: $!\n";
@arr1 = <TXT_FILE>;
# remove newline endings
chomp(@arr1);
close(TXT_FILE) or
            die "Can't close txt file: $file: $!\n";
for $rec (@arr1)
{
    if( $rec =~ /RC[0-9]{6}/ )
    {
        # Split rec on '=', get fields we want
        ($var2, $var3) = (split(/=/, $rec))[2,3];

        # Remove unwanted string, replace with ';'
        $var2 =~ s/ Commandline /;/;
        # remove leading spaces
        $var2 =~ s/^\s+//;
        $var3 =~ s/^\s+//;

        # concat & print to stdout
        print "${var2}${var3}\n";
    }
}

Assumes field layout is constant as implied by examples above.

See http://perldoc.perl.org/ if you want more perl explanations, or ask again here.

Markcore 05-07-2008 11:33 PM

a way in bash
 
Code:

#!/bin/bash

# -r keeps backslashes (e.g. in \\ccanet\approot\...) from being eaten by read
while IFS= read -r line; do
        filt="${line#*ID = }"
        if [ "${filt}" = "${line}" ]; then
                continue;
        fi
        id="${filt%% *}"
        code="${filt#*Commandline = }"
        echo "${id};${code}"

done < blah.txt

or i guess...
Code:

#!/bin/bash
while IFS= read -r line; do
        filt="${line#*ID = }"
        [ "${filt}" = "${line}" ] && continue;
        echo "${filt%% *};${filt#*Commandline = }"
done < blah.txt

or i suppose
Code:

#!/bin/bash

while IFS= read -r line; do
        filt="${line#*ID = }"
        [ "${filt}" != "${line}" ] && echo "${filt%% *};${filt#*Commandline = }"
done < blah.txt

Code:

awk -F' = ' 'NF > 1 && $(NF-1) ~ /^RC0/ {split($(NF-1), b, " "); print b[1] ";" $NF}' blah.txt
sorry, I'm bored.

theNbomr 05-08-2008 10:55 AM

Obligatory perl one-liner offering:
Code:

perl -e 'while(<>){if($_ =~ m/(RC0[0-9]+)\s*Commandline\s*=\s*(.+$)/){ print "$1;$2\n";}}'
Give the text file as an argument (or on stdin). Redirect output to a file.
--- rod.

colucix 05-08-2008 10:59 AM

I love these multiple contributions using different languages! :)

PS - Waiting for a sed and/or python solution...

ghostdog74 05-09-2008 12:54 AM

Code:

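awk 'match($0, /RC[0-9]+.*/) {   # skip lines without an RC id
 s = substr($0, RSTART)          # from the id to the end of the line
 sub(/ Commandline = /, ";", s)  # turn it into the requested id;command form
 print s
}' file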

Python
Code:

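# Python 3 version; prints from the RC id to the end of the line
with open("file") as f:
    for line in f:
        words = line.split()
        for m, word in enumerate(words):
            if word.startswith("RC") and word[2:].isdigit():
                print(" ".join(words[m:]))
                break  # one id per line is enough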


syg00 05-09-2008 08:47 PM

I see a sed contribution is still missing. Try this (note I just stole theNbomr's regex, and told sed to use regex-extended)
Code:

sed -nr 's:.*(RC0[0-9]+)\s*Commandline\s*=\s*(.+$):\1;\2:p' testreg.txt
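
The one-liners so far do not cover two points from the original request: the output should contain no duplicates, and the input lines end with CRLF. A minimal Python sketch that reuses theNbomr's regex and adds both might look like this. It assumes that first-seen order should be kept, and the function name unique_entries is invented for illustration.

```python
import re

# Same pattern as the perl and sed one-liners above
PATTERN = re.compile(r"(RC0[0-9]+)\s*Commandline\s*=\s*(.+)$")


def unique_entries(lines):
    """Return 'ID;command' strings in first-seen order, without duplicates."""
    seen = set()
    result = []
    for line in lines:
        # Strip LF and CR so a trailing carriage return stays out of the output
        match = PATTERN.search(line.rstrip("\r\n"))
        if match:
            entry = f"{match.group(1)};{match.group(2)}"
            if entry not in seen:
                seen.add(entry)
                result.append(entry)
    return result


log = [
    "LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP ID = RC020232 "
    "Commandline = wscript.exe ResetPassword.vbs Current\r\n",
    "Ending process 13.01.2007 19:00:02\r\n",
    "LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP ID = RC020232 "
    "Commandline = wscript.exe ResetPassword.vbs Current\r\n",
]
for entry in unique_entries(log):
    print(entry)
```

Since the function only iterates over lines, an open file object can be passed in place of the list.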

dfezz1 08-05-2008 03:15 PM

OK OK I am SUCH a newbie that I don't get it. I admit it!

So I need to do something similar, but my example is easier and your answer to my question might help me understand the replies above.

Thank You in advance.

Here is the situation.

Linux RHEL4.6

I want to disable "CTRL-ALT-DEL" in the /etc/inittab

I want to replace:
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

With:
# Changed 8-5-08 -dfezz1 (disabling ctrl-alt-del at console)
ca:12345:ctrlaltdel:/bin/echo "CTRL-ALT-DEL is disabled"


I have tried the simplest SED I know:
$sed 's/replace_please/REPLACED_THX/g' /tmp/dummy
$sed 's/ca::ctrlaltdel:/sbin/shutdown -t3 -r now/ca:12345:ctrlaltdel:/bin/echo "CTRL-ALT-DEL is disabled"/g' /tmp/dummy


As you can tell from my feeble attempt, it didn't work, spaces and quotes seem to be the main reason.
Any help???

PS NO LAUGHING....I HATE TO BE LAUGHED AT :) just joking
Thanks
-dfezz1


My /etc/inittab:
For Ref.
########################################

[root@myserver Project_Server_Files]# cat /etc/inittab
#
# inittab This file describes how the INIT process should set up
# the system in a certain run-level.
#
# Author: Miquel van Smoorenburg, <miquels@drinkel.nl.mugnet.org>
# Modified for RHS Linux by Marc Ewing and Donnie Barnes
#

# Default runlevel. The runlevels used by RHS are:
# 0 - halt (Do NOT set initdefault to this)
# 1 - Single user mode
# 2 - Multiuser, without NFS (The same as 3, if you do not have networking)
# 3 - Full multiuser mode
# 4 - unused
# 5 - X11
# 6 - reboot (Do NOT set initdefault to this)
#
id:3:initdefault:

# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Trap CTRL-ALT-DELETE
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

# When our UPS tells us power has failed, assume we have a few minutes
# of power left. Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"


# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
########################################################################
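
The second sed attempt above fails because the "/" characters in /sbin/shutdown and /bin/echo collide with the "/" delimiter of the s command, not because of the spaces or quotes. sed accepts another delimiter (the s:...:...:p example earlier in the thread uses ":"), or the slashes can be escaped. The same line replacement can also be sketched in Python, where no delimiter is involved. The function name disable_ctrl_alt_del is hypothetical, and the sketch works on a string rather than touching /etc/inittab directly.

```python
OLD = "ca::ctrlaltdel:/sbin/shutdown -t3 -r now"
NEW = (
    "# Changed 8-5-08 -dfezz1 (disabling ctrl-alt-del at console)\n"
    'ca:12345:ctrlaltdel:/bin/echo "CTRL-ALT-DEL is disabled"'
)


def disable_ctrl_alt_del(text):
    """Replace the ctrlaltdel line in inittab-style text; other lines stay as they are."""
    lines = text.split("\n")
    return "\n".join(NEW if line == OLD else line for line in lines)


# A few lines taken from the inittab listing above
sample = (
    "# Trap CTRL-ALT-DELETE\n"
    "ca::ctrlaltdel:/sbin/shutdown -t3 -r now\n"
    "\n"
    "x:5:respawn:/etc/X11/prefdm -nodaemon\n"
)
print(disable_ctrl_alt_del(sample))
```

Only a line that matches the old entry exactly is replaced, so comments and the other inittab entries pass through unchanged.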


All times are GMT -5.