LinuxQuestions.org
Old 05-07-2008, 02:49 AM   #1
climber75
LQ Newbie
 
Registered: May 2008
Posts: 3

Rep: Reputation: 0
Modify a text file with awk/sed/perl


Hi,

I have a huge text file from which I have to filter various command lines that always start with the same characters.

So here's a little example

------------------------------
This is the first line with no information
This is the second important line with the following Code in the middle RC0xxx Command = Important code"

This is another line with no important information
This is another line with no important information

This is a further important line in the middle with RC0xxx Command = Important code"

..
...
....
and so on

------------------------------------
From the code "RC0xxx Command = Important code" I need the following structure:

RC0xxx;Important Code
RC0xxx;Important Code


This output can be written to a separate file. I also have to make sure there are no duplicates in the new file.

It doesn't matter whether it's written in shell/perl/awk/sed or mixed code.

Thanks for your help :-)
climber75

Last edited by climber75; 05-07-2008 at 02:50 AM.
 
Old 05-07-2008, 03:22 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
What have you tried so far? Anyway, here is a little awk code:
Code:
/RC0xxx Command = / {
       # position where the marker string starts (1-based)
       ix = index($0,"RC0xxx Command = ")
       # everything after the 17-character marker
       st = substr($0,ix + 17)
       # cut at the closing double quote
       split(st,array,"\"")
       printf "RC0xxx;%s\n",array[1]
}
First it finds the index at which the string "RC0xxx Command = " starts, then extracts the content of the line after that string, splits it using the double quote as separator and finally prints out the requested line. I have assumed that the double quote at the end of the command is always there; otherwise you have to find some other criterion to establish where the command ends.
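For readers more comfortable with Python, the same three steps (find the marker, take everything after it, cut at the first double quote) could be sketched roughly like this. The function name `extract_command` and the sample line are made up for illustration; the marker is the placeholder string from post #1, not a real ID.

```python
MARKER = "RC0xxx Command = "  # placeholder marker from the example in post #1


def extract_command(line, marker=MARKER):
    """Return 'RC0xxx;<command>' for a matching line, or None otherwise."""
    ix = line.find(marker)            # like awk's index(), but 0-based; -1 if absent
    if ix == -1:
        return None
    rest = line[ix + len(marker):]    # like substr($0, ix + 17)
    command = rest.split('"')[0]      # like split(st, array, "\"") then array[1]
    return "RC0xxx;" + command


if __name__ == "__main__":
    # hypothetical sample line, modelled on the example in post #1
    sample = 'some text RC0xxx Command = Important code" trailing text'
    print(extract_command(sample))    # prints RC0xxx;Important code
```

As with the awk version, this assumes the closing double quote is always present; without it the whole rest of the line is returned.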
 
Old 05-07-2008, 04:47 AM   #3
climber75
LQ Newbie
 
Registered: May 2008
Posts: 3

Original Poster
Rep: Reputation: 0
Thanks for your really quick answer!!! I tried it with awk but I'm not so familiar with it. So I'm amused that you use it :-)

Maybe I should post a few lines of the real text file so it becomes clearer what I want.

-------------------------------------------------------------------------------------------------------------------

COMMANDLINE CHANGE FROM APPROVED LIST
patchinstall.exe /g:144 /n /z:s /f /c:90 /p /t:30 /m:"patchauthorize.xml" changed to PatchInstall.exe /g:168 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
Microsoft Update - Mandatory November - Ran at - 21.11.2006 11:29:00 Program Name = Microsoft Update - Mandatory November ID = RC020376 Commandline = PatchInstall.exe /g:168 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"

Ending process 21.11.2006 11:30:01

21.11.2006 13:00:01

NEW ADVERTISEMENT
LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP ID = RC020232 Commandline = wscript.exe ResetPassword.vbs Current
NEW ADVERTISEMENT
RC_RealPlayer_10.5 - Ran at - 13.01.2007 18:00:00 Program Name = Remove_RC_RealPlayer_8.0 ID = RC020382 Commandline = wscript.exe Remove_Legacy_Realplayer_SMS.vbs
NEW ADVERTISEMENT
RC_RealPlayer_10.5 - Ran at - 13.01.2007 18:00:00 Program Name = RC_Real_Player_10.5_Upgrade_3 ID = RC020383 Commandline = wscript.exe \\ccanet\approot\installs\apps\RC_RealPlayer_10.5\RC_RealPlayer_10.5_push_2.vbs
NEW ADVERTISEMENT
Microsoft Update - Critical January 2007 - Ran at - 13.01.2007 18:00:00 Program Name = Microsoft Update - Critical January 2007 ID = RC020384 Commandline = PatchInstall.exe /g:312 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
NEW ADVERTISEMENT
Microsoft Update - Critical January 2007 - Ran at - 13.01.2007 18:00:00 Program Name = Microsoft Update - Critical January 2007 LoggedOff ID = RC020385 Commandline = PatchInstall.exe /g:312 /n /z:s /f /c:5 /t:30 /m:"PatchAuthorize.xml"

Ending process 13.01.2007 19:00:02

NEW ADVERTISEMENT
Microsoft Update - Critical January 2007 - Ran at - 13.01.2007 18:00:00 Program Name = Microsoft Update - Critical January 2007 LoggedOff ID = RC020385 Commandline = PatchInstall.exe /g:312 /n /z:s /f /c:5 /t:30 /m:"PatchAuthorize.xml"
13.01.2007 20:30:00


Ending process 13.01.2007 20:30:00

13.01.2007 22:00:00
...
...
and so on

------------------------------------------------------------------------------------------------------------


Here's how the output should look:

RC020376;PatchInstall.exe /g:168 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
RC020232;wscript.exe ResetPassword.vbs Current
RC020382;wscript.exe Remove_Legacy_Realplayer_SMS.vbs
RC020383;wscript.exe \\ccanet\approot\installs\apps\RC_RealPlayer_10.5\RC_RealPlayer_10.5_push_2.vbs
RC020384;PatchInstall.exe /g:312 /n /z:s /f /c:90 /p /t:30 /m:"PatchAuthorize.xml"
RC020385;PatchInstall.exe /g:312 /n /z:s /f /c:5 /t:30 /m:"PatchAuthorize.xml"



All entries have the following in common:
They begin with RC0 and end with CRLF (Carriage Return, Line Feed)

I hope this helps

Thanks
climber75

Last edited by climber75; 05-07-2008 at 04:54 AM.
 
Old 05-07-2008, 05:37 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
Looks pretty easy to parse with any of them if, as appears likely, the wanted data extends to eol.
What have you tried ??? - better to get help with specific problems than to expect others to do your work for you.
 
Old 05-07-2008, 06:59 AM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by climber75
I tried it with awk but I'm not so familiar with it.
You cited awk among your options, I thought you were familiar with it. What scripting language are you used to? And what have you tried till now?
 
Old 05-07-2008, 07:05 AM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
You can use sed to strip out the desired patterns... something like this:
sed -n 's/.*\(pattern\)/\1/p' filename > newfilename

To this you simply add another sed command (using -e) to replace the "Commandline = " with ";"

Alternatively, it can all be done inside one sed "s" command.

Really good sed tutorial here: http://www.grymoire.com/Unix/Sed.html
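To illustrate the "one substitution" idea outside sed, here is a rough Python sketch that captures the ID and the command in two groups and rewrites the line in a single `sub` call. The pattern is an assumption based on the sample log lines in post #3, and the function name `to_record` is hypothetical.

```python
import re

# Pattern modelled on the sample log lines in post #3 (an assumption about the format).
PATTERN = re.compile(r"^.*ID = (RC0\d+) Commandline = (.*)$")


def to_record(line):
    """Return 'RCxxxxxx;command' if the line carries an ID, else None."""
    line = line.rstrip("\r\n")        # the log uses CRLF line endings
    if not PATTERN.match(line):
        return None                   # like sed -n ... p: non-matching lines are dropped
    # one substitution: keep group 1, insert ';', keep group 2
    return PATTERN.sub(r"\1;\2", line)


if __name__ == "__main__":
    sample = ("LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP "
              "ID = RC020232 Commandline = wscript.exe ResetPassword.vbs Current\r\n")
    print(to_record(sample))
```

The two capture groups play the same role as `\(...\)` and `\1` in the sed command above.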
 
Old 05-07-2008, 07:56 AM   #7
climber75
LQ Newbie
 
Registered: May 2008
Posts: 3

Original Poster
Rep: Reputation: 0
Normally I use shell scripting, but in this case I thought awk would be helpful. So I bought an awk book and am going through it step by step. I still prefer awk to solve this problem because of the learning effect.

So if you still have time to help me, that would be nice!

Why this script:
Every RCxxx entry in this text file triggers an e-mail once I have put it in the required format. That way we are always informed of automatic installations that happen in the background.


Regards
climber75
 
Old 05-07-2008, 11:25 AM   #8
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
A very good awk book is the official guide, here. Anyway, following my previous post, you can try something like
Code:
/RC0..... Commandline =/ {
       # position of the ID (1-based); the ID itself is 8 characters long
       ix = index($0,"RC0")
       rc = substr($0,ix,8)
       # skip the ID plus " Commandline = " (8 + 15 = 23 characters)
       st = substr($0,ix+23)
       printf "%s;%s\n",rc,st
}
You can try to work out for yourself exactly what this code does (indeed, it's not difficult at all). There are other ways to do the same thing in awk, for example by parsing fields instead of extracting substrings, but I prefer this method. Cheers!
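For comparison, the field-parsing alternative mentioned above might look something like this in Python. It is only a sketch: the function name `parse_fields` is made up, and it assumes that " = " separates the fields and never occurs inside the command itself.

```python
def parse_fields(line):
    """Split a log line on ' = ' and return 'ID;command', or None."""
    fields = line.rstrip("\r\n").split(" = ")   # also drops the CRLF ending
    if len(fields) < 2:
        return None                     # no ' = ' at all, e.g. "NEW ADVERTISEMENT"
    id_field = fields[-2]               # e.g. "RC020232 Commandline"
    if not id_field.startswith("RC0"):
        return None
    rc = id_field.split(" ")[0]         # keep the ID, drop the word "Commandline"
    return rc + ";" + fields[-1]        # last field is the command


if __name__ == "__main__":
    sample = ("LAP - Ran at - 13.01.2007 18:00:00 Program Name = LAP "
              "ID = RC020232 Commandline = wscript.exe ResetPassword.vbs Current\r\n")
    print(parse_fields(sample))
```

Compared with fixed offsets, this does not depend on the ID being exactly 8 characters long, but it does depend on the field separator staying the same.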
 
Old 05-07-2008, 06:29 PM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
Perl:
Code:
#!/usr/bin/perl -w

use strict;             # Enforce declarations

my (
    @arr1, $file, $rec, $var2, $var3
   );

$file="test.txt";
open( TXT_FILE, "<$file" ) or
            die "Can't open txt file: $file: $!\n";
@arr1 = <TXT_FILE>;
# remove newline endings
chomp(@arr1);
close(TXT_FILE) or
            die "Can't close txt file: $file: $!\n";
for $rec (@arr1)
{
    if( $rec =~ /RC[0-9]{6}/ )
    {
        # Split rec on '=', get fields we want
        ($var2, $var3) = (split(/=/, $rec))[2,3];

        # Remove unwanted string, replace with ';'
        $var2 =~ s/ Commandline /;/;
        # remove leading spaces
        $var2 =~ s/^\s+//;
        $var3 =~ s/^\s+//;

        # concat & print to stdout
        print "${var2}${var3}\n";
    }
}
Assumes field layout is constant as implied by examples above.

See http://perldoc.perl.org/ if you want more perl explanations, or ask again here.
 
Old 05-07-2008, 11:33 PM   #10
Markcore
LQ Newbie
 
Registered: May 2006
Location: Vancouver, bc
Distribution: Slackware 11.0
Posts: 9

Rep: Reputation: 0
a way in bash

Code:
#!/bin/bash

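# -r keeps backslashes (e.g. in \\ccanet\approot\...) intact
while IFS= read -r line; do
        filt="${line#*ID = }"
        # no "ID = " in this line: nothing was stripped, so skip it
        if [ "${filt}" = "${line}" ]; then
                continue
        fi
        id="${filt%% *}"
        code="${filt#*Commandline = }"
        printf '%s\n' "${id};${code}"

done < blah.txt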
or i guess...
Code:
#!/bin/bash
while IFS= read -r line; do
        filt="${line#*ID = }"
        [ "${filt}" = "${line}" ] && continue
        printf '%s\n' "${filt%% *};${filt#*Commandline = }"
done < blah.txt
or i suppose
Code:
#!/bin/bash

while IFS= read -r line; do
        filt="${line#*ID = }"
        [ "${filt}" != "${line}" ] && printf '%s\n' "${filt%% *};${filt#*Commandline = }"
done < blah.txt
Code:
awk -F' = ' 'NF > 1 && $(NF-1) ~ /^RC0/ {split($(NF-1), b, " "); print b[1] ";" $NF}' blah.txt
sorry, I'm bored.

Last edited by Markcore; 05-08-2008 at 03:31 AM.
 
Old 05-08-2008, 10:55 AM   #11
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Obligatory perl one-liner offering:
Code:
perl -e 'while(<>){if($_ =~ m/(RC0[0-9]+)\s*Commandline\s*=\s*(.+$)/){ print "$1;$2\n";}}'
Give the input text file as an argument. Redirect the output to a file.
--- rod.
 
Old 05-08-2008, 10:59 AM   #12
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
I love these multiple contributions using different languages!

PS - Waiting for a sed and/or python solution...
 
Old 05-09-2008, 12:54 AM   #13
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
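awk 'match($0, /RC[0-9]+ Commandline = /) {
 # keep only the part from the ID onwards, then turn the label into ";"
 rest = substr($0, RSTART)
 sub(/ Commandline = /, ";", rest)
 print rest
}' file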
Python
Code:
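for line in open("file"):
    words = line.split()
    for m, word in enumerate(words):
        # an ID is "RC" followed only by digits, e.g. RC020376
        if word.startswith("RC") and word[2:].isdigit():
            # skip the two words "Commandline" and "=" after the ID
            print(word + ";" + " ".join(words[m + 3:]))
            break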
 
Old 05-09-2008, 08:47 PM   #14
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
I see a sed contribution is still missing. Try this (note I just stole theNbomr's regex, and told sed to use regex-extended)
Code:
sed -nr 's:.*(RC0[0-9]+)\s*Commandline\s*=\s*(.+$):\1;\2:p' testreg.txt
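The original request also asked that the output contain no duplicates, and the sample log in post #3 does repeat the RC020385 entry. A minimal sketch of that step in Python, reusing the same regex idea, might look like this. The names `ENTRY` and `unique_records` and the demo lines are made up for illustration.

```python
import re

# Same idea as the regex used in the perl and sed one-liners in this thread.
ENTRY = re.compile(r"(RC0[0-9]+)\s*Commandline\s*=\s*(.+)")


def unique_records(lines):
    """Yield 'ID;command' once per distinct record, in first-seen order."""
    seen = set()
    for line in lines:
        m = ENTRY.search(line.rstrip("\r\n"))   # strip the CRLF ending first
        if m is None:
            continue
        record = m.group(1) + ";" + m.group(2)
        if record not in seen:                  # emit each record only once
            seen.add(record)
            yield record


if __name__ == "__main__":
    # shortened, hypothetical demo lines in the style of the sample log
    demo = [
        "x ID = RC020385 Commandline = PatchInstall.exe /g:312\r\n",
        "NEW ADVERTISEMENT\r\n",
        "x ID = RC020385 Commandline = PatchInstall.exe /g:312\r\n",
    ]
    for rec in unique_records(demo):
        print(rec)
```

On the shell side, piping the output of any of the one-liners through `sort -u` should also remove duplicates, at the cost of reordering the lines.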
 
Old 08-05-2008, 03:15 PM   #15
dfezz1
LQ Newbie
 
Registered: Mar 2008
Posts: 11

Rep: Reputation: 0
OK OK I am SUCH a newbie that I don't get it. I admit it!

So I need to do something similar, but my example is easier, and your answer to my question might help me understand the replies above:

Thank You in advance.

Here is the situation.

Linux RHEL4.6

I want to disable "CTRL-ALT-DEL" in the /etc/inittab

I want to replace:
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

With:
# Changed 8-5-08 -dfezz1 (disabling ctrl-alt-del at console)
ca:12345:ctrlaltdel:/bin/echo "CTRL-ALT-DEL is disabled"


I have tried the simplest SED I know:
$sed 's/replace_please/REPLACED_THX/g' /tmp/dummy
$sed 's/ca::ctrlaltdel:/sbin/shutdown -t3 -r now/ca:12345:ctrlaltdel:/bin/echo "CTRL-ALT-DEL is disabled"/g' /tmp/dummy


As you can tell from my feeble attempt, it didn't work; spaces and quotes seem to be the main reason.
Any help???

PS NO LAUGHING....I HATE TO BE LAUGHED AT just joking
Thanks
-dfezz1


My /etc/inittab:
For Ref.
########################################

[root@myserver Project_Server_Files]# cat /etc/inittab
#
# inittab This file describes how the INIT process should set up
# the system in a certain run-level.
#
# Author: Miquel van Smoorenburg, <miquels@drinkel.nl.mugnet.org>
# Modified for RHS Linux by Marc Ewing and Donnie Barnes
#

# Default runlevel. The runlevels used by RHS are:
# 0 - halt (Do NOT set initdefault to this)
# 1 - Single user mode
# 2 - Multiuser, without NFS (The same as 3, if you do not have networking)
# 3 - Full multiuser mode
# 4 - unused
# 5 - X11
# 6 - reboot (Do NOT set initdefault to this)
#
id:3:initdefault:

# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Trap CTRL-ALT-DELETE
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

# When our UPS tells us power has failed, assume we have a few minutes
# of power left. Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"


# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
########################################################################
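One thing worth noting about the sed attempt above: `/` is used as the delimiter of the `s` command while also appearing in the paths, so sed most likely stops parsing at `/sbin/`, and a different delimiter or escaped slashes would be needed. As a hedged sketch of the same replacement in Python, where no delimiter is involved, something like this could work. The function name `disable_ctrl_alt_del` is hypothetical, and the code only transforms a string; writing the result back to /etc/inittab is deliberately left out.

```python
OLD = "ca::ctrlaltdel:/sbin/shutdown -t3 -r now"
NEW = (
    "# Changed 8-5-08 -dfezz1 (disabling ctrl-alt-del at console)\n"
    'ca:12345:ctrlaltdel:/bin/echo "CTRL-ALT-DEL is disabled"'
)


def disable_ctrl_alt_del(text):
    """Replace the ctrlaltdel line in inittab-style text; other lines are untouched."""
    out = []
    for line in text.splitlines():
        # exact comparison, so slashes, spaces and quotes need no escaping
        out.append(NEW if line.strip() == OLD else line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# Trap CTRL-ALT-DELETE\nca::ctrlaltdel:/sbin/shutdown -t3 -r now\n"
    print(disable_ctrl_alt_del(sample), end="")
```

Trying it on a copy such as /tmp/dummy first, as in the attempt above, seems the safer route before touching the real file.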
 
  

