removing lines from file script

iluvatar · 08-18-2004, 03:28 AM

Hello everybody,

I'm writing a script in BASH to clean up some text files, which went well until this problem: I must remove all lines which start with "VAKVM" and DON'T contain one of the words "RAS", "PROD" or "DRC". So imagine this file:

Code:

Hello
Everybody
VAKVM hello
VAKVM there is RAS in this sentence
VAKVM this one stays not
VAKVM PROD=15, DRC=14
PROD without VAKVM at start
And so on
Next
VAKVM please remove
VAKVM thanx PROD

must become this:

Code:

Hello
Everybody
VAKVM there is RAS in this sentence
VAKVM PROD=15, DRC=14
PROD without VAKVM at start
And so on
Next
VAKVM thanx PROD

Is the 'sed' command good for this? If so, how do I do this?

thanx in advance,
.-=~ iluvatar ~=-.

Edit: to make it even clearer: I cannot check for RAS first and then for DRC, because while checking for RAS a line with DRC could already be deleted. Can someone help me out with this? By the way, I wrote the script in BASH, but if someone gives me a simple solution in C / C++ or something, that's OK too.

iluvatar · 08-18-2004, 05:14 AM

Ok I came up with this:

sed '/RAS/!s/VAKVM.*$//g' test_file

which blanks the lines with VAKVM at the start and no RAS in the rest of the line, but how can I make sed check for more words in the line?
This way, for example, doesn't work:

sed '/{RAS|DRC}/!s/VAKVM.*$//g' test_file

Does anyone have an idea? The blank lines can simply be removed with another sed command, so I'm happy enough if this works.

thanx and greetz,
.-=~ iluvatar ~=-.
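
One way to make sed check for several words at once is extended-regex alternation inside a nested address. The following is a minimal sketch, not something posted in the thread: it assumes a sed that supports `-E` (GNU and BSD sed do), it anchors VAKVM to the start of the line, and it deletes the unwanted lines outright instead of blanking them. The variable names `input` and `filtered` are only for illustration.

```shell
# Sample input from the first post, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Hello
Everybody
VAKVM hello
VAKVM there is RAS in this sentence
VAKVM this one stays not
VAKVM PROD=15, DRC=14
PROD without VAKVM at start
And so on
Next
VAKVM please remove
VAKVM thanx PROD
EOF

# On lines that start with VAKVM, delete the line unless it contains
# RAS, PROD or DRC. With -E, the | inside the regex means "or".
filtered=$(sed -E '/^VAKVM/{/RAS|PROD|DRC/!d;}' "$input")
printf '%s\n' "$filtered"
rm -f "$input"
```

Because the lines are deleted rather than emptied, no second sed pass for blank lines should be needed.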

jschiwal · 08-18-2004, 05:29 AM

You could produce a sed script which consists of two lines. One line for a RAS line and one line for a DRC line. Then call sed like:
sed -f RasDrc.sed test_file

sed reads a line of the input file and applies each command of the script to that line before reading the next line. A script command that deletes the input line makes sed start over with the next input line.

This would also make future modifications easier. The bash script wouldn't need to change, only the sed script.
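
As a rough illustration of the script-file idea, here is one possible content for RasDrc.sed. It is an assumption about what such a file could look like, not the script jschiwal had in mind: it uses the `b` command (branch to the end of the script, which prints the line) to let wanted lines escape before the final `d`, and it covers all three keywords from the first post. The `workdir` and `kept` names are hypothetical.

```shell
workdir=$(mktemp -d)

# Hypothetical RasDrc.sed: one command per line.
# "b" without a label jumps to the end of the script, so the line is printed.
cat > "$workdir/RasDrc.sed" <<'EOF'
/^VAKVM/!b
/RAS/b
/PROD/b
/DRC/b
d
EOF

printf '%s\n' 'Hello' 'VAKVM hello' 'VAKVM PROD=15, DRC=14' 'VAKVM thanx PROD' \
  > "$workdir/test_file"

# Only VAKVM lines that matched none of the keywords reach the final "d".
kept=$(sed -f "$workdir/RasDrc.sed" "$workdir/test_file")
printf '%s\n' "$kept"
rm -rf "$workdir"
```

Adding a fourth keyword later would then mean adding one more `/WORD/b` line to the script file.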

iluvatar · 08-18-2004, 06:48 AM

Ok I tried it with this script:

Code:

/RAS/!s/VAKVM.*$//g
/DRC/!s/VAKVM.*$//g

This blanks all lines starting with VAKVM except the lines which contain both RAS and DRC. I need the lines with just RAS or just DRC too. I guess sed now does this: on a line with only RAS, the first script line finds RAS and does nothing, but the second script line blanks the line anyway because DRC is not found.

isn't there some way to match a line like:

Code:

[VAKVM<random character(s)>[RAS | DRC]<random character(s)> | !^VAKVM]

then I can simply use:

Code:

sed -e '/<expression>/!d' test_file

am I right?

thanx,
.-=~ iluvatar ~=-.
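
The "keep only what matches" idea above can be sketched with `sed -n` and two print commands instead of a single negated expression. This is an illustration under assumptions, not a tested answer from the thread: it needs a sed with `-E`, and the function name `keep_lines` and the sample lines are made up for the example.

```shell
keep_lines() {
  # -n turns off automatic printing; each "p" prints a line we want to keep.
  # A line either starts with VAKVM or it does not, so the two conditions
  # never both hold and no line is printed twice.
  sed -n -E -e '/^VAKVM/!p' -e '/^VAKVM.*(RAS|PROD|DRC)/p' "$@"
}

kept=$(printf '%s\n' 'Hello' 'VAKVM hello' 'VAKVM only RAS' 'VAKVM only DRC' \
  'PROD without VAKVM at start' | keep_lines)
printf '%s\n' "$kept"
```

Unlike the two-line script, which removes a line as soon as either keyword is missing, this keeps a VAKVM line as soon as any one keyword is present.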

iluvatar · 08-18-2004, 06:58 AM

ok now I'm going crazy

I now use this test file, which covers every possible situation:

Code:

dit is regel 1
regel 2 RAS
VAKVM hallo
VAKVM RAS nr is
VAKVM dit is met RAS
VAKVM DRC
VAKVM dit is RAS en DRC
en dit is een losse regel

(yes, that's Dutch)

Now I tried this sed command:

sed -e '/VAKVM.*[RAS|DRC].*$/,/^!VAKVM/!d' test_file

And I got this output:

Code:

VAKVM RAS nr is
VAKVM dit is met RAS
VAKVM DRC
VAKVM dit is RAS en DRC
en dit is een losse regel

As we can see here, the VAKVM line with 'hallo' is deleted, which is correct.

The last line is kept too because it doesn't start with VAKVM, but what happened to the first two lines???

greetz,
.-=~ iluvatar ~=-.
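
A likely explanation for the missing lines: in sed, `/a/,/b/` is a range from the first line matching `a` to the next line matching `b`, the `!` inside `/^!VAKVM/` is a literal exclamation mark, and `[RAS|DRC]` is a bracket expression that matches a single character, not a word. So the range opens at "VAKVM RAS nr is", never finds a line starting with "!VAKVM", runs to the end of the file, and `!d` deletes everything before it. The bracket-expression part can be checked with a small sketch (the `probe` helper and its sample lines are made up for the demonstration):

```shell
probe() { printf '%s\n' 'A' '|' 'RAS' 'xyz'; }

# [RAS|DRC] matches ONE character out of R, A, S, |, D, C,
# so the lines "A", "|" and "RAS" all count as matches.
bracket_hits=$(probe | grep -c '[RAS|DRC]')

# With extended regexes, RAS|DRC matches either whole word.
alternation_hits=$(probe | grep -Ec 'RAS|DRC')

echo "bracket: $bracket_hits, alternation: $alternation_hits"
```

This prints "bracket: 3, alternation: 1".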

iluvatar · 08-19-2004, 04:49 AM

okay everybody, I solved the problem by simply not using sed. I figured out awk can do the trick more simply:

awk '/^VAKVM.*RAS/ || /^VAKVM.*DRC/ || !/^VAKVM/ {print $0}' test_file

This simply prints all lines which have VAKVM with RAS, or VAKVM with DRC, or any line that doesn't start with VAKVM. It suits perfectly and is way simpler than sed.

greetz,
.-=~ iluvatar ~=-.
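
The first post also listed PROD as a keyword, which the command above leaves out. A possible variant that covers all three words with one alternation is sketched here. It is an assumption about what the final filter should do, and the function name `filter_vakvm` is made up.

```shell
filter_vakvm() {
  # Print a line if it does not start with VAKVM,
  # or if it mentions any of the three keywords.
  awk '!/^VAKVM/ || /RAS|PROD|DRC/ { print }' "$@"
}

filtered=$(printf '%s\n' 'Hello' 'VAKVM hello' 'VAKVM there is RAS in this sentence' \
  'VAKVM PROD=15, DRC=14' 'PROD without VAKVM at start' 'VAKVM please remove' \
  | filter_vakvm)
printf '%s\n' "$filtered"
```

The function reads standard input when called without arguments and accepts file names otherwise, for example `filter_vakvm test_file`.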

tonyfreeman · 08-19-2004, 06:39 PM

OK ... I'm a little late but this is some great information! Thanks for the notes!

-- Tony
PS: this code also works, because printing the line is awk's default action when no action is given:

Code:

awk '/^VAKVM.*RAS/ || /^VAKVM.*DRC/ || !/^VAKVM/' test_file

iluvatar · 08-20-2004, 01:56 AM

I figured it's always good to post the solutions. I came up with another, even more complex awk command, because the file actually looks like this:

Code:

bla
...
bla
VAKVM    31    1   blabla
VAKVM    31    2   blabla and RAS
VAKVM    31    3   blabla
VAKVM    44    1   blabla
VAKVM    44    2   blabla and DRC
VAKVM    44    3   etc.
VAKVM    44    4   last one DRC and RAS
blah
...
blah

As you can see, the VAKVM lines are numbered (third column), and those need to be renumbered after some lines are deleted. Therefore I came up with this:

Code:

awk -F"\t" 'BEGIN{T1="1"; T2="1"} $2=="31" {print $1"\t"$2"\t"T1"\t"$4; T1++};$2=="44" {print $1"\t"$2"\t"T2"\t"$4; T2++}; !/^VAKVM/' test_file

I simply love awk

greetz,
.-=~ iluvatar ~=-.
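
Filtering and renumbering could also be combined in one awk pass that does not hard-code the group numbers 31 and 44. The sketch below is a generalisation, not the command from the post: it assumes the real file is tab-separated (as the `-F"\t"` above suggests), treats PROD as a keyword too, and uses an array named `count` (a made-up name) to keep one counter per value of the second column.

```shell
# Tab-separated sample shaped like the file described above.
input=$(mktemp)
{
  printf 'bla\n'
  printf 'VAKVM\t31\t1\tblabla\n'
  printf 'VAKVM\t31\t2\tblabla and RAS\n'
  printf 'VAKVM\t31\t3\tblabla\n'
  printf 'VAKVM\t44\t1\tblabla\n'
  printf 'VAKVM\t44\t2\tblabla and DRC\n'
  printf 'VAKVM\t44\t3\tetc.\n'
  printf 'VAKVM\t44\t4\tlast one DRC and RAS\n'
  printf 'blah\n'
} > "$input"

renumbered=$(awk 'BEGIN { FS = OFS = "\t" }
  /^VAKVM/ {
    if (!/RAS|PROD|DRC/) next   # drop VAKVM lines without a keyword
    $3 = ++count[$2]            # one running counter per value of column 2
  }
  { print }' "$input")
printf '%s\n' "$renumbered"
rm -f "$input"
```

Lines that do not start with VAKVM pass through untouched, since no field is assigned on them.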

maxfacta · 08-20-2004, 05:42 AM

I couldn't bear to see so many posts on a problem such as this without a single mention of perl!!

Code:

 perl -ni -e 'print unless /^VAKVM/ and ! /RAS|PROD|DRC/' test_file

The re-numbering is more complicated - I'd like to see that in a one-liner !

iluvatar · 08-20-2004, 05:49 AM

LOL

it's unbelievable to see how many solutions there are in 2 days for a problem you knew nothing about 2 days ago... I don't have any knowledge about perl but if awk can do the line numbering in a one-liner I bet perl can do it too...

greetz,
.-=~ iluvatar ~=-.