LinuxQuestions.org
Old 04-29-2008, 09:33 AM   #1
J_Szucs
Senior Member
 
Registered: Nov 2001
Location: Budapest, Hungary
Distribution: SuSE 6.4-11.3, Dsl linux, FreeBSD 4.3-6.2, Mandrake 8.2, Redhat, UHU, Debian Etch
Posts: 1,126

Delete a line range with sed


I have a long text file with lines like these:

...
BT
sometextlines
ET
BT
sometextlines
ET
BT
LINETOBEDELETED1
LINETOBEDELETED2
LINETOBEDELETED3
ET
BT
sometextlines
ET
...

I want to delete only these 5 consecutive lines from the above text file:
BT
LINETOBEDELETED1
LINETOBEDELETED2
LINETOBEDELETED3
ET

(Those five consecutive lines can occur more than once in the text file, and all occurrences should be deleted.)

Can it be done with sed (on FreeBSD)?

Last edited by J_Szucs; 04-29-2008 at 09:36 AM.
 
Old 04-29-2008, 10:54 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Hi,

Could you give a bit more information/clarity?

If I understand correctly: If there are 3 lines between BT and ET they should be deleted (including the BT and ET lines), no matter what these 3 lines are.

- Are these 3 lines 'random' (no uniqueness you can use)?
- Are there entries that have 4 or more lines between BT and ET?
- Must sed be used (I think it can be done in awk, not sure yet)?
- Does FreeBSD use GNU awk and/or GNU sed?
 
Old 04-29-2008, 11:05 AM   #3
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,488

You can parse a multi-line pattern using the N command to retrieve the next line of input and append it to the pattern space, separated by a newline character. Then you can match the multi-line pattern and delete it. An example:
Code:
sed '/BT/{N;N;N;N;/BT\nLINETOBEDELETED1\nLINETOBEDELETED2\nLINETOBEDELETED3\nET/d}' testfile
Every time the BT string is encountered, sed retrieves the next 4 lines. These lines are appended to the pattern space, and you can match them as if they were a single line, like this:
Code:
BT\nLINETOBEDELETED1\nLINETOBEDELETED2\nLINETOBEDELETED3\nET
However, the above code DOES NOT WORK properly, because every time BT is encountered, sed consumes the next 4 lines of input. So if you have an input like this:
Code:
BT
singleline
ET
BT
LINETOBEDELETED1
LINETOBEDELETED2
LINETOBEDELETED3
ET
the first five lines (from the first BT through LINETOBEDELETED1) are processed together; sed then continues from the line LINETOBEDELETED2 and misses the BT on the fourth line (which is the one we are interested in).

You can prevent this by reading the next line only if the line just read contains the string to be deleted, that is:
Code:
sed '
# Read one more line only while the block still looks like the target.
/BT/{
    N
    /LINETOBEDELETED1/{
        N
        /LINETOBEDELETED2/{
            N
            /LINETOBEDELETED3/{
                N
                /BT\nLINETOBEDELETED1\nLINETOBEDELETED2\nLINETOBEDELETED3\nET/d
            }
        }
    }
}' testfile
I split the code across multiple lines to make it more readable.
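
A variant worth considering, as a sketch rather than a tested fix for the real file: instead of nesting one test per line, collect every block from a line that is exactly BT up to the line that is exactly ET, then delete the block only if it is exactly the five target lines. The function name delete_stamp and the labels collect and check are made up for this example. Each command sits on its own line, which should suit non-GNU sed implementations as well.

```shell
# delete_stamp: hypothetical helper; reads the named files or stdin.
delete_stamp() {
    sed '
# Lines that are not exactly "BT" are printed unchanged.
/^BT$/!b
:collect
# Stop collecting once the block ends with an "ET" line, or at end of input.
/\nET$/b check
$b check
N
b collect
:check
# Delete the collected block only if it is exactly the five target lines.
/^BT\nLINETOBEDELETED1\nLINETOBEDELETED2\nLINETOBEDELETED3\nET$/d
' "$@"
}
```

Usage would be something like: delete_stamp testfile > testfile.new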
 
Old 04-29-2008, 11:34 AM   #4
druuna
Hi,

@colucix: Nice sed solution! Only downside (also a question I asked in a previous reply): The lines that need deleting need to be unique enough to be used as regexp.

Here's a rough awk script. It only handles blocks with 1, 2 or 3 lines between BT and ET: 1- and 2-line blocks are printed, 3-line blocks are dropped (it could be expanded to a certain point):
Code:
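#!/bin/bash

# Note: a multi-character RS is an extension in some awks (gawk, for one);
# POSIX leaves the result unspecified.
awk 'BEGIN { RS = "BT" ; FS = "\n" ; OFS = "\n" }
{
  # NF == 4: one line between BT and ET (fields: "", line, "ET", "")
  if ( NF == 4 ) {
    print "BT"
    print $2, $3
  }

  # NF == 5: two lines between BT and ET
  if ( NF == 5 ) {
    print "BT"
    print $2, $3, $4
  }

  # NF == 6 (three lines in between) is deliberately not printed
}' infile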
A test run:
Code:
$ ./awk.del.range.sh 
BT
01sometextlines
ET
BT
02sometextlines
ET
BT
03sometextlines
ET
BT
04sometextlines
04sometextlines
ET
BT
05sometextlines
05sometextlines
ET
BT
06sometextlines
ET
You can extend it by adding these sections:
Code:
if ( NF == 7 ) {
  print "BT"
  print $2, $3, $4, $5, $6
}
if ( NF == 8 ) {
  print "BT"
  print $2, $3, $4, $5, $6, $7
}
Don't add a case for NF == 6: that is the block you want to cut out.

Hope this helps.
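
For blocks of any length, a different approach (not the RS trick above) is to buffer each block line by line and decide at ET whether to print it. This is only a sketch under two assumptions: a block starts at a line that is exactly BT and ends at a line that is exactly ET, and the blocks to drop are the ones with exactly three lines in between. The name drop_three_line_blocks is invented here. It sticks to ordinary one-line records, so it should not depend on a multi-character RS.

```shell
# drop_three_line_blocks: hypothetical helper; reads the named files or stdin.
drop_three_line_blocks() {
    awk '
    # A line that is exactly "BT" opens a new block buffer.
    !inblock && $0 == "BT" { inblock = 1; buf = ""; n = 0 }
    inblock {
        buf = buf $0 "\n"
        n++
        if ($0 == "ET") {
            # n counts BT and ET too, so n == 5 means three lines in between.
            if (n != 5) printf "%s", buf
            inblock = 0
        }
        next
    }
    # Anything outside a BT/ET block is passed through unchanged.
    { print }
    # An unterminated block at end of input is printed as is.
    END { if (inblock) printf "%s", buf }
    ' "$@"
}
```

Usage would be something like: drop_three_line_blocks infile > outfile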
 
Old 04-29-2008, 12:17 PM   #5
colucix
Quote:
Originally Posted by druuna View Post
@colucix: Nice sed solution! Only downside (also a question I asked in a previous reply): The lines that need deleting need to be unique enough to be used as regexp.
Yes, you're right! What about adding the "beginning of line" (^) and "end of line" ($) anchors to make the patterns unique?
Code:
/^BT$/
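
To see the difference the anchors make, here is a small illustration. The input line SUBTOTAL is invented for the example: it contains the letters BT but is not a BT marker line. The two function names are hypothetical too.

```shell
# Count the lines sed selects with the unanchored pattern.
count_unanchored() { sed -n '/BT/p' "$@" | wc -l | tr -d ' '; }

# Count the lines sed selects when BT must be the whole line.
count_anchored() { sed -n '/^BT$/p' "$@" | wc -l | tr -d ' '; }

# SUBTOTAL is a made-up line that merely contains "BT".
printf 'BT\nSUBTOTAL\nET\n' | count_unanchored   # prints 2
printf 'BT\nSUBTOTAL\nET\n' | count_anchored     # prints 1
```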
 
Old 04-29-2008, 03:30 PM   #6
J_Szucs
Thanks to you all!

Now I only have to figure out what to use for the newline character in those expressions for the non-gnu sed or awk. (That's a lot of googling any time I need it, as I am unable to remember that).

Anyway, it was a "serious simplification" to call the file a text file. It is actually an uncompressed PDF file that has ASCII text and binary parts, too. My colleagues use a Windows PDF editor that puts an unwanted "stamp" (some fixed text) in the PDF files it has edited. I would like to help with a web-based tool to remove that stamp. It could be cleanly removed with vi, so maybe sed called from PHP would do it, too. (I am not good enough at PHP to do it all in PHP.)

Last edited by J_Szucs; 04-29-2008 at 03:32 PM.
 
Old 04-29-2008, 03:59 PM   #7
druuna
Hi,

Quote:
Now I only have to figure out what to use for the newline character in those expressions for the non-gnu sed or awk. (That's a lot of googling any time I need it, as I am unable to remember that).
Have you tried \r instead of \n?

Another thing: editing binary files can be 'risky'. Ed takes into account whether a file is binary or not (and accordingly will or will not add the newline character). Vi (vim, not sure about the original vi) can be started with the -b option to protect binary files.

I cannot find anything in the sed or awk manuals about it. With sed I don't know (isn't it based on ed?), maybe it works. Awk will probably corrupt your binary file(s).

Bottom line: Be sure to test the tool/solution you want to use on a copy, and/or back up the originals first.
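
On the newline question: POSIX sed is specified to accept \n inside a regular expression, where it matches a newline that N has put into the pattern space. What it does not promise is \n in the replacement part of s; the portable spelling there is a backslash followed by a real line break. (\r is a carriage return, a different character.) A small sketch with invented function names, putting every command on its own line so that no command is directly followed by } on the same line, which some non-GNU seds reportedly reject:

```shell
# drop_empty_pair: hypothetical demo; deletes a BT line immediately followed
# by an ET line, matching the embedded newline with \n in the regex.
drop_empty_pair() {
    sed '
/^BT$/{
N
/^BT\nET$/d
}
' "$@"
}

# split_commas: hypothetical demo; inserts a newline in the replacement
# using backslash + line break instead of \n.
split_commas() {
    sed 's/,/\
/g' "$@"
}
```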
 
Old 04-29-2008, 09:33 PM   #8
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Quote:
Originally Posted by druuna View Post
awk 'BEGIN { RS = "BT" ; FS = "\n" ; OFS = "\n" }
{
if ( NF == 4 ) {
print "BT"
print $2, $3
}

if ( NF == 5 ) {
print "BT"
print $2, $3, $4
}

}' infile
You could save on the if statements by using pattern-action rules:
Code:
..
NF==5 {
 print ....
}
NF==4 {
 print ...
}
 
Old 04-29-2008, 11:43 PM   #9
J_Szucs
"Have you tried \r instead of \n?"

Neither of them works with a non-GNU sed. I will check today how awk "behaves".


"editing binary files can be 'risky'."

You may be right. The fact that the lines to be removed are in an ASCII part of the file and that I could remove them cleanly with vi (even without "-b") gives some hope, but it is still risky. Actually, all of yesterday's attempts with sed resulted in a corrupted PDF file. I do not know the reason yet; I will look into it today.

If it turns out that sed treats the binary parts badly, is there some other command line tool for removal of all occurrences of a given byte sequence from a binary file?

Last edited by J_Szucs; 04-30-2008 at 12:04 AM.
 
  

