LinuxQuestions.org

Bert 12-09-2003 04:14 AM

A sed conundrum!
 
sed, the stream editor, is one of those programs that make you wonder how you survived before it (back when you were using other, less powerful OSes).

I've tried to get sed to solve a rather unusual problem I have, and so far I haven't found an efficient solution. Can anyone help out? I'm quite sure there's an answer ...

THE PROBLEM

I have a scorecard, which typically might look like this:

StartScores
Alice 90 points
Barry 12 points
Christabel 23 points
Derrick 17 points
Erica 12 points
Derrick 4 points
Flora 8 points
Barry 12 points
EndScores

Notice that 'Barry 12 points' occurs twice! Score lists like the one above are kept in small text files, and there are lots of them.

Unfortunately, the person taking down the scores made a few mistakes and appended a second copy of the scores to some of the files, like so:

StartScores
Alice 90 points
Barry 12 points
Christabel 23 points
Derrick 17 points
Erica 12 points
Derrick 4 points
Flora 8 points
Barry 12 points
End Scores
Start Scores
Alice 90 points
Barry 12 points
Christabel 23 points
Derrick 17 points
Erica 12 points
Derrick 4 points
Flora 8 points
Barry 12 points
EndScores

This is highly undesirable: unless the proprietary program that reads the scores suddenly becomes more intelligent than it is, it will treat the appended scores as valid, which they are not.

THE SOLUTION

Well, before we look at the solution, I'll just point out some things about the problem - Barry scored 12 points twice in the original scorecard. This means that doing a:
Code:

sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
(which removes non-consecutive duplicate lines) won't work: it would also throw away Barry's second, legitimate 12-point score. Because of greedy matching (?), if you use a sed range command:
Code:

sed '/StartScores/,/EndScores/d'
It will match the last instance of 'EndScores', not the first.

Eeek.

Any ideas anyone? I'm trying to convince people that sed is designed for this problem, so any help would be really appreciated.
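
For what it's worth, a cut-down run suggests greedy matching is not the cause. A sed range closes at the first line that matches its end pattern, and in the appended sample the first end marker is written 'End Scores' (with a space), so /EndScores/ cannot match until the last line. The sketch below is only an illustration; the `sample` function and the shortened data are made up for the demo.

```shell
# Cut-down copy of the appended scorecard. The first end marker has a
# space in it, exactly as in the example above.
sample() {
    printf '%s\n' 'StartScores' 'Alice 90 points' 'End Scores' \
                  'Start Scores' 'Alice 90 points' 'EndScores'
}

# Exact end pattern: 'End Scores' does not match /EndScores/, so the
# range stays open until the final line and all six lines are selected.
strict=$(sample | sed -n '/StartScores/,/EndScores/p')

# End pattern with an optional space: the range closes at the first end
# marker, so only the first three lines are selected.
loose=$(sample | sed -n '/StartScores/,/End *Scores/p')

printf 'strict:\n%s\n\nloose:\n%s\n' "$strict" "$loose"
```

Here -n with p is used instead of d so that the selected range is what gets printed, which makes it easier to see where the range ends.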

druuna 12-09-2003 07:20 AM

I'm not exactly sure what you want, but I assume you want to print the first 'scorecard' and quit once it has been printed. If that is the case:

Code:

sed '/End Scores/q' input.file
To show you that it works:

The input file contains the following (your example plus an extra token on each line for checking):

StartScores
1 Alice 90 points
1 Barry 12 points
1 Christabel 23 points
1 Derrick 17 points
1 Erica 12 points
1 Derrick 4 points
1 Flora 8 points
1 Barry 12 points
End Scores
Start Scores
2 Alice 90 points
2 Barry 12 points
2 Christabel 23 points
2 Derrick 17 points
2 Erica 12 points
2 Derrick 4 points
2 Flora 8 points
2 Barry 12 points
EndScores

$ sed '/End Scores/q' input.file
StartScores
1 Alice 90 points
1 Barry 12 points
1 Christabel 23 points
1 Derrick 17 points
1 Erica 12 points
1 Derrick 4 points
1 Flora 8 points
1 Barry 12 points
End Scores

As shown, it prints only the first 'scorecard'.
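
Since there are lots of these files and the end marker appears both as 'End Scores' and as 'EndScores', a slightly more tolerant variant may be handy. This is a rough sketch, not a tested production script: the function names `first_card` and `clean_all` and the `.clean` suffix are made up here, and it assumes the end marker always starts at the beginning of a line.

```shell
# first_card FILE: print FILE up to and including its first end marker.
# 'End *Scores' accepts both 'EndScores' and 'End Scores'.
first_card() {
    sed '/^End *Scores/q' "$1"
}

# clean_all FILE...: write a cleaned copy next to each scorecard
# (hypothetical .clean suffix) so the original files stay untouched.
clean_all() {
    for f in "$@"; do
        first_card "$f" > "$f.clean"
    done
}
```

Usage would be something like `clean_all *.txt`. Files that were never appended to come out unchanged, because sed simply quits at their one and only end marker.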

Bert 12-09-2003 08:38 AM

That looks like it might be the solution - thanks druuna. I'll try it out and let you know here if it worked.

Notice the simplicity of your solution compared to my overly complex attempts!

I keep forgetting that sed works on a stream, not a buffer in memory!

