LinuxQuestions.org
12-09-2003, 05:14 AM   #1
Bert
Senior Member
 
Registered: Jul 2001
Location: 406292E 290755N
Distribution: GNU/Linux Slackware 8.1, Redhat 8.0, LFS 4.0
Posts: 1,004

Rep: Reputation: 46
A sed conundrum!


Stream Editor is one of those programs that makes you wonder how you survived before it (when you were using other less powerful OSes).

I've tried to get sed to solve a rather unusual problem I have, but I haven't found an efficient solution yet. Can anyone help out? I'm quite sure there's an answer ...

THE PROBLEM

I have a scorecard, which typically might look like this:

StartScores
Alice 90 points
Barry 12 points
Christabel 23 points
Derrick 17 points
Erica 12 points
Derrick 4 points
Flora 8 points
Barry 12 points
EndScores

Notice that 'Barry 12 points' occurs twice! Lists of scores like the one above are kept in small text files, and there are lots of them.

Unfortunately, the person taking down the scores made a few mistakes and appended a second copy of the scorecard to some of the files, like so:

StartScores
Alice 90 points
Barry 12 points
Christabel 23 points
Derrick 17 points
Erica 12 points
Derrick 4 points
Flora 8 points
Barry 12 points
End Scores
Start Scores
Alice 90 points
Barry 12 points
Christabel 23 points
Derrick 17 points
Erica 12 points
Derrick 4 points
Flora 8 points
Barry 12 points
EndScores

This is highly undesirable: unless the proprietary program that reads the scores suddenly becomes more intelligent than it is, it will treat the appended scores as valid, which they are not.

THE SOLUTION

Well, before we look at the solution, I'll just point out some things about the problem - Barry scored 12 points twice in the original scorecard. This means that doing a:
Code:
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
(which removes non-consecutive duplicate lines) won't work: it would also drop Barry's second, legitimate 12-point score. Because of greedy matching (?), if you use a sed range command:
Code:
sed '/StartScores/,/EndScores/d'
it carries on to the last instance of 'EndScores' instead of stopping at the first end marker.
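
One way to check what the range is doing, sketched on a shortened copy of the appended scorecard (the temporary directory and the scores.txt name are made up for the demo). sed ends a range at the first line that matches the second address, so this counts how many lines the range covers and how many lines actually match /EndScores/:

```shell
# Sketch: recreate a shortened copy of the appended scorecard.
# (scores.txt is a hypothetical file name used only for this demo.)
tmpdir=$(mktemp -d)
cat > "$tmpdir/scores.txt" <<'EOF'
StartScores
Alice 90 points
Barry 12 points
End Scores
Start Scores
Alice 90 points
Barry 12 points
EndScores
EOF

# Print (instead of delete) the lines covered by the range, and count them.
range_lines=$(sed -n '/StartScores/,/EndScores/p' "$tmpdir/scores.txt" | wc -l | tr -d ' ')

# Count the lines that match the end address on its own.
end_matches=$(grep -c 'EndScores' "$tmpdir/scores.txt")

echo "range covers $range_lines lines; /EndScores/ matches $end_matches line(s)"
```

On this sample the range covers all 8 lines and /EndScores/ matches only the last one, because the marker at the join is written 'End Scores' with a space. That, rather than greedy matching, would explain why the range runs to the end of the file.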

Eeek.

Any ideas anyone? I'm trying to convince people that sed is designed for this problem, so any help would be really appreciated.

Last edited by Bert; 12-09-2003 at 05:15 AM.
 
12-09-2003, 08:20 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
I'm not exactly sure what you want, but I assume you want to grab the first 'scorecard' and quit once it has been printed. If that is the case:

Code:
sed '/End Scores/q' input.file
To show you that it works:

The input file contains the following (your example plus an extra token for checking):

StartScores
1 Alice 90 points
1 Barry 12 points
1 Christabel 23 points
1 Derrick 17 points
1 Erica 12 points
1 Derrick 4 points
1 Flora 8 points
1 Barry 12 points
End Scores
Start Scores
2 Alice 90 points
2 Barry 12 points
2 Christabel 23 points
2 Derrick 17 points
2 Erica 12 points
2 Derrick 4 points
2 Flora 8 points
2 Barry 12 points
EndScores

$ sed '/End Scores/q' input.file
StartScores
1 Alice 90 points
1 Barry 12 points
1 Christabel 23 points
1 Derrick 17 points
1 Erica 12 points
1 Derrick 4 points
1 Flora 8 points
1 Barry 12 points
End Scores

As shown, only the first 'scorecard' is printed.
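
If some files turn out to have the marker at the join spelled 'EndScores' rather than 'End Scores' (an assumption; the sample only shows the spaced form there), a slightly looser pattern might cover both spellings. A minimal sketch, where first_scorecard is a hypothetical helper name and the demo file is a shortened copy of the input above:

```shell
# Hypothetical helper: print a file up to and including the first end marker,
# whether it is written "EndScores" or "End Scores".
first_scorecard() {
    sed '/End *Scores/q' "$1"
}

# Demo on a shortened copy of the sample (temporary file, name made up).
demo=$(mktemp)
printf '%s\n' 'StartScores' '1 Alice 90 points' '1 Barry 12 points' 'End Scores' \
              'Start Scores' '2 Alice 90 points' '2 Barry 12 points' 'EndScores' > "$demo"

first_scorecard "$demo"
```

A file that was never appended to simply prints in full, since q fires on its only end marker, which is also its last line.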
 
12-09-2003, 09:38 AM   #3
Bert
Senior Member
 
Registered: Jul 2001
Location: 406292E 290755N
Distribution: GNU/Linux Slackware 8.1, Redhat 8.0, LFS 4.0
Posts: 1,004

Original Poster
Rep: Reputation: 46
That looks like it might be the solution - thanks druuna. I'll try it out and let you know here if it worked.

Notice the simplicity of your solution compared to my overly complex attempts!

I keep forgetting that sed works on a stream, not a buffer in memory!
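
Since there are lots of these files, the one-liner could be wrapped in a loop. Note that q makes sed exit altogether, so handing several files to a single sed call would stop at the first marker in the first file; running sed once per file avoids that. A sketch only: the directory layout, the .txt suffix and the trim_scorecards name are assumptions, not something from the thread.

```shell
# Hypothetical helper: cut every *.txt file in a directory off after its
# first "End Scores" line. Output goes to a temporary file first, and the
# original is replaced only if sed succeeded.
trim_scorecards() {
    dir=$1
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue      # skip if the glob matched nothing
        sed '/End Scores/q' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Demo in a scratch directory: a.txt has an appended copy, b.txt does not.
scratch=$(mktemp -d)
printf '%s\n' 'StartScores' 'Alice 90 points' 'End Scores' \
              'Start Scores' 'Alice 90 points' 'EndScores' > "$scratch/a.txt"
printf '%s\n' 'StartScores' 'Barry 12 points' 'EndScores' > "$scratch/b.txt"

trim_scorecards "$scratch"
cat "$scratch/a.txt"
```

Files without an appended copy pass through unchanged. The kept marker is still spelled 'End Scores', so it is worth checking whether the scoring program accepts that spelling; try this on copies of the files first.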
 
  


All times are GMT -5.
