LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

12-10-2011, 11:43 AM   #31
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903

So, would a clean definition of your requirement be 'no branching/looping constructs allowed'? That would make things quite a bit more challenging for most problems. I don't share your background, so I won't pretend to understand how you see that as helpful. I do wonder if it isn't just a bit severe; it certainly limits one of your stated goals, namely to 'learn Linux'.
Now I'm going to have to actually figure out what the posted sed solution does. 8-(

--- rod.
 
12-11-2011, 11:21 AM   #32
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,099

Original Poster
Rep: Reputation: 288
Quote:
Originally Posted by theNbomr
So, would a clean definition of your requirement be 'no branching/looping constructs allowed'?
Let's say "preferred" rather than "required." I may pose a question (such as the first post in this thread) to which I already have a Rexx solution which uses loops. Therefore the question is not, "how can this be done?" It is, "how may this be done with sed or grep?"

Quote:
Originally Posted by theNbomr
That would make things quite a bit more challenging for most problems.
For some problems, anyway. Sometimes I ask for advice thinking "there might be a clever option which does this but I haven't sussed it out of the manual." I *never* post a question without having first made a sincere effort to solve it on my own.

Quote:
Originally Posted by theNbomr
... it certainly limits one of your stated goals, being 'learn Linux'.
I've got to start somewhere and have chosen to start by developing a competence with sed, grep, cut, paste, sort, uniq, nl, rev, comm. Not expertise, but competence.


Quote:
Originally Posted by theNbomr
Now I'm going to have to actually figure out what the posted sed solution does.
I've been picking it apart hoping to figure it out but haven't made much progress. sed may be compared to the APL language (part of my distant past) in this respect: the function is impressive, the syntax is daunting, the code is not self-documenting, the learning curve is difficult... but once you master it, coding is fun!
 
12-11-2011, 07:59 PM   #33
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Hi,

I have been very busy and did not find any spare time for explanations. I finally have some time to go into the details of the 'sed' solution. I will split it up first and then rebuild the most important part step by step. The main part is the substitution loop in the middle, i.e. everything between the initial read loop and the final substitution:
Code:
sed -nr ':a N;$! ba;:b s/(([^ ]+) +[^ \n]+) *([^ \n]*) *(.*\n)\2 +([^ \n]+)/\1 \5 \3 \4/g;tb;s/\n+/\n/gp'
The other parts are not so interesting at the moment. The first part
Code:
:a N;$! ba
simply reads the whole file into the pattern space: 'N' appends the next input line, and '$! ba' jumps back to label 'a' on every line except the last. The last substitution command
Code:
s/\n+/\n/gp
replaces multiple consecutive newlines with a single newline. This is needed because the main substitution leaves empty lines behind, which we do not want.
Let us now try to understand how the main part works by building it up step by step. For that we will use the following simplified data set:
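To try these two pieces in isolation, here is a minimal sketch. It assumes GNU sed (for -r and for \n inside the regex) and spells the read loop with semicolons, ':a;N;$!ba', which GNU sed treats the same way; the input lines are made up for illustration.

```shell
# Assumes GNU sed. ":a;N;$!ba" is the same read-everything loop as ":a N;$! ba":
# N appends the next line to the pattern space, "$!ba" loops until the last line.

# Once everything is in one buffer, the newlines are ordinary characters:
printf '%s\n' one two three | sed ':a;N;$!ba;s/\n/|/g'
# prints: one|two|three

# "s/\n+/\n/g" squeezes each run of newlines (i.e. empty lines) down to one:
printf '%s\n' one '' '' two | sed -nr ':a;N;$!ba;s/\n+/\n/gp'
# prints "one" and "two" on separate lines
```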
Code:
$ cat simple-file
Janice Flavor
Linda Brown
Janice Taylor
Janice Wafer
Now let us try to identify the first two names:
Code:
sed -nr ':a N;$! ba;:b s/([^ ]+ +[^ \n]+)/|\1|\1/p' simple-file
Notice the parentheses. They mark a group that can be back-referenced. That means whatever is matched inside these parentheses is stored in a *special* buffer. The content of this buffer can be accessed with a backreference, in this case '\1'. Try the above example to see what is stored inside '\1'. Whatever is stored in '\1' will appear between the '|' characters.
So we see that the RegEx
Code:
([^ ]+ +[^ \n]+)
will match 'Janice Flavor'. Hopefully it is obvious why, but I am not sure how deep your sed knowledge is at this point, so here are the details.
The first character class
Code:
[^ ]+
matches one or more characters that are NOT a space. It must be followed by one or more spaces. The next character class matches one or more characters that are NEITHER a space NOR a newline. This is important since, with the whole file in the pattern space, 'Flavor' is followed by a newline at this point.
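To see why the newline has to be excluded, compare the two character classes on the sample data. This is a sketch assuming GNU sed; 'sample' is a hypothetical helper that prints the same four lines as simple-file.

```shell
# Assumes GNU sed; sample is a hypothetical stand-in for simple-file.
sample() { printf '%s\n' 'Janice Flavor' 'Linda Brown' 'Janice Taylor' 'Janice Wafer'; }

# [^ \n]+ stops at the end of the line, so \1 is exactly "Janice Flavor":
sample | sed -nr ':a;N;$!ba;s/([^ ]+ +[^ \n]+)/|\1|/p'
# first line printed: |Janice Flavor|

# With [^ ]+ instead, the newline counts as "not a space", so the match
# runs on into the next line and stops only at the space before "Brown":
sample | sed -nr ':a;N;$!ba;s/([^ ]+ +[^ ]+)/|\1|/p'
# first two lines printed: "|Janice Flavor" and "Linda| Brown"
```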
So now we have matched 'Janice Flavor'. Our next objective is to somehow identify the *other* Janices and retrieve their second names. Remember what I said about backreferences? Any pattern that is matched inside () is stored in a *special* buffer. You have 9 of those buffers, accessed as \1 through \9: \1 refers to the content inside the first pair of parentheses, \2 to the content of the second pair, and so on.
Let us capture 'Janice' in a *special* buffer:
Code:
sed -nr ':a N;$! ba;:b s/(([^ ]+) +[^ \n]+)/|\1|\2|/p' simple-file
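As you see, the groups can be nested! The outer pair of parentheses still holds 'Janice Flavor'. The inner pair holds 'Janice' alone. Groups are numbered in the order of their opening parentheses, so the outer group is \1 and the inner one is \2.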
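Here is a quick check of the numbering, as a sketch assuming GNU sed; 'sample' is again a hypothetical helper printing the simple-file lines. The second command adds a third group purely to show that it becomes \3.

```shell
# Assumes GNU sed; sample is a hypothetical stand-in for simple-file.
sample() { printf '%s\n' 'Janice Flavor' 'Linda Brown' 'Janice Taylor' 'Janice Wafer'; }

# Nested groups: \1 is the outer group, \2 the inner one.
sample | sed -nr ':a;N;$!ba;s/(([^ ]+) +[^ \n]+)/|\1|\2|/p'
# first line printed: |Janice Flavor|Janice|

# A third group opened after the one for \2 becomes \3, so the names can be swapped:
sample | sed -nr ':a;N;$!ba;s/(([^ ]+) +([^ \n]+))/\3, \2/p'
# first line printed: Flavor, Janice
```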
Let us refine our RegEx a bit more:
Code:
sed -nr ':a N;$! ba;:b s/(([^ ]+) +[^ \n]+)\n\2/|\1|\2|/p' simple-file
Notice the '\n\2' part. Until now we have only used backreferences on the right-hand side of the substitution command, but we can also use them on the left-hand side. Now our RegEx looks for a first and a second name, followed by a newline and then the same first name again. We do not match 'Janice Flavor' anymore because she is followed by 'Linda'. 'Janice Taylor', however, is followed by 'Janice Wafer' on the next line, so our RegEx does match there.
When we substitute, we do not need the backreference \2 since 'Janice' is already in \1. It would be nice if we could also obtain 'Wafer'. Well, once again we use another group () that we can back-reference:
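A backreference on the left-hand side simply requires the same text to occur again. The sketch below (GNU sed assumed; the doubled-word line and the 'sample' helper are made up for illustration) shows that on its own and then on the sample data.

```shell
# Assumes GNU sed.
# A backreference on the left-hand side matches the same text again;
# a classic use is collapsing a doubled word:
echo 'the the cat' | sed -r 's/([a-z]+) \1/\1/'
# prints: the cat

# sample is a hypothetical stand-in for simple-file.
sample() { printf '%s\n' 'Janice Flavor' 'Linda Brown' 'Janice Taylor' 'Janice Wafer'; }
sample | sed -nr ':a;N;$!ba;s/(([^ ]+) +[^ \n]+)\n\2/|\1|\2|/p'
# lines 3 and 4 are joined, so the third (last) line printed is:
# |Janice Taylor|Janice| Wafer
```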
Code:
sed -nr ':a N;$! ba;:b s/(([^ ]+) +[^ \n]+)\n\2 +([^ \n]+)/|\1|\3|/p' simple-file
After 'Janice' there can be one or more spaces before 'Wafer'. We match 'Wafer' itself with one or more characters that are NEITHER a newline NOR a space. We exclude the space in order to accommodate possible trailing spaces. Our outer (first) group matches 'Janice Taylor' and the third group matches 'Wafer'. Those are what we put into the replacement.
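The following sketch (GNU sed assumed; 'sample' is a hypothetical helper) shows the result on the sample and, with a made-up second input, that trailing spaces after 'Wafer' stay outside the third group.

```shell
# Assumes GNU sed; sample is a hypothetical stand-in for simple-file.
sample() { printf '%s\n' 'Janice Flavor' 'Linda Brown' 'Janice Taylor' 'Janice Wafer'; }
sample | sed -nr ':a;N;$!ba;s/(([^ ]+) +[^ \n]+)\n\2 +([^ \n]+)/|\1|\3|/p'
# third (last) line printed: |Janice Taylor|Wafer|

# Trailing spaces after "Wafer" are not captured, because [^ \n] excludes the space;
# they remain in the pattern space after the closing "|":
printf '%s\n' 'Janice Taylor' 'Janice Wafer  ' | sed -nr ':a;N;$!ba;s/(([^ ]+) +[^ \n]+)\n\2 +([^ \n]+)/|\1|\3|/p'
# prints "|Janice Taylor|Wafer|" followed by the two original trailing spaces
```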

Now let us see if we can work around the interfering 'Linda'. We want 'Janice Flavor' as our first match. 'Flavor' can be followed by any characters, including 'Linda Brown' and some newlines, until we meet 'Janice' again on the third line. (In GNU sed, '.' also matches a newline in the pattern space.) So let us add '.*\n' after our first pair of parentheses to account for that:
Code:
sed -nr ':a N;$! ba;:b s/(([^ ]+) +[^ \n]+).*\n\2 +([^ \n]+)/|\1|\3|/p' simple-file
It finally gets interesting! Notice that the third group does NOT match 'Taylor'. RegExes are GREEDY, i.e. '.*\n\2' will look for the longest possible match. And that is
Code:
\nLinda Brown\nJanice Taylor\nJanice
So the third group will match 'Wafer'. We are getting closer to our goal.
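Running this step on the sample makes the greediness visible. A sketch assuming GNU sed, whose regex engine follows the POSIX leftmost-longest rule; 'sample' is a hypothetical helper.

```shell
# Assumes GNU sed; sample is a hypothetical stand-in for simple-file.
sample() { printf '%s\n' 'Janice Flavor' 'Linda Brown' 'Janice Taylor' 'Janice Wafer'; }
sample | sed -nr ':a;N;$!ba;s/(([^ ]+) +[^ \n]+).*\n\2 +([^ \n]+)/|\1|\3|/p'
# prints a single line: |Janice Flavor|Wafer|
# "Linda Brown" and "Janice Taylor" were swallowed by .*\n and are gone.
```

Losing those two lines is exactly what the next step fixes by wrapping '.*\n' in its own group.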
Our next step is to preserve 'Linda' and basically everything that has been matched by '.*\n'. Yes, once more we use another group that we can backreference:
Code:
sed -nr ':a N;$! ba;:b s/(([^ ]+) +[^ \n]+)(.*\n)\2 +([^ \n]+)/\1 \4 \3/p' simple-file
                                           ^ 3. br   ^ 4. br
Notice that 'Wafer' is now matched by the 4th group and therefore must be back-referenced by \4. \3 holds our previously lost information. I also no longer use '|' on the right-hand side as a visual aid, since it would interfere with the next step.
We still need to get 'Taylor' between 'Flavor' and 'Wafer'. For that, we will extend our RegEx to match 'Janice Flavor' and anything else that follows on the same line:
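Here is what this step produces on the sample, as a sketch assuming GNU sed ('sample' is a hypothetical helper). Because \3 ends with a newline, an empty line appears at the end.

```shell
# Assumes GNU sed; sample is a hypothetical stand-in for simple-file.
sample() { printf '%s\n' 'Janice Flavor' 'Linda Brown' 'Janice Taylor' 'Janice Wafer'; }
sample | sed -nr ':a;N;$!ba;s/(([^ ]+) +[^ \n]+)(.*\n)\2 +([^ \n]+)/\1 \4 \3/p'
# Expected lines:
#   Janice Flavor Wafer   (followed by a space from the replacement)
#   Linda Brown
#   Janice Taylor
#   (an empty line, because \3 ends with a newline)
```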
Code:
sed -nr ':a N;$! ba;:b s/(([^ ]+) +[^ \n]+) *([^\n]*)(.*\n)\2 +([^ \n]+)/\1 \5 \3 \4/;tb;p' simple-file
Two things happen here. First, we use ' *([^\n]*)' to match anything that follows 'Janice Flavor' on the same line. The '*' quantifier matches zero or more occurrences of the pattern, so while 'Janice Flavor' is still alone on the first line, the additional pattern matches nothing. Once the 's' command has run for the first time and appended 'Wafer', it will match 'Wafer'. Also notice that our backreferences have shifted again.
Second, in order to make the 's' command execute again, we use the conditional jump command 't'. It jumps back to label ':b' only if a substitution has succeeded since the last input line was read or the last 't' jump was taken, i.e. only while our 's' command keeps making changes. When our RegEx finds no more matches, we are finished: 't' does not jump, the print command ('p') executes, and sed exits.
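The 't' loop is easier to see on a tiny standalone example first: the well-known thousands-separator idiom below adds one comma per pass and stops once the substitution no longer changes anything. The second command runs the walkthrough's loop on the sample. Both assume GNU sed; 'sample' is a hypothetical helper, and 'tr -s' only squeezes the runs of spaces the loop leaves behind, since their exact number is not the point here.

```shell
# Assumes GNU sed.
# Each pass inserts one comma; tb jumps back to :b only after a successful s.
echo 1234567 | sed -r ':b;s/([0-9])([0-9]{3})($|,)/\1,\2\3/;tb'
# prints: 1,234,567

# sample is a hypothetical stand-in for simple-file.
sample() { printf '%s\n' 'Janice Flavor' 'Linda Brown' 'Janice Taylor' 'Janice Wafer'; }
sample | sed -nr ':a;N;$!ba;:b;s/(([^ ]+) +[^ \n]+) *([^\n]*)(.*\n)\2 +([^ \n]+)/\1 \5 \3 \4/;tb;p' | tr -s ' '
# first line: Janice Flavor Taylor Wafer (possibly with a trailing space)
# second line: Linda Brown, followed by empty lines
```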
That's basically it. As I said at the beginning of the post, our RegEx produces some empty lines. This can be taken care of by using
Code:
s/\n+/\n/g
before we print the pattern space. There are some minor differences between this solution and the one I provided earlier; they account for possible trailing spaces. As it turns out, you also do not need the global flag in the first 's' command.
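Putting the pieces together, a sketch of the finished command on the sample might look like the one below. It assumes GNU sed, uses the walkthrough's version of the loop rather than the one-liner at the top of the post, and adds one step that is not in the post: s/\n$// removes the single empty line that s/\n+/\n/g still leaves at the end, because on this sample the pattern space ends with a newline after the loop. 'sample' is a hypothetical helper.

```shell
# Assumes GNU sed; sample is a hypothetical stand-in for simple-file.
sample() { printf '%s\n' 'Janice Flavor' 'Linda Brown' 'Janice Taylor' 'Janice Wafer'; }

# read everything, merge in a t loop, squeeze newlines, drop a final newline, print
sample | sed -nr ':a;N;$!ba;:b;s/(([^ ]+) +[^ \n]+) *([^\n]*)(.*\n)\2 +([^ \n]+)/\1 \5 \3 \4/;tb;s/\n+/\n/g;s/\n$//;p'
# two lines: "Janice Flavor Taylor Wafer" (plus trailing spaces) and "Linda Brown"
```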


One final note. My main point in my previous post was:
Don't do it this way.
Use awk instead.
The right tool for the right job can spare you some headaches.

Since I like a good brain teaser every now and then, I came up with this cumbersome sed solution.
But normally I would not have posted it.
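For comparison, a minimal awk sketch of the same job might look like this. It is not the awk code from the earlier post, just an illustration assuming every line is "first-name second-name"; 'group_by_first' and 'sample' are hypothetical helpers. Each first name is remembered in order of first appearance, and the second names are appended in input order.

```shell
# Hypothetical helper; assumes every line is "first-name second-name".
group_by_first() {
  awk '
    !($1 in rest) { order[++n] = $1 }   # first time this key is seen: remember its position
    { rest[$1] = rest[$1] " " $2 }      # append the second name on this line
    END { for (i = 1; i <= n; i++) print order[i] rest[order[i]] }  # keys in order of first appearance
  '
}

# sample is a hypothetical stand-in for simple-file.
sample() { printf '%s\n' 'Janice Flavor' 'Linda Brown' 'Janice Taylor' 'Janice Wafer'; }
sample | group_by_first
# Janice Flavor Taylor Wafer
# Linda Brown
```

No loops over the pattern space, no backreferences, and no cleanup of empty lines are needed, which is the point of using the right tool.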

I hope this clears things up a bit.

PS:
Earlier you said that you are doing a sed tutorial but you did not say which one.
To be sure that you are doing the right one, this is the tutorial to start with:
http://www.grymoire.com/Unix/Sed.html

Last edited by crts; 12-12-2011 at 03:29 PM.
 
3 members found this post helpful.
12-12-2011, 09:08 AM   #34
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,099

Original Poster
Rep: Reputation: 288
Quote:
Originally Posted by crts
I finally have some time to go into some details of the 'sed' solution...
Wow! Thank you for this detailed breakdown. It illustrates that sed has multiple levels of functionality, comparable to the frequently-referenced layers of an onion. The sed you constructed and explained introduced me to layers I'd never known. That's great!

Daniel B. Martin
 
12-12-2011, 02:47 PM   #35
timetraveler
Member
 
Registered: Apr 2010
Posts: 243
Blog Entries: 2

Rep: Reputation: 31
Quote:
Originally Posted by David the H.
But do all of them have it installed by default?
I can't think of one that does not, can you?

Quote:
Originally Posted by David the H.
Can you walk up to any random Linux computer and be certain that your perl script will run on it?
Yes, see above.

Your line of thinking applied a long time ago, but it no longer does.
 
  


All times are GMT -5.
