Seeking a clever RegEx for text processing
Eleven days ago LQ newbie r_clark2 initiated a thread titled "Need help writing a simple sed script."
Subsequently moderator acid_kewpie recognized the post as homework, a violation of forum rules, and locked the thread. No complaint there. Enough time has elapsed to make that homework overdue, so I'd like to exhume the problem.
Have a file of this nature:
Code:
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Want a file of this nature:
Code:
Blenheim, Steve:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
1) Change "FirstName LastName" to "LastName, FirstName".
2) Remove salary number if it ends in 500.
I tackled this problem as a learning exercise and developed this sed solution:
Code:
sed 's/\([^:]*\) \([^:]*\):/\2, \1:/' $InFile \
Daniel B. Martin |
You can supply more than one editing command to sed: sed -e 'cmd1' -e 'cmd2' $InFile
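As a minimal sketch of that idea applied to the two requirements in this thread: the helper name fix_records and the second sample record (a made-up person whose salary ends in 500) are assumptions for illustration only, not part of the original problem file.

```shell
# Hypothetical helper: reads records on stdin, writes fixed records to stdout.
fix_records() {
    # First -e:  swap "FirstName LastName" into "LastName, FirstName".
    # Second -e: delete the last field when the salary ends in 500.
    sed -e 's/^\([^:]*\) \([^:]*\):/\2, \1:/' \
        -e 's/:[0-9]*500$//'
}

# The second record is invented so that the second rule has something to do.
fix_records <<'EOF'
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Jane Doe:555-000-1111:1 Main St, Anytown, PA 00000:01/02/70:34500
EOF
```

Each -e expression is applied in order to every input line, so the name swap and the salary removal stay separate and can be changed independently.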
|
I personally would write a short Perl script to do it. Or perhaps I would use awk.
Basically, I think that you run into unnecessary problems very quickly when your approach is either (a) "I can name-that-tune in one magical (but entirely unmaintainable...) sed script," and/or (b) "I can do <<anything at all>> in Bash, so there!!"
What you need to do, instead, is to locate the tool that will, in one step and in a maintainable way, get you from start to finish. The solution needs to be readable, and, when (inevitably...) a change to the requirement surfaces, it needs to be possible to add support for that change quickly and reliably without having to reconstruct the whole thing. The solution should not be "chicken scratches," but so many in-production cases are exactly that. ("It works, but don't you dare touch it, or even look at it sideways!")
awk is a tool that was designed for this sort of thing, and Perl borrowed heavily from it. An awk script, in its simplest form, consists of a series of regular-expression patterns, each paired with an action, but it has a programming-language element to it also. Perl has rightly been called the Swiss Army® Knife of data processing. Both of these power tools are very likely already installed wherever your solution might need to be deployed. |
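To make the readability argument concrete, here is a rough awk sketch of the same two rules, with one commented step per rule. The helper name fix_records_awk is hypothetical, and the output line is built field by field rather than by shrinking NF, on the assumption that not every awk implementation handles an assignment to NF the same way.

```shell
# Hypothetical helper: same two rules as in the thread, written as an awk program.
fix_records_awk() {
    awk -F: '
    BEGIN { OFS = ":" }
    {
        # Rule 1: split the name at its last space and swap the halves.
        n = split($1, part, " ")
        if (n >= 2) {
            first = part[1]
            for (i = 2; i < n; i++) first = first " " part[i]
            $1 = part[n] ", " first
        }
        # Rule 2: output every field, skipping the last one if it ends in 500.
        last = ($NF ~ /500$/) ? NF - 1 : NF
        line = $1
        for (i = 2; i <= last; i++) line = line OFS $i
        print line
    }'
}
```

It would be used as fix_records_awk < "$InFile". A new requirement (say, a different salary cutoff) then means touching one clearly labelled step instead of reworking a single long expression.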
Quote:
Code:
sed 's/ /\:/' $InFile \
I am learning awk and sed and Linux programming in general. I tackled this problem as a learning exercise -- that's the reason for working on two solutions. Learn by doing. So the question remains on the table: is there a clever RegEx which can do the whole job?
Daniel B. Martin |
Quote:
Code:
# FirstName = \1, LastName = \2, MiddleFields = \3,
Code:
sed 's/\([^:]*\) \([^:]*\):/\2, \1:/; s/\(.*\)\(:[0-9]*500$\)/\1/' $InFile
Code:
awk -F: -vOFS=: '{ |
Hi.
This looks a bit shorter:
Code:
$ sed -r 's/([^:]*) ([^: ]*):/\2, \1:/; s/:[0-9]*500$//' infile |
Thank you, ntubski and firstfire, for your valued input. We still don't have a solution using a single RegEx... but there's no sense in beating one's brains out to construct one complex RegEx when two simpler ones do the job nicely. This thread is solved!
Daniel B. Martin |
Yes, ntubski already gave a single-regex solution. Here is another one:
Code:
$ sed -r 's=([^:]*) ([^: ]*)(:.*/..)(:[0-9]*500)?$=\2, \1\3=' in
EDIT: Now I see, it did not work (text in red).
EDIT: Using perl's non-greedy (*?) pattern:
Code:
$ perl -pe 's/([^:]*) ([^: ]*):(.*?)(:[0-9]*500)?$/\2, \1:\3/' infile
Code:
$ perl -pe 's/(.*?) (\w*):(.*?)(:[0-9]*500)?$/\2, \1:\3/' in |
Quote:
Code:
% sed -r 's=([^:]*) ([^: ]*)(:.*/..)(:[0-9]*500$)?=\2, \1\3=' people.txt |
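The two single-regex sed attempts in this thread differ only in where the $ anchor sits, and that seems to be what decides whether records with an ordinary salary are handled. The sketch below wraps both in hypothetical helper functions (anchor_outside and anchor_inside are names made up here) and uses -E in place of -r, on the assumption that -E is the more widely supported spelling of the same option.

```shell
# "$" outside the optional group: the whole match must run to end of line,
# so a record whose salary does not end in 500 fails to match at all
# and is printed unchanged (the name is not swapped).
anchor_outside() {
    sed -E 's=([^:]*) ([^: ]*)(:.*/..)(:[0-9]*500)?$=\2, \1\3='
}

# "$" inside the optional group: only the salary part is tied to end of line.
# When the salary does not end in 500 the group matches nothing, the match
# stops after the date, and the untouched salary is simply left in place.
anchor_inside() {
    sed -E 's=([^:]*) ([^: ]*)(:.*/..)(:[0-9]*500$)?=\2, \1\3='
}
```

Both patterns rely on the date field being the last one that contains a slash, so this is a sketch for data shaped like the sample record, not a general solution.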
Awk:
Code:
awk -F: '{$1 = gensub(/(.*) (.*)/,"\\2, \\1","1",$1)}$NF ~ /500$/{NF = NF - 1}1' OFS=":" file
Ruby:
Code:
ruby -F: -ape '$_ = [$F[0].scan(/(.*) (.*)/)[0].reverse.join(", "),$F[1..($F[-1]=~/500$/?-2:-1)]].join(":").chomp + "\n"' file |