03-12-2011, 05:12 PM
|
#1
|
Member
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78
Rep:
|
How to merge these awk and sed commands into a single awk script?
Hi to all,
I have written this short code:
Code:
### 1) Print the ranges between strings to exclude unwanted lines #####
awk '/^Event/,/^$/{print $1};/Children/,/^$/{print $1};/Relative Event Code/,/^$/{print $1}' input |
#### 2) After printing the 1st field of only the wanted lines, remove blank lines and lines with "--.." and "___..."
sed -e 's/^-.*//;s/^_.*$//;/^$/d' | ## I have tried to use sub(/^-.*/,"",$0) to emulate this sed line
#### 3) After removing all unwanted lines, print the header, then merge all lines into a single one, FS=","
awk 'BEGIN{print "Event Code|Children Result|Relative Event Code"}
{
    if ( $0~/Event||Children||Relative||.*_.*/ ) # I would like to use an array instead of "$0", but I do not know how to
        printf("%s,", $0)                         # load the array the right way and then manipulate it.
    else
        printf(" %s\n", $0)
}' |
#### 4) After step 3, remove unneeded strings and at the same time separate the lines corresponding
#### to each "Event Code" block
sed -e 's/^Event,//g;s/,Event,/\n/g;s/,Children,/|/g;s/,Relative,/|/g;s/,$//' | # I would like to replace this too
# with sub() to include it within the awk code
#### 5) Print the fields separated by "|"
awk -F"|" '{print $1,$2,$3}' OFS="|" > output
Applied to this input:
Code:
______________________________________________________________________________
Start Relative
Event Code State Date? Event? Events?
-------- ------ --------------- ----- ---------
XYZGF$TY101_Procuct01 ON_HOLD Met No No
______________________________________________________________________________
Start Relative
Event Code State Date? Event? Events?
-------- ------ --------------- ----- ---------
XYZGF$TY102_Evecod01 ON_HOLD No Yes Yes
Result: s(PRQ$MAC111_xiib) and s(PRQ$MAC141_code) and
s(PRQ$MAC131_xiib_pres_sol) and s(PRQ$MAC134_pres_areatyp)
Children Result Current State T/F
---------------- -------------- ---
CORRECT(PRQ$MAC131_xiib_pres_sol) CORRECT OK
CORRECT(PRQ$MAC134_pres_areatyp) CORRECT OK
Relative Event Code Result
------------------ ---------
ABC$MAC101_dept_abc CORRECT(XYZGF$TY102_Evecod01)
______________________________________________________________________________
Start Relative
Event Code State Date? Event? Events?
-------- ------ --------------- ----- ---------
ABC$MAC101_dept_abc ON_HOLD No Yes No
Result: s(XYZGF$TY102_Evecod01)
Children Result Current State T/F
---------------- -------------- ---
CORRECT(XYZGF$TY102_Evecod01) ON_HOLD F
The output is:
Code:
Event Code|Children Result|Relative Event Code
XYZGF$TY101_Procuct01||
XYZGF$TY102_Evecod01|CORRECT(PRQ$MAC131_xiib_pres_sol),CORRECT(PRQ$MAC134_pres_areatyp)|ABC$MAC101_dept_abc
ABC$MAC101_dept_abc|CORRECT(XYZGF$TY102_Evecod01)|
The code gives the output I need, but I would like to learn how to join all the awk parts into a single script, using awk's replacement functions instead of sed, so that everything lives in one awk program.
The comments in the code describe what each part does and what I have tried so far, without success, in order to join the code.
Could somebody help me with this? It would be better if the joined code stayed close to the original.
Thanks in advance
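A minimal sketch of how the two sed stages might be folded into awk, assuming made-up sample data (XYZ_01, ABC_9, a_1, b_2 and the shell variable names are hypothetical, not taken from the thread). `next` skips the dashed, underscored and blank lines that the first sed deletes, and sub()/gsub() stand in for the s/// substitutions of the second sed:

```shell
# Stage 2 equivalent: drop "---", "___" and blank lines, keep field 1.
sample='______
Event Code State
-------- ------
XYZ_01 ON_HOLD
'
filtered=$(printf '%s\n' "$sample" | awk '
    $1 ~ /^[-_]/ { next }   # like sed s/^-.*// and s/^_.*$// followed by /^$/d
    NF == 0      { next }   # like sed /^$/d (also skips whitespace-only lines)
    { print $1 }            # keep only the first field
')
printf '%s\n' "$filtered"

# Stage 4 equivalent: sub() replaces the first match, gsub() replaces all.
line='Event,XYZ_01,Children,CORRECT(a_1),CORRECT(b_2),Relative,ABC_9,'
cleaned=$(printf '%s\n' "$line" | awk '{
    sub(/^Event,/, "")        # sed: s/^Event,//
    gsub(/,Children,/, "|")   # sed: s/,Children,/|/g
    gsub(/,Relative,/, "|")   # sed: s/,Relative,/|/g
    sub(/,$/, "")             # sed: s/,$//
    print
}')
printf '%s\n' "$cleaned"
```

Note that sub(/^-.*/, "") alone only empties the line; the line still has to be skipped afterwards, which is why the sketch uses `next` instead.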
|
|
|
03-12-2011, 09:12 PM
|
#2
|
Member
Registered: May 2009
Distribution: Debian wheezy
Posts: 252
Rep:
|
"Homework" questions are generally discouraged at LQ, but I did attempt to solve your problem. This is my solution (I tested it, and it seems to work):
Code:
BEGIN {
    print "Event Code|Children Result|Relative Event Code"
    RS = "[_]+\n" # records are separated at ___... lines
    FS = "\n+" # fields are separated on one or more newlines
    OFS = "|"
}
{
    read_next = 0
    read_event = 1
    read_child_result = 2
    read_rel_code = 3
    for(i = 1; i <= NF; i += 1) { # loop through fields (individual lines)
        line = $i
        if(line !~ /^[- \t]*$/) { # skip ---... lines and blank lines
            gsub(/[ \t]+/, " ", line) # compress whitespace
            sub(/^[ ]/, "", line) # remove leading whitespace
            split(line, splitline, " ") # split at whitespace
            if(line ~ /^Event Code/) {
                if(event) {
                    print event, child_results, rel_code
                }
                event = child_results = rel_code = ""
                read_next = read_event
                continue
            }
            else if(line ~ /^Children Result/) {
                read_next = read_child_result
                first_child_result = 1 # true
                continue
            }
            else if(line ~ /^Relative Event Code/) {
                read_next = read_rel_code
                continue
            }
            else {
                if(read_next == read_event) {
                    event = splitline[1]
                    read_next = 0
                }
                if(read_next == read_child_result) {
                    if(first_child_result) {
                        child_results = splitline[1]
                        first_child_result = 0 # false
                    }
                    else {
                        child_results = child_results "," splitline[1]
                    }
                }
                if(read_next == read_rel_code) {
                    rel_code = splitline[1]
                    read_next = 0
                }
            }
        }
    }
}
END {
    if(event) {
        print event, child_results, rel_code
    }
}
|
|
|
03-12-2011, 10:53 PM
|
#3
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,042
|
Here is an alternative that assumes knowledge of the other fields in the file:
Code:
#!/usr/bin/awk -f
BEGIN{
    print "Event Code|Children Result|Relative Event Code"
    RS = "_+\n" #Records separated by continuous underscores
}
/Event/{ #Only interested in records that contain the string 'Event'
    rel = child = 0 #Set false for whether or not a child or relative part of record have been read
    for(i=1;i<=NF;i++){
        test = 0 #Set test to false so nothing is printed unless test is true
        if($i ~ /\$/){ #Only interested in fields that contain the dollar symbol (this was inferred from your desired output)
            if($(i+1) ~ /\$/){ #If next field also contains a dollar symbol then we are looking at the relative section
                test++
                rel++
                if(!child) #If no child section has been processed, add a preceding pipe to indicate previous field missing
                    $i = "|"$i
                $i = $i"\n" #Always add newline after relative section
            }
            else if($i ~ /CORRECT\(/ && $(i+1) ~ /^(ON_HOLD|CORRECT)$/){ #Current field contains 'CORRECT(' string and next field contains strings shown, then it is a child section
                test++
                child++
                if($(i+3) ~ /CORRECT/) #If third field from current also contains string 'CORRECT', then there is another child so use comma else pipe
                    $i = $i","
                else
                    $i = $i"|"
            }
            else if($(i+1) == "ON_HOLD"){ #Assumes all Events will be 'ON_HOLD', inferred from input file
                test++
                $i = $i"|"
            }
        }
        if(test)
            printf("%s", $i)
    }
    if(!(rel || child)) #If neither child or rel were entered then print extra pipe to indicate missing fields
        print "|"
    else if(!rel && child)
        print ""
}
Last edited by grail; 03-13-2011 at 03:13 AM.
Reason: To add comments to further explain code
|
|
|
03-13-2011, 01:15 AM
|
#4
|
Member
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78
Original Poster
Rep:
|
Hi hda7,
Thanks for your help and time. Believe me, it's not homework; I'm not a student, nor even a real programmer, just a more or less self-taught awk/sed enthusiast. This is another person's question, and I'm only trying, with my scarce programming knowledge, to help that person, as others have helped me before.
Your code works for me too, but I'm a little lost with it. I understand the kind of matrix "transposition" you do in the BEGIN block by playing with RS and FS.
Could you explain a little how it works when you define variables as read_XXXX=Y? Why do you assign =0, =1, =2, =3 to those variables? And how is it possible to use those variables further down without the "read_" part?
If it's not too much to ask, could you explain how one of the if statements works? Then I'll try to work out the others by analogy :-)
Code:
if(line ~ /^Event Code/) {
    if(event) {
        print event, child_results, rel_code
    }
    event = child_results = rel_code = ""
    read_next = read_event
    continue
Hi grail again :-),
I'm here to learn from you experts and to help when I can :-), and I'm a little lost with how your code works too.
So as not to be a nuisance, could you explain just the part I find most complicated: how the lines below the "Children Result" block are merged?
Thanks again for all your help.
Best regards
|
|
|
03-13-2011, 07:54 AM
|
#5
|
Member
Registered: May 2009
Distribution: Debian wheezy
Posts: 252
Rep:
|
I just fixed some mistakes I made. Here is the new script:
Code:
BEGIN {
    print "Event Code|Children Result|Relative Event Code"
    RS = "[_]+\n" # records are separated at ___... lines
    FS = "\n+" # fields are separated on one or more newlines
    OFS = "|"
}
{
    read_next = 0
    read_event = 1
    read_child_result = 2
    read_rel_code = 3
    for(i = 1; i <= NF; i += 1) { # loop through fields (individual lines)
        line = $i
        if(line !~ /^[- \t]*$/) { # skip ---... lines and blank lines
            gsub(/[ \t]+/, " ", line) # compress whitespace
            sub(/^[ ]/, "", line) # remove leading whitespace
            split(line, splitline, " ") # split at whitespace
            if(line ~ /^Event Code/) {
                event = child_results = rel_code = ""
                read_next = read_event
                continue
            }
            else if(line ~ /^Children Result/) {
                read_next = read_child_result
                first_child_result = 1 # true
                continue
            }
            else if(line ~ /^Relative Event Code/) {
                read_next = read_rel_code
                continue
            }
            else {
                if(read_next == read_event) {
                    event = splitline[1]
                    read_next = 0
                }
                if(read_next == read_child_result) {
                    if(first_child_result) {
                        child_results = splitline[1]
                        first_child_result = 0 # false
                    }
                    else {
                        child_results = child_results "," splitline[1]
                    }
                }
                if(read_next == read_rel_code) {
                    rel_code = splitline[1]
                    read_next = 0
                }
            }
        }
    }
    if(event) {
        print event, child_results, rel_code
    }
}
Now for an explanation of
Code:
if(line ~ /^Event Code/) {
    event = child_results = rel_code = ""
    read_next = read_event
    continue
When it finds a line that starts with Event Code, it clears the event, child_results, and rel_code variables (used to hold the event code, child results, and relative event code respectively). Then it sets read_next (which tells my script what it's looking for next) to my "constant" read_event, so that the first thing on the next line is stored as the event. Then the "continue" goes to the next loop iteration (and therefore to the next field in the record). It would probably be better, farther down, to stop reading the event when a blank line is encountered rather than right after an event is found. However, the script currently skips blank lines, so you would need to tweak that first.
Feel free to ask any other questions about my script. I'll try to fix my other mistake soon too.
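The read_next mechanism described above can be sketched in isolation. This is a simplified, line-oriented variant rather than the record/field loop of the script itself, and the sample data (XYZ_01, a_1) and the shell variable names are hypothetical:

```shell
sample='Event Code State
-------- ------
XYZ_01 ON_HOLD
Children Result
CORRECT(a_1) OK'
events=$(printf '%s\n' "$sample" | awk '
    BEGIN { read_event = 1 }                        # named "constant" for the state
    /^-/ { next }                                   # skip the dashed underline
    /^Event Code/ { read_next = read_event; next }  # header seen: expect the event code next
    read_next == read_event {                       # only true right after the header
        print $1                                    # first word on the line is the event code
        read_next = 0                               # stop looking until the next header
    }
')
printf '%s\n' "$events"
```

The point of the design is that read_next remembers which header was seen last, so the same data-line rule can behave differently depending on the section it is in.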
|
|
|
03-13-2011, 08:11 AM
|
#6
|
Member
Registered: May 2009
Distribution: Debian wheezy
Posts: 252
Rep:
|
My script so far (I have fixed several mistakes):
Code:
BEGIN {
    print "Event Code|Children Result|Relative Event Code"
    RS = "[_]+\n" # records are separated at ___... lines
    FS = "\n" # fields are separated on newlines
    OFS = "|"
}
{
    read_next = 0
    read_event = 1
    read_child_result = 2
    read_rel_code = 3
    event = child_results = rel_code = ""
    for(i = 1; i <= NF; i += 1) { # loop through fields (individual lines)
        line = $i
        sub(/^[ \t]+/, "", line) # remove leading whitespace
        split(line, splitline, " ") # split at whitespace
        if(line !~ /^-[- \t]*$/) { # skip ---... lines
            if(line ~ /^[ \t]*$/) {
                read_next = 0
                continue
            }
            else if(line ~ /^Event Code/) {
                read_next = read_event
                continue
            }
            else if(line ~ /^Children Result/) {
                read_next = read_child_result
                first_child_result = 1 # true
                continue
            }
            else if(line ~ /^Relative Event Code/) {
                read_next = read_rel_code
                continue
            }
            else {
                if(read_next == read_event) {
                    event = splitline[1]
                }
                if(read_next == read_child_result) {
                    if(first_child_result) {
                        child_results = splitline[1]
                        first_child_result = 0 # false
                    }
                    else {
                        child_results = child_results "," splitline[1]
                    }
                }
                if(read_next == read_rel_code) {
                    rel_code = splitline[1]
                }
            }
        }
    }
    if(event) {
        print event, child_results, rel_code
    }
}
|
|
|
03-13-2011, 12:19 PM
|
#7
|
Member
Registered: May 2009
Distribution: Debian wheezy
Posts: 252
Rep:
|
To explain how my code merges the child result lines:
When the program encounters a Children Result line, it sets read_next to read_child_result to tell the script that the following lines (up to a blank line) should be added to the child_results variable. Then it sets first_child_result to 1 (true) to let the script know that the next line will be the first child result.
When the script reads the lines following the Children Result line, it first checks to see if first_child_result is true. If it is, it sets the variable child_results to the first thing on the line (child_results = splitline[1]), and then sets first_child_result to 0 (false). If it is not true, the script sets child_results to the concatenation of the current value of child_results, a comma, and the first thing on the line (child_results = child_results "," splitline[1]).
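The first_child_result logic can be tried on its own with a minimal sketch like the following, where the two input lines (a_1, b_2) and the shell variable name are made up for illustration:

```shell
children=$(printf '%s\n' 'CORRECT(a_1) CORRECT OK' 'CORRECT(b_2) CORRECT OK' | awk '
    BEGIN { first_child_result = 1 }                  # the next line is the first child
    {
        if (first_child_result) {
            child_results = $1                        # start the list with the first word
            first_child_result = 0                    # later lines get appended instead
        }
        else {
            child_results = child_results "," $1      # append a comma and the first word
        }
    }
    END { print child_results }
')
printf '%s\n' "$children"
```

The flag exists only to avoid a leading comma; without it the first entry would be written as ",CORRECT(a_1)".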
|
|
|
03-13-2011, 05:25 PM
|
#8
|
Member
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78
Original Poster
Rep:
|
hda7/grail,
Many thanks to both of you for your great and complete help. Your scripts work perfectly and are great examples of how and when to use for() and if()/else statements. I think I need to step through the code to better understand the parts I can only partly picture from reading it.
My last questions about this are:
- How is it possible to define a variable and later use only part of its name? I mean, the defined variable is read_event, and further down in the code sometimes only event is used, without "read_".
- What is the function of variables like _varname, with the "_" at the beginning?
Thanks again for all the help.
|
|
|
03-13-2011, 07:37 PM
|
#9
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,042
|
You seem to be getting confused on naming conventions as opposed to the actual coding.
In hda7's code all of the following are listed as variables:
Code:
read_next = 0
read_event = 1
read_child_result = 2
read_rel_code = 3
event = child_results = rel_code = ""
So read_event and event are two distinct variables.
Using an underscore (_) as a delimiter just makes the variable names clearer, i.e. readevent is not as clear as read_event (imo).
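A tiny sketch that shows the two names holding unrelated values at the same time (the value XYZ_01 and the shell variable name are hypothetical):

```shell
names=$(awk 'BEGIN {
    read_event = 1        # a number used as a state "constant"
    event = "XYZ_01"      # a separate string variable holding an event code
    print read_event, event
}')
printf '%s\n' "$names"
```

awk treats the whole identifier as the variable name, so assigning to one never changes the other.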
Hope that helps.
|
|
|
03-13-2011, 09:18 PM
|
#10
|
Member
Registered: May 2009
Distribution: Debian wheezy
Posts: 252
Rep:
|
Exactly.
I originally had readevent instead of read_event and so on, but since I couldn't read it easily, I added the underscores.
|
|
|
03-14-2011, 12:59 AM
|
#11
|
Member
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78
Original Poster
Rep:
|
I'm clear on this now.
I'm really grateful to you both for the help.
Best regards. 
|
|
|
All times are GMT -5. The time now is 04:12 PM.