[SOLVED] How to merge these awk and sed commands into a single awk script?

cgcamal · 03-12-2011, 05:12 PM

Hi to all,

I have written this short code:

Code:

### 1) Print the ranges between strings, to exclude unwanted lines #####
awk '/^Event/,/^$/{print $1};/Children/,/^$/{print $1};/Relative Event Code/,/^$/{print $1}' input |

#### 2) After printing the 1st field of only the wanted lines, remove blank lines and lines starting with "--.." and "__..."
sed -e 's/^-.*//;s/^_.*$//;/^$/d' |  ## I've tried to use sub(/^-.*/,"",$0) to emulate this sed line

#### 3) After removing all unwanted lines, print the header, then merge all lines into a single one, FS=","
awk 'BEGIN{print "Event Code|Children Result|Relative Event Code"}
{
if ( $0 ~ /Event|Children|Relative|.*_.*/ )   # I would like to use an array instead of "$0", but I do not know how to
    printf("%s,", $0)                         # load the array the right way and manipulate it afterwards.
else
    printf(" %s\n", $0)
}' |

#### 4) After step 3, remove unneeded strings and, at the same time, split the lines belonging
####    to each "Event Code" block ("\n" in the sed replacement is a GNU sed extension)
sed -e 's/^Event,//g;s/,Event,/\n/g;s/,Children,/|/g;s/,Relative,/|/g;s/,$//' | # I would like to replace this too
                                        # with sub() to include it within the awk code
#### 5) Print the fields separated by "|"
awk -F"|" '{print $1,$2,$3}' OFS="|"  > output
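The two sed steps map fairly directly onto awk's sub() and gsub(). One thing to keep in mind when emulating step 2: sub() only empties the line, and awk will still print that empty line unless a pattern such as `$0 != ""` filters it out. A minimal sketch, with hypothetical function names step2 and step4:

```shell
#!/bin/sh
# Hypothetical helpers: awk equivalents of the two sed steps in the pipeline above.

# Step 2: sed -e 's/^-.*//;s/^_.*$//;/^$/d'
step2() {
  awk '{ sub(/^[-_].*/, "") }   # empty any line that starts with "-" or "_"
       $0 != ""'                # pattern without an action: print only non-empty lines
}

# Step 4: sed -e 's/^Event,//g;s/,Event,/\n/g;s/,Children,/|/g;s/,Relative,/|/g;s/,$//'
step4() {
  awk '{
    sub(/^Event,/, "")          # drop the leading "Event,"
    gsub(/,Event,/, "\n")       # each later "Event" starts a new output line
    gsub(/,Children,/, "|")
    gsub(/,Relative,/, "|")
    sub(/,$/, "")               # drop the trailing comma
    print
  }'
}
```

With these, the pipeline could read `... | step2 | awk '...' | step4 | ...`, and the awk version of step 4 also avoids relying on GNU sed's `\n` in the replacement.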

Applied to this input:

Code:

______________________________________________________________________________

                                                                 Start   Relative
Event Code                         State           Date?     Event?     Events?
--------                         ------        ---------------   -----   ---------
XYZGF$TY101_Procuct01           ON_HOLD       Met               No      No

______________________________________________________________________________

                                                                 Start   Relative
Event Code                         State           Date?     Event?     Events?
--------                         ------        ---------------   -----   ---------
XYZGF$TY102_Evecod01             ON_HOLD       No                Yes     Yes

   Result:  s(PRQ$MAC111_xiib) and s(PRQ$MAC141_code) and
     s(PRQ$MAC131_xiib_pres_sol) and s(PRQ$MAC134_pres_areatyp)

   Children Result                              Current State T/F
   ----------------                              -------------- ---
   CORRECT(PRQ$MAC131_xiib_pres_sol)             CORRECT        OK
   CORRECT(PRQ$MAC134_pres_areatyp)              CORRECT        OK

   Relative Event Code                            Result      
   ------------------                            ---------      
   ABC$MAC101_dept_abc                         CORRECT(XYZGF$TY102_Evecod01) 

______________________________________________________________________________
                                                                 Start   Relative
Event Code                         State           Date?     Event?     Events?
--------                         ------        ---------------   -----   ---------
ABC$MAC101_dept_abc            ON_HOLD       No                Yes     No

   Result:  s(XYZGF$TY102_Evecod01)

   Children Result                              Current State T/F
   ----------------                              -------------- ---
   CORRECT(XYZGF$TY102_Evecod01)                 ON_HOLD        F

The output is:

Code:

Event Code|Children Result|Relative Event Code
XYZGF$TY101_Procuct01||
XYZGF$TY102_Evecod01|CORRECT(PRQ$MAC131_xiib_pres_sol),CORRECT(PRQ$MAC134_pres_areatyp)|ABC$MAC101_dept_abc
ABC$MAC101_dept_abc|CORRECT(XYZGF$TY102_Evecod01)|

The code gives the output I need, but I would like to learn how to join all the awk parts into a single script, using awk's replacement functions instead of sed, so that all the code is in one awk program.

The comments in the code explain what each part does, and what I've tried so far, without success, in order to join the code.

Could somebody help me with this? It would be better if the joined code stayed almost the same as the original.

Thanks in advance
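For comparison, here is one way the five steps might collapse into a single awk program that keeps the original three patterns (/^Event/, /Children/, /Relative Event Code/). Instead of range patterns plus sed, it remembers the current section in a variable and collects the first field of each wanted line into an array, one element per output column, printing a row whenever a new Event Code block starts and once more at the end. This is a sketch that assumes the layout of the input shown above; merge_events, sec, f and flush are hypothetical names.

```shell
#!/bin/sh
# Hypothetical single-awk replacement for the five-step pipeline (sketch; assumes the layout above).
merge_events() {
  awk '
    function flush() {              # print the finished Event Code block, if any, then reset it
      if (f[1] != "") print f[1], f[2], f[3]
      f[1] = f[2] = f[3] = ""
    }
    BEGIN { OFS = "|"; print "Event Code|Children Result|Relative Event Code" }
    /^Event/              { flush(); sec = 1; next }   # same start patterns as step 1
    /Children/            { sec = 2; next }
    /Relative Event Code/ { sec = 3; next }
    !NF                   { sec = 0; next }            # a blank line ends the section (like /^$/)
    sec && $1 !~ /^[-_]/ {                             # skip ---- and ____ lines (step 2)
      f[sec] = (f[sec] == "" ? $1 : f[sec] "," $1)     # join lines with commas (steps 3 and 4)
    }
    END { flush() }
  ' "$@"
}
```

Usage would be `merge_events input > output`. Because every row prints all three array elements with OFS="|", empty columns come out as `||` without the extra step 5.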

hda7 · 03-12-2011, 09:12 PM

"Homework" questions are generally discouraged at LQ, but I did attempt to solve your problem. This is my solution (I tested it, and it seems to work):

Code:

BEGIN {
    print "Event Code|Children Result|Relative Event Code"
    RS = "[_]+\n" # records are separated at ___... lines
    FS = "\n+" # fields are separated on one or more newlines
    OFS = "|"
}

{
    read_next = 0
    read_event = 1
    read_child_result = 2
    read_rel_code = 3

    for(i = 1; i <= NF; i += 1) { # loop through fields (individual lines)
	line = $i
	if(line !~ /^[- \t]*$/) { # skip ---... lines and blank lines
	    gsub(/[ \t]+/, " ", line) # compress whitespace
	    sub(/^[ ]/, "", line) # remove leading whitespace
	    split(line, splitline, " ") # split at whitespace

	    if(line ~ /^Event Code/) {
		if(event) {
		    print event, child_results, rel_code
		}
		event = child_results = rel_code = ""
		read_next = read_event
		continue
	    }
	    else if(line ~ /^Children Result/) {
		read_next = read_child_result
		first_child_result = 1 # true
		continue
	    }
	    else if(line ~ /^Relative Event Code/) {
		read_next = read_rel_code
		continue
	    }
	    else {
		if(read_next == read_event) {
		    event = splitline[1]
		    read_next = 0
		}
		if(read_next == read_child_result) {
		    if(first_child_result) {
			child_results = splitline[1]
			first_child_result = 0 # false
		    }
		    else {
			child_results = child_results "," splitline[1]
		    }
		}
		if(read_next == read_rel_code) {
		    rel_code = splitline[1]
		    read_next = 0
		}
	    }
	}
    }
}

END {
    if(event) {
	print event, child_results, rel_code
    }
}
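The "transposition" in the BEGIN block is awk's record and field splitting: RS decides where a record ends (here, at a run of underscores followed by a newline) and FS splits each record into fields (here, one field per line). A small sketch to see it in isolation; show_blocks is a hypothetical name, and a regular-expression RS like this one is an extension found in gawk and mawk, since POSIX leaves a multi-character RS unspecified.

```shell
#!/bin/sh
# Hypothetical demo of hda7's RS/FS settings (needs an awk with regex RS, e.g. gawk or mawk).
show_blocks() {
  awk 'BEGIN { RS = "_+\n"; FS = "\n" }   # record = one block between ____ lines, field = one line
       NF { printf "block %d: %s -> %s\n", NR, $1, $2 }'   # NF skips an empty record
}
```

For example, feeding it the lines `Event Code`, `EV_1`, `____`, `Event Code`, `EV_2` prints one line per block, each showing that block's first two lines as $1 and $2.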

grail · 03-12-2011, 10:53 PM

Here is an alternative that assumes knowledge of other fields in the file:

Code:

#!/usr/bin/awk -f

BEGIN{  print "Event Code|Children Result|Relative Event Code"
        RS = "_+\n" #Records separated by continuous underscores
}

/Event/{ #Only interested in records that contain the string 'Event'
    rel = child = 0 #Set false for whether or not a child or relative part of record have been read
    for(i=1;i<=NF;i++){
        test = 0 #Set test to false so nothing is printed unless test is true
        if($i ~ /\$/){ #Only interested in fields that contain the dollar symbol (this was inferred from your desired output)
            if($(i+1) ~ /\$/){ #If next field also contains a dollar symbol then we are looking at the relative section
                test++
                rel++
                if(!child) #If no child section has been processed, add a preceding pipe to indicate previous field missing
                    $i = "|"$i

                $i = $i"\n" #Always add newline after relative section
            }
            else if($i ~ /CORRECT\(/ && $(i+1) ~ /^(ON_HOLD|CORRECT)$/){ #If the current field contains 'CORRECT(' and the next field is one of the strings shown, this is a child section
                test++
                child++
                if($(i+3) ~ /CORRECT/) #If the field three places on also contains 'CORRECT', there is another child, so use a comma, else a pipe
                    $i = $i","
                else
                    $i = $i"|"
            }
            else if($(i+1) == "ON_HOLD"){ #Assumes all Events will be 'ON_HOLD', inferred from the input file
                test++
                $i = $i"|"
            }
        }
        if(test)
            printf("%s", $i)
    }
    if(!(rel || child)) #If neither a child nor a rel section was read, print an extra pipe to indicate the missing fields
        print "|"
    else if(!rel && child)
        print ""
}

cgcamal · 03-13-2011, 01:15 AM

Hi hda7,

Thanks for your help and time. Really, believe me, it's not homework. I'm not a student at all, not even a real programmer, only a more or less self-taught awk/sed enthusiast. This is another person's question; with my scarce knowledge of programming I'm only trying to help that person, the way others have helped me before.

Well, your code seems to work for me too, but I'm a little lost with it. I understand the kind of matrix "transposition" you do in the BEGIN block by playing with RS and FS.
Could you explain a little how it works when you define variables as read_XXXX=Y? Why do you assign =0, =1, =2, =3 to those variables? And how is it possible to use those variables further down without the "read_" part?

If it's not too much trouble, could you explain how one of the if statements works? Then I'll try to work out the others by analogy :-)

Code:

if(line ~ /^Event Code/) {
    if(event) {
        print event, child_results, rel_code
    }
    event = child_results = rel_code = ""
    read_next = read_event
    continue
}

Hi again, grail :-)

I'm here to learn from you experts and to help when I can :-), and I'm a little lost with how your code works, too.

So as not to be a nuisance, could you maybe explain the part I find most complicated: how you merge the lines below the "Children Result" block?

Thanks again for all your help.

Best regards

hda7 · 03-13-2011, 07:54 AM

I just fixed some mistakes I made. Here is the new script:

Code:

BEGIN {
    print "Event Code|Children Result|Relative Event Code"
    RS = "[_]+\n" # records are separated at ___... lines
    FS = "\n+" # fields are separated on one or more newlines
    OFS = "|"
}

{
    read_next = 0
    read_event = 1
    read_child_result = 2
    read_rel_code = 3

    for(i = 1; i <= NF; i += 1) { # loop through fields (individual lines)
	line = $i
	if(line !~ /^[- \t]*$/) { # skip ---... lines and blank lines
	    gsub(/[ \t]+/, " ", line) # compress whitespace
	    sub(/^[ ]/, "", line) # remove leading whitespace
	    split(line, splitline, " ") # split at whitespace

	    if(line ~ /^Event Code/) {
		event = child_results = rel_code = ""
		read_next = read_event
		continue
	    }
	    else if(line ~ /^Children Result/) {
		read_next = read_child_result
		first_child_result = 1 # true
		continue
	    }
	    else if(line ~ /^Relative Event Code/) {
		read_next = read_rel_code
		continue
	    }
	    else {
		if(read_next == read_event) {
		    event = splitline[1]
		    read_next = 0
		}
		if(read_next == read_child_result) {
		    if(first_child_result) {
			child_results = splitline[1]
			first_child_result = 0 # false
		    }
		    else {
			child_results = child_results "," splitline[1]
		    }
		}
		if(read_next == read_rel_code) {
		    rel_code = splitline[1]
		    read_next = 0
		}
	    }
	}
    }
    if(event) {
	print event, child_results, rel_code
    }
}

Now for an explanation of

Code:

if(line ~ /^Event Code/) {
    event = child_results = rel_code = ""
    read_next = read_event
    continue
}

When it finds a line that starts with Event Code, it clears the event, child_results, and rel_code variables (used to hold the event code, child results, and relative event code, respectively). Then it sets read_next (which tells my script what it's looking for next) to my "constant" read_event, so that the first thing on the next line is stored as the event. Then the "continue" goes to the next loop iteration (and therefore to the next field in the record). Farther down, instead of stopping reading the event as soon as it finds one, it would probably be better to stop when it encounters a blank line. However, the script currently skips blank lines, so you would need to tweak that first.
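The numbers 0 to 3 are only labels for "what to read next"; any distinct values would do, and the names just make the comparisons readable. A stripped-down sketch of the same state-variable idea, handling only the Event Code header (first_events, state, IDLE and READ_EVENT are hypothetical names, not taken from the script above):

```shell
#!/bin/sh
# Hypothetical cut-down version of the read_next logic, handling only the Event Code header.
first_events() {
  awk 'BEGIN { IDLE = 0; READ_EVENT = 1 }          # named "constants"; the values are just labels
       /^Event Code/ { state = READ_EVENT; next }  # header seen: the next data line holds the event
       state == READ_EVENT && NF && $1 !~ /^-/ {   # skip blank and ---- lines while waiting
         print $1                                  # first word of the line is the event code
         state = IDLE                              # stop reading until the next header
       }'
}
```

An uninitialized awk variable compares as 0, so `state` starts out equal to IDLE without an explicit assignment.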

Feel free to ask any other questions about my script. I'll try to fix my other mistake soon, too.

hda7 · 03-13-2011, 08:11 AM

My script so far (I have fixed several mistakes):

Code:

BEGIN {
    print "Event Code|Children Result|Relative Event Code"
    RS = "[_]+\n" # records are separated at ___... lines
    FS = "\n" # fields are separated on newlines
    OFS = "|"
}

{
    read_next = 0
    read_event = 1
    read_child_result = 2
    read_rel_code = 3
    event = child_results = rel_code = ""

    for(i = 1; i <= NF; i += 1) { # loop through fields (individual lines)
	line = $i
	sub(/^[ \t]+/, "", line) # remove leading whitespace
	split(line, splitline, " ") # split at whitespace
	if(line !~ /^-[- \t]*$/) { # skip ---... lines
	    if(line ~ /^[ \t]*$/) {
		read_next = 0
		continue
	    }
	    else if(line ~ /^Event Code/) {
		read_next = read_event
		continue
	    }
	    else if(line ~ /^Children Result/) {
		read_next = read_child_result
		first_child_result = 1 # true
		continue
	    }
	    else if(line ~ /^Relative Event Code/) {
		read_next = read_rel_code
		continue
	    }
	    else {
		if(read_next == read_event) {
		    event = splitline[1]
		}
		if(read_next == read_child_result) {
		    if(first_child_result) {
			child_results = splitline[1]
			first_child_result = 0 # false
		    }
		    else {
			child_results = child_results "," splitline[1]
		    }
		}
		if(read_next == read_rel_code) {
		    rel_code = splitline[1]
		}
	    }
	}
    }
    if(event) {
	print event, child_results, rel_code
    }
}

hda7 · 03-13-2011, 12:19 PM

To explain how my code merges the child result lines:

When the program encounters a Children Result line, it sets read_next to read_child_result to tell the script that the following lines (up to a blank line) should be added to the child_results variable. Then it sets first_child_result to 1 (true) to let the script know that the next line will be the first child result.

When the script reads the lines following the Children Result line, it first checks to see if first_child_result is true. If it is, it sets the variable child_results to the first thing on the line (child_results = splitline[1]), and then sets first_child_result to 0 (false). If it is not true, the script sets child_results to the concatenation of the current value of child_results, a comma, and the first thing on the line (child_results = child_results "," splitline[1]).
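The same first-item flag can be tried on its own. A minimal sketch, assuming the "Children Result" layout from the sample input (join_children, reading, first and list are hypothetical names):

```shell
#!/bin/sh
# Hypothetical demo of the first_child_result idea: join the lines under "Children Result" with commas.
join_children() {
  awk '/Children Result/ { reading = 1; first = 1; list = ""; next }
       !NF { if (reading) print list; reading = 0; next }   # blank line ends the block
       reading && $1 !~ /^-/ {                              # skip the ---- underline
         if (first) { list = $1; first = 0 }                # first child: no comma in front
         else       { list = list "," $1 }                  # later children: comma, then the value
       }
       END { if (reading) print list }   # the block ran to the end of the input
  '
}
```

A common alternative to the flag is to keep a separator variable that starts empty and becomes "," after the first item (`list = list sep $1; sep = ","`).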

cgcamal · 03-13-2011, 05:25 PM

hda7/grail,

Many thanks to both of you for your great and thorough help. Your scripts work perfectly and are great examples of how and when to use for() and if()/else statements. I think I need to run the scripts step by step to understand them better than I can just from reading them.

My last questions about this are:
- How is it possible to define a variable and use part of its name later? I mean, the variable defined is read_event, but further down the code sometimes uses only event, without "read_".

- What is the purpose of variable names like _varname, with the "_" at the beginning?

Thanks again for all help.

grail · 03-13-2011, 07:37 PM

You seem to be confusing naming conventions with the actual coding.

In hda7's code all of the following are listed as variables:

Code:

read_next = 0
read_event = 1
read_child_result = 2
read_rel_code = 3
event = child_results = rel_code = ""

So read_event and event are two distinct variables.

And the underscore (_) is just a word separator that makes the variable names clearer, i.e. readevent is not as clear as read_event (IMO).
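A quick way to convince yourself: awk treats read_event and event as two unrelated names, and changing one never touches the other. A tiny sketch (show_vars is a hypothetical name):

```shell
#!/bin/sh
# read_event and event are two unrelated variables; the underscore is just part of the first name.
show_vars() {
  awk 'BEGIN {
    read_event = 1       # a "state" number, as in the script above
    event = "XYZ_1"      # the event code itself
    read_event = 2       # changing read_event leaves event untouched
    print read_event, event
  }'
}
```

Running show_vars prints `2 XYZ_1`.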

Hope that helps.

hda7 · 03-13-2011, 09:18 PM

Exactly.

I originally had readevent instead of read_event and so on, but since I couldn't read it easily, I added the underscores.

cgcamal · 03-14-2011, 12:59 AM

I'm clear on this now.

Really grateful to both of you for the help.

Best regards.