LinuxQuestions.org
Old 03-12-2011, 05:12 PM   #1
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78

Rep: Reputation: 16
How to merge this awk and sed codes in a single one?


Hi to all,

I have written this short code:
Code:
### 1) Printing ranges between strings to exclude unwanted lines #####
awk  '/^Event/,/^$/{print $1};/Children/,/^$/{print $1};/Relative Event Code/,/^$/{print $1}' input | 

#### 2) After printing the 1st field of only the wanted lines, remove blank lines and lines with "--.." and "___..." 
sed -e 's/^-.*//;s/^_.*$//;/^$/d' |  ## I've tried to use sub(/^-.*/,"",$0) to emulate this sed line

#### 3) After removing all unwanted lines, print the header, then merge all lines into a single one, FS=","
awk 'BEGIN{print "Event Code|Children Result|Relative Event Code"}
{
if ( $0~/Event|Children|Relative|.*_.*/ )  # I would like to use an array instead of "$0", but I don't know how to
printf("%s,", $0)                  # load the array the right way and then manipulate it.
else
printf(" %s\n", $0)
}' | 

#### 4) After step 3, remove unneeded strings and at the same time separate lines corresponding 
####    to each "Event Code" block
sed -e 's/^Event,//g;s/,Event,/\n/g;s/,Children,/|/g;s/,Relative,/|/g;s/,$//' | # I would like to replace this too
                                        # with sub() to include within awk code
#### 5) Printing fields separated by "|"
awk -F"|" '{print $1,$2,$3}' OFS="|"  > output
Applied to this input:
Code:
______________________________________________________________________________

                                                                 Start   Relative
Event Code                         State           Date?     Event?     Events?
--------                         ------        ---------------   -----   ---------
XYZGF$TY101_Procuct01           ON_HOLD       Met               No      No

______________________________________________________________________________

                                                                 Start   Relative
Event Code                         State           Date?     Event?     Events?
--------                         ------        ---------------   -----   ---------
XYZGF$TY102_Evecod01             ON_HOLD       No                Yes     Yes

   Result:  s(PRQ$MAC111_xiib) and s(PRQ$MAC141_code) and
     s(PRQ$MAC131_xiib_pres_sol) and s(PRQ$MAC134_pres_areatyp)

   Children Result                              Current State T/F
   ----------------                              -------------- ---
   CORRECT(PRQ$MAC131_xiib_pres_sol)             CORRECT        OK
   CORRECT(PRQ$MAC134_pres_areatyp)              CORRECT        OK

   Relative Event Code                            Result      
   ------------------                            ---------      
   ABC$MAC101_dept_abc                         CORRECT(XYZGF$TY102_Evecod01) 

______________________________________________________________________________
                                                                 Start   Relative
Event Code                         State           Date?     Event?     Events?
--------                         ------        ---------------   -----   ---------
ABC$MAC101_dept_abc            ON_HOLD       No                Yes     No

   Result:  s(XYZGF$TY102_Evecod01)

   Children Result                              Current State T/F
   ----------------                              -------------- ---
   CORRECT(XYZGF$TY102_Evecod01)                 ON_HOLD        F
The output is:
Code:
Event Code|Children Result|Relative Event Code
XYZGF$TY101_Procuct01||
XYZGF$TY102_Evecod01|CORRECT(PRQ$MAC131_xiib_pres_sol),CORRECT(PRQ$MAC134_pres_areatyp)|ABC$MAC101_dept_abc
ABC$MAC101_dept_abc|CORRECT(XYZGF$TY102_Evecod01)|
The code gives the output I need, but I would like to learn how to join all the awk parts into a single script, using awk's replacement functions instead of sed, so that everything runs in one awk program.

The comments in the code describe what each part does and what I have tried so far, without success, in order to merge it.

Could somebody help me with this? It would be better if the merged code stayed close to the original.

Thanks in advance
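
The sed stage in step 2 can probably be folded into awk with sub(), as the comment on that line attempts. A minimal sketch, assuming the lines arrive on standard input; the function name drop_rules_and_blanks is made up for illustration and is not part of the original pipeline:

```shell
# Hypothetical helper: awk-only stand-in for
#   sed -e 's/^-.*//;s/^_.*$//;/^$/d'
drop_rules_and_blanks() {
    awk '
        { sub(/^[-_].*/, "") }   # blank out lines starting with "-" or "_" (the two sed s/// commands)
        $0 != ""                 # print only lines that are still non-empty (the sed /^$/d)
    '
}

printf 'Event\n--------\nXYZ_1\n\n____\n' | drop_rules_and_blanks
```

Inside a larger awk script the same rule could sit ahead of the printing logic, with `next` used to discard the blanked lines instead of a bare pattern.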
 
Old 03-12-2011, 09:12 PM   #2
hda7
Member
 
Registered: May 2009
Distribution: Debian wheezy
Posts: 252

Rep: Reputation: 31
"Homework" questions are generally discouraged at LQ, but I did attempt to solve your problem. This is my solution (I tested it, and it seems to work):
Code:
BEGIN {
    print "Event Code|Children Result|Relative Event Code"
    RS = "[_]+\n" # records are separated at ___... lines
    FS = "\n+" # fields are separated on one or more newlines
    OFS = "|"
}

{
    read_next = 0
    read_event = 1
    read_child_result = 2
    read_rel_code = 3

    for(i = 1; i <= NF; i += 1) { # loop through fields (individual lines)
	line = $i
	if(line !~ /^[- \t]*$/) { # skip ---... lines and blank lines
	    gsub(/[ \t]+/, " ", line) # compress whitespace
	    sub(/^[ ]/, "", line) # remove leading whitespace
	    split(line, splitline, " ") # split at whitespace

	    if(line ~ /^Event Code/) {
		if(event) {
		    print event, child_results, rel_code
		}
		event = child_results = rel_code = ""
		read_next = read_event
		continue
	    }
	    else if(line ~ /^Children Result/) {
		read_next = read_child_result
		first_child_result = 1 # true
		continue
	    }
	    else if(line ~ /^Relative Event Code/) {
		read_next = read_rel_code
		continue
	    }
	    else {
		if(read_next == read_event) {
		    event = splitline[1]
		    read_next = 0
		}
		if(read_next == read_child_result) {
		    if(first_child_result) {
			child_results = splitline[1]
			first_child_result = 0 # false
		    }
		    else {
			child_results = child_results "," splitline[1]
		    }
		}
		if(read_next == read_rel_code) {
		    rel_code = splitline[1]
		    read_next = 0
		}
	    }
	}
    }
}

END {
    if(event) {
	print event, child_results, rel_code
    }
}
 
Old 03-12-2011, 10:53 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
Here is an alternative that assumes knowledge of other fields in the file:
Code:
#!/usr/bin/awk -f

BEGIN{  print "Event Code|Children Result|Relative Event Code"
        RS = "_+\n" #Records separated by continuous underscores
}

/Event/{ #Only interested in records that contain the string 'Event'
    rel = child = 0 #Set false for whether or not a child or relative part of record have been read
    for(i=1;i<=NF;i++){
        test = 0 #Set test to false so nothing is printed unless test is true
        if($i ~ /\$/){ #Only interested in fields that contain the dollar symbol (this was inferred from your desired output)
            if($(i+1) ~ /\$/){ #If next field also contains a dollar symbol then we are looking at the relative section
                test++
                rel++
                if(!child) #If no child section has been processed, add a preceding pipe to indicate previous field missing
                    $i = "|"$i

                $i = $i"\n" #Always add newline after relative section
            }
            else if($i ~ /CORRECT\(/ && $(i+1) ~ /^(ON_HOLD|CORRECT)$/){ #Current field contain 'CORRECT(' string and next field contains strings shown, then it is a child section
                test++
                child++
                if($(i+3) ~ /CORRECT/) #If third field from current also contains string 'CORRECT', then there is another child so use comma else pipe
                    $i = $i","
                else
                    $i = $i"|"
            }
            else if($(i+1) == "ON_HOLD"){ #Assumes all Events will be 'ON_HOLD', inferred from input file
                test++
               $i = $i"|"
            }
        }
        if(test)
            printf("%s", $i)
   }
   if(!(rel || child)) #If neither child or rel were entered then print extra pipe to indicate missing fields
       print "|"
   else if(!rel && child)
       print ""
}

Last edited by grail; 03-13-2011 at 03:13 AM. Reason: To add comments to further explain code
 
Old 03-13-2011, 01:15 AM   #4
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78

Original Poster
Rep: Reputation: 16
Hi hda7,

Thanks for your help and time. Believe me, it really isn't homework. I'm not a student at all, nor even a real programmer, only a more or less self-taught awk/sed enthusiast. This is another person's question; I'm only trying, with my scarce knowledge of programming, to help that person, as others have helped me before.

Well, your code seems to work for me too, but I'm a little lost with it. I understand the kind of matrix "transposition" you do in the BEGIN block by playing with RS and FS.
Could you explain a little how it works when you define variables as read_XXXX=Y? Why do you assign =0, =1, =2, =3 to those variables? And how is it possible to use those variables further down without the "read_" part?

If it's not too much, could you explain how one of the if statements works? Then I'll try to work out the others by analogy :-)
Code:
 if(line ~ /^Event Code/) {
        if(event) {
            print event, child_results, rel_code
        }
        event = child_results = rel_code = ""
        read_next = read_event
        continue
Hi grail again :-),

I'm here to learn from you experts and to help when I can :-), but I'm a little lost with how your code works too.

So as not to be a nuisance, could you explain just the part I find most complicated: how to merge the lines below the "Children Result" block?

Thanks again for all your help.

Best regards
 
Old 03-13-2011, 07:54 AM   #5
hda7
Member
 
Registered: May 2009
Distribution: Debian wheezy
Posts: 252

Rep: Reputation: 31
I just fixed some mistakes I made. Here is the new script:
Code:
BEGIN {
    print "Event Code|Children Result|Relative Event Code"
    RS = "[_]+\n" # records are separated at ___... lines
    FS = "\n+" # fields are separated on one or more newlines
    OFS = "|"
}

{
    read_next = 0
    read_event = 1
    read_child_result = 2
    read_rel_code = 3

    for(i = 1; i <= NF; i += 1) { # loop through fields (individual lines)
	line = $i
	if(line !~ /^[- \t]*$/) { # skip ---... lines and blank lines
	    gsub(/[ \t]+/, " ", line) # compress whitespace
	    sub(/^[ ]/, "", line) # remove leading whitespace
	    split(line, splitline, " ") # split at whitespace

	    if(line ~ /^Event Code/) {
		event = child_results = rel_code = ""
		read_next = read_event
		continue
	    }
	    else if(line ~ /^Children Result/) {
		read_next = read_child_result
		first_child_result = 1 # true
		continue
	    }
	    else if(line ~ /^Relative Event Code/) {
		read_next = read_rel_code
		continue
	    }
	    else {
		if(read_next == read_event) {
		    event = splitline[1]
		    read_next = 0
		}
		if(read_next == read_child_result) {
		    if(first_child_result) {
			child_results = splitline[1]
			first_child_result = 0 # false
		    }
		    else {
			child_results = child_results "," splitline[1]
		    }
		}
		if(read_next == read_rel_code) {
		    rel_code = splitline[1]
		    read_next = 0
		}
	    }
	}
    }
    if(event) {
	print event, child_results, rel_code
    }
}
Now for an explanation of
Code:
if(line ~ /^Event Code/) {
    event = child_results = rel_code = ""
    read_next = read_event
    continue
When it finds a line that starts with Event Code, it clears the event, child_results, and rel_code variables (used to hold the event code, child results, and relative event code respectively). Then it sets read_next (which tells my script what it's looking for next) to my "constant" read_event, so that it sets the first thing on the next line to the event. Then the "continue" goes to the next loop iteration (and therefore to the next field in the record). It would probably be better (farther down) to stop reading the event when a blank line is encountered, rather than as soon as an event is found. However, the script currently skips blank lines, so you would need to tweak that first.

Feel free to ask any other questions about my script. I'll try to fix my other mistake soon too.
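
The read_next idea described above can be tried in isolation. This is only a rough sketch, not the script from this post: it works line by line with awk's default record splitting rather than the RS/FS setup used here, and the function name first_after_header is invented for the example:

```shell
# Hypothetical helper: print the first word of the line that follows
# an "Event Code" header, using a flag to remember what to read next.
first_after_header() {
    awk '
        /^Event Code/ { read_next = 1; next }      # header seen: arm the flag
        /^[- \t]*$/   { next }                     # skip ---... lines and blank lines
        read_next     { print $1; read_next = 0 }  # take the first word, then disarm
    '
}

printf 'Event Code  State\n-------- ------\nXYZ_1  ON_HOLD\nother line\n' | first_after_header
```

The flag plays the same role as read_next == read_event in the script: the header line itself carries no data, it only tells the following line how to be interpreted.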
 
Old 03-13-2011, 08:11 AM   #6
hda7
Member
 
Registered: May 2009
Distribution: Debian wheezy
Posts: 252

Rep: Reputation: 31
My script so far (I have fixed several mistakes):
Code:
BEGIN {
    print "Event Code|Children Result|Relative Event Code"
    RS = "[_]+\n" # records are separated at ___... lines
    FS = "\n" # fields are separated on newlines
    OFS = "|"
}

{
    read_next = 0
    read_event = 1
    read_child_result = 2
    read_rel_code = 3
    event = child_results = rel_code = ""

    for(i = 1; i <= NF; i += 1) { # loop through fields (individual lines)
	line = $i
	sub(/^[ \t]+/, "", line) # remove leading whitespace
	split(line, splitline, " ") # split at whitespace
	if(line !~ /^-[- \t]*$/) { # skip ---... lines
	    if(line ~ /^[ \t]*$/) {
		read_next = 0
		continue
	    }
	    else if(line ~ /^Event Code/) {
		read_next = read_event
		continue
	    }
	    else if(line ~ /^Children Result/) {
		read_next = read_child_result
		first_child_result = 1 # true
		continue
	    }
	    else if(line ~ /^Relative Event Code/) {
		read_next = read_rel_code
		continue
	    }
	    else {
		if(read_next == read_event) {
		    event = splitline[1]
		}
		if(read_next == read_child_result) {
		    if(first_child_result) {
			child_results = splitline[1]
			first_child_result = 0 # false
		    }
		    else {
			child_results = child_results "," splitline[1]
		    }
		}
		if(read_next == read_rel_code) {
		    rel_code = splitline[1]
		}
	    }
	}
    }
    if(event) {
	print event, child_results, rel_code
    }
}
 
Old 03-13-2011, 12:19 PM   #7
hda7
Member
 
Registered: May 2009
Distribution: Debian wheezy
Posts: 252

Rep: Reputation: 31
To explain how my code merges the child result lines:

When the program encounters a Children Result line, it sets read_next to read_child_result to tell the script that the following lines (up to a blank line) should be added to the child_results variable. Then it sets first_child_result to 1 (true) to let the script know that the next line will be the first child result.

When the script reads the lines following the Children Result line, it first checks to see if first_child_result is true. If it is, it sets the variable child_results to the first thing on the line (child_results = splitline[1]), and then sets first_child_result to 0 (false). If it is not true, the script sets child_results to the concatenation of the current value of child_results, a comma, and the first thing on the line (child_results = child_results "," splitline[1]).
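
The same accumulation can be sketched on its own, assuming line-by-line input rather than the RS/FS setup of the full script. The names join_children, collecting, first and joined are made up for this example:

```shell
# Hypothetical helper: join the first word of every line in the
# "Children Result" block with commas.
join_children() {
    awk '
        /^[ \t]*Children Result/ { collecting = 1; first = 1; next }  # block starts
        /^[ \t]*$/               { collecting = 0; next }             # blank line ends it
        /^[ \t]*-/               { next }                             # skip the ---... line
        collecting {
            if (first) { joined = $1; first = 0 }   # first child: plain assignment
            else       { joined = joined "," $1 }   # later children: append with a comma
        }
        END { print joined }
    '
}

printf '   Children Result   Current State\n   ----------   ---\n   CORRECT(A_1)   CORRECT OK\n   CORRECT(B_2)   CORRECT OK\n\n   Relative Event Code\n   X_3  foo\n' | join_children
```

The first flag is there only to avoid a leading comma, which is the job first_child_result does in the script.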
 
Old 03-13-2011, 05:25 PM   #8
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78

Original Poster
Rep: Reputation: 16
hda7/grail,

Many thanks to you both for your great and complete help. Your code works perfectly and is a great example of how and when to apply for() and if()/else statements. I think I need to run the code step by step to understand it better than I can from reading alone.

My last questions about this are:
- How is it possible to define a variable and later use only part of its name? I mean, the defined variable is read_event, but further down the code sometimes uses only event, without "read_".

- What is the function of variables like _varname, with the "_" at the beginning?

Thanks again for all help.
 
Old 03-13-2011, 07:37 PM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
You seem to be getting confused on naming conventions as opposed to the actual coding.

In hda7's code all of the following are listed as variables:
Code:
read_next = 0
read_event = 1
read_child_result = 2
read_rel_code = 3
event = child_results = rel_code = ""
So read_event and event are 2 distinct and different variables.

And using an underscore (_) as a delimiter is just to make the variable names clearer, i.e. readevent is not as clear as read_event (imo).

Hope that helps.
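
A quick way to check this for yourself is a small sketch like the one below (the function name show_vars is invented for the example). It assigns only read_event and then prints both variables; in awk a variable that was never assigned is simply empty:

```shell
# Hypothetical demo: read_event and event are unrelated variables.
show_vars() {
    awk 'BEGIN {
        read_event = 1                               # only this one is assigned
        print "[" read_event "]", "[" event "]"      # event is still unset, so it prints as empty
    }'
}

show_vars
```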
 
Old 03-13-2011, 09:18 PM   #10
hda7
Member
 
Registered: May 2009
Distribution: Debian wheezy
Posts: 252

Rep: Reputation: 31
Exactly.

I originally had readevent instead of read_event and so on, but since I couldn't read it easily, I added the underscores.
 
Old 03-14-2011, 12:59 AM   #11
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78

Original Poster
Rep: Reputation: 16
I'm clear on this now.

I'm really grateful to you both for the help.

Best regards.
 
  

