redirecting input from file in awk script

konsolebox · 09-25-2012, 09:56 PM

Is this table your standard? How about single digits? How exactly must they be converted?

Code:

conv["12"]="a"
conv["23"]="b"
conv["34"]="c"
conv["45"]="d"
conv["56"]="e"
conv["67"]="f"
conv["78"]="g"
conv["87"]="h"
conv["76"]="i"
conv["65"]="j"
conv["54"]="k"
conv["43"]="l"
conv["32"]="m"
conv["21"]="n"

Trd300 · 09-25-2012, 10:04 PM

This is the conversion table from numbers to letters.

I am only working on blocks of 2 numbers.
If a block of 2 numbers is not in the table then it returns "X". The same when we start at the last position and thus only one digit makes a block.
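The rule described here can be sketched as follows. This is only a minimal illustration, assuming the table from the first post; the name convert_blocks and the shell wrapper are made up for the example and are not part of anyone's script in this thread.

```shell
# convert_blocks is a hypothetical helper name; the table is the one posted above.
convert_blocks() {
    awk '
    BEGIN {
        # build conv["12"]="a", conv["23"]="b", ... from "digits letter" pairs
        n = split("12 a 23 b 34 c 45 d 56 e 67 f 78 g 87 h 76 i 65 j 54 k 43 l 32 m 21 n", t, " ")
        for (i = 1; i < n; i += 2) conv[t[i]] = t[i + 1]
    }
    {
        out = ""
        # walk the line in blocks of 2 characters; a trailing single digit
        # forms a block of its own and is never in the table
        for (pos = 1; pos <= length($0); pos += 2) {
            block = substr($0, pos, 2)
            out = out ((block in conv) ? conv[block] : "X")
        }
        print out
    }'
}

echo 12345 | convert_blocks
echo 1299 | convert_blocks
```

The first call prints acX (12 -> a, 34 -> c, and the lone 5 -> X); the second prints aX because 99 is not in the table.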

konsolebox · 09-25-2012, 10:12 PM

But what if the length is odd, like "12345"? What do we do with the extra 5?

konsolebox · 09-25-2012, 10:17 PM

Anyway, this should already work. The only missing part is the convert function, which should be easy enough to add.

Code:

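#!/usr/bin/gawk -f
# Note: "env gawk -f" fails in a shebang on Linux, which passes "gawk -f" as one argument. Adjust the path above if gawk lives elsewhere.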

function truncate(file) {
    return (system(": > '" file "' >/dev/null 2>&1") == 0)
}

function reverse(string,  r, i) {
	for (i = length(string); i; --i) {
		r = r substr(string, i, 1)
	}
	return r
}

function convert(string) {
	return string
}

function print_set(x, n,  c) {	# c is a local variable
	while (n != "") {	# compare as a string so that "0" does not end the loop early
		c = convert(n)
		if (length(c) > 2) {
			printf("%s|%s\n", x, c)
		}
		n = substr(n, 2)
	}
}

BEGIN {
}

{
	if ($0 ~ /^@[A-Z]+|[A-Z]+\$/) {
		match($0, /[A-Z]+/)
		x = substr($0, RSTART, RLENGTH)
		if ((getline) > 0) {	# getline returns -1 on error, which would also count as true
			if ($0 ~ /^[0-9]+$/) {
				print_set(x, $0)
				print_set(x, reverse($0))
			}
		}
	}
}

Usage:

Code:

gawk -f script.awk -- input.txt > output.txt

---- Add ----

It seems I left the truncate function in. It's no longer needed, so you don't have to use it.

Trd300 · 09-25-2012, 10:29 PM

Quote:

But what if the length is odd, like "12345"? What do we do with the extra 5?

If a block contains only one digit, this block is not part of the table so it is replaced by "X" as well.

konsolebox · 09-25-2012, 10:33 PM

Ok. So what do you think about the script?

Trd300 · 09-25-2012, 11:02 PM

After inserting the convert function and a for loop to start at different positions, it seems to work. Thanks konsolebox!

* However, printing only converted strings longer than 2 characters was just an example.
In reality, I have numerous steps to go through once I have converted the numbers to letters.
Would it mean that all these numerous steps have to be written inside the function "print_set" ?

My original query was how, once I had converted the numbers to letters, to start again with the converted "intermediate.txt" as input and process it the same way I would have processed it from a command line.
That way I would have been able to define RS=ORS="\n", FS=OFS="|", and use the variables $1 and $2 instead of "x" and "$0" in your code, which I think would have been easier afterwards.

* In the function "truncate" I don't really get the expression of the system function:

Code:

": > '" file "' >/dev/null 2>&1"

I assume the "0" exit status means success.

konsolebox · 09-25-2012, 11:09 PM

Quote:

Originally Posted by Trd300

Would it mean that all these numerous steps have to be written inside the function "print_set" ?

Not necessarily. Originally it was two sets of steps inside for loops in the main block. I just moved them into a function to make the script more readable.

Quote:

My original query was how, once I had converted the numbers to letters, to start again with the converted "intermediate.txt" as input and process it the same way I would have processed it from a command line.
That way I would have been able to define RS=ORS="\n", FS=OFS="|", and use the variables $1 and $2 instead of "x" and "$0" in your code, which I think would have been easier afterwards.

I'm really confused about that method (using RS, FS, etc.). Perhaps I just don't want to understand it since I see a better way to do it. Is there something more you want to do with the results? Please describe it in a pseudocode-like manner, in terms of what you want to get, not how you plan to do it.

Quote:

* In the function "truncate" I don't really get the expression of the system function:

Code:

": > '" file "' >/dev/null 2>&1"

I assume the "0" exit status means success.

It's just a shell command run from within awk using the system function; awk hands the string to a shell (sh) to execute and returns the command's exit status. Yes, 0 means success.
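To make the pieces of that command concrete: ":" is the shell's no-op command, "> file" truncates (or creates) the file, and system() gives back the exit status. A small sketch, assuming a throwaway temp file; it uses double quotes around the file name and leaves out the extra ">/dev/null 2>&1" from the original function, so it is a simplification rather than the same command.

```shell
tmp=$(mktemp)
echo "old contents" > "$tmp"

# Run ": > file" through the shell from awk and report whether it exited with 0.
status=$(awk -v file="$tmp" 'BEGIN {
    rc = system(": > \"" file "\"")
    print ((rc == 0) ? "ok" : "failed")
}')
size=$(wc -c < "$tmp" | tr -d " ")   # byte count of the file after truncation

echo "$status $size"
rm -f "$tmp"
```

This prints "ok 0": the command succeeded and the file is now empty.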

Trd300 · 09-25-2012, 11:36 PM

Quote:

Is there something more you want to do with the results?

Yes there are about 100 other operations (some are a bit complex) to do after printing the line containing $2 > 2 characters long.

That's why "keeping $2 > 2 letters" was just an example to know how to make the transition between the first part of the script that uses the original input file to convert numbers to letters (redirected to "intermediate.txt"), and the use of "getline < file" to use "intermediate.txt" as a new input file (with RS="\n", FS="|") ready to be processed through numerous steps.

Instead of using 2 command lines:

Code:

gawk -f convert.awk original_input.txt > converted_output.txt
gawk -f numerous_steps.awk converted_output.txt > final_results.txt

I wanted to write all the steps in a single program that I could use like that:

Code:

gawk -f myprog.awk original_input.txt > final_results.txt    # myprog.awk includes conversion + numerous steps

konsolebox · 09-25-2012, 11:54 PM

If that's the case, you could just have every function transform $0 in turn.

Code:

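function do_something_1(  result) {
	result = $0	# placeholder: do something with $0 here
	$0 = result
	return 1	# return 0 to make the main block skip this record
}

function do_something_2(  result) {
	result = $0	# placeholder: do something with $0 here
	$0 = result
	return 1
}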

{
	if (!do_something_1()) {
		next
	}
	
	if (!do_something_2()) {
		next
	}

	print $0
}

konsolebox · 09-26-2012, 12:34 AM

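Or better yet, something that passes arrays between the steps. awk functions cannot return an array, but array arguments are passed by reference, so each step can fill or modify one and return the line count:

Code:

function do_something_1(single_line, lines) {
	# placeholder: split single_line into lines[1..n]
	lines[1] = single_line
	return 1	# number of lines produced, 0 to skip the record
}

function do_something_2(lines, n) {
	# placeholder: modify lines[1..n] in place
	return n
}

function print_result(lines, n,  i) {
	for (i = 1; i <= n; i++) {
		print lines[i]
	}
}

{
	n = do_something_1($0, lines)

	if (!n) {
		next
	}

	n = do_something_2(lines, n)

	if (!n) {
		next
	}

	print_result(lines, n)
}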

Trd300 · 09-26-2012, 01:46 AM

Thanks konsolebox but I'm just a bit lost with your 2 last posts.
I can't manage to apply them to my code. If you could show me with one function (not everything, leave me some...)

Code:

BEGIN{
         RS="@"; FS=OFS="|"; conv["12"]="a"; conv["23"]="b"; conv["34"]="c"; conv["45"]="d"; conv["56"]="e"; conv["67"]="f"; conv["78"]="g";
         conv["87"]="h"; conv["76"]="i"; conv["65"]="j"; conv["54"]="k"; conv["43"]="l"; conv["32"]="m"; conv["21"]="n"
         }

function convert(field, start,  letter, block){    # letter and block are locals
         letter = ""
         block = substr (field, start, 2)
         while (block != ""){
              letter = letter (block in conv ? conv[block] : "x")
              start = start + 2
              block = substr (field, start, 2)
         }
         return letter
}

function rev(field,  rever, l, i){    # locals; a global i would clobber the caller's loop counter
         rever = ""
         l = length(field)
         for (i=l; 0<i; i--){
              rever = rever substr (field, i, 1)
         }
         return rever
}      



NR==1{next}

NR>1{
          sub("\n", "|")       # write second line next to the preceding one
          gsub("\n", "")
         }

{
     for(i=1; i<=length($3); i++){
          print $1 FS convert($3, i) > "intermediate.txt"    # step 1) and output in a file (we removed $2)
     }
     
     for(i=1; i<=length($3); i++){
          print $1 FS convert(rev($3), i) >> "intermediate.txt"    # step 2) (we removed $2) and 3) concatenate in the same file
     }
}

##### BLOCK BELOW DOESN'T WORK ######

{
     close("intermediate.txt");
     RS=ORS="\n"; FS=OFS="|";                 # re-define RS, FS to be able to use "intermediate.txt" as if it was the input of a second command-line
     while((getline < "intermediate.txt") > 0){
           if(length($2) > 2) {print $0}          # note that previous $3 in original input becomes $2 in "intermediate.txt"
           else{next}
  
           ... <keep processing "intermediate.txt">

}
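One way a single script can run both stages, sketched under the assumption that stage two only needs the finished intermediate file: write the file in the main rule, then close it and read it back with getline inside an END block. A plain { ... } block runs once per input record, whereas END runs once after all input has been read. The input format (NAME|digits), the name two_stage and the temp file are assumptions for the example, and the real conversion is left out.

```shell
# two_stage is a hypothetical name; $1 is the path of the intermediate file.
two_stage() {
    awk -v tmp="$1" '
    BEGIN { FS = OFS = "|" }

    # Stage 1: for every input record, write one line per start position
    # to the intermediate file.
    {
        for (i = 1; i <= length($2); i++)
            print $1, substr($2, i) > tmp
    }

    # Stage 2: runs once, after all input has been read.
    END {
        close(tmp)                      # flush the writes before reading
        while ((getline < tmp) > 0) {   # sets $0, $1, $2 from the file
            if (length($2) > 2)
                print
        }
        close(tmp)
    }'
}

tmp=$(mktemp)
result=$(printf 'AB|1234\nCD|56\n' | two_stage "$tmp")
rm -f "$tmp"
echo "$result"
```

With this input, the intermediate file holds AB|1234, AB|234, AB|34, AB|4, CD|56 and CD|6, and the second stage prints only AB|1234 and AB|234.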

konsolebox · 09-26-2012, 01:52 AM

Well that's disappointing since the one I presented was clearly working already. Well I guess it's just a matter of philosophy.

Trd300 · 09-26-2012, 01:56 AM

I haven't forgotten your code; I would just like to modify both versions and see which one suits me best (yours, I bet).

Trd300 · 09-26-2012, 04:48 AM

konsolebox: Even though I am not sure I understood your previous post, I tried to modify the functions and simplify a bit more: to fuse the two functions "convert" and "reverse" and to incorporate the increment for starting at different positions, all in a single function.
I tried:

Code:

function total(n, start){   
     letter = ""     
     for (first=1;  first<= length(n); first++){
          ss = substr (n, first)
          block = substr (ss, start, 2)
          while (block != ""){
                letter = letter (block in conv ? conv[block] : "X")
                start = start + 2
                block = substr (ss, start, 2)
                }
          $2 = letter
          }
     letter_reverse = ""
     for (i = length(n); i=1; i--){
          n_reverse = n_reverse substr (n, i, 1)
          for (first=1;  first<= length(n); first++){
               ss_reverse = substr (n_reverse, first)
               block_reverse = substr (ss_reverse, start, 2)
               while (block_reverse != ""){
                    letter_reverse = letter_reverse (block_reverse in conv ? conv[block_reverse] : "X")
                    start = start + 2
                    block_reverse = substr (ss_reverse, start, 2)
                }
          $2 = letter_reverse
          }
     }
     return 1
}

but it doesn't work at all...

Is it possible, in the same function, to use one line (here "n") to produce 2 different results (here "letter" and "letter_reverse")?
How can I use the same variable (here "$2") to hold these 2 distinct values?
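For what it's worth, a function in awk returns a single scalar, but it can fill an array passed as an argument, because arrays are passed by reference. A minimal sketch of that idea follows; the names both, res and both_ways are made up for the example, and the number-to-letter conversion is left out so that only the two-results mechanism shows.

```shell
# both_ways is a hypothetical wrapper name used only for this example.
both_ways() {
    awk '
    function reverse(s,  r, i) {
        for (i = length(s); i > 0; i--) r = r substr(s, i, 1)
        return r
    }

    # Fills res["forward"] and res["reverse"]; the caller sees both
    # values afterwards because the array is shared, not copied.
    function both(n, res) {
        res["forward"] = n
        res["reverse"] = reverse(n)
    }

    {
        both($0, out)
        print out["forward"] "|" out["reverse"]
    }'
}

echo 12345 | both_ways
```

This prints 12345|54321. A single field such as $2 can hold only one value at a time, so the two results are either printed as two separate lines or kept in two array elements as above.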