Linux - Newbie: This Linux forum is for members that are new to Linux.
I have not tried it yet, but I would read and process output1.txt in the END block. The basic call stays the same:
Quote:
gawk -f myprog.awk input1.txt
Also, I would generate records in output1.txt if and only if NR%2 == 0, and in that case generate as many lines as length($3). Note that this would also handle the missing last line in your output.
@Trd300: If you do want to run 1.awk and 2.awk simultaneously, you would need a pipe or some other runtime medium for that. Using an ordinary file (in this case, the output1 file) won't work: the two awk programs cannot usefully have the same file open at the same time, one for output and the other for input. It would work if you ran 1.awk first and 2.awk afterwards. I'm not sure whether there are techniques to do that with an ordinary file, but as far as I know there are none, or at least it would be hard in awk. Maybe in C, with the file opened as binary.
Anyway, perhaps you're just giving that as an example to show what you really want (myprog.awk), so how about grail's suggestion of using arrays? Do you still need the in-between output1 file for later operations?
Last edited by konsolebox; 09-25-2012 at 04:01 AM.
Actually, the step-wise algorithm looks more like this:
Code:
original input ---> 1.awk ---> output1
output1 ---> 2.awk ---> final output
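If the intermediate file is not needed afterwards, the two stages above could be chained with a pipe instead. A minimal sketch: the two inline programs only stand in for 1.awk and 2.awk (whose real contents are not shown in this thread), and the "name|digits" sample data is made up for illustration.

```sh
# Stage 1 stands in for 1.awk, stage 2 for 2.awk; both are placeholders.
pipeline() {
    awk -F'|' -v OFS='|' '{ print $1, length($2) }' |   # stage 1: name|number of digits
    awk -F'|' '$2 > 2'                                   # stage 2: keep rows with more than 2 digits
}

# Hypothetical input: one "name|digits" pair per line.
printf 'A|1234\nB|12\n' | pipeline
```

With the real program files, the equivalent call would presumably be something like gawk -f 1.awk input1.txt | gawk -f 2.awk.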
OK, I see what you mean.
I am not sure arrays will fix the problem. In myprog.awk, function1 will produce results1 and a different function2 will produce results2. As I cannot assign different values (results1 and results2) to the same variable, I thought that redirecting results1 and results2 to the same file, concatenating them, would sort the problem out:
Code:
BEGIN{}
<define function1 here>
<define function2 here>
{print function1($X) > "output1.txt"}
{print function2($X) >> "output1.txt"}
{
    close("output1.txt");
    RS=ORS="\n";
    while((getline < "output1.txt") > 0){<keep working on "output1.txt" as an input>}
}
END{}
I am gonna try using getline from a coprocess, although I don't know if there is a way to concatenate the results of the different functions.
I must be reading this all wrong as I am getting very lost now
If we assume the 1.awk / 2.awk approach, my understanding is that you would create a temporary file after running 1.awk on the original input and then run 2.awk on the temporary file to produce the final output. Is this correct?
If above is correct, is it not simply a case of first performing the necessary tasks on the original data and then any follow up tasks to produce the desired output?
Again, I would request a before and after picture of the data. It seems to me you may be trying to fit a square peg into a round hole, when this is not necessarily the process you should be using.
Quote:
If we assume the 1.awk / 2.awk approach, my understanding is that you would create a temporary file after running 1.awk on the original input and then run 2.awk on the temporary file to produce the final output. Is this correct?
Yes it is correct.
Quote:
If above is correct, is it not simply a case of first performing the necessary tasks on the original data and then any follow up tasks to produce the desired output?
Yes it is a case like that.
Code:
original input ---> function1 ---> results1
                                             ----> concatenate results1 & 2 ---> process ---> final output
               ---> function2 ---> results2
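The flow in this diagram can also be sketched with arrays and an END block, along the lines of grail's suggestion, so that no intermediate file is needed. This is only an illustration: function1 and function2 below are trivial placeholders (upper-casing and doubling a string), not the real functions, and the "longer than 2 characters" filter is borrowed as an example of later processing.

```sh
two_results() {
    awk '
    # function1 and function2 are placeholders for the real functions.
    function function1(s) { return toupper(s) }
    function function2(s) { return s s }
    {
        results1[NR] = function1($1)   # kept in memory instead of written to output1.txt
        results2[NR] = function2($1)
    }
    END {
        # "Concatenate": walk results1 first, then results2, and process each entry.
        for (i = 1; i <= NR; i++) if (length(results1[i]) > 2) print results1[i]
        for (i = 1; i <= NR; i++) if (length(results2[i]) > 2) print results2[i]
    }'
}

printf 'ab\ncde\n' | two_results
```

The two arrays play the role of results1 and results2, so the same variable never has to hold both values at once.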
If you're using a version of gawk that supports it (Version 4 does; I'm not sure about version 3), you could consider something like this:
Code:
BEGIN {
    # Expand the argument list so each input file name is duplicated:
    for (i=1; i<ARGC; ++i) {
        # Is this a valid (readable) file?
        if ((getline test < ARGV[i]) > 0) {
            close(ARGV[i])
            for (j=ARGC;j>i;--j) {
                ARGV[j]=ARGV[j-1]
            }
            ++ARGC
            ++i # So the outer loop skips the duplicate we've added . . .
        }
    }
    process_count=0
}
BEGINFILE {
    # Is this a readable file? (ERRNO is an empty string if the open succeeded.)
    if (ERRNO != "") {
        # Process the non-file value.
        nextfile
    }
    ++process_count
}
process_count==1 {
    # Do the stuff for the first pass through the file . . .
}
process_count==2 {
    # Do your thing for the second pass through the file . . .
}
ENDFILE {
    if (process_count==2) {
        process_count=0
        # Any other EOF processing you want . . .
    }
}
END {
    # Final clean-up and termination processing . . .
}
Actually, if it's on a per-line basis, Trd300 could just take the variable ($0 or another) that holds the input and pass it to both functions. If it's on a per-file basis, he/she could read the file twice with:
Code:
while ((getline < input) > 0) {
    # ...
}
close(input)
while ((getline < input) > 0) {
    # ...
}
close(input)
The latter builds on my earlier suggestion of doing everything in the BEGIN block.
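For the per-file case, another common awk idiom (not the one used in the posts above) is to name the input file twice on the command line and tell the two passes apart by comparing NR with FNR. A minimal sketch with made-up numeric data; the function name two_pass and the temporary file are only for illustration.

```sh
two_pass() {
    # $1 is the input file; it is listed twice so awk reads it twice.
    awk '
    NR == FNR { total += $1; next }          # first pass: accumulate a total
    { printf "%s %.2f\n", $1, $1 / total }   # second pass: use the total
    ' "$1" "$1"
}

sample=$(mktemp)
printf '1\n3\n' > "$sample"
two_pass "$sample"
rm -f "$sample"
```

During the first pass NR and FNR are equal; once the second copy of the file starts, FNR restarts at 1 while NR keeps counting, so the first rule no longer matches.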
First, write the numbers on the same line as the preceding record, separated by a pipe (and remove the "@"):
Code:
XXXXXX|YYY|12345678
...
To do that, set RS to "@" and delete the "\n".
Then we use 2 functions:
function1: convert blocks of 2 numbers to letters (according to a conversion array)
function2: reverse the string of numbers
1) From the original input file, using function1, convert blocks of 2 numbers to letters, starting from the 1st number, then the 2nd, then the 3rd, ... until the end of the string.
2) Then, still with the same input, use function2 to reverse the original string of numbers and apply 1) to it.
3) Concatenate the results of 1) and 2) in the same output (from which we removed $2) to get this intermediate file:
Code:
XXXXXX|aceg # start from 1st number (i.e. 12345678)
XXXXXX|bdfx # start from 2nd number (i.e. 2345678)
XXXXXX|ceg # start from 3rd number (i.e. 345678)
XXXXXX|dfx # start from 4th number (i.e. 45678)
XXXXXX|eg # start from 5th number (i.e. 5678)
XXXXXX|fx # start from 6th number (i.e. 678)
XXXXXX|g # start from 7th number (i.e. 78)
XXXXXX|x # start from last number (i.e. 8)
XXXXXX|hjln # same but after reversing the string starting from 1st number (i.e. 87654321)
XXXXXX|ikmx # same but after reversing the string starting from 2nd number (i.e. 7654321)
etc...
4) Keep processing the intermediate file (e.g. keep the strings with more than 2 letters, or with a specific letter,...)
Here is what I tried:
Code:
BEGIN{
    RS="@"; FS=OFS="|"; conv["12"]="a"; conv["23"]="b"; conv["34"]="c"; conv["45"]="d"; conv["56"]="e"; conv["67"]="f"; conv["78"]="g";
    conv["87"]="h"; conv["76"]="i"; conv["65"]="j"; conv["54"]="k"; conv["43"]="l"; conv["32"]="m"; conv["21"]="n"
}
# letter and block are declared as extra parameters so they stay local
function convert(field, start,    letter, block){
    letter = ""
    block = substr(field, start, 2)
    while (block != ""){
        letter = letter (block in conv ? conv[block] : "x")
        start = start + 2
        block = substr(field, start, 2)
    }
    return letter
}
# rever, l and i are local; a global i here would clobber the loop counter of the caller
function rev(field,    rever, l, i){
    rever = ""
    l = length(field)
    for (i=l; 0<i; i--){
        rever = rever substr(field, i, 1)
    }
    return rever
}
NR==1{next}
NR>1{
    sub("\n", "|") # write second line next to the preceding one
    gsub("\n", "")
}
{
    for(i=1; i<=length($3); i++){
        print $1 FS convert($3, i) > "intermediate.txt" # step 1) and output in a file (we removed $2)
    }
    for(i=1; i<=length($3); i++){
        print $1 FS convert(rev($3), i) >> "intermediate.txt" # step 2) (we removed $2) and 3) concatenate in the same file
    }
}
##### BLOCK BELOW DOESN'T WORK ######
{
    close("intermediate.txt");
    RS=ORS="\n"; FS=OFS="|"; # re-define RS, FS to be able to use "intermediate.txt" as if it was the input of a second command-line
    while((getline < "intermediate.txt") > 0){
        if(length($2) > 2) {print $0} # note that previous $3 in original input becomes $2 in "intermediate.txt"
        else{next}
        ... <keep processing "intermediate.txt">
    }
}
Quote:
{
    for(i=1; i<=length($3); i++){
        print $1 FS convert($3, i) > "intermediate.txt" # step 1) and output in a file (we removed $2)
    }
    for(i=1; i<=length($3); i++){
        print $1 FS convert(rev($3), i) >> "intermediate.txt" # step 2) (we removed $2) and 3) concatenate in the same file
    }
}
For that, I think you should use >> for the first step as well, and truncate intermediate.txt once in the BEGIN block instead. That is only needed if the current version doesn't work, that is, if the file gets truncated again every time the first step is reached.
When I delete the "while((getline ...)" block after redirecting the output to "intermediate.txt" for the second time, the file contains the correct data.
If I do the same with ">>" at the first redirection, the file contains the data in duplicate.
Sorry. I tried to go through the whole thread, but it's still not apparent what ~final~ output you really want. We could help better if we knew that. It's somewhat confusing to follow the procedures as described.
---- Add ----
I mean that at least we need a real example, from the original input to the final output.
I understand it can be confusing.
My last post with the code explains pretty much everything. You don't need to look at anything before that post.
input:
Code:
@XXXXXX|YYY
12345678
"intermediate.txt":
Code:
##### Results from the first call of the function ######
XXXXXX|aceg # start from 1st number (i.e. 12345678)
XXXXXX|bdfx # start from 2nd number (i.e. 2345678)
XXXXXX|ceg # start from 3rd number (i.e. 345678)
XXXXXX|dfx # start from 4th number (i.e. 45678)
XXXXXX|eg # start from 5th number (i.e. 5678)
XXXXXX|fx # start from 6th number (i.e. 678)
XXXXXX|g # start from 7th number (i.e. 78)
XXXXXX|x # start from last number (i.e. 8)
###### Results from the second call of the function after reversing the string ######
XXXXXX|hjln # same but after reversing the string starting from 1st number (i.e. 87654321)
XXXXXX|ikmx # same but after reversing the string starting from 2nd number (i.e. 7654321)
etc... # same as previous line until the end of the reverse string
final output (if, for instance, in the last block of the code, where "intermediate.txt" is read back as the new input, I keep only the lines whose $2 is more than 2 letters long):
Code:
XXXXXX|aceg # start from 1st number (i.e. 12345678)
XXXXXX|bdfx # start from 2nd number (i.e. 2345678)
XXXXXX|ceg # start from 3rd number (i.e. 345678)
XXXXXX|dfx # start from 4th number (i.e. 45678)
XXXXXX|hjln # same but after reversing the string starting from 1st number (i.e. 87654321)
XXXXXX|ikmx # same but after reversing the string starting from 2nd number (i.e. 7654321)
etc...
The problem is the transition between the block when I use the functions and concatenate both results and the block when I want to use "intermediate.txt" as a new input.
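One way this transition might be handled is to do the second stage in an END block: every record is written to the intermediate file first, then the file is closed, RS is set back to a newline, and the file is read line by line with getline. The sketch below is an assumption-laden illustration, not a tested fix for the original script: the function name digits_to_letters, the temporary file standing in for "intermediate.txt", and the NF < 2 test used to skip the empty text before the first "@" are all choices made here.

```sh
digits_to_letters() {
    # $1: input file in the "@NAME|YYY" + digits layout shown above.
    tmp=$(mktemp)    # stands in for "intermediate.txt"
    awk -v tmpfile="$tmp" '
    BEGIN {
        RS = "@"; FS = OFS = "|"
        # Same conversion table as in the post above, built from two lists.
        n = split("12 23 34 45 56 67 78 87 76 65 54 43 32 21", key, " ")
        split("a b c d e f g h i j k l m n", val, " ")
        for (k = 1; k <= n; k++) conv[key[k]] = val[k]
    }
    function convert(field, start,    letter, block) {
        letter = ""
        block = substr(field, start, 2)
        while (block != "") {
            letter = letter (block in conv ? conv[block] : "x")
            start += 2
            block = substr(field, start, 2)
        }
        return letter
    }
    function rev(field,    out, i) {
        out = ""
        for (i = length(field); i > 0; i--) out = out substr(field, i, 1)
        return out
    }
    NF < 2 { next }                        # skip the empty text before the first "@"
    {
        sub("\n", "|"); gsub("\n", "")     # join the two lines of the record
        len = length($3); r = rev($3)
        for (i = 1; i <= len; i++) print $1 OFS convert($3, i) > tmpfile   # steps 1) and 3)
        for (i = 1; i <= len; i++) print $1 OFS convert(r, i) > tmpfile    # steps 2) and 3)
    }
    END {
        close(tmpfile)                     # flush everything before reading it back
        RS = "\n"                          # the intermediate file is line-oriented
        while ((getline line < tmpfile) > 0) {
            split(line, part, "|")
            if (length(part[2]) > 2) print line   # step 4): keep more than 2 letters
        }
        close(tmpfile)
    }' "$1"
    rm -f "$tmp"
}

input=$(mktemp)
printf '@XXXXXX|YYY\n12345678\n' > "$input"
digits_to_letters "$input"
rm -f "$input"
```

Because the read-back loop runs only once, after all input has been consumed, the file is never open for reading and writing at the same time. Inside such a getline loop, lines that should be skipped are simply not printed; "next" is meant for the main input and does not belong there.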