LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

09-21-2012, 02:56 AM   #1
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Rep: Reputation: Disabled
redirecting input from file in awk script


Hi Linux experts!

A general question.

Inside an awk script, how do you read input from a file other than the one given on the command line?

In a general way, if you invoke a script with:
Code:
gawk -f myprog.awk input1.txt
and myprog.awk is something like:

Code:
BEGIN{}

function dosomething(field) {
    return field                              # placeholder: the real processing goes here
}

{
    print dosomething($1) > "output1.txt"     # at this step myprog.awk reads the original input1.txt and writes output1.txt
}
# ...                                         # the script keeps going
{
    while (0 < (getline < "output1.txt"))     # from here on, stop using input1.txt and read output1.txt instead
        print $0                              # placeholder: process each line of output1.txt
}

END{}
Is it doable in awk?

Last edited by Trd300; 09-21-2012 at 02:58 AM.
 
09-21-2012, 06:28 AM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 729
Quote:
Is it doable in awk?
Did you try it?

Google found this:
http://www.linuxquestions.org/questi...de-awk-874908/
 
09-21-2012, 07:56 AM   #3
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Hi pixellany, thanks for your help!

Yep I've seen this one and the awk manual.

* But do you have to use the code below only in the BEGIN{} part of the script or you can use it when you want?
Code:
while ( 0 < (getline < file))
* Do you have to use a specific function to "close" the previous section before using the code above (like exit, break or something else) ?

* And do you have to re-define FS, OFS, RS... after redirecting the new file?

Thanks!

Last edited by Trd300; 09-21-2012 at 07:58 AM.
 
09-21-2012, 10:40 PM   #4
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Maybe with a simple example:

command-line:
Code:
gawk -f myprog.awk input1.txt
script:
Code:
BEGIN{FS=OFS="|"}

{
  <first block where I process the original input1.txt invoked in the command line> > "output1.txt"
}

{while((getline < "output1.txt") > 0){
     RS=ORS="\n"; FS=OFS="|"
     print $3
     }
}
If the file output1.txt generated in the middle of the script is:
Code:
AAA|BBB|CCC
DDD|EEE|FFF
GGG|HHH|III
JJJ|KKK|LLL
I get this final output after the script finishes running:
Code:
CCC
DDD
instead of:
Code:
CCC
FFF
III
LLL
In the second block I tried using "if" instead of "while", using getline var < file, and defining or not defining FS/OFS and RS/ORS, but it returns the same result every time.

Any help or comments would be great.
Thanks in advance!

Last edited by Trd300; 09-21-2012 at 10:43 PM.
 
09-22-2012, 01:53 AM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,689

Rep: Reputation: 1987
Quote:
But do you have to use the code below only in the BEGIN{} part of the script or you can use it when you want?
No, you can use it anywhere you like, with the understanding that once the loop completes, the file in question has no more records left to read until you close() it. It can also be used to read an entire file each time you reach this part of your code, e.g. to compare it against the data from the current file (closing the file after each pass).
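To illustrate that first point, here is a minimal sketch assuming a POSIX shell and a POSIX awk (gawk or mawk); the file side.txt and the function demo_exhaust are hypothetical names made up for the demo. The getline loop sits in a rule with no pattern, so it runs for each of the two input lines, but only the first run reads anything, because nothing closes side.txt:

```shell
#!/bin/sh
# Sketch: a getline loop inside a pattern-less rule drains the side file once.
# side.txt and demo_exhaust are hypothetical names used only for this demo.
demo_exhaust() {
    dir=$(mktemp -d)
    printf 'x\ny\n' > "$dir/side.txt"
    # Two lines of main input, so the rule below runs twice.
    printf 'one\ntwo\n' | awk -v side="$dir/side.txt" '
    {
        n = 0
        # Count the lines this run of the rule can still read from side.txt.
        while ((getline line < side) > 0)
            n++
        print $0 ": read " n " line(s) from side file"
    }'
    rm -rf "$dir"
}

demo_exhaust
```

The second input line should report 0 lines read, since side.txt is still open and already at end-of-file.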
Quote:
Do you have to use a specific function to "close" the previous section before using the code above (like exit, break or something else) ?
(g)awk comes with its own close() function to close any open file. As in other languages, it is a good habit to close a file you opened inside the code, as opposed to one being read in from the command line.
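A minimal sketch of close() rewinding a file, assuming a POSIX shell and awk; side.txt and demo_close are hypothetical demo names. Because close() runs after each pass of the loop, the next getline reopens the file from its first line, so every input line can read all of it:

```shell
#!/bin/sh
# Sketch: closing the side file after each pass lets the next pass start over.
# side.txt and demo_close are hypothetical names used only for this demo.
demo_close() {
    dir=$(mktemp -d)
    printf 'x\ny\n' > "$dir/side.txt"
    printf 'one\ntwo\n' | awk -v side="$dir/side.txt" '
    {
        n = 0
        while ((getline line < side) > 0)
            n++
        # After close(), the next getline reopens side.txt from the top.
        close(side)
        print $0 ": read " n " line(s) from side file"
    }'
    rm -rf "$dir"
}

demo_close
```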
Quote:
And do you have to re-define FS, OFS, RS... after redirecting the new file?
Only if the files require different values.

The last point leads to your next example: there is no need to reset the above values if they do not change from those set in BEGIN.
Code:
awk 'BEGIN{OFS=FS="|"; while((getline < "output1.txt") > 0)print $3}'
As per point one above, this does not have to be in BEGIN; that just made the demonstration simpler.
 
09-22-2012, 10:46 PM   #6
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Thanks for your explanations, grail!

I tried to follow your points and realised that the original input file and output1.txt need different RS values, so RS had to be changed in between.

Code:
BEGIN{RS=";"; FS=OFS="|"}

{
  <first block where I process the original input1.txt invoked in the command line> > "output1.txt"
}

{ close("output1.txt") }                 # closing the file

{
    RS="\n"                        # re-defining the new RS
    while((getline < "output1.txt") > 0){
     print $3
     }
}
But now I get a mix of lines from the first input file, split with the wrong RS, FS, etc.
 
09-23-2012, 02:43 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,689

Rep: Reputation: 1987
Maybe you need to explain a bit more about what you are actually trying to do? It is important to note that, as neither of your blocks has a pattern in front of it, both will be executed every time a record is read from the file passed to the script on the command line. Is this your intention?
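A quick sketch of that behaviour, assuming any POSIX awk (demo_blocks is a hypothetical wrapper name): both pattern-less rules fire for all three records, while the rule guarded by NR == 1 fires only once.

```shell
#!/bin/sh
# Sketch: rules without a pattern run once for every input record.
demo_blocks() {
    printf 'a\nb\nc\n' | awk '
    { first++ }           # no pattern: runs for every record
    { second++ }          # no pattern either: also runs for every record
    NR == 1 { once++ }    # with a pattern: runs only for the first record
    END { print first, second, once }'
}

demo_blocks
```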
 
09-23-2012, 07:17 AM   #8
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
It is just a part of a long program. Here is a simple example.
input:
Code:
@AAA|BBB
123456
@DDD|EEE
1237890
I need to append every second line of the input to the end of the line before it (dropping the leading @), then shorten that appended string by 0, 1, 2 and 3 characters from the left. But I need to use a function, as I will use it several times in the script.

I would like to get this output at this step:
Code:
AAA|BBB|123456
AAA|BBB|23456
AAA|BBB|3456
AAA|BBB|456
DDD|EEE|1237890
DDD|EEE|237890
DDD|EEE|37890
DDD|EEE|7890
Then I redirect that to a file that I will need later (in this example, let's say I just want to print only the 3rd field right after).


script:
Code:
BEGIN{RS="@"; FS=OFS="|"}

function layout(field) {
        trim = ""
        l = length(field)
        for(i=1; i<=4; i++){
        trim = substr(field, i, (l-i)+1)
        }
        return trim
}

NR==1{next}
NR>1{
          sub("\n","|",$0)

          print layout($3) > "output1.txt"

}
{ close("output1.txt") }

{ RS = "\n"
  while((getline < "output1.txt") > 0)
  print $3
}
I don't know if it makes more sense with this example...!

Last edited by Trd300; 09-23-2012 at 07:19 AM.
 
09-23-2012, 11:21 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,689

Rep: Reputation: 1987
At first glance, you may need to revisit how you define your own functions: good practice is to list the variables a function uses internally (here trim, l and i) as extra parameters in its definition, as this makes them local variables.
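A sketch of that advice applied to the layout() idea from the previous post, assuming the goal is exactly the eight-line output shown there; the prefix parameter, the odd/even line handling and the demo_layout wrapper are illustrative choices, not the original script. Everything after the extra spaces in the parameter list (out, sep, i) is local, and since a function returns a single value, the four lines are returned as one newline-separated string:

```shell
#!/bin/sh
# Sketch: out, sep and i are extra parameters, so they are local to layout().
# Assumes odd input lines are "@header" lines and even lines are the sequences.
demo_layout() {
    printf '@AAA|BBB\n123456\n@DDD|EEE\n1237890\n' | awk '
    # Return prefix|field with 0, 1, 2 and 3 characters trimmed from the left
    # of field, one result per line, as a single newline-separated string.
    function layout(prefix, field,    out, sep, i) {
        out = ""
        sep = ""
        for (i = 1; i <= 4; i++) {
            out = out sep prefix OFS substr(field, i)
            sep = "\n"
        }
        return out
    }
    BEGIN { FS = OFS = "|" }
    NR % 2 == 1 { sub(/^@/, ""); header = $0; next }   # header line
    { print layout(header, $0) }                        # sequence line
    '
}

demo_layout
```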

Also, do you need to output this data to another file at all? Why not store it in an array?
Code:
#!/usr/bin/awk -f

BEGIN{  FS=OFS="|"  }

{
    getline field
    sub(/@/,"")
    for( i=1; i<=4; i++)
        arr[j++] = $0 FS substr(field,i)
}

END{
    for( x = 0; x < j; x++){      # numeric loop keeps input order; "for (x in arr)" order is unspecified
        split(arr[x],a)
        print x,a[3]
    }
}
This is more to give you ideas, not to say your current direction is wrong. I guess, as I do not understand the complexities of what you are trying to write, I am struggling to follow the current logic.
 
09-23-2012, 05:07 PM   #10
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,154

Rep: Reputation: 333
Also, you should note that getline < file replaces the value of $0 and then re-parses it, resetting NF, $1, ..., $NF.

If that's not what you want, use the getline var < file form instead. See info gawk for details.
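A small sketch contrasting the two forms, assuming a POSIX shell and awk; side.txt and demo_getline_forms are hypothetical demo names. The first read leaves $0 and NF describing the current record, while the second replaces and re-splits them:

```shell
#!/bin/sh
# Sketch: compare "getline var < file" with plain "getline < file".
demo_getline_forms() {
    dir=$(mktemp -d)
    printf 'p q r\n' > "$dir/side.txt"
    echo 'a b' | awk -v side="$dir/side.txt" '
    {
        # getline var < file: only var changes; $0 and NF still describe "a b".
        getline line < side
        print "after getline var:", $0, NF
        close(side)    # rewind so the next read starts at "p q r" again
        # getline < file: $0 is replaced and re-split, so NF changes as well.
        getline < side
        print "after plain getline:", $0, NF
    }'
    rm -rf "$dir"
}

demo_getline_forms
```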

Also, note that using getline in a BEGIN section can cause variables to be set in that section which would not, normally, be set so early in the AWK processing. (Again, this is described in the getline section in the info file.)
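A sketch of that side effect, assuming gawk or another POSIX awk reading its main input from stdin (demo_begin_getline is a hypothetical name): the unredirected getline in BEGIN consumes the first record, so NR is already 1 there and the main rule only ever sees the second line.

```shell
#!/bin/sh
# Sketch: an unredirected getline in BEGIN pulls the first record early.
demo_begin_getline() {
    printf 'first\nsecond\n' | awk '
    BEGIN {
        getline                                  # reads from the main input
        print "BEGIN saw NR=" NR ": " $0
    }
    { print "main rule saw NR=" NR ": " $0 }'
}

demo_begin_getline
```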
 
1 member found this post helpful.
09-23-2012, 07:37 PM   #11
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Thanks, grail and PTrenholme!

@grail: It's not easy to explain; it was just an example. I need to process the original input file before calling the function, and then process the redirected intermediate output file.
But as you suggested, I will try storing the data in an array instead of a file, to keep the entire records.

However, I have a silly question: if I understand correctly, return is used to hand back the final result the function is meant to produce. Therefore, you can only return one variable, which you have to define earlier in the function (usually on its first line, like: variable = ""). Is that right?
But I read that giving the return statement an expression is not compulsory, especially when you want a function for what it does rather than for what it returns. The danger is that it can then return an unpredictable value without warning you. Is there a way to force the function to return the right result if I don't specify an expression in the return statement?

@PTrenholme:
I read everything I found about functions and getline. I tried all sorts of getline forms, but the output is always messy (wrong RS, FS...), even with getline var < file, which is not supposed to modify $0, RS, FS...
So I assume my syntax is wrong, or maybe I don't redefine RS and FS in the right place (but I tried several combinations).

e.g.:
Code:
BEGIN{RS="@"; FS=OFS="|"}

{
    ...                         # block to process the original input file

    print $0 > "output1.txt"             # redirect output to a file
    close("output1.txt")
}

{
    while/if((getline < "output1.txt") > 0)
    RS=ORS="\n"; FS=OFS="|"
    <process "output1.txt">
}

END{}

Thanks anyway!

Last edited by Trd300; 09-23-2012 at 07:50 PM.
 
09-24-2012, 04:47 AM   #12
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,689

Rep: Reputation: 1987
Yes, you will need to assign values to RS and so on before calling getline if they are to be different from the defaults.

Of course, the issue you then face is that these new values will also be used when the original file (i.e. the one passed on the command line) is read again. To use your example:
Code:
BEGIN{RS="@"; FS=OFS="|"}

{
    ...                         # block to process the original input file

    print $0 > "output1.txt"             # redirect output to a file
    close("output1.txt")
}

{
# Up until the line below the values set in BEGIN will be used
    RS=ORS="\n"; FS=OFS="|"
# From this point on and for the rest of the script the above values will be used
    while/if((getline < "output1.txt") > 0)
    <process "output1.txt">
}

END{}
If we assume you have run the following at the command line:
Code:
$ gawk -f myprog.awk input1.txt
The first record read from input1.txt will use the variables set in BEGIN, up until those set in the second block.
From the second block onwards, all subsequent records read from input1.txt will use the new values, so records will no longer be treated as separated by '@' but by '\n'. This will, of course, change your results.
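One way around this, sketched below under some assumptions (input in the '@'-separated layout from the earlier example, output1.txt as a scratch file in a temporary directory, demo_two_pass as a hypothetical wrapper), is to finish the first pass in the main rules and postpone reading output1.txt until END, where changing RS can no longer affect records from the command-line file:

```shell
#!/bin/sh
# Sketch: two passes in one awk run. The first pass writes output1.txt; END
# closes it, switches RS and reads it back. File names are hypothetical.
demo_two_pass() {
    dir=$(mktemp -d)
    printf '@AAA|BBB\n123456\n@DDD|EEE\n1237890\n' > "$dir/input1.txt"
    awk -v tmp="$dir/output1.txt" '
    BEGIN { RS = "@"; FS = OFS = "|" }
    NR == 1 { next }                  # the text before the first @ is empty
    {
        sub(/\n$/, "")                # drop the newline that ends the record
        sub(/\n/, OFS)                # join the two lines: AAA|BBB|123456
        print > tmp                   # first pass: write the intermediate file
    }
    END {
        close(tmp)                    # flush and close before reading it back
        RS = "\n"                     # output1.txt is newline separated
        while ((getline < tmp) > 0)   # second pass: $0 and NF now come from tmp
            print $3
    }' "$dir/input1.txt"
    rm -rf "$dir"
}

demo_two_pass
```

Because nothing reads from input1.txt once END starts, the RS change only applies to the getline loop; the sketch should print 123456 and 1237890.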
 
1 member found this post helpful.
09-24-2012, 05:25 AM   #13
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,245
Blog Entries: 16

Rep: Reputation: 233
Sometimes, to lessen the confusion, you can just do everything in the BEGIN block instead. You don't have to use the other blocks.
Code:
#!/usr/bin/gawk -f

function truncate(file) {
    return (system(": > '" file "' >/dev/null 2>&1") == 0)
}

BEGIN {
    # RS="@"; FS=OFS="|"

    if (ARGV[1]) {
        input = ARGV[1]
    }
    else {
        input = "/dev/stdin"
    }

    output = "output.txt"

    if (!truncate(output)) {
        print "Unable to truncate output file." > "/dev/stderr"
        exit(1)
    }

    while ((getline < input) > 0) {
        # ...
        print $0 >> output
    }

    close(output)

    # RS=ORS="\n"; FS=OFS="|"

    while ((getline < output) > 0) {
        # ...
    }

    close(output)

    exit(0)
}
Normally, though, it's coded like this:
Code:
#!/usr/bin/gawk -f

function truncate(file) {
    return (system(": > '" file "' >/dev/null 2>&1") == 0)
}

BEGIN {
    # RS="@"; FS=OFS="|"

    output = "output.txt"

    if (!truncate(output)) {
        print "Unable to truncate output file." > "/dev/stderr"
        exit(1)
    }
}

{
    print $0 >> output
}

END {
    close(output)

    # RS=ORS="\n"; FS=OFS="|"

    while ((getline < output) > 0) {
        # ...
    }

    close(output)

    exit(0) # probably no longer necessary
}

Last edited by konsolebox; 09-24-2012 at 05:30 AM. Reason: forgot to move close()
 
1 member found this post helpful.
09-24-2012, 06:58 AM   #14
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Thanks, grail and konsolebox, for these useful explanations. I see the point now!
 
09-25-2012, 02:30 AM   #15
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Even following konsolebox's strategies, I still cannot read the intermediate output file back into the script properly (i.e. as if I had run a second command line with this file as input).

I think I will have to make big changes to my functions!

Thanks for your help anyway!
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration