LinuxQuestions.org
Old 08-02-2009, 11:34 AM   #1
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

specifying fields for printing in gawk from command line


If I want to print out only specified lines (fields) from a file using gawk, I've found I can use a bash loop that looks something like this:
Code:
#!/bin/bash

for x; do

        gawk 'BEGIN{RS="\0"; FS="\n"}
        {print '$x' ": " $'$x'}
        ' <./inputfile.txt

done

===

$ ./script.sh 1 3 2

1: value of line1
3: value of line3
2: value of line2
But this is not particularly efficient, especially if the input file is very large, as gawk has to read in the entire file for each iteration of the loop. Also, I've read that using RS="\0" is not recommended as a way to tell gawk to treat the whole file as a single record.

I think it would be better to do this entirely from within gawk so it can print out all the wanted fields at one time, but I'm not sure how to do it. I've been studying awk/gawk tutorials for hours but I can't figure it out. Should I try to use a for loop, an array, or what? Can any awk experts help me out?

(Note, printing single lines is just for the example; the actual text I want to extract will be more complex, which is why I want to use awk instead of sed or other options.)
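
One way to avoid re-reading the file for every argument is to hand the whole list to a single awk run. The following is only a sketch under assumptions: the function name print_lines and the sample file are made up for illustration, and it keeps every line in memory, which may not suit very large inputs.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): print the requested line
# numbers from a file, in the order given, using a single awk pass.
print_lines() {
    local file=$1; shift
    # "$*" joins the remaining arguments into one space-separated string.
    awk -v list="$*" '
        BEGIN { n = split(list, want, " ") }   # want[1..n] = requested line numbers
        { line[NR] = $0 }                      # remember every line by its number
        END {
            for (i = 1; i <= n; i++)
                print want[i] ": " line[want[i]]
        }
    ' "$file"
}

# Sample input, made up for this example.
tmp=$(mktemp)
printf '%s\n' "value of line1" "value of line2" "value of line3" > "$tmp"
print_lines "$tmp" 1 3 2
rm -f "$tmp"
```

With the sample input, this should print the same three lines as the loop version, in the order 1, 3, 2. Passing the list with -v also avoids splicing shell variables into the awk program text.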
 
Old 08-02-2009, 11:47 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,551
Blog Entries: 28

Hello David the H.

A loop with the next command in it to iterate over the lines ...

Best

Charles
 
Old 08-02-2009, 11:54 AM   #3
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Original Poster
Actually, to explain what I want in more detail, I have a text file that contains several hundred sections/records, and I want to be able to print out the records that I specify.

Each record consists of about a dozen lines, but not in a completely uniform pattern, which is why I need something like awk to parse them out. The only thing that's consistent is the starting line. The general pattern looks like this:
Code:
#1#  This is record 1.

 some data
 some more data
 some more data

#2#  This is record 2.

 some data
 etc.
Edit: Sorry catkin, I don't quite follow your suggestion. I really need some specifics, because I'm completely confused here. How do I get the input from the command line into the loop? I suppose I could use gawk -v list="$@" or something, but then how do I loop through them once I have them?

Last edited by David the H.; 08-02-2009 at 11:58 AM.
 
Old 08-02-2009, 01:15 PM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,551
Blog Entries: 28

Quote:
Originally Posted by David the H.
Edit: Sorry catkin, I don't quite follow your suggestion. I really need some specifics, because I'm completely confused here. How do I get the input from the command line into the loop? I suppose I could use gawk -v list="$@" or something, but then how do I loop through them once I have them?
I understand that the values you want to pass to gawk are in the arguments to the shell script that calls gawk.

That being the case, "$@" is good but will not work just like that, because bash expands "$@" to "$1" "$2" ... "$n", one separate word per argument (n is simply the number of arguments; there is no limit of 10). This feature is useful when there is whitespace in the arguments. Bash would thus expand the gawk command to
Code:
gawk -v list="$1" "$2" ... "$n" <stuff>
and "$2" etc would not end up in gawk variable "list". Assuming there is no whitespace in the arguments to the bash script then you could use gawk -v list="$*" which bash would expand to a single word of space separated values and this would end up in gawk variable "list".
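
The "$*" form can be tried out in a small experiment. This is just a sketch; show_list is a made-up name, and the awk program only echoes back what it received.

```shell
#!/bin/bash
# Hypothetical demo (not from the thread): pass all arguments to awk as
# one variable and split them apart again inside awk.
show_list() {
    # "$*" becomes a single word: the arguments joined by spaces
    # (more precisely, by the first character of IFS).
    awk -v list="$*" 'BEGIN {
        n = split(list, a, " ")
        for (i = 1; i <= n; i++)
            print i " -> " a[i]
    }'
}

show_list 1 3 2
```

Called as shown, it should print "1 -> 1", "2 -> 3" and "3 -> 2" on separate lines. With "$@" in place of "$*", only the first argument would land in list, and the second would most likely be taken as the awk program text.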

Will the arguments to the bash script be the numbers that appear between the "#" characters in "#1# This is record 1." and will they be in the same order they appear in your text file?

If so, you could parse the first word out of "list" and set "list" to the remainder, start the outer loop and keep doing "next" statements until you match <n> in "#<n># This is record <n>.", when you could parse the next word out of "list" ready for the next match, start an inner loop doing "next" statements and printing each line until you find another "#<*># This is record <*>." when you break out of the inner loop and iterate the outer loop.
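
The sequential idea described above might be sketched roughly as follows. Note the assumptions: print_in_file_order is a hypothetical name, the requested numbers must be given in the same order as they appear in the file, and a flag variable stands in for the explicit nested loops, since awk already loops over the input lines by itself.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): print the requested records,
# assuming the numbers are given in the order they occur in the file.
print_in_file_order() {
    local file=$1; shift
    awk -v list="$*" '
        BEGIN { n = split(list, want, " "); k = 1 }   # k indexes the next wanted number
        /^#[0-9]+#/ {
            split($0, part, "#")          # "#12#  text" -> part[2] is "12"
            printing = (k <= n && part[2] == want[k])
            if (printing) k++             # move on to the next wanted number
        }
        printing                          # print lines while inside a wanted record
    ' "$file"
}
```

It could be called as, for example, print_in_file_order inputfile.txt 2 4. Numbers given out of file order are simply never matched, which is the limitation of this approach.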
 
Old 08-02-2009, 01:41 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Original Poster
Quote:
Originally Posted by catkin
Assuming there is no whitespace in the arguments to the bash script then you could use gawk -v list="$*" which bash would expand to a single word of space separated values and this would end up in gawk variable "list".
Yeah, after a bit of experimenting I've kinda gathered that. But the whole thing is still confusing me greatly.

Quote:
Will the arguments to the bash script be the numbers that appear between the "#" characters in "#1# This is record 1." and will they be in the same order they appear in your text file?
Yes, ideally this would be the case. It would match the number in the first line of the record, then print it and every line after until the start of the next record.

Or since the records are in numerical order, it could just as well print "record number n" from the file, if that would be easier.

I should be able to pass the arguments to the script in any order however, and the records should ideally be output in that same order.

Actually, I've already found a way to do it with sed, but I have to pipe it through the command twice for each record I want. I'm sure awk would do a better job of it, once I figure out how.

Quote:
If so, you could parse the first word out of "list" and set "list" to the remainder, start the outer loop and keep doing "next" statements until you match <n> in "#<n># This is record <n>.", when you could parse the next word out of "list" ready for the next match, start an inner loop doing "next" statements and printing each line until you find another "#<*># This is record <*>." when you break out of the inner loop and iterate the outer loop.
Would you mind posting some code for this? I think I get the concept (well, maybe), but I don't comprehend at all how to go about implementing it. I've been trying to bend my head around variables and arrays and loops in awk for half a day now, and I still can't really grasp how any of it is supposed to work. Nothing I've tried so far has come anywhere close to giving me a usable output, or even anything other than an error most of the time.

Last edited by David the H.; 08-02-2009 at 01:47 PM.
 
Old 08-02-2009, 03:50 PM   #6
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,551
Blog Entries: 28

Quote:
Originally Posted by David the H.
I should be able to pass the arguments to the script in any order however, and the records should ideally be output in that same order.
That shifts the goal posts! awk essentially runs through the file line by line, looking for patterns and, when it matches one, does actions. If you move away from that sequential approach then you have to bend awk. Fortunately it's flexible and powerful so what you have now asked for is possible but requires a different approach -- reading the whole file into awk variable(s) -- to allow printing lines in a sequence different from the one in the input file.
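
As a rough illustration of the "read the whole file into awk variables" approach, the sketch below buffers each record under its number and prints them in argument order once the input ends. The name print_in_arg_order is hypothetical, and the header pattern assumes lines of the form "#<n>#" as in the sample data.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): print the requested records
# in the order the arguments were given, by buffering each record in awk.
print_in_arg_order() {
    local file=$1; shift
    awk -v list="$*" '
        BEGIN { n = split(list, want, " ") }
        /^#[0-9]+#/ {
            split($0, part, "#")
            cur = part[2]                 # record number taken from the header
        }
        cur != "" { rec[cur] = rec[cur] $0 "\n" }   # append the line to its record
        END {
            for (i = 1; i <= n; i++)
                printf "%s", rec[want[i]]
        }
    ' "$file"
}
```

A call such as print_in_arg_order inputfile.txt 3 1 would be expected to print record 3 before record 1. The trade-off is memory: the whole file is held in the rec array until the END block runs.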
 
Old 08-02-2009, 08:27 PM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,696
Blog Entries: 5

Here's an approach not using RS.
Code:
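#!/bin/bash
# join all script arguments into one space-separated string
args="$*"
awk  -v args="$args" 'BEGIN{
    # split up the args and store in array a; m is how many there are
    m=split(args,a," ")
}
# a new header line ends the record we were in
f && /^#/{f=0}
/^#/{
    ++c   # count records: bump the counter whenever the line starts with #
    f=1   # flag: we are now inside a record
}
f{
    # g becomes 1 if the current record number c was requested
    g=0
    for(i=1;i<=m;i++){
        if(a[i] == c){
            g=1
        }
    }
    if (g){  print }
}' file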
output
Code:
# more file
#1#  This is record 1.

 some data
 some more data
 some more data

#2#  This is record 2.

 some data
 etc.

#3#  This is record 3.

 some data
 some more data
 some more data

#4#  This is record 4.

 some data
 etc.
 last ...4

# ./test.sh 1 3
#1#  This is record 1.

 some data
 some more data
 some more data

#3#  This is record 3.

 some data
 some more data
 some more data

# ./test.sh 1 2 4
#1#  This is record 1.

 some data
 some more data
 some more data

#2#  This is record 2.

 some data
 etc.

#4#  This is record 4.

 some data
 etc.
 last ...4
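
The inner for loop in the script above could also be replaced by a membership test on an array whose keys are the wanted numbers. This is only a sketch of that variation; print_wanted is a made-up name, and like the script above it counts header lines rather than reading the number out of the header.

```shell
#!/bin/bash
# Hypothetical variant (not from the thread): same counting approach, but
# the wanted numbers are stored as array keys so membership is one lookup.
print_wanted() {
    local file=$1; shift
    awk -v args="$*" '
        BEGIN {
            m = split(args, a, " ")
            for (i = 1; i <= m; i++) want[a[i]] = 1   # keys are the wanted record numbers
        }
        /^#/ { ++c }              # count records, one per line starting with #
        c in want                 # print every line of a wanted record
    ' "$file"
}
```

For example, print_wanted file 1 3 should give the same records as ./test.sh 1 3. The output still follows file order, not argument order.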
 
Old 08-03-2009, 11:52 AM   #8
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,551
Blog Entries: 28

Hello David
Quote:
Originally Posted by David the H.
Would you mind posting some code for this? I think I get the concept (well, maybe), but I don't comprehend at all how to go about implementing it. I've been trying to bend my head around variables and arrays and loops in awk for half a day now, and I still can't really grasp how any of it is supposed to work. Nothing I've tried so far has come anywhere close to giving me a usable output, or even anything other than an error most of the time.
Could do, although it would not be easy because I haven't used awk in a non-trivial way for a while. I did want to get clear on your requirements first, though, especially as your last requirements would mean a very different overall algorithm from the first.

Best

Charles
 
Old 08-04-2009, 03:32 PM   #9
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Original Poster
Sorry to be late replying. I had a tiring couple of days.

ghostdog74, thank you so much. It works perfectly. Now I just need to go through it to understand exactly what it's doing. Of course I wasn't married to using RS or anything. I just didn't know of any other way to go about it.

And Catkin, no, I don't absolutely NEED the output to be in the same order as the input, but it seems to me that a script should generally process things in the order that they're given. And having the output in a different order from the input can be a bit confusing sometimes. In any case, the code above does just what I want.
 
  



Tags
awk, fields, gawk, input, print


All times are GMT -5.