08-02-2009, 11:34 AM
|
#1
|
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,697
|
specifying fields for printing in gawk from command line
If I want to print out only specified lines (fields) from a file using gawk, I've found I can use a bash loop that looks something like this:
Code:
#!/bin/bash
for x; do
gawk 'BEGIN{RS="\0"; FS="\n"}
{print '$x' ": " $'$x'}
' <./inputfile.txt
done
===
$ ./script.sh 1 3 2
1: value of line1
3: value of line3
2: value of line2
But this is not particularly efficient, especially if the input file is very large, as gawk has to read in the entire file for each iteration of the loop. Also, I've read that using RS="\0" is not recommended as a way to tell it to treat the whole file as a single record.
I think it would be better to do this entirely from within gawk so it can print out all the wanted fields at one time, but I'm not sure how to do it. I've been studying awk/gawk tutorials for hours but I can't figure it out. Should I try to use a for loop, an array, or what? Can any awk experts help me out?
(Note, printing single lines is just for the example; the actual text I want to extract will be more complex, which is why I want to use awk instead of sed or other options.)
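For the simple line-number case above, here is a minimal sketch of doing the whole job in one awk pass instead of one pass per argument. The function name print_lines and the sample file are made up for illustration; the idea is to hand all the arguments to awk in one variable, remember only the wanted lines while reading, and print them in the requested order at the end.

```shell
#!/bin/bash
# print_lines is a hypothetical helper, not part of the original script.
# Usage: print_lines FILE N [N...]
print_lines() {
    local file=$1
    shift
    awk -v list="$*" '
        BEGIN {
            n = split(list, order, " ")               # order[1..n] = requested line numbers
            for (i = 1; i <= n; i++) want[order[i]] = 1
        }
        (NR in want) { line[NR] = $0 }                # remember only the wanted lines
        END {
            for (i = 1; i <= n; i++)                  # print in the order the arguments were given
                if (order[i] in line) print order[i] ": " line[order[i]]
        }
    ' "$file"
}

# Demo on a small made-up input file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'value of line1\nvalue of line2\nvalue of line3\n' > "$sample"
print_lines "$sample" 1 3 2
```

The file is read once however many numbers are passed, and only the requested lines are kept in memory.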
|
|
|
|
08-02-2009, 11:47 AM
|
#2
|
|
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian Squeeze (server), Slackware 13.37 (netbook), Slackware64 14.0 (desktop),
Posts: 8,367
|
Hello David the H.
A loop with the next command in it to iterate over the lines ...
Best
Charles
|
|
|
|
08-02-2009, 11:54 AM
|
#3
|
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,697
Original Poster
|
Actually, to explain what I want in more detail, I have a text file that contains several hundred sections/records, and I want to be able to print out the records that I specify.
Each record consists of about a dozen lines, but not in a completely uniform pattern, which is why I need something like awk to parse them out. The only thing that's consistent is the starting line. The general pattern looks like this:
Code:
#1# This is record 1.
some data
some more data
some more data
#2# This is record 2.
some data
etc.
Edit: Sorry catkin, I don't quite follow your suggestion. I really need some specifics, because I'm completely confused here. How do I get the input from the command line into the loop? I suppose I could use gawk -v list="$@" or something, but then how do I loop through them once I have them?
Last edited by David the H.; 08-02-2009 at 11:58 AM.
|
|
|
|
08-02-2009, 01:15 PM
|
#4
|
|
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian Squeeze (server), Slackware 13.37 (netbook), Slackware64 14.0 (desktop),
Posts: 8,367
|
Quote:
Originally Posted by David the H.
Edit: Sorry catkin, I don't quite follow your suggestion. I really need some specifics, because I'm completely confused here. How do I get the input from the command line into the loop? I suppose I could use gawk -v list="$@" or something, but then how do I loop through them once I have them?
|
I understand that the values you want to pass to gawk are in the arguments to the shell script that calls gawk.
That being the case, "$@" is good but will not work just like that, because bash expands "$@" to "$1" "$2" ... "$n", one word per argument (n is simply the number of arguments; there is no limit of 10, although parameters past $9 have to be written ${10}, ${11} and so on). This feature is useful when there is whitespace in the arguments. Bash would thus expand the gawk command to
Code:
gawk -v list="$1" "$2" ... "$n" <stuff>
and "$2" etc would not end up in gawk variable "list". Assuming there is no whitespace in the arguments to the bash script then you could use gawk -v list="$*" which bash would expand to a single word of space separated values and this would end up in gawk variable "list".
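The difference between the two expansions can be checked directly. A small sketch, where count_words is a made-up helper that only reports how many words it received:

```shell
#!/bin/bash
# count_words is a hypothetical helper: it prints the number of arguments it was given.
count_words() { echo "$#"; }

set -- 1 3 2                  # simulate the script being called with arguments 1 3 2

count_words -v list="$@"      # prints 4: the words are -v, list=1, 3, 2
count_words -v list="$*"      # prints 2: the words are -v and the single word "list=1 3 2"
```

With "$@" only the first argument is attached to list=, so gawk would see the remaining words as separate command-line arguments.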
Will the arguments to the bash script be the numbers that appear between the "#" characters in "#1# This is record 1." and will they be in the same order they appear in your text file?
If so, you could parse the first word out of "list" and set "list" to the remainder, start the outer loop and keep doing "next" statements until you match <n> in "#<n># This is record <n>.", when you could parse the next word out of "list" ready for the next match, start an inner loop doing "next" statements and printing each line until you find another "#<*># This is record <*>." when you break out of the inner loop and iterate the outer loop.
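A rough sketch of that sequential idea, assuming the arguments are given in the same order as the records appear in the file. It uses a flag variable rather than explicit loops of "next" statements, which is the more usual awk idiom; the function name and the sample data are made up for illustration.

```shell
#!/bin/bash
# print_in_file_order is a hypothetical name, not something from the thread.
# Usage: print_in_file_order FILE N [N...]   (numbers in the order they occur in FILE)
print_in_file_order() {
    local file=$1
    shift
    awk -v list="$*" '
        BEGIN { n = split(list, want, " "); k = 1 }   # want[k] is the next record to look for
        /^#[0-9]+#/ {
            split($0, parts, "#")                      # parts[2] is the number between the # marks
            printing = (k <= n && parts[2] == want[k])
            if (printing) k++                          # move on to the next requested record
        }
        printing                                       # print lines while inside a wanted record
    ' "$file"
}

# Demo on a small made-up file in the format described earlier in the thread.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '#1# This is record 1.' 'some data' \
              '#2# This is record 2.' 'etc.' \
              '#3# This is record 3.' 'more data' > "$sample"
print_in_file_order "$sample" 1 3
```

Because the list is consumed front to back, a number that comes earlier in the file than the previous argument is never matched.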
|
|
|
|
08-02-2009, 01:41 PM
|
#5
|
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,697
Original Poster
|
Quote:
Originally Posted by catkin
Assuming there is no whitespace in the arguments to the bash script then you could use gawk -v list="$*" which bash would expand to a single word of space separated values and this would end up in gawk variable "list".
|
Yeah, after a bit of experimenting I've kinda gathered that. But the whole thing is still confusing me greatly.
Quote:
|
Will the arguments to the bash script be the numbers that appear between the "#" characters in "#1# This is record 1." and will they be in the same order they appear in your text file?
|
Yes, ideally this would be the case. It would match the number in the first line of the record, then print it and every line after until the start of the next record.
Or since the records are in numerical order, it could just as well print "record number n" from the file, if that would be easier.
I should be able to pass the arguments to the script in any order however, and the records should ideally be output in that same order.
Actually, I've already found a way to do it with sed, but I have to pipe it through the command twice for each record I want. I'm sure awk would do a better job of it, once I figure out how.
Quote:
|
If so, you could parse the first word out of "list" and set "list" to the remainder, start the outer loop and keep doing "next" statements until you match <n> in "#<n># This is record <n>.", when you could parse the next word out of "list" ready for the next match, start an inner loop doing "next" statements and printing each line until you find another "#<*># This is record <*>." when you break out of the inner loop and iterate the outer loop.
|
Would you mind posting some code for this? I think I get the concept (well, maybe), but I don't comprehend at all how to go about implementing it. I've been trying to bend my head around variables and arrays and loops in awk for half a day now, and I still can't really grasp how any of it is supposed to work. Nothing I've tried so far has come anywhere close to giving me a usable output, or even anything other than an error most of the time.
Last edited by David the H.; 08-02-2009 at 01:47 PM.
|
|
|
|
08-02-2009, 03:50 PM
|
#6
|
|
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian Squeeze (server), Slackware 13.37 (netbook), Slackware64 14.0 (desktop),
Posts: 8,367
|
Quote:
Originally Posted by David the H.
I should be able to pass the arguments to the script in any order however, and the records should ideally be output in that same order.
|
That shifts the goal posts! awk essentially runs through the file line by line, looking for patterns and, when it matches one, does actions. If you move away from that sequential approach then you have to bend awk. Fortunately it's flexible and powerful so what you have now asked for is possible but requires a different approach -- reading the whole file into awk variable(s) -- to allow printing lines in a sequence different from the one in the input file.
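A minimal sketch of that store-then-print approach, assuming records start with a "#n#" header line as in the example earlier in the thread. To save memory it keeps only the requested records rather than the whole file; the function name and sample data are made up for illustration.

```shell
#!/bin/bash
# print_in_arg_order is a hypothetical name, not something from the thread.
# Usage: print_in_arg_order FILE N [N...]
print_in_arg_order() {
    local file=$1
    shift
    awk -v list="$*" '
        BEGIN {
            n = split(list, order, " ")                # order[1..n] = requested record numbers
            for (i = 1; i <= n; i++) want[order[i]] = 1
        }
        /^#[0-9]+#/ { split($0, parts, "#"); id = parts[2] }   # a header line starts a new record
        (id in want) { rec[id] = rec[id] $0 "\n" }             # collect the lines of wanted records
        END {
            for (i = 1; i <= n; i++) printf "%s", rec[order[i]]  # emit in argument order
        }
    ' "$file"
}

# Demo on a small made-up file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '#1# This is record 1.' 'some data' \
              '#2# This is record 2.' 'etc.' \
              '#3# This is record 3.' 'more data' > "$sample"
print_in_arg_order "$sample" 3 1
```

Nothing is printed until the END block, so the output order follows the arguments rather than the file.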
|
|
|
|
08-02-2009, 08:27 PM
|
#7
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
here's an approach not using RS.
Code:
#!/bin/bash
args="$*"
awk -v args="$args" 'BEGIN{
    # split up the args and store in array
    m=split(args,a," ")
}
f && /^#/{f=0}
/^#/{
    ++c   # increment the record counter whenever the line starts with #
    f=1
}
f{
    # g becomes 1 if the current record number is one of the args
    g=0
    for(i=1;i<=m;i++){
        if(a[i] == c){
            g=1
        }
    }
    if (g){ print }
}' file
output
Code:
# more file
#1# This is record 1.
some data
some more data
some more data
#2# This is record 2.
some data
etc.
#3# This is record 3.
some data
some more data
some more data
#4# This is record 4.
some data
etc.
last ...4
# ./test.sh 1 3
#1# This is record 1.
some data
some more data
some more data
#3# This is record 3.
some data
some more data
some more data
# ./test.sh 1 2 4
#1# This is record 1.
some data
some more data
some more data
#2# This is record 2.
some data
etc.
#4# This is record 4.
some data
etc.
last ...4
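The inner for loop in the script above can also be written as an array membership test. A minimal sketch, assuming the same counter-based numbering (the nth line starting with # begins record n); the function name print_nth_records and the sample file are made up for illustration.

```shell
#!/bin/bash
# print_nth_records is a hypothetical name, not something from the thread.
# Usage: print_nth_records FILE N [N...]
print_nth_records() {
    local file=$1
    shift
    awk -v args="$*" '
        BEGIN {
            m = split(args, a, " ")
            for (i = 1; i <= m; i++) want[a[i]] = 1   # set of requested record positions
        }
        /^#/ { ++c }                                   # c = position of the current record
        (c in want)                                    # print every line of a requested record
    ' "$file"
}

# Demo on a small made-up file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '#1# This is record 1.' 'some data' \
              '#2# This is record 2.' 'etc.' \
              '#3# This is record 3.' 'more data' > "$sample"
print_nth_records "$sample" 1 3
```

As with the original script, the output follows the order of the file, not the order of the arguments.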
|
|
|
|
08-03-2009, 11:52 AM
|
#8
|
|
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian Squeeze (server), Slackware 13.37 (netbook), Slackware64 14.0 (desktop),
Posts: 8,367
|
Hello David 
Quote:
Originally Posted by David the H.
Would you mind posting some code for this? I think I get the concept (well, maybe), but I don't comprehend at all how to go about implementing it. I've been trying to bend my head around variables and arrays and loops in awk for half a day now, and I still can't really grasp how any of it is supposed to work. Nothing I've tried so far has come anywhere close to giving me a usable output, or even anything other than an error most of the time.
|
Could do, although it would not be easy because I haven't used awk in a non-trivial way for a while. I did want to get clear on your requirements first, though, especially as your last requirements would mean a very different overall algorithm from the first.
Best
Charles
|
|
|
|
08-04-2009, 03:32 PM
|
#9
|
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,697
Original Poster
|
Sorry to be late replying. I had a tiring couple of days.
Ghostdog74, thank you so much. It works perfectly. Now I just need to go through it to understand exactly what it's doing. Of course I wasn't married to using RS or anything. I just didn't know of any other way to go about it.
And Catkin, no, I don't absolutely NEED the output to be in the same order as the input, but it seems to me that a script should generally process things in the order that they're given. And having the output in a different order from the input can be a bit confusing sometimes. In any case, the code above does just what I want.
|
|
|
|
All times are GMT -5. The time now is 07:56 AM.