As a novice awker I cannot fully follow it. I've groped through the darkness only this far:
NF is a System Variable which is the number of fields for the current input record. What is its significance here?
RS is a System Variable which is the record separator. In this thread the individual paragraphs are separated by a string of dashes so RS='--+' might be defining each paragraph as a record. Is this right? Why the +? Why is it cited at the end of this awk instead of the beginning?
$0 is the current input record in its entirety.
d is apparently a variable because if I change it to e or f the code still works.
So... (and here I get shaky)... we read the entire file one paragraph at a time, and each time we overwrite the contents of variable d with the most recent paragraph. Then we hit END which tells us to stop reading and start printing. There's only one thing to print, and that is d, the last paragraph.
NF - you are correct about its origin. You then need to remember that whatever sits in front of the {} is a pattern, which is evaluated as true or false. A record with zero fields has an NF value of zero, which counts as false, so the braces are not entered and the value of variable 'd' does not change. This matters for the OP's example because there are dashes after the last visible record, so awk treats the empty text after the last dashes as the final record, which of course we do not wish to print.
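To see the NF guard at work, here is a minimal sketch. The sample file and its contents are made up for illustration, and it assumes gawk or another awk that treats a multi-character RS as a regular expression.

```shell
# Hypothetical sample: paragraphs separated by runs of dashes,
# with one more dash line after the last paragraph.
sample=$(mktemp)
printf '%s\n' 'first paragraph' '-----' 'second paragraph' '-----' 'third paragraph' '-----' > "$sample"

# No guard: d is overwritten by the empty record that follows the final dashes.
without_guard=$(awk '{d=$0} END{print d}' RS='--+' "$sample")

# With the guard: records that have no fields (NF == 0) leave d untouched.
with_guard=$(awk 'NF{d=$0} END{print d}' RS='--+' "$sample")

printf 'without guard: [%s]\n' "$without_guard"
printf 'with guard:    [%s]\n' "$with_guard"
```

The guarded result still carries the newline that followed the previous dash line, because that newline is part of the record.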
RS - again origin is correct. The trick to remember with awk is that there are actually 3 places you can set 'system variables':
1. Use -v ... awk -vRS="--+"
2. In the BEGIN ... BEGIN{RS = "--+"}
3. After the 'program' ... this is of course what I have used here
My general rule of thumb: if there is only one variable to set and it is less typing, I set it after the program; otherwise I use BEGIN. I reserve the -v option (usually) for values I wish to draw from the environment.
As for the '+': * is zero or more and + is one or more of the preceding character. The data lent itself to the latter (try changing it to a * and see the difference)
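The three placements, and the + versus * difference, can be tried out with a small sketch like the one below. The sample files are invented for illustration, and a regex RS assumes gawk or a compatible awk.

```shell
# Hypothetical sample with dash runs of different lengths.
sample=$(mktemp)
printf '%s\n' 'alpha' '-----' 'beta' '---' 'gamma' '--------' > "$sample"

prog='NF{d=$0} END{print d}'

# 1. -v on the command line
via_v=$(awk -v RS='--+' "$prog" "$sample")
# 2. inside BEGIN
via_begin=$(awk 'BEGIN{RS="--+"} NF{d=$0} END{print d}' "$sample")
# 3. as an assignment after the program, before the file name
via_operand=$(awk "$prog" RS='--+' "$sample")

# + versus *: '--*' also matches a single dash, so a hyphenated word gets split.
hyphen=$(mktemp)
printf '%s\n' 'well-known fact' '-----' 'last one' '-----' > "$hyphen"
plus_count=$(awk 'NF{n++} END{print n}' RS='--+' "$hyphen")
star_count=$(awk 'NF{n++} END{print n}' RS='--*' "$hyphen")
echo "records with +: $plus_count, with *: $star_count"
```

One thing to keep in mind with the third form: the assignment only takes effect when awk reaches it in the argument list, so a BEGIN block does not see it.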
Quote:
Then we hit END which tells us to stop reading and start printing.
Slight correction here: END is only processed once all files have finished being read (gawk v4+ also provides ENDFILE, which lets you run actions as each file completes)
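A quick way to check the END behaviour is to feed awk two files. This is only a sketch; the file names and contents are invented for illustration.

```shell
# Two hypothetical input files.
a=$(mktemp); b=$(mktemp)
printf '%s\n' 'a1' 'a2' > "$a"
printf '%s\n' 'b1' 'b2' 'b3' > "$b"

# END runs once, after the last file, so NR covers both files.
end_output=$(awk 'END{print "records seen:", NR}' "$a" "$b")
echo "$end_output"

# gawk 4+ only (left commented out so the sketch stays portable):
#   gawk 'ENDFILE{print FILENAME, FNR}' "$a" "$b"
```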
Please let me know if you need any further information
All questions answered; thank you.
Purely as a learning exercise, I propose making the OP's problem a bit more difficult. Suppose he wanted the penultimate paragraph. How could that be done?
I would suggest using 2 variables and printing the older one. If you then extend this to any record counted from the end, I would suggest storing the records in an array and printing element (length - N) of the array.
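One possible reading of that suggestion, as a sketch rather than the poster's actual code: the names prev, last, rec, count and nth_from_end are made up here, and a counter stands in for the array length since length(array) is not available in every awk.

```shell
# Hypothetical sample with four paragraphs.
sample=$(mktemp)
printf '%s\n' 'one' '-----' 'two' '-----' 'three' '-----' 'four' '-----' > "$sample"

# Two variables: prev always trails one non-empty record behind last.
penultimate=$(awk 'NF{prev=last; last=$0} END{print prev}' RS='--+' "$sample")

# Array version: store every non-empty record, then print the one
# n places from the end (n=0 is the last, n=1 the penultimate).
nth_from_end() {
    awk -v n="$1" 'NF{rec[++count]=$0} END{if (count > n) print rec[count-n]}' RS='--+' "$2"
}

printf 'penultimate: [%s]\n' "$penultimate"
nth_from_end 1 "$sample"
```

The array version holds the whole file in memory, so for very large inputs the two-variable form is the lighter choice.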
I know nothing of internal implementation, hence this question.
Does tac really begin reading a file from its last record, or does it read the entire file and buffer it (or parts of it)?
The answer has performance implications. If the input file is huge, then a solution to the OP's problem which begins with tac might be faster than another solution which reads the entire file, start to end. That assumes tac is clever enough to read only enough to satisfy the following piped commands.