The point of the script is to count the occurrences of the respective words.
And I run the following:
Code:
man ls | col -b | awk -f awk_script
And I get some results.
My question is, how are the strings actually turned into digits? I understand "i" is used as a counter for "words". And then the words are taken one at a time, so that the variable "words" is filled with these values. But it seems that this array variable already contains digits, and not strings, or am I wrong? And if so, how does it recognise a variable such as words["ls"]? How does it know to associate that with the number of occurrences?
$1 is the first word of the line, $2 is the second, $3 is the third.... $NF is the last.
words[] is what is called an associative array. Its elements look like
words["is"], words["ls"] and so on, and all of them are counters, which are incremented by words[$i]++.
How come all of them are counters? Is that something specific to arrays?
No. They are counters because they are incremented (that is what ++ does), so each element counts the occurrences of its index in the array (the indices are the words that appeared in the input text).
Instead of running code on such a large amount of data, use a small subset and check the output for yourself.
As an example, from your own post:
Code:
echo 'My question is, how are the strings actually turned into digits? I understand "i" is used as a counter for "words".' | ./vincix.awk
is=1,ls=0,the=1,with=0
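The ls=0 and with=0 in that output follow from a general awk rule: an array element that was never assigned reads as an empty string, which counts as 0 in a numeric context. A minimal sketch of just that rule (the function name unset_demo and the array name w are made up for illustration, they are not from the script):

```shell
# Hypothetical demo: what an untouched array element looks like in awk.
unset_demo() {
    awk 'BEGIN {
        # w["ls"] was never assigned: %d prints 0, %s prints nothing
        printf "%d|%s|\n", w["ls"], w["ls"]
        w["is"]++          # the first increment turns the empty value into 1
        print w["is"]
    }'
}

unset_demo
```

So a word that never shows up in the input still prints as 0, without any initialisation in the script.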
I simply don't understand. I've been trying to understand this script for some time, I haven't come to this forum right away. But there's something more complicated in here which might seem obvious to someone more knowledgeable, but not to me.
@grail the code is actually very small. I understand the printf part perfectly. What I really don't understand is how the words[$i] works exactly, that's why I was looking for someone who might break it apart for me.
I know you're trying to be helpful, but for instance what pan64 said I had already understood. I know that's an increment, I know that i is incremented until it reaches NF (by the way, the original script was <NF, not <=NF, which I found strange, because I think it's missing a word if it's at the end of the line, but that's secondary). But it's probably the way awk interprets the results that I don't understand. I'm not sure, really.
So I'd like a more didactic and explicit explanation, if anyone's willing to do that.
So, i is the counter, that's obvious. But if i is the counter, shouldn't it be a number? And yet, there's words[$i] and later on we're talking about words["string"]. Do you know what I mean? How is this translation from string to number actually being made?
Someone said that it's a little bit more similar to object-oriented programming, in a way. Does that make sense?
'i' is the loop counter, which is incremented from 1 to NF. (You appear to be correct about <=NF by the way.)
So you expect the usage words[$i] to be a numeric index, like words[17] when i=17. But that is not correct.
Remember: in awk expressions, the $ operator always references fields of the current input line. When awk sees the $ operator it expects it to be followed by a number, so it treats whatever follows as an expression (here, the variable i) and evaluates it as a number. Non-numeric string values evaluate numerically to zero.
So suppose the 17th field is the word "with" and i=17. Then the expression words[$i] becomes words[$17], which evaluates to words["with"], the value of which is then incremented.
The result is an associative array named words, the indexes of which are the input words and the values of which are the accumulated counts per word.
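To see this on a small input, here is a minimal sketch. It is not the original script, only an illustration in the same spirit; the function name count_words is made up.

```shell
# Hypothetical helper: counts every whitespace-separated field read from stdin.
count_words() {
    awk '{
        for (i = 1; i <= NF; i++)   # i is a plain number: 1, 2, ... NF
            words[$i]++             # $i is the TEXT of field i, used as the array key
    }
    END { printf "is=%d,the=%d,with=%d\n", words["is"], words["the"], words["with"] }'
}

echo 'the cat is with the dog' | count_words
# prints: is=1,the=2,with=1
```

Here "the" is field 1 and field 5, so words["the"] gets incremented twice. The number i never becomes a key itself.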
Great explanation! That's exactly what I wanted. So that was the whole idea: the way in which awk processes the script and the meaning of $. That's one of the things I thought I understood, but only now do I understand it. I thought $i was simply invoking the variable i. I didn't consider it in the awk context. So that's an essential distinction.
But even if $ references fields, awk still processes the whole line before going to the next one, right? I mean, it works like sed from this point of view, or does it not?
Thanks! I'm happy someone actually understood what I was trying to say.
(Thanks for moving the thread. I didn't even know "Programming" actually existed.)
The awk programming model has three major parts, two of which are optional. It looks something like this:
Code:
BEGIN{ /*This block, if present, is executed once before the input is processed... */ }
_______________________________________________________________________________________
Main loop, may include multiple blocks and is processed once per line of input
----------- (line 1)
----------- (line 2)
----------- ...
----------- (line n)
_______________________________________________________________________________________
END{ /*This block, if present, is executed once after all input has been processed... */ }
In your script there is no BEGIN block, the for loop constitutes the main loop, and the END block prints the final result. So to answer your question, the main loop processes once per line similar to sed, yes.
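A minimal sketch of the three parts in action (the function name model_demo and the variable n are made up for illustration):

```shell
# Hypothetical demo of the BEGIN / main / END order of execution.
model_demo() {
    awk '
        # runs once, before any input is read
        BEGIN { print "start" }
        # main block: runs once per input line
        { n++; print NR ": " $0 }
        # runs once, after the last line
        END { print "lines seen: " n }
    '
}

printf 'first\nsecond\n' | model_demo
# prints:
# start
# 1: first
# 2: second
# lines seen: 2
```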
In the END block the challenge is to print the whole hashed (text-keyed) array, whose stored values are numbers.
The usual way is to loop over all the (text) keys:
Code:
END { for (key in words) { printf "%s=%d\n",key,words[key] } }
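Put together with the counting loop, that could look like the sketch below. The function name print_all_counts is made up, and the pipe through sort is my addition: awk does not guarantee any particular order for "for (key in words)", so sorting keeps the output stable.

```shell
# Hypothetical helper: counts all words on stdin and prints every key with its count.
print_all_counts() {
    awk '{ for (i = 1; i <= NF; i++) words[$i]++ }
         END { for (key in words) printf "%s=%d\n", key, words[key] }' | sort
}

echo 'b a b c b a' | print_all_counts
# prints:
# a=2
# b=3
# c=1
```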
It is possible to dispense with the for loop and the issue of $1, $2, $3, etc.
With this InFile ...
Code:
Once upon a midnight dreary, while I pondered weak and weary,
Over many a quaint and curious volume of forgotten lore,
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
''Tis some visitor,' I muttered, 'tapping at my chamber door -
Only this, and nothing more.'
Awk works one record at a time. Records are lines by default, but you can change that by changing RS, the record separator (as in danielbmartin's solution).
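A small sketch of the effect of RS. This is not danielbmartin's solution, just an illustration using the standard "paragraph mode" (RS set to the empty string), in which records are separated by blank lines and newlines inside a record also separate fields. The function name paragraph_demo is made up.

```shell
# Hypothetical demo: with RS = "", each blank-line-separated paragraph is one record.
paragraph_demo() {
    awk 'BEGIN { RS = "" }
         { print NR ": " NF " fields" }'
}

printf 'a b\nc d\n\ne\n' | paragraph_demo
# prints:
# 1: 4 fields
# 2: 1 fields
```

The first two lines are processed together as record 1, so the main block ran twice, not three times.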
The solution offered in post #12 is unsatisfactory for words containing embedded punctuation such as won't or don't. I attempted to correct this shortcoming but cannot figure out the syntax.
With this InFile ...
Code:
John likes chicken, prefers turkey, and won't eat ham.
Luke likes chicken McNuggets.
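One possible approach, offered as a sketch and not as the fix for the post #12 code: keep the default field splitting, then trim punctuation only from the two ends of each field, so that an apostrophe inside a word survives. The function name count_clean_words is made up, and the regex assumes plain ASCII letters and digits.

```shell
# Hypothetical helper: counts words, ignoring punctuation at the start or end of each field.
count_clean_words() {
    awk '{
        for (i = 1; i <= NF; i++) {
            w = $i
            # trim non-alphanumerics at either end; characters inside the word are kept
            gsub(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/, "", w)
            if (w != "") words[w]++
        }
    }
    END { for (key in words) printf "%s=%d\n", key, words[key] }' | sort
}

count_clean_words <<'EOF'
John likes chicken, prefers turkey, and won't eat ham.
Luke likes chicken McNuggets.
EOF
```

With this input, "chicken," and "chicken" are counted as the same word (chicken=2), "ham." becomes ham, and won't stays in one piece.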