I'm looking to take a CSV file which has two fields:
Customer Number
Balance
The file will contain duplicate customer numbers. These records need to be merged, with the values of the second field (balance) added together. In theory you could have the following file:
I would need them merged into a separate CSV file so that there is only one record for each customer number, with the "balance" being the sum of that customer's values.
Code:
BEGIN { FS = ","; getline }
{
    # populate array with customer as index and sum of balances as values
    balance[$1] = balance[$1] + $2
}
END {
    # sort the indices of the array and put them in array indices
    n = asorti(balance, indices)
    # print the sorted array
    for (i = 1; i <= n; i++)
        printf "%s, %5.2f\n", indices[i], balance[indices[i]]
}
Would it be too much to ask just to explain a little as to what is going on in the syntax please? I assume the $ is not a variable but the field location?
Yes. In awk, $1, $2, $3... refer to fields by number, with FS (a built-in variable) used as the field separator. In my version I set FS to a comma inside the awk code, while in osor's version it is specified with the -F option.
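To make the two ways of setting the separator concrete, here is a small sketch. The function names and the two sample records are made up for the illustration.

```shell
show_fields_fs() {
    # Separator set inside the program, in the BEGIN block.
    awk 'BEGIN { FS = "," } { print $1 " -> " $2 }'
}

show_fields_F() {
    # Same separator passed on the command line with -F.
    awk -F, '{ print $1 " -> " $2 }'
}

# Hypothetical sample records: customer number, balance.
printf '1001,25.50\n1002,10.00\n' | show_fields_fs
printf '1001,25.50\n1002,10.00\n' | show_fields_F
```

Both calls should print the same two lines, "1001 -> 25.50" and "1002 -> 10.00".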
In more detail:
Code:
BEGIN{ FS = ","; getline }
This is the BEGIN section of awk, that is, the code executed once before any input is processed. The getline statement reads the first line there, which serves to skip the header of the file.
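As a sketch of what the header skip does, the two functions below should give the same result: one consumes the first record with getline in BEGIN, the other ignores record number 1 in the main rule. The header text "Customer,Balance" and the function names are assumptions for the example.

```shell
skip_with_getline() {
    # getline in BEGIN reads (and discards) the first record.
    awk 'BEGIN { FS = ","; getline } { print $1 }'
}

skip_with_nr() {
    # Alternative: only act on records after the first one.
    awk 'BEGIN { FS = "," } NR > 1 { print $1 }'
}

printf 'Customer,Balance\n1001,25.50\n1002,10.00\n' | skip_with_getline
printf 'Customer,Balance\n1001,25.50\n1002,10.00\n' | skip_with_nr
```

Each call should print only 1001 and 1002, without the header line.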
Code:
balance[$1] = balance[$1] + $2
This assigns values to the array "balance", using the customer number (first field) as the index and the accumulated balance (the running sum of the second field) as the value. In awk, array indices can be any string, not only numbers.
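A minimal sketch of that accumulation, using made-up non-numeric customer keys (A17, B02) to show that string indices work:

```shell
sum_by_customer() {
    awk -F, '
        # balance[$1] is created on first use; an unset element counts as 0.
        { balance[$1] += $2 }
        # Look the totals up again by their string keys.
        END { printf "A17 %.2f\nB02 %.2f\n", balance["A17"], balance["B02"] }
    '
}

printf 'A17,10\nB02,5.5\nA17,2.25\n' | sum_by_customer
```

The two A17 records are added together, so this should print "A17 12.25" and "B02 5.50".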
Code:
n = asorti(balance,indices)
This sorts the indices of the array balance and puts the sorted indices into a new array called "indices"; the return value n is the number of elements. This function has been available since GNU awk 3.1.2. For older versions there are workarounds.
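One workaround that needs no gawk extension at all, sketched here as an alternative rather than as the approach used in this thread, is to print the totals unsorted and let the external sort command order them. The function name merge_balances and the sample header are assumptions for the illustration.

```shell
merge_balances() {
    awk 'BEGIN { FS = ","; getline }          # skip the header line
         { balance[$1] += $2 }                # sum balances per customer
         END {
             # for (c in balance) visits the keys in no particular order
             for (c in balance)
                 printf "%s, %5.2f\n", c, balance[c]
         }' |
    LC_ALL=C sort -t, -k1,1                   # order the lines by customer number
}

printf 'Customer,Balance\n1002,10\n1001,25.50\n1002,5.25\n' | merge_balances
```

With this sample input the output should be "1001, 25.50" followed by "1002, 15.25". Like asorti, this compares the customer numbers as strings.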
Code:
for (i = 1; i <= n; i++)
    printf "%s, %5.2f\n", indices[i], balance[indices[i]]
This finally prints out the desired values using the format specified in the printf statement.
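To see what the format string does on its own: %s prints the customer number as a string, and %5.2f prints the balance with two decimals in a field at least five characters wide. A small sketch with made-up values:

```shell
format_demo() {
    awk 'BEGIN {
        # 3.50 is only four characters, so it is padded on the left to five.
        printf "%s, %5.2f\n", "1001", 3.5
        # A longer number simply exceeds the minimum width and is rounded to two decimals.
        printf "%s, %5.2f\n", "1002", 1234.567
    }'
}

format_demo
```

This should print "1001,  3.50" (note the extra space from the padding) and "1002, 1234.57".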
Check which version of gawk you have to find out. If your version is older than 3.1.2, try the following:
Code:
BEGIN { FS = ","; getline }
{
    # populate array with customer as index and sum of balances as values
    balance[$1] = balance[$1] + $2
}
END {
    # sort the indices of the array and put them in array indices
    ii = 1
    for (i in balance) {
        indices[ii] = i
        ii++
    }
    n = asort(indices)
    # print the sorted array
    for (i = 1; i <= n; i++)
        printf "%s, %5.2f\n", indices[i], balance[indices[i]]
}
Edit: osor beat me to it this time, but with a different solution.
I'm not sure about the "first date occurrence", but you can assign the different fields to different arrays, using the last field as the index. When assigning the date, you can first check whether the array element is still empty, and only then assign the date from field 2. I'm thinking about something like this:
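A minimal sketch of that idea, assuming a hypothetical comma-separated layout of id, date, file name (the actual input is not shown in this thread, and the function name is made up):

```shell
first_date_per_file() {
    awk -F, '
        {
            key = $NF                      # last field is used as the array index
            if (date[key] == "")           # element still empty: first time this key is seen
                date[key] = $2             # keep the date from field 2
        }
        END { for (k in date) print k "," date[k] }
    ' | LC_ALL=C sort
}

printf 'a,2008-01-05,x.log\nb,2008-01-02,y.log\nc,2008-01-09,x.log\n' | first_date_per_file
```

For x.log, which appears twice, only the date of its first record is kept, so this should print "x.log,2008-01-05" and "y.log,2008-01-02".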
However, in your example you print the first date overall, not the first date for each file listed in the last column. In that case you have to modify the code so that it does not create the date array, but stores the date in a single variable.
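The single-variable variant could look roughly like this. It is again a sketch over an assumed id, date, file name layout, with a made-up function name:

```shell
first_date_overall() {
    awk -F, '
        NR == 1 { first = $2 }             # remember the date of the very first record only
        { files[$NF] = 1 }                 # note each distinct value of the last field
        END { for (f in files) print first "," f }
    ' | LC_ALL=C sort
}

printf 'a,2008-01-05,x.log\nb,2008-01-02,y.log\nc,2008-01-09,x.log\n' | first_date_overall
```

Here every file is paired with the same date, so the output should be "2008-01-05,x.log" and "2008-01-05,y.log".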
Hi,
I'm trying to reuse this code on the same input file, but under Solaris with gawk.
Although the translation by professor Colucix works perfectly, I don't know how to modify it slightly.
The result I need is slightly different:
The input file.txt: