LinuxQuestions.org


lcvs 06-19-2012 02:27 AM

sum up values from each columns (awk)
 
I have several files with varying numbers of columns.

What I am trying to do is output the values of each column:
- as a single value if it is the same in the entire column,
- or as the distinct values separated by "," if different ones exist in the same column

example 1:
input:
1|2|3|4
1|1|3|4
1|2|3|3

output:
1|1,2|3|3,4

example 2:
input:
1|2
1|3
7|2

output:
1,7|2,3

(the order of numbers separated by "," doesn't matter)

Thanks for your help !

pan64 06-19-2012 03:26 AM

You can define an associative array for every column: col1, col2, col3, col4. Each line is automatically split into $1, $2, $3 ... (-F\| is used to define the separator). col1[$1] = 1; col2[$2] = 2 ... will define the elements of the arrays: the field value is used as the key, so each distinct value is stored only once, and the number you assign does not matter. Finally (using END) you need to print the defined keys of those arrays.
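A minimal sketch of that idea for a two-column file, assuming a POSIX shell with awk and sort available. The function name distinct_two_columns is made up for illustration and is not from the thread. It only lists the distinct values found in each column, one per line, and pipes them through sort because the order in which for (k in array) visits the keys is unspecified.

```sh
# distinct_two_columns is a hypothetical helper name, not from the thread.
# It reads |-separated lines with two columns on stdin.
distinct_two_columns() {
    awk -F'|' '
        # Use each field value as an array key; the stored 1 is a dummy.
        { col1[$1] = 1; col2[$2] = 1 }
        END {
            # Every key appears once, however often the value occurred.
            for (k in col1) print "col1", k
            for (k in col2) print "col2", k
        }
    ' | sort   # for-in order is unspecified, so sort for stable output
}

printf '1|2\n1|3\n7|2\n' | distinct_two_columns
```

With the input from example 2 this should list 1 and 7 for col1 and 2 and 3 for col2.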

_____________________________________
If someone helps you, or you approve of what's posted, click the "Add to Reputation" button, on the left of the post.
Happy with solution ... mark as SOLVED
(located in the "thread tools")

lcvs 06-19-2012 04:10 AM

Hi pan64, thanks for your help!

I am not sure I understand the term "associative array".
Is it the same as the array you get when using the split function?

If we look at the first input I wrote:
Code:

1|2|3
1|1|3
1|2|3

Are these values equivalent to these elements below?
Code:

col1[$1]|col2[$1]|col3[$1]
col1[$2]|col2[$2]|col3[$2]
col1[$3]|col2[$3]|col3[$3]


pan64 06-19-2012 04:14 AM

see here, you will find some useful tips






lcvs 06-19-2012 08:09 AM

OK, associative arrays are still not very clear to me, but maybe something that looks like this:
Code:

BEGIN{FS=OFS="|"}

{
    for (i=1; i<=NF; i++){
        coli[$i]++
            if(coli[$i] <is unique????>){
                print $i}

            else
                print coli[$i] <with a "," somewhere ?????>
}
}


pan64 06-19-2012 10:38 AM

That's fine, you are almost there.
Code:

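function printarray(a,    i, n)   # i and n are local variables
{
    n = length(a)                 # number of elements in a, indexed 1..n
    for (i = 1; i <= n; i++)
    {
        printf "%s", a[i]         # printf, because print would add a newline
        if (i < n) printf ","
    }
}

This will print the elements of an array on one line, separated by commas. Hope this helps.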



chrism01 06-19-2012 06:59 PM

An associative array is often referred to as a hash table, although strictly speaking a hash table is one way of implementing an associative array.

https://en.wikipedia.org/wiki/Associative_array
Quote:

In computer science, an associative array, map, or dictionary is an abstract data type composed of a collection of (key,value) pairs, such that each possible key appears at most once in the collection.
[...]
The dictionary problem is the task of designing a data structure that implements an associative array. A standard solution to the dictionary problem is a hash table;
https://en.wikipedia.org/wiki/Hash_table
Quote:

Thus, a hash table implements an associative array.
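To tie this back to the earlier question about split: in awk every array is associative, and split simply fills one whose keys happen to be 1, 2, 3 and so on. The sketch below is only an illustration, assuming a POSIX shell with awk. The function name array_demo and the sample strings are invented for the example.

```sh
# array_demo is a hypothetical name; the sample data is made up.
array_demo() {
    awk 'BEGIN {
        # split() stores the pieces under the keys 1..n and returns n.
        n = split("x y z", parts, " ")
        print n, parts[1], parts[3]

        # Any string can be a key, and each key exists at most once,
        # so incrementing the same key twice yields one element with value 2.
        count["apple"]++; count["pear"]++; count["apple"]++
        print count["apple"], count["pear"]

        # The in operator tests whether a key exists (1) or not (0).
        has_apple = ("apple" in count)
        has_plum  = ("plum" in count)
        print has_apple, has_plum
    }'
}

array_demo
```

This should print "3 x z", then "2 1", then "1 0".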

lcvs 06-20-2012 01:18 AM

Quote:

Code:

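function printarray(a,    i, n)   # i and n are local variables
{
    n = length(a)                 # number of elements in a, indexed 1..n
    for (i = 1; i <= n; i++)
    {
        printf "%s", a[i]         # printf, because print would add a newline
        if (i < n) printf ","
    }
}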



- What is the point of "function printarray (a)"?
- Why don't we mention fields?
- Does "length" refer to the length of array a, in other words the number of lines in the column?

pan64 06-20-2012 02:19 AM

This function will print the list of values of the array passed to the function (separated by commas). We do not need fields, records and other funny things, just a for loop. length is a built-in awk function; it returns the length of a string or the number of elements in an array (see the man page).
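A small usage sketch, assuming a POSIX shell with awk. It is a variation rather than the exact function from the post: the element count is passed in as a second argument instead of calling length on the array, because length on an array is not guaranteed to work in every awk implementation. The name printarray_demo and the sample values are made up.

```sh
# printarray_demo is a hypothetical wrapper; the sample values are made up.
printarray_demo() {
    awk '
        # Print a[1]..a[n] on one line, separated by commas.
        # i is a local variable (declared as an extra parameter).
        function printarray(a, n,    i) {
            for (i = 1; i <= n; i++) {
                printf "%s", a[i]
                if (i < n) printf ","
            }
        }
        BEGIN {
            n = split("2 1", values, " ")   # values[1] = 2, values[2] = 1
            printarray(values, n)
            printf "\n"
            print length("hello")           # length of a string
        }
    '
}

printarray_demo
```

This should print "2,1" followed by "5" on the next line.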






lcvs 06-20-2012 04:03 AM

I don't really see where I am going with that.
I need to learn a bit more I guess.

Thanks anyway !

pan64 06-20-2012 04:16 AM

The function is used to print col1, col2, col3 and col4, the four arrays where you collected your data.









All times are GMT -5.