LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   How to multiply columns and add them from a file? (https://www.linuxquestions.org/questions/linux-newbie-8/how-to-multiply-columns-and-add-them-from-a-file-4175450356/)

Niteawk 02-15-2013 05:06 PM

How to multiply columns and add them from a file?
 
I have a file similar to this:

3 hello 3.4
5 hi 4.5
1 hey 4.4

(The second column can be completely ignored)

I am attempting to multiply the two numbers (first and third column) and then add those numbers together. The number should be stored in variable.

The variable in this case would equal 37.1

3 * 3.4
+ 5 * 4.5
+ 1 * 4.4
-------
37.1


I'm new to shell scripting and I am messing around with awk to get it done but having difficulties. Thank you for your time.

Kustom42 02-15-2013 05:41 PM

You are absolutely on the right track with awk.


Can you give us some more info, what have you been able to do so far with awk?

Niteawk 02-15-2013 06:00 PM

I have this so far:

while read -r a b c; do echo "$a * $c" | bc; done < testfile

this gives me the three numbers as output:

10.2
22.5
4.4

and i found this code online:

awk '{s+=$1}END{print s}'

which gives me the desired output!

joecam1673 02-16-2013 09:08 PM

While piping through bc will work, I think you'll find that if you look into awk a bit more, you'll come up with some cleaner, more scalable code. It does very well at handling arithmetic and field delineation on its own.

grail 02-16-2013 11:40 PM

Your current awk is close, you just need to consider what other fields need to be included on the right side of the equation.

haertig 02-16-2013 11:42 PM

Since this sounds a little like a homework assignment, I'll show you one way to do it in PERL, and then you can convert the script to use shell and/or awk if that is what is required. That way you have to at least do SOME work if this is indeed a homework assignment. ;) If you want to learn this just for your own knowledge enhancement, I recommend you learn PERL. I think PERL is much better for this type of thing than shell/awk.

Assuming your data is in the file "mydata" and in the format that you illustrated in your initial post:

Code:

cat mydata | perl -e 'while (<>) { @_ = split; $total += $_[0] * $_[2]; } print $total, "\n";'
Here is the above command written as a standalone script, in a more "standard" code layout for easier readability:

Code:

#!/usr/bin/perl
while (<DATA>) {
    @_ = split;
    $total += $_[0] * $_[2];
}
print $total, "\n";
__END__
3 hello 3.4
5 hi 4.5
1 hey 4.4


grail 02-17-2013 02:20 AM

Whilst the perl example is useful ... please do not use cat in this way ... see here for reasons why.

And I would totally disagree with this statement:
Quote:

I think PERL is much better for this type of thing than shell/awk.
Always use the right tool for the job and in this instance, awk is more than appropriate.

haertig 02-17-2013 02:33 PM

Quote:

Originally Posted by grail (Post 4893485)
Whilst the perl example is useful ... please do not use cat in this way ... see here for reasons why.

Any reasons for saving one process by not using cat like this are outweighed by the clarity added by seeing your data file up front in the command string. It is immediately obvious that you are sending a file to be processed. Piping data to stdin also allows simple, easy-to-understand modification of the command in the future, where you might not obtain your original data via something as simple as "cat mydata". For example, what if the requirement changed in the future to "my data is in several files named mydata1, mydata2, ..."? As the complexity of gathering your source data grows, it becomes much harder to understand, and I propose less efficient, to use backticks and redirection to stdin via "<". But you don't have to use cat like this if you don't want to. There are a million different ways to do things in a *nix environment.

If your sole reason to recommend against cat is to save one extra process, I submit that this is a very weak reason, and I would prefer to eat that extra process overhead in order to provide additional clarity in 99.999% of circumstances. If using cat this way actually hurt anything, then I might agree with you. But it doesn't, so saying this is "wrong" is nothing more than a mental gymnastics exercise. Besides, recommending using shell and awk over PERL results in even more extra processes and higher resource usage. Shell scripts are not efficient by any means. Doing multiplication and addition in a shell script is horrendously inefficient. PERL would beat a shell script on this in spades. And of course PERL could be beat at the task with a compiled language, or even more so with assembly code. Shell scripts may be time-wise efficient to write, but system-wise they are not good regarding resource usage. This is all relative though. Using shell scripts is not a bad thing. Nobody would argue that you should do this kind of stuff in, for example, assembly code to save resources.

Quote:

Always use the right tool for the job and in this instance, awk is more than appropriate.
But it also uses more resources than PERL, which I thought was a point you were trying to make initially regarding the "extra" use of cat.

My point for recommending PERL is that it is a far better tool for a wide range of general purpose scripting. PERL pretty much combines stuff that originated in shell, in awk, and in higher level programming languages into a very nice package. For a new person wanting to learn scripting in a *nix environment, I would recommend "Learn enough shell to get by, totally skip awk, and learn PERL". There is really nothing awk can do that PERL cannot do better. I used to use shell and awk quite extensively, but that was 20 to 30 years ago. With PERL, there really is no reason to go back to those days. There are other scripting languages that are worth learning too, Python, etc., but I would recommend those for later. I have not found any compelling reason to do much with Python given that PERL is readily available, even though Python is a nice little language. sed is good to know also. That is more specialized, but definitely worthwhile to have in your toolkit.

Anyway, if the homework assignment (assuming that is what it was) said to use shell and awk, then that is how the assignment should be done. I gave a PERL example for the specific purpose of NOT doing the homework assignment for somebody. Rather, my intent was to give an example of the thinking process needed to complete the assignment. There is enough similarity between PERL and awk in a few concepts that I feel seeing PERL code might trigger one to say, "Hey, I can do that in awk too!" And then the student goes off and implements the task using awk, learning something in the process. My bringing up PERL as preferable is based on my own decades of work experience with *nix and programming in most every available language. If the OP's request is not a student homework assignment, I still submit that if you're going to learn about scripting for your own personal enrichment, there are better things than awk to study, PERL being one of those.

David the H. 02-17-2013 03:29 PM

If you absolutely must have the file first on the line, you can still get rid of cat and just use a straight redirection.

Code:

<data.txt awk .....
It doesn't matter where redirections appear in the command, only the order that they come in, left to right.
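For example, these two invocations behave identically (assuming a file named data.txt): the shell sets up the redirection before awk runs, wherever it appears on the line.

```shell
# Redirection first, then the command:
<data.txt awk '{print $1}'

# Command first, then the redirection -- the more common layout:
awk '{print $1}' <data.txt
```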

haertig 02-17-2013 04:04 PM

True, but as a matter of course in the real world, you never see it written that way.

grail 02-18-2013 01:26 AM

Well I would quote more of your response but the only part needed I feel is this:
Quote:

But it also uses more resources than PERL, which I thought was a point you were trying to make initially regarding the "extra" use of cat.
No, this was not my entire point. Whilst I can understand your preference for having the files at the start of the line, I see no reason to deliver data via stdin to a program that can read a file itself.

My one argument against your method is that this use of cat becomes so second nature that people also use it in the following format (as they are used to it):
Code:

cat file | while read ...
Whilst on the surface you are correct that this is harmless enough, try getting data out of any of your variables that were updated inside the while loop!
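To see the problem concretely, here is a small sketch (bash/POSIX sh semantics): the pipe runs the while loop in a subshell, so assignments made inside it vanish when the loop ends, whereas the redirection form keeps the loop in the current shell.

```shell
# Piped form: the loop runs in a subshell, so "total" is lost.
total=0
printf '1\n2\n3\n' | while read -r n; do total=$((total + n)); done
echo "$total"    # prints 0 in bash: the subshell's assignments are gone

# Redirected form: the loop runs in the current shell.
total=0
printf '1\n2\n3\n' > nums.txt
while read -r n; do total=$((total + n)); done < nums.txt
echo "$total"    # prints 6
```

(Some shells, e.g. zsh or bash with the lastpipe option, run the last pipeline stage in the current shell, but the portable assumption is that they do not.)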

Lastly, your point about multiple files is flawed: I see little benefit in adding multiple names at the start (i.e. inserting them before the pipe) rather than simply adding them at the end.

theNbomr 02-18-2013 07:40 PM

Perl will do the right thing and read files listed on its command line, one by one, rather than reading stdin. Perl doesn't care whether the list of files has length 0, 1, or any very large number. Piping input to a Perl script is unneeded (but acceptable, if needed).
Code:

#!/usr/bin/perl

  while(<>){
      #  read input here. Might be stdin, or might be any
      #  of a series of files. We just read it.
      if( $_ =~ m/^\s*#/ ){
          print "'$_' could be a comment\n";
      }
  }

grail's argument that sloppy scripting becomes a habit is well taken. If you get in the habit of doing the right thing, you will be well served, even if it only serves you infrequently. Useless use of cat is sloppy scripting.

--- rod.

Kustom42 02-19-2013 01:41 AM

Folks... let's focus on what the OP is after here, not on what the best language is. This is like arguing about the best distro; we could go on all day.

This is where the OP is at:

Code:

awk '{s+=$1}END{print s}'
Now, I applaud your nice Perl script, but that's a huge jump and a misdirection from where he is at; I'm sure it could be very overwhelming as well.



Awk is a great language to learn for this. Think about grail's post: you are almost there; you just need to extend the equation a little. You could also try setting a variable in your first sequence to store the individual multiplication totals. It's not best practice, but writing it out in a simple manner such as that and then improving it will help you understand what is happening.

AnanthaP 02-19-2013 01:51 AM

Awk reads lines by default, so it doesn't need a while loop to iterate over records.

It also doesn't normally need a while loop to iterate over fields. Given that the field separator is unaltered, $1 refers to the first field, $2 to the second, and so on. So $1*$3 gives the product of the first and third fields for each record.

Because of its C-like structure, a+=b means a is assigned the value of a+b. Variables don't have to be declared before first use.

So you may suitably change the one-liner you got off the web, awk '{s+=$1}END{print s}', to achieve what you need.
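Putting the hints in this thread together, one way to do it (assuming the sample data sits in "testfile", as in the first post) is a single awk command, with command substitution storing the sum in a shell variable as originally requested:

```shell
# Multiply fields 1 and 3 on each line, accumulate the products in s,
# and print the total once all input has been read.
total=$(awk '{s += $1 * $3} END {print s}' testfile)
echo "$total"    # 37.1 for the sample data
```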

OK

