LinuxQuestions.org
Old 05-06-2016, 07:43 PM   #1
beca123456
LQ Newbie
 
Registered: Apr 2016
Posts: 10

Rep: Reputation: Disabled
awk: print sum of array values and remaining record


Hi,

input:
Code:
comment_1|A_(1);B_(2)|comment_2|comment_3
comment_4|A_(3)|comment_5|comment_6
comment_7|A_(4);B_(5);C_(6);C_(7)|comment_8|comment_9
output:
Code:
comment_1|3|A_(1);B_(2)|comment_2|comment_3
comment_4|3|A_(3)|comment_5|comment_6
comment_7|22|A_(4);B_(5);C_(6);C_(7)|comment_8|comment_9
Details:
The input file has a constant number of fields (NF=4).
Using awk, I am trying to insert, as the second field of the output, the sum of the number(s) in parentheses in field 2 of the input.

So far, I have only been able to simplify the second field of the input and print the entire corresponding record using this code,
Code:
gawk 'BEGIN{FS=OFS="|"}{sum1=$2; print gensub(/(^[^_]+_\()|(\))|([^;]+_\()/,"","g",sum1) FS $0}'
ending up with this .temp output:
Code:
1;2|comment_1|A_(1);B_(2)|comment_2|comment_3
3|comment_4|A_(3)|comment_5|comment_6
4;5;6;7|comment_7|A_(4);B_(5);C_(6);C_(7)|comment_8|comment_9
From this output, I cannot manage to get the sum of the numbers in $1. I know how to sum values down a column, but not values within a single record/array.
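For what it's worth, summing within a record can be done by splitting the field into an array and looping over it. Below is a minimal sketch, assuming the intermediate output shown above is fed on standard input. The function name sum_first_field and the variables n, a and t are made up for illustration.

```shell
# Hypothetical helper: reads lines such as "1;2|comment_1|A_(1);B_(2)|..."
sum_first_field() {
    awk 'BEGIN { FS = OFS = "|" }
    {
        n = split($1, a, ";")      # a[1..n] = the numbers in field 1
        t = 0
        for (i = 1; i <= n; i++) t += a[i]
        $1 = $2                    # move the first comment back to the front
        $2 = t                     # and put the sum in second position
        print
    }' "$@"
}

printf '%s\n' '4;5;6;7|comment_7|A_(4);B_(5);C_(6);C_(7)|comment_8|comment_9' | sum_first_field
# prints: comment_7|22|A_(4);B_(5);C_(6);C_(7)|comment_8|comment_9
```

This uses only standard awk features (split and a for loop), so it should not depend on gawk.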

Last edited by beca123456; 05-06-2016 at 07:48 PM. Reason: color coding did not work
 
Old 05-06-2016, 09:16 PM   #2
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 5,339

Rep: Reputation: 1963
You want to extract the numbers from the $2 field of your input. A regular expression that matches a run of digits is '[[:digit:]]+'. This can be used in the patsplit function.
Code:
awk -F"|" '{t=0; n=patsplit($2,a,/[[:digit:]]+/); for (i in a) t+=a[i]; print n,t}' input

Last edited by allend; 05-06-2016 at 09:18 PM.
 
Old 05-07-2016, 02:47 PM   #3
beca123456
LQ Newbie
 
Registered: Apr 2016
Posts: 10

Original Poster
Rep: Reputation: Disabled
Thanks allend!
I had never heard of patsplit before; it is great!

Although, it seems to work only with gawk (you used awk in your command).
 
Old 05-07-2016, 03:05 PM   #4
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
If you don't speak awk fluently, you could use Python.

Code:
#!/usr/bin/env python3
import re

# Open the input and output files; both are closed automatically
with open("beca.txt") as inFile, open("beca_new.txt", "w") as outFile:
    for line in inFile:  # Read line by line
        listLine = line.split('|')  # Split into list using | as delim
        # Add up the whole numbers in index 1, so that "12" counts as 12, not 1 + 2
        toInsert = sum(int(num) for num in re.findall(r"\d+", listLine[1]))
        listLine.insert(1, str(toInsert))  # Insert into our list at index 1
        newLine = "|".join(listLine)
        print(newLine, end="")  # Print or...
        outFile.write(newLine)  # write to file
Yes, a lot more characters, I know. But it gets the job done:
Code:
./beca.py 
comment_1|3|A_(1);B_(2)|comment_2|comment_3
comment_4|3|A_(3)|comment_5|comment_6
comment_7|22|A_(4);B_(5);C_(6);C_(7)|comment_8|comment_9
I have to admit though, that the awk oneliner was impressive in its, err, incomprehensibleness (yes, that's a word - I just now invented it).

Best regards,
HMW
 
Old 05-07-2016, 06:50 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,409

Rep: Reputation: 3056
Quote:
Originally Posted by beca123456
Although, it seems to work only with gawk (you used awk in your command).
It is documented as a gawk extension - you should check the doco yourself once offered a solution.
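For an awk without patsplit, one possible workaround is a loop over match() and substr(), which are standard awk functions. This is only a sketch: the function name sum_field2_posix and the variable rest are invented here, and [0-9] is used instead of [[:digit:]] on the assumption that some older awk builds do not handle bracketed character classes.

```shell
# Hypothetical portable variant; reads the named files or standard input
sum_field2_posix() {
    awk 'BEGIN { FS = OFS = "|" }
    {
        t = 0
        rest = $2
        # Find the next run of digits, add it, then drop everything up to its end
        while (match(rest, /[0-9]+/)) {
            t += substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
        $1 = $1 OFS t              # splice the sum in after field 1
        print
    }' "$@"
}

printf '%s\n' 'comment_1|A_(1);B_(2)|comment_2|comment_3' | sum_field2_posix
# prints: comment_1|3|A_(1);B_(2)|comment_2|comment_3
```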
 
Old 05-07-2016, 08:56 PM   #6
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 5,339

Rep: Reputation: 1963
'awk' is symlinked to 'gawk' on my setup. I believe patsplit appeared circa 2010.

"incomprehensibleness" has an air of incomprehensibility about it.
 
Old 05-07-2016, 09:14 PM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,409

Rep: Reputation: 3056
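I hope this doesn't add to the incomprehensibolity, but you could replicate the patsplit with "split": pass the regex allend posted earlier as the separator, and sum the seps array (the optional fourth argument, which receives the matched separators). That fourth argument is a gawk extension as well.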
 
  


