LinuxQuestions.org
Old 09-25-2014, 07:30 PM   #1
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Rep: Reputation: Disabled
parse file with awk and sum totals


I attached the input file for Daniel Martin's script.

Code:
#!/bin/bash
./disk-usage-JM.pl | grep -v '/net' | grep -v 'Usage' | grep -v 'Average' | sed s/\(LOCKED\)\://g > jm_out
InFile="jm_out"
OutFile="jm_temp"
awk '{if (!match(" \t",substr($0,1,1))) {server=$0; next}
      if (substr($0,1,1)==" ") {workspace=$0; next}
        print server,workspace,$0}' \
  $InFile >>$OutFile
which only gives me the output of:
Code:
zahrobsk:     lisa                485 MB
But it should give me output like this:
Code:
adallman:   sideshow: 	bob               12065 MB
adallman:   sideshow: 	mel                 488 MB
adallman:   simpsons: 	bart              32965 MB
afkham:   simpsons: 	lisa             102466 MB
agnewjo:   flanders: 	ned               70847 MB
agnewjo:   flanders: 	rod                2657 MB
ahoang:   flanders: 	rod                2896 MB
akrishna:   flanders: 	ned                3310 MB
akrishna:   moes: 	barney             1850 MB
akrishna:   moes: 	carl              15674 MB
akrishna:   moes: 	lenny             10723 MB
akrishna:   sideshow: 	bob                   0 MB
akrishna:   sideshow: 	mel              101700 MB
akrishna:   simpsons: 	bart                  0 MB
akrishna:   simpsons: 	lisa                  0 MB
alexu    flanders: 	maude              4041 MB
alexu    flanders: 	ned                5011 MB
alexu    simpsons: 	bart               1326 MB
alexu    simpsons: 	marge              1855 MB
alin:   moes: 	lenny               272 MB
alindema:   moes: 	barney                0 MB
alindema:   sideshow: 	bob                   0 MB
*****************TRUNCATED************************
It only works on a smaller subset of the attached file.
Code:
adallman:
  sideshow:
	bob               12065 MB
	mel                 488 MB
  simpsons:
	bart              32965 MB
afkham:
  simpsons:
	lisa             102466 MB
agnewjo:
  flanders:
	ned               70847 MB
	rod                2657 MB
ahoang:
  flanders:
	rod                2896 MB
akrishna:
  flanders:
	ned                3310 MB
  moes:
	barney             1850 MB
	carl              15674 MB
	lenny             10723 MB
  sideshow:
	bob                   0 MB
	mel              101700 MB
  simpsons:
	bart                  0 MB
	lisa                  0 MB
alexu 
  flanders:
	maude              4041 MB
	ned                5011 MB
  simpsons:
	bart               1326 MB
	marge              1855 MB
alin:
  moes:
	lenny               272 MB
alindema:
  moes:
	barney                0 MB
  sideshow:
	bob                   0 MB
Could Daniel or anybody try Daniel's script or your own on the attached file? Thanks.
Attached Files
File Type: txt jm_out.txt (38.0 KB, 19 views)

Last edited by master-of-puppets; 09-27-2014 at 07:29 AM.
 
Old 09-25-2014, 09:46 PM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by master-of-puppets View Post
I'm not asking anybody to write my script for me. ... Any advice?
You might simplify your code by first restructuring the data.

With this InFile ...
Code:
adallman:
  sideshow:
	bob               12065 MB
	mel                 488 MB
  simpsons:
	bart              32965 MB
afkham:
  simpsons:
	lisa             102466 MB
agnewjo:
  flanders:
	ned               70847 MB
	rod                2657 MB
ahoang:
  flanders:
	rod                2896 MB
akrishna:
  flanders:
	ned                3310 MB
  moes:
	barney             1850 MB
	carl              15674 MB
	lenny             10723 MB
  sideshow:
	bob                   0 MB
	mel              101700 MB
  simpsons:
	bart                  0 MB
	lisa                  0 MB
alexu (LOCKED):
  flanders:
	maude              4041 MB
	ned                5011 MB
  simpsons:
	bart               1326 MB
	marge              1855 MB
alin:
  moes:
	lenny               272 MB
alindema:
  moes:
	barney                0 MB
  sideshow:
	bob                   0 MB
... this awk ...
Code:
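awk '{ # A line starting with neither a space nor a tab is a top-level header.
      if (!match(" \t",substr($0,1,1))) {server=$0; next}
      # A line starting with a space is a second-level header.
      if (substr($0,1,1)==" ") {workspace=$0; next}
      # Anything else (here, tab-indented data) is printed with both headers.
        print server,workspace,$0}' \
  $InFile >$OutFile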
... produced this OutFile ...
Code:
adallman:   sideshow: 	bob               12065 MB
adallman:   sideshow: 	mel                 488 MB
adallman:   simpsons: 	bart              32965 MB
afkham:   simpsons: 	lisa             102466 MB
agnewjo:   flanders: 	ned               70847 MB
agnewjo:   flanders: 	rod                2657 MB
ahoang:   flanders: 	rod                2896 MB
akrishna:   flanders: 	ned                3310 MB
akrishna:   moes: 	barney             1850 MB
akrishna:   moes: 	carl              15674 MB
akrishna:   moes: 	lenny             10723 MB
akrishna:   sideshow: 	bob                   0 MB
akrishna:   sideshow: 	mel              101700 MB
akrishna:   simpsons: 	bart                  0 MB
akrishna:   simpsons: 	lisa                  0 MB
alexu (LOCKED):   flanders: 	maude              4041 MB
alexu (LOCKED):   flanders: 	ned                5011 MB
alexu (LOCKED):   simpsons: 	bart               1326 MB
alexu (LOCKED):   simpsons: 	marge              1855 MB
alin:   moes: 	lenny               272 MB
alindema:   moes: 	barney                0 MB
alindema:   sideshow: 	bob                   0 MB
Does this transformation make life easier?

Daniel B. Martin
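Once every record sits on one line, totalling becomes a single pass over the file. The following is a minimal sketch rather than anything posted in the thread: it assumes the flattened layout shown above, with the user in field 1 and the size always in the next-to-last field (the last field being the unit `MB`). The function name `sum_by_user` and the scratch file are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): total MB per user.
# Assumes the flattened layout above: user in field 1, size next to last.
sum_by_user() {
  awk '{ user = $1; sub(/:$/, "", user)    # drop the trailing colon, if any
         total[user] += $(NF-1) }           # size sits just before the unit
       END { for (u in total) print u, total[u] }' "$1" | sort
}

# Four lines copied from the OutFile above, written to a scratch file.
flat=$(mktemp)
printf 'adallman:   sideshow: \tbob               12065 MB\n'        >  "$flat"
printf 'adallman:   sideshow: \tmel                 488 MB\n'        >> "$flat"
printf 'alexu (LOCKED):   flanders: \tmaude              4041 MB\n'  >> "$flat"
printf 'alexu (LOCKED):   flanders: \tned                5011 MB\n'  >> "$flat"

totals=$(sum_by_user "$flat")
echo "$totals"
rm -f "$flat"
```

Counting from the end of the line with `$(NF-1)` keeps the sum correct for the `alexu (LOCKED):` rows, where the extra word shifts every later field by one.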
 
Old 09-25-2014, 09:56 PM   #3
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin View Post
You might simplify your code by first restructuring the data.

Thanks Daniel I'll plug it in and try it out. Amazing.
 
Old 09-25-2014, 10:45 PM   #4
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by master-of-puppets View Post
Thanks Daniel I'll plug it in and try it out. Amazing.
I ran the script:

Code:
+ ./disk-usage-JM.pl
+ InFile=jm_out
+ OutFile=jm_temp
+ awk '{if (!match(" \t",substr($0,1,1))) {server=$0; next}
      if (substr($0,1,1)==" ") {workspace=$0; next}
        print server,workspace,$0}' jm_out
+ rm jm_out
and this is all I got:
Code:
[20:41:33] ituser@sideshow:/share/es-ops/scripts/BUILD_FARM $ cat jm_temp
/net/simpsons/export/ws/simpsons-ws1/woesteho is owned by locked user.
Usage by user / host / workspace directory:
zzhu (LOCKED):     flanders-ws1       4857 MB
Not sure what happened?

Here's the script:
Code:
#!/bin/bash
./disk-usage-JM.pl > jm_out
InFile="jm_out"
OutFile="jm_temp"
awk '{if (!match(" \t",substr($0,1,1))) {server=$0; next}
      if (substr($0,1,1)==" ") {workspace=$0; next}
        print server,workspace,$0}' \
  $InFile >$OutFile
rm jm_out
Did I miss something?
 
Old 09-26-2014, 03:42 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
I would do the following:
Code:
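awk '/^[^ ].*:$/ { user = $1; sub(/:$/, "", user); next }      # unindented header: user
     /^ .*:$/    { server = $1; sub(/:$/, "", server); next }  # indented header: server
     NF >= 3 {
       workspace = $1                 # data line: workspace, size, unit
       sum_s[server]    += $2         # the size is field 2; field 3 is the unit "MB"
       sum_u[user]      += $2
       sum_w[workspace] += $2
       sum_w_s[workspace][server] += $2   # array of arrays: needs gawk 4 or later
       # whatever kind of sum you want
     }
     END { # loop over servers, users and workspaces and print what you want, e.g.:
           for (u in sum_u) print u, sum_u[u], "MB" }' "$InFile"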
 
Old 09-26-2014, 07:29 AM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by master-of-puppets View Post
Did I miss something?
I don't understand anything in your script. This may be my limitation, not yours. This is the entire script I wrote (and ran) to respond to your Original Post.
Code:
#!/bin/bash
# Daniel B. Martin   Sep14
#
# To execute this program, launch a terminal session and enter:
# bash /home/daniel/Desktop/LQfiles/dbm1250.bin
#
# This program inspired by:
#  http://www.linuxquestions.org/questions/programming-9/
#    parse-file-with-awk-and-sum-totals-4175520091/
# File identification
    Path=${0%%.*}
  InFile=$Path"inp.txt"
 OutFile=$Path"out.txt"

echo "Method of LQ Member danielbmartin"
awk '{if (!match(" \t",substr($0,1,1))) {server=$0; next}
      if (substr($0,1,1)==" ") {workspace=$0; next}
        print server,workspace,$0}' \
  $InFile >$OutFile
echo "OutFile ..."; cat $OutFile; echo "End Of File"     

echo; echo "Normal end of job.";echo; exit
Daniel B. Martin

Last edited by danielbmartin; 09-26-2014 at 10:51 AM. Reason: Cosmetic improvement, no change to code
 
Old 09-26-2014, 11:24 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Here is an alternate on pan64's wavelength:
Code:
awk -F"[ :]+" 'NF == 2{ user   = $1 }
               NF == 3{ server = $2 }
               NF  > 3{ ws     = $2;
                        sum_s[server] += $3;
                        sum_u[user]   += $3;
                        sum_w[ws]     += $3
                      }
   END { # loop over servers, users and workspaces and print what you want, e.g.:
         for (u in sum_u) print u, sum_u[u] }' file
 
Old 09-26-2014, 03:50 PM   #8
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin View Post
I don't understand anything in your script. This may be my limitation, not yours. This is the entire script I wrote (and ran) to respond to your Original Post.
Hi Daniel thanks for checking back in. I ran your script against the truncated version of the whole file and I got the same results that you did. I uploaded the input file so you could try it out. It's way too long to paste into a code bracketed field on here.

I modified the script slightly to remove unneeded lines and the ever popular '(LOCKED):' string.

Code:
#!/bin/bash
./disk-usage-JM.pl | grep -v '/net' | grep -v 'Usage' | sed s/\(LOCKED\)\://g > jm_out
InFile="jm_out"
OutFile="jm_temp"
awk '{if (!match(" \t",substr($0,1,1))) {server=$0; next}
      if (substr($0,1,1)==" ") {workspace=$0; next}
        print server,workspace,$0}' \
  $InFile >>$OutFile
 
Old 09-26-2014, 03:53 PM   #9
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail View Post
Here is an alternate on pan64's wavelength:
The 'print what you want' part is printing the top 5 space consumers for each workspace. I can probably figure out how to do that, but can you at least check out the version of the input file jm_out that I attached and see what you can do with it? Thanks so much for your interest and help.
 
Old 09-26-2014, 03:55 PM   #10
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
I would do the following:
I will try it when I get home tonight; I still have to invent the 'do what you want' part. Thanks so much for your help. I've attached the input file 'jm_out'.
 
Old 09-27-2014, 06:08 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Well, I am a big believer in helping and not doing, so here is a nudge in the right direction. You will need to figure out the logic for sorting the data and showing only 5 entries in each section.
Code:
#!/usr/bin/awk -f

BEGIN{ FS = "[ :]+" }

NF == 2{ user   = $1 }
NF == 3{ server = $2 }

NF > 3{
  ws = $2

  sum[server][ws][user] += $3
}

END{
  for(i in sum)
  {
    print "Server: ",i
    for(j in sum[i])
    {   
      print "Top 5 space consumers on",j":"
      for(k in sum[i][j])
        print k," with size ",sum[i][j][k]
    }   
  }
}
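For the remaining "sort and only show 5" step, one possible approach is to let `sort` do the ordering and have a second awk pass cut each group off after N lines. This is only a sketch under stated assumptions: the input is the space-indented report with plain `name:` headers (the layout the `FS = "[ :]+"` field counts rely on), and the function name `top_consumers` and the sample data file are hypothetical. It sticks to POSIX awk, so it does not need gawk's arrays of arrays.

```shell
#!/bin/bash
# Hypothetical helper: list the N largest users per server/workspace pair.
# Assumes a space-indented report whose headers look like "name:".
top_consumers() {   # usage: top_consumers FILE [N]   (N defaults to 5)
  awk -F'[ :]+' '
    NF == 2 { user   = $1 }
    NF == 3 { server = $2 }
    NF  > 3 { print server, $2, $3, user }   # server workspace size user
  ' "$1" |
  sort -t ' ' -k1,1 -k2,2 -k3,3nr |          # largest size first in each group
  awk -v n="${2:-5}" '
    { key = $1 " " $2 }
    key != prev { print "Top " n " space consumers on " $1 "/" $2 ":"
                  prev = key; seen = 0 }
    seen++ < n  { print "  " $4 " with size " $3 }
  '
}

# Small sample built from figures quoted earlier in the thread.
report=$(mktemp)
cat > "$report" <<'EOF'
agnewjo:
  flanders:
    ned               70847 MB
    rod                2657 MB
ahoang:
  flanders:
    rod                2896 MB
akrishna:
  flanders:
    ned                3310 MB
  moes:
    barney             1850 MB
EOF

top=$(top_consumers "$report" 1)   # keep only the largest entry per group
echo "$top"
rm -f "$report"
```

Sorting outside awk avoids depending on gawk-only features such as `asort` or `PROCINFO["sorted_in"]`; the trade-off is an extra pass over the flattened data.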
 
Old 09-27-2014, 07:11 AM   #12
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Daniel,

I attached the full file finally. I realized that I didn't have the ".txt" extension on the file previously. If you have time and you wouldn't mind, can you try your script on the attached file as the input file? Thanks so much.
Attached Files
File Type: txt jm_out.txt (38.0 KB, 16 views)
 
Old 09-27-2014, 09:21 AM   #13
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by master-of-puppets View Post
Daniel ... can you try your script on the attached file as the input file?
The sample input file contains tab characters. The large "real world" input file does not. Please investigate and reconcile this discrepancy.

Daniel B. Martin
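A quick way to check a file for tab characters, together with a variant of the flattening step that does not depend on tabs at all, might look like the sketch below. It is not Daniel's method. It assumes that header lines end in a colon, that user headers start in column 1 while server headers are indented, and that data lines read "workspace size unit". The names `count_tab_lines` and `flatten` are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; neither appears in the thread.

# Print how many lines of a file contain at least one tab character.
count_tab_lines() {
  awk '/\t/ { n++ } END { print n + 0 }' "$1"
}

# Flatten the report whether data lines are indented with tabs or spaces.
# Assumed layout: headers end in ":", user headers start in column 1,
# server headers are indented, data lines read "workspace size unit".
flatten() {
  awk '/^[^ \t].*:$/ { user   = $1; sub(/:$/, "", user);   next }
       /^[ \t].*:$/  { server = $1; sub(/:$/, "", server); next }
       NF >= 3 && $2 ~ /^[0-9]+$/ { print user, server, $1, $2 }' "$1"
}

tabbed=$(mktemp); spaced=$(mktemp)
printf 'adallman:\n  sideshow:\n\tbob               12065 MB\n'        >  "$tabbed"
printf 'alexu (LOCKED):\n  flanders:\n\tmaude              4041 MB\n'  >> "$tabbed"
tr '\t' ' ' < "$tabbed" > "$spaced"     # same data, tabs turned into spaces

tabs_in_tabbed=$(count_tab_lines "$tabbed")
tabs_in_spaced=$(count_tab_lines "$spaced")
flat_tabbed=$(flatten "$tabbed")
flat_spaced=$(flatten "$spaced")
echo "$tabs_in_tabbed $tabs_in_spaced"
echo "$flat_spaced"
rm -f "$tabbed" "$spaced"
```

Telling headers from data by the trailing colon and a numeric second field, rather than by the first character of the line, gives the same flattened output for both files.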
 
Old 09-27-2014, 10:53 AM   #14
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
So this file is generated by a Perl script. I would suggest summing those things up in that Perl script. In Perl you can print the required result easily, since there is a built-in sort function (so 'print what you want' means reverse-sort the list and print the ordered result).
 
Old 09-27-2014, 01:43 PM   #15
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin View Post
The sample input file contains tab characters. The large "real world" input file does not. Please investigate and reconcile this discrepancy.

Daniel B. Martin
Daniel, I know what caused the missing tabs: it's the sed in the first line that gets rid of the locked string. I'll get rid of it when I get home. Thanks for pointing that out.

Last edited by master-of-puppets; 09-27-2014 at 01:44 PM.
 
  


All times are GMT -5.
