Old 10-21-2008, 01:59 PM   #1
sharky
Member
 
Registered: Oct 2002
Posts: 569

Rep: Reputation: 84
Script / two files and matching multiple columns


I have two files. One is a generated report that lists jobs executed on our load sharing facility. It has a column of userids and a column showing how much CPU time was spent on each job. Any user could be listed one or more times, in no particular order.

example

u4 31.3
u1 61.3
u4 381.2
u3 1.5
u1 34.8
u1 0.3
u5 9.0
u2 111.1
etc...

The second file lists each userid, their real name and the dept. they belong to.

example - delimited as shown, but I can remove the double quotes if needed.

"u1" "John Doe" "D1"
"u2" "Jane Doe" "D1"
"u3" "Bart Simpson" "D2"
"u4" "Homer Simpson" "D1"
"u5" "Julius Ceasar" "D3"

The goal is a script to modify the first file so that the dept. and real name are added. I can do this in a spreadsheet but would prefer a more automated method.

I can come up with something eventually, but what I'm wondering is if there is some trivial awk or perl that would make it easy. I'm always looking for something easy. :-)
 
Old 10-21-2008, 02:58 PM   #2
Disillusionist
Senior Member
 
Registered: Aug 2004
Location: England
Distribution: Ubuntu
Posts: 1,039

Rep: Reputation: 98
Read the files into two separate arrays
Check the first word from each line of the report array
Compare that to your reference array

Merge the relevant data from the reference and report arrays and create a new output file.

Was going to post code, but thought this smelled a little too much like homework.

If you get stuck, post the code that you have written and we will suggest where you may have gone wrong.
 
Old 10-21-2008, 03:18 PM   #3
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by Disillusionist View Post
Read the files into two separate arrays
Check the first word from each line of the report array
Compare that to your reference array

Merge the relevant data from the reference and report arrays and create a new output file.

Was going to post code, but thought this smelled a little too much like homework.

If you get stuck, post the code that you have written and we will suggest where you may have gone wrong.
It ain't homework. I'm 48 years old and work for a living. :-)

I can create a perl script or something to do what you describe. What really had me curious was the possibility of something less tedious.

For example, in the raw data file I can calculate a user's total CPU usage with a single line of awk:

cat ./usage.report | grep "$U" | awk '{SUM += $6} END {print $1, SUM}'

A foreach block is wrapped around it to parse through each user.
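I suppose the per-user loop isn't strictly necessary either; the whole thing could probably be collapsed into a single awk pass that totals into an array, something like this (untested, and assuming the CPU time is still in field 6):

Code:
awk '{sum[$1] += $6} END {for (u in sum) print u, sum[u]}' ./usage.report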

However, your 'algorithm' looks fairly straightforward and probably as simple as it'll get.

thx,
 
Old 10-21-2008, 03:24 PM   #4
jcookeman
Member
 
Registered: Jul 2003
Location: London, UK
Distribution: FreeBSD, OpenSuse, Ubuntu, RHEL
Posts: 417

Rep: Reputation: 33
quick python -- not pretty :)

Code:
#!/usr/bin/env python2.5

from __future__ import with_statement

users = {}
entries = []

with open('users.map', 'r') as user_map:
    for line in user_map:
        entry = [str.strip('"') for str in line.split()]
        users[entry[0]] = {'uname':entry[1] + ' ' + entry[2],
                           'dept':entry[3]}

with open('job.log', 'r') as jobs:
    for line in jobs:
        uid = line.split()[0]
        entries.append(line.strip() + ' ' +
                       users[uid]['uname'] + ' ' +
                       users[uid]['dept'] + '\n')

print entries
 
Old 10-21-2008, 04:17 PM   #5
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by jcookeman View Post
Code:
#!/usr/bin/env python2.5

from __future__ import with_statement

users = {}
entries = []

with open('users.map', 'r') as user_map:
    for line in user_map:
        entry = [str.strip('"') for str in line.split()]
        users[entry[0]] = {'uname':entry[1] + ' ' + entry[2],
                           'dept':entry[3]}

with open('job.log', 'r') as jobs:
    for line in jobs:
        uid = line.split()[0]
        entries.append(line.strip() + ' ' +
                       users[uid]['uname'] + ' ' +
                       users[uid]['dept'] + '\n')

print entries
I used Perl. This is what I came up with:
Code:
#!/usr/bin/perl

open(USERS, "user_tbl.csv") or die "Cannot open user_tbl.csv: $!";
open(CPUTIME, "sum_user_cputime.csv") or die "Cannot open sum_user_cputime.csv: $!";
  while ($users = <USERS>)
    {
      @listusers = split (/ /,$users);
      # rescan the whole CPU-time file for each user, then rewind it below
      while ($cputime = <CPUTIME>)
      {
        @listcputime = split (/ /,$cputime);
        if ( "$listusers[0]" eq "$listcputime[0]" )
        {
          # had to chop a linefeed here
          chop $listusers[2];
          print "$listusers[0] $listusers[1] $listusers[2] $listcputime[1]";

        }
      }
      seek CPUTIME, 0, 0;
    }
close USERS;
close CPUTIME;
How do you like Python? I hear a lot of good things about it - except from the Perl worshippers. :-(
 
Old 10-21-2008, 05:05 PM   #6
jcookeman
Member
 
Registered: Jul 2003
Location: London, UK
Distribution: FreeBSD, OpenSuse, Ubuntu, RHEL
Posts: 417

Rep: Reputation: 33
Perl is excellent, but I believe Python is more elegant. I, however, am not a member of any zealot movement. So, use what makes you comfortable.

...that doesn't mean from time to time I don't take a stab at others' expense.
 
Old 10-21-2008, 05:19 PM   #7
forrestt
Senior Member
 
Registered: Mar 2004
Location: Cary, NC, USA
Distribution: Fedora, Kubuntu, RedHat, CentOS, SuSe
Posts: 1,288

Rep: Reputation: 99
If you can easily get file2 to look like:

Code:
u1 "John Doe" "D1"
u2 "Jane Doe" "D1"
u3 "Bart Simpson" "D2"
u4 "Homer Simpson" "D1"
u5 "Julius Ceasar" "D3"
Then you can run:
Code:
awk 'NR==FNR{ users[$1]=$2" "$3" "$4; next } {print $1,users[$1],$2}' file2 file1
What it is doing is testing whether the total record count (NR) equals the per-file record count (FNR), which is only true while the first file is being read. If it is, it stores fields 2, 3 and 4 (joined with spaces) in the users array, keyed by field 1, and goes on to the next record (so the printing part is skipped). Once the first file has been read and awk starts on the second, NR no longer equals FNR, so next is not hit and the printing part runs: it prints the first field, the value stored in users under that first field, then the second field.
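On the sample data above that should give output along these lines:
Code:
u4 "Homer Simpson" "D1" 31.3
u1 "John Doe" "D1" 61.3
u4 "Homer Simpson" "D1" 381.2
u3 "Bart Simpson" "D2" 1.5
u1 "John Doe" "D1" 34.8
u1 "John Doe" "D1" 0.3
u5 "Julius Ceasar" "D3" 9.0
u2 "Jane Doe" "D1" 111.1
And if you would rather have one line per user with the CPU times totalled, the same NR==FNR trick works with a sum array (untested sketch):
Code:
awk 'NR==FNR{ users[$1]=$2" "$3" "$4; next } { sum[$1]+=$2 } END { for (u in sum) print u, users[u], sum[u] }' file2 file1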

Last edited by forrestt; 10-22-2008 at 09:58 AM. Reason: changed the word "record" to "field" to be accurate.
 
  


