How can I find the first and last occurrence of each unique entry in a list?

vous · 03-21-2005, 09:03 AM

Hello All,

I am trying to analyze a whole bunch of entries in my log files, and I'm stuck at this point...

I would like to go through this list:

20050312,RED,54343717
20050313,RED,54343717
20050316,RED,54343717
20050317,RED,54343717
20050318,RED,54343717
20050311,BLUE,54355389
20050318,BLUE,54355389
20050318,GREEN,54355555
20050320,GREEN,54355555

And get the following out:

1- Based on the 2nd and 3rd fields (so for the first entry, "RED,54343717" would be used as the unique identifier), I would like to know 3 things:

a. How many occurrences there are of each (regardless of the first date field, obviously).
b. When was the first time it came up.
c. When was the last time it came up.

Any thoughts?

farmerjoe · 03-21-2005, 09:11 AM

I think there are a number of ways to do this, but I will post what just popped into my head.

Use grep.

Quote:

a. How many occurrences there are of each (regardless of the first date field, obviously).

grep -c "RED,54343717" $FILE

##This should give you the number of times it found that string.

Quote:

b. When was the first time it came up.

grep -n "RED,54343717" $FILE | head -1

##This should give you the first match it found, along with the line number.

Quote:

c. When was the last time it came up.

grep -n "RED,54343717" $FILE | tail -1

#This should give you the last match it found, along with the line number.
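
The three commands above can be wrapped into one helper that handles a single key and pulls out just the date. This is only a minimal sketch: the function name summarize_key is made up for illustration, and it assumes every line looks like DATE,COLOR,NUMBER and that the file is already in date order.

```shell
# Hypothetical helper: summarize one key (e.g. "RED,54343717") in a log file.
# Assumes lines look like DATE,COLOR,NUMBER and are sorted by date.
summarize_key() {
  local key=$1 file=$2
  local count first last
  # -F treats the key as a fixed string rather than a regular expression.
  # The leading comma keeps the match from starting inside the color field.
  count=$(grep -c -F -- ",$key" "$file")
  # The date is the first comma-separated field of the matching line.
  first=$(grep -F -- ",$key" "$file" | head -n 1 | cut -d, -f1)
  last=$(grep -F -- ",$key" "$file" | tail -n 1 | cut -d, -f1)
  printf '%s %s times first: %s last: %s\n' "$key" "$count" "$first" "$last"
}
```

It would be called as summarize_key "RED,54343717" "$FILE". It still reads the file three times per key, so it is a convenience wrapper rather than a faster approach.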

vous · 03-21-2005, 09:21 AM

Indeed, that's how you could do it one by one.

What I need is to have this done inside a loop that would produce this output automatically based on that list.

Expected output would be something like:

RED,54343717.......5 times....First time: 20050312....Last time: 20050318
BLUE,54355389......2 times....First time: 20050311....Last time: 20050318
GREEN,54355555.....2 times....First time: 20050318....Last time: 20050320

Any thoughts how this could be done in a loop????

keefaz · 03-21-2005, 11:04 AM

Try this PHP code:

PHP Code:



#!/usr/bin/php 
<?php 
if(count($argv) < 2) { 
    die("Usage: ".$argv[0]." file...\n"); 
} 
 
$DATAS = Array(); 
 
$file = fopen($argv[1], "r"); 
 
if(!$file) { 
    die("could not open ".$argv[1]." in reading mode\n"); 
} 
 
while(($data = fgetcsv($file, 128, ",")) !== FALSE) { 
    $DATAS[$data[1]][$data[2]][] = $data[0]; 
} 
 
fclose($file); 
 
foreach($DATAS as $color => $data) { 
    $i = 0; 
    foreach($data as $number => $dates) { 
        $i = count($dates); 
        echo "$color,$number......$i"; 
        echo " Times.....First time:"; 
        echo $dates[0].".....Last time:".$dates[$i-1]."\n"; 
    } 
} 
?>

Call it for example: report.php, chmod +x it and run it as
./report.php /path/to/your/file

homey · 03-21-2005, 11:20 AM

Edit: Got it!

keefaz · 03-21-2005, 11:24 AM

I just edited my post. Also make sure your file looks like this:
20050312,RED,54343717
20050313,RED,54343717
20050316,RED,54343717
20050317,RED,54343717
20050318,RED,54343717
20050311,BLUE,54355389
20050318,BLUE,54355389
20050318,GREEN,54355555
20050320,GREEN,54355555

Note that fgetcsv($file, 128, ",") will split the current line on the "," delimiter,
and my loop requires 3 fields ($data[0], $data[1], $data[2]).
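
For anyone who wants the same comma splitting in a shell script instead of PHP, the read builtin can do it when IFS is set to a comma. A minimal sketch, assuming each line has exactly three fields and none of them contains a comma. The function name split_line is hypothetical.

```shell
# Split one DATE,COLOR,NUMBER line into three variables.
# split_line is a hypothetical name used only for this illustration.
split_line() {
  local date color number
  # IFS=, applies only to this read, so the rest of the script is unaffected.
  IFS=, read -r date color number <<EOF
$1
EOF
  printf 'date=%s color=%s number=%s\n' "$date" "$color" "$number"
}
```

In a real script the same read would usually sit in a "while IFS=, read -r date color number; do ... done < file" loop, which walks the file line by line.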

vous · 03-21-2005, 12:12 PM

Thanks keefaz, it'll take me a while to port this to bash...

I'll let you know what comes out of it...

TheLinuxDuck · 03-21-2005, 04:54 PM

Ok, so I'm a big fat nerd. People always tell me that I should use a different tool than bash for some of the scripting things that I want to do.. and I scoff.. openly.. and mockingly.. (=

So, using farmerjoe's grep examples, I created a bash script to do this.. I noticed you said something about converting it to bash or something, so here's my take on it:

Code:

#!/bin/bash

datafile='./data';

#  hold which items have been found and displayed
#
used=""

#  assume that no line will ever contain a space
#
for line in `cat $datafile`; do
  pieces=( `echo "$line" | tr ',' ' '` )  # 0-date 1-colorname 2-value

  #  see if we've examined this one before
  #
  item="${pieces[1]},${pieces[2]}"
  in_used=`echo "$used" | grep -F "$item"`

  #  if it's not yet been examined
  #
  if [ -z "$in_used" ]; then
    #  first, add it to the used list
    #
    used="$used $item"

    count_l=`grep -c -F "$item" "$datafile"`
    first_l=( `grep -F "$item" "$datafile" | head -1 | tr ',' ' '` )
    last_l=( `grep -F "$item" "$datafile" | tail -1 | tr ',' ' '` )
    #  $first_l and $last_l expand to element 0 of each array, i.e. the date
    printf '%s\t%s times\tfirst: %s\tlast: %s\n' "$item" "$count_l" "$first_l" "$last_l"
  fi
done

Assuming that the file "./data" contains the data you listed above, and assuming that the file is in chronological order by date, the output should be:

Code:

~/bash> ./parse.sh
RED,54343717    5 times first: 20050312 last: 20050318
BLUE,54355389   2 times first: 20050311 last: 20050318
GREEN,54355555  2 times first: 20050318 last: 20050320
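
If the file might not be in chronological order, a single pass with awk can track the earliest and latest date per key instead of relying on head and tail. This is a sketch of a different technique, not part of the script above. The function name summarize_all is made up, and it assumes dates are always 8-digit YYYYMMDD values in the first field.

```shell
# Hypothetical one-pass summary; works even if the file is not sorted by date.
# Assumes lines look like YYYYMMDD,COLOR,NUMBER.
summarize_all() {
  awk -F, '
    NF < 3 { next }                   # skip blank or malformed lines
    {
      key = $2 "," $3
      if (!(key in count)) {          # first sighting of this key
        order[++n] = key              # remember input order for printing
        first[key] = last[key] = $1
      }
      count[key]++
      # 8-digit dates compare the same way as numbers or as strings
      if ($1 < first[key]) first[key] = $1
      if ($1 > last[key])  last[key]  = $1
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        printf "%s\t%d times\tfirst: %s\tlast: %s\n", k, count[k], first[k], last[k]
      }
    }
  ' "$1"
}
```

Calling summarize_all ./data on the sample data should print the same three lines as the output above, and it reads the file once rather than three times per key.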

vous · 03-22-2005, 09:05 AM

Hi "TheLinuxDuck",

thanks a bunch!

This did it!