[SOLVED] Counting repeated entries

khandu · 06-16-2016, 12:06 AM

Guys, I am lost here..

I have a file with the following format:

Quote:

dn: 1st
cn: asdsd
cn: adasd
sn: ad

dn: 2nd
cn: 123
cn: 12312
cn: 1
sn: aansd
sn: aa

I want to get output like this:

Quote:

dn: 1st
cn: 2 (total)
sn: 1 (total)

dn: 2nd
cn: 3 (total)
sn: 2 (total)

The format doesn't have to be exactly as above, but basically it should print the dn line and how many times cn repeats under each dn. I can feed in the attribute names manually, so it does not have to magically find all attributes and count them. So I could pipe the file into some awk or something, mentioning dn, cn and sn, and it would print the above.

Thanks

ondoho · 06-16-2016, 01:50 AM

have you considered using uniq?

Code:

man uniq
(...)
 -c, --count
              prefix lines by the number of occurrences

khandu · 06-16-2016, 01:52 AM

Quote:

Originally Posted by ondoho

have you considered using uniq?

Code:

man uniq
(...)
 -c, --count
              prefix lines by the number of occurrences

Yes, but how do you use it to get the above output? It's a huge file with multiple dn entries, and I need the counts printed under each one.
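
One way to read the uniq suggestion: `uniq -c` counts runs of adjacent identical lines, so it only helps once each line is reduced to its attribute name. A minimal Python sketch of that idea follows, assuming the attributes of one entry are grouped together as in the sample. The function name `uniq_c` is made up for illustration, and dn lines are kept whole so each entry stays identifiable.

```python
from itertools import groupby

def uniq_c(lines):
    """Count runs of adjacent lines that share an attribute name (uniq -c style)."""
    def key(line):
        # Keep dn lines whole; reduce every other line to the text before ':'.
        return line if line.startswith("dn:") else line.split(":", 1)[0]

    nonblank = (line.strip() for line in lines if line.strip())
    return [(k, sum(1 for _ in run)) for k, run in groupby(nonblank, key=key)]

sample = [
    "dn: 1st", "cn: asdsd", "cn: adasd", "sn: ad", "",
    "dn: 2nd", "cn: 123", "cn: 12312", "cn: 1", "sn: aansd", "sn: aa",
]

for name, count in uniq_c(sample):
    print(count, name)
```

Like uniq itself, this counts only adjacent repeats, so an entry whose cn and sn lines are interleaved would produce several short runs instead of one total per attribute.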

bigearsbilly · 06-16-2016, 02:23 AM

quiet day at work

Code:

#!/usr/bin/perl -w

use strict;
use Data::Dumper;

my $dn;
my %H ;

while (<>) {

    # print;   # debug
    chomp;
    /^dn:/ and $dn = $_;
    next unless defined $dn;
    $H{$dn}->{count}++ if /^cn:/;

}

print Dumper(\%H);

Code:

[billy@donald:0]$ ./khandu.pl 1

dn: 1st
cn: asdsd
cn: adasd
sn: ad

dn: 2nd
cn: 123
cn: 12312
cn: 1
sn: aansd
sn: aa 
$VAR1 = {
          'dn: 2nd' => {
                         'count' => 3
                       },
          'dn: 1st' => {
                         'count' => 2
                       }
        };

khandu · 06-16-2016, 02:29 AM

Awesome. How do I do it for multiple counts?

like

Quote:

dn: 2nd
cn: 123
cn: 12312
cn: 1
sn: aansd
sn: aa

dn: 2nd
cn: count 3
sn: count 2

grail · 06-16-2016, 02:40 AM

How about you try and do some of the work yourself. You have been shown a possible solution so now it is up to you to try and augment it as you need.

HMW · 06-16-2016, 02:43 AM

Quote:

Originally Posted by khandu

Awesome. How do I do it for multiple counts?

Use *logic*.

Pseudocode:

Code:

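for each line
   if line starts with dn
        if this is not the first dn
             print counts
        print the dn line
        reset counters
   else if line starts with cn or sn
        increment that attribute's counter
at end of input
   print counts for the last dn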
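
The count, print and reset loop in the pseudocode above could look roughly like this in Python. This is only a sketch: the function name `count_streaming` and the default attribute list are assumptions made for illustration, not anything posted in this thread.

```python
from collections import Counter

def count_streaming(lines, attrs=("cn", "sn")):
    """Yield (dn_line, counts) pairs, flushing each entry when the next dn starts."""
    dn = None
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("dn:"):
            if dn is not None:
                yield dn, counts          # previous entry is complete
            dn, counts = line, Counter()  # reset for the new entry
        elif dn is not None:
            name, sep, _ = line.partition(":")
            if sep and name in attrs:
                counts[name] += 1
    if dn is not None:
        yield dn, counts                  # do not forget the last entry

sample = """\
dn: 1st
cn: asdsd
cn: adasd
sn: ad

dn: 2nd
cn: 123
cn: 12312
cn: 1
sn: aansd
sn: aa
""".splitlines()

for dn, counts in count_streaming(sample):
    print(dn)
    for name in ("cn", "sn"):
        print(f"{name}: {counts[name]} (total)")
    print()
```

Because each entry is emitted as soon as the next dn appears, only one entry's counters are held in memory at a time, which may matter for a huge file.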

Best regards,
HMW

khandu · 06-16-2016, 02:47 AM

lol..

I tried

Quote:

#!/usr/bin/perl -w

use strict;
use Data::Dumper;

my $dn;
my %H ;
my %J ;

while (<>) {

# print; # debug
chomp;
/^dn:/ and $dn = $_;
next unless defined $dn;
$H{$dn}->{count}++ if /^cn:/;
$J{$dn}->{count}++ if /^sn:/;
}

print Dumper(\%H);
print Dumper(\%J);

This gives me what I need. I still need to make it a bit prettier, with the actual attribute names (cn, sn, etc.) in the output.

pan64 · 06-16-2016, 03:01 AM

You may try another language too; I do not know which one you prefer (awk/python/perl/whatever).
Here is another approach (pseudocode too)

Code:

if line starts with dn:
   new record
   ix += 1  (index, counter of dn's)
else
   split line to name/value pairs
   store name/value in record[ix] (this looks like hash or associative array)

finally count the number of elements in the hash (or count number of occurrences of something or ...)

By the way what will/should happen if you have two identical lines next to each other?
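
A rough Python sketch of the record-based approach above, which also shows one possible answer to the identical-lines question. The function name `parse_records` is hypothetical, and the sample is a made-up variant of the original data with one cn value deliberately repeated.

```python
from collections import defaultdict

def parse_records(lines):
    """Split dn-delimited text into (dn_value, {attr_name: [values]}) records."""
    records = []
    for line in lines:
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue                      # blank or malformed line
        value = value.strip()
        if name == "dn":
            records.append((value, defaultdict(list)))   # new record
        elif records:
            records[-1][1][name].append(value)           # store under current dn
    return records

# Hypothetical sample: "cn: 123" appears twice in the second entry.
sample = [
    "dn: 1st", "cn: asdsd", "cn: adasd", "sn: ad", "",
    "dn: 2nd", "cn: 123", "cn: 123", "cn: 1", "sn: aansd", "sn: aa",
]

for dn, attrs in parse_records(sample):
    print(f"dn: {dn}")
    for name, values in attrs.items():
        print(f"{name}: {len(values)} total, {len(set(values))} distinct")
```

Keeping the values instead of a bare counter costs more memory, but it leaves the choice between counting every line and counting distinct values until the end.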

khandu · 06-16-2016, 03:03 AM

Thanks guys. Currently it's giving me the basics of what I need in a dirty way, which is OK.

I have modified the printing of $VAR1 via qw..

will get back if i hit a hiccup..

bigearsbilly · 06-16-2016, 03:08 AM

haha! I enjoy a bit of nifty Perl! I am trying to avoid doing a conformance test today anyway

Code:

#!/usr/bin/perl -w

use strict;
use Data::Dumper;

my $dn;
my %H ;
my $thing;

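while (<>) {

    print;                              # echo each input line as it is read
    chomp;
    next unless /./;                    # skip blank lines

    if (m/^dn:/) {
        $dn = $_;                       # remember the current entry
        next;
    }
    next unless defined $dn;            # ignore anything before the first dn
    next unless ($thing) = /^(\w+):/;   # capture the attribute name
    $H{$dn}->{$thing}++ ;               # count it under the current dn

}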

print Dumper(\%H);

Code:

$VAR1 = {
          'dn: 2nd' => {
                         'cn' => 3,
                         'sn' => 2
                       },
          'dn: 1st' => {
                         'cn' => 2,
                         'sn' => 1
                       }
        };

Now I better get on with my work
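
For the prettier output requested earlier in the thread, the same per-dn counts can be printed in the requested layout instead of a Dumper structure. Below is a minimal Python sketch of that idea, not a port of the Perl above: the function names are invented here, and `partition(":")` only approximates the `/^(\w+):/` match. Python dicts keep insertion order, so entries come out in file order.

```python
def count_attributes(lines):
    """Return {dn_line: {attr_name: count}} for every attribute under each dn."""
    totals = {}
    dn = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith("dn:"):
            dn = line
            totals.setdefault(dn, {})
            continue
        if dn is None:
            continue                      # ignore anything before the first dn
        name, sep, _ = line.partition(":")
        if sep:
            totals[dn][name] = totals[dn].get(name, 0) + 1
    return totals

def format_report(totals):
    """Render one block per dn, with one 'name: N (total)' row per attribute."""
    blocks = []
    for dn, counts in totals.items():
        rows = [dn] + [f"{name}: {n} (total)" for name, n in counts.items()]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks)

sample = [
    "dn: 1st", "cn: asdsd", "cn: adasd", "sn: ad", "",
    "dn: 2nd", "cn: 123", "cn: 12312", "cn: 1", "sn: aansd", "sn: aa",
]

print(format_report(count_attributes(sample)))
```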

grail · 06-16-2016, 04:04 AM

Code:

awk '/^dn:/{for(i in a)print i,a[i];delete a;print}/^[cs]n:/{a[$1]++}END{for(i in a)print i,a[i]}' file

Might need to play with sort orders but you get the idea

syg00 · 06-16-2016, 04:11 AM

Quote:

Originally Posted by grail

... but you get the idea

Not 90+% of the population I suspect ....