LinuxQuestions.org
Old 05-18-2012, 05:05 AM   #16
hanae
Member
 
Registered: May 2012
Posts: 33

Original Poster
Rep: Reputation: Disabled

Thank you, now it is working.
But the program doesn't give the correct results. It attaches 1 to every entry, while there are repeated entries that should have the frequency number. It seems to me as if it doesn't take the fields into account: the whole line is treated as one entry. In particular, some words appear in different files.
Example:
Quote:
sky losn_revue-1981-2 40234
sky note4-0-7 6787
fly kivre1-0-0 1236
fly kivre1-0-0 1240
sky file1-1-3 1567
In this case we should have:
fly kivre1-0-0 1236,1240 2
sky note4-0-7,file1-1-3 6787,1567 2
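The symptom described above (a count of 1 on every entry) is what one would expect if whole lines are counted instead of first fields. A minimal sketch of the difference, assuming whitespace-separated fields and using the sample lines from this post; the variable names `input`, `by_line` and `by_word` are only illustrative:

```shell
#!/bin/sh
# Sample lines from the post above.
input='sky losn_revue-1981-2 40234
sky note4-0-7 6787
fly kivre1-0-0 1236
fly kivre1-0-0 1240
sky file1-1-3 1567'

# Counting whole lines: every line is distinct, so every count is 1.
by_line=$(printf '%s\n' "$input" |
    awk '{ c[$0]++ } END { for (l in c) print c[l], l }' | sort)

# Counting by the first field only: repeated words are grouped.
by_word=$(printf '%s\n' "$input" |
    awk '{ c[$1]++ } END { for (w in c) print w, c[w] }' | sort)

printf '%s\n\n%s\n' "$by_line" "$by_word"
```

With all five sample lines, the second count reports fly 2 and sky 3, because sky appears on three input lines.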
 
Old 05-18-2012, 05:16 AM   #17
hanae
Member
 
Registered: May 2012
Posts: 33

Original Poster
Rep: Reputation: Disabled
Does it really matter when the input is a large file? I tested with a 3-entry sample input and it works, while with my real input, which contains over 1000 entries, it doesn't!
 
Old 05-18-2012, 06:21 AM   #18
pan64
Senior Member
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 4,607

Rep: Reputation: 1243
No, the size does not matter; at least it should not. I would rather think you had not told us about this requirement, so it had not been implemented.
Code:
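#!/usr/bin/perl
use strict;
use warnings;

my %result;     # $result{$key}{$field_index}{$value} = 1 (unique values per field)
my %counter;    # number of lines seen for each key

# Loop over input lines.
while (<>) {
    my @word = split;
    my $key  = shift @word;
    next if !defined $key;    # skip blank lines
    # Record the value of each remaining field under its position.
    $result{$key}{$_}{ $word[$_] } = 1 for 0 .. $#word;
    $counter{$key}++;
}

foreach my $k ( keys %result ) {
    print "$k ";
    # Sort field positions numerically so that field 10 follows field 9.
    foreach my $n ( sort { $a <=> $b } keys %{ $result{$k} } ) {
        my $values = join ",", keys %{ $result{$k}{$n} };
        print "$values ";
    }
    print $counter{$k} . "\n";
}
Maybe this one works better.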

Last edited by pan64; 05-18-2012 at 06:33 AM. Reason: I missed counter
 
Old 05-18-2012, 06:32 AM   #19
hanae
Member
 
Registered: May 2012
Posts: 33

Original Poster
Rep: Reputation: Disabled
Yeah, it does, but it doesn't give the count:
Quote:
fly kivre1-0-0 1236,1240 2
sky note4-0-7,file1-1-3 6787,1567 2
2 in the examples is the count.

Thank you,
 
Old 05-18-2012, 06:33 AM   #20
pan64
Senior Member
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 4,607

Rep: Reputation: 1243
It is now fixed, sorry.
 
Old 05-18-2012, 06:38 AM   #21
hanae
Member
 
Registered: May 2012
Posts: 33

Original Poster
Rep: Reputation: Disabled
What do you mean by fixed?
 
Old 05-18-2012, 06:40 AM   #22
pan64
Senior Member
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 4,607

Rep: Reputation: 1243
I edited the post, so you can now find the modified code. It handles the counters also. So please check post #18 again.
 
Old 05-18-2012, 06:54 AM   #23
hanae
Member
 
Registered: May 2012
Posts: 33

Original Poster
Rep: Reputation: Disabled
It is really extremely helpful! Thank you very much, pan64.
 
Old 05-18-2012, 09:58 AM   #24
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
I thought the first two fields were supposed to be the key. Post #16 shows that only the first field is the key, and that both the second and third fields should be gathered in lists.

Here is the modified, commented awk script version:
Code:
#!/usr/bin/awk -f
BEGIN {
    # Each line (using any newline convention) is a separate record.
    RS = "(\r\n|\n\r|\r|\n)"

    # Fields are separated by any amount of whitespace.
    FS = "[\t\v\f ]+"

    # For output, use explicitly the Linux newline convention.
    ORS = "\n"

    # For output, use a single space between fields.
    OFS = " "
}

# Consider only records with three or more fields.
(NF >= 3) {

    # First field is the key.
    k = $1

    # Keep track of each unique key:
    # If count has no key k, then k is a new key.
    if (!(k in count))
        key[++keys] = k

    # Add to the number of times this key has been seen.
    count[k]++

    # Add second field to list1, comma-separated.
    list1[k] = list1[k] "," $2

    # Add third field to list2, comma-separated.
    list2[k] = list2[k] "," $3
}

END {
    # Loop over each unique key k.
    for (i = 1; i <= keys; i++) {
        k = key[i]

        # The number of times this key has been seen.
        n = count[k]

        # The comma-separated lists for this key.
        s1 = list1[k]
        s2 = list2[k]

        # Replace consecutive runs of commas with a single comma.
        # Note: This really only happens if the second or third
        #       fields start or end with a comma.
        gsub(/,,+/, ",", s1)
        gsub(/,,+/, ",", s2)

        # Because we add a comma before each entry, there will always be
        # a leading comma. Remove it by skipping the first character.
        s1 = substr(s1, 2)
        s2 = substr(s2, 2)

        # Output the line.
        print k, s1, s2, n
    }
}
 
Old 05-18-2012, 06:19 PM   #25
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,149

Rep: Reputation: 330
I was intrigued by this problem, and thought of another AWK program, using the true multi-dimensional arrays (arrays of arrays) that are new in gawk 4.0:
Code:
#!/bin/gawk -f
# Print the values  as a comma separated string and the dimensions
# as a colon separated string.
#
# Based on the walk_array function found in /usr/share/walkarray.awk
function print_array( arr, name,  i)
{
  comma=""
  for (i in arr) {
    if (isarray(arr[i])) {
      if (i) printf(":")
      print_array(arr[i], name "[" i "]")
    }
    else {
      if (i) {
	printf("%s%s", comma, i)
	comma=", "
      }
    }
  }
}
# Read the input file, storing the information in a 3-dimensional array,
# with the number of occurrences of each first word in words["word"][""][""]
# and the number of occurrences of each additional field in words["word"][field#][text].
{
  words[$1][""][""]++
  for (i=2;i<=NF;++i) {
    words[$1][i][$i]++
  }
}
# Print the summary information, with the count at the end enclosed in parentheses
END {
  for (i in words) { 
    printf("%s",i)
    print_array(words[i],"words[" i "]")
    printf(" (%d)\n", words[i][""][""])
  }
}
Using the two sample data sets, this produces for the first data set:
Code:
$ ./count_by_first_word data 
sky:losn_revue-1981-2:40234 (1)
fly:kivre1-0-0:1240, 1236 (2)
and, for the second:
Code:
$ ./count_by_first_word data2
sky:losn_revue-1981-2, note4-0-7, file1-1-3:1567, 40234, 6787 (3)
fly:kivre1-0-0:1240, 1236 (2)
Edit: Note that there is no assumption made in that code that there are only three fields in the input file. It also finds the unique values in each of the input fields, and only prints those unique values. Thus, for example, concatenating the second data set with itself produces this:
Code:
$ cat data2 data2 > data3
$ ./count_by_first_word data3
sky:losn_revue-1981-2, note4-0-7, file1-1-3:1567, 40234, 6787 (6)
fly:kivre1-0-0:1240, 1236 (4)

Last edited by PTrenholme; 05-18-2012 at 06:31 PM.
 
  


All times are GMT -5. The time now is 04:33 AM.
