LinuxQuestions.org - printing rows with repeated strings

verse123

12-18-2011 10:34 PM

Hi guys,

I am trying to print the rows of a file that contain repeated strings (in this example, the word DOG followed by a number). For example:

col1 col2 col3
DOG1 233 1
DOG1 231 1
DOG4 230 5

I can do something like

Code:

awk '{if($3>=1){print}}' | sort -k1,1

but I do not know what to use to print every row whose string is repeated. Any ideas?

David the H.

12-19-2011 03:32 PM

Could you explain your goal in more detail? For example:

Do you need to print out lines that match a specific string, or any repeating string in the file?
Are the matching lines sequential, or can they be scattered through the file?
Are the matched words all in the same column, or can they be in different ones?
Does the order of the output matter, and if so, how should it appear?

You might want to give us a larger example of the input, and perhaps tell us the context of what you're trying to do.

verse123

12-19-2011 07:40 PM

Hi,

I need to print every line whose string is repeated; these lines are scattered throughout the file. The matched words are all in the same column, and the order of the output does not matter as long as every line with a repeated string ends up in the output file. So in the example below, "DOG1" should appear in the output file 4 times, each time with the corresponding information from the rest of the columns. "DOG3" should appear twice, again with the rest of its columns (for example, one of its lines should look like this: DOG3 0.04 2).

col1 col2 col3
DOG1 233 1
DOG1 231 1
DOG4 230 5
DOG1 0.5 3
DOG3 0.04 2
DOG0 4 23
DOG1 5 0.1
DOG3 63 5

Telengard

12-19-2011 09:47 PM

Hi, verse123. Here's how I read your program specification.

Quote:

Originally Posted by verse123 (Post 4554105)

. . . print out lines that match . . . The matched words are also in the same column . . . the order of the output does not matter . . . "DOG1" should appear in the output file 4 times . . . "DOG3" should appear twice in the output

Hope I understand you. Here's the input file I copied from your post.

Code:

$ cat dogs.txt
DOG1 233 1
DOG1 231 1
DOG4 230 5
DOG1 0.5 3
DOG3 0.04 2
DOG0 4 23
DOG1 5 0.1
DOG3 63 5

Here's the program I use to parse the file.

Code:

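$ cat dog-finder.awk
#! /usr/bin/awk -f

BEGIN {
    # FS=" "
    # OFS=" "
}

{
    i=$1
    count[i]++
    if (count[i] == 1) {
        # First sighting: hold the line until a repeat shows up.
        array[i]=$0
    } else {
        # Repeat: print the held first line once, then every later line,
        # so an odd number of repeats is not dropped.
        if (count[i] == 2) {
            print array[i]
            delete array[i]
        }
        print $0
    }
}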

Here's the output my program produced when I fed it the input file. I believe this program meets your specifications as I understood them.

Code:

$ ./dog-finder.awk dogs.txt
DOG1 233 1
DOG1 231 1
DOG1 0.5 3
DOG1 5 0.1
DOG3 0.04 2
DOG3 63 5
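
If the repeated lines should come out in their original input order instead, one possible alternative is to read the file twice: count the first field on the first pass, then print on the second. This is only a sketch, assuming the same dogs.txt as above and that the file is small enough to read twice; the names count and repeats are just illustrative.

```shell
# Recreate the sample input from the thread (same file name as above).
cat > dogs.txt <<'EOF'
DOG1 233 1
DOG1 231 1
DOG4 230 5
DOG1 0.5 3
DOG3 0.04 2
DOG0 4 23
DOG1 5 0.1
DOG3 63 5
EOF

# Pass 1 (NR==FNR is true only for the first file argument): count each first field.
# Pass 2: print the lines whose first field was counted more than once.
repeats=$(awk 'NR==FNR { count[$1]++; next } count[$1] > 1' dogs.txt dogs.txt)
printf '%s\n' "$repeats"
```

Because the second pass walks the file top to bottom, the DOG1 and DOG3 lines stay interleaved exactly as they are in the input.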

HTH

Cedrik

12-20-2011 04:56 AM

A Perl version:

Code:

perl -ane 'push @{$s{$F[0]}},$_;END{for(keys %s){print @{$s{$_}} if @{$s{$_}}>1;}}' file.txt

Edit: actually I prefer:

Code:

perl -ane '$s{$F[0]}.=$_;END{print for(grep {/\n./}values %s)}' file.txt

Second edit: it also works without for() and values:

Code:

perl -ane '$s{$F[0]}.=$_;END{print grep{/\n./}%s}' file.txt
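
The Perl one-liners collect lines per first field in a hash and print only the groups holding more than one line. Perl does not guarantee hash iteration order, so the groups can come out in any order (which is fine for the stated requirements). As a rough awk sketch of the same grouping idea, assuming the dogs.txt sample from earlier in the thread, the variant below also remembers the order in which keys were first seen so the output is deterministic. The array names lines, count, and order are illustrative.

```shell
# Recreate the sample input from the thread.
cat > dogs.txt <<'EOF'
DOG1 233 1
DOG1 231 1
DOG4 230 5
DOG1 0.5 3
DOG3 0.04 2
DOG0 4 23
DOG1 5 0.1
DOG3 63 5
EOF

grouped=$(awk '
{
    if (!($1 in lines)) order[++n] = $1   # remember keys in first-seen order
    lines[$1] = lines[$1] $0 "\n"         # append this line to the group for its key
    count[$1]++
}
END {
    # Print only the groups that hold more than one line.
    for (i = 1; i <= n; i++)
        if (count[order[i]] > 1)
            printf "%s", lines[order[i]]
}' dogs.txt)
printf '%s\n' "$grouped"
```

Unlike the two-file approach, this holds the whole input in memory until END, just as the Perl versions do.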
