LinuxQuestions.org - [SOLVED] Search File For Each Occurrence of Pattern Less Than X Lines Apart

Page 1 of 2



RandyTech

09-20-2016 02:25 AM

Search File For Each Occurrence of Pattern Less Than X Lines Apart

Find each place where the same pattern occurs again within x lines. For example, where does the word "dog" repeat within a four-line range in this source file? Lines 22 and 27 do not qualify because they are too far apart:

Code:

LINE#:        FILE:
-------------------
1        detailers
2        dog
3        grizzling
4        hitched
5        dog
6        hyphenations
7        indispensible
8        burgundy
9        clamour
10        cosmopolitan
11        dog
12        dowels
13        dog
14        elegance
15        exactness
16        filtration
17        lucent
18        misbehaving
19        morning
20        nicest
21        nonresident
22        dog
23        overplay
24        pelvic
25        proton
26        robbins
27        dog
28        slender
29        cyclic
30        knockers
31        liquidize
32        stockings
33        dog
34        dog
35        instinct
36        jackknife

The actual source file is just the list of words, with no line numbers. My search results should return:
2 5 11 13 33 34

Or simply:
2 11 33 would be fine.

Conversely:
5 13 34 would also be fine.

I was trying grep -B3 -n dog <file>,
but I don't know how to test each match to see whether it falls within the 4-line range, much less print only those results.

Thanks in advance for taking your time to read.
Big thanks if you know of a solution.
[EDIT] ...ideally a simple, non-loop solution. (8:57 PM 09-20-2016)

Turbocapitalist

09-20-2016 03:51 AM

This is a task for perl, in my opinion.

1. read file or input line by line
2. for each line push() or unshift() the word onto a list
3. if the list is longer than the limit, remove the oldest item
4. check if the latest word occurs more than once using grep()
4a. if yes, then clear the counter

Though you could probably do it with an array in awk or in python, too.
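The list-based steps above can be sketched in bash too. This is a minimal sketch, not Turbocapitalist's code: the function name `repeats_within` is made up, the window holds the previous MAX lines, and any repeated word is reported (not only "dog"), which is what step 4 describes. Step 4a (clearing) is left out, so every repeat inside the window is reported.

```bash
#!/usr/bin/env bash
# Sliding-window sketch: keep the previous MAX words in an array and check
# whether the current word already appears among them.
# Hypothetical usage: repeats_within MAX < file
# Prints "LINE WORD" whenever WORD also occurred at most MAX lines earlier.
repeats_within() {
  local max=$1 line word n=0
  local -a window=()              # previous words, oldest first
  while IFS= read -r line; do
    n=$((n + 1))
    for word in "${window[@]}"; do
      if [[ $word == "$line" ]]; then
        printf '%d %s\n' "$n" "$line"
        break
      fi
    done
    window+=("$line")             # push the newest word
    if (( ${#window[@]} > max )); then
      window=("${window[@]:1}")   # drop the oldest word
    fi
  done
}
```

For the sample list, `repeats_within 4 < file` should report lines 5, 13 and 34, which is the "5 13 34" form from the question.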

syg00

09-20-2016 04:28 AM

Run (normal) grep - assign output to a (bash) array. Compare relevant index entries.
KISS.
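A minimal sketch of that grep-into-an-array approach, assuming bash 4 or later (for `mapfile`); `close_matches` is a hypothetical name. It keeps every match that lies within MAX lines of a neighbouring match and prints them on one line, like the first output format asked for in the question.

```bash
#!/usr/bin/env bash
# Hypothetical usage: close_matches FILE PATTERN MAX
close_matches() {
  local file=$1 pattern=$2 max=$3 i
  local -a nums=() keep=()
  # grep -n prints "LINE:TEXT"; keep only the LINE part, one array entry each.
  mapfile -t nums < <(grep -n -- "$pattern" "$file" | cut -d: -f1)
  # Compare each match with the one before it.
  for (( i = 1; i < ${#nums[@]}; i++ )); do
    if (( nums[i] - nums[i-1] <= max )); then
      keep[${nums[i-1]}]=1        # sparse array indexed by line number
      keep[${nums[i]}]=1
    fi
  done
  # Indices of an indexed bash array expand in ascending order.
  echo "${!keep[@]}"
}
```

Against the sample list, `close_matches file dog 4` should print `2 5 11 13 33 34`.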

Homework ?.

HMW

09-20-2016 05:10 AM

Actually, I prefer to loop. Consider this infile:

Code:

cat -n dogs.txt
    1        dog
    2        foo
    3        foo
    4        dog
    5        foo
    6        foo
    7        dog
    8        foo
    9        foo
    10        foo
    11        foo
    12        foo
    13        dog

I get this result using a loop with a bunch of variables and some conditionals.

Code:

./findDogs.sh
1 4
4 7

So, if I understood the description correctly, this shows that you have 'dog' with fewer than 4 lines in between from line 1 to 4 and from line 4 to 7. It does NOT print the last dog, because there are 5 lines between line 7 and line 13, where the last dog is located. (Phew...)
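HMW's findDogs.sh itself was not posted. The following is only a guess at what such a loop might look like (the name `find_pairs` and the argument order are invented): it remembers the line number of the previous match and prints a pair whenever two consecutive matches are at most MAX lines apart. With MAX set to 4 it should reproduce the `1 4` / `4 7` output for dogs.txt.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction, not HMW's actual script.
# Usage: find_pairs PATTERN MAX < file
find_pairs() {
  local pattern=$1 max=$2 line n=0 prev=0
  while IFS= read -r line; do
    n=$((n + 1))
    # Literal substring match (similar to grep -F); skip non-matching lines.
    [[ $line == *"$pattern"* ]] || continue
    # Print the pair if there was an earlier match close enough.
    if (( prev > 0 && n - prev <= max )); then
      echo "$prev $n"
    fi
    prev=$n                       # this match becomes the "previous" one
  done
}
```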

Axel van Moorsel

09-20-2016 07:05 AM

Hello RandyTech,
Maybe this does the trick:

Code:

#!/bin/bash
# usage: script FILE NEEDLE MAXLINES

if [ -z "$1" ]; then echo "No file to search!" >&2; exit 1; fi
if [ ! -f "$1" ]; then echo "$1 is no file!" >&2; exit 1; fi
if [ -z "$2" ]; then echo "No word to search for!" >&2; exit 1; fi
if [ -z "$3" ]; then echo "Give max number of lines between matches!" >&2; exit 1; fi

iCount=0      # current line number
iPrev=0       # line number of the previous match (0 = none yet)
sResult1=""   # first line of each close pair
sResult2=""   # each close pair as "first-second;"

while IFS= read -r line
do
  iCount=$((iCount+1))
  if grep -q -- "$2" <<<"$line"; then
      if [ "$iPrev" -eq 0 ]; then
        iPrev=$iCount
      else
        if [ $(( iCount - iPrev )) -le "$3" ]; then
            sResult1=$sResult1"$iPrev "
            sResult2=$sResult2"$iPrev-$iCount;"
        fi
        iPrev=$iCount
      fi
  fi
done <"$1"

echo $sResult1
echo $sResult2

exit

Save the code as a script and run it as:

Quote:

(/path/to/)script (/path/to/)file needle lines

So in your example, the output of

Quote:

./script ./file dog 4

would be

Quote:

4 13 35
4-7;13-15;35-36;

If you replaced "indispensible" with "dog" on line 9, the output would become:

Quote:

4 7 9 13 35
4-7;7-9;9-13;13-15;35-36;

Hope this will help you.
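Axel's line numbers are all two higher than those in the question's listing (4-7 for 2-5, 13-15 for 11-13, 35-36 for 33-34). That is what you would get if the test file still contained the two header lines ("LINE#: FILE:" and the dashes); this is an inference, not something the post states. Assuming exactly two header lines, a quick way to compare both numberings (the helper name is hypothetical):

```bash
#!/usr/bin/env bash
# Assumption: the file starts with exactly two header lines (the "LINE#: FILE:"
# row and the dashed rule). tail -n +3 prints from line 3 onward, so grep -n
# then numbers the word list itself. Matching line numbers go on one line.
number_matches_without_header() {
  tail -n +3 "$1" | grep -n -- "$2" | cut -d: -f1 | paste -sd' ' -
}
```

Running plain `grep -n dog` on the same file (header included) gives 4 7 13 15 24 29 35 36, which lines up with the numbers in Axel's output.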

RandyTech

09-20-2016 09:33 AM

@syg00

Quote:

Homework ?.

Are you asking whether I tried to search, research, study and find my own solution? Or whether I am trying to cheat my way around some kind of class assignment so I don't have to *earn* my grade? Yes, I tried to find an elegant solution, and no, I am not enrolled as a student, so this question is not inspired by any class homework assignment. I'm never sure what people mean or intend when they speak in incomplete sentences, but thanks for the leading comments.

@ALL
I failed to mention that I have enough skill to research commands and assemble a bash script with lots of variables, loops and pretty colors, but I was looking for something more elegant. The 'grep' command is soooo powerful and soooo efficient at spotting patterns and reporting line numbers for every match. I hoped it could somehow be tweaked to extend that native behavior so loops and variables would not be necessary. Colors are optional and can be added when I have nothing better to do. ;)

Axel van Moorsel

09-20-2016 09:49 AM

Sorry, I thought you just needed a solution that works.

I too prefer simple, elegant solutions, but sometimes that is just not possible (or it is, but I just don't know how).

I know there are "nerds" (no offense) who are wizards at piping commands together into a single line, but alas, I'm not one of them. I do a lot from the command prompt and try to write scripts that are clear and understandable for the average user (and for me).

Hope you'll find what you are looking for.

RandyTech

09-20-2016 10:05 AM

Thanks Axel. Sounds like you and I are in much the same boat. I come up with these odd challenges once in a while and want to post them, but with so many forums to choose from, I never know where to post to be seen by the right bash "guru" for that elegant solution.

grail

09-20-2016 10:35 AM

Code:

awk '/dog/{if(l && c < 4)print l,NR;c=0;l=NR;next}l{c++}' file
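The one-liner hard-codes both the pattern and the limit. A possible wrapper (the name `near_pairs` and the argument order are made up) passes them in with `awk -v`. The pattern is then used as a dynamic regular expression, and awk processes backslash escapes in `-v` values. GAP keeps grail's meaning: c counts the lines between two matches and a pair is printed when c < GAP, so GAP=4 means line numbers at most 4 apart.

```bash
#!/usr/bin/env bash
# Usage: near_pairs PATTERN GAP FILE   (hypothetical wrapper around grail's awk)
near_pairs() {
  awk -v pat="$1" -v gap="$2" '
    $0 ~ pat { if (l && c < gap) print l, NR; c = 0; l = NR; next }
    l        { c++ }
  ' "$3"
}
```

`near_pairs dog 4 file` should behave like the original one-liner on the sample list (pairs 2 5, 11 13, 33 34).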

Sefyir

09-20-2016 10:35 AM

Code:

margincheck() { local margin="${2:-1}" value; set -- $(grep -on dog "$1" | cut -d\: -f1 | tr '\n' ' '); while [[ $# -gt 1 ]]; do value=$(($2 - $1)); [[ $value -le "$margin" ]] && echo -e "$1\n$2"; shift; done; }

Code:

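margincheck() {
  local margin="${2:-1}" value
  # Line numbers of every match become the positional parameters $1, $2, ...
  # (set -- keeps an empty result from dumping all shell variables.)
  set -- $(grep -on dog "$1" | cut -d\: -f1 | tr '\n' ' ')
  # Compare each match with the next; stop when fewer than two remain.
  while [[ $# -gt 1 ]]
    do
      value=$(($2 - $1))
      [[ $value -le "$margin" ]] && echo -e "$1\n$2"
      shift
  done
}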

Your sample output

Code:

margincheck file 4
2
5
11
13
33
34
34

HMW

09-20-2016 10:48 AM

Quote:

Originally Posted by RandyTech (Post 5607751)

I was looking for something more elegant

"Elegant" is highly subjective. Personally, I prefer code that is: well written (properly indented), well commented, with proper use of exit codes and thus is easy to maintain and modify.

But... to each his/her own.

chrism01

09-22-2016 05:51 AM

Funnily enough the very first piece of Perl I wrote was after looking for a way to speed up a bash script to do almost exactly this sort of problem.
(real work requirement)
Some years ago though.. ;)

It wasn't that elegant, but it was straightforward. Basically, step through the file line by line, remember the nth previous line and compare the current line with it.
That's when I realised how fast Perl was, which was handy because the real prod files were very long.

RandyTech

10-02-2016 11:37 PM

Sincere thanks to all who contributed their ideas and/or code samples. We have one winner that came in from a local "LQ Guru". (No surprise there :)) Most elegant, concise and efficient sample yet. I hope it can serve many others as it will serve me. Also, I hope one day I can come to understand how it works exactly. I'm very weak on the 'awk' command.

RandyTech

10-02-2016 11:41 PM

Best Solution ! ! !

Quote:

Originally Posted by grail (Post 5607787)

Code:

awk '/dog/{if(l && c < 4)print l,NR;c=0;l=NR;next}l{c++}' file

Indeed elegant beyond (my) comprehension and I have to say you are top 10 in my list of Linux Gurus. Thank you very much Grail!!

Randy

Turbocapitalist

10-03-2016 12:58 AM

Quote:

Originally Posted by RandyTech (Post 5612986)

Also, I hope one day I can come to understand how it works exactly. I'm very weak on the 'awk' command.

It's easier to see if you indent it a little:

Code:

/dog/ {
        if(l && c < 4)
                print l,NR;
        c=0;
        l=NR;
        next
      }
l{
        c++
}

There are two implied if statements there: if( $0 ~ /dog/ ) for the first block and if( l ) for the second. Very fine.
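Writing those two implied tests out as explicit if statements gives an equivalent program, which may make the flow easier to follow: `l` holds the line number of the last "dog" and `c` counts the lines seen since then. This is a restatement for illustration (the wrapper name `explicit_pairs` is hypothetical), not code posted in the thread.

```bash
#!/usr/bin/env bash
# grail's logic with both implied conditions spelled out. Usage: explicit_pairs FILE
explicit_pairs() {
  awk '
  {
    if ($0 ~ /dog/) {       # what the bare /dog/ pattern tests
      if (l && c < 4)       # a dog was seen before, with fewer than 4 lines in between
        print l, NR
      c = 0                 # reset the count of lines since the last dog
      l = NR                # remember the line number of this dog
      next                  # do not run the counting test for this line
    }
    if (l)                  # what the bare l pattern tests
      c++                   # count lines only after the first dog
  }' "$1"
}
```

On the sample list it should print the same pairs as the one-liner: 2 5, 11 13 and 33 34.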

All times are GMT -5.
