LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Old 09-20-2016, 02:25 AM   #1
RandyTech
Member
 
Registered: Oct 2010
Posts: 62

Rep: Reputation: 3
Search File For Each Occurrence of Pattern Less Than X Lines Apart


Find each instance of the same pattern occurring within x lines of another instance. For example, where does the word "dog" repeat in this source file within a four-line range? Lines 22 and 27 do not qualify because they are too far apart:
Code:
LINE#:	FILE:
-------------------
1	detailers
2	dog
3	grizzling
4	hitched
5	dog
6	hyphenations
7	indispensible
8	burgundy
9	clamour
10	cosmopolitan
11	dog
12	dowels
13	dog
14	elegance
15	exactness
16	filtration
17	lucent
18	misbehaving
19	morning
20	nicest
21	nonresident
22	dog
23	overplay
24	pelvic
25	proton
26	robbins
27	dog
28	slender
29	cyclic
30	knockers
31	liquidize
32	stockings
33	dog
34	dog
35	instinct
36	jackknife
Actual source file is just the list of words -- no line numbers present. My search results should return:
2 5 11 13 33 34

Or simply:
2 11 33 would be fine.

Conversely:
5 13 34 would also be fine.

I was trying grep -B3 -n dog <file>
But I don't know how to test whether each match falls within the 4-line range limit, much less print only those results.

Thanks in advance for taking your time to read.
Big thanks if you know of a solution.
[EDIT]...simple non-loop...^ <<<(8:57PM 09-20-2016)

Last edited by RandyTech; 09-20-2016 at 08:59 AM.
 
Old 09-20-2016, 03:51 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,308
Blog Entries: 3

Rep: Reputation: 3721
This is a task for perl, in my opinion.

1. read file or input line by line
2. for each line push() or unshift() the word onto a list
3. if the list is longer than the limit, remove the oldest item
4. check if the latest word occurs more than once using grep()
4a. if yes, then clear the counter

Though you could probably do it with an array in awk or in python, too.
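For anyone who wants to try the idea without perl, here is a minimal sketch of the same sliding window in awk, wrapped in a bash function. The name repeats_in_window and its two arguments (file, window size) are made up for illustration. Instead of clearing anything on a hit, it simply prints the earlier and the current line number, and it reports any repeated line, not only "dog".

```shell
# Hypothetical helper: $1 = file, $2 = window size in lines.
# Prints "earlier current" for every line whose text already appeared
# within the previous $2 lines.
repeats_in_window() {
    awk -v limit="$2" '
    {
        # Step 4: look back through the window for the same text
        for (i = NR - 1; i >= NR - limit && i >= 1; i--) {
            if (window[i] == $0) {
                print i, NR
                break
            }
        }
        # Step 2: remember the current line
        window[NR] = $0
        # Step 3: drop the line that has just fallen out of the window
        delete window[NR - limit]
    }' "$1"
}
```

Called as repeats_in_window file 4, it only ever keeps the last four lines in memory, so it should cope with long files.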
 
2 members found this post helpful.
Old 09-20-2016, 04:28 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4120
Run (normal) grep - assign output to a (bash) array. Compare relevant index entries.
KISS.

Homework ?.
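A rough sketch of that approach, assuming bash 4 or later for mapfile. The function name close_matches and its three arguments (file, pattern, maximum distance) are invented here for illustration:

```shell
# Hypothetical helper: $1 = file, $2 = pattern, $3 = maximum distance in lines
close_matches() {
    local -a hits
    local i
    # grep -n prefixes each matching line with "N:"; keep only the number
    mapfile -t hits < <(grep -n -- "$2" "$1" | cut -d: -f1)
    # Compare each match with the one that follows it
    for (( i = 0; i + 1 < ${#hits[@]}; i++ )); do
        if (( hits[i+1] - hits[i] <= $3 )); then
            echo "${hits[i]} ${hits[i+1]}"
        fi
    done
}
```

Something like close_matches file dog 4 would then print one pair of line numbers per qualifying repeat. The loop only walks the array of matches, not the file itself.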
 
2 members found this post helpful.
Old 09-20-2016, 05:10 AM   #4
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Actually, I prefer to loop. Consider this infile:
Code:
cat -n dogs.txt 
     1	dog
     2	foo
     3	foo
     4	dog
     5	foo
     6	foo
     7	dog
     8	foo
     9	foo
    10	foo
    11	foo
    12	foo
    13	dog
I get this result using a loop with a bunch of variables and some conditionals.
Code:
./findDogs.sh 
1 4
4 7
So, if I understood the description correctly, this shows that you have 'dog' with < 4 lines between at lines 1 and 4, and again at lines 4 and 7. It does NOT print the last dog, because there are 5 lines between line 7 and line 13, where the last dog is located. (Phew...)
 
Old 09-20-2016, 07:05 AM   #5
Axel van Moorsel
Member
 
Registered: Jan 2011
Location: Netherlands (Zuid Holland)
Distribution: Debian 8
Posts: 31

Rep: Reputation: 4
Hello Randy Tech
Maybe this does the trick

Code:
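#!/bin/bash
# Usage: script FILE WORD MAXDIST
# Reports matches of WORD that are followed by another match
# no more than MAXDIST lines later.

if [ -z "$1" ]; then echo "No file to search!"; exit 1; fi
if [ ! -f "$1" ]; then echo "$1 is no file!"; exit 1; fi
if [ -z "$2" ]; then echo "No word to search for!"; exit 1; fi
if [ -z "$3" ]; then echo "Give max number of lines between matches!"; exit 1; fi
iCount=0      # current line number
iPrev=0       # line number of the previous match (0 = none yet)
sResult1=""   # first line number of each qualifying pair
sResult2=""   # qualifying pairs as "first-second;"
while IFS= read -r line
do
   iCount=$((iCount+1))
   if grep -q -- "$2" <<<"$line"; then
      if [ "$iPrev" -eq 0 ]; then
         iPrev=$iCount
      else
         if [ $(( iCount - iPrev )) -le "$3" ]; then
            sResult1=$sResult1"$iPrev "
            sResult2=$sResult2"$iPrev-$iCount;"
         fi
         iPrev=$iCount
      fi
   fi
done <"$1"
echo "$sResult1"
echo "$sResult2"
exit 0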
Save file as script, and run it as
Quote:
(/path/to/)script (/path/to/)file needle lines
So in your example (apparently saved with its two header lines, which shifts every line number by 2), the output of
Quote:
./script ./file dog 4
would be
Quote:
4 13 35
4-7;13-15;35-36;
If you replaced "indispensible" with "dog" on line 9, the output would become:
Quote:
4 7 9 13 35
4-7;7-9;9-13;13-15;35-36;
Hope this will help you.
 
1 members found this post helpful.
Old 09-20-2016, 09:33 AM   #6
RandyTech
Member
 
Registered: Oct 2010
Posts: 62

Original Poster
Rep: Reputation: 3
@syg00
Quote:
Homework ?.
Asking if I tried to search, research, study and find my own solution? Or asking if I am trying to cheat my way around some kind of class assignment so I don't have to *earn* my grade? Yes I tried to find an elegant solution and no I am not enrolled as a student so this inquiry is not inspired by any class homework assignment. Never sure what people mean or intend to learn speaking in incomplete sentences, but thanks for leading comments.

@ALL
I failed to mention, I have skills enough to research commands and assemble a bash script with lots of variables and loops and pretty colors, but I was looking for something more elegant. The 'grep' command is soooo powerful and soooo efficient at spotting patterns and providing line numbers on every match. I hoped it could somehow be tweaked to expand on that adept native behavior so loops and variables would not be necessary. Colors are optional and can be added when I have nothing better to do.
 
Old 09-20-2016, 09:49 AM   #7
Axel van Moorsel
Member
 
Registered: Jan 2011
Location: Netherlands (Zuid Holland)
Distribution: Debian 8
Posts: 31

Rep: Reputation: 4
Sorry, I thought you just needed a solution that works.

I too prefer simple, elegant solutions, but sometimes that is just not possible (or it is, but I just don't know how).

I know there are "nerds" (no offense) who are wizards at piping commands together into one single line, but alas, I'm not one of them. I do a lot from the command prompt and try to create scripts that are clear and understandable for the average user (and for me).

Hope you'll find what you are looking for.
 
Old 09-20-2016, 10:05 AM   #8
RandyTech
Member
 
Registered: Oct 2010
Posts: 62

Original Poster
Rep: Reputation: 3
Thanks Axel. Sounds like you and I are in much the same boat. I come up with these odd challenges once in a while and want to post, but with so many forum choices I never know where to post to be seen by the right bash "guru" for that elegant solution.
 
Old 09-20-2016, 10:35 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Code:
awk '/dog/{if(l && c < 4)print l,NR;c=0;l=NR;next}l{c++}' file
 
2 members found this post helpful.
Old 09-20-2016, 10:35 AM   #10
Sefyir
Member
 
Registered: Mar 2015
Distribution: Linux Mint
Posts: 634

Rep: Reputation: 316
Code:
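margincheck() { local margin="${2:-1}" value; set -- $(grep -n dog "$1" | cut -d: -f1); while [[ $# -gt 1 ]]; do value=$(($2 - $1)); [[ $value -le $margin ]] && printf '%s\n' "$1" "$2"; shift; done; }
Code:
margincheck() {
  # $1 = file, $2 = maximum distance between matching lines (default 1)
  local margin="${2:-1}" value
  # Replace the positional parameters with the line numbers of the matches
  set -- $(grep -n dog "$1" | cut -d: -f1)
  # Compare each match with the next one until only one is left
  while [[ $# -gt 1 ]]
    do
      value=$(($2 - $1))
      [[ $value -le $margin ]] && printf '%s\n' "$1" "$2"
      shift
  done
}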


Your sample output

Code:
margincheck file 4
2
5
11
13
33
34
34
 
1 members found this post helpful.
Old 09-20-2016, 10:48 AM   #11
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by RandyTech
I was looking for something more elegant
"Elegant" is highly subjective. Personally, I prefer code that is: well written (properly indented), well commented, with proper use of exit codes and thus is easy to maintain and modify.

But... to each his/her own.
 
Old 09-22-2016, 05:51 AM   #12
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
Funnily enough the very first piece of Perl I wrote was after looking for a way to speed up a bash script to do almost exactly this sort of problem.
(real work requirement)
Some years ago though..

It wasn't that elegant but it was straightforward. Basically step through the file line by line and remember the nth prev line and compare the curr line with that.
That's when I realised how fast Perl was, which was handy because the real prod files were very long.
 
Old 10-02-2016, 11:37 PM   #13
RandyTech
Member
 
Registered: Oct 2010
Posts: 62

Original Poster
Rep: Reputation: 3
Sincere thanks to all who contributed their ideas and/or code samples. We have one winner that came in from a local "LQ Guru". (No surprise there.) Most elegant, concise and efficient sample yet. I hope it can serve many others as it will serve me. Also, I hope one day I can come to understand how it works exactly. I'm very weak on the 'awk' command.
 
Old 10-02-2016, 11:41 PM   #14
RandyTech
Member
 
Registered: Oct 2010
Posts: 62

Original Poster
Rep: Reputation: 3
Best Solution ! ! !

Quote:
Originally Posted by grail
Code:
awk '/dog/{if(l && c < 4)print l,NR;c=0;l=NR;next}l{c++}' file
Indeed elegant beyond (my) comprehension and I have to say you are top 10 in my list of Linux Gurus. Thank you very much Grail!!

Randy
 
Old 10-03-2016, 12:58 AM   #15
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,308
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by RandyTech
Also, I hope one day I can come to understand how it works exactly. I'm very weak on the 'awk' command.
It's easier to see if you indent it a little:

Code:
/dog/ {
        if(l && c < 4)
                print l,NR;
        c=0;
        l=NR;
        next
       }
l{
        c++
}
There are two implied if statements there. if( $0 ~ /dog/ ) for the one, and if( l ) for the other. Very fine.
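To watch it work, one way is to rebuild the 36-line sample from post #1 and run the one-liner against it. The temporary file and the variable name sample are just scaffolding for this sketch. Here l holds the line number of the previous match and c counts the non-matching lines seen since then, so "c < 4" means the two matches are at most four lines apart.

```shell
# Rebuild the sample from post #1 (dogs on lines 2, 5, 11, 13, 22, 27, 33, 34)
sample=$(mktemp)
printf '%s\n' detailers dog grizzling hitched dog hyphenations indispensible \
    burgundy clamour cosmopolitan dog dowels dog elegance exactness filtration \
    lucent misbehaving morning nicest nonresident dog overplay pelvic proton \
    robbins dog slender cyclic knockers liquidize stockings dog dog instinct \
    jackknife > "$sample"

# l = line number of the previous match, c = non-matching lines since then
awk '/dog/{if(l && c < 4)print l,NR;c=0;l=NR;next}l{c++}' "$sample"
# prints:
# 2 5
# 11 13
# 33 34
```

Lines 22 and 27 are skipped because four non-matching lines sit between them, so c is 4 and the test "c < 4" fails. Note that /dog/ is a regular expression, so it would also match a line such as "dogma".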
 
  


All times are GMT -5. The time now is 03:46 AM.
