[SOLVED] Search File For Each Occurrence of Pattern Less Than X Lines Apart
Find each instance of the same pattern that occurs within x lines of another instance. For example, where does the word "dog" repeat in this source file within a four-line range? Lines 22 and 27 do not qualify because they are too far apart:
Code:
LINE#: FILE:
-------------------
1 detailers
2 dog
3 grizzling
4 hitched
5 dog
6 hyphenations
7 indispensible
8 burgundy
9 clamour
10 cosmopolitan
11 dog
12 dowels
13 dog
14 elegance
15 exactness
16 filtration
17 lucent
18 misbehaving
19 morning
20 nicest
21 nonresident
22 dog
23 overplay
24 pelvic
25 proton
26 robbins
27 dog
28 slender
29 cyclic
30 knockers
31 liquidize
32 stockings
33 dog
34 dog
35 instinct
36 jackknife
The actual source file is just the list of words -- no line numbers present. My search results should return:
2 5 11 13 33 34
Or simply:
2 11 33 would be fine.
Conversely:
5 13 34 would also be fine.
I was trying grep -B3 -n dog <file>, but I don't know how to test whether each instance falls within the 4-line range limit, much less print only those results.
Thanks in advance for taking your time to read.
Big thanks if you know of a solution.
[EDIT, 8:57 PM 09-20-2016] ...simple non-loop...
1. read file or input line by line
2. for each line push() or unshift() the word onto a list
3. if the list is longer than the limit, remove the oldest item
4. check if the latest word occurs more than once using grep()
4a. if yes, then clear the counter
Though you could probably do it with an array in awk or in python, too.
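The steps above can be sketched in bash with an array as the list. This is only an illustration under a few assumptions: the function name find_repeats and its LIMIT argument are made up, the list holds the last LIMIT lines including the current one, and since step 4a is not spelled out, the sketch simply reports the line and keeps the list as it is.

```bash
#!/bin/bash
# find_repeats LIMIT   (hypothetical helper, not from the thread)
# Reads one word per line on stdin and prints "LINE WORD" whenever the word
# on the current line also occurs among the LIMIT-1 lines before it.
find_repeats() {
    local limit=$1 lineno=0 word seen count
    local -a window=()
    while IFS= read -r word || [ -n "$word" ]; do
        lineno=$((lineno + 1))
        window+=("$word")                   # step 2: push the newest word
        if [ "${#window[@]}" -gt "$limit" ]; then
            window=("${window[@]:1}")       # step 3: drop the oldest word
        fi
        count=0                             # step 4: count the newest word in the list
        for seen in "${window[@]}"; do
            if [ "$seen" = "$word" ]; then
                count=$((count + 1))
            fi
        done
        if [ "$count" -gt 1 ]; then
            echo "$lineno $word"
        fi
    done
}

# Example: "dog" on lines 1, 4 and 9, checked with a four-line window
printf '%s\n' dog cat cow dog pig hen fox owl dog | find_repeats 4
# prints: 4 dog
```

Note that this flags any word that repeats inside the window, not just one chosen pattern, which is how I read the list above.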
cat -n dogs.txt
1 dog
2 foo
3 foo
4 dog
5 foo
6 foo
7 dog
8 foo
9 foo
10 foo
11 foo
12 foo
13 dog
I get this result using a loop with a bunch of variables and some conditionals.
Code:
./findDogs.sh
1 4
4 7
So, if I understood the description correctly, this shows that you have 'dog' with < 4 lines between lines 1 to 4 and 4 to 7. It does NOT print the last dog, because there are 5 lines between line 7 and 13, where the last dog is located. (Phew...)
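The findDogs.sh script itself is not shown, so the following is not that script. It is a minimal sketch of one way to get the same kind of pair output: let grep -n list the matching line numbers and compare each one with the previous one. The function name close_pairs and its arguments are invented for the example, and it counts the lines between two matches, as the description above does.

```bash
#!/bin/bash
# close_pairs PATTERN MAXGAP   (hypothetical helper, not findDogs.sh)
# Reads the file on stdin and prints "A B" for each pair of consecutive
# matches on lines A and B with fewer than MAXGAP lines between them.
close_pairs() {
    local pattern=$1 maxgap=$2 prev="" cur
    # grep -n prefixes every matching line with "NUMBER:"; cut keeps the number
    grep -n -- "$pattern" | cut -d: -f1 | while IFS= read -r cur; do
        if [ -n "$prev" ] && [ $((cur - prev - 1)) -lt "$maxgap" ]; then
            echo "$prev $cur"
        fi
        prev=$cur
    done
}

# Example: the 13-line test file from above, fed in on stdin
printf '%s\n' dog foo foo dog foo foo dog foo foo foo foo foo dog | close_pairs dog 4
# prints:
# 1 4
# 4 7
```

With a real file this would be run as: close_pairs dog 4 < dogs.txt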
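Code:
#!/bin/bash
# Usage: script FILE WORD MAXLINES
# Prints the line numbers of WORD matches that are followed by another match
# at most MAXLINES lines later, then the same matches as "first-second;" pairs.
if [ -z "$1" ]; then echo "No file to search!" >&2; exit 1; fi
if [ ! -f "$1" ]; then echo "$1 is no file!" >&2; exit 1; fi
if [ -z "$2" ]; then echo "No word to search for!" >&2; exit 1; fi
if [ -z "$3" ]; then echo "Give max number of lines between matches!" >&2; exit 1; fi

iCount=0      # current line number
iPrev=0       # line number of the previous match (0 = none yet)
sResult1=""   # first line of each qualifying pair
sResult2=""   # qualifying pairs as "first-second;"

while IFS= read -r line || [ -n "$line" ]
do
    iCount=$((iCount+1))
    if grep -q -- "$2" <<<"$line"; then
        # Compare with the previous match, if there was one
        if [ "$iPrev" -ne 0 ] && [ $(( iCount - iPrev )) -le "$3" ]; then
            sResult1=$sResult1"$iPrev "
            sResult2=$sResult2"$iPrev-$iCount;"
        fi
        iPrev=$iCount
    fi
done <"$1"

echo "${sResult1% }"   # strip the trailing space
echo "$sResult2"
exit 0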
Save file as script, and run it as
Quote:
(/path/to/)script (/path/to/)file needle lines
So in your example, the output of
Quote:
./script ./file dog 4
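would be (line numbers shifted by two relative to your listing, which is consistent with the two header lines still being in the test file)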
Quote:
4 13 35
4-7;13-15;35-36;
If you replaced "indispensible" with "dog" in line 9, the output would become:
Asking if I tried to search, research, study and find my own solution? Or asking if I am trying to cheat my way around some kind of class assignment so I don't have to *earn* my grade? Yes I tried to find an elegant solution and no I am not enrolled as a student so this inquiry is not inspired by any class homework assignment. Never sure what people mean or intend to learn speaking in incomplete sentences, but thanks for leading comments.
@ALL
I failed to mention, I have skills enough to research commands and assemble a bash script with lots of variables and loops and pretty colors, but I was looking for something more elegant. The 'grep' command is soooo powerful and soooo efficient at spotting patterns and providing line numbers on every match. I hoped it could somehow be tweaked to expand on that adept native behavior so loops and variables would not be necessary. Colors are optional and can be added when I have nothing better to do.
Sorry, I thought you just needed a solution that works.
I too prefer simple, elegant solutions, but sometimes that is just not possible (or it is, but I just don't know how).
I know there are "nerds" (no offense) who are wizards at piping commands together into one single line, but alas, I'm not one of them. I do a lot from the command prompt and try to create scripts that are clear and understandable for the average user (and for me).
Thanks Axel. Sounds like you and I are in much the same boat. I come up with these odd challenges once in a while and want to post, but with so many forum choices I never know where to post to be seen by the right bash "guru" for that elegant solution.
"Elegant" is highly subjective. Personally, I prefer code that is: well written (properly indented), well commented, with proper use of exit codes and thus is easy to maintain and modify.
Funnily enough the very first piece of Perl I wrote was after looking for a way to speed up a bash script to do almost exactly this sort of problem.
(real work requirement)
Some years ago though..
It wasn't that elegant but it was straightforward. Basically step through the file line by line, remember the nth previous line, and compare the current line with that.
That's when I realised how fast Perl was, which was handy because the real prod files were very long.
Sincere thanks to all who contributed their ideas and/or code samples. We have one winner that came in from a local "LQ Guru". (No surprise there.) Most elegant, concise and efficient sample yet. I hope it can serve many others as it will serve me. Also, I hope one day I can come to understand how it works exactly. I'm very weak on the 'awk' command.
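The winning awk sample is not included above. As a rough idea of what an awk approach can look like (a sketch only, and not necessarily the code that was posted), the function below remembers the line number of the previous match and prints both line numbers when the next match is fewer than RANGE lines later. The name near_matches and the awk variable names are invented for the example.

```bash
#!/bin/bash
# near_matches PATTERN RANGE   (hypothetical helper, not the posted solution)
# Reads the file on stdin and prints, on one line, the line numbers of all
# matches that have a neighbouring match fewer than RANGE lines away.
near_matches() {
    awk -v pat="$1" -v range="$2" '
        $0 ~ pat {                          # pat is a regex tested against the whole line
            # NR is the current line number, prev the previous matching line
            if (prev && NR - prev < range) {
                if (prev != last) out = out " " prev   # do not list a line twice
                out = out " " NR
                last = NR
            }
            prev = NR
        }
        END { if (out != "") print substr(out, 2) }    # drop the leading space
    '
}

# Example: the 36-word list from the first post (without line numbers)
printf '%s\n' detailers dog grizzling hitched dog hyphenations indispensible \
    burgundy clamour cosmopolitan dog dowels dog elegance exactness filtration \
    lucent misbehaving morning nicest nonresident dog overplay pelvic proton \
    robbins dog slender cyclic knockers liquidize stockings dog dog instinct \
    jackknife | near_matches dog 4
# prints: 2 5 11 13 33 34
```

With a real file this would be run as: near_matches dog 4 < file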