Search File For Each Occurrence of Pattern Less Than X Lines Apart
Find each instance of the same pattern occurring within X lines of one another. For example, where does the word "dog" repeat in this source file within a four-line range? Lines 22 and 27 do not qualify because those lines are too far apart:
Code:
LINE#: FILE:
2
5
11
13
33
34
Or simply: 2 11 33 would be fine. Conversely: 5 13 34 would also be fine. I was trying
Code:
grep -B3 -n dog <file>
but I don't know how to test each instance to see if it qualifies for the 4-line range limit, much less print only those results. Thanks in advance for taking your time to read. Big thanks if you know of a solution.
[EDIT] ...simple non-loop...^ (8:57PM 09-20-2016) |
This is a task for perl, in my opinion.
1. read the file or input line by line
2. for each line, push() or unshift() the word onto a list
3. if the list is longer than the limit, remove the oldest item
4. check if the latest word occurs more than once using grep()
4a. if yes, then clear the counter
Though you could probably do it with an array in awk or in python, too. |
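Not the poster's perl, but a minimal bash sketch of that same sliding-window idea, simplified to remember only the previous match (any pair within the limit implies two consecutive matches within the limit). The function name `near_matches` and the sample usage are mine, not from the thread:

```shell
#!/usr/bin/env bash
# near_matches PATTERN LIMIT FILE
# Print pairs of line numbers where PATTERN repeats within LIMIT lines.
# Walks the file line by line; each new match replaces the previous one,
# so the "oldest item falls off the list" step is implicit.
near_matches() {
    local pattern=$1 limit=$2 file=$3
    local n=0 prev=0 line
    while IFS= read -r line; do
        n=$((n + 1))
        if [[ $line == *"$pattern"* ]]; then
            # A previous match exists and is close enough: report the pair.
            if (( prev > 0 && n - prev <= limit )); then
                printf '%s %s\n' "$prev" "$n"
            fi
            prev=$n
        fi
    done < "$file"
}
```

Called as e.g. `near_matches dog 4 myfile`, it prints each qualifying pair on its own line.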
Run (normal) grep - assign output to a (bash) array. Compare relevant index entries.
KISS. Homework? |
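A sketch of that grep-into-a-bash-array route (the function name is mine, not syg00's):

```shell
#!/usr/bin/env bash
# pairs_via_grep PATTERN LIMIT FILE
# Let grep -n do the matching, keep just the line numbers in an array,
# then compare adjacent index entries against the allowed gap.
pairs_via_grep() {
    local pattern=$1 limit=$2 file=$3
    local -a hits
    mapfile -t hits < <(grep -n -- "$pattern" "$file" | cut -d: -f1)
    local i
    for ((i = 1; i < ${#hits[@]}; i++)); do
        if (( hits[i] - hits[i-1] <= limit )); then
            echo "${hits[i-1]} ${hits[i]}"
        fi
    done
}
```

`pairs_via_grep dog 4 myfile` prints each qualifying pair on its own line; `mapfile` needs bash 4 or later.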
Actually, I prefer to loop. Consider this infile:
Code:
cat -n dogs.txt
Code:
./findDogs.sh |
Hello Randy Tech
Maybe this does the trick
Code:
#!/bin/bash
|
@syg00
@ALL I failed to mention: I have skills enough to research commands and assemble a bash script with lots of variables and loops and pretty colors, but I was looking for something more elegant. The 'grep' command is soooo powerful and soooo efficient at spotting patterns and providing line numbers on every match. I hoped it could somehow be tweaked to expand on that adept native behavior so loops and variables would not be necessary. Colors are optional and can be added when I have nothing better to do. ;) |
Sorry, I thought you just needed a solution that works.
I too prefer simple, elegant solutions, but sometimes that is just not possible (or it is, but I just don't know how). I know there are "nerds" (no offense) who are wizards at piping commands together into one single line, but alas, I'm not one of them. I do a lot from the command prompt and try to create scripts that are clear and understandable for the average user (and for me). Hope you'll find what you are looking for. |
Thanks Axel. Sounds like you and I are in much the same boat. I come up with these odd challenges once in a while and want to post, but with so many forum choices I never know where to post to be seen by the right bash "guru" for that elegant solution.
|
Code:
awk '/dog/{if(l && c < 4)print l,NR;c=0;l=NR;next}l{c++}' file |
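For anyone else puzzling over how the one-liner works: `l` holds the line number of the previous match and `c` counts the non-matching lines seen since it. Below is the same program spread out with comments, run against a scratch file with "dog" on lines 2, 5, 11, 13, 33 and 34 (the original poster's numbers; the file-building loop is mine):

```shell
#!/usr/bin/env bash
# Build a 35-line scratch file with "dog" on lines 2 5 11 13 33 34.
tmp=$(mktemp)
for i in $(seq 1 35); do
    case $i in
        2|5|11|13|33|34) echo "dog" ;;
        *)               echo "filler $i" ;;
    esac
done > "$tmp"

pairs=$(awk '
/dog/ {
    if (l && c < 4) print l, NR  # previous match close enough: print the pair
    c = 0                        # restart the gap counter
    l = NR                       # remember this match
    next                         # skip the counting rule for this line
}
l { c++ }                        # after the first match, count intervening lines
' "$tmp")

echo "$pairs"    # 2 5 / 11 13 / 33 34, one pair per line
rm -f "$tmp"
```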
Code:
margincheck() { margin="${2:-1}"; set $(grep -on dog "$1" | cut -d\: -f1 | tr '\n' ' '); while [[ $# -gt 1 ]]; do value=$(($2 - $1)); [[ $value -le "$margin" ]] && echo -e "$1\n$2"; shift; done; }
Or, spread out:
Code:
margincheck() {
    margin="${2:-1}"
    set $(grep -on dog "$1" | cut -d\: -f1 | tr '\n' ' ')
    while [[ $# -gt 1 ]]; do
        value=$(($2 - $1))
        [[ $value -le "$margin" ]] && echo -e "$1\n$2"
        shift
    done
}
Your sample output:
Code:
margincheck file 4 |
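To try it standalone, here is the same function with a scratch file (the sample data is mine; note the loop test is `-gt 1` rather than `-ne 1`, so an input with fewer than two matches exits instead of spinning forever):

```shell
#!/usr/bin/env bash
# margincheck FILE [MARGIN]: print line numbers of "dog" matches that
# fall no more than MARGIN lines after the previous match.
margincheck() {
    margin="${2:-1}"
    set -- $(grep -on dog "$1" | cut -d: -f1 | tr '\n' ' ')
    while [[ $# -gt 1 ]]; do
        value=$(($2 - $1))
        [[ $value -le "$margin" ]] && echo -e "$1\n$2"
        shift
    done
    return 0
}

# "dog" on lines 1, 3 and 8; only 1 and 3 are within 4 lines.
tmp=$(mktemp)
printf 'dog\nx\ndog\nx\nx\nx\nx\ndog\n' > "$tmp"
out=$(margincheck "$tmp" 4)
echo "$out"
rm -f "$tmp"
```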
But... to each his/her own. |
Funnily enough the very first piece of Perl I wrote was after looking for a way to speed up a bash script to do almost exactly this sort of problem.
(Real work requirement, some years ago though.. ;)) It wasn't that elegant but it was straightforward: basically step through the file line by line, remember the nth previous line, and compare the current line with that. That's when I realised how fast Perl was, which was handy because the real prod files were very long. |
Sincere thanks to all who contributed their ideas and/or code samples. We have one winner that came in from a local "LQ Guru". (No surprise there :)) Most elegant, concise and efficient sample yet. I hope it can serve many others as it will serve me. Also, I hope one day I can come to understand how it works exactly. I'm very weak on the 'awk' command.
|
Best Solution ! ! !
Randy |
Quote:
Code:
/dog/ {
    if (l && c < 4) print l, NR
    c = 0
    l = NR
    next
}
l { c++ } |