LinuxQuestions.org

Display lines before and after string until blank line (https://www.linuxquestions.org/questions/linux-newbie-8/display-lines-before-and-after-string-until-blank-line-4175487993/)

druuna 12-17-2013 07:58 AM

Quote:

Originally Posted by cosminel (Post 5082307)
Uhm, sorry druuna but I already explained in my previous post. I will retry:
- fgrep defines a fixed data window containing the STRING but it also includes additional lines because...
- the data packets containing the STRING have a variable number of lines, some packets with STRING are shorter (let's say 10 lines) while other packets with STRING are longer (let's say 26)

Stop right there.

When you use fgrep -B 6 -A 20 as shown in your previous post, there cannot be 26 lines after the STRING!! The window is 6 lines before, the STRING line itself, and 20 lines after.

All the info wanted is inside that window, just ignore or remove the empty/unwanted lines. Problem solved.
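To make the fixed window concrete, here is a minimal sketch on made-up data. The `ts N` lines and single-letter payload lines are placeholders, not real tcpdump output, and the window is shrunk to -B 1 -A 2 so the effect is easy to see. grep -F is the equivalent of fgrep.

```shell
# Made-up sample: a 3-line packet, a blank line, then a 6-line packet.
input=$(printf '%s\n' 'ts 1' 'STRING' 'a' '' 'ts 2' 'b' 'STRING' 'c' 'd' 'e')

# 1 line before and 2 lines after each match; nothing outside that window.
window=$(printf '%s\n' "$input" | grep -F -B 1 -A 2 'STRING')
printf '%s\n' "$window"
```

For the second packet, `ts 2` and `e` fall outside the window and are cut, while the blank line after the first packet is pulled in as trailing context.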

I'm going to ask one more time (!!): Post valid data input examples and wanted output examples.

cosminel 12-17-2013 08:08 AM

druuna, it was just an example. Please try to read my previous post (#30), ignoring all the details I gave up to that point. This way you will understand why only using fgrep will never filter out the data as desired.

And as I said in post #21, I am using fgrep as an intermediary filtering tool because it is much faster especially when working with live packets. This way I offload a lot of processing from awk by running fgrep first and thinning out the packet information.

I already obtained what I wanted by using timestamps as RS. I should have done this from the beginning; it was my mistake that I did not think of it.

The example I provided in my previous post (#30) reflects precisely the way my data is formatted before applying the filtering. The real data is basically the same, the only difference is that the lines look a bit uglier.

grail 12-17-2013 08:20 AM

Ok ... so here is a little curve ball based on the most recent information.

If I understand correctly, once fgrep has been run we end up with data which is separated by both blank lines and dashes (--).
Assuming I am on the right track, we could then set an RS that matches both and simply search for records containing STRING.

Using the example in post #30:
Code:

awk '/STRING/' RS="(--|\n)\n"
Also, if druuna is correct that the data is actually terminated with Windows line endings, the RS can be changed to use '\r\n' instead of '\n' if needed.
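A minimal sketch of that record splitting on a made-up stream (the `ts N` and single-letter lines are placeholders). Treating a multi-character RS as a regular expression is gawk/mawk behaviour rather than POSIX, so this assumes one of those awks.

```shell
# Made-up fgrep-style output: packets separated by a blank line and/or "--".
stream=$(printf '%s\n' 'ts 1' 'STRING' 'a' '' 'ts 2' 'b' '' '--' 'ts 3' 'STRING' 'c')

# A record ends at "\n\n" or "--\n"; keep only records containing STRING.
matches=$(printf '%s\n' "$stream" | awk '/STRING/' RS='(--|\n)\n')
printf '%s\n' "$matches"

# Hypothetical CRLF variant (untested here): same idea with "\r\n" line endings.
# awk '/STRING/' RS='(--|\r\n)\r\n'
```

The `ts 2` record has no STRING and is dropped; the two matching records are printed back to back because ORS is still a single newline.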

druuna 12-17-2013 08:24 AM

And we are back at the example in post #1 (which is basically the same as the example in post #30).

Use
Code:

$ awk 'BEGIN{ RS="\n\n" } $0 ~ /STRING/' input
Output
Code:

timestamp 1
raw data from tcpdump
raw data from tcpdump
STRING
raw data from tcpdump
raw data from tcpdump
timestamp 3
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
STRING
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump

And don't tell me that it doesn't work with valid, real data as those two data sets are the same (your words, not mine).

cosminel 12-17-2013 09:03 AM

Yes, it works just fine except for the fact that there is no longer a single blank line between the end of a previous packet and the beginning of a new one. Firstly, this makes the individual packet output more difficult to read.

Secondly, on the data stream, from time to time there are remnants from other packets like so:
Code:

TIMESTAMP 1
line of text
line of text
(etc)
line of text
STRING
line of text
line of text
(etc)
line of text
TIMESTAMP 2
line of text
line of text
--
TIMESTAMP 3
line of text
line of text
(etc)
line of text
STRING
line of text
line of text
(etc)
line of text

As you can see this part:
Code:

TIMESTAMP 2
line of text
line of text
--

...is extra information that doesn't get filtered out, because the fgrep data window "cuts" the lines with its marker "--" and leaves no blank lines between packets. This is why timestamps as RS are able to filter out the data as intended.
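For reference, a minimal sketch of the timestamps-as-RS idea on a shortened version of the stream above. The pattern `TIMESTAMP [0-9]+\n` is an assumption made for this sample only; a real tcpdump timestamp would need its own regular expression. A regex RS is gawk/mawk behaviour, not POSIX.

```shell
# Shortened sample: packets 1 and 3 contain STRING, packet 2 is a cut-off remnant.
stream=$(printf '%s\n' \
  'TIMESTAMP 1' 'line A' 'STRING' 'line B' \
  'TIMESTAMP 2' 'remnant' '--' \
  'TIMESTAMP 3' 'line C' 'STRING' 'line D')

# Each timestamp line ends the previous record, so the remnant becomes a
# record of its own and is dropped because it has no STRING.
# The timestamp itself is consumed by RS and is not printed.
filtered=$(printf '%s\n' "$stream" | awk -v RS='TIMESTAMP [0-9]+\n' '/STRING/')
printf '%s\n' "$filtered"
```

Each record keeps its trailing newline, so the default print leaves a blank line between the packets.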

druuna 12-17-2013 09:12 AM

This is my last contribution to this thread; it is pointless without relevant data.
Code:

tcpdump <options> |  awk 'BEGIN{ RS="\n\n" } $0 ~ /STRING/ { printf("%s\n\n",$0)}'
# output
timestamp 1
raw data from tcpdump
raw data from tcpdump
STRING
raw data from tcpdump
raw data from tcpdump

timestamp 3
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
STRING
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump


cosminel 12-17-2013 09:30 AM

Your and grail's contributions are highly appreciated and I've learned some things from your posts.

I worked on an output from the actual production server which I pasted in a file on my testing server. It appears that that specific data set did not contain all the possible situations. Only when I retested on the live tcpdump | fgrep stream could I see these discrepancies, which occur only here and there but nonetheless break my intended output.

It is a lot of raw data through which I must filter and I need to run test after test to check the output. It is nobody's fault here really, I adjust the commands as I do more tests on the raw input and read the output. The data stream on which I want to apply this is changing from packet to packet and there's a lot of them.

Now ideally I would have the RS printed, but I can manage without the timestamps, and in fact it's fine with just the seconds part.

And so by testing I found that using the raw input packets' timestamps as RS actually produces the most consistent results; I have yet to see any discrepancy in the output.
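If the timestamp line should stay in the output, one portable alternative is to drop RS altogether and collect each packet line by line. This is a different technique from the RS approach described above, sketched on made-up data; the `TIMESTAMP N` pattern and the helper name `emit` are assumptions standing in for the real timestamp format and whatever naming fits.

```shell
stream=$(printf '%s\n' \
  'TIMESTAMP 1' 'line A' 'STRING' 'line B' \
  'TIMESTAMP 2' 'remnant' '--' \
  'TIMESTAMP 3' 'line C' 'STRING' 'line D')

kept=$(printf '%s\n' "$stream" | awk '
  # Print the buffered packet only if it contained STRING, then reset.
  function emit() {
      if (hit) printf "%s\n\n", buf
      buf = ""; hit = 0
  }
  /^TIMESTAMP [0-9]+$/ { emit(); buf = $0; inpkt = 1; next }  # a new packet starts
  /^(--)?$/            { next }                               # skip "--" and blank lines
  inpkt                { buf = buf "\n" $0; if (index($0, "STRING")) hit = 1 }
  END                  { emit() }
')
printf '%s\n' "$kept"
```

One limitation of this sketch: if the fgrep window cuts off a packet's timestamp line, the leftover lines are appended to the previous packet.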

Again, I really do appreciate you taking the time to help, and if you feel it was for nothing, I assure you that is not the case. I will reference this thread and your and grail's awk syntaxes and explanations in the future when I need to do stuff with awk.

grail 12-17-2013 09:30 AM

Yeah I am starting to get confused too :( Now you are saying that fgrep can return data from 2 timestamps with no blank line in between??

As for the adding of a line between the data:
Code:

awk '/STRING/' RS="(--|\n)\n" ORS="\n\n"
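A quick check of what ORS changes, on made-up data (the `ts N` and single-letter lines are placeholders; gawk or mawk is assumed for the regex RS):

```shell
stream=$(printf '%s\n' 'ts 1' 'STRING' 'a' '' 'ts 2' 'STRING' 'b')

# Default ORS ("\n"): matching records are printed back to back.
plain=$(printf '%s\n' "$stream" | awk '/STRING/' RS='(--|\n)\n')

# ORS="\n\n": every printed record is followed by a blank line.
spaced=$(printf '%s\n' "$stream" | awk '/STRING/' RS='(--|\n)\n' ORS='\n\n')

printf '%s\n' "$spaced"
```

Setting ORS this way has the same effect as the printf("%s\n\n",$0) form used earlier in the thread, without needing an explicit action block.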

cosminel 12-17-2013 09:40 AM

grail, I just tested your syntax and it seems to solve all the issues. I will do some further testing but results look promising.

Sorry for confusing you guys but the input data stream was/is tricky for me too so there is always room for surprises.


All times are GMT -5.