Quote:
When you use fgrep -B 6 -A 20 as shown in your previous post then there cannot be 26 lines after the STRING!! The window is 6 lines before, the STRING and 20 lines after. All the info wanted is inside that window, just ignore or remove the empty/unwanted lines. Problem solved. I'm going to ask one more time (!!): Post valid data input examples and wanted output examples. |
druuna, it was just an example. Please try to read my previous post (#30), ignoring all the details I gave up to that point. This way you will understand why only using fgrep will never filter out the data as desired.
And as I said in post #21, I am using fgrep as an intermediary filtering tool because it is much faster especially when working with live packets. This way I offload a lot of processing from awk by running fgrep first and thinning out the packet information. I already obtained what I wanted by using timestamps as RS. Should have done this from the beginning, it was my mistake I did not think about it. The example I provided in my previous post (#30) reflects precisely the way my data is formatted before applying the filtering. The real data is basically the same, the only difference is that the lines look a bit uglier. |
Ok ... so here is a little curve ball based on the most recent information.
If I understand correctly, once fgrep has been run we end up with data which is separated by both blank lines and dashes (--). Assuming I am on the right track, we could then set an RS which matches both separators and hence simply search for records containing STRING. Using the example in post #30: Code:
awk '/STRING/' RS="(--|\n)\n" |
And we are back at the example in post #1 (which is basically the same as the example in post #30).
Use Code:
$ awk 'BEGIN{ RS="\n\n" } $0 ~ /STRING/' input Code:
timestamp 1 |
Yes, it works just fine except for the fact that there is no longer a single blank line between the end of a previous packet and the beginning of a new one. Firstly, this makes the individual packet output more difficult to read.
Secondly, on the data stream, from time to time there are remnants from other packets like so: Code:
TIMESTAMP 1 Code:
TIMESTAMP 2 |
My last contribution to this thread, it is pointless without relevant data.
Code:
tcpdump <options> | awk 'BEGIN{ RS="\n\n" } $0 ~ /STRING/ { printf("%s\n\n",$0)}' |
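For anyone trying the blank-line approach on an awk other than gawk or mawk: a multi-character RS such as "\n\n" is not specified by POSIX, whereas an empty RS (paragraph mode) is. The sketch below is a minimal portable variant under that assumption. The sample packets and the function name filter_paragraphs are made up for illustration and are not data from this thread.

```shell
# Hypothetical sample data: three blank-line-separated "packets".
sample='TIMESTAMP 1
header A
payload STRING here

TIMESTAMP 2
header B
payload other

TIMESTAMP 3
header C
STRING again'

# RS="" is POSIX paragraph mode: one or more blank lines end a record.
# ORS="\n\n" puts a blank line back after every printed record.
filter_paragraphs() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" } /STRING/'
}

result=$(printf '%s\n' "$sample" | filter_paragraphs)
printf '%s\n' "$result"
```

Only the first and third packets should come through, with a blank line between them. Note that paragraph mode also treats newline as a field separator, which does not matter here because only $0 is printed.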
Your and grail's contribution is highly appreciated and I've learned some things from your posts.
I worked on an output from the actual production server which I pasted into a file on my testing server. It appears that that specific data set did not contain all the possible situations. Only when I retested on the live tcpdump | fgrep stream could I see these discrepancies, which occur only here and there but nonetheless break my intended output.
It is a lot of raw data through which I must filter and I need to run test after test to check the output. It is nobody's fault here really; I adjust the commands as I do more tests on the raw input and read the output. The data stream on which I want to apply this changes from packet to packet and there are a lot of them. Now ideally I would have the RS printed, but I can manage without the timestamps and in fact it's fine just with the seconds part. And so by testing I found that using the raw input packets' timestamps as RS actually produces the most consistent results; I have yet to see any discrepancy in the output.
Again, I really do appreciate you taking your time to help, and if you feel it was for nothing, I assure you it is not the case. I will reference this thread and your and grail's awk syntaxes and explanations in the future when I need to do stuff with awk. |
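On having the RS printed: in gawk (not POSIX awk), the variable RT holds the text that matched RS for the current record, so the timestamp that opened a packet is the RT saved from the previous record. The following is only a sketch under assumptions: the timestamp layout (HH:MM:SS.fraction, as in default tcpdump output), the sample packets, and the function name filter_by_timestamp are all hypothetical, and a payload that happens to contain a timestamp-like string would split a record early.

```shell
# Hypothetical sample data; the real stream in this thread was never posted.
sample='12:00:01.000001 IP a > b: first
    payload STRING
12:00:02.000002 IP c > d: second
    payload other
12:00:03.000003 IP e > f: third STRING'

# Requires gawk: RT is a gawk extension holding the text that matched RS.
filter_by_timestamp() {
    gawk 'BEGIN { RS = "[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\\.[0-9]+" }
          # prev is the timestamp that preceded (and so belongs to) this record.
          /STRING/ { printf "%s%s\n", prev, $0 }
          # Remember the separator that ended this record for the next one.
          { prev = RT }'
}

result=$(printf '%s\n' "$sample" | filter_by_timestamp)
printf '%s\n' "$result"
```

Each record keeps its trailing newline, so the extra "\n" in printf leaves one blank line after every printed packet.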
Yeah I am starting to get confused too :( Now you are saying that fgrep can return data from 2 timestamps with no blank line in between??
As for the adding of a line between the data: Code:
awk '/STRING/' RS="(--|\n)\n" ORS="\n\n" |
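To see what the regex RS does, here is a small self-contained run on made-up input. It assumes an awk that accepts a regular expression as RS (gawk and mawk do; POSIX does not require it), and it assumes the data looks like grep context output, where GNU grep prints a line containing -- between non-adjacent groups. The sample text and the function name filter_records are hypothetical. The sub() call is an addition to the one-liner above: a record that ends just before -- keeps its trailing newline, which would otherwise produce two blank lines instead of one.

```shell
# Hypothetical input mimicking fgrep -A/-B output: groups split by "--",
# packets split by blank lines. Not data from this thread.
sample='TIMESTAMP 1
aaa STRING
--
TIMESTAMP 2
bbb

TIMESTAMP 3
ccc STRING'

filter_records() {
    # A record ends at either "--\n" or "\n\n"; ORS restores one blank line.
    awk 'BEGIN { RS = "(--|\n)\n"; ORS = "\n\n" }
         /STRING/ { sub(/\n$/, ""); print }'
}

result=$(printf '%s\n' "$sample" | filter_records)
printf '%s\n' "$result"
```

The TIMESTAMP 2 record contains no STRING and is dropped, while the other two are printed with a single blank line between them.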
grail, I just tested your syntax and it seems to solve all the issues. I will do some further testing but results look promising.
Sorry for confusing you guys but the input data stream was/is tricky for me too so there is always room for surprises. |