LinuxQuestions.org
12-17-2013, 08:58 AM   #31
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2387

Quote:
Originally Posted by cosminel
Uhm, sorry druuna but I already explained in my previous post. I will retry:
- fgrep defines a fixed data window containing the STRING but it also includes additional lines because...
- the data packets containing the STRING have a variable number of lines, some packets with STRING are shorter (let's say 10 lines) while other packets with STRING are longer (let's say 26)

Stop right there.

When you use fgrep -B 6 -A 20 as shown in your previous post then there cannot be 26 lines after the STRING!! The window is 6 lines before, the STRING and 20 lines after.

All the info wanted is inside that window, just ignore or remove the empty/unwanted lines. Problem solved.

I'm going to ask one more time (!!): Post valid data input examples and wanted output examples.
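
For anyone following along, the fixed window is easy to see on made-up data. This is only a sketch: the sample text and the -B 1 -A 1 sizes are assumptions chosen to keep it short, not the thread's real input.

```shell
# Hypothetical sample: three small "packets" separated by blank lines.
sample='t1
a
STRING
b

t2
c
d
e
f

t3
STRING
g'

# One line of context before and after each match (fgrep is the older name for grep -F).
window=$(printf '%s\n' "$sample" | grep -F -B 1 -A 1 'STRING')
printf '%s\n' "$window"
```

The first packet loses its t1 line because one line of leading context is too small for it, and grep prints a "--" line between the two non-adjacent groups. Both effects matter for whatever tool reads the output next.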
 
12-17-2013, 09:08 AM   #32
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
druuna, it was just an example. Please try to read my previous post (#30), ignoring all the details I gave up to that point. This way you will understand why only using fgrep will never filter out the data as desired.

And as I said in post #21, I am using fgrep as an intermediary filtering tool because it is much faster especially when working with live packets. This way I offload a lot of processing from awk by running fgrep first and thinning out the packet information.

I already obtained what I wanted by using timestamps as RS. I should have done this from the beginning; it was my mistake not to think of it.

The example I provided in my previous post (#30) reflects precisely the way my data is formatted before applying the filtering. The real data is basically the same, the only difference is that the lines look a bit uglier.
 
12-17-2013, 09:20 AM   #33
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,423

Rep: Reputation: 2823
Ok ... so here is a little curve ball based on the most recent information.

If I understand correctly, once fgrep has been run we end up with data that is separated by both blank lines and dashes (--).
Assuming I am on the right track, we could then set an RS that matches both and simply search for records containing STRING.

Using the example in post #30:
Code:
awk '/STRING/' RS="(--|\n)\n"
Also, if druuna is correct that the data is actually terminated with Windows line endings, the RS can be changed to use '\r\n' instead of '\n' if needed.
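
A rough sketch of that Windows-line-ending variant, under assumptions: the sample data is invented, and the "--" line ends in a plain \n because grep writes its own separator. Making \r optional lets one RS cover both CRLF and LF input.

```shell
# Hypothetical CRLF-terminated sample; the "--" line mimics grep's group separator.
crlf_sample=$(printf 't1\r\nSTRING\r\ny\r\n\r\nt2\r\nz\r\n--\nt3\r\nSTRING\r\nw\r\n')

# \r? makes the carriage return optional, so the same RS also fits LF-only input.
# A regex RS is an extension found in gawk and mawk, not in strict POSIX awk.
crlf_hits=$(printf '%s\n' "$crlf_sample" | awk -v 'RS=(--|\r?\n)\r?\n' '/STRING/' | tr -d '\r')
printf '%s\n' "$crlf_hits"
```

The trailing tr -d '\r' is my addition, not something from the thread. It strips the carriage returns that remain inside each record so the output displays cleanly on a Unix terminal.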
 
12-17-2013, 09:24 AM   #34
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2387
And we are back at the example in post #1 (which is basically the same as the example in post #30).

Use
Code:
$ awk 'BEGIN{ RS="\n\n" } $0 ~ /STRING/' input
Output
Code:
timestamp 1
raw data from tcpdump
raw data from tcpdump
STRING
raw data from tcpdump
raw data from tcpdump
timestamp 3
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
STRING
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
And don't tell me that it doesn't work with valid, real data as those two data sets are the same (your words, not mine).
 
12-17-2013, 10:03 AM   #35
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
Yes, it works just fine, except that there is no longer a single blank line between the end of one packet and the beginning of the next. Firstly, this makes the individual packet output more difficult to read.

Secondly, on the data stream, from time to time there are remnants from other packets like so:
Code:
TIMESTAMP 1
line of text
line of text
(etc)
line of text
STRING
line of text
line of text
(etc)
line of text
TIMESTAMP 2
line of text
line of text
--
TIMESTAMP 3
line of text
line of text
(etc)
line of text
STRING
line of text
line of text
(etc)
line of text
As you can see this part:
Code:
TIMESTAMP 2
line of text
line of text
--
...is extra information that doesn't get filtered out because of the fgrep data window that "cuts" the lines with its marker "--" and leaves no spaces between packets. This is why timestamps as RS are able to filter out the data as intended.
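
The timestamp-as-RS idea can be sketched like this. The real timestamp format was never posted, so the tcpdump-style HH:MM:SS.fraction pattern and the sample stream are assumptions, not the exact command used on the production data.

```shell
# Hypothetical fgrep output: tcpdump-style timestamps, no blank lines between packets,
# and a leftover fragment (the 12:00:02 packet) cut off by grep's "--" marker.
stream='12:00:01.000001 IP a > b
line of text
STRING
line of text
12:00:02.000002 IP c > d
line of text
--
12:00:03.000003 IP e > f
STRING
line of text'

# RS matches an optional newline plus "HH:MM:SS.fraction ", so each timestamp starts a record.
# The matched timestamp is consumed by RS and does not appear in the output.
# Regex RS needs gawk or mawk; the pattern avoids {n} intervals, which older mawk lacks.
ts='\n?[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9]+ '
packets=$(printf '%s\n' "$stream" | awk -v "RS=$ts" -v 'ORS=\n\n' '/STRING/')
printf '%s\n' "$packets"
```

The leftover 12:00:02 fragment becomes a record of its own, contains no STRING, and is dropped along with its "--" marker. One caveat: the pattern is not anchored to the start of a line, so payload text that happens to look like a timestamp would also split a record.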
 
12-17-2013, 10:12 AM   #36
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2387
My last contribution to this thread, it is pointless without relevant data.
Code:
tcpdump <options> |  awk 'BEGIN{ RS="\n\n" } $0 ~ /STRING/ { printf("%s\n\n",$0)}'
# output
timestamp 1
raw data from tcpdump
raw data from tcpdump
STRING
raw data from tcpdump
raw data from tcpdump

timestamp 3
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
STRING
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
raw data from tcpdump
 
12-17-2013, 10:30 AM   #37
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
Your and grail's contributions are highly appreciated and I've learned some things from your posts.

I worked on an output from the actual production server which I pasted in a file on my testing server. It appears that that specific data set did not contain all the possible situations. Only when I retested on the live tcpdump | fgrep stream could I see these discrepancies, which occur only here and there but nonetheless break my intended output.

It is a lot of raw data through which I must filter and I need to run test after test to check the output. It is nobody's fault here really, I adjust the commands as I do more tests on the raw input and read the output. The data stream on which I want to apply this is changing from packet to packet and there's a lot of them.

Now ideally I would have the RS printed, but I can manage without the timestamps, and in fact it's fine just with the seconds part.

And so by testing I found that using the timestamps of the raw input packets as RS actually produces the most consistent results; I have yet to see any discrepancy in the output.

Again, I really do appreciate you taking the time to help, and if you feel it was for nothing, I assure you that is not the case. I will reference this thread, and your and grail's awk syntax and explanations, in the future when I need to do stuff with awk.
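
On the wish to keep the timestamp that an RS match swallows: one possible alternative is to leave RS alone and group lines by hand, treating each timestamp line as the start of a new packet. This is a different technique from the RS approach described above. It is only a sketch, and the tcpdump-style timestamp format and the sample stream are assumptions.

```shell
# Hypothetical stream with tcpdump-style timestamps (an assumption about the real data).
stream='12:00:01.000001 IP a > b
line of text
STRING
line of text
12:00:02.000002 IP c > d
line of text
--
12:00:03.000003 IP e > f
STRING
line of text'

kept=$(printf '%s\n' "$stream" | awk '
  # Print the buffered packet if it contained STRING, then start a new buffer.
  function emit() { if (hit) printf "%s\n", buf; buf = ""; hit = 0 }
  /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.]/ { emit() }   # a timestamp line opens a new packet
  /^--$/ || /^$/ { next }                              # drop grep markers and blank lines
  { buf = buf $0 "\n" }
  /STRING/ { hit = 1 }
  END { emit() }
')
printf '%s\n' "$kept"
```

This uses only POSIX awk features, keeps the full timestamp line at the top of each packet, and puts a single blank line between packets.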
 
12-17-2013, 10:30 AM   #38
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,423

Rep: Reputation: 2823
Yeah, I am starting to get confused too. Now you are saying that fgrep can return data from 2 timestamps with no blank line in between?

As for the adding of a line between the data:
Code:
awk '/STRING/' RS="(--|\n)\n" ORS="\n\n"
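
A small run of that RS/ORS combination on invented data, as a sketch only. The sample assumes packets separated by blank lines plus one "--" marker from grep, which may not match every case in the live stream.

```shell
# Hypothetical fgrep output: packets separated by blank lines, with one "--" marker
# where grep cut its context window.
filtered='t1
x
STRING
y

t2
z
--
t3
STRING
w'

# RS splits on a blank line or on a "--" line; ORS puts a blank line after each printed record.
# Regex RS is a gawk/mawk extension, so this is not strictly POSIX awk.
result=$(printf '%s\n' "$filtered" | awk -v 'RS=(--|\n)\n' -v 'ORS=\n\n' '/STRING/')
printf '%s\n' "$result"
```

The t2 fragment ends up as its own record, contains no STRING, and is skipped. Note that this relies on a blank line or a "--" line sitting between the fragment and the packets that should be kept.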
 
12-17-2013, 10:40 AM   #39
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
grail, I just tested your syntax and it seems to solve all the issues. I will do some further testing but results look promising.

Sorry for confusing you guys but the input data stream was/is tricky for me too so there is always room for surprises.
 
  

