tcp/ip capture analysis tools

acid_kewpie · 09-01-2007, 02:24 PM

howdy,

is anyone able to point me towards tools suitable for analysing tcp/ip captures at a high enough level to be able to suggest whether a connection is faulty or not, for someone who's not clued up on tcp/ip itself? wireshark's expert infos are useful, but they don't even pick up things like tcp connection failures and successive retries. sure, they tell me an open was attempted, but not if it looks like it never opened etc...

basically i have 2000 adsl sites i'm responsible for and *really* need something to push back to the lower levels to stop them chucking faults at me as they've run out of ideas / knowledge with the application in question and just blame my beloved network. they really would need happy face / sad face level...

Road_map · 09-01-2007, 03:16 PM

Take a look here:
http://sectools.org/sniffers.html
http://www.linuxlinks.com/Software/N...urity/Sniffers

or try one of these older apps:
http://www.thedumbterminal.co.uk/software/sniff.shtml
http://www.laurentconstantin.com/en/netw/netwox/

unSpawn · 09-02-2007, 05:58 AM

If you don't mind me asking

Quote:

Originally Posted by acid_kewpie

they've run out of ideas / knowledge with the application in question

What's the application or what is it supposed to provide?
Are these captures made centrally or at the end-user/application end?
Pcaps made continuously, triggered or manually after the shit hit the fan?
Any other local diags done like for instance tcptraceroute?
Any centralised network perf tools already in use?
Who or what processes the pcaps?

Quote:

Originally Posted by acid_kewpie

wireshark's expert infos are useful, but they don't even pick up things like tcp connection failures and successive retries. sure, they tell me an open was attempted, but not if it looks like it never opened etc...

Why? Say you filter TCP traffic for one session between local and remote IP, set the time starting point at the first SYN and the end time at $STACK_TIMEOUT (as opposed to, say, a browser's request timeout value), and if within that period there's no reply, then that's it, right? Or not?.. Is the real "problem" maybe that everything is swell at FDDI level and up and they mainly encounter shit only at L7?
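
A minimal sketch of that check, for illustration only. It assumes the capture has already been reduced to simple packet records; the `Pkt` type, the `unanswered_syns` name and the 21 second default for the stack timeout are all made up for the example and are not taken from any tool mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass
class Pkt:
    ts: float    # capture timestamp in seconds
    src: str     # sender as "ip:port"
    dst: str     # receiver as "ip:port"
    flags: str   # TCP flags as letters, e.g. "S", "SA", "A", "FA", "R"


def unanswered_syns(pkts, stack_timeout=21.0):
    """Return (src, dst) pairs whose first SYN saw no SYN/ACK within stack_timeout.

    stack_timeout is a hypothetical placeholder; use the value of the stack under test.
    """
    first_syn = {}    # (client, server) -> timestamp of the first SYN
    answered = set()  # (client, server) pairs that got a SYN/ACK in time
    for p in sorted(pkts, key=lambda p: p.ts):
        if "S" in p.flags and "A" not in p.flags:
            # Retries keep the original start time, as described above.
            first_syn.setdefault((p.src, p.dst), p.ts)
        elif "S" in p.flags and "A" in p.flags:
            key = (p.dst, p.src)  # the reply travels in the opposite direction
            if key in first_syn and p.ts - first_syn[key] <= stack_timeout:
                answered.add(key)
    return [k for k in first_syn if k not in answered]
```

The verdict only means something if the capture runs for at least the timeout after the first SYN; a capture that stops early will report open attempts as failed.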

Road_map · 09-02-2007, 07:56 AM

Edited without special reasons. My reply was simply not useful.

unSpawn · 09-02-2007, 08:10 AM

Quote:

Originally Posted by Road_map

where is my reply, who deleted it and why.

There is a chasm
of carbon and silicon
this software can't bridge.

Road_map · 09-02-2007, 08:47 AM

Edited without special reasons. My reply was simply not useful.

pixellany · 09-02-2007, 09:22 AM

Road map;
It sounds like you should be sending PMs direct to a moderator. If you are going to put this in front of everyone, please have the courtesy to tell us what the )(*^&$(*&Y)(* you are talking about!!!

unSpawn · 09-02-2007, 09:45 AM

Quote:

Originally Posted by Road_map

There is no chasm, and acid_kewpie knows that, but if this is your answer, it's ok. It's just not fair.

My "answer" was meant jocularly, mainly because there was no trace at all of any deletions nor what you were on about. Pix is right, if there's any problem with deletions then it's a discussion between you and the moderator. That should be handled outside of the LQ fora by email.

acid_kewpie · 09-02-2007, 11:52 AM

Well the app is literally anything at all that runs on our network. our general IT support staff are sadly far too unskilled to really ever know what they're looking at and as is so often the way, if they don't understand why things aren't working with any application at all, they try to blame the network and shift it over to me. i then spend an hour at a time inspecting the connection, checking counters and packet captures before i say "there is no sign of any network problem". 10 of these, and my day is totally wasted. i need a way to push back the responsibility of proving to a reasonable level of confidence that a fault is indeed the network's fault, i.e. lost packets, unopened tcp connections, excess retransmissions, which is what i'm looking for. as they literally don't know what a subnet mask even is though, i need a way to hopefully reduce things to traffic lights or some other generally insultingly patronizing level of answer.

my plan would be to do some remote capture of the traffic in the site in question whilst they directly recreate a failure (something *I* as a generic network bod have no business messing with) and then they get a "rating" on the resultant capture. i can rig up winpcap and wireshark and such to get a capture, but it's complicated stuff to someone who isn't trained... and complicated to some who are, at times. now wireshark's "expert infos" do appear to be just about *exactly* what i'm after, BUT they don't pick up enough things, possibly just down to a smaller than desirable database of things to look for...
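
To make the "rating" idea concrete, here is a rough sketch of a traffic-light function. It assumes the counts (opens attempted, opens answered, data segments, retransmissions) have already been pulled out of the capture by some other means; the `rate_capture` name and both thresholds are invented placeholders, not recommended values.

```python
def rate_capture(syn_sent, syn_answered, data_segments, retransmissions,
                 amber_retx=0.01, red_retx=0.05):
    """Return 'green', 'amber' or 'red' for one capture.

    amber_retx and red_retx are illustrative retransmission ratios only.
    """
    if syn_sent and syn_answered == 0:
        return "red"  # opens were attempted and none of them was answered
    retx_ratio = retransmissions / data_segments if data_segments else 0.0
    if retx_ratio >= red_retx:
        return "red"  # excess retransmissions
    if syn_answered < syn_sent or retx_ratio >= amber_retx:
        return "amber"  # some opens failed, or retransmissions are elevated
    return "green"  # nothing in these counters points at the network
```

Green here only means "these particular counters look clean"; it says nothing about problems the counters don't cover.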

essentially, i need to be given calls that *ARE* network problems, not ones that *MIGHT* be network problems, as the difference between those two is a full time replica of me.

in terms of things already running, well i do already poll snmp data from the adsl routers in each site, which is a start, but it's the acute reactive angle i'm looking at really. also have plans for proactive stuff of my own, but if they say "application X can't do this process" i want proof that i have to care.

Road_map, i unhid your post having hidden it earlier, as it wasn't really pitching at the level i was looking for. thanks.

acid_kewpie · 09-02-2007, 12:02 PM

there is a pretty sexy looking box from netqos which does include the capacity to hang off a span port or a tap and watch ip traffic on a wan link and make recommendations if it's seeing a lot of latency on a certain application, intermittent icmp responses from certain ip's etc... i.e. the kind of thing i'm looking for, but as the failing systems are on the other side of the WAN, on an ADSL line at that, we'd be looking for the KO punch of capturing the traffic on the client machine itself, to remove all possible doubt.

certainly a nice looking appliance though...

jiml8 · 09-02-2007, 12:11 PM

Sounds like a pretty tough problem to me. To identify physical layer problems, it would seem to me that you have to have data collection software running on every node of the network you wish to monitor.

After all, if A is connected to B, and A tries to talk to C through B and the A<=>B connection fails, A will know this but B won't since as far as B is concerned there never was an attempt at a connection from A.

The only way you catch that is if A tells you.

Now, you could set up something like tcpdump on every node. Not a trivial undertaking though, and would have a visible impact on network performance.

You also might try establishing a database in which you post complete reports of every such failure, including source, destination, and route. Over time, you'll establish enough datapoints to do a statistical analysis which COULD point you to an intermittent problem. Actually, you probably ought to be doing that anyway.
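
As a rough illustration of such a failure log, the sketch below uses Python's built-in `sqlite3` module. The table layout and the helper names (`open_reports`, `add_report`, `failures_by_source`) are assumptions made up for the example; a real schema would carry whatever fields the reports actually contain.

```python
import sqlite3


def open_reports(path=":memory:"):
    """Open (or create) the failure-report database; in-memory by default."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS failure (
                      reported_at TEXT, source TEXT,
                      destination TEXT, route TEXT)""")
    return db


def add_report(db, reported_at, source, destination, route):
    """Store one failure report: when, from where, to where, via which route."""
    db.execute("INSERT INTO failure VALUES (?, ?, ?, ?)",
               (reported_at, source, destination, route))
    db.commit()


def failures_by_source(db):
    """Return (source, count) rows, most-reported source first."""
    return db.execute("""SELECT source, COUNT(*) AS n FROM failure
                         GROUP BY source
                         ORDER BY n DESC, source""").fetchall()
```

A plain count per source is about the simplest statistic possible; the same table can be grouped by destination or route once enough reports have accumulated.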

Beyond that, is there any evidence that your network is decrepit? Places where there has been moisture intrusion or some such?

If not - if there genuinely isn't a solid reason to suspect the network - then you need to have a meeting with your bosses and with IT staff who are passing these things on to you to point out the problem it is causing you and work on a plan to abate the problem.

jiml8 · 09-02-2007, 12:14 PM

Quote:

Originally Posted by acid_kewpie

there is a pretty sexy looking box from netqos which does include the capacity to hang off a span port or a tap and watch ip traffic on a wan link and make recommendations if it's seeing a lot of latency on a certain application, intermittent icmp responses from certain ip's etc... i.e. the kind of thing i'm looking for, but as the failing systems are on the other side of the WAN, on an ADSL line at that, we'd be looking for the KO punch of capturing the traffic on the client machine itself, to remove all possible doubt.

certainly a nice looking appliance though...

You need that box anyway.

acid_kewpie · 09-02-2007, 12:41 PM

well it's not necessarily problems with any layer per se, just that in most cases when i get into packet analysis i usually see something like an http server responding to an http client with an HTTP 200 message and no payload, then closely followed by a very polite FIN, FIN/ACK, ACK closing handshake. clearly *not* a network issue in my book when nothing else looks untoward either.
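
The "polite close" part of that pattern can be sketched as a simple check. This assumes one connection's packets have been reduced to (sender, flags) pairs in capture order; `closed_cleanly` and the 'client' / 'server' labels are hypothetical names for the example, and the check ignores the HTTP payload entirely.

```python
def closed_cleanly(flag_seq):
    """True if both ends sent a FIN and no RST was seen on the connection.

    flag_seq: list of (sender, flags) in capture order, where sender is
    'client' or 'server' and flags is a string such as "S", "SA", "PA", "FA".
    """
    fin_from = set()
    for sender, flags in flag_seq:
        if "R" in flags:
            return False  # a reset is not an orderly close
        if "F" in flags:
            fin_from.add(sender)
    return fin_from == {"client", "server"}
```

A clean close on its own doesn't prove the network was fine, but together with no retransmissions and no failed opens it backs up the "not a network issue" reading.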

the entire length of the network connectivity is my problem in the main.. the ADSL, the WAN, the LANs at either end, so it's not a case of fault finding which part of the network is at fault, just whether there is a fault somewhere between the two endpoints of the application in question.

acid_kewpie · 09-02-2007, 12:43 PM

Quote:

Originally Posted by jiml8

You need that box anyway.

well we're thinking about it in the main. we can do netflow analysis with a decent commercial product (ntop really doesn't cut it at all...) for about £2k. we can then get something like netqos for about £15k which is a huge step up, but provides a lot more information using many different protocols and techniques. or we can do what my manager stupidly keeps mentioning and use a product called ipanema which would cost £500k for a starting implementation, despite the fact that it really does look awesome...