LinuxQuestions.org
Linux - Networking

06-29-2015, 11:36 AM   #1
JockVSJock

Troubleshooting Socket Issues


I'm not sure where the issue is. I have a RHEL 5 server running an Oracle Database along with a Java-based application.

The end users run this Java-based application from their PCs, and it takes a long time to connect. Sometimes it doesn't connect at all. All network traffic goes across the LAN. I've also been able to ping and traceroute from the server back to a PC and vice versa, so I know it's not iptables.

The DBA is looking into the database side and I'm looking at the OS side. The error message references an ORA- error; however, I still want to do my part and make sure it's not the OS.

The network connections seem to be OK, judging by the following command:

Code:
netstat -ap | grep ESTA
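`grep ESTA` only shows sessions that finished connecting. For slow or failed connects it can also help to count every TCP state on the listener port; a pile-up of SYN_RECV, for example, might suggest handshakes that are not completing. A minimal Python sketch, assuming the listener is on Oracle's default port 1521 and that clients use IPv4 (the function names are hypothetical):

```python
from collections import Counter

# Kernel TCP state codes as printed (hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp: str, local_port: int) -> Counter:
    """Count TCP states of IPv4 sockets whose local port equals local_port."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        _addr, port_hex = fields[1].split(":")  # local_address is "HEXADDR:HEXPORT"
        if int(port_hex, 16) == local_port:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts


def read_port_states(local_port: int = 1521, path: str = "/proc/net/tcp") -> Counter:
    """Read the live table (IPv4 only; IPv6 sockets are listed in /proc/net/tcp6)."""
    with open(path) as f:
        return count_states(f.read(), local_port)
```

Running `print(read_port_states())` a few times while a user is waiting to connect would show whether connections are stuck in a particular state.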
I'm also looking at socket settings, such as the values under /proc/sys/net/ipv4:

Code:
[root@foo ipv4]# cat tcp_keepalive_intvl ; cat tcp_keepalive_probes ; cat tcp_keepalive_time
75
9
7200
[root@foo ipv4]#
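For reference, these keepalive values only matter for idle, established connections on sockets that enable SO_KEEPALIVE; they do not affect how long a new connection takes to set up. With the values shown, a dead peer would be detected after roughly 7200 + 9 × 75 = 7875 seconds (about 2 h 11 min). A small Python sketch of that arithmetic, with a helper that reads the live values (both function names are hypothetical):

```python
from pathlib import Path


def keepalive_detection_seconds(keepalive_time: int, keepalive_intvl: int,
                                keepalive_probes: int) -> int:
    """Approximate seconds before an idle SO_KEEPALIVE socket is declared dead:
    idle time before the first probe, plus one interval per unanswered probe."""
    return keepalive_time + keepalive_probes * keepalive_intvl


def read_keepalive(root: str = "/proc/sys/net/ipv4") -> dict:
    """Read the three keepalive sysctls as integers."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    return {name: int(Path(root, name).read_text()) for name in names}
```

Usage: `v = read_keepalive()` followed by `keepalive_detection_seconds(v["tcp_keepalive_time"], v["tcp_keepalive_intvl"], v["tcp_keepalive_probes"])`.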
I've also looked at the following values under /proc/sys/net/core:

Code:
[root@ameda4aisrx0223 core]# cat rmem_default ; cat rmem_max ; cat wmem_default ; cat wmem_max ; cat optmem_max
4194304
4194304
262144
1048576
10240
[root@ameda4aisrx0223 core]#
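On the buffer side, rmem_max and wmem_max cap what an application may request with SO_RCVBUF/SO_SNDBUF, while TCP sockets take their default sizes from the net.ipv4.tcp_rmem and tcp_wmem triples (min, default, max), which override rmem_default and wmem_default for TCP. A sketch that dumps both sets in readable units, assuming the standard /proc/sys/net layout; the selection of names and the helpers are illustrative only:

```python
from pathlib import Path

# Illustrative selection: the core values from the post plus the TCP-specific
# "min default max" triples that TCP sockets actually use.
SYSCTLS = (
    "core/rmem_default", "core/rmem_max",
    "core/wmem_default", "core/wmem_max", "core/optmem_max",
    "ipv4/tcp_rmem", "ipv4/tcp_wmem",
)


def read_sysctls(names=SYSCTLS, root="/proc/sys/net"):
    """Return {name: [int, ...]}; single-valued sysctls give a one-element list."""
    return {n: [int(v) for v in Path(root, n).read_text().split()] for n in names}


def human(n: int) -> str:
    """Format a byte count as MiB or KiB when it divides evenly, else bytes."""
    for unit, size in (("MiB", 1 << 20), ("KiB", 1 << 10)):
        if n >= size and n % size == 0:
            return f"{n // size} {unit}"
    return f"{n} B"
```

Usage: `for name, vals in read_sysctls().items(): print(name, ", ".join(human(v) for v in vals))`.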

Is there anything else that I should look at to troubleshoot, or have I taken it as far as I can take it?

thanks

Last edited by JockVSJock; 06-29-2015 at 02:23 PM.
 
06-29-2015, 02:23 PM   #2
JockVSJock

I ran tcpdump against the interface (eth0) while traffic was being sent to port 1521 and noticed this:

Code:
192.168.50.8.50144 > destination: P, cksum 0xe0c1 (correct), 58632:58653(21) ack 692526 win 11
13:03:10.801291 IP (tos 0x0, ttl 64, id 2774, offset 0, flags [DF], proto: TCP (6), length: 52)
destination > 192.168.50.8.50144: P, cksum 0xe596 (incorrect (-> 0xef59), 692526:692538(12) ack 58653 win 218
I'm not sure what is going on with the incorrect checksum on packets from the destination server to 192.168.50.8.
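With -v/-vv, tcpdump recomputes the TCP checksum (the RFC 1071 one's-complement sum over an IPv4 pseudo-header plus the TCP segment) and compares it with the header field; "cksum 0xe596 (incorrect (-> 0xef59)" means it computed 0xef59 instead. One commonly reported benign cause: with tx-checksumming offload enabled, tcpdump captures outgoing packets before the NIC fills in the checksum, so packets sent by the capturing host can show as incorrect even though they are correct on the wire. If "destination" is the host running tcpdump, that may be what is happening here. A Python sketch of the calculation, for illustration only (function names are hypothetical):

```python
import ipaddress
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:  # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def tcp_pseudo_header(src: str, dst: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header: src addr, dst addr, zero byte, protocol 6 (TCP), TCP length."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, 6, tcp_length))


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum a TCP segment, treating its checksum field (bytes 16-17) as zero."""
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    return internet_checksum(tcp_pseudo_header(src, dst, len(segment)) + zeroed)


def tcp_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """True if the checksum stored in the segment matches the recomputed one."""
    (stored,) = struct.unpack_from("!H", segment, 16)
    return stored == tcp_checksum(src, dst, segment)
```

The pseudo-header ties the checksum to the IP addresses, which is why a capture tool needs both the IP and TCP headers to verify it.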

I looked at the settings of eth0 using ethtool, and read a few blog posts online that suggest turning off either checksum offloading or TCP offloading. However, this is the first time I have seen this, so I'm not sure what the best course of action would be:

Code:
[root@foo core]# ethtool eth0

Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   1000baseT/Full
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised auto-negotiation: No
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Link detected: yes

 
[root@foo core]# ethtool -k eth0

Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off
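The output above shows tx-checksumming on, which is the condition under which outgoing packets captured on this host can show "incorrect" checksums. A quick, non-persistent way to test that theory is `ethtool -K eth0 tx off` followed by another capture (the setting reverts on reboot or interface reset). To check the same flags across several interfaces or hosts, a small Python sketch that parses `ethtool -k` output; the helper names are hypothetical and it assumes Python 3.7+ with ethtool installed:

```python
import subprocess


def parse_offloads(text: str) -> dict:
    """Map each feature in `ethtool -k` output to True (on) or False (off)."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Newer ethtool versions append markers such as "[fixed]"; the first word is the state.
        if sep and words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


def read_offloads(iface: str = "eth0") -> dict:
    """Run `ethtool -k IFACE` and parse its output (raises if ethtool fails)."""
    out = subprocess.run(["ethtool", "-k", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_offloads(out)
```

Usage: `[name for name, on in read_offloads("eth0").items() if on]` lists the offloads currently enabled.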

Last edited by JockVSJock; 06-29-2015 at 02:24 PM.
 
06-30-2015, 09:18 PM   #3
padeen

Isn't that error message saying that the destination sent a corrupt packet? I may be laying a red herring, but that would tie in with the symptom of long login times while corrupt packets are discarded until valid ones are received. Have you run pings with a huge number of datagrams, including from the Oracle machine to yours? OTOH, ping datagrams may not necessarily show up as corrupt in any case.

I would be looking at replacing the (possibly failing) NIC on the Oracle machine. Since they're so cheap, it's an easy and quick way to eliminate a possible contributor to the problem. (Of course, it could be your NIC that is calculating the checksum incorrectly, or it could be the cable corrupting the data. The joys of network troubleshooting...)
 
06-30-2015, 09:40 PM   #4
JockVSJock

Quote:
Originally Posted by padeen

I would be looking at replacing the (possibly failing) NIC on the Oracle machine. Since they're so cheap, it's an easy and quick way to eliminate a possible contributor to the problem. (Of course, it could be your NIC that is calculating the checksum incorrectly, or it could be the cable corrupting the data. The joys of network troubleshooting...)
Crap, I forgot to mention that this is a VM in VMware vCenter, not a physical machine, which introduces a whole new level of complexity to the situation.

This VM shares a datastore on a SAN with a number of other VMs, and they don't seem to have any networking issues, or at least I don't see any.

I hadn't thought of trying to send bigger packets from the Oracle machine to the client. I would have to read up on this because I've never done it before.
 
07-01-2015, 03:29 AM   #5
padeen

No, I didn't mean bigger datagrams; I meant more of them. I was thinking along the lines of a degrading NIC that occasionally sends corrupt packets, in which case you would have to capture a lot of packets to see this.
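Following that suggestion, one way to capture a lot of packets is something like `tcpdump -nn -vv -i eth0 -c 10000 port 1521 > capture.txt` (-nn keeps addresses and ports numeric), then tally checksum results per sender. Given the offload caveat, "incorrect" on packets sent by the capturing host may be an artifact, so incorrect checksums on packets arriving from other hosts are the more telling number. A Python sketch, assuming the tcpdump -v line layout shown in post #2 (the file name and function name are hypothetical):

```python
import re
from collections import Counter

# Matches the per-packet line of `tcpdump -v`/`-vv` output, e.g.
#   "10.0.0.5.1521 > 10.0.0.8.50144: P, cksum 0xe596 (incorrect (-> 0xef59), ..."
LINE = re.compile(r"^\s*(\S+) > (\S+): .*?cksum 0x[0-9a-fA-F]+ \((correct|incorrect)")


def checksum_tally(lines) -> Counter:
    """Count (sender, 'correct' or 'incorrect') pairs; the sender's port is dropped."""
    tally = Counter()
    for line in lines:
        m = LINE.match(line)
        if m:
            sender = m.group(1).rsplit(".", 1)[0]  # "192.168.50.8.50144" -> "192.168.50.8"
            tally[(sender, m.group(3))] += 1
    return tally
```

Usage: `with open("capture.txt") as f: print(checksum_tally(f))`.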

As for VMware, I can't offer any help, as I don't use it.
 
07-01-2015, 10:02 AM   #6
JockVSJock

This is what I did to fix the incorrect checksum error.

I changed the driver tied to the NIC in RHEL, going from Flexible to VMXNET3. After doing that, we ran the test again, and while watching the tcpdump traffic I no longer saw the incorrect checksum error. The error we are getting now says: Socket read time out 62000

However, we are still getting the socket error, so I'm starting to lean more towards the issue being with the software and how it is trying to connect.
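One way to separate the network from the application is to time only the TCP handshake to the listener from a client PC. If connecting to port 1521 consistently completes in milliseconds while the application still reports a socket read timeout, that would point above the TCP layer (listener, database, or application code). A minimal Python sketch, assuming the default port 1521; it exchanges no Oracle protocol data, and the function name is hypothetical:

```python
import socket
import time


def time_tcp_connect(host: str, port: int = 1521, timeout: float = 10.0) -> float:
    """Return the seconds taken to complete a TCP handshake with host:port.

    Raises OSError (for example socket.timeout or ConnectionRefusedError) on
    failure. Only the handshake is measured; the socket is closed right after.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start
```

Usage, with a placeholder host name: `for _ in range(20): print(time_tcp_connect("oracle-server"))`. Repeating the measurement matters because the original symptom was intermittent.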
Tags
redhat, socket, tcp/ip
All times are GMT -5.