LinuxQuestions.org


mydogspot 11-17-2009 10:19 AM

network keeps dropping on two NIC cards
 
I have 3 HP ProLiant 585 G5s set up as an Oracle RAC cluster. They are all supposed to be identical, and all run Red Hat with kernel 2.6.18-128.4.1.el5. Each has an internal Intel e1000e NIC and an additional Broadcom NIC, for a total of 4 network ports. Each server has a public IP address and a private VLAN IP address assigned. In theory, the ports are bonded as follows -

Intel (eth0 on bond0, eth1 on bond1)
Broadcom (eth2 on bond0, eth3 on bond1)

The bonds are all set up for failover.

Here is the problem -

Network connectivity on one of the 3 servers keeps dropping. I've tried the following configurations, and the network drops in every one of them -


bond0/bond1 -- both running on the Intel NIC (eth0/eth1)
bond0/bond1 -- removed eth1 from bond1, so it is running on eth0/eth3
bond0/bond1 -- removed eth0 from bond0, so everything is on the Broadcom NIC

I then removed the bonds completely and tried running directly on the interfaces -

eth0 public, eth1 private
eth2 public, eth1 private
eth2 public, eth3 private

Basically, I've tried every possible configuration, yet no matter what, both connections drop at exactly the same time. Nothing is reported in the logs; sosreport only shows that there is no connectivity --


SOS report -
Nov 9 16:50:53 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 9 16:50:53 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Nov 10 00:31:26 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 00:31:26 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Nov 10 02:10:32 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 02:10:32 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Nov 10 03:19:54 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 03:19:54 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond1: link is not ready
Nov 10 14:51:37 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 14:51:37 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond1: link is not ready
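
To make the "both drop at the same time" pattern easier to see in a longer log, the entries can be grouped by timestamp. The following is only a rough sketch in Python: the regular expression assumes the exact line layout shown in the report, and the function name `group_link_events` is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the report lines:
#   Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"ADDRCONF\(NETDEV_UP\): (?P<iface>\S+): link is not ready\s*$"
)


def group_link_events(lines):
    """Map each timestamp to the interfaces reported 'not ready' at that time."""
    events = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            events[match.group("stamp")].append(match.group("iface"))
    return dict(events)


# Four of the lines from the report, copied verbatim.
sample = [
    "Nov 9 16:50:53 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready",
    "Nov 9 16:50:53 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready",
    "Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready",
    "Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond1: link is not ready",
]

for stamp, ifaces in group_link_events(sample).items():
    print(stamp, ifaces)
```

Every timestamp that lists two interfaces is a moment where the public and private links were reported down together.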



Can anyone please give me more pointers on what could be wrong? I've swapped out the Broadcom NIC for a new one, and I've moved the ports on the switches. I'm now thinking about swapping the hard drives between two of the servers to see whether the problem moves, which would indicate a software problem. These servers need to go into production, and I can't put this box into production like this.

thanks!!!!!

cardy 11-17-2009 10:36 AM

OK, I have set up a number of configurations like this, as it is one of the things the company I work for specialises in.

You have 4 network ports, and looking at your post I am assuming you're using eth0 -> eth3.

I am also assuming that eth0 and eth1 are the onboard ports and that eth2 and eth3 are the Broadcom (i.e. the add-in card).

I would not like to rely on a single chipset in a RAC configuration, so I would bond one port from each card, as below:

Quote:

bond0: eth0 and eth2
bond1: eth1 and eth3

You need to check the bonding mode. Last time I checked, round robin was the simplest and most supported form for RAC. Some methods of bonding require support from the switches and/or switch configuration to work correctly.
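
One way to check the mode and the per-slave link state is to read the bonding driver's status file, `/proc/net/bonding/bond0` (and likewise for bond1). Below is a minimal Python sketch that parses that text. The function name `parse_bonding_status` is hypothetical, and the sample text is an illustrative excerpt in the driver's usual layout, not output from the servers in this thread.

```python
def parse_bonding_status(text):
    """Return the bonding mode, the bond's MII status, and per-slave details."""
    status = {"mode": None, "mii_status": None, "slaves": {}}
    current = None  # name of the slave section being read, if any
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Slave Interface":
            current = value
            status["slaves"][current] = {}
        elif key == "MII Status":
            # Before the first slave section this line describes the bond itself.
            if current is None:
                status["mii_status"] = value
            else:
                status["slaves"][current]["mii_status"] = value
        elif key == "Link Failure Count" and current is not None:
            status["slaves"][current]["link_failures"] = int(value)
    return status


# Illustrative sample only (assumed values, not taken from the poster's system).
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth2
MII Status: down
Link Failure Count: 3
"""

info = parse_bonding_status(SAMPLE)
print(info["mode"])
for name, details in info["slaves"].items():
    print(name, details)
```

On a real node the text would come from reading `/proc/net/bonding/bond0`. A rising link failure count on every slave at once would point away from a single card and towards the switch side or the host itself.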

Ensure that your cables are properly connected and that you have the same configuration on ALL nodes in the cluster. You can't have the public network on bond0 on node 1 and on bond1 on node 2.

You should have 3 IP addresses.

I am assuming that all nodes are set up as follows:

Quote:

bond0 PUBLIC network
bond1 PRIVATE network (should be connected to a switch and should NOT be a crossover cable, as those are not supported)

You will need 3 IPs per Node:

Quote:

Public IP (should be assigned to bond0)
Private IP (should be assigned to bond1)

Virtual IP (should be linked to bond0 but NOT set on the OS, as it's handled by the clusterware).

Finally, looking at the output you have provided, I would recommend that you check the switches and cabling, as the switches don't seem to be patched to the cards correctly. I would have expected to see at least one entry saying

Quote:

kernel: ADDRCONF(NETDEV_UP): bond0: link is ready

To me that shows that the cards don't even see the switches they are patched into (???)
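
Whether a port currently sees a link partner can also be read from sysfs, on kernels that expose `/sys/class/net/<iface>/operstate` and `/sys/class/net/<iface>/carrier`. This is a small Python sketch under that assumption. The helper name `link_state` and its `sysfs_root` parameter are made up for illustration, the latter only so the function can be pointed at a test directory.

```python
import os


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return (operstate, carrier) for an interface; carrier is None if unreadable."""
    base = os.path.join(sysfs_root, iface)
    with open(os.path.join(base, "operstate")) as fh:
        operstate = fh.read().strip()
    try:
        with open(os.path.join(base, "carrier")) as fh:
            carrier = fh.read().strip() == "1"
    except OSError:
        # Reading carrier can fail, for example while the interface is
        # administratively down, so report "unknown" instead of raising.
        carrier = None
    return operstate, carrier


if __name__ == "__main__":
    # Interface names as used in this thread; skip any that do not exist here.
    for name in ("eth0", "eth1", "eth2", "eth3"):
        if os.path.isdir(os.path.join("/sys/class/net", name)):
            print(name, link_state(name))
```

A carrier value of False on a port that is cabled to a switch would support the cabling or switch-port theory.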


All times are GMT -5.