network keeps dropping on two NIC cards
I have 3 HP ProLiant 585 G5s set up to be an Oracle RAC cluster. They are all supposed to be identical, and all are running Red Hat with kernel 2.6.18-128.4.1.el5. They each have an internal Intel e1000e NIC and an additional Broadcom NIC, for a total of 4 network ports. I have a public IP address and a private VLAN IP address assigned to each server. In theory, the ports are bonded as follows -
intel (eth0 on bond0, eth1 on bond1)
broadcom (eth2 on bond0, eth3 on bond1)
bonds are all set up to be failover

Here is the problem - network connectivity on one of the 3 servers keeps dropping. I've run the following configurations, and no matter how they are configured, the network keeps dropping -

bond0/bond1 -- both running on intel (eth0/eth1)
bond0/bond1 -- removed eth1 from bond1, so it is running on eth0/eth3
bond0/bond1 -- removed eth0 from bond0, so all on the broadcom NIC
completely removed the bonds, and tried just running off of
eth0 public, eth1 private
eth2 public, eth1 private
eth2 public, eth3 private

Basically I've tried every possible configuration, yet no matter what, both connections drop at exactly the same time. There is nothing being reported in the logs. sosreport just reports that there is no connectivity --

SOS report -
Nov 9 16:50:53 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 9 16:50:53 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Nov 10 00:31:26 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 00:31:26 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Nov 10 02:10:32 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 02:10:32 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Nov 10 03:19:54 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 03:19:54 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond1: link is not ready
Nov 10 14:51:37 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 14:51:37 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond1: link is not ready

Can anyone please give me more pointers on what could be wrong? I've swapped out the Broadcom NIC with a new NIC.
I'm now thinking about swapping the hard drives between two of the servers to see if the problem moves, which would indicate a software problem. I've also moved the ports on the switches. These servers need to go into production, and I can't put this box into production like this. Thanks!
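To make the "both connections drop at exactly the same time" observation easy to confirm over a longer log, here is a minimal Python sketch that groups "link is not ready" kernel messages by timestamp. It assumes the log lines have exactly the format shown in the post; the function name `group_link_events` and the `SAMPLE` excerpt are just for illustration.

```python
import re
from collections import defaultdict

# Matches kernel lines in the format shown in the post, e.g.
# "Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready"
LINK_NOT_READY = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"ADDRCONF\(NETDEV_UP\): (?P<dev>\w+): link is not ready"
)


def group_link_events(lines):
    """Map each timestamp to the devices that logged 'link is not ready'."""
    events = defaultdict(list)
    for line in lines:
        match = LINK_NOT_READY.match(line.strip())
        if match:
            events[match.group("ts")].append(match.group("dev"))
    return dict(events)


# Excerpt copied from the log in the post (illustration only).
SAMPLE = """\
Nov 9 16:50:53 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 9 16:50:53 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Nov 10 11:12:42 lxmefprd01 kernel: ADDRCONF(NETDEV_UP): bond1: link is not ready
"""

if __name__ == "__main__":
    for timestamp, devices in group_link_events(SAMPLE.splitlines()).items():
        print(timestamp, devices)
```

If every timestamp lists both the public and the private device, the drops are simultaneous across both networks, which points away from a single port or cable.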
OK, I have set up a number of configurations like this, as it is one of the things the company I work for specialises in.
You have 4 network ports and, looking at your description, I am assuming you're using eth0 through eth3. I am also assuming that eth0 and eth1 are the onboard ports and that eth2 and eth3 are the Broadcom (i.e. the add-in card). I would not like to rely on a single chipset in a RAC configuration, so I would bond one port from each card, as below. Quote:
Ensure that your cables are properly connected, and ensure you have the same configuration on ALL nodes in the cluster: you can't have the public network on bond0 on node 1 and on bond1 on node 2. You should have 3 IP addresses. I am assuming that all nodes are set up as follows. Quote:
You will need 3 IPs per Node: Quote:
Quote:
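The two rules in this reply (each bond should span both chipsets, and every node should map the same bond to the same network) can be sketched as a small Python check. This is only an illustration and is not the configuration from the quoted blocks, which are not reproduced here. The port-to-chipset table follows the assumption stated in the reply, and the node names `node1` to `node3` and both function names are hypothetical.

```python
from typing import Dict, List

# Assumed layout from the reply: eth0/eth1 onboard Intel, eth2/eth3 Broadcom.
PORT_CHIPSET = {
    "eth0": "intel",
    "eth1": "intel",
    "eth2": "broadcom",
    "eth3": "broadcom",
}


def bonds_on_single_chipset(bonds: Dict[str, List[str]]) -> List[str]:
    """Return the bonds whose slave ports do not span two chipsets."""
    return sorted(
        bond
        for bond, slaves in bonds.items()
        if len({PORT_CHIPSET[port] for port in slaves}) < 2
    )


def role_mismatches(nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return nodes whose bond-to-network mapping differs from the first node's."""
    names = sorted(nodes)
    reference = nodes[names[0]]
    return [name for name in names[1:] if nodes[name] != reference]


if __name__ == "__main__":
    # One port from each card per bond, as recommended in the reply.
    print(bonds_on_single_chipset({"bond0": ["eth0", "eth2"],
                                   "bond1": ["eth1", "eth3"]}))
    # Hypothetical cluster where node2 has public and private swapped.
    print(role_mismatches({
        "node1": {"bond0": "public", "bond1": "private"},
        "node2": {"bond0": "private", "bond1": "public"},
        "node3": {"bond0": "public", "bond1": "private"},
    }))
```

An empty list from both functions means the layout matches the advice above; it says nothing about whether the links themselves are healthy.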