Linux - General
This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.
Hi,
I am running a 6-node Oracle 11g RAC cluster with Cluster Ready Services on Linux. All the hosts are virtual machines running RHEL 6.6 on ESX 5.5, with a dedicated heartbeat network for the interconnect and jumbo frames enabled.
The issue I'm seeing is that one or two nodes get rebooted when CRS can't communicate with the other nodes. I see the message below in /var/log/messages
I checked the storage I/O and can't find any high utilization; it runs at less than 10% all the time. The network is 10G and is not showing errors on the switch. Memory usage (about 60% on average) and CPU utilization (less than 40%) are normal.
Can someone suggest what other information would be useful to check? I see nothing in the logs indicating errors or waits for disk writes. I believe it's a software issue, but I'm not sure how to prove it. Any suggestions are appreciated.
Since you're in a well-supported environment (RHEL 6.6, ESX, and Oracle 11g), you are paying for support from ALL of those vendors. The best way to diagnose this problem is to contact Oracle. They can have you run a trace and analyze it. If they don't find something, an sosreport sent to Red Hat might, and if neither of those bears fruit, you can then present your findings to VMware.
Thanks for the reply. I forgot to mention that we don't have support from Oracle. I contacted Red Hat and VMware, and they got back to me with no findings, asking me to contact Oracle. We do have a third party for Oracle support, but they are not much help either. They went through the logs and told me it was a network issue, since the logs say CRS rebooted the node because the interconnect was not reachable.
So I was trying to find out whether any of the experts here have faced similar issues and could give me some ideas on where to start troubleshooting.
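Since the logs point at the interconnect and jumbo frames are enabled, one check that may be worth running is whether full-size frames actually pass end to end between the nodes without fragmentation. What follows is only a minimal Python 3 sketch of that idea, not a diagnosis: it assumes an MTU of 9000 on the private network, IPv4 without header options, and the Linux iputils `ping` (`-M do` sets Don't Fragment). The host names `rac1-priv` and `rac2-priv` are hypothetical placeholders for the private interconnect names.

```python
import subprocess

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def ping_command(host, mtu=9000, count=3):
    """Build a ping command that fails if any hop cannot carry a full-size frame."""
    size = icmp_payload_for_mtu(mtu)
    # -M do : set Don't Fragment, so an undersized hop errors out instead of fragmenting
    # -s    : ICMP payload size in bytes
    # -c    : number of echo requests to send
    return ["ping", "-M", "do", "-s", str(size), "-c", str(count), host]


def check_interconnect(hosts, mtu=9000):
    """Run the ping against each host; True means full-size frames got through."""
    results = {}
    for host in hosts:
        proc = subprocess.run(ping_command(host, mtu), capture_output=True, text=True)
        results[host] = (proc.returncode == 0)
    return results


if __name__ == "__main__":
    # Hypothetical private interconnect host names; replace with the real ones.
    for host, ok in check_interconnect(["rac1-priv", "rac2-priv"]).items():
        print(host, "jumbo frames OK" if ok else "full-size ping FAILED")
```

If the full-size ping fails while a default-size ping succeeds, that would suggest some hop (vNIC, vSwitch, or physical switch port) is not configured for the larger MTU. Running it from every node to every other node, ideally around the time of an eviction, gives a more complete picture than a single pair.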
The most telling thing is that Red Hat and VMware told you there weren't any problems, which leaves you with Oracle. The question here is WHY on earth you would run Oracle RAC without support when you're paying for support on everything else.
Oracle can easily tell you what's up. Pay for support, and ask them. If your 'third party' won't help you, then don't pay them, since they're of no use. Pay Oracle directly. An Oracle trace will tell you what's up. It could very well be that a kernel module in RHEL is older (or NEWER) than expected and is causing a problem. Are you patched and current with RHEL?