LinuxQuestions.org
05-20-2015, 10:48 AM   #1
srinishrews
LQ Newbie
 
Registered: Mar 2014
Posts: 16

Rep: Reputation: Disabled
Oracle CRS reboots


Hi,
I am running a 6-node Oracle 11g RAC cluster with Cluster Ready Services (CRS) on Linux. All the hosts are virtual machines running RHEL 6.6 on ESX 5.5, with a dedicated heartbeat network for the interconnect and jumbo frames enabled.


The issue I'm seeing is that one or two nodes get rebooted when CRS can't communicate with the other nodes. I see the message below in /var/log/messages:

exec /apps/crs/GRID/11203/perl/bin/perl -I/apps/crs/GRID/11203/perl/lib /apps/crs/GRID/11203/bin/crswrapexece.pl /apps/crs/GRID/11203/crs/install/s_crsconfig_test01_env.txt /apps/crs/GRID/11203/bin/ohasd.bin "reboot"


I checked the storage I/O and can't find any high utilization; it's running at less than 10% all the time. The network is 10G and is not showing errors on the switch. Memory usage (about 60% on average) and CPU utilization (less than 40%) are normal.

Can someone suggest what other information would be useful to check? I am seeing nothing in the logs that points to errors or waits for disk writes. I believe it's a software issue, but I'm not sure how to prove it. Any suggestions are appreciated.
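One way to gather more evidence is to pull the timestamp of every CRS-triggered reboot out of /var/log/messages and line those times up against switch and ESX host logs. Below is a minimal sketch of that idea, not a tested diagnostic tool. It assumes traditional syslog lines that start with month, day, time, and hostname, and it assumes that any line mentioning both ohasd.bin and "reboot" marks a reboot event. The function name find_crs_reboots is made up for illustration.

```python
import re

# Assumption: traditional syslog prefix, e.g. "May 20 10:41:03 hostname ..."
SYSLOG_PREFIX = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s")


def find_crs_reboots(lines):
    """Return (timestamp, host) pairs for lines mentioning ohasd.bin and reboot."""
    events = []
    for line in lines:
        # Assumption: both substrings together indicate the CRS reboot call.
        if "ohasd.bin" in line and "reboot" in line:
            match = SYSLOG_PREFIX.match(line)
            if match:  # skip lines without a recognizable syslog prefix
                events.append((match.group(1), match.group(2)))
    return events


if __name__ == "__main__":
    import sys

    # Pass one or more log files as arguments, e.g. /var/log/messages.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for timestamp, host in find_crs_reboots(log):
                print(timestamp, host)
```

Running it on each node and comparing the printed times can show whether the reboots cluster around the same moments across nodes, which would point toward the shared interconnect rather than a single host.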
 
05-20-2015, 10:54 AM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,636

Rep: Reputation: 7965
Quote:
Originally Posted by srinishrews
Hi,
I am running a 6-node Oracle 11g RAC cluster with Cluster Ready Services (CRS) on Linux. All the hosts are virtual machines running RHEL 6.6 on ESX 5.5, with a dedicated heartbeat network for the interconnect and jumbo frames enabled.

The issue I'm seeing is that one or two nodes get rebooted when CRS can't communicate with the other nodes. I see the message below in /var/log/messages:

exec /apps/crs/GRID/11203/perl/bin/perl -I/apps/crs/GRID/11203/perl/lib /apps/crs/GRID/11203/bin/crswrapexece.pl /apps/crs/GRID/11203/crs/install/s_crsconfig_test01_env.txt /apps/crs/GRID/11203/bin/ohasd.bin "reboot"

I checked the storage I/O and can't find any high utilization; it's running at less than 10% all the time. The network is 10G and is not showing errors on the switch. Memory usage (about 60% on average) and CPU utilization (less than 40%) are normal.

Can someone suggest what other information would be useful to check? I am seeing nothing in the logs that points to errors or waits for disk writes. I believe it's a software issue, but I'm not sure how to prove it. Any suggestions are appreciated.
Since you're in a well-supported environment (RHEL 6.6, ESX, and Oracle 11g), you are paying for support from ALL of those vendors. The best way to diagnose this problem is to contact Oracle. They can have you run a trace and analyze it. If they don't find something, an sosreport sent to Red Hat might, and if neither of those bears fruit, you can then present your findings to VMware.
 
05-20-2015, 11:02 AM   #3
srinishrews
LQ Newbie
 
Registered: Mar 2014
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by TB0ne
Since you're in a well-supported environment (RHEL 6.6, ESX, and Oracle 11g), you are paying for support from ALL of those vendors. The best way to diagnose this problem is to contact Oracle. They can have you run a trace and analyze it. If they don't find something, an sosreport sent to Red Hat might, and if neither of those bears fruit, you can then present your findings to VMware.
Thanks for the reply. I forgot to mention that we don't have support from Oracle. I contacted Red Hat and VMware, and they got back to me with no findings, asking me to contact Oracle. We do have a third party for Oracle support, but they are not much help either. They went through the logs and told me that it was a network issue, as the logs say CRS rebooted the node because the interconnect was not reachable.

So I was trying to find out whether any of the experts here have faced similar issues and could maybe give me some ideas on where to start troubleshooting.
 
05-20-2015, 12:24 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,636

Rep: Reputation: 7965
Quote:
Originally Posted by srinishrews
Thanks for the reply. I forgot to mention that we don't have support from Oracle. I contacted Red Hat and VMware, and they got back to me with no findings, asking me to contact Oracle. We do have a third party for Oracle support, but they are not much help either. They went through the logs and told me that it was a network issue, as the logs say CRS rebooted the node because the interconnect was not reachable.

So I was trying to find out whether any of the experts here have faced similar issues and could maybe give me some ideas on where to start troubleshooting.
The most telling thing is that Red Hat and VMware told you there weren't any problems, which leaves you with Oracle. The question here is WHY on earth would you run Oracle RAC without support, when you're paying for support on everything else????

Oracle can easily tell you what's up. Pay for support, and ask them. If your 'third party' won't help you, then don't pay them, since they're of no use. Pay Oracle directly. An Oracle trace will tell you what's up. It could very well be that a kernel module that's older (or NEWER) in RHEL is causing a problem. Are you patched/current with RHEL?
 
  



Tags
esx, linux, oracle rac, redhat, vmware


