LinuxQuestions.org
Old 07-16-2009, 02:37 AM   #1
amir_myself
LQ Newbie
 
Registered: Jul 2009
Posts: 6

Rep: Reputation: 0
RHEL kernel booting error


Dear All,

I have two machines running Red Hat Linux, attached to EMC SAN storage and running an active-active Oracle cluster. Yesterday one of the machines went down
due to a power fluctuation. It is a multiprocessor PowerEdge 6800 that uses the Red Hat Enterprise Linux AS (2.4.21-37.ELsmp) kernel by default.
With that kernel, the machine hangs at the messages below. It looks to me like an HBA driver loading problem. When I use a different kernel, Red Hat Enterprise Linux AS-up (2.4.21-37.EL),
the machine boots and attaches to the storage. But I cannot run my Oracle cluster on it, because the kernel has to be Red Hat Enterprise Linux AS (2.4.21-37.ELsmp).
Please help me get through the boot process with the Red Hat Enterprise Linux AS (2.4.21-37.ELsmp) kernel. The error details I get
when using the ELsmp kernel are below.

error message detail
--------------------

loading megaraid_sas.o module
/lib/megaraid_sas.o

Hint: insmod errors can be caused by incorrect module parameters, including invalid
I/O or IRQ parameters. You may find more information in syslog or the output from dmesg.

error: /bin/insmod exited abnormally loading lpfc.o module

Machine Exception 0000000000000000000004



I would be very thankful for your help.
Regards,
Amir
 
Old 07-16-2009, 03:06 AM   #2
vap16oct1984
Member
 
Registered: Jun 2009
Location: INDIA
Distribution: RHEL-5
Posts: 174
Blog Entries: 3

Rep: Reputation: 38

Hi Amir,
Welcome to the Linux world.
Can you tell me a few things about this problem that will help us give you a better solution?

Quote:
Originally Posted by amir_myself
Dear All,

I have two machines running Red Hat Linux, attached to EMC SAN storage and running an active-active Oracle cluster. Yesterday one of the machines went down
due to a power fluctuation. It is a multiprocessor PowerEdge 6800 that uses the Red Hat Enterprise Linux AS (2.4.21-37.ELsmp) kernel by default.
The machine hangs at the messages below. It looks to me like an HBA driver loading problem.
Do both of your Linux machines have the same configuration, hardware, and setup?

You said one server went down; does that mean there is no issue with
the other Linux machine?

When does your Linux machine hang? At boot time, or after booting while some of your processes are running?

Quote:
When I use a different kernel, Red Hat Enterprise Linux AS-up (2.4.21-37.EL),
the machine boots and attaches to the storage. But I cannot run my Oracle cluster on it, because the kernel has to be Red Hat Enterprise Linux AS (2.4.21-37.ELsmp).
Please help me get through the boot process with the Red Hat Enterprise Linux AS (2.4.21-37.ELsmp) kernel.
Why did you use a different kernel? Is this the first time you have faced this issue?
I mean, was everything running fine with the same kernel before the problem appeared?


I/O or IRQ parameters. You may find more information in syslog or the output from dmesg.

Please post the dmesg and syslog output.

Thanks
 
Old 07-16-2009, 04:34 AM   #3
amir_myself
LQ Newbie
 
Registered: Jul 2009
Posts: 6

Original Poster
Rep: Reputation: 0
Dear vap16oct1984,

Thanks for your quick reply.

1) Regarding your first question: do both of your Linux machines have the same configuration, hardware, and setup?

Answer: Yes, both servers have the same configuration, hardware, and setup.

2) You said one server went down; does that mean there is no issue with
the other Linux machine?

Answer: Yes, the other machine is working fine with no issues.

3) When does your Linux machine hang? At boot time, or after booting while some of your processes are running?

Answer: My machine hangs during boot, before any processes are shown. After the countdown on the kernel selection screen finishes, it tries to load the drivers, then prints the messages below and goes no further:

loading megaraid_sas.o module
/lib/megaraid_sas.o
Hint: insmod errors can be caused by incorrect module parameters, including invalid
I/O or IRQ parameters. You may find more information in syslog or the output from dmesg.
error: /bin/insmod exited abnormally loading lpfc.o module
Machine Exception 0000000000000000000004

Note: It says more information can be found in dmesg or syslog, but the machine doesn't boot, so how can I get that information? I even tried booting in single-user mode by passing the option 1 to the kernel, and the same error appears. Anyhow, please find attached the dmesg file from the EL kernel. I could not locate the syslog file.


4) Why did you use a different kernel? Is this the first time you have faced this issue?
I mean, was everything running fine with the same kernel before the problem appeared?

Answer: I used the other kernel (EL) only to check whether the problem is with the HBA or something else, and it booted. But I don't want to use the EL kernel: as you know, my machine is a multiprocessor machine, and my Oracle application service doesn't run on that kernel, which is for single-processor machines. Everything was running fine with the ELsmp kernel previously.

If you need more information, please let me know. I really appreciate your help.

Thanks a lot.
Regards,
Amir
Attached Files
File Type: txt dmesg.txt (14.6 KB, 16 views)
 
Old 07-16-2009, 04:40 AM   #4
amir_myself
LQ Newbie
 
Registered: Jul 2009
Posts: 6

Original Poster
Rep: Reputation: 0
Please find attached the dmesg file of the server which is working fine with ELsmp kernel.

Thanks.
Attached Files
File Type: txt dmesg.txt (35.3 KB, 17 views)
 
Old 07-17-2009, 06:38 AM   #5
vap16oct1984
Member
 
Registered: Jun 2009
Location: INDIA
Distribution: RHEL-5
Posts: 174
Blog Entries: 3

Rep: Reputation: 38
Yes, I checked your dmesg. It looks like an issue between the kernel and the driver modules. Some of the modules are not properly installed, which is why you are facing this issue.

I suggest you reinstall the OS. I hope it will start working again.
Let me know the status after you reinstall.
 
Old 07-17-2009, 05:22 PM   #6
exkor5000
Member
 
Registered: Nov 2003
Distribution: Slackware
Posts: 51

Rep: Reputation: 16
Quote:
Some of the modules are not properly installed, which is why you are facing this issue.
I suggest you reinstall the OS.
Why reinstall the OS??? When you change oil in your car do you replace your windows, doors, and seats?

Amir, first of all try to reinstall the modules/drivers for your fiber, MegaRAID, and MegaSAS, and see if that helps. The reason you could not use your Oracle with the EL kernel is probably related to missing modules/drivers that are present with the ELsmp kernel but are not compiled for the EL kernel. It's hard to see what the problem is without more info. Copy and paste your /var/log/messages and /var/log/faillog. If the two machines are identical, just image one to the other with dd.

In any case, what I think looks problematic in your dmesg log is this:

Code:
megasas: PCI hotplug regisration failed
Code:
SCSI device sdc: 555745280 512-byte hdwr sectors (284542 MB)
 sdc:<6>Device 08:20 not ready.
 I/O error: dev 08:20, sector 0
Device 08:20 not ready.
 I/O error: dev 08:20, sector 0
 unable to read partition table
SCSI device sdd: 307200 512-byte hdwr sectors (157 MB)
 sdd: sdd1
SCSI device sde: 204800 512-byte hdwr sectors (105 MB)
 sde:<6>Device 08:40 not ready.
 I/O error: dev 08:40, sector 0
Device 08:40 not ready.
 I/O error: dev 08:40, sector 0
 unable to read partition table
...
Double check all your configs, and especially the logs for your megasas/megaraid, for more details.
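A minimal sketch of how the log check suggested above could be narrowed down, assuming the logs are readable after booting the EL kernel. The function name `scan_driver_errors` and the keyword lists are illustrative assumptions, not part of any Red Hat tool; lines such as "Device 08:20 not ready" that do not name a driver would need a wider pattern.

```shell
# Print lines from a log file that mention one of the storage drivers
# discussed in this thread AND contain a failure-looking keyword.
# Usage: scan_driver_errors /var/log/messages
scan_driver_errors() {
    logfile=$1
    # The first grep keeps driver-related lines;
    # the second keeps only those that look like failures.
    grep -i -E 'megasas|megaraid|lpfc' "$logfile" |
        grep -i -E 'fail|error|not ready|unable'
}
```

Run it against /var/log/messages and against a saved dmesg output, then compare what each reports.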
 
Old 07-17-2009, 05:41 PM   #7
exkor5000
Member
 
Registered: Nov 2003
Distribution: Slackware
Posts: 51

Rep: Reputation: 16
Actually, why don't you post the dmesg from when you load the EL kernel? Let's see the difference...
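One way to see the difference between two saved dmesg outputs is to filter each down to the storage-related lines and diff the results. This is only a sketch: the function names and the keyword list are assumptions chosen for this thread, and the file names in the usage line are placeholders.

```shell
# Keep only storage-driver lines so the two logs can be compared.
driver_lines() {
    grep -i -E 'megasas|megaraid|lpfc|scsi' "$1"
}

# Show driver-related lines that differ between two saved dmesg outputs.
# Returns diff's exit status: 0 if identical after filtering, 1 if different.
# Usage: compare_dmesg dmesg-working.txt dmesg-failing.txt
compare_dmesg() {
    a=$(mktemp) && b=$(mktemp) || return 1
    driver_lines "$1" > "$a"
    driver_lines "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

Lines prefixed with `<` appear only in the first file and lines prefixed with `>` only in the second.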
 
Old 07-18-2009, 01:06 AM   #8
vap16oct1984
Member
 
Registered: Jun 2009
Location: INDIA
Distribution: RHEL-5
Posts: 174
Blog Entries: 3

Rep: Reputation: 38
Well exkor5000, reinstallation is the best method when you can't do much more, and especially when
you are not sure about the issue. I am not saying it is the only method, but you can't deny it is one of the most effective methods in extreme scenarios where our logic and minds are stuck.

In the current scenario the OP is not able to boot the machine at all, so I don't think you
will get the logs. Moreover, in my experience this is the quickest way to resolve the issue in this scenario.
 
Old 07-18-2009, 01:08 AM   #9
amir_myself
LQ Newbie
 
Registered: Jul 2009
Posts: 6

Original Poster
Rep: Reputation: 0
Sorry, I actually posted the dmesg file from the EL kernel but by mistake said ELsmp, because ELsmp fails right at boot with the error I mentioned in my first post.
Regarding the HBA driver installation: it works fine with the EL kernel, because I can see the storage. But due to some Oracle configuration, the system has to run the ELsmp kernel. This is not a normal Oracle installation but a clustered one, so the kernel was also recorded during installation.

Please guide me on how I can reinstall the lpfc.o module for the ELsmp kernel while running the EL kernel.

Also, this is a production system, so I cannot reinstall the OS. That is the last option, and I don't want to go for it without trying other methods.

Thanks a lot for your help.

Regards,
Amir
 
Old 07-18-2009, 10:50 AM   #10
exkor5000
Member
 
Registered: Nov 2003
Distribution: Slackware
Posts: 51

Rep: Reputation: 16
To recompile the modules all you need to do is go to the kernel source directory (people usually put it in /usr/src/linux-xxxxx) and issue these commands:
Code:
make modules
make modules_install
This is assuming you compiled your kernel this way from source in the first place.

There is a faster way you can try:

Go to /lib/modules and make a copy of your ELsmp kernel modules:
Code:
cp -R ./<ELsmp_KERNELVERSION> ./<ELsmp_KERNELVERSION>.bak
Then copy the fiber, megaSAS, and megaRaid modules (all three) from /lib/modules/<EL_KERNELVERSION> to /lib/modules/<ELsmp_KERNELVERSION>.

Then reboot with ELsmp. What happens?
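Two caveats may apply here. Modules built for the uniprocessor EL kernel are generally not expected to load into the SMP build. Also, the `/lib/megaraid_sas.o` path in the boot messages suggests the drivers are loaded from the initial ramdisk, in which case files changed under /lib/modules would only take effect after the initrd is rebuilt. The sketch below prints (but does not run) a backup and rebuild sequence for review; the `/boot/initrd-<version>.img` naming is the usual Red Hat convention and is an assumption about this system, and `initrd_rebuild_plan` is a hypothetical helper name.

```shell
# Print, without executing, the commands to back up and rebuild the
# initrd for a given kernel version so they can be reviewed first.
# Usage: initrd_rebuild_plan 2.4.21-37.ELsmp
initrd_rebuild_plan() {
    kver=$1
    img=/boot/initrd-$kver.img   # assumed image path, check the boot loader config
    echo "cp -p $img $img.bak"
    echo "mkinitrd -f -v $img $kver"
}
```

On a production machine it is probably worth confirming the initrd path in the boot loader configuration and keeping the `.bak` copy until the ELsmp kernel boots cleanly.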
 
Old 07-18-2009, 10:54 AM   #11
exkor5000
Member
 
Registered: Nov 2003
Distribution: Slackware
Posts: 51

Rep: Reputation: 16
Vap, the problem here is with the kernel modules; it is clear as water.
It is better to try to isolate the problem. At worst, reinstall the kernel, not the OS. Plus, you learn absolutely nothing by just reinstalling every time a problem pops up.
 
Old 07-19-2009, 04:24 AM   #12
amir_myself
LQ Newbie
 
Registered: Jul 2009
Posts: 6

Original Poster
Rep: Reputation: 0
Dear exkor5000,

I tried the second method you mentioned (quoted below), but with no success. I received the same error message:

Hint: insmod errors can be caused by incorrect module parameters, including invalid
I/O or IRQ parameters. You may find more information in syslog or the output from dmesg.
error: /bin/insmod exited abnormally loading lpfc.o module


(
There is a faster way you can try:
go to /lib/modules and make a copy of your ELsmp kernel modules:

Code:
cp -R ./<ELsmp_KERNELVERSION> ./<ELsmp_KERNELVERSION>.bak
Then copy the fiber, megaSAS, and megaRaid modules (all three) from /lib/modules/<EL_KERNELVERSION> to /lib/modules/<ELsmp_KERNELVERSION>.
)

Note: I even copied the fiber, megaSAS, and megaRaid modules from the ELsmp kernel on my working machine, and it still gave me the same error message.

As this is a production system, I did not try your first option.
I really appreciate your help. Please advise me on what to do next.

Thanks.
Kind Regards,
Amir
 
Old 07-19-2009, 10:30 AM   #13
exkor5000
Member
 
Registered: Nov 2003
Distribution: Slackware
Posts: 51

Rep: Reputation: 16
OK, then try to load the modules manually, one by one, and see what happens.

Boot the system with the EL kernel.
Turn off megaSAS, megaRaid, and fiber so they do not start at boot in the current RC script.
You can do it either with this command:
Code:
ntsysv
or
Code:
chkconfig <name> <on|off>
To list all runlevels:
Code:
chkconfig --list
Then boot the system with ELsmp.
When you log in, try to load the modules by hand using:
Code:
modprobe <name>
That way you are loading them without any parameters, so let's see if that's the problem. Errors and messages will also be logged, so if you get a kernel panic you can reboot and read the logs.

If that's not the problem, then you most likely have something wrong with the IRQ table in your ELsmp kernel.
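The manual loading step could be wrapped in a small loop that reports each result, roughly as sketched below. `try_modules` and the `LOADER` variable are hypothetical names introduced here; `LOADER` defaults to `modprobe` and exists only so the loop can be rehearsed with a harmless stand-in before touching a production machine.

```shell
# Try loading each named module in turn and report the outcome.
# LOADER defaults to modprobe; override it to rehearse the loop safely.
# Usage: try_modules megaraid_sas lpfc
try_modules() {
    loader=${LOADER:-modprobe}
    for mod in "$@"; do
        sync    # flush pending writes in case the next load hangs the machine
        if "$loader" "$mod"; then
            echo "$mod: loaded"
        else
            echo "$mod: FAILED (exit status $?)"
        fi
    done
}
```

If a load hangs the machine, the last module named on the console before the hang is the one to look at first.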
 
Old 07-19-2009, 11:32 PM   #14
exkor5000
Member
 
Registered: Nov 2003
Distribution: Slackware
Posts: 51

Rep: Reputation: 16
One note: if the OS is sitting on the RAID/SAS, don't disable megaRAID or megaSAS...
 
Old 07-20-2009, 06:39 AM   #15
amir_myself
LQ Newbie
 
Registered: Jul 2009
Posts: 6

Original Poster
Rep: Reputation: 0
Dear,

Please find attached a file containing the output of chkconfig --list from the server that does not boot with ELsmp. The command was run under the EL kernel.

Actually, I am not sure which modules need to be turned off in the attached list.
Please guide me.

Thanks a lot
Attached Files
File Type: txt chkconfig.txt (5.6 KB, 11 views)
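The drivers may not appear in the `chkconfig --list` output at all, since that command lists init scripts rather than kernel modules. On 2.4-era Red Hat systems, storage drivers are typically declared as `scsi_hostadapter` aliases in /etc/modules.conf, so that file is probably the place to look. A minimal sketch, assuming that layout (`list_hostadapters` is a hypothetical helper name):

```shell
# List the modules named in scsi_hostadapter aliases of a
# modules.conf-style file; commented-out lines are ignored.
# Usage: list_hostadapters /etc/modules.conf
list_hostadapters() {
    awk '$1 == "alias" && $2 ~ /^scsi_hostadapter[0-9]*$/ { print $3 }' "$1"
}
```

Comparing this list between the working and the failing server would show whether both are configured to load the same drivers.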
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration