LinuxQuestions.org


timvandijk039 07-17-2009 06:48 AM

mptscsih problem in Debian5 (Lenny) with VMWare ESX3.5
 
I need help with a problem that recently appeared on all our Debian 5 based servers in our VMware ESX 3.5 cluster.

They all reported the following problems:

Jul 17 09:00:43 athene kernel: [61207.003123] mptscsih: ioc0: attempting task abort! (sc=df262480)
Jul 17 09:00:43 athene kernel: [61207.003196] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 05 1a ad 00 00 08 00
Jul 17 09:00:43 athene kernel: [61207.113976] mptscsih: ioc0: task abort: SUCCESS (sc=df262480)
Jul 17 09:00:43 athene kernel: [61207.114061] mptscsih: ioc0: attempting task abort! (sc=df262080)
Jul 17 09:00:43 athene kernel: [61207.114101] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 93 e3 4d 00 00 08 00
Jul 17 09:00:43 athene kernel: [61207.226139] mptscsih: ioc0: task abort: SUCCESS (sc=df262080)
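
For readers unfamiliar with the CDB dump in these lines: a Write(10) CDB carries the opcode in byte 0, the starting logical block address in bytes 2-5 (big-endian) and the transfer length in blocks in bytes 7-8. The sketch below decodes the bytes exactly as the kernel printed them. The function name decode_cdb10 is made up for this illustration and is not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: decode the 10 hex bytes of a SCSI Write(10) CDB
# as printed in the kernel log ("CDB: Write(10): 2a 00 ...").
# Layout: byte 0 opcode, byte 1 flags, bytes 2-5 LBA (big-endian),
# byte 6 group number, bytes 7-8 transfer length in blocks, byte 9 control.
decode_cdb10() {
    if [ "$#" -ne 10 ]; then
        echo "usage: decode_cdb10 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9" >&2
        return 1
    fi
    opcode=$1
    # Positional parameters 3..6 are CDB bytes 2..5 (the LBA).
    lba=$(( 0x$3$4$5$6 ))
    # Positional parameters 8..9 are CDB bytes 7..8 (the block count).
    blocks=$(( 0x$8$9 ))
    echo "opcode=0x$opcode lba=$lba blocks=$blocks"
}

# The first aborted command from the log above:
decode_cdb10 2a 00 00 05 1a ad 00 00 08 00
```

Both aborted commands decode to ordinary 8-block writes (4 KiB, assuming 512-byte sectors), so the aborts look like normal small writes timing out rather than unusual commands.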


According to our VMware administrator no updates were done at all, so I guess we can rule that out for now.

Here is a description of one of the affected systems:

athene:~# uname -a
Linux athene 2.6.26-2-686 #1 SMP Thu May 28 15:39:35 UTC 2009 i686 GNU/Linux


VMWare tools version 3.5.0 build 153875

athene:/# mount
/dev/sda1 on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/sda9 on /home type ext3 (rw)
/dev/sda8 on /tmp type ext3 (rw)
/dev/sda5 on /usr type ext3 (rw)
/dev/sda6 on /var type ext3 (rw)
nas01:/vol/backup on /backup type nfs (rw,addr=xxx.xxx.xxx.xxx)
/dev/hda on /media/cdrom0 type iso9660 (ro,noexec,nosuid,nodev)


athene:/# lsmod
Module Size Used by
...
...
mptspi 12936 6
mptscsih 21760 1 mptspi
mptbase 51108 2 mptspi,mptscsih
scsi_transport_spi 19840 1 mptspi
scsi_mod 129324 5 libata,sd_mod,mptspi,mptscsih,scsi_transport_spi
...
...


athene:~# modinfo mptscsih
filename: /lib/modules/2.6.26-2-686/kernel/drivers/message/fusion/mptscsih.ko
version: 3.04.06
license: GPL
description: Fusion MPT SCSI Host driver
author: LSI Corporation
srcversion: 1F2950F4B9A626767D8FC96
depends: scsi_mod,mptbase
vermagic: 2.6.26-2-686 SMP mod_unload modversions 686

athene:/# modinfo mptspi
filename: /lib/modules/2.6.26-2-686/kernel/drivers/message/fusion/mptspi.ko
version: 3.04.06
license: GPL
description: Fusion MPT SPI Host driver
author: LSI Corporation
srcversion: F584C8912673447E1C7755A
alias: pci:v00001000d00000040sv*sd*bc*sc*i*
alias: pci:v0000117Cd00000030sv*sd*bc*sc*i*
alias: pci:v00001000d00000030sv*sd*bc*sc*i*
depends: mptscsih,scsi_mod,mptbase,scsi_transport_spi
vermagic: 2.6.26-2-686 SMP mod_unload modversions 686
parm: mpt_saf_te: Force enabling SEP Processor: enable=1 (default=MPTSCSIH_SAF_TE=0) (int)




Our Windows systems on the same ESX cluster (yes, it's an ESX HA cluster) don't seem to have a problem. However, the Windows hosts report that they can't find their disk, but don't see that as a problem :-/ .
The Debian servers are all based on the same VMware template!

Even after searching the forums on the internet I haven't been able to figure out what's causing this.
Can anyone help me with this issue or point me in the right direction?

Thanks in advance.

Tim van Dijk

timvandijk039 07-29-2009 04:50 AM

Our solution
 
In reply to my own post:

It seems that we found the solution to this problem. Hopefully this solution is helpful to someone with a similar problem.

The problem was found on our NetApp. On our NetApp we had several scheduled processes, and the scheduled snapshots turned out to be the cause of our problem.

More specifically:
1. Aggregate snapshots
2. Volume snapshot of Datastore-volume

We disabled the aggregate snapshots and the datastore volume snapshot. After this we enabled the SnapMirror/SnapManager option with our VirtualCenter server. This feature backs up our VMs with the great NetApp snapshot functionality, but temporarily suspends the VMs while doing it. Because each VM is suspended (for a second or two) during the snapshot, there is no filesystem corruption.

To check the scheduling of your NetApp snapshot tasks, you can enter the following commands on the NetApp command line interface:

Code:

snap sched      (for volume snapshots)
snap sched -A    (for aggregate snapshots)

To disable a volume snapshot:

Code:

snap sched <volume_name> 0

To disable an aggregate snapshot:

Code:

snap sched -A <aggregate_name> 0
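
To confirm from the guest side that the aborts line up with the snapshot schedule (and that they stop once the schedule is disabled), the abort messages can be counted per hour. This is only a sketch: the function name count_aborts_by_hour is made up here, and it assumes syslog-style lines like the ones quoted in the first post, which on a default Debian install normally end up in /var/log/kern.log.

```shell
#!/bin/sh
# Hypothetical helper: count mptscsih "attempting task abort" messages per hour.
# Reads syslog-style lines ("Jul 17 09:00:43 host kernel: ...") from stdin
# and prints one "count month day hour:00" line per hour that saw aborts.
count_aborts_by_hour() {
    grep 'mptscsih: .*attempting task abort' |
        awk '{ split($3, t, ":"); print $1, $2, t[1] ":00" }' |
        sort | uniq -c
}

# Example (the log path is an assumption, adjust it to your syslog setup):
# count_aborts_by_hour < /var/log/kern.log
```

If the busy hours match the times shown by snap sched, the snapshots are the likely trigger.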

Cheers...

