mptscsih problem in Debian 5 (Lenny) with VMware ESX 3.5
I need help with a problem that recently appeared on all our Debian 5 based servers within our VMware ESX 3.5 cluster.
They all reported the following problems:

Jul 17 09:00:43 athene kernel: [61207.003123] mptscsih: ioc0: attempting task abort! (sc=df262480)
Jul 17 09:00:43 athene kernel: [61207.003196] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 05 1a ad 00 00 08 00
Jul 17 09:00:43 athene kernel: [61207.113976] mptscsih: ioc0: task abort: SUCCESS (sc=df262480)
Jul 17 09:00:43 athene kernel: [61207.114061] mptscsih: ioc0: attempting task abort! (sc=df262080)
Jul 17 09:00:43 athene kernel: [61207.114101] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 93 e3 4d 00 00 08 00
Jul 17 09:00:43 athene kernel: [61207.226139] mptscsih: ioc0: task abort: SUCCESS (sc=df262080)

According to our VMware administrator, no updates were done at all, so I guess we can rule that out for now.

Here is a description of one of the problem systems:

athene:~# uname -a
Linux athene 2.6.26-2-686 #1 SMP Thu May 28 15:39:35 UTC 2009 i686 GNU/Linux

VMware Tools version 3.5.0 build 153875

athene:/# mount
/dev/sda1 on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/sda9 on /home type ext3 (rw)
/dev/sda8 on /tmp type ext3 (rw)
/dev/sda5 on /usr type ext3 (rw)
/dev/sda6 on /var type ext3 (rw)
nas01:/vol/backup on /backup type nfs (rw,addr=xxx.xxx.xxx.xxx)
/dev/hda on /media/cdrom0 type iso9660 (ro,noexec,nosuid,nodev)

athene:/# lsmod
Module                  Size  Used by
... ...
mptspi                 12936  6
mptscsih               21760  1 mptspi
mptbase                51108  2 mptspi,mptscsih
scsi_transport_spi     19840  1 mptspi
scsi_mod              129324  5 libata,sd_mod,mptspi,mptscsih,scsi_transport_spi
... ...

athene:~# modinfo mptscsih
filename: /lib/modules/2.6.26-2-686/kernel/drivers/message/fusion/mptscsih.ko
version: 3.04.06
license: GPL
description: Fusion MPT SCSI Host driver
author: LSI Corporation
srcversion: 1F2950F4B9A626767D8FC96
depends: scsi_mod,mptbase
vermagic: 2.6.26-2-686 SMP mod_unload modversions 686

athene:/# modinfo mptspi
filename: /lib/modules/2.6.26-2-686/kernel/drivers/message/fusion/mptspi.ko
version: 3.04.06
license: GPL
description: Fusion MPT SPI Host driver
author: LSI Corporation
srcversion: F584C8912673447E1C7755A
alias: pci:v00001000d00000040sv*sd*bc*sc*i*
alias: pci:v0000117Cd00000030sv*sd*bc*sc*i*
alias: pci:v00001000d00000030sv*sd*bc*sc*i*
depends: mptscsih,scsi_mod,mptbase,scsi_transport_spi
vermagic: 2.6.26-2-686 SMP mod_unload modversions 686
parm: mpt_saf_te: Force enabling SEP Processor: enable=1 (default=MPTSCSIH_SAF_TE=0) (int)

Our Windows systems on the same ESX cluster (yes, it's an ESX HA cluster) don't seem to have a problem. The Windows hosts do report that they can't find their disk, but don't see that as a problem :-/ . The Debian servers are all based on the same VMware template!

Even with the forums on the internet I haven't been able to figure out what's causing this. Can anyone help me with this issue or point me in the right direction?

Thanks in advance.
Tim van Dijk
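
For anyone reading the log lines above: the hex bytes after "CDB: Write(10):" can be decoded to see which blocks the aborted writes were aimed at. Below is a minimal Python sketch, assuming the standard SCSI WRITE(10) layout (opcode 0x2a, a 4-byte big-endian logical block address in bytes 2 to 5, a 2-byte transfer length in bytes 7 to 8). The function name `decode_write10` is made up for illustration and is not part of any tool mentioned in this thread.

```python
def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Returns (logical_block_address, number_of_blocks).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, blocks


# The two CDBs from the kernel log in the post
for cdb in ("2a 00 00 05 1a ad 00 00 08 00",
            "2a 00 00 93 e3 4d 00 00 08 00"):
    lba, blocks = decode_write10(cdb)
    print(f"LBA {lba}, {blocks} blocks")
```

Both aborted commands turn out to be small 8-block writes (4 KiB each if the disk uses 512-byte blocks, which is an assumption here), so the aborts do not look like they were triggered by unusually large requests.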
Our solution
In reply to my own post:
It seems that we have found the solution to this problem. Hopefully it is helpful to someone with a similar problem.

The problem was in our NetApp. We had several scheduled processes running on it, and the scheduled snapshots turned out to be the cause of our problem. More specifically:
1. Aggregate snapshots
2. Volume snapshot of the datastore volume

We disabled the aggregate snapshots and the datastore volume snapshot. After this we enabled the snapmirror/snapmanager option with our virtual center server. This feature backs up our VMs with the great NetApp snapshot functionality, but temporarily suspends the VMs while doing it. Because the VM is suspended (for a second or two), there is no filesystem corruption.

To check the scheduling of your NetApp snapshot tasks, you can enter the following command on the NetApp command line interface:
Code:
snap sched
(for volume snapshots)
Code:
snap sched <volume_name> 0
Code:
snap sched -A <aggregate_name> 0

Cheers...
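
If you want to check whether your own task aborts line up with a snapshot schedule, counting the abort messages per minute of the day makes a recurring pattern easy to spot. The following is a rough Python sketch, assuming syslog lines in the same format as the ones in the first post. The function name `abort_minutes` and the regular expression are my own illustration, not something from the driver or from NetApp.

```python
import re
from collections import Counter

# Matches syslog lines such as:
# Jul 17 09:00:43 athene kernel: [61207.003123] mptscsih: ioc0: attempting task abort! (sc=df262480)
ABORT_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+(?P<hhmm>\d{2}:\d{2}):\d{2}\s+"
    r"(?P<host>\S+)\s+kernel:.*mptscsih: \w+: attempting task abort!"
)


def abort_minutes(lines):
    """Count 'attempting task abort!' messages per (host, HH:MM)."""
    counts = Counter()
    for line in lines:
        m = ABORT_RE.match(line)
        if m:
            counts[(m["host"], m["hhmm"])] += 1
    return counts


# Sample lines copied from the first post
sample = [
    "Jul 17 09:00:43 athene kernel: [61207.003123] mptscsih: ioc0: attempting task abort! (sc=df262480)",
    "Jul 17 09:00:43 athene kernel: [61207.113976] mptscsih: ioc0: task abort: SUCCESS (sc=df262480)",
    "Jul 17 09:00:43 athene kernel: [61207.114061] mptscsih: ioc0: attempting task abort! (sc=df262080)",
]

for (host, hhmm), n in sorted(abort_minutes(sample).items()):
    print(host, hhmm, n)  # prints: athene 09:00 2
```

To run it on a real system, pass an open log file instead of the sample list. Which file the kernel messages end up in depends on your syslog configuration. If the same minutes keep showing up day after day, compare them with the times reported by snap sched.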