Checking raw disks

brendanmcdonald · 04-18-2005, 09:05 PM

Hi, I'm running a Red Hat Linux ES 3.0 cluster with 2 disks installed in each machine. We also have shared disks.

How do I find the disk information for the local disks, and how do I tell whether they're mirrored? I'm told the cluster disks are raw disks with Oracle installed on them. How do I check this?

Any help would be greatly appreciated.

Tinkster · 04-19-2005, 03:22 PM

I assume they're a RAID-system of sorts attached
to a SCSI(RAID) controller? In that case (if they
were an external RAID) they'd represent themselves
to the OS as ONE disk, and there's no way (unless
the controller has Linux management software) to
check it out without physical access... if it's ordinary
SCSI disks you can find them in /proc/scsi/scsi
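
As a rough sketch of where to look, assuming a stock kernel with /proc mounted: the script below prints /proc/scsi/scsi, /proc/partitions and /proc/mdstat when they exist, and lists raw device bindings with `raw -qa` if the raw utility is installed. The `show_proc` helper is a made-up name for illustration. Note that /proc/mdstat only covers Linux software RAID, so a mirror built inside a hardware controller will not show up there.

```shell
#!/bin/sh
# show_proc is a hypothetical helper: print a file with a header if it
# is readable, otherwise report that it is absent.
show_proc() {
    if [ -r "$1" ]; then
        echo "== $1 =="
        cat "$1"
    else
        echo "== $1 not present =="
    fi
}

show_proc /proc/scsi/scsi     # SCSI devices the kernel has detected
show_proc /proc/partitions    # every block device and partition
show_proc /proc/mdstat        # Linux software RAID (md) arrays, if any

# Raw device bindings, only if the raw(8) utility is installed.
if command -v raw >/dev/null 2>&1; then
    raw -qa || echo "raw -qa failed (it may need root or the raw driver)"
else
    echo "raw utility not found"
fi
```

Run it as root on each node; an empty /proc/mdstat together with a single large disk usually just means any mirroring is done below the OS, not that there is none.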

If you provide a bit more info you may get answers
more suitable to your actual situation, e.g. Hardware
details ...

Cheers,
Tink

brendanmcdonald · 04-20-2005, 01:32 AM

Thanks for that.

I've found that we have 3 shared disks on a SCSI array (c0d1, c0d2 and c0d3). I've found the SCSI RAID controllers (cciss). I've also found the partition in question, which houses the Oracle datafiles; it is mounted as /dev/cciss/c0d3 and is an ext2 filesystem. See below.

[root@fox_rad1 etc]# lsdev
Device           DMA   IRQ  I/O Ports
------------------------------------------------
ATI                         2400-24ff
cascade            4     2
cciss                       3000-30ff
cciss0                  30
dma                         0080-008f
dma1                        0000-001f
dma2                        00c0-00df
eepro100                    4000-403f
eth0                    29
eth2                    24
fpu                         00f0-00ff
ide0                    14  01f0-01f7 03f6-03f6
Intel                       4000-403f
keyboard                 1  0060-006f
Mouse                   12
PCI                         0cf8-0cff 1800-18ff 2800-28ff 3000-30ff
pic1                        0020-003f
pic2                        00a0-00bf
rtc                      8  0070-007f
serial                      03f8-03ff
ServerWorks                 0170-0177 01f0-01f7 0376-0376 03f6-03f6 2000-200f
timer                    0  0040-005f
usb-ohci                 7
vga+                        03c0-03df

[root@fox_rad1 proc]# dmesg | grep cciss

cciss: Device 0xb178 has been found at bus 1 dev 3 func 0
cciss: not using DAC cycles
cciss/c0d0: p1 p2 p3 p4 < p5 p6 p7 p8 p9 >
cciss/c0d1: unknown partition table
cciss/c0d2: unknown partition table
cciss/c0d3: unknown partition table
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss0(104,2), internal journal
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss0(104,1), internal journal
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss0(104,3), internal journal
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss0(104,7), internal journal
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss0(104,9), internal journal
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss0(104,6), internal journal
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss0(104,5), internal journal
cciss0: No device changes detected.
scsi0 : cciss0

/proc/partitions:
=================

major minor #blocks name rio rmerge rsect ruse wio wmerge wsect wuse running use aveq

104 0 35561280 cciss/c0d0 513115 771147 10188614 3832590 73494568 276655809 -1493076556 -1977094516 0 611404480 -1973236436
104 1 101984 cciss/c0d0p1 291 13848 28278 2320 1114 1100 4428 772800 0 429890 775120
104 2 2097120 cciss/c0d0p2 21017 38727 477970 60530 32083034 166073948 1585426056 456015590 0 422275060 456062110
104 3 15361200 cciss/c0d0p3 180685 345530 4209646 3219960 9028387 17769544 214424912 498474760 0 288261960 501692260
104 4 1 cciss/c0d0p4 1 0 2 10 0 0 0 0 0 10 10
104 5 4194224 cciss/c0d0p5 37746 102498 1122102 177100 15545942 24023049 316616720 508766930 0 395514390 508938270
104 6 4194224 cciss/c0d0p6 263838 226838 3925296 344700 7197977 46848927 432796056 488947620 0 295791470 489336790
104 7 2097104 cciss/c0d0p7 8229 38273 371896 18250 7989 32279 324120 9714450 0 581420 9732710
104 8 2097104 cciss/c0d0p8 1074 5392 51722 7300 1592 28022 237016 1347740 0 18360 1355170
104 9 2097104 cciss/c0d0p9 183 29 1576 2260 9628533 21878940 252061432 353842610 0 342442750 353841190
104 16 130560 cciss/c0d1 52673408 14 71046618 17698160 52819425 0 52821945 48126750 0 65676830 65799880
104 32 130560 cciss/c0d2 48210862 14 66584077 16437140 52819425 0 52821945 38342780 0 54716890 54759390
104 48 71000160 cciss/c0d3 3086570 1515176 29848946 1087330 20503454 14568764 280576514 18746270 0 19131260 19825570
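
To pull just the whole logical drives and their sizes out of a listing like the one above, something along these lines could be used. It is only a sketch: `list_cciss_drives` is a name made up here, it assumes the device names follow the cciss/cXdY pattern shown above, and it relies on the #blocks column being in 1 KB units.

```shell
#!/bin/sh
# list_cciss_drives is a hypothetical helper. It reads
# /proc/partitions-style text on stdin and prints each whole cciss
# logical drive (a name with no pN partition suffix) with its size
# in MB, truncated to a whole number.
list_cciss_drives() {
    awk '$4 ~ /^cciss\/c[0-9]+d[0-9]+$/ { printf "%s %d MB\n", $4, $3 / 1024 }'
}

# Only read the live file if it exists on this machine.
if [ -r /proc/partitions ]; then
    list_cciss_drives < /proc/partitions
fi
```

For the listing above this gives roughly 34727 MB for c0d0, 127 MB each for c0d1 and c0d2, and 69336 MB for c0d3.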

Now that I know that the shared disk (I'm assuming it's 3 disks configured as RAID 5) has a filesystem, my next problem is that I'm seeing the following error message in /var/log/messages:

Apr 17 04:03:02 fox_rad1 syslogd 1.4.1: restart.
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <crit> clu_check_checksum: expected = 0x192fa090 observed = 0x1ccfa82f
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <emerg> diskLseekRawReadChecksum: bad check sum, part = 0 offset = 15872 len = 36
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <crit> clu_check_checksum: expected = 0x192fa090 observed = 0x816a784c
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <emerg> diskLseekRawReadChecksum: bad check sum, part = 1 offset = 15872 len = 36
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <crit> clu_check_checksum: expected = 0x192fa090 observed = 0x1ccfa82f
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <emerg> diskLseekRawReadChecksum: bad check sum, part = 0 offset = 15872 len = 36
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <emerg> diskRawReadShadow: checksums bad on both partitions
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <err> readServiceBlock: bad ret -1 from diskRawReadShadow
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <err> readSomeServiceBlocks: unable to read service block 11
Apr 17 05:32:44 fox_rad1 cluscand[1366]: <err> scanSomeServiceBlocks: read of service blocks failed.
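
To see how often this happens and which partitions and offsets are involved, the messages could be summarized with a small filter like the one below. This is only a sketch: `summarize_bad_checksums` is a made-up name, it assumes the "part = N offset = M" wording shown in the lines above, and it says nothing about the cause of the errors.

```shell
#!/bin/sh
# summarize_bad_checksums is a hypothetical helper. It reads syslog
# text on stdin and counts cluscand "bad check sum" lines per
# partition number and offset, as reported in each message.
summarize_bad_checksums() {
    awk '/cluscand/ && /bad check sum/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "part")   part = $(i + 2)   # skip the "=" field
            if ($i == "offset") off  = $(i + 2)
        }
        count["part=" part " offset=" off]++
    }
    END { for (k in count) print k " count=" count[k] }' | sort
}

# Only read the live log if it is readable (normally requires root).
if [ -r /var/log/messages ]; then
    summarize_bad_checksums < /var/log/messages
fi
```

Run over only the lines quoted above, it reports part=0 offset=15872 twice and part=1 offset=15872 once.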

Could someone please give me some pointers as to what this error message means?

As this is a production cluster, we don't want to fail the cluster over in case the shared disks don't come up again.

Thanks