[SOLVED] Adding new storage to RHEL 5.11 HA cluster w/Oracle
So I found myself with a new job in a mixed Windows/Linux environment where previously I worked primarily with Windows. I'm no stranger to Linux, but I'm a little rusty.
I have a Red Hat 5.11 failover cluster hosting clustered Oracle. The cluster has been up for several years, but we recently added more storage to the shared enclosure. I've presented the storage to both hosts, and (like Windows) the next step would be to partition, format, mount, etc.
I ran fdisk -l and the drives are not listed, so (like Windows) I believe I need to rescan the SCSI bus to detect the new drives without having to reboot.
I have read that there is an sg3_utils package for RHEL, and that package includes a script to rescan the SCSI bus: rescan-scsi-bus.sh. I suppose I should also ask how I can verify whether that was used, and how I can tell what clustering mechanism was installed if it wasn't.
If I run the rescan-scsi-bus.sh script, I want to make sure it does not take any drives offline, because this is live production. I think I just need to rescan the bus, partition the new drive, write a filesystem to it, and I'm golden.
First off, I'll mention that RHEL5 is end of life. You should urge the powers that be to move to at least RHEL6, or RHEL7 if possible. You won't be getting security updates any longer, and we've already discovered that things like TLS 1.1 and higher simply aren't supported on RHEL5.
Secondly, there are many types of clustering. I suspect you may be running Oracle RAC or GRID if what is failing over is the database. RAC allowed for OCFS (or OCFS2), which are shared filesystem types created by Oracle itself. It also allowed for ASM, which is storage on raw devices rather than on a filesystem. (GRID is just an extension of RAC, which was an extension of the older Oracle Parallel Server.)
Other clusters exist, including Linux native clustering and Veritas Cluster Server (VCS). One might use RAC/GRID in tandem with another clustering methodology, or just use the other clustering methodology to fail over devices.
If
A: You're only using RAC/GRID clustering
B: The devices you are using for OCFS or ASM are multipath devices (e.g. Linux native multipath or EMC PowerPath, i.e. NOT /dev/sd*)
Then doing the rescan shouldn't trigger any events because only one path at a time would go down on the rescan.
If you are using other clustering then it depends on how the clustering was configured. If the cluster treats the multipath device as critical but NOT the underlying sd devices of that multipath then again it shouldn't trigger a failover. If on the other hand it DOES treat individual sd devices as critical it might trigger a failover because one of the sd paths might go away and return during the rescan (and then the other after the first is back up).
We've done rescan safely on our Oracle RAC/GRID servers on RHEL6 using ASM without causing any sort of failover.
So some things to examine:
1) "ps -ef |grep pmon" - Does this show the SID for your database AND a process for ASM?
2) "cat /etc/mtab" to see what filesystem types you have mounted. Are any ocfs or ocfs2?
3) "ps -ef |grep multipath" - Is multipath daemon running?
4) "multipath -l -v2" - Does it show multipath devices and sd components of same?
5) "who -r" - This shows what run level you are in (usually 3 or 5).
6) "ls -l /etc/rc#.d/S*" (where "#" is the run level from step 5) - This shows which init scripts are started in the specified run level. Looking through those names may give you an idea of any clustering you may be running.
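The checks above can be bundled into one small script. This is just a sketch: run_check is a hypothetical helper I'm adding for illustration, and the commands it wraps are exactly the ones from steps 1-5 (step 6 needs the run level from step 5 substituted in by hand).

```shell
#!/bin/sh
# Sketch of the checklist above. run_check is a hypothetical helper that
# prints a header, runs the check, and notes when nothing is found.
run_check() {
    desc=$1; shift
    echo "== $desc =="
    "$@" 2>/dev/null || echo "   (nothing found / not available here)"
}

run_check "Oracle/ASM pmon processes" sh -c "ps -ef | grep '[p]mon'"
run_check "mounted filesystem types"  cat /etc/mtab
run_check "multipath daemon"          sh -c "ps -ef | grep '[m]ultipath'"
run_check "multipath topology"        /sbin/multipath -l -v2
run_check "current run level"         who -r
```

The bracketed grep patterns ([p]mon, [m]ultipath) keep the grep process itself out of its own results.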
Installing "sg3_utils" and "lsscsi" is safe to do even if you don't run the rescan. You can install both with "yum install".
Turning on the Linux multipath daemon should NOT be done if you're running another multipath tool like EMC PowerPath. It will cause a system panic.
P.S. Red Hat is disabling RHN (originally used for software updates) in favor of RHSM (Red Hat Subscription Management) as of 31-Jul. Make sure your RHEL 5.11 is already on RHSM before then, otherwise things like yum won't find files to download.
Last edited by MensaWater; 07-13-2017 at 12:26 PM.
I realize 5.11 is EOL and yeah, the DBAs want to upgrade Oracle to 12c and I'm pushing for upgrading the version of Redhat. Like I said I'm new on this job LOL!
The shared storage is a Dell MD3200 (SAS) DAS, not really a -true- SAN.
The /etc/multipath.conf shows the following Dell entries uncommented, confirming that:
    vendor  "DELL"
    product "MD32xxi"
Oracle processes are running, and the multipath daemon is running, and I have root access.
You didn't list "multipath -l -v2" output to verify the devices are under multipath control.
The ASM pmon suggests you're using ASM vs OCFS.
You can run "lsscsi" to get a list of devices and which storage they're associated with.
I've not worked with the MD3200 but this page suggests it does allow for Linux multipath configuration and in fact shows configs for RHEL5.x that would likely still be relevant for your RHEL5.11.
It suggests a patch and other utilities (including a rescan utility) for the PowerVault. You may already have that patch with those utilities. Running "rpm -qa" will show all the RPMs you have installed. It appears like this documentation may be a more specific guide for what you're doing.
Last edited by MensaWater; 07-13-2017 at 01:08 PM.
Currently there are 7 logical drives for the existing Oracle installation; number 8 would be the new one. So I guess I just need to run the rescan-scsi-bus script?
Yes you can run "yum install lsscsi" (as I noted in earlier post).
Your multipath -l output confirms the MD32* devices DO have dual paths each. For example:
mpath6 (36842b2b000528f8d0000029f4cceadad) dm-5 DELL,MD32xx
[size=100M][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][active]
\_ 1:0:1:5 sdo 8:224 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:5 sdg 8:96 [active][undef]
That shows the multipath device mpath6 (/dev/mapper/mpath6, and maybe /dev/mpath/mpath6 since you're on RHEL5) is comprised of the 2 component disks sdo (/dev/sdo) and sdg (/dev/sdg).
If you run "pvs" to see what LVM physical volumes are in use ideally you see the mpath# devices rather than those sd* devices.
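A quick way to audit that is to classify each PV name. This is a sketch: check_pvs is a hypothetical helper I'm adding, meant to read "pvs --noheadings -o pv_name" style output.

```shell
#!/bin/sh
# Sketch: flag LVM physical volumes that sit on raw /dev/sd* paths rather
# than on /dev/mapper/mpath* multipath devices. check_pvs is a hypothetical
# helper; feed it one PV device name per line.
check_pvs() {
    bad=0
    while read -r pv; do
        case $pv in
            /dev/mapper/mpath*) echo "ok:      $pv" ;;
            /dev/sd*)           echo "WARNING: $pv is a single-path device"; bad=1 ;;
            *)                  echo "other:   $pv" ;;
        esac
    done
    return $bad
}

# On the real host you would feed it live data:
#   pvs --noheadings -o pv_name | check_pvs
```

If anything prints a WARNING, LVM is bound to a single path and a rescan could disrupt it.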
As noted in my last post, though, the utilities suggested by the document I linked there include a different rescan utility for the MD32* itself, so were it me I'd check for the RPM they mention. You can run "rpm -qa |grep rdac" to see if you have one of the packages they mention. On re-checking that page, I see that package is a DKMS, which is to load a driver. Since you already have working drives, you presumably have a driver that found your disks already.
The sg3_utils scan utility MIGHT work on the MD32* stuff, but I'm not familiar with that storage enclosure/array. I have used this rescan utility with fibre SAN arrays such as Hitachi VSP and Pure FlashArray. Given there is documentation for your MD32*, I'd be inclined to go down that path first before trying the sg3_utils-provided scan.
You can run "lsmod" to see kernel modules loaded and see if you have one like they talk about in the document as I suspect you do.
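For the package and module checks together, something like the following works. A sketch only: find_rdac is a hypothetical helper, and the module names (mppUpper, mppVhba) are the ones commonly associated with Dell's RDAC driver for MD3xxx arrays, which is an assumption; adjust the patterns to whatever the document actually lists.

```shell
#!/bin/sh
# Sketch: scan a list (rpm -qa output or lsmod output) for RDAC driver
# pieces. Module names mppupper/mppvhba are assumptions for illustration.
find_rdac() {
    grep -i -e rdac -e mppupper -e mppvhba || echo "no RDAC driver found"
}

# On the real host:
#   rpm -qa | find_rdac     # installed packages
#   lsmod   | find_rdac     # loaded kernel modules
```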
Glad you got it working. Please go to thread tools and mark this as Solved. It helps others with similar questions find solutions more quickly in the future.
Yes - lsscsi is a package you can install: yum install lsscsi
As an FYI you can search for given files and commands by running:
yum provides "*/<filename>"
So running the following should show you that the lsscsi command is part of a package of the same name.
yum provides "*/lsscsi"
You can do the same for libraries.
Of course not all packages are available in the native repositories. There are other places to get RPMs for the distro/version you want. A good one for RHEL/CentOS is Fedora's EPEL, which has RPMs for both RHEL6 and RHEL7 that will also work on CentOS6 or CentOS7 (and probably other distros such as OEL).
Last edited by MensaWater; 07-21-2017 at 01:15 PM.
Quote: I realize 5.11 is EOL and yeah, the DBAs want to upgrade Oracle to 12c and I'm pushing for upgrading the version of Redhat. Like I said I'm new on this job LOL!
You can contact Red Hat for Extended Support for RHEL5, as we had to do. We are currently trying to move away from RHEL5/Oracle 11g to RHEL6/Oracle 12c.
Yes, in fact that is exactly what I (well, my company) did. The Red Hat support staff were helpful. And yes, they explained it was EOL, but this was part of prepping to migrate and upgrade our infrastructure.
Oh, and BTW, in order for Oracle to be able to use the new storage, it needs a raw partition without a filesystem. I first tried fdisk and got an error, so I had to use kpartx because of some bug.
fdisk /dev/mapper/mpath9
new, primary, 1, etc. "w" to write to disk... and it errors out
to fix that you have to use: kpartx -a -v (add, verbose) /dev/mapper/mpath9
but wait!! ls -l /dev/mapper and /sbin/multipath -ll still aren't listing your new partition!
truncated output:
...
brw-rw---- 1 root disk 253, 7 Jun 23 03:57 mpath8
brw-rw---- 1 root disk 253, 10 Jun 23 03:57 mpath8p1
brw-rw---- 1 root disk 253, 16 Jul 24 17:14 mpath9
you must /sbin/multipath -r to refresh multipath
truncated output:
...
brw-rw---- 1 root disk 253, 7 Jun 23 03:57 mpath8
brw-rw---- 1 root disk 253, 10 Jun 23 03:57 mpath8p1
brw-rw---- 1 root disk 253, 16 Jul 24 17:14 mpath9
brw-rw---- 1 root disk 253, 17 Jul 25 08:20 mpath9p1 <-- there is your new (raw) partition
On the other node, just run /sbin/multipath -r to refresh multipath
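Collected in one place, the whole procedure from this thread looks roughly like this. A sketch, not a tested script: the run helper, the DRYRUN switch, and the mpath9 name are illustrative, and it defaults to only printing the commands, which is the sane default on live production.

```shell
#!/bin/sh
# Sketch of the full new-LUN workflow from this thread, in order.
# mpath9 is the example device from the posts; substitute your own.
# DRYRUN=1 (the default) only prints the commands instead of running them.
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Rescan the SCSI bus on both nodes so the HBAs see the new LUN
#    (script provided by sg3_utils; path may differ on your system).
run /usr/bin/rescan-scsi-bus.sh

# 2. Partition the multipath device with fdisk (n, p, 1, defaults, w).
#    fdisk may error re-reading the partition table on a dm device;
#    that is the error described above.
echo "partition /dev/mapper/mpath9 interactively with fdisk"

# 3. Map the new partition into device-mapper.
run kpartx -a -v /dev/mapper/mpath9

# 4. Reload the multipath maps so mpath9p1 appears; repeat this step
#    on the other node.
run /sbin/multipath -r
```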
Been running MX 15 since it came out and have had a pretty good experience overall. Burned an ISO when 16 came out but never installed it. Now they have released MX 17 and it looks pretty interesting. Might just have to give it a whirl. Anyone here tried any MX distro?
Please don't append to old/closed threads. Ask your question in a new thread to ensure it gets higher visibility. You might also want to explain what "MX 15" etc. are in your new post, as they aren't RHEL (Red Hat Enterprise Linux) designations.