SAN LUNs not using multipath
Our database server, running Oracle Linux 6.4, has 12 SAN LUNs (Hitachi) presented, but only 8 show up as multipath devices; the remaining 4 show up as local SCSI disks, which means those LUNs have no redundant paths.
Output of pvs: Code:
PV VG Fmt Attr PSize PFree
After doing some research, the only solution I found is to change /etc/multipath.conf: specifically, comment out the {wwid "*"} entry under the blacklist section and then restart the multipathd service. I am not sure I should make this change, since this is a heavily used server (200+ users). Content of /etc/multipath.conf: Code:
# multipath.conf written by anaconda
Has anyone seen a similar issue or know of a fix? Is this even a big problem, or can it be ignored? |
The "pvs" command only shows devices that have already been made LVM physical volumes. That is to say, if you've not yet done a pvcreate on the new devices, it doesn't show them.
Not all disks on a system are necessarily multipath (e.g. internal disks, dedicated arrays with their own HBAs, SSD HBAs, etc.). The output that would be helpful is "lsscsi", as it would show you all the SCSI devices, including disks, along with their types. (You may need to "yum install lsscsi" to get the lsscsi command.)

You would also want to run "multipath -l -v2" or similar to see what multipath actually sees. It should show you the multipath device (in your case mpath...) as well as the component SCSI disk (sd...) paths that make up that multipath device.

Typically when doing pvcreate I specify the /dev/mapper/<multipath> device (/dev/mapper/mpath... in your case). If you specified the SCSI disk (/dev/sd...) instead, it may restrict it to that, though normally LVM will find the same PV info on all paths. You may need to look at the "filter" in your /etc/lvm/lvm.conf to verify it can see all mpath... devices, if the sd... devices you listed are in fact from the array and there are multiple paths to them as shown by lsscsi.

You don't say which Hitachi you have. Another tool I've found very useful from Hitachi comes with their HORCM software and is called "inqraid". If you have it, you can see the relationship between specific Hitachi devices and sd... devices by running:

ls /dev/sd* | inqraid -CLI -fxng

NOTE: That will also include non-Hitachi sd... devices if you have them, but will show "-" for all fields other than the first one for those devices.

You can determine settings for /etc/multipath.conf by reviewing your array's documentation. On RHEL6 for Hitachi VSP the definition we use here is: Code:
device {
Also, after any edit of multipath.conf, run "service multipathd restart" so the daemon re-reads it. (This can be done on a running system with no impact to existing devices.) By the way, what you posted wasn't the "devices" section but rather the "blacklist" section. Do NOT add your array to the "blacklist" section.

You don't mention whether you ran a rescan to detect the new devices. You'd have to have done that (or a reboot) after presenting them from the disk array. A nice tool for that is rescan-scsi-bus.sh, which you can get by installing the sg3_utils package with yum. |
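Pulling the advice above together, here is a hedged sketch of the whole detect-and-verify cycle. The mpathX name and the DRY_RUN guard are illustrative additions, not from the thread; set DRY_RUN=0 and run as root on the real host.

```shell
# Sketch of the rescan / multipath / pvcreate workflow described above.
# DRY_RUN=1 (the default) only prints each command so the sequence can be
# reviewed safely; set DRY_RUN=0 on the real host. mpathX is a placeholder.
DRY_RUN=${DRY_RUN-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

run rescan-scsi-bus.sh             # from sg3_utils: detect newly presented LUNs
run service multipathd restart     # re-reads multipath.conf; active maps unaffected
run multipath -l -v2               # verify every LUN shows all of its paths
run pvcreate /dev/mapper/mpathX    # use the mpath device, not a /dev/sd path
```

The ordering matters: rescan first so the kernel sees the LUNs, then bounce multipathd so maps get built, and only then run pvcreate against the map.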
Code:
lsscsi
Code:
sudo multipath -l -v2
|
And thanks a lot for replying! Sorry my last response got pretty long.
I am worried that the devices which show up as /dev/sd* do not have redundant paths to storage. Is that concern even legitimate? |
The lsscsi output confirms your devices sdav, sdbl, sdbz and sdca are Hitachi OPEN-V devices. Comparing the IDs on those devices to those of your other multipath devices suggests each is a single path to a separate Hitachi device.
The multipath output confirms that you have 8 paths to most of your devices but doesn't show anything containing these 4 devices, so I suspect your issue is simply that multipath hasn't yet created multipath devices for them. As I noted before, you can run "service multipathd restart". This bounces the daemon; stopping/bouncing the daemon doesn't remove or impact any active multipath devices. On restart the daemon should find the new devices.

Based on the pattern in your multipath output, you should have a multipath containing 8 paths for each of the 4 devices you saw in pvs, but only one path of each shows in your pvs output while all of them are in your lsscsi output. For example, your first disk has 8 paths in lsscsi but only the 1 in pvs:

0:0:0:8
0:0:1:8
0:0:2:8
0:0:3:8 sdav (the one in the pvs output)
1:0:0:8
1:0:1:8
1:0:2:8
1:0:3:8

As noted above, I believe your host has already rescanned and is seeing the new devices, so you can ignore what follows, but I did want to say more about rescanning. The rescan command I mentioned has worked well for me in recent months, but there are other ways to rescan. One way is to echo into each of your host adapters. Based on the output you gave, you appear to have host0 and host1 as adapters, as denoted by the lines that start "0:" or "1:". (You also appear to have multiple ports from the disk array zoned to the server, given that you show 0:0: through 0:3: and 1:0: through 1:3: in your multipath output for other devices, giving each disk 8 paths in total.) The command to rescan that way is:

echo "- - -" > /sys/class/scsi_host/host0/scan
echo "- - -" > /sys/class/scsi_host/host1/scan

Here we use Qlogic HBAs. You can determine which Qlogic hosts you have by going to /sys/class/scsi_host and running:

grep -i ql */* 2>/dev/null

That will find any host numbers that have QL part numbers.
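If you'd rather not hard-code the adapter names in the echo method, it can be wrapped in a small loop over whatever hosts exist. This is a sketch of the same technique, not something from the thread; run it as root on the affected server.

```shell
# Rescan every SCSI host adapter present, skipping cleanly if none exist.
for h in /sys/class/scsi_host/host*/scan; do
    [ -e "$h" ] || continue
    echo "- - -" > "$h" 2>/dev/null || echo "could not write $h (need root?)"
done
```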
Also, Qlogic provides a tool we install on our servers which includes a rescan script called ql-dynamic-tgt-lun-disc.sh. If you're running Qlogic you can download it from their site as part of the Super-Installer package. (When I last did this, searching by OS found it where searching by Qlogic model didn't.) If you're using something else (e.g. Emulex) you may find a similar tool on that vendor's site.

I've not had to reboot a server in years to see new devices, because one of these methods always works. As noted above, the first rescan tool I posted about is one I've used multiple times on both RHEL5 and RHEL6 systems. Your OEL is ported from RHEL, though Oracle does add things, so I'd expect these to work there as well. |
Sorry about the delayed response. Had to get approval even for a non-disruptive change. I got an OK to restart the multipathd daemon this weekend.
You are correct, we have multiple ports zoned in, using Brocade SAN fabrics. And we are using Emulex HBAs, see below. I checked their website but haven't found a rescan utility/script yet, but I will keep looking. I checked my shell history and confirmed that I have used the echo command in the past, see below. Code:
03:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
When I search for these 4 devices (sdav, sdbl, sdbz and sdca) in the lsscsi output, I only see each of them once, but you are saying they're there with 8 paths? Also, for full disclosure, I also see duplicate-PV warnings, which apparently I can ignore by writing a regex filter in /etc/lvm/lvm.conf. Code:
Found duplicate PV cllQixBtG36oy39iVPdzmj8GFAhtX7ww: using /dev/sday1 not /dev/sdbm1 |
I was saying that while we can see the I/O path for sdav is 0:0:3:8 in your lsscsi output, we can also see 7 other I/O paths that are *likely* to be separate paths to the same SAN disk (LUN/LDEV). Those being:

0:0:0:8
0:0:1:8
0:0:2:8
1:0:0:8
1:0:1:8
1:0:2:8
1:0:3:8

I did not list the sd* names associated with each of those because I'm lazy and you can see them yourself in your lsscsi output. The reason I say they are *likely* paths to the same disk is that the pattern in your other multipath disks, as reported by multipath -ll (multipath -l -v2), suggests it. For example, the mpathe you show contains the following:

|- 0:0:0:1 sdb 8:16 active undef running
|- 0:0:1:1 sdo 8:224 active undef running
|- 0:0:2:1 sdab 65:176 active undef running
|- 0:0:3:1 sdao 66:128 active undef running
|- 1:0:0:1 sdbc 67:96 active undef running
|- 1:0:1:1 sdbp 68:48 active undef running
|- 1:0:2:1 sdcc 69:0 active undef running
`- 1:0:3:1 sdcp 69:208 active undef running

The common item in the I/O path of all 8 sds for mpathe is that the last field of the colon-separated I/O path is 1 for each of them, meaning the first 3 fields define the path and the last field defines the specific disk (LUN/LDEV). So your 8 I/O paths (without the final field) are:

0:0:0:x
0:0:1:x
0:0:2:x
0:0:3:x
1:0:0:x
1:0:1:x
1:0:2:x
1:0:3:x

For mpathe the final field on each of those is "1". For mpathd it is "4". For your device named sdav, as shown in lsscsi, the final field is "8", which is why I suggested the other 7 I/O paths above, each starting as shown but ending with "8".

It wasn't clear from what you wrote whether you actually ran "service multipathd restart" after you got permission.
If you did, then you should rerun "multipath -l -v2" (or multipath -ll) to see if it now shows multipath devices for the other 4 SAN disks (LUNs/LDEVs), each containing 8 component sd devices (4 x 8 = 32 sd devices overall for the 4 mpaths created).

NOTE: On reboot, both "sd" names and "mpath" names can change, as names are assigned as devices are found during each boot. For example, what is sdf on this boot might become sdam on another boot, and what is mpathk on this boot might become mpathb on another. |
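One quick way to see which sd devices have not yet been claimed by any multipath map is to compare the device names from lsscsi against the sd names appearing in multipath -ll. The sample data below is invented for illustration; on the real host, feed in the live output of the two commands instead.

```shell
#!/bin/bash
# Made-up sample data standing in for live command output:
scsi_devs="sdav sdb sdbl"     # device names parsed from `lsscsi`
mp_paths="sdb"                # sd names that appear in `multipath -ll`

for name in $scsi_devs; do
    if ! printf '%s\n' $mp_paths | grep -qx "$name"; then
        echo "no multipath map claims /dev/$name"
    fi
done
```

With the sample data, sdav and sdbl are reported as unclaimed while sdb (already in a map) is not.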
Have not yet run "service multipathd restart".
I am afraid the theory that the last field is the LUN ID may not apply. We have a total of 12 LUNs mapped, but the lsscsi output has values ranging from 0:0:0:0 to 0:0:0:12, which would mean the OS is seeing a total of 13 LUNs; it doesn't add up. You are right about sd names possibly changing, but I think mpath names are static, since they are defined in /dev/mapper. Code:
ls -la /dev/mapper/ |
I feel confident the pattern I mentioned is correct. If you notice, all 104 lsscsi lines that start with 0: or 1: (your 2 controllers) and end in :0 through :12 show as Hitachi OPEN-V. This means you do (or did) have 13 Hitachi devices: 13 devices x 8 paths = 104. A couple of reasons you'd have more than you think are:
1) You had another device presented at one time and removed from the Hitachi side, but didn't do cleanup on the server side.

2) You have another device presented that you're not taking into account. Not all devices are used for filesystems (or LVM). You might have another for various reasons, such as:

a) A Hitachi command device - these are used by Hitachi operations such as Shadow Image or Copy-On-Write. The server has to see them even though they don't store any OS-level data.

b) A raw or block device used by an application. Oracle ASM, for example, uses raw/block devices for database storage rather than filesystems such as ext4. From the OS level you don't see the data files, but they are there, managed by ASM.

You could run a command like "lsblk" or "fdisk -luc" on each of the sd devices shown to determine full details of each. It's possible one or more would give you errors indicating the device doesn't really exist, and if so it can be removed, but I suspect they all exist.

A more certain way to verify this would be to go into the Hitachi array's management page itself and locate the host HBAs (on VSP these are defined as "host groups" - not sure if that is the same in your Hitachi) to see how many LUNs (on VSP these are defined as LDEVs) are presented to those HBAs. Make sure when looking that you don't have any "filters" turned on that might be excluding "CMD" devices or others.

Finally, as I've said several times, running "service multipathd restart" does NOT affect currently active multipath devices - it simply makes the daemon rediscover to see any new ones. I've run this command dozens of times on multiple servers and have never seen it cause an issue. Even "service multipathd stop" won't affect currently active multipath devices. |
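The 13 x 8 = 104 arithmetic can be checked mechanically by counting how many paths share each LUN number (the last field of the H:C:T:L tuple in lsscsi). Below is a sketch with a few invented sample lines standing in for the real lsscsi output; pipe the live output in instead of the here-doc.

```shell
# Count paths per LUN ID; feed real `lsscsi` output in place of the here-doc.
awk '{ gsub(/\[|\]/, "", $1); n = split($1, p, ":"); cnt[p[n]]++ }
     END { for (l in cnt) print "LUN " l ": " cnt[l] " path(s)" }' <<'EOF'
[0:0:0:8]  disk  HITACHI  OPEN-V  /dev/sdav
[1:0:0:8]  disk  HITACHI  OPEN-V  /dev/sdbl
[0:0:0:1]  disk  HITACHI  OPEN-V  /dev/sdb
[1:0:0:1]  disk  HITACHI  OPEN-V  /dev/sdo
EOF
```

On the real host every LUN should report 8 path(s); any LUN reporting fewer is a candidate for a missing multipath map.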
Didn't mean to abandon this thread. I still have not been able to get an OK to work on this.
Will keep this discussion open and update asap. |
Not yet. I should have an update by early next week.
|
Quick question: since I am doing the multipathd restart, would this be a good time to make a change to /etc/multipath.conf? It was suggested that we should try removing the wwid "*" entry from the blacklist section, which might let all the LUNs come up as multipath devices.
Anything I need to worry about when making changes to multipath.conf? Thanks! Code:
# multipath.conf written by anaconda |
I've never used * for the wwid in the blacklist of any multipath.conf.
Review of the man page says for blacklist: Quote:
The fact that you have existing mpaths despite the * metacharacter seems to confirm multipath is ignoring it and finding the Hitachi devices anyway. This means the line probably isn't hurting or helping, so leaving it there likely won't cause a problem, but were it me I'd probably remove it just to avoid future confusion. |
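For reference, an anaconda-written blacklist often looks roughly like the fragment below. This is illustrative, not the poster's actual file; the only point is which line to remove.

```
# /etc/multipath.conf -- illustrative blacklist fragment
blacklist {
        wwid "*"        # remove or comment this line so array LUNs aren't blacklisted
}
```

After editing, "multipath -d" does a dry run that prints the maps that would be created without creating anything, so you can sanity-check the result before running "service multipathd restart". Back up the file first.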
Doing the change tomorrow after hours. The plan is to edit multipath.conf and then restart multipathd.
|