Linux - Enterprise: This forum is for all items relating to using Linux in the Enterprise.
I am relatively new to the whole FC / storage arena.
I have a Linux server (HP DL580) which had exactly 1 dual-port QLogic HBA installed. Each port registers as a separate card / entity in the kernel, so as far as the OS is concerned, it's got two HBAs. (If you go into /sys/class/fc_host, you see host0/ and host1/ .)
I was able to get the qlogic drivers working, multipath configured on top of the SCSI LUNs presented from the SAN, etc. The host was connected direct-attached to ports on an HP EVA SAN setup.
We took the server down and installed a second dual-port QLogic HBA. The reason is that we're moving to a Brocade fabric-based setup, and the storage admins want me to see whether I can view a Fibre Channel tape drive they connected to the fabric.
So after power-on, the OS does indeed see two new HBA ports. (I can tell because in /sys/class/fc_host there are now entries for host2/ and host3/, alongside the previous host0/ and host1/.) Additionally, I see a new entry in /sys/class/fc_remote_ports/, which is rport-2:0-0. In /sys/class/fc_transport, I see target2:0:0.
Not fully understanding what was going on, and not knowing if I was supposed to, I went ahead and did a:
echo "- - -" > scan in /sys/class/scsi_host/host2/ and host3/,
thinking perhaps the tape drive would show up as a new /dev/sd? device. It did not.
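For reference, the rescan described above can be wrapped in a small helper. This is a sketch only, assuming the sysfs layout shown in this thread; the sysfs root is a parameter so the function can be exercised against a scratch tree rather than a live system.

```shell
# A sketch of the rescan described above. "- - -" is the wildcard for
# channel/target/LUN (scan everything); writing 1 to issue_lip asks the
# FC host port to re-probe the fabric. Pass /sys on a live host (as root).
rescan_fc() {
    sysfs="$1"
    for scan in "$sysfs"/class/scsi_host/host*/scan; do
        [ -w "$scan" ] && echo "- - -" > "$scan"
    done
    for lip in "$sysfs"/class/fc_host/host*/issue_lip; do
        [ -w "$lip" ] && echo 1 > "$lip"
    done
}
# e.g.  rescan_fc /sys
```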
Here's what I don't quite have my head around. How do I know if I can successfully see the tape drive they attached? How do I know if I will be able to successfully talk to any devices attached to the fabric in the future? What exactly do the entries I see in the directories fc_transport/ and fc_remote_ports/ correspond to?
For starters, what OS / version is this host? And, out of curiosity, what did you use to set up multipathing?
Quote:
How do I know if I can successfully see the tape drive they attached?
Check the contents of /proc/scsi/scsi. In addition to the SCSI info there, of particular interest may be the vendor and model fields. If re-scanning the SCSI bus does not work, you might try restarting the host.
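A quick way to scan that listing by eye is to filter for the interesting lines. A minimal sketch; it takes the file as an argument so it can be run against a saved copy of /proc/scsi/scsi just as easily as the live one:

```shell
# Sketch: show only the Host / Vendor / Model / Type lines from a
# /proc/scsi/scsi-style listing, so a newly attached device stands out.
scsi_summary() {
    grep -E '^Host:|Vendor:|Type:' "$1"
}
# On a live host:
#   scsi_summary /proc/scsi/scsi
```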
Quote:
How do I know if I will be able to successfully talk to any devices attached to the fabric in the future?
If the device's WWN/WWID is presented to the OS, you will be able to successfully talk to it. (Unless you have zoning at the switch level and/or access controls at the storage device level that would prevent it.)
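To check which WWPNs the OS can actually see, you can walk the fc_remote_ports entries mentioned earlier and print each one's port_name. A sketch, with the sysfs root as a parameter for testability; use /sys on a live host:

```shell
# Sketch: list each FC remote port and its WWPN, for matching against
# whatever the storage admins say they zoned/presented to this host.
list_wwpns() {
    sysfs="$1"
    for p in "$sysfs"/class/fc_remote_ports/rport-*/port_name; do
        [ -r "$p" ] || continue
        d=${p%/port_name}                       # directory, e.g. .../rport-2:0-0
        printf '%s %s\n' "${d##*/}" "$(cat "$p")"
    done
}
# e.g.  list_wwpns /sys
```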
The /proc/scsi/scsi check may not apply if you're using a different OS or version.
Dunno for sure. But my WAG (wild-ass guess) is that these will match up with the SCSI host, channel, and ID (as visible, for example, in /proc/scsi/scsi).
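If that guess is right, a name like target2:0:0 decodes as host 2, channel 0, target ID 0, and LUNs under it then show up as 2:0:0:&lt;lun&gt;. A quick decode sketch:

```shell
# Decode a /sys/class/fc_transport target name into host:channel:id.
t="target2:0:0"
IFS=: read host chan id <<EOF
${t#target}
EOF
echo "host=$host channel=$chan id=$id"   # prints: host=2 channel=0 id=0
```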
Thank you for the assistance. Before you posted, I took the time and went into /sys/class/fc_remote_ports, and was able to match up a WWPN from the tape drive to the port_name file in one of the rport-?:?-? directories. Good sign.
Then I used your advice and delved into /proc/scsi/scsi, and yep, the new tape drive shows up at the bottom of the list. (I recognize the Vendor: and Model: fields.) My knowledge is limited enough that I'm still confused why a scan of the scsi bus didn't end up producing an additional file in the form of a /dev/sd? device file. I'm wondering if udev would need to be tweaked to actually create the device file for it.
To answer your other questions, this is a RHEL 5.3 box. I used dm-multipath to handle multipathing to the SAN LUNs.
I will check out your kbase article - I think I may have read that last night.
Quote:
Originally Posted by larold
My knowledge is limited enough that I'm still confused why a scan of the scsi bus didn't end up producing an additional file in the form of a /dev/sd? device file. I'm wondering if udev would need to be tweaked to actually create the device file for it.
I don't know the answer, but I can tell you that (for whatever reason) SCSI bus scans haven't always worked for me. In those situations I've made a practice of rebooting. It's not very convenient, but these are one-off situations.
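For what it's worth, one likely explanation for the missing /dev/sd? node: a tape drive is a Sequential-Access SCSI device handled by the st driver, so even a successful scan produces /dev/st0 and /dev/nst0 (plus a /dev/sg? passthrough node), not a /dev/sd? disk node. A sketch that checks a /proc/scsi/scsi-style listing for tape entries:

```shell
# Tape drives report Type: Sequential-Access and are driven by st, not
# sd -- look for /dev/st* and /dev/nst* rather than /dev/sd?.
has_tape() {
    grep -q 'Sequential-Access' "$1"
}
# On a live host:
#   modprobe st                                  # load the tape driver if needed
#   has_tape /proc/scsi/scsi && ls -l /dev/st* /dev/nst*
```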
Quote:
Originally Posted by larold
To answer your other questions, this is a RHEL 5.3 box. I used dm-multipath to handle multipathing to the san luns.
Good. I've been really happy with DM Multipath in my environment -- so far.
Quote:
Good. I've been really happy with DM Multipath in my environment -- so far.
This is slightly off the post-topic.
I was also happy with dm-mp for quite a while. Something happened a couple of weeks ago that you should be aware of and keep an eye out for. Here's what happened to me.
For unknown reasons, we had all paths to a specific LUN fail. (Through BOTH HBAs.) In the syslogs, multipath reported it saw the LUNs go down, and we saw the usual messages about paths being failed. 15 seconds later, we saw syslog messages (kernel / driver, I believe) stating that connectivity was back. However... no multipath message. Also, the paths were still marked as 'failed'. We only noticed this because of an Oracle write backlog, and our Oracle consultants saying "Hey - we did a vmstat on one of the LUNs and see some really weird things."
Turns out that sometime during those 15 seconds, multipathd died. It wasn't around to mark the paths as available again, so as far as I can tell Oracle was never able to write to the LUN. A couple of days later I was made aware of the problem, and a 'multipath -l' showed me that all paths to the LUN were still marked as 'failed'. *OOPS*.
I have since put in a Nagios check to ensure multipathd is running.
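A hypothetical sketch of that kind of check (the function name and messages are mine, not a standard Nagios plugin): exit 0 with an OK line when multipathd has a pid, exit 2 with a CRITICAL line when it doesn't, which is the convention Nagios checks follow.

```shell
# Hypothetical Nagios-style liveness check for multipathd. Relies on
# pidof(8), which is present on RHEL; exit codes follow the Nagios
# convention (0 = OK, 2 = CRITICAL).
check_multipathd() {
    if pidof multipathd >/dev/null 2>&1; then
        echo "OK: multipathd is running"
        return 0
    else
        echo "CRITICAL: multipathd is not running"
        return 2
    fi
}
```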
Maybe there's some piece of the puzzle I don't understand, but it was very scary. I have a hunch something about the LUN crash itself freaked multipathd out, causing it to die.
Like you, I'm doing some regular polling to ensure that multipath sees exactly the number of LUNs, and paths to each LUN, that I'd expect. In my case, I am doing this on each host (at the host level), since that is what I am concerned with most. That approach might not scale well for 400 nodes, but it works fine for 8 of them.
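One simple version of that polling is to count failed paths in `multipath -l` output (the RHEL 5-era format shows [failed][faulty] per path). A sketch; it reads stdin so it can be fed a saved capture as easily as live output:

```shell
# Sketch: count lines marked 'failed' in multipath -l output.
failed_path_count() {
    grep -c 'failed'
}
# On a live host:
#   multipath -l | failed_path_count
```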