I get this error every time I run any LVM command on my machine, which runs SLES on IBM hardware and is connected to an EMC VNX5300 storage array.
Initially I thought one of my block devices had failed, but everything on the OS side looked normal, as seen below.
Output of multipath, lsscsi and multipathd show paths:
Code:
# multipath -ll
36006016037a02e00ca86fb1d4847e111 dm-0 DGC,RAID 5
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=4 status=active
| `- 0:0:0:0 sda 8:0 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
`- 1:0:0:0 sdb 8:16 active ready running
# lsscsi
[1:0:0:0] disk DGC RAID 5 0532 /dev/sda
[2:0:0:0] disk DGC RAID 5 0532 /dev/sdb
multipathd> show paths
hcil    dev dev_t pri dm_st  chk_st dev_st  next_check
1:0:0:0 sda 8:0   4   active ready  running XXXXXX.... 13/20
2:0:0:0 sdb 8:16  1   active ready  running XXXXXX.... 13/20
But I could not read /dev/sdb:
Code:
# fdisk /dev/sdb
fdisk: unable to read /dev/sdb: Invalid argument
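For reference, these are the checks I would normally run next, to see whether the kernel still reports a size and a usable state for the device, and whether anything was logged about it (I did not capture this output at the time):
Code:
# blockdev --getsize64 /dev/sdb
# cat /sys/block/sdb/device/state
# dmesg | tail -n 20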
So I manually removed sda, just to check whether path redundancy was working properly.
Code:
# echo 1 > /sys/block/sda/device/delete
# multipath -ll
36006016037a02e00ca86fb1d4847e111 dm-0 DGC,RAID 5
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
`-+- policy='service-time 0' prio=2 status=active
`- 1:0:0:0 sdb 8:16 active ready running
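To convince myself that I/O still goes through the remaining path, a quick direct read against the multipath map (named after its WWID, since user_friendly_names is off) is what I would use; the count is arbitrary:
Code:
# dd if=/dev/mapper/36006016037a02e00ca86fb1d4847e111 of=/dev/null bs=1M count=10 iflag=direct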
Then I rescanned the HBA:
Code:
# echo "- - -" > /sys/class/scsi_host/host0/scan
After the rescan, both block devices re-appeared:
Code:
# multipath -ll
36006016037a02e00ca86fb1d4847e111 dm-0 DGC,RAID 5
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=4 status=active
| `- 0:0:0:0 sda 8:0 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
`- 1:0:0:0 sdb 8:16 active ready running
To my surprise, all the errors that had been shown earlier when running LVM commands were gone:
Code:
# lvs
LV                     VG     Attr      LSize   Pool Origin Data% Move Log Copy% Convert
ISS                    system -wi-ao--- 120.47g
is-main                system -wi-ao---  15.00g
system-opt-mgtservices system -wi-ao---   5.00g
system-usr             system -wi-ao---   2.00g
system-var             system -wi-ao---   2.00g
system-var-log         system -wi-ao---   2.00g
system-var-opt         system -wi-ao---  25.00g
tmp                    system -wi-ao---  20.00g
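To double-check that the physical volume and the volume group are consistent now that both paths are back, I would also run the following ('system' is the VG name from the lvs output above):
Code:
# pvs
# vgs system
# vgck system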
And this device was readable again as well:
Code:
# fdisk /dev/sdb
Command (m for help): p
Disk /dev/sdb: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006db56
   Device Boot      Start        End     Blocks Id System
/dev/sdb1 *           2048    1060863     529408 83 Linux
/dev/sdb2          1060864    9461759    4200448 83 Linux
/dev/sdb3          9461760  411023359  200780800 8e Linux LVM
/dev/sdb4        411023360  419430399    4203520 82 Linux swap / Solaris
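For completeness, this is how I plan to check which underlying device the PV on partition 3 actually resolves to, and how the device-mapper devices are stacked (dmsetup ls --tree prints the dm dependency tree):
Code:
# pvs -o pv_name,vg_name
# dmsetup ls --tree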
I am really confused about what is leading to this behavior. Is there any misconfiguration in multipath?
Below is my multipath config:
Code:
defaults {
    verbosity 2
    polling_interval 5
    multipath_dir "/lib64/multipath"
    path_selector "service-time 0"
    path_grouping_policy "failover"
    uid_attribute "ID_SERIAL"
    prio "const"
    prio_args ""
    features "0"
    path_checker "directio"
    alias_prefix "mpath"
    failback "manual"
    rr_min_io 1000
    rr_min_io_rq 1
    max_fds "max"
    rr_weight "uniform"
    queue_without_daemon "yes"
    flush_on_last_del "no"
    user_friendly_names "no"
    fast_io_fail_tmo 5
    bindings_file "/etc/multipath/bindings"
    wwids_file /etc/multipath/wwids
    log_checker_err always
    retain_attached_hw_handler no
    detect_prio no
}
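As far as I can tell there is only this defaults section and no devices entry for the DGC array. Below is a sketch of what I am considering adding, based on VNX examples I have seen elsewhere; the exact values are my own assumptions and depend on the array failover mode (the map above reports hwhandler='1 emc', so the ALUA settings may not apply to my setup):
Code:
devices {
    device {
        vendor "DGC"                      # EMC VNX/CLARiiON arrays report vendor DGC
        product ".*"
        product_blacklist "LUNZ"          # ignore the dummy LUNZ device
        path_grouping_policy group_by_prio
        path_checker emc_clariion
        hardware_handler "1 alua"         # assumption: array in ALUA failover mode
        prio alua
        failback immediate
        no_path_retry 60
    }
}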