LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Hardware (https://www.linuxquestions.org/questions/linux-hardware-18/)
-   -   /dev/sdb: read failed after 0 of 4096 at 0: Input/output error (https://www.linuxquestions.org/questions/linux-hardware-18/dev-sdb-read-failed-after-0-of-4096-at-0-input-output-error-4175548804/)

deep27ak 07-24-2015 02:45 AM

/dev/sdb: read failed after 0 of 4096 at 0: Input/output error
 
I get this error every time I run any LVM command on my machine, which is running SLES on IBM hardware and is connected to EMC VNX5300 storage.

Initially I thought one of my block devices had failed, but everything on the OS side looked normal, as seen below.

Output of multipath:
Code:

# multipath -ll
36006016037a02e00ca86fb1d4847e111 dm-0 DGC,RAID 5
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=4 status=active
| `- 0:0:0:0 sda 8:0  active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 1:0:0:0 sdb 8:16 active ready running

# lsscsi
[1:0:0:0]    disk    DGC      RAID 5          0532  /dev/sda
[2:0:0:0]    disk    DGC      RAID 5          0532  /dev/sdb

multipathd> show paths
hcil    dev dev_t pri dm_st  chk_st dev_st  next_check
1:0:0:0 sda 8:0  4  active ready  running XXXXXX.... 13/20
2:0:0:0 sdb 8:16  1  active ready  running XXXXXX.... 13/20

But I could not read /dev/sdb:
Code:

# fdisk /dev/sdb
fdisk: unable to read /dev/sdb: Invalid argument
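One way to confirm whether the kernel can read the path at all, independent of fdisk, is a single direct read of the same 4 KiB that LVM is failing on. This is only a sketch; probe_dev is a hypothetical helper, not something from the thread:

```shell
#!/bin/sh
# probe_dev: attempt one direct 4 KiB read from a block device (hypothetical
# helper). An error here corresponds to the LVM message
# "read failed after 0 of 4096 at 0".
# The kernel's own view of the path can also be checked with:
#   cat /sys/block/sdb/device/state
probe_dev() {
    dev="$1"
    if dd if="$dev" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null; then
        echo "$dev: readable"
    else
        echo "$dev: read failed"
    fi
}

probe_dev /dev/sdb
```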

So I manually removed sda, just to make sure that redundancy was working properly
Code:

# echo 1 > /sys/block/sda/device/delete

# multipath -ll
36006016037a02e00ca86fb1d4847e111 dm-0 DGC,RAID 5
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
`-+- policy='service-time 0' prio=2 status=active
  `- 1:0:0:0 sdb 8:16 active ready running

Then I rescanned my HBA
Code:

# echo "- - -" > /sys/class/scsi_host/host0/scan
and both block devices reappeared
Code:

# multipath -ll
36006016037a02e00ca86fb1d4847e111 dm-0 DGC,RAID 5
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=4 status=active
| `- 0:0:0:0 sda 8:0  active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 1:0:0:0 sdb 8:16 active ready running
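The single-host rescan above can be extended to every HBA in one loop. A minimal sketch, where rescan_hosts is a hypothetical helper and the sysfs base is a parameter only so the loop is easy to dry-run; on a live system it would be /sys/class/scsi_host:

```shell
#!/bin/sh
# rescan_hosts: write "- - -" (all channels, all targets, all LUNs) to the
# scan file of every SCSI host found under the given sysfs base.
rescan_hosts() {
    base="${1:-/sys/class/scsi_host}"
    for h in "$base"/host*; do
        [ -w "$h/scan" ] && echo "- - -" > "$h/scan"
    done
}
```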

To my surprise, all the errors that were shown earlier while running LVM commands were gone
Code:

# lvs
  LV                     VG     Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  ISS                    system -wi-ao--- 120.47g
  is-main                system -wi-ao---  15.00g
  system-opt-mgtservices system -wi-ao---   5.00g
  system-usr             system -wi-ao---   2.00g
  system-var             system -wi-ao---   2.00g
  system-var-log         system -wi-ao---   2.00g
  system-var-opt         system -wi-ao---  25.00g
  tmp                    system -wi-ao---  20.00g

And this device was also readable again
Code:

# fdisk /dev/sdb

Command (m for help): p

Disk /dev/sdb: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006db56

  Device Boot      Start        End      Blocks  Id  System
/dev/sdb1  *        2048    1060863      529408  83  Linux
/dev/sdb2        1060864    9461759    4200448  83  Linux
/dev/sdb3        9461760  411023359  200780800  8e  Linux LVM
/dev/sdb4      411023360  419430399    4203520  82  Linux swap / Solaris

I am really confused about what is leading to this behavior. Is there some misconfiguration in multipath?

Below is my multipath config
Code:

defaults {
        verbosity 2
        polling_interval 5
        multipath_dir "/lib64/multipath"
        path_selector "service-time 0"
        path_grouping_policy "failover"
        uid_attribute "ID_SERIAL"
        prio "const"
        prio_args ""
        features "0"
        path_checker "directio"
        alias_prefix "mpath"
        failback "manual"
        rr_min_io 1000
        rr_min_io_rq 1
        max_fds "max"
        rr_weight "uniform"
        queue_without_daemon "yes"
        flush_on_last_del "no"
        user_friendly_names "no"
        fast_io_fail_tmo 5
        bindings_file "/etc/multipath/bindings"
        wwids_file /etc/multipath/wwids
        log_checker_err always
        retain_attached_hw_handler no
        detect_prio no
}
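To rule out a mismatch between this file and what the daemon actually loaded, the merged running configuration can be dumped with multipathd -k"show config" (or multipath -t). For scripting a quick check of individual keys from the file itself, a sketch, where mp_get is a hypothetical helper:

```shell
#!/bin/sh
# mp_get: read one setting from a multipath.conf-style file (hypothetical
# helper; on a live system `multipathd -k"show config"` prints the merged
# running configuration, which is the authoritative view).
mp_get() {
    key="$1"
    conf="${2:-/etc/multipath.conf}"
    awk -v k="$key" '$1 == k { gsub(/"/, "", $2); print $2 }' "$conf"
}
```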


Keruskerfuerst 07-24-2015 03:54 AM

Then try to replace the mentioned HDD.

deep27ak 07-24-2015 04:33 AM

This is SAN storage; sda and sdb are just block devices presented from the array, not physical HDDs.

I didn't quite get what you mean.

Keruskerfuerst 07-24-2015 06:02 AM

Then check the complete device.

