LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

02-06-2014, 05:15 AM   #1
Iyyappan
Member
 
Registered: Dec 2008
Location: Chennai, India
Distribution: CentOS 5, SLES 11
Posts: 235

Rep: Reputation: 4
Errors after Adding LUN to Server


Hi,

We have a RHEL 5.6 server running an Oracle database.

We have two PVs and two VGs: one VG holds around 11 LVs and the other holds a single LV.


At 14:17 the storage team added a 50 GB LUN so that we could extend the file system on one partition. (Note: only the LUN was added; no PV was created and no VG was extended.)

At exactly the same time, the Oracle database stopped working because it could not read an undo .dbf file. When the DBAs tried to start it again, it never came up: it kept trying to access the file below, which was not accessible, and Oracle went down again.

/d03_testdb/db/apps_st/data/undo01.dbf

There are many other .dbf files, and those were still accessible. The DBAs said undo01.dbf is essential and Oracle won't start without it.

I found the errors below in /var/log/messages.

Apart from these messages, there were no I/O errors in the last month:

Test kernel: sd 4:0:1:1: [sde] Unhandled error code
Feb 5 14:17:13 Test kernel: end_request: I/O error, dev sde, sector 310879448
Feb 5 14:17:13 Test kernel: Buffer I/O error on device dm-2, logical block 12645483
Feb 5 14:17:13 Test kernel: lost page write due to I/O error on dm-2
Feb 5 14:17:13 Test kernel: Buffer I/O error on device dm-2, logical block 12645484
Feb 5 14:17:13 Test kernel: lost page write due to I/O error on dm-2
Feb 5 14:17:13 Test kernel: Buffer I/O error on device dm-2, logical block 12645485
Feb 5 14:17:13 Test kernel: lost page write due to I/O error on dm-2
Feb 5 14:17:13 Test kernel: Buffer I/O error on device dm-2, logical block 12645486
Feb 5 14:17:13 Test kernel: lost page write due to I/O error on dm-2
Feb 5 14:17:13 Test kernel: end_request: I/O error, dev sde, sector 312463232
Feb 5 14:17:13 Test kernel: Buffer I/O error on device dm-2, logical block 12843456
Feb 5 14:17:13 Test kernel: lost page write due to I/O error on dm-2
Feb 5 14:17:13 Test kernel: Buffer I/O error on device dm-2, logical block 12843457
Feb 5 14:17:13 Test kernel: lost page write due to I/O error on dm-2
Feb 5 16:02:48 Test kernel: 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A

Feb 5 14:17:13 Test kernel: JBD: Detected IO errors while flushing file data on dm-2
Feb 5 16:02:48 Test kernel: EXT3 FS on dm-2, internal journal
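A quick arithmetic check on the log excerpt above: if dm-2 is a linear device-mapper target laid directly on /dev/sde, then the sde sector and the dm-2 logical block in each error pair should differ by the same byte offset. This sketch assumes 512-byte sectors in the `end_request` lines and a 4 KiB ext3 block size on dm-2 (neither is stated in the thread); the names `SECTOR`, `FS_BLOCK`, `PAIRS` and `lv_offset_bytes` are illustrative only.

```python
# Bytes per sector in the "end_request: I/O error, dev sde, sector N" messages.
SECTOR = 512
# Assumption: ext3 on dm-2 uses 4 KiB blocks, so "logical block N" is in 4 KiB units.
FS_BLOCK = 4096

# (sector on sde, logical block on dm-2) pairs taken from the 14:17:13 messages.
PAIRS = [(310879448, 12645483), (312463232, 12843456)]


def lv_offset_bytes(sector: int, block: int) -> int:
    """Byte offset of dm-2's start on sde, if dm-2 is a linear map onto sde."""
    return sector * SECTOR - block * FS_BLOCK


offsets = [lv_offset_bytes(s, b) for s, b in PAIRS]
print(offsets)                    # [107374379008, 107374379008]
print(divmod(offsets[0], 2**30))  # (100, 196608): 100 GiB plus 192 KiB
```

Both pairs give the same offset, 100 GiB plus 192 KiB. That is consistent with dm-2 being an LV mapped straight onto the raw path sde (SCSI address 4:0:1:1, which the pre-reboot multipath -ll output later in this thread lists as a path of mpathb) rather than onto the multipath map; the extra 192 KiB would plausibly be the PV's data-area start. On a live system, `dmsetup deps` and `pvs -o +pe_start` are the way to confirm this instead of relying on the arithmetic.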

PV VG Fmt Attr PSize PFree
/dev/mpath/mpathb testvg01 lvm2 a- 500.07G 0
/dev/sda4 VG lvm2 a- 522.27G 22.27G

There was only one multipath device link:
lrwxrwxrwx 1 root root 8 Feb 5 16:02 mpathb -> ../dm-12

It looks like dm-2 was a ghost entry.




We saw I/O errors in the Oracle trace files, and on a forum someone advised running fsck to fix the issue. I rebooted the server to clear the ghost LUN entries, and all the file systems were checked at boot as configured in fstab. That fixed the issue and the database is working fine.



Now I need to know whether the I/O errors were caused by the LUN allocation or by file corruption. Since both happened at exactly the same time, I am confused.


I have added LUNs online and extended LVs many times before, and I never faced this issue.

Last edited by Iyyappan; 02-06-2014 at 05:22 AM.
 
02-06-2014, 05:24 AM   #2
Iyyappan
Member
 
Registered: Dec 2008
Location: Chennai, India
Distribution: CentOS 5, SLES 11
Posts: 235

Original Poster
Rep: Reputation: 4
This test server is not covered by a support contract, so please don't ask me to contact RHEL or Oracle support.
 
02-06-2014, 08:53 AM   #3
grim76
Member
 
Registered: Jun 2007
Distribution: Debian, SLES, Ubuntu
Posts: 295

Rep: Reputation: 48
It looks like your lvm.conf does not restrict LVM to the multipath devices only. That might be causing your problem.
 
02-10-2014, 08:57 AM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 16,003

Rep: Reputation: 2990
Quote:
Originally Posted by Iyyappan
Now I need to know whether the I/O errors were caused by the LUN allocation or by file corruption. Since both happened at exactly the same time, I am confused.
Grim76 hit it right on the head. Did you check with the SAN administrators to find out what they saw on their side? Did anything related to the HBAs show up in your system logs?

Quote:
Originally Posted by Iyyappan
This test server is not covered by a support contract, so please don't ask me to contact RHEL or Oracle support.
Funny, you've never seemed to call RHEL or Oracle support for ANY of your servers, despite being advised to many times. And again, if you're PAYING for RHEL and Oracle, your test servers are also covered for questions like this. So what's the reason for not calling? Also, having a test server connected to the SAN (with multipathing) and adding more space to it seems very odd. Why bother? Since it's a test server, you're spending a LOT more money on extra HBAs and SAN fabric (switches, etc.), and SAN disk isn't typically cheap. Deleting files seems a far more likely approach on a test server.

Unless this wasn't a test server.
 
02-14-2014, 01:56 AM   #5
Iyyappan
Member
 
Registered: Dec 2008
Location: Chennai, India
Distribution: CentOS 5, SLES 11
Posts: 235

Original Poster
Rep: Reputation: 4
Found the issue. We compared the output of multipath -ll before and after the reboot.

Before reboot
[root@test data]# multipath -ll
mpathc (360a980006471585a534a775048434637) dm-13 NETAPP,LUN
size=50G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=4 status=active
| |- 3:0:1:2 sdi 8:128 active ready running
| `- 4:0:1:2 sdk 8:160 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 3:0:0:2 sdh 8:112 active ready running
  `- 4:0:0:2 sdj 8:144 active ready running
mpathb (360a980006471585a534a6f5166436355) dm-12 NETAPP,LUN
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=4 status=active
| |- 3:0:1:1 sdc 8:32 active ready running
| `- 4:0:1:1 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 3:0:0:1 sdb 8:16 active ready running
  `- 4:0:0:1 sdd 8:48 active ready running

Post reboot

mpathc (360a980006471585a534a775048434637) dm-13 NETAPP,LUN
size=50G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=4 status=active
| |- 3:0:1:2 sde 8:64 active ready running
| `- 4:0:1:2 sdi 8:128 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 3:0:0:2 sdc 8:32 active ready running
  `- 4:0:0:2 sdg 8:96 active ready running
mpathb (360a980006471585a534a6f5166436355) dm-12 NETAPP,LUN
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=4 status=active
| |- 3:0:1:1 sdd 8:48 active ready running
| `- 4:0:1:1 sdh 8:112 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 3:0:0:1 sdb 8:16 active ready running
  `- 4:0:0:1 sdf 8:80 active ready running


The sdX device names changed across the reboot (the WWIDs and the dm-12/dm-13 numbers stayed the same).
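The comparison above can be automated. This is a minimal sketch, assuming the path-line layout shown in these listings; the helper names `paths_by_hctl` and `renamed_paths` are hypothetical. It extracts the `H:C:T:L sdX` pairs from two `multipath -ll` outputs and reports which SCSI addresses ended up with a different sdX name.

```python
from __future__ import annotations

import re

# Path lines look like "| |- 3:0:1:1 sdc 8:32 active ready running".
PATH_RE = re.compile(r"(\d+:\d+:\d+:\d+)\s+(sd[a-z]+)\s+\d+:\d+")

BEFORE = """\
mpathc (360a980006471585a534a775048434637) dm-13 NETAPP,LUN
size=50G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=4 status=active
| |- 3:0:1:2 sdi 8:128 active ready running
| `- 4:0:1:2 sdk 8:160 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 3:0:0:2 sdh 8:112 active ready running
  `- 4:0:0:2 sdj 8:144 active ready running
mpathb (360a980006471585a534a6f5166436355) dm-12 NETAPP,LUN
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=4 status=active
| |- 3:0:1:1 sdc 8:32 active ready running
| `- 4:0:1:1 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 3:0:0:1 sdb 8:16 active ready running
  `- 4:0:0:1 sdd 8:48 active ready running
"""

AFTER = """\
mpathc (360a980006471585a534a775048434637) dm-13 NETAPP,LUN
size=50G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=4 status=active
| |- 3:0:1:2 sde 8:64 active ready running
| `- 4:0:1:2 sdi 8:128 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 3:0:0:2 sdc 8:32 active ready running
  `- 4:0:0:2 sdg 8:96 active ready running
mpathb (360a980006471585a534a6f5166436355) dm-12 NETAPP,LUN
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=4 status=active
| |- 3:0:1:1 sdd 8:48 active ready running
| `- 4:0:1:1 sdh 8:112 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 3:0:0:1 sdb 8:16 active ready running
  `- 4:0:0:1 sdf 8:80 active ready running
"""


def paths_by_hctl(multipath_ll: str) -> dict[str, str]:
    """Map each SCSI H:C:T:L address to the sdX name shown for it."""
    return {hctl: dev for hctl, dev in PATH_RE.findall(multipath_ll)}


def renamed_paths(before: str, after: str) -> dict[str, tuple[str, str]]:
    """Return {hctl: (old_name, new_name)} for addresses whose sdX name changed."""
    old, new = paths_by_hctl(before), paths_by_hctl(after)
    return {h: (old[h], new[h]) for h in sorted(old.keys() & new.keys()) if old[h] != new[h]}


for hctl, (old_dev, new_dev) in renamed_paths(BEFORE, AFTER).items():
    print(f"{hctl}: /dev/{old_dev} -> /dev/{new_dev}")
```

Run against these two listings, it flags seven of the eight paths; only 3:0:0:1 kept its name (sdb). In particular, /dev/sde went from being a path of mpathb (4:0:1:1) to a path of mpathc (3:0:1:2), which is why anything keyed on sdX names is fragile.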
@grim76, can you tell me how to configure lvm.conf so that LVM does not scan these underlying devices? Our current lvm.conf is the default one.

Should I change it to
filter =[ "r|/dev/sde|", "r|/dev/sdi|", "r|/dev/sdc|", "r|/dev/sdg|", "r|/dev/sdd|", "r|/dev/sdh|", "r|/dev/sdb|", "r|/dev/sdf|" ]

Please help me configure the filter.
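Before committing to a name-based filter, it can help to model how LVM evaluates it. The sketch below is a hypothetical Python model, not LVM code, of the rule described in the comments of newer lvm.conf files: for each name of a device, the first matching pattern decides; the device is accepted if any name hits an `a` pattern first, rejected if some name hits an `r` pattern first and none hits `a`, and accepted if no pattern matches at all. Older releases such as the one on RHEL 5 may differ in detail. The device-name lists are simplified (real devices also have /dev/disk/by-* names), and `MULTIPATH_ONLY` is an illustrative version of the multipath-only approach.

```python
import re


def parse_filter(entries):
    """Turn lvm.conf-style entries such as "r|/dev/sde|" into (action, regex) pairs."""
    rules = []
    for entry in entries:
        action, delim = entry[0], entry[1]        # 'a' or 'r', then the delimiter
        pattern = entry[2:entry.rindex(delim)]    # text between the delimiters
        rules.append((action, re.compile(pattern)))
    return rules


def accepted(rules, names):
    """Model of the documented accept/reject rule for a device with several names."""
    rejected = False
    for name in names:
        for action, regex in rules:
            if regex.search(name):                # unanchored unless ^/$ are used
                if action == "a":
                    return True
                rejected = True
                break                             # only the first match counts per name
    return not rejected


# Filter proposed in the post above.
PROPOSED = ["r|/dev/sde|", "r|/dev/sdi|", "r|/dev/sdc|", "r|/dev/sdg|",
            "r|/dev/sdd|", "r|/dev/sdh|", "r|/dev/sdb|", "r|/dev/sdf|"]

# Illustrative alternative: accept multipath maps and the local PV, reject the rest.
MULTIPATH_ONLY = ["a|^/dev/mpath/|", "a|^/dev/sda4$|", "r|.*|"]

# Simplified name lists; sdj/sdk were mpathc paths before the reboot.
DEVICES = {
    "mpathb": ["/dev/mpath/mpathb", "/dev/dm-12"],
    "sda4": ["/dev/sda4"],
    "sde": ["/dev/sde"],
    "sdj": ["/dev/sdj"],
    "sdk": ["/dev/sdk"],
}

for label, entries in (("proposed", PROPOSED), ("multipath-only", MULTIPATH_ONLY)):
    rules = parse_filter(entries)
    verdicts = {dev: accepted(rules, names) for dev, names in DEVICES.items()}
    print(label, verdicts)
```

With the proposed reject list, /dev/sdj and /dev/sdk (mpathc's paths before the reboot) match nothing and are accepted, because sdX names are reassigned at each boot. The accept-multipath-then-reject-everything list rejects every raw path regardless of its current name.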
 
02-14-2014, 07:16 AM   #6
grim76
Member
 
Registered: Jun 2007
Distribution: Debian, SLES, Ubuntu
Posts: 295

Rep: Reputation: 48
That will not work: if you lose the path those devices are attached to, you will be down with no recourse. You are going to have to set up lvm.conf to look at the multipath names for the devices.

Your output shows mpathb and mpathc as the device aliases. You need to set up your lvm.conf to point to those rather than to the underlying physical device identifiers.
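A sketch of what that could look like, assuming the multipath maps appear under /dev/mpath/ (as in the pvs output earlier in the thread) and that /dev/sda4 is the only local PV and keeps that name. This is a devices-section fragment, not a complete lvm.conf; adjust the local-disk pattern to your own layout.

```
# /etc/lvm/lvm.conf, devices section (fragment)
devices {
    # Accept the multipath maps and the local PV; reject everything else,
    # including the individual /dev/sd* paths behind mpathb and mpathc.
    filter = [ "a|^/dev/mpath/|", "a|^/dev/sda4$|", "r|.*|" ]
}
```

After changing the filter, `pvs` should list /dev/mpath/mpathb rather than a /dev/sdX path. If LVM is also activated from the initrd, the initrd may need to be rebuilt (mkinitrd on RHEL 5) so that early boot uses the same filter.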
 
  



Similar Threads
Thread Thread Starter Forum Replies Last Post
How to detect FC lun on Server? pinga123 Linux - General 1 01-27-2014 03:34 PM
Increasing LUN sizes only, this not adding a new LUN maxmal7 Linux - Kernel 1 11-22-2013 04:55 PM
Replacing 200 GB LUN with 100 GB LUN in LVM rajaniyer123 Linux - Enterprise 4 03-27-2012 12:32 PM
RHEL 5 adding a LUN without a reboot l4n3 Linux - Enterprise 6 03-28-2008 03:17 PM
LUN Device Mapping: 2 devices map to same lun DantePasquale Linux - Distributions 0 09-24-2007 02:59 PM


All times are GMT -5.
