LinuxQuestions.org
01-23-2021, 05:07 PM   #1
dburke8088
LQ Newbie
 
Registered: Jan 2021
Location: South East USA
Distribution: Ubuntu 20.04
Posts: 4

Rep: Reputation: Disabled
9650SE raid controller Drive jumped out of raid


I have been working with hardware for a long time, but this one has me stumped. I have a 9650SE-16ML RAID controller configured as RAID 6 with 14 3TB drives. I loaded the LSI communications software and enabled the web management so I could start getting RAID health reports. It worked fine for several months, then I noticed a degraded array: one of the drives had died. I powered down, replaced the defective drive, came back up, and the array is rebuilding just fine. But here's the thing: one drive is now part of a second unit! Not the drive that was replaced, but an entirely different drive. It's part of a Unit 1 that did not exist before, and everything is reporting OK. Also, my RAID size has been reduced. I'm stumped as to what did this and how to put the drive back into the array without an entire rebuild. This is Ubuntu Server 20.04. The controller is:
LSI 3DM2 9650SE-16ML 3DM2 version 2.11.00.021
API version 2.08.00.027
Copyright (c) 2012 LSI Corporation

From the web interface:
Drive Information (Controller ID 1)
VPort  Model               Capacity  Type  Phy  Slot  Unit  Status  Identify
0      ST3000DM001-1CH166  2.73 TB   SATA  0    --    0     OK
1      ST3000DM001-1CH166  2.73 TB   SATA  1    --    0     OK
2      ST3000DM001-1CH166  2.73 TB   SATA  2    --    0     OK
3      ST3000DM001-1E6166  2.73 TB   SATA  3    --    0     OK
4      ST3000DM001-1CH166  2.73 TB   SATA  4    --    0     OK
5      ST3000DM001-1CH166  2.73 TB   SATA  5    --    0     OK
6      ST3000DM001-9YN166  2.73 TB   SATA  6    --    0     OK
7      ST3000DM001-1CH166  2.73 TB   SATA  7    --    0     OK
8      ST3000DM001-1ER166  2.73 TB   SATA  8    --    0     OK
9      ST3000DM001-1CH166  2.73 TB   SATA  9    --    0     OK
10     ST3000DM001-1CH166  2.73 TB   SATA  10   --    0     OK
11     ST3000DM001-1CH166  2.73 TB   SATA  11   --    1     OK
12     ST3000DM001-1CH166  2.73 TB   SATA  12   --    0     OK
13     ST3000DM001-1ER166  2.73 TB   SATA  13   --    0     OK
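
For anyone who wants to spot this kind of stray drive automatically, here is a minimal Python sketch that parses rows in the layout pasted above and lists any VPort whose Unit differs from the majority. The function names (`parse_rows`, `stray_drives`) are made up for illustration, and it assumes each row has exactly the nine whitespace-separated fields shown in this post; output copied from a different 3DM2 version may need adjusting.

```python
from collections import Counter

# Drive rows as pasted from the 3DM2 page in this post.
TABLE = """\
0 ST3000DM001-1CH166 2.73 TB SATA 0 -- 0 OK
1 ST3000DM001-1CH166 2.73 TB SATA 1 -- 0 OK
2 ST3000DM001-1CH166 2.73 TB SATA 2 -- 0 OK
3 ST3000DM001-1E6166 2.73 TB SATA 3 -- 0 OK
4 ST3000DM001-1CH166 2.73 TB SATA 4 -- 0 OK
5 ST3000DM001-1CH166 2.73 TB SATA 5 -- 0 OK
6 ST3000DM001-9YN166 2.73 TB SATA 6 -- 0 OK
7 ST3000DM001-1CH166 2.73 TB SATA 7 -- 0 OK
8 ST3000DM001-1ER166 2.73 TB SATA 8 -- 0 OK
9 ST3000DM001-1CH166 2.73 TB SATA 9 -- 0 OK
10 ST3000DM001-1CH166 2.73 TB SATA 10 -- 0 OK
11 ST3000DM001-1CH166 2.73 TB SATA 11 -- 1 OK
12 ST3000DM001-1CH166 2.73 TB SATA 12 -- 0 OK
13 ST3000DM001-1ER166 2.73 TB SATA 13 -- 0 OK
"""


def parse_rows(text):
    """Return (vport, model, unit, status) tuples from whitespace-separated rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or unexpected lines
        # Fields: VPort, Model, capacity value, capacity unit, Type, Phy, Slot, Unit, Status
        rows.append((int(fields[0]), fields[1], int(fields[7]), fields[8]))
    return rows


def stray_drives(rows):
    """Return VPorts whose Unit differs from the most common Unit."""
    main_unit = Counter(unit for _, _, unit, _ in rows).most_common(1)[0][0]
    return [vport for vport, _, unit, _ in rows if unit != main_unit]


print(stray_drives(parse_rows(TABLE)))  # [11]
```

Note that the Status column says OK for every drive, so only the Unit column reveals the problem here.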

Any suggestions? I'm thinking that if I replace drive 11, it might rejoin the array.

Also, everything is otherwise fine, with great performance; the array is just stuck in a degraded status.

Last edited by dburke8088; 01-23-2021 at 05:08 PM. Reason: update
 
01-24-2021, 06:24 PM   #2
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1486
The drive in slot 11 may or may not be a problem. First, I would suggest that you let the drive you just replaced finish rebuilding, then worry about the other drive.

You said this is RAID 6, so the second drive dropping out should not have lost any data, but take troubleshooting one step at a time and play it safe.
You know the data is there, and you know it is rebuilding. Until the rebuild completes, there is a risk of data loss (especially if the drive in slot 11 actually is bad) should you do anything that might further compromise the array.

After the rebuild completes, see about that other drive; until then, "hands off". You want at least one redundant drive, and until the rebuild completes you have none.
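
To spell out the arithmetic behind "you have none": RAID 6 carries two drives' worth of parity, so it survives up to two missing members. A minimal sketch in Python, assuming both the replaced drive (still rebuilding) and the drive in slot 11 (now in its own unit) count as unavailable to the main array; the function name is made up for illustration.

```python
RAID6_PARITY_DRIVES = 2  # RAID 6 tolerates two missing members


def remaining_fault_tolerance(parity_drives, members_unavailable):
    """How many more members the array can lose before data is lost."""
    return max(parity_drives - members_unavailable, 0)


# During the rebuild: replaced drive not yet rebuilt + slot 11 outside the array.
print(remaining_fault_tolerance(RAID6_PARITY_DRIVES, 2))  # 0

# After the rebuild completes: only slot 11 is still outside the array.
print(remaining_fault_tolerance(RAID6_PARITY_DRIVES, 1))  # 1
```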
 
01-26-2021, 02:13 AM   #3
dburke8088
LQ Newbie
 
Registered: Jan 2021
Location: South East USA
Distribution: Ubuntu 20.04
Posts: 4

Original Poster
Rep: Reputation: Disabled
Status - Waiting

Thanks for the input, computersavvy, and that is what I have done. The captured information is from after the array rebuild. Drive 13 was the original offending drive. It was after its replacement that the odd drive 11 jumped to its own unit. I'm going to replace drive 11 this weekend to see if it makes things better or worse. I have all data backed up on a mirrored system for failover.
 
01-26-2021, 01:31 PM   #4
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1486
Quote:
Originally Posted by dburke8088
Thanks for the input, computersavvy, and that is what I have done. The captured information is from after the array rebuild. Drive 13 was the original offending drive. It was after its replacement that the odd drive 11 jumped to its own unit. I'm going to replace drive 11 this weekend to see if it makes things better or worse. I have all data backed up on a mirrored system for failover.
You might try a smartctl test on it on another machine just to see what went wrong. If smartctl does not report an error, the problem could be a simple erroneous write that corrupted something in the array metadata, and the drive may be recoverable.
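
As a rough illustration of that check, here is a small Python sketch that runs `smartctl -a` and decodes its exit status, which the smartctl man page documents as a bitmask. The device path `/dev/sdX` is a placeholder, the function names are made up for this example, and the bit descriptions are paraphrased from the man page, so check them against your installed version. It assumes the drive is attached directly to an ordinary SATA port on the other machine, not behind the 3ware controller.

```python
import subprocess

# Exit-status bits, paraphrased from the smartctl man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device could not be identified",
    2: "a SMART or ATA command failed, or a checksum error was found",
    3: "SMART status check returned 'DISK FAILING'",
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Return the list of conditions encoded in a smartctl exit status."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


def check_drive(device):
    """Run 'smartctl -a DEVICE' (usually needs root) and decode the result."""
    result = subprocess.run(["smartctl", "-a", device],
                            capture_output=True, text=True)
    return result.stdout, decode_smartctl_status(result.returncode)


# Example (not run here): report, problems = check_drive("/dev/sdX")
# An empty 'problems' list means smartctl flagged nothing.
```

An empty list would fit the "metadata glitch, drive itself is fine" theory, though a clean SMART report is not proof that the drive is healthy.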
 
02-05-2021, 05:07 AM   #5
dburke8088
LQ Newbie
 
Registered: Jan 2021
Location: South East USA
Distribution: Ubuntu 20.04
Posts: 4

Original Poster
Rep: Reputation: Disabled
So I have an answer. BACK UP YOUR DATA FIRST. Power down and replace the affected drive. On power-up, go into the 3ware config/management utility (Alt-3); the new drive will show as attached and the array will show as degraded. Select both the attached drive and the degraded array, tab to Maintain, and select Rebuild. Press F8; the system will then finish booting and will rebuild while operating. Once the server was back up, I checked the web interface: there is only one array, and it's rebuilding. Drive 11 might or might not have an issue; I will low-level erase it and do some speed tests to determine its value.
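
The fix above was done in the controller BIOS. For readers who cannot reboot, 3ware's `tw_cli` tool may offer a similar path from the running system. The Python sketch below only builds and prints the commands (a dry run); it does not execute anything. The `start rebuild disk=` syntax is recalled from the 3ware CLI guide and should be verified there before use, and the controller, unit, and port numbers are assumptions taken from the table earlier in this thread; `tw_cli` may number the controller differently than 3DM2 does.

```python
def rebuild_commands(controller, unit, port):
    """Build (but do not run) tw_cli commands for a manual rebuild.

    Assumed syntax, to be checked against the 3ware CLI guide:
      tw_cli /cX show
      tw_cli /cX/uY start rebuild disk=P
      tw_cli /cX/uY show
    """
    base = f"/c{controller}"
    return [
        ["tw_cli", base, "show"],                                   # list units and ports
        ["tw_cli", f"{base}/u{unit}", "start", "rebuild", f"disk={port}"],
        ["tw_cli", f"{base}/u{unit}", "show"],                      # watch rebuild progress
    ]


# Dry run with the numbers from this thread (controller 1, unit 0, port 11).
for cmd in rebuild_commands(controller=1, unit=0, port=11):
    print(" ".join(cmd))
```

As with the BIOS route, back up first, and only start a rebuild once the replacement drive shows up as available on the expected port.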
 
  

