Linux - Software
This forum is for Software issues. Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
I had three 80 GB hard disks, with about 60 GB of each configured into a RAID5 array. It worked well for a number of years until I got an error screen telling me that one of the RAID disks was failing and asking whether I wanted to boot with a degraded RAID. I have since added another disk to the array and removed the failing one from it. However, I have not been able to identify the physical drive, so it remains installed.
Performance is very slow. Is there something else I should be doing? I have Ubuntu 11.10 installed, with version 3.14 of mdadm.
To identify a drive, the best way I've found with my setup is the following.
Find out the physical connection, usually from the manual: sata1, sata2, etc.
To save having to prefix every command with sudo, bring up a console window, issue "sudo -i" (remove quotes) and enter your current user's password.
Then issue "dmesg" and look for something along the lines of:
[ 0.855906] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 0.871922] ata1.00: ATA-8: WDC WD10EACS-00D6B1, 01.01A01, max UDMA/133
[ 0.871928] ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 0.872814] ata1.00: configured for UDMA/133
[ 0.888121] scsi 0:0:0:0: Direct-Access ATA WDC WD10EACS-00D 01.0 PQ: 0 ANSI: 5
[ 1.612019] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.620299] ata2.00: ATA-8: WDC WD10EAVS-00D7B1, 01.01A01, max UDMA/133
[ 1.620304] ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1.621204] ata2.00: configured for UDMA/133
[ 1.636122] scsi 1:0:0:0: Direct-Access ATA WDC WD10EAVS-00D 01.0 PQ: 0 ANSI: 5
This gives the drive numbers as the kernel sees them, and they relate directly to the hardware port numbers, because the ports are enumerated in physical order.
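As a quick sketch, the drive-identification lines can be filtered out of the kernel log like this. The here-doc simply replays the sample lines shown above; on a live system, pipe `dmesg` into the same grep instead:

```shell
# Pull out the lines that name each ata port's drive model.
# On a live system: dmesg | grep -E 'ata[0-9]+\.00: ATA-'
grep -E 'ata[0-9]+\.00: ATA-' <<'EOF'
[    0.871922] ata1.00: ATA-8: WDC WD10EACS-00D6B1, 01.01A01, max UDMA/133
[    1.620299] ata2.00: ATA-8: WDC WD10EAVS-00D7B1, 01.01A01, max UDMA/133
EOF
```

Each matching line ties an ata port number to a drive model string, which is the first half of matching a device name to a physical port.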
Further down in the dmesg output, the following:
[ 5.505237] md: md2 stopped.
[ 5.506711] md: bind<sdb2>
[ 5.506863] md: bind<sda2>
[ 5.508861] raid1: raid set md2 active with 2 out of 2 mirrors
[ 5.509448] md2: bitmap initialized from disk: read 1/1 pages, set 0 bits
[ 5.509451] created bitmap (1 pages) for device md2
[ 5.522807] md2: detected capacity change from 0 to 209702912
[ 5.523766] md2: unknown partition table
links the sd* device names to the md* RAID number.
Using a mix of the above, "cat /proc/mdstat" (remove quotes) and the mdadm command (see the manual) to list devices and details should tell you which physical drive you need to remove, and whether the RAID is rebuilding, still degraded, or has some other problem.
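A minimal sketch of the /proc/mdstat check: a healthy member shows as U in the bracketed status field, a missing one as _. The here-doc is a constructed sample consistent with a clean 3-disk RAID5 like this one; on a live system just read the real file with `cat /proc/mdstat`:

```shell
# Print "clean" or "degraded" based on the [UUU]-style status field.
# Sample mdstat line is hypothetical; on a live system feed /proc/mdstat in.
awk '/\[[U_]+\]$/ { if ($0 ~ /_/) print "degraded"; else print "clean" }' <<'EOF'
md0 : active raid5 sdd2[2] sdc3[1] sda3[0]
      116230144 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
EOF
```

A degraded array would show something like [3/2] [U_U] instead, and the script would print "degraded".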
The dmesg output will also show any errors related to the drives, so a careful scan of its content can be illuminating.
I would suggest that if one drive has failed, there is a good chance the others are on their way out too, so back up before they fail.
My dmesg output is not quite as simple as yours; the lines refer to ata3 and ata4 for a bit, then quite a bit later refer to ata5 and ata6. Are you saying these will be in the order of the labels on the motherboard?
I do not quite follow your steps for matching the Linux drives to the ata numbers, and therefore to the physical ports either. Could you provide a bit more guidance please?
When my machine is booting, I know it will boot correctly if it comes up with a message that IRQ #11 is being disabled. I have lifted the context of that from my dmesg output to see if it helps. It also says that the RAID was set up correctly, but performance is still very slow: it takes a long time to get filesystem information, and my Wine application reloads pages slower than reading speed.
[ 1.713761] FDC 0 is a post-1991 82077
[ 1.917695] md: bind<sdc3>
[ 1.923016] md: bind<sda3>
[ 1.933910] md: bind<sdd2>
[ 3.812031] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 0)
[ 4.615882] irq 11: nobody cared (try booting with the "irqpoll" option)
[ 4.615888] Pid: 0, comm: swapper Not tainted 3.0.0-12-generic #20-Ubuntu
[ 4.615891] Call Trace:
[ 4.615893] <IRQ> [<ffffffff810cf8ad>] __report_bad_irq+0x3d/0xe0
[ 4.615908] [<ffffffff810cfcd5>] note_interrupt+0x135/0x180
[ 4.615913] [<ffffffff810cdcc9>] handle_irq_event_percpu+0xa9/0x220
[ 4.615918] [<ffffffff810937a8>] ? tick_dev_program_event+0x48/0x110
[ 4.615923] [<ffffffff810cde8e>] handle_irq_event+0x4e/0x80
[ 4.615927] [<ffffffff810d01e1>] handle_level_irq+0x81/0x100
[ 4.615932] [<ffffffff8100c252>] handle_irq+0x22/0x40
[ 4.615937] [<ffffffff815f3d2a>] do_IRQ+0x5a/0xe0
[ 4.615942] [<ffffffff815ea413>] common_interrupt+0x13/0x13
[ 4.615947] [<ffffffff81065f10>] ? __do_softirq+0x60/0x210
[ 4.615952] [<ffffffff8109388f>] ? tick_program_event+0x1f/0x30
[ 4.615956] [<ffffffff815f34dc>] ? call_softirq+0x1c/0x30
[ 4.615959] [<ffffffff8100c2d5>] ? do_softirq+0x65/0xa0
[ 4.615963] [<ffffffff8106633e>] ? irq_exit+0x8e/0xb0
[ 4.615967] [<ffffffff815f3e1e>] ? smp_apic_timer_interrupt+0x6e/0x99
[ 4.615972] [<ffffffff815f2c93>] ? apic_timer_interrupt+0x13/0x20
[ 4.615974] <EOI> [<ffffffff81012457>] ? mwait_idle+0x87/0x160
[ 4.615984] [<ffffffff8100920b>] ? cpu_idle+0xab/0x100
[ 4.615990] [<ffffffff815b803e>] ? rest_init+0x72/0x74
[ 4.615995] [<ffffffff81ad0c2b>] ? start_kernel+0x3d4/0x3df
[ 4.616000] [<ffffffff81ad0388>] ? x86_64_start_reservations+0x132/0x136
[ 4.616005] [<ffffffff81ad0140>] ? early_idt_handlers+0x140/0x140
[ 4.616007] [<ffffffff81ad0459>] ? x86_64_start_kernel+0xcd/0xdc
[ 4.616007] handlers:
[ 4.616007] [<ffffffff81449450>] usb_hcd_irq
[ 4.616007] Disabling IRQ #11
[ 8.812022] ata5.00: qc timeout (cmd 0xa1)
[ 8.812029] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 10.996040] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 0)
[ 20.996017] ata5.00: qc timeout (cmd 0xec)
[ 20.996024] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 20.996028] ata5: limiting SATA link speed to 1.5 Gbps
[ 23.180035] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[ 53.180021] ata5.00: qc timeout (cmd 0xec)
[ 53.180027] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 55.364036] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[ 57.444023] ata6: SATA link down (SStatus 0 SControl 0)
[ 57.444525] xor: automatically using best checksumming function: generic_sse
[ 57.464008] generic_sse: 4403.000 MB/sec
[ 57.464011] xor: using function: generic_sse (4403.000 MB/sec)
[ 57.469621] md: raid6 personality registered for level 6
[ 57.469626] md: raid5 personality registered for level 5
[ 57.469629] md: raid4 personality registered for level 4
[ 57.472629] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
[ 57.472635] PCI: setting IRQ 10 as level-triggered
[ 57.472643] firewire_ohci 0000:04:02.0: PCI INT A -> Link[LNKB] -> GSI 10 (level, low) -> IRQ 10
[ 57.474181] bio: create slab <bio-1> at 1
[ 57.474204] md/raid:md0: device sdd2 operational as raid disk 2
[ 57.474207] md/raid:md0: device sda3 operational as raid disk 0
[ 57.474210] md/raid:md0: device sdc3 operational as raid disk 1
[ 57.474831] md/raid:md0: allocated 3230kB
[ 57.475105] md/raid:md0: raid level 5 active with 3 out of 3 devices, algorithm 2
[ 57.475109] RAID conf printout:
[ 57.475111] --- level:5 rd:3 wd:3
[ 57.475114] disk 0, o:1, dev:sda3
[ 57.475116] disk 1, o:1, dev:sdc3
[ 57.475118] disk 2, o:1, dev:sdd2
[ 57.475152] md0: detected capacity change from 0 to 119019667456
I'm not sure of your level of knowledge, so:
Can you post the output of "cat /proc/mdstat" (remove quotes)?
This will tell me whether your RAIDs are up and in a good state.
Then the output of "mdadm --detail /dev/md0" (remove quotes)?
This tells me whether md0 is good, plus a few other bits of info.
According to what I've found out, IRQ 11 is used by network devices, additional disk controllers, and sound cards.
There are various reported problems, so it would be best to look at what Google brings up and see whether any of it relates to your setup.
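To see what is actually registered on IRQ 11, /proc/interrupts lists the handlers per IRQ line. A minimal sketch (the here-doc is a hypothetical excerpt; the handler names and counts are made up for illustration):

```shell
# Print the header row plus the IRQ 11 line.
# On a live system: awk 'NR==1 || $1 == "11:"' /proc/interrupts
awk 'NR==1 || $1 == "11:"' <<'EOF'
           CPU0
  10:        0   XT-PIC   firewire_ohci
  11:   123456   XT-PIC   usb_hcd
EOF
```

If several devices appear on the IRQ 11 line, they are sharing the interrupt, which fits the "nobody cared" trace above.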
I notice you are using a new version of Ubuntu. There are a lot of bug fixes going on all the time, and there was a rather nasty, short-lived problem with mdadm, so make sure you have run all updates.
At a guess from the output shown, I'd say sdb was the failed device?
Also, are you using a SATA card, or SATA on the motherboard?
Do you have any card readers, such as compact flash, etc.?
What is the motherboard make and model?
If the above does NOT help after you have posted, then in another post do the following.
Finally, post the whole output of "dmesg" after a reboot, up to the point where it says something similar to:
[ 10.629406] EXT4-fs (md5): mounted filesystem with ordered data mode
[ 10.729736] EXT4-fs (md6): mounted filesystem with ordered data mode
This is the point at which it is all happy with the disks and has mounted them; usually after this it will be something about bringing up the network and potentially login stuff.
You may find that pastebin.com is useful for the dmesg output, as it's going to be long. Also, please check that you don't include output that may identify your computer's IP address, computer name, or network name.
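As a sketch, a saved dmesg can be trimmed at the first "mounted filesystem" line before pasting, so the paste stops once the disks are up. The here-doc stands in for a saved log (the md device names vary per system):

```shell
# Print everything from the start of the log through the first EXT4 mount line.
# On a live system: dmesg | sed -n '1,/EXT4-fs (.*): mounted filesystem/p'
sed -n '1,/EXT4-fs (.*): mounted filesystem/p' <<'EOF'
[    5.505237] md: md2 stopped.
[   10.629406] EXT4-fs (md5): mounted filesystem with ordered data mode
[   11.000000] eth0: link becomes ready
EOF
```

Everything after the first mount line, such as the network bring-up, is dropped from the paste.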
I'm not sure whether I can help at this point, as it's getting very machine-specific, but I'll try.
Thanks for persisting, Jonathon. I think you are about to find out my knowledge is extremely limited.
These are the outputs you asked for:
don@don-ubuntu:~$ sudo cat /proc/mdstat/
[sudo] password for don:
cat: /proc/mdstat/: Not a directory
don@don-ubuntu:~$ cat /proc/mdstat/
cat: /proc/mdstat/: Not a directory
don@don-ubuntu:~$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Sat Oct 27 16:40:52 2007
Raid Level : raid5
Array Size : 116230144 (110.85 GiB 119.02 GB)
Used Dev Size : 58115072 (55.42 GiB 59.51 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Feb 6 09:13:53 2012
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 5a5910e6:769ae62d:16970757:09b4ed7d (local to host don-ubuntu)
Events : 0.2379
    Number   Major   Minor   RaidDevice   State
       0       8       3         0        active sync   /dev/sda3
       1       8      35         1        active sync   /dev/sdc3
       2       8      50         2        active sync   /dev/sdd2
I do not know why the cat command did not work. Am I being really thick?
SATA is on the motherboard. I always try to keep the system up to date, applying updates as soon as they are available. The motherboard is an ABIT AL8. Yes, sdb was the drive that was reporting bad sectors. No, there is no flash card reader as part of the system, but I have a reader that plugs into a USB port and recently read a faulty card. However, that USB port has read other things since.
And according to the manual, on p25 you have six possible SATA ports and one IDE.
Can you run:
Code:
ls -al /dev/disk/by-id
This will tell me what you have in the way of disks...
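The by-id names embed each drive's model and serial number, and the serial is usually printed on the drive's label, which is what finally pins down the physical unit. A sketch using hypothetical link names (the serials below are made up; real ones come from the `ls -al /dev/disk/by-id` output):

```shell
# Strip partition suffixes and dedupe to get one by-id name per physical disk.
# The names below are hypothetical examples, not real output.
printf '%s\n' \
  'ata-WDC_WD800JD-00LSA0_WD-WMAM9AA00001-part1' \
  'ata-WDC_WD800JD-00LSA0_WD-WMAM9AA00001-part2' \
  'ata-WDC_WD800AAJS-00TDA0_WD-WCAPW0000002' |
  sed 's/-part[0-9]*$//' | sort -u
```

Match the trailing serial in each surviving name against the sticker on each drive to identify which physical disk is which.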
Also, can you give me the complete output of dmesg after a reboot and after you have browsed the contents of all your drives, just in case one of them is causing spurious errors.
I'm beginning to think this may not be a disk or RAID problem, although it would be worth identifying the failed drive so you can remove it completely.
One point of note, though: if the older drives were only capable of 1.5 Gbps transfer speeds and the new one runs at 3 Gbps, I'm not sure of the implications for the software RAID receiving data at different speeds. I guess it's possible it could cause problems that wouldn't actually show up as errors.
[edit]I checked on the mailing list and it seems it should make no difference at all to overall performance.[/edit]
It might also be worth looking at the linux-raid mailing list.
I have had network problems and no internet for a week or so, hence no response. An update to the kernel has also just come through, so I will see how things go for a while. If I still have problems I will follow Jonathon's suggestions.
After the kernel upgrade the problem is persisting. I have pasted the dmesg output to pastebin under the same user name as this one. While browsing it I noticed a warning that I was trying to mount an ext3 filesystem as ext2. I cannot find the warning now, but wondered if that could be the problem.
Since the kernel was upgraded there have been no reports of the RAID being degraded, but IRQ 11 is still being disabled.
Any help appreciated.