Red Hat: This forum is for the discussion of Red Hat Linux.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I just purchased a Dell PowerEdge 2900, which is very similar to a PE1900, and I'm experiencing the same issue with CentOS 4.4: smartd refuses to start, both at system startup and when invoked manually. Here is a rough summary of the server specifications:
(2) 1.6 GHz quad-core Xeon, 1066 MHz FSB
(4) 146 GB SAS 3.5" 15K hard drives
(4) 2 GB 667 MHz dual-ranked fully buffered DIMMs
(1) PERC 5/i configured for RAID 10
The error in /var/log/messages is as follows:
[root@bigdog ~]# service smartd start
Starting smartd: [FAILED]
[root@bigdog ~]# tail /var/log/messages
Apr 3 16:37:35 bigdog smartd: smartd startup failed
Apr 3 16:38:08 bigdog smartd[30205]: smartd version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Apr 3 16:38:08 bigdog smartd[30205]: Home page is http://smartmontools.sourceforge.net/
Apr 3 16:38:08 bigdog smartd[30205]: Opened configuration file /etc/smartd.conf
Apr 3 16:38:08 bigdog smartd[30205]: Configuration file /etc/smartd.conf parsed.
Apr 3 16:38:08 bigdog smartd[30205]: Device: /dev/sda, opened
Apr 3 16:38:08 bigdog smartd[30205]: Device: /dev/sda, Bad IEC (SMART) mode page, err=-5, skip device
Apr 3 16:38:08 bigdog smartd[30205]: Unable to register SCSI device /dev/sda at line 30 of file /etc/smartd.conf
Apr 3 16:38:08 bigdog smartd[30205]: Unable to register device /dev/sda (no Directive -d removable). Exiting.
Apr 3 16:38:08 bigdog smartd: smartd startup failed
[root@bigdog ~]#
I'm not very experienced with smartd. From what I can gather, this utility is used to detect potential drive failures before they occur. If smartd is not needed for post-failure recovery of an array, then I'd feel safe just removing it from chkconfig...
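For what it's worth, a common workaround when a RAID controller's virtual disk won't answer the SMART query is to comment out the offending directive in /etc/smartd.conf rather than let startup abort. A sketch, demonstrated on a scratch copy (the sample directive below is hypothetical; on the real box the file is /etc/smartd.conf and, per the log, line 30 holds the /dev/sda entry):

```shell
# Work on a scratch copy so this is safe to run anywhere:
conf=$(mktemp)
printf '/dev/sda -H -m root\n' > "$conf"   # hypothetical sample directive

# Comment out the device smartd cannot register:
sed -i.bak 's|^/dev/sda|#/dev/sda|' "$conf"
grep '/dev/sda' "$conf"    # prints: #/dev/sda -H -m root

# Then either restart the service, or disable it at boot entirely
# (smartd is predictive monitoring only, not needed for array recovery):
#   service smartd start
#   chkconfig smartd off
```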
Even so, any thoughts on the subject would be greatly appreciated.
Cheers!
Hi,
As siya said, check that your scripts/software are not using the smartd command.
We run ServeRAID on this as well, RHEL4u2. This partition (dm-2) is a very large ext3 filesystem that acts as the mount point for a postgres database that is very frequently used and taxed. The partition is under LVM2, so I thought it might be an LVM issue, but:
The live device-mapper tables (as shown by dmsetup table) match what we see in the LVM2 metadata, so we can probably rule out any problem at the volume-manager/device-mapper layer:
Code:
vg00-lvol01: 0 2097152 linear 8:2 384
vg00-lvol09: 0 10289152 linear 8:2 643891584
vg00-lvol08: 0 4128768 linear 8:2 639762816
vg00-lvol10: 0 632881152 linear 8:2 2752896
vg00-lvol07: 0 4128768 linear 8:2 635634048
vg00-lvol06: 0 655360 linear 8:2 2097536
vg00-lvol05: 0 4128768 linear 8:2 679870848
vg00-lvol04: 0 6160384 linear 8:2 673710464
vg00-lvol03: 0 18481152 linear 8:2 655229312
vg00-lvol02: 0 1048576 linear 8:2 654180736
These linear mappings correspond to the device regions recorded in the /etc/lvm/backup/vg00 metadata file, for example:
Code:
lvol10 {
id = "v1j3Ii-GqDO-tVHF-y845-kEQj-vWe5-l4NT7f"
status = ["READ", "WRITE", "VISIBLE"]
segment_count = 1
segment1 {
start_extent = 0
extent_count = 9657 # 301.781 Gigabytes
type = "striped"
stripe_count = 1 # linear
stripes = [
"pv0", 42
]
}
}
pv0 is sda2 (8:2):
pv0 {
id = "0MWHVn-TYKx-0ifq-jCw8-KnrK-LVLD-BH5QGg"
device = "/dev/sda2" # Hint only
status = ["ALLOCATABLE"]
pe_start = 384
pe_count = 10625 # 332.031 Gigabytes
}
Extent 42 above ("pv0", 42) puts us right at the beginning of the on-disk region that is throwing back all the SCSI errors. The region that is throwing back the errors is only about 27k in size, is high up on the device, and corresponds to the journal itself (for the ext3 filesystem).
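The offset arithmetic bears this out: pe_count = 10625 extents for 332.031 GB works out to 32 MiB (65536 sectors) per extent, so the dm start sector of lvol10 should be pe_start plus 42 extents:

```shell
# sectors_per_extent = 32 MiB / 512 bytes = 65536
# lvol10 offset = pe_start + start_extent * sectors_per_extent
echo $((384 + 42 * 65536))   # -> 2752896, the lvol10 offset in the dmsetup table above
```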
So I thought there was a problem with how the ips driver or firmware deals with the I/Os being sent down from the jbd (journaling block device) driver. Immediately remaking the partition as ext2 resolves the problem ... hmmm ...
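One way to confirm that the failing LBA range really is the journal: ext3 keeps its internal journal in inode 8, so dumpe2fs and debugfs can report its size and block map. A sketch against a scratch image so it is runnable anywhere (on the real box, point the tools at the lvol10 device instead):

```shell
# Build a tiny ext3 image to demonstrate on:
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=16 2>/dev/null
mke2fs -q -F -j "$img"

# The superblock header reports the journal inode and size:
dumpe2fs -h "$img" 2>/dev/null | grep -i journal

# The block map of inode 8 gives the journal's on-disk extent:
#   debugfs -R 'stat <8>' "$img"
```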
So anyway, I wanted to keep it ext3, so I tried increasing the journal commit interval to 30 seconds by editing /etc/fstab.
Code:
For example:
/dev/vg0/varvol /var ext3 commit=30 1 2
Then remount (e.g. mount -o remount /var) or reboot for the setting to take effect on the filesystem.
No go with that, so I also tried decreasing the block-flushing frequency via vm.dirty_writeback_centisecs / vm.dirty_expire_centisecs....
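For reference, those knobs live under /proc/sys/vm. The values below are examples only, not tuned recommendations (the kernel defaults are 500 and 3000 centiseconds):

```shell
# Current settings (readable by any user):
cat /proc/sys/vm/dirty_writeback_centisecs
cat /proc/sys/vm/dirty_expire_centisecs

# To flush dirty pages less often (run as root; example values,
# not persistent across reboot unless added to /etc/sysctl.conf):
#   echo 1500 > /proc/sys/vm/dirty_writeback_centisecs
#   echo 6000 > /proc/sys/vm/dirty_expire_centisecs
```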
Again, no go. smartd is disabled, etc., along with all the original items posted in this thread. We have this problem at multiple locations running the same configuration. The funny thing is that now the ext2 filesystem is spitting out the SCSI I/O errors on dm-2 (lvol10), the postgres mount partition, but only when doing a dd to the partition. I tried the irqpoll boot option; again, no go. It isn't a hardware issue, as 18 different sites are having the same problem.
I too am seeing this problem, on an IBM x226 server that has been running fine for nearly five years (without being touched) and is now scrolling these same errors up the console screen.
It has six SCSI disks on a ServeRAID-6i. IBM has swapped out the RAID card and now also the motherboard. All hardware checks out fine and is not reporting errors. The server is still in use, with 40 users on it, and is running fine.
The OS (Red Hat Enterprise Linux 4 AS, kernel 2.6.9-22.0.1.EL) boots fine with no errors. Then these SCSI errors start about 20 seconds after the OS has fully booted.
Are there any known fixes for this?
I'm just about to look into smartd, but from many Google searches it doesn't seem to be the cause.