Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I'm having some trouble understanding the smartd configuration options that go in /etc/smartd.conf. I've had smartd running for close to a year now and have read the manual pages for smartd and smartd.conf several times, but I still can't get my head around it properly. This is what I use in smartd.conf:
/dev/sda -H -f -S on -o on -n standby,q \
-s (O/../.././(00|06|12|18)|S/../.././11|L/../../(3|6)/21) \
-m <nomailer> -M exec /usr/sbin/smartd_mailer
The above works fine (I use a custom script because I use exim). However, I had two failing hard disks on two different machines (pending sector attribute), but I would only receive warning emails when I restarted the machines, not when the periodic tests were being performed. I have a number of questions, if any of you would like to shed some light on this:
1. It is not clear from the manual page how the monitoring and alerting mode works. Do I get an initial email when I start smartd if something is wrong, and then no other warning email after each scan, even if there is a fault, unless the fault gets worse?
2. Can I force it to send me warning emails after each test (offline, online, short, long etc.) if the fault condition is still there, even if it hasn't gotten worse?
3. There seem to be a variety of parameters monitored. I was thinking that if I monitor bad sectors, and the hdd temperature - it should be enough to warn me when the hard-disk is failing? Would the -a switch in smartd.conf cover these?
4. I don't understand the difference between monitoring and logging in relation to alerts. Do I get alerts only if I log things?
5. Do I understand correctly that the -p and -u switch would warn me of *any* changes in SMART attributes, even if they don't represent risk of failure?
6. Is there a way of finding out if the scheduled tests (offline, long, short etc.) have been performed?
Many thanks for any replies. It might be just me being thick, or the SMART specifications being a bit complex, or the manual page not being as clear as it could be - or maybe a combination of the above :-)
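The -s schedule in the configuration above can be read as follows. Per the smartd.conf man page, each alternative in the regex is matched against the 12-character string T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday and 7 = Sunday, hour), and a matching test is started at the end of a polling cycle during that hour. The comments below are one reading of that rule; the directives are copied unchanged.

```conf
# Annotated copy of the device line above; directives unchanged.
# -s alternatives, matched against T/MM/DD/d/HH:
#   O/../.././(00|06|12|18)  offline immediate test during the 00, 06, 12 and 18 o'clock hours, daily
#   S/../.././11             short self-test during the 11 o'clock hour, daily
#   L/../../(3|6)/21         long self-test during the 21 o'clock hour on Wednesday (3) and Saturday (6)
/dev/sda -H -f -S on -o on -n standby,q \
  -s (O/../.././(00|06|12|18)|S/../.././11|L/../../(3|6)/21) \
  -m <nomailer> -M exec /usr/sbin/smartd_mailer
```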
Thanks for the answer, but this is not exactly what I am asking. You are suggesting how to find the current SMART status (using smartctl and /var/log/syslog).
What I am saying is that smartd is running in the background all the time, monitoring and running its periodic tests. However, it fails to spot the problem unless I restart the machine. Shouldn't I get an error/warning message when the automatic SMART test runs, several times a week? Why does it only email me that something is wrong when the machine starts? If I don't restart the server, it will sit there running happily for months and never notify me that something is wrong.
What is the point of having smartd running in the background, performing automatic tests and configured to email warnings, if I have to log in remotely and run smartctl and check syslog manually?
Quote:
Originally Posted by xj25vm
Do I get an initial email when I start smartd if something is wrong, and then no other warning email after each scan, even if there is a fault, unless the fault gets worse?
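AFAIK the default is "-M once", i.e. one warning email per type of problem. If you want an email every time smartd starts, add "-M test"; it sends a test message whether or not anything is wrong.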
Quote:
Originally Posted by xj25vm
Can I force it to send me warning emails after each test (offline, online, short, long etc.) if the fault condition is still there, even if it hasn't gotten worse?
You mean '-M daily'?
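A sketch of that change, assuming the device line from the first post. Per the smartd.conf man page, -M daily sends an additional reminder once a day for each type of problem that is still detected, -M diminishing spaces the reminders out (one day, then two, then four, and so on), and if more than one of once/daily/diminishing is given, the last one is used. Several -M directives may appear on one line, so -M exec can stay as it is.

```conf
# Device line from the first post, plus daily reminders for problems that persist.
/dev/sda -H -f -S on -o on -n standby,q \
  -s (O/../.././(00|06|12|18)|S/../.././11|L/../../(3|6)/21) \
  -m <nomailer> -M exec /usr/sbin/smartd_mailer -M daily
```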
Quote:
Originally Posted by xj25vm
There seem to be a variety of parameters monitored. I was thinking that if I monitor bad sectors, and the hdd temperature - it should be enough to warn me when the hard-disk is failing? Would the -a switch in smartd.conf cover these?
According to the man page, -a covers the bad-sector part of what you wrote about, and it's also the smartd default. Temperature isn't included, though; that is a separate directive (-W).
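A minimal sketch of a device line built on -a. Per the smartd.conf man page, -a is shorthand for -H -f -t -l error -l selftest -C 197 -U 198 (newer smartmontools releases may add to this); the -C 197 and -U 198 parts report pending and offline-uncorrectable sectors. Temperature needs -W DIFF,INFO,CRIT, which -a does not include. The 4,45,55 thresholds are illustrative assumptions, not values from this thread, and the -s schedule is left out for brevity.

```conf
# -a : health, attribute failures and changes, error and self-test logs,
#      pending (197) and offline-uncorrectable (198) sector counts
# -W 4,45,55 : log temperature changes of 4 C or more, log at 45 C or above,
#              log and email at 55 C or above (illustrative thresholds)
/dev/sda -a -W 4,45,55 -m <nomailer> -M exec /usr/sbin/smartd_mailer
```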
Quote:
Originally Posted by xj25vm
I don't understand the difference between monitoring and logging in relation to alerts. Do I get alerts only if I log things?
I don't know if I can explain this in a simple way, but monitoring is what smartd does: its reason for being. Logging means telling syslog about changes like starting smartd, starting a self-test or some value having changed. Alerting means smartd emailing a problem description. While it would be odd not to log changes, you can turn logging off and keep only the alerting: start smartd with, say, '-l local6' (the log-facility option on smartd's command line, not the '-l' directive in smartd.conf) and don't reference local6 in /etc/syslog.conf.
Quote:
Originally Posted by xj25vm
Do I understand correctly that the -p and -u switch would warn me of *any* changes in SMART attributes, even if they don't represent risk of failure?
"-t", which stands for "combine -p with -u", reports all changes. With "-I" you can ignore changes in specific attributes.
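A sketch of -t combined with -I, assuming attribute 194 is the temperature attribute on the drive in question (it is on many drives, but the attribute table for your own disk is the place to check). Per the man page, -I only removes that attribute from change tracking, so ordinary temperature swings don't clutter the log.

```conf
# -t : report changes in both Prefailure (-p) and Usage (-u) attributes
# -I 194 : leave attribute 194 (Temperature_Celsius on many drives) out of change tracking
/dev/sda -H -f -t -I 194 -m <nomailer> -M exec /usr/sbin/smartd_mailer
```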
Quote:
Originally Posted by xj25vm
Is there a way of finding out if the scheduled tests (offline, long, short etc.) have been performed?
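Quote:
AFAIK the default is "-M once", i.e. one warning email per type of problem. If you want an email every time smartd starts, add "-M test"; it sends a test message whether or not anything is wrong.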
I think what I am after is -M daily. I hope this will email me daily *only* while there is still a problem with the hard disk, although I can't tell from the man page whether that is the case or whether it will email me daily even if there is nothing to report. It is a bit confusing that, although you can schedule regular tests, this doesn't seem to influence how frequently you receive email alerts. I just assumed that if a test found a problem, smartd would email the alert immediately. It seemed like the reasonable thing to expect. You enable emailing, and you keep receiving alerts for as long as the problem is there, every time you run a test and it finds the problem again?
Thanks for the other pointers as well. I think I'm getting there :-)
Quote:
Originally Posted by xj25vm
I hope this will email me daily *only* while there is still a problem with the hard disk, although I can't tell from the man page whether that is the case or whether it will email me daily even if there is nothing to report.
If there's nothing to report, smartd won't send email.
Quote:
Originally Posted by xj25vm
It is a bit confusing that, although you can schedule regular tests, this doesn't seem to influence how frequently you receive email alerts. I just assumed that if a test found a problem, smartd would email the alert immediately. It seemed like the reasonable thing to expect.
AFAIK self-tests are independent background processes that take quite a while to complete, and that's different from the counters the disk maintains, which smartd can read and report on immediately.
Quote:
Originally Posted by xj25vm
You enable emailing, and you keep receiving alerts for as long as the problem is there, every time you run a test and it finds the problem again?
No, I chose the default. In contrast to people who think getting a gazillion emails is a Good Thing, I hold the opinion that if one can't or won't respond to a single message, then sending duplicates won't change or teach anything, priority- or efficiency-wise...
Quote:
No, I chose the default. In contrast to people who think getting a gazillion emails is a Good Thing, I hold the opinion that if one can't or won't respond to a single message, then sending duplicates won't change or teach anything, priority- or efficiency-wise...
I can't necessarily argue with your point there. However, I prefer to be pestered again and again until I get around to solving the problem; otherwise it gets lost in the noise of daily running around and trying to fix things. I guess that means I'm not as organised as I should be.
Thanks again for your helpful replies. I just couldn't get my head around the idea of warning emails being treated as completely separate things from the scheduled scans.