[SOLVED] Help with smartd configuration options

xj25vm · 06-28-2011, 01:12 AM

Hi all,

I'm having some trouble understanding the smartd configuration options which go in /etc/smartd.conf. I've had smartd running for close to a year now, and I have read the manual pages for smartd and smartd.conf several times, but I still can't get my head around it properly. This is what I use in smartd.conf:

/dev/sda -H -f -S on -o on -n standby,q \
-s (O/../.././(00|06|12|18)|S/../.././11|L/../../(3|6)/21) \
-m <nomailer> -M exec /usr/sbin/smartd_mailer

The above works fine (I use a custom script because I use exim). However, I had two failing hard disks on two different machines (sector pending attribute) - but I would only receive warning emails if I restarted the machines - not when the periodic tests were being performed. I have a number of questions, if any of you would like to shed some light on this:

1. It is not clear from the manual page how the monitoring and alerting mode works. Do I get an initial email when I start smartd if something is wrong, and then no other warning email after each scan, even if there is a fault, unless the fault gets worse?
2. Can I force it to send me warning emails after each test (offline, online, short, long etc.) if the fault condition is still there, even if it hasn't gotten worse?
3. There seem to be a variety of parameters monitored. I was thinking that if I monitor bad sectors, and the hdd temperature - it should be enough to warn me when the hard-disk is failing? Would the -a switch in smartd.conf cover these?
4. I don't understand the difference between monitoring and logging in relation to alerts? Do I get alerts only if I log things?
5. Do I understand correctly that the -p and -u switch would warn me of *any* changes in SMART attributes, even if they don't represent risk of failure?
6. Is there a way of finding out if the scheduled tests (offline, long, short etc.) have been performed?

Many thanks for any replies. It might be just me being thick, or the SMART specifications being a bit complex, or the manual page not being as clear as it could be - or maybe a combination of the above :-)

Andy Alt · 07-07-2011, 03:59 PM

http://serverfault.com/questions/320...ecovered-value

check out the smartctl command.
http://smartmontools.sourceforge.net...martctl.8.html

As for the output messages from smartd, I'll sometimes just use tail -f /var/log/syslog | grep smartd

Though on some systems the messages are in /var/log/messages

xj25vm · 10-20-2011, 05:29 AM

Thanks for the answer - but this is not exactly what I am asking. You are suggesting how to find the current smart status (using smartctl and /var/log/syslog).

What I am saying is that smartd is running in the background all the time, monitoring and running its periodic tests. However, it fails to spot the problem unless I restart the machine. Shouldn't I get an error/warning message when the automatic smart test occurs, several times a week? Why is it only emailing me when the machine starts that something is wrong? If I don't restart the server, it will sit there running happily for months and never notify me that something is wrong.

What is the point of having smartd running in the background, performing automatic tests and being configured to email warnings - if I have to manually log in remotely, run smartctl and check syslog?

unSpawn · 10-20-2011, 07:05 AM

Quote:

Originally Posted by xj25vm

Do I get an initial email when I start smartd if something is wrong, and then no other warning email after each scan, even if there is a fault, unless the fault gets worse?

AFAIK the default is "-M once". If you want an alert on smartd startup you add "-M test".

Quote:

Originally Posted by xj25vm

Can I force it to send me warning emails after each test (offline, online, short, long etc.) if the fault condition is still there, even if it hasn't gotten worse?

You mean '-M daily'?
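As a rough sketch only, assuming the directive from the first post is kept as-is and '-M daily' is simply appended to it (as I read the man page, '-M exec' may be combined with one of once/daily/diminishing). The scratch file name is made up for illustration, and nothing here touches /etc/smartd.conf:

```shell
# Write a candidate directive to a scratch file for review.
# The path below is an arbitrary, hypothetical location.
candidate="${TMPDIR:-/tmp}/smartd.conf.candidate"

# Quoted 'EOF' keeps the backslashes, pipes and <nomailer> literal.
cat > "$candidate" <<'EOF'
/dev/sda -H -f -S on -o on -n standby,q \
-s (O/../.././(00|06|12|18)|S/../.././11|L/../../(3|6)/21) \
-m <nomailer> -M exec /usr/sbin/smartd_mailer -M daily
EOF

# Show the result so it can be compared with the live config by hand.
cat "$candidate"
```

With '-M daily' smartd should repeat the warning once a day for as long as the problem is still detected, instead of mailing once and then staying quiet.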

Quote:

Originally Posted by xj25vm

There seem to be a variety of parameters monitored. I was thinking that if I monitor bad sectors, and the hdd temperature - it should be enough to warn me when the hard-disk is failing? Would the -a switch in smartd.conf cover these?

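According to the man page, -a covers the pending-sector checks (-C 197 and -U 198); as far as I can tell, temperature warnings need the separate -W directive. Also, -a is what smartd assumes when you give no monitoring directives at all.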
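A hedged sketch of a directive that would watch both bad sectors and temperature, assuming my reading of the smartd.conf man page is right. The -W numbers are illustrative placeholders, not recommendations, and the scratch file name is hypothetical:

```shell
# Write an example directive to a scratch file; /etc/smartd.conf is untouched.
candidate="${TMPDIR:-/tmp}/smartd.conf.monitoring"

cat > "$candidate" <<'EOF'
# -a is shorthand for: -H -f -t -l selftest -l error -C 197 -U 198
# (197 = current pending sectors, 198 = offline uncorrectable sectors)
# -W DIFF,INFO,CRIT adds temperature reporting; 4,45,55 are illustrative values.
/dev/sda -a -W 4,45,55 -m <nomailer> -M exec /usr/sbin/smartd_mailer
EOF

cat "$candidate"
```

Note that -a already includes -H and -f, so they would not need to be listed separately.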

Quote:

Originally Posted by xj25vm

I don't understand the difference between monitoring and logging in relation to alerts? Do I get alerts only if I log things?

I don't know if I can explain this in a simple way, but monitoring is what smartd does. Reason-for-being kind of thing. Logging means telling syslog about changes like starting smartd, starting a self-test or telling it some value has changed. Alerting means smartd emailing a problem description. While it would be odd not to log changes, you can do so (start smartd with, say, '-l local6' and don't reference that facility in /etc/syslog.conf) and keep only the alerting.

Quote:

Originally Posted by xj25vm

Do I understand correctly that the -p and -u switch would warn me of *any* changes in SMART attributes, even if they don't represent risk of failure?

"-t", which stands for "combine -p with -u", reports all changes. With "-I" you can ignore specific attributes.

Quote:

Originally Posted by xj25vm

Is there a way of finding out if the scheduled tests (offline, long, short etc.) have been performed?

As in 'smartctl -l selftest /dev/devicename'?
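A small wrapper along those lines, as a sketch only. The function name is made up, and the default device is just the one from the first post:

```shell
# Print the drive's self-test log. The log is kept by the drive itself, so
# short and long self-tests started by smartd should show up here as well.
show_selftests() {
    dev="${1:-/dev/sda}"   # default device is an assumption, pass your own
    if ! command -v smartctl >/dev/null 2>&1; then
        echo "smartctl not found (is smartmontools installed?)" >&2
        return 127
    fi
    smartctl -l selftest "$dev"
}
```

Run it as root, e.g. 'show_selftests /dev/sda', and compare the entries against the schedule given with -s. As far as I know, offline data collection runs are not recorded in this log, only the actual self-tests.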

xj25vm · 10-20-2011, 10:17 AM

Thank you for your reply, unSpawn.

Quote:

AFAIK the default is "-M once". If you want an alert on smartd startup you add "-M test".

I think what I am after is -M daily. I hope this will email me daily *only* if there keeps on being a problem with the hard-disk. Although I can't tell from the man page if that is the case - or it will email me daily even if there is nothing to report. It is a bit confusing that, although you can schedule regular tests - this doesn't seem to influence how frequently you receive email alerts. I just assumed that, if a test finds a problem, it will just email the alert immediately. It seemed like the reasonable thing to expect. You enable emailing - and you receive alerts all the time while there keeps on being a problem there - every time you run a test and it keeps on finding the problem?

Thanks for the other pointers as well. I think I'm getting there :-)

unSpawn · 10-20-2011, 11:07 AM

Quote:

Originally Posted by xj25vm

I hope this will email me daily *only* if there keeps on being a problem with the hard-disk. Although I can't tell from the man page if that is the case - or it will email me daily even if there is nothing to report.

If there's nothing to report smartd won't send email.

Quote:

Originally Posted by xj25vm

It is a bit confusing that, although you can schedule regular tests - this doesn't seem to influence how frequently you receive email alerts. I just assumed that, if a test finds a problem, it will just email the alert immediately. It seemed like the reasonable thing to expect.

AFAIK self-tests are independent background jobs, run by the drive itself, that take quite a while to complete. That's different from the counters the disk maintains, which smartd can read and report on instantly.

Quote:

Originally Posted by xj25vm

You enable emailing - and you receive alerts all the time while there keeps on being a problem there - every time you run a test and it keeps on finding the problem?

No, I chose the default. In contrast to people who think getting a gazillion emails is a Good Thing I hold the opinion that if one can't or won't respond to a single message then sending duplicates won't change / teach anything priority / efficiency-wise...

xj25vm · 10-20-2011, 11:24 AM

Quote:

No, I chose the default. In contrast to people who think getting a gazillion emails is a Good Thing I hold the opinion that if one can't or won't respond to a single message then sending duplicates won't change / teach anything priority / efficiency-wise...

I can't necessarily argue with your point there - however, I prefer to be pestered again and again until I get around to solving the problem - otherwise it gets lost in the noise of daily running around and trying to fix things. I guess that means I'm not as organised as I should be.

Thanks again for your helpful replies. I just couldn't get my head around the idea of warning emails being treated as completely separate things from the scheduled scans.

unSpawn · 10-20-2011, 11:45 AM

You're welcome.