[SOLVED] Automating raid failure detection on Slack 13.1
Hey fellow Slackers! I just set up sendmail on my server to send emails and it works. Now I would like to get an email from mdadm if something goes wrong. I imagine most RAID users have this feature set up.
The script is started at boot time from rc.local. I created a small script in /usr/bin that sends the following command to rc.mdadm, giving me the status of the arrays:
Code:
/etc/rc.d/rc.mdadm status
and it works fine, but this requires me to probe the arrays manually by calling the script from the command line. I would like to automate the probing every 10 minutes or so, and get an email if a fault is detected.
Right now, with the command:
Code:
mdadm --monitor --scan --test --oneshot
I get 7 emails saying:
Code:
This is an automatically generated mail message from mdadm
running on local-server
A TestMessage event had been detected on md device /dev/md2.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md6 : active raid1 sde1[0] sdf1[1]
      1465135936 blocks [2/2] [UU]
md5 : active raid1 sda9[0] sdb9[1]
      253834880 blocks [2/2] [UU]
md4 : active raid1 sda8[0] sdb8[1]
      15366016 blocks [2/2] [UU]
md3 : active raid1 sda7[0] sdb7[1]
      10249344 blocks [2/2] [UU]
md2 : active raid1 sda6[0] sdb6[1]
      10249344 blocks [2/2] [UU]
md1 : active raid1 sda5[0] sdb5[1]
      20490752 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
      272960 blocks [2/2] [UU]
unused devices: <none>
In my mdadm.conf I have
Code:
MAILADDR email@gmail.com
That's all I have for now. Will this monitor my RAID arrays the way it is set up now, or do I need to modify the setup? Right now I am not even sure how mdadm is started in the first place, or whether it is really monitoring my arrays. I could ultimately test the setup by unplugging a drive, but I really (really!) don't want to do that...
Thanks!
Last edited by lpallard; 05-01-2011 at 01:17 PM.
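The periodic probe asked about in this post could be sketched as a small shell function run from cron. This is only an illustration and is separate from mdadm's own monitor mode: the function name degraded_arrays, the script path in the comments and the use of mail(1) are assumptions, not something taken from this thread. It relies on /proc/mdstat marking a missing member with "_" in the status field (for example [U_] instead of [UU]).

```shell
# degraded_arrays [FILE]  (hypothetical helper)
# Print the name of every md array whose member map contains "_",
# i.e. at least one member is missing. FILE defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the array name
        /blocks/ && name != "" {
            # The last field of the "blocks" line is the member map, e.g. [UU]
            if ($NF ~ /^\[[U_]+\]$/ && $NF ~ /_/) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Example use inside a script called from cron every 10 minutes
# (path and address are placeholders):
#   */10 * * * * /usr/local/bin/check-md.sh
#
#   bad=$(degraded_arrays)
#   [ -n "$bad" ] && echo "Degraded arrays: $bad" | mail -s "RAID alert" email@gmail.com
```

With the /proc/mdstat contents quoted in the post, where every array shows [UU], the function prints nothing.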
Makes sense, but my problem is finding out how Slackware launches mdadm at boot time. There is nothing in rc.local, so I guess it is called from somewhere else. I also found that mdadm is running with the switches --monitor --daemonise /dev/md[0-9], but that's all.
So your command basically monitors (-F), scans for the arrays (--scan), mails to dave@pc1 (-m), is daemonised (-f) and finally polls the arrays every 600 seconds (-d).
Again, I think it makes sense. Mine does not poll the arrays and will not send emails. Why does it need --scan at all? Also, since you passed -m on the command line directly, do you still need your email in mdadm.conf?
EDIT: I think mdadm is initialized from the initrd... If so, how do I change the parameters? Which file should I modify? The last thing I want to do is fry my setup because of a stupid error...
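For readers following the option letters listed in this post, the same invocation can be written with mdadm's long option names. This is a sketch rather than the exact command from the earlier post: the wrapper function start_md_monitor and the MDADM variable are hypothetical, added only so the command line can be printed without starting a daemon.

```shell
# Path to mdadm; set MDADM=echo to print the command line instead of running it.
MDADM=${MDADM:-/sbin/mdadm}

# start_md_monitor ADDRESS SECONDS  (hypothetical wrapper, not part of mdadm)
#   --monitor    same as -F: watch the arrays for events
#   --scan       take the list of arrays from the config file or /proc/mdstat
#   --mail       same as -m: address that alerts are sent to
#   --daemonise  same as -f: fork into the background
#   --delay      same as -d: seconds between polls
start_md_monitor() {
    "$MDADM" --monitor --scan --mail "$1" --daemonise --delay "$2"
}

# Example (as root): start_md_monitor dave@pc1 600
```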
First thing I should say is that my only machine using software raid is Slackware 12.2 so something may have changed with 13.1 which I don't know about.
My monitor command is run from rc.local. All my raid arrays are made up of partitions of type Linux raid autodetect and are automatically detected at system boot without anything special in the initrd. My initrd only contains modules for ext3.
--scan is there because I don't specify any device names. My mdadm.conf is empty, no email address or devices in it.
Have you actually tried simulating a failure like this to see if you get an email?
Slackware by default does not start mdadm at boot time.
The rc.mdadm script does that for you (assuming you have your email address in /etc/mdadm.conf)
The mdadm you see running has been started by my script and is in fact monitoring the listed devices.
The only time you'll get an email is if some RAID event happens.
If it is started from rc.local, one of the very last lines you see after booting should be mdadm telling you it is monitoring your arrays, along with the email address that errors are sent to.
It is in fact started at boot time and called from rc.local.
So no need for other parameters? ps -A indicates that mdadm has been launched with only monitor and daemonise but nothing else... Will it probe the arrays to confirm all is fine?
Code:
ps aux | grep mdadm
root 3777 0.0 0.0 952 172 ? Ss Apr09 0:02 /sbin/mdadm --monitor --daemonise /dev/md[0-9]
As I mentioned earlier, when you first start the machine you should see mdadm starting and showing the email address that will be used. It won't be part of the command-line you see from ps output though.
Yes, ps shows exactly what you posted in your last post.
This server is headless, with no monitor attached to it. Instead of watching the boot sequence, is there a way to get its output to see if it uses my email?
If you really want to check that all is working, why not do the test I described above? No hardware unplugging involved, just the mdadm commands.
Nope! I did your test: the array degraded and then successfully rebuilt, but I never got an email from mdadm. Sendmail works perfectly, since I get the test emails with the command:
Code:
mdadm --monitor --scan --test --oneshot
Apart from "on-demand" querying, I am 99% sure the rc.mdadm script provides neither notifications nor scanning or real-time monitoring per se. I have my email in /etc/mdadm.conf. Apart from normal mdadm messages such as the reconstruction of the array, there is nothing in /var/log/messages that shows mdadm sending an email.
Try populating your mdadm.conf, launch mdadm manually, and test again.
This is what I use, with a populated mdadm.conf:
Code:
/sbin/mdadm --monitor --scan -f -d 120
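One way to populate mdadm.conf, as suggested in this post, is to append the ARRAY lines that mdadm --detail --scan prints. The sketch below assumes that approach; the helper name append_array_lines is hypothetical, and it only adds lines that are not already in the file, so running it twice does not create duplicates.

```shell
# append_array_lines CONF  (hypothetical helper)
# Read text on stdin and append each ARRAY line to CONF unless an
# identical line is already there. Other lines are ignored.
append_array_lines() {
    conf=$1
    touch "$conf"
    while IFS= read -r line; do
        case $line in
            ARRAY*)
                grep -qxF -- "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
                ;;
        esac
    done
}

# Example (as root, after backing up the file):
#   /sbin/mdadm --detail --scan | append_array_lines /etc/mdadm.conf
```

An existing MAILADDR line in the file is left untouched.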
This is interesting, from the mdadm man page:
-f, --daemonise
Tell mdadm to run as a background daemon if it decides to monitor
anything. This causes it to fork and run in the child, and
to disconnect from the terminal. The process id of the child is
written to stdout. This is useful with --scan which will only
continue monitoring if a mail address or alert program is found
in the config file.
Also, I currently don't have to launch mdadm manually; at every boot it is started automatically, either via rc.mdadm or something else (that I couldn't find).
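Given the man page excerpt quoted above, a quick sanity check before relying on --scan is to confirm that the config file actually names a mail address or an alert program. A minimal sketch, assuming the keywords are spelled out in upper case as MAILADDR or PROGRAM at the start of a line (mdadm.conf also accepts other spellings, which this simple pattern does not cover); the function name has_alert_target is hypothetical.

```shell
# has_alert_target [CONF]  (hypothetical helper)
# Succeed if CONF (default /etc/mdadm.conf) has a MAILADDR or PROGRAM
# line with a value. Commented-out lines do not count.
has_alert_target() {
    grep -Eq '^(MAILADDR|PROGRAM)[[:space:]]+[^[:space:]]' "${1:-/etc/mdadm.conf}"
}

# Example:
#   has_alert_target || echo "no MAILADDR or PROGRAM line in /etc/mdadm.conf"
```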