External storage backup failure
As part of our backup strategy we back up data to two external storage devices (Buffalo LinkStation NAS) via a cron job. Recently I noticed that the linkstation backups were not working, and the server (via the terminal) complained about a CIFS write error or something along those lines. I decided to reboot the server, and this fixed the problem for one week for one of the linkstations and for a few weeks for the other.
There are no error messages this time on the server or in any of the logs that I have checked, but the backups are not being made, and the backup processes from the days that failed are still running.
Any ideas? The backups used to work every day without fail until recently, and no changes have been made to the script that does the backup.
Thanks in advance
Could you give a little more detail?
Please post some log output to the thread from your backup software or script and from the syslog. Possibly something along the communication path to your external devices is the problem.
For the beginning, run
Some information on what kind of share you are writing your backup to would also be helpful.
A short description of the network configuration would be nice. Ping around the boxes and check that all hostnames are correct.
Hi mesiol, and sorry for the late response. I rebooted the server as a temporary fix, and after a few days one of the linkstation backups stopped working again.....grrrrr
I did as you suggested and have already checked the messages file. The output tends to be something like:
'Jul 3 18:37:23 'Nameofserver' -- MARK --'
The above line is repeated every 30 minutes. Nothing in the log gives any indication of why the backups may be failing. The processes for the failed backups are, however, still hanging in memory, if that makes sense, and I think that when this is left for some time an error appears at the terminal directly connected to the server saying CIFS write error and a number (I think!). Sorry, I did write it down the last time this occurred but seem to have misplaced the piece of paper with the exact error message.
The mount information for the device that failed is:
//ip address/share on /sharename type cifs (rw,mand,noexec,nosuid,nodev)
The scripts that do the backups have the same code for both devices (they are in different files, the only difference being the location where the backups are made) and are executed at midnight.
There is no issue with the hosts etc. Like I said, this problem just started to occur and was temporarily fixed by rebooting the server. After a reboot one of the linkstations (external storage) fails, then a couple of weeks later the next one does. Previously there were no problems for more than 2 years, and no changes have been made to the server or the scripts. I don't know whether the linkstations keep logs, but I will ask my boss if I can view them to see whether the fault is on their side or whether there is any clue as to why the backups fail.
Thanks for your assistance.
Now I have to apologize for the long response time.
It all seems a little strange. Normally the kernel will log problems on mounted filesystems. But if your /var/log/messages does not contain any information from the time the problem came up, it seems to be a little out of my scope.
Possibly the problem only exists during the first access. You can try to touch a file on the share, wait 30 seconds, and then run the backup. Does the storage device use any kind of power-saving settings? Possibly the disk spin-up takes too long, so your backup tool (tar/cp/cpio/dd or something else) gets a timeout and stops working.
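To make the touch-and-wait idea concrete, here is a minimal Python sketch. It assumes the share is already mounted; the probe file name `.backup_probe` and both function names are made up for illustration and are not part of any existing backup script.

```python
import os
import subprocess
import time


def warm_up_share(mount_point, wait_seconds=30):
    """Write a small file to the share, then pause so sleeping disks can spin up."""
    probe = os.path.join(mount_point, ".backup_probe")  # hypothetical file name
    with open(probe, "w") as handle:
        handle.write("probe\n")
        handle.flush()
        os.fsync(handle.fileno())  # push the write out to the share now
    time.sleep(wait_seconds)
    os.remove(probe)
    return probe


def run_backup(command, timeout_seconds):
    """Run the backup command; return its exit code, or None if it timed out."""
    try:
        return subprocess.run(command, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        return None
```

Something like `warm_up_share("/sharename")` followed by `run_backup([...], timeout_seconds=...)` could be called from the cron job, with the real backup command filled in. A `None` result at least leaves a trace that the run stalled, though a process stuck in uninterruptible I/O may not go away until that I/O returns.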
Sounds like the network connection is failing during the backup, which can cause the process to 'hang'. You'll need to restart the process.
This may sound odd, but I've known it to happen: are any of the machines in a non-secure area like an office? I have actually known the old 'cleaner used the power outlet' to happen; it's not just a myth.
Maybe somebody is obsessed with saving power (going green) and assumes, e.g., that if no one is sitting in front of the linkstation, it's not being used...
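One way to check whether the leftover backup processes really are hung on I/O, rather than just slow, is to look for processes in uninterruptible sleep (state `D`), which is typically where a process blocked on a network filesystem ends up. The sketch below reads `/proc/<pid>/stat` on Linux; the function names are illustrative only.

```python
import os


def parse_stat_line(line):
    """Return (pid, command, state) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid_text, _, command = head.partition("(")
    return int(pid_text), command, tail.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for processes currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat_line(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            stuck.append((pid, command))
    return stuck
```

If the backup tool shows up in that list morning after morning, it is waiting on the share and a plain kill will usually not clear it until the I/O completes or the mount is dealt with.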
I also assumed a network problem, but after some time syslog should report a problem accessing the share.
It would be best to run a ping through the whole night. In a LAN environment there is normally no packet loss. The next morning, check whether your backup failed and whether there is a larger number of lost ping packets (e.g. > 20). That can indicate a network problem.
Are there any entries in the storage device's log files?
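As a rough sketch of that overnight check, the Python below runs `ping` with a fixed packet count and works out how many packets were lost from the summary line. It assumes the Linux iputils summary format ("N packets transmitted, M received"); the function names and default values are placeholders.

```python
import re
import subprocess

# Matches the summary line printed at the end of a ping run.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_ping_summary(output):
    """Return (transmitted, received), or None if no summary line is found."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def lost_packets(host, count=10, interval_seconds=1):
    """Ping the host and return the number of lost packets, or None on failure."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-i", str(interval_seconds), host],
        capture_output=True,
        text=True,
    )
    counts = parse_ping_summary(result.stdout)
    if counts is None:
        return None
    transmitted, received = counts
    return transmitted - received
```

Started from cron shortly before midnight with a count large enough to span the backup window, the result can be compared against the threshold of about 20 lost packets mentioned above.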