
jbruyet 02-12-2013 04:44 PM

Strange behavior with simple backup routine
Hey all, I've created a simple backup system for quickly restoring files in case they get accidentally deleted. My backup server is running Debian Squeeze, and the storage is a 2TB mdadm RAID 10 array. I set up something like this a while back and everything worked fine, but when I came in this morning my Nagios server was complaining about my backup server (critical CPU usage and excessive processes). Here's what I'm using for my backup script:


#!/bin/sh
# Mount the Windows share, copy both folders into Friday's backup set, then unmount
mount -t cifs -o username=admin,password=Pass1! // /mnt/Windows
cp -R /mnt/Windows/"Not Shared"/* /mnt/r10/Backup/srvDictum2/5-Fri/NotShared
cp -R /mnt/Windows/Shared/* /mnt/r10/Backup/srvDictum2/5-Fri/Shared
umount /mnt/Windows

I ran ps -aux | less and toward the bottom I saw this:


root  14684  0.0  0.0  45556  776 ?  S    01:56  0:00 /USR/SBIN/CRON
root  14685  0.0  0.0  3956  372 ?  Ss  01:56  0:00 /bin/sh -c /home/jobee/bin/TueDictum2  >/dev/null 2>&1
root  14686  0.0  0.0  3956  384 ?  S    01:56  0:00 /bin/sh /home/jobee/bin/TueDictum2
root  14688  0.0  0.0  18080  684 ?  D    01:56  0:31 cp -R /mnt/Windows/Not Shared/ADA

and then 20 more lines filled with folder entries (/mnt/Windows yadda yadda). There were about 61 of those groups.

I started this from crontab (a different backup script for each day of the week), so I'm going to go in and kill it until I figure out what I'm doing wrong. Any help would be greatly appreciated.
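Each group in the ps output above starts with its own /USR/SBIN/CRON process, so each looks like a separate cron run. One hedged guess is that new runs start while an earlier cp is still stuck, and they pile up. A minimal sketch, assuming util-linux flock(1) is available (it is on Debian), wraps the job in a non-blocking lock so a second run exits instead of starting another cp, and uses `mountpoint -q` so nothing is copied if the CIFS mount fails. The helper names `run_locked` and `backup_share`, the SHARE and DEST variables, the lock path, and the credentials file are hypothetical placeholders, not part of the original script.

```shell
#!/bin/sh
# Sketch only: SHARE, DEST, the lock path and the credentials file are
# hypothetical placeholders to fill in.

# run_locked LOCKFILE COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive lock on LOCKFILE. If an earlier
# run still holds the lock, prints a message and returns 1 without running.
run_locked() {
    lockfile=$1
    shift
    (
        # -n: give up immediately instead of waiting for the lock.
        flock -n 9 || { echo "skipping: previous run still active" >&2; exit 1; }
        "$@"
    ) 9>"$lockfile"
}

# backup_share: the mount/copy/umount steps from the original script, with
# a check that the share really mounted before anything is copied.
backup_share() {
    # credentials= points at a (hypothetical) root-only file containing
    # username= and password= lines, keeping the password out of the script.
    mount -t cifs -o credentials=/root/.smbcredentials "$SHARE" /mnt/Windows || return 1
    mountpoint -q /mnt/Windows || return 1
    cp -R /mnt/Windows/"Not Shared"/* "$DEST/NotShared" &&
        cp -R /mnt/Windows/Shared/* "$DEST/Shared"
    rc=$?
    umount /mnt/Windows
    return "$rc"
}
```

In the daily script this would be invoked as something like `run_locked /var/lock/TueDictum2.lock backup_share`. The lock is released automatically when the subshell holding fd 9 exits, so a crashed run does not leave a stale lock behind.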


Joe B

Kustom42 02-12-2013 05:39 PM

The process state is your big indicator here that it's a wait issue.

A state of D basically means disk wait.

From the ps man page:

D  uninterruptible sleep (usually IO)
This is a common problem with CIFS Windows shares (at least in my experience). It's usually resolved by rebooting the Windows machine and restarting the services on the Linux machine, but that's just what has worked in my cases. You can run strace on the process; it is probably doing a poll() on a file descriptor that is a socket connected to the Windows machine. If that's the case, you know the issue is on the Windows box.
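Before attaching strace, it can help to list everything stuck in state D and see where in the kernel each process is sleeping. A minimal sketch that reads /proc directly (so it does not depend on particular ps options) is below; `proc_state` and `show_dstate` are hypothetical helper names. One caveat: when cp reads from a CIFS mount, the SMB network traffic is handled by the kernel's CIFS client, not by cp itself, so strace on cp will usually show it blocked in an ordinary file call such as read() or open() rather than in a socket of its own. For D-state processes the wchan value (the kernel function the process is waiting in) is often more telling.

```shell
#!/bin/sh
# proc_state PID: print the one-letter state from /proc/PID/status
# (R running, S sleeping, D uninterruptible sleep, ...).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# show_dstate: print PID, wchan and command line (tab-separated) for every
# process currently in state D.
show_dstate() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        [ "$(proc_state "$pid")" = D ] || continue
        # Processes can exit mid-loop, so read errors are silenced.
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        cmd=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline")
        printf '%s\t%s\t%s\n' "$pid" "$wchan" "$cmd"
    done
}
```

Run `show_dstate` as root, then attach to one of the listed PIDs with `strace -p PID` (for example the cp with PID 14688 in the first post's ps output) and press Ctrl-C to detach.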

jbruyet 02-15-2013 02:22 PM

OK, I thought it was because I was backing up over a 100 Mb connection (when I thought it was 1 Gb), but that's not it. I'm running at 1 Gb now and I'm still seeing the same problem...

That was yesterday before I was interrupted. Today the server is almost unresponsive; ls -l takes about 10 seconds to display and an additional 10 seconds or so to run.

I don't understand why this worked on another server a few months ago but not now. I'll kill this backup, create some wimpy folders and files and experiment on them while running Wireshark to see if I can figure this out.
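For a small-scale experiment like this, a sketch such as the following builds a predictable test tree and times a recursive copy, so a copy over the CIFS mount can be compared with a purely local one. The helper names `make_test_tree` and `time_copy` and the 4 KiB file size are hypothetical choices. While it runs, the SMB traffic can be captured for Wireshark with something like `tcpdump -i eth0 -s 0 -w /tmp/smb.pcap port 445 or port 139`, where eth0 is an assumed interface name.

```shell
#!/bin/sh
# make_test_tree DIR NDIRS NFILES
# Creates NDIRS folders under DIR, each holding NFILES 4 KiB files of zeros.
make_test_tree() {
    root=$1 ndirs=$2 nfiles=$3
    d=1
    while [ "$d" -le "$ndirs" ]; do
        mkdir -p "$root/dir$d" || return 1
        f=1
        while [ "$f" -le "$nfiles" ]; do
            dd if=/dev/zero of="$root/dir$d/file$f" bs=4096 count=1 2>/dev/null || return 1
            f=$((f + 1))
        done
        d=$((d + 1))
    done
}

# time_copy SRC DEST
# Copies SRC recursively to DEST and prints the elapsed time in whole seconds.
time_copy() {
    start=$(date +%s)
    cp -R "$1" "$2" || return 1
    end=$(date +%s)
    echo $((end - start))
}
```

For example, `make_test_tree /mnt/Windows/cifs-test 5 20` followed by `time_copy /mnt/Windows/cifs-test /tmp/cifs-test` (both paths hypothetical) gives a number to compare against the same copy done entirely on local disk.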


Joe B

All times are GMT -5.