LinuxQuestions.org - Strange behavior with simple backup routine

Hey all, I've created a simple backup system for quickly restoring files in case they get accidentally deleted. My backup server is running Debian Squeeze, and the storage is a 2TB mdadm RAID 10 array. I did something like this a while back and everything worked fine, but I came in this morning and my Nagios server was complaining about my backup server (critical CPU usage and excessive processes). Here's what I'm using for my backup script:

Code:

mount -t cifs -o username=admin,password=Pass1! //192.168.2.29/Shares /mnt/Windows

cp -R /mnt/Windows/"Not Shared"/* /mnt/r10/Backup/srvDictum2/5-Fri/NotShared

cp -R /mnt/Windows/Shared/* /mnt/r10/Backup/srvDictum2/5-Fri/Shared

umount /mnt/Windows

I ran ps -aux | less and toward the bottom I saw this:

Code:

root  14684  0.0  0.0  45556  776 ?  S    01:56  0:00 /USR/SBIN/CRON

root  14685  0.0  0.0  3956  372 ?  Ss  01:56  0:00 /bin/sh -c /home/jobee/bin/TueDictum2  >/dev/null 2>&1

root  14686  0.0  0.0  3956  384 ?  S    01:56  0:00 /bin/sh /home/jobee/bin/TueDictum2

root  14688  0.0  0.0  18080  684 ?  D    01:56  0:31 cp -R /mnt/Windows/Not Shared/ADA

followed by 20 more lines filled with folder entries (/mnt/Windows yadda yadda). There were about 61 of those groups.

I started this from crontab (a different backup file for each day of the week) so I'm going to go in and kill that until I figure out what I'm doing wrong. Any help would be greatly appreciated.
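If those 61 groups are separate cron runs piling up behind a stuck copy, one way to keep them from stacking is to take a lock before doing anything and to copy only when the share is really mounted. This is a rough sketch, not a tested replacement for the script above: it assumes the flock and mountpoint utilities are installed, and the function names and the lock path in the usage comments are made up for illustration.

```shell
#!/bin/sh
# Run a command only if nobody else holds the lock file.
# flock -n gives up immediately instead of queueing behind a running job.
run_exclusive() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}

# Copy only when the given directory is an active mount point.
# $1 = mount point, remaining arguments are handed to cp -R unchanged.
copy_if_mounted() {
    mnt=$1
    shift
    if ! mountpoint -q "$mnt"; then
        echo "skip: $mnt is not mounted" >&2
        return 1
    fi
    cp -R "$@"
}

# Usage inside the daily script, after the mount line:
#   copy_if_mounted /mnt/Windows /mnt/Windows/Shared/* /mnt/r10/Backup/srvDictum2/5-Fri/Shared
# Usage from crontab (hypothetical lock path):
#   flock -n /tmp/dictum2-backup.lock /home/jobee/bin/TueDictum2
```

With the lock in place, a run that finds the previous one still going exits right away instead of adding another cp to the pile. It does not fix whatever is making the copy hang in the first place.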

Thanks,

Joe B

The process status is your big indicator here that it's a wait issue.

A process status of D basically means disk wait.

Man page:

Code:

D uninterruptible sleep (usually IO)
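To pick out just the processes sitting in that state, something along these lines may help. It is a minimal sketch that assumes the procps version of ps (the wchan:32 column-width syntax is procps-specific) and awk; the function names are only for illustration.

```shell
#!/bin/sh
# Keep the header line plus any row whose STAT column (field 2) starts with D.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Hypothetical helper: list every process currently in uninterruptible sleep,
# along with the kernel function it is waiting in (the WCHAN column).
list_d_state() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```

Calling list_d_state a few times in a row shows whether the same cp PIDs stay in D or whether they come and go, which is the difference between a hung share and a merely slow one.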

This is a common problem with CIFS Windows shares (at least in my experience). It's usually resolved by a reboot of the Windows machine and a restart of the services on the Linux machine, but that's only been in my cases. You can do an strace on the process; it is probably doing a "poll" on a file descriptor, which is a socket connected to the Windows machine. If that's the case, you know the issue is on the Windows box.
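A rough sketch of how that check could look on the Linux side, reading the state and kernel wait channel from /proc before reaching for strace. It assumes a Linux /proc filesystem; the helper names are invented here, and 14688 is simply the cp PID from the ps listing earlier in the thread.

```shell
#!/bin/sh
# Print the one-letter state (R, S, D, ...) of a PID from /proc (Linux only).
proc_state_of() {
    awk '/^State:/ { print $2; exit }' "/proc/$1/status"
}

# Hypothetical helper: show the state and kernel wait channel for a PID.
# /proc/PID/wchan may only show 0 on kernels without symbol information.
describe_wait() {
    pid=$1
    printf 'state: %s\n' "$(proc_state_of "$pid")"
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
}

# Usage with the cp PID from the ps output:
#   describe_wait 14688
#   strace -p 14688      # run as root; Ctrl-C detaches
```

One caveat: strace attaches through ptrace, so on a process that is truly stuck in D it may print nothing until the process returns from the kernel. In that case the wchan value is likely the more useful clue.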

OK, I thought it was because I was backing up over a 100Mb connection (when I thought it was 1Gb), but that's not it. I'm running at 1Gb now and I'm still getting this error...

That was yesterday, before I was interrupted. Today the server is almost unresponsive; ls -l takes about 10 seconds to start displaying and another 10 seconds or so to finish.

I don't understand why this worked on another server a few months ago but not now. I'll kill this backup, create some wimpy folders and files and experiment on them while running Wireshark to see if I can figure this out.

Thanks,

Joe B