Lubuntu 12.10 crashing every night, possibly caused by BackInTime
Lubuntu 12.10 32-bit with 6 GB of RAM in the system. Two VirtualBox machines with 1 GB of RAM each run inside it, and BackInTime runs each night at 10:00pm.
It seems each night around 1:30 (give or take an hour), the server crashes with a ton of oom-killer messages in the kern.log (I check after rebooting). Just recently I added over one million files to a directory that BackInTime backs up, so I think it could be that. I also run CrashPlan and rsync all the files to another server each night, but I disabled those two about a week ago while trying to find out what is crashing the server. It still crashes with those disabled. Attached is a portion of my kern.log file. The file is about 1 MB on the server, and it kept logging oom-killer errors every few minutes until I manually powered the server off. Thanks for any help. |
Please compare the time of the first oom message with BackInTime's syslog messages to find out which state the new snapshot is in when this happens.
Regards, Germar, BIT dev team |
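One way to do the comparison suggested above is to print the last backintime message logged before the first oom-killer event. This is only a sketch: the function name `last_bit_before_oom` is made up for illustration, it assumes both kinds of message land in the same syslog-style file, and the sample lines are copied from the excerpt posted later in this thread.

```shell
#!/bin/sh
# Print the last backintime line that precedes the first oom-killer line.
# last_bit_before_oom is a hypothetical helper name, not part of BackInTime.
last_bit_before_oom() {
    awk '
        /invoked oom-killer/ { exit }   # stop reading at the first OOM event
        /backintime/ { last = $0 }      # remember the most recent BIT message
        END { if (last != "") print last }
    ' "$1"
}

# Sample input: lines taken from the syslog excerpt quoted in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Dec 20 23:34:53 lubuntu-server backintime (root): INFO: Command "find "/media/truebackup/backintime/server/root/daily/20131219-220002-954/backup/" -type d -exec chmod a-w {} \;" returns 0
Dec 20 23:57:04 lubuntu-server kernel: [52119.257761] nxagent invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Dec 20 23:57:04 lubuntu-server kernel: [52119.257766] nxagent cpuset=/ mems_allowed=0
EOF

result=$(last_bit_before_oom "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

On the real machine the same function could be pointed at /var/log/syslog instead of the sample file.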
Large memory (yes even 6 Gig) management of 32-bit systems is murky at best.
Those don't look like large allocation requests. I'd be inclined to save /proc/zoneinfo and the output of "slabtop -o -s c" just after a reboot, and again when you get bitten by the oom-killer. Might give you some ideas. |
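A minimal sketch of how those two snapshots could be saved for later comparison. The output directory is an assumption (chosen under /var/tmp so the files are not cleared at boot), and slabtop normally needs root to read the slab statistics, so the script just reports a failure and carries on.

```shell
#!/bin/sh
# Save kernel memory state so a "just after reboot" snapshot can be
# compared with one taken once the oom-killer starts firing.
# out_dir is a hypothetical location; pass another directory as $1.
out_dir=${1:-/var/tmp/oom-snapshots}
stamp=$(date +%Y%m%d-%H%M%S)
mkdir -p "$out_dir"

# Per-zone free/used page counters.
cat /proc/zoneinfo > "$out_dir/zoneinfo-$stamp" 2>/dev/null

# Slab caches, printed once (-o) and sorted by cache size (-s c).
slabtop -o -s c > "$out_dir/slabtop-$stamp" 2>&1 ||
    echo "slabtop failed (try running as root)" >&2

echo "saved snapshots in $out_dir with timestamp $stamp"
```

Running it once after boot and once during the trouble gives two pairs of files that can be compared with diff.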
OK, looking at a recent syslog, I see this regarding backintime:
Dec 20 23:34:53 lubuntu-server backintime (root): INFO: Command "find "/media/truebackup/backintime/server/root/daily/20131219-220002-954/backup/" -type d -exec chmod a-w {} \;" returns 0
Then a few minutes later in the same log file:
Dec 20 23:57:04 lubuntu-server kernel: [52119.257761] nxagent invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Dec 20 23:57:04 lubuntu-server kernel: [52119.257766] nxagent cpuset=/ mems_allowed=0
Dec 20 23:57:04 lubuntu-server kernel: [52119.257769] Pid: 2854, comm: nxagent Tainted: G O 3.5.0-17-generic #28-Ubuntu
Dec 20 23:57:04 lubuntu-server kernel: [52119.257771] Call Trace:
Dec 20 23:57:04 lubuntu-server kernel: [52119.257779] [<c15c01c4>] dump_header.isra.10+0x86/0x1b4
Dec 20 23:57:04 lubuntu-server kernel: [52119.257784] [<c1104a1a>] oom_kill_process+0x23a/0x270
Dec 20 23:57:04 lubuntu-server kernel: [52119.257788] [<c1104ae1>] ? select_bad_process.constprop.15+0x91/0x170
Dec 20 23:57:04 lubuntu-server kernel: [52119.257791] [<c1104f53>] out_of_memory+0x163/0x1c0
Dec 20 23:57:04 lubuntu-server kernel: [52119.257794] [<c1108abf>] __alloc_pages_nodemask+0x68f/0x750
Dec 20 23:57:04 lubuntu-server kernel: [52119.257798] [<c1108bfc>] __get_free_pages+0x1c/0x40
And so on. I have since disabled backintime and the system has not crashed or had an oom-killer entry in any of the logs. So something is going on with backintime. This all started when I added about a million files to one of the current directories that backintime backs up. Also, I rsync the files off this server onto another server and that completes with no problems. |
Which BIT version do you use?
After that line the next task would be 'chmod -R a+w <new_snapshot_folder>'. A very simple task. Quite weird if that would cause an oom-killer... If you use BIT version >=1.0.22 you can try out 'full rsync mode', which won't do all this chmod action anymore. Maybe that would help. |
I'm using 1.0.10.
How do I update it? The repository has only this version. Is there a deb file I can download to update it? |
Quote:
Not doing all that would seem a (much) better option. |
And I do have tons and tons of directories. What do you recommend I do then? Not use backintime? Or will the newer version fix this issue?
|
I always believe in trying to help anyone prepared to develop open-source.
As you have a real problem, and a potential solution, I'm sure they would appreciate you trying the new version. Whether it solves the issue or not, the feedback will be beneficial. I would hope it does help. |
OK, I'll install it tonight (I have to find out how first) and then will run the backup tonight. Hopefully my server does not freeze up. Still, I'll give it 3 days to be sure since a few times it took two nights before it froze the server.
|
You can use our PPA bit-team/stable to get the current stable version (1.0.34)
Code:
sudo add-apt-repository ppa:bit-team/stable
I'm pretty sure the new version will fix it because there are two changes regarding this. First, we now use
Code:
find [...] -exec chmod [...] {} +
instead of
Code:
find [...] -exec chmod [...] {} \;
And second, there is the new 'Full rsync mode', which delegates all the work to rsync (it must be selected in the options to use it). |
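To illustrate the difference between the two find forms above, this small demo counts how many child processes each one starts. It is a sketch on a throwaway tree (the directory names and the count of 50 are made up for the demo), and it only shows the process-count difference, not that this was the cause of the oom-killer messages.

```shell
#!/bin/sh
# Build a hypothetical demo tree: 50 empty directories under a temp root.
demo_root=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
    mkdir "$demo_root/dir$i"
    i=$((i + 1))
done

# "{} \;" form: find starts one child process per directory it finds.
# Each child prints one line, so the line count equals the process count.
calls_semicolon=$(find "$demo_root" -type d -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# "{} +" form: find appends as many paths as fit to a single command line.
calls_plus=$(find "$demo_root" -type d -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

printf 'one exec per match: %s processes\n' "$calls_semicolon"
printf 'batched with plus:  %s process(es)\n' "$calls_plus"

rm -rf "$demo_root"
```

The first count includes the temp root itself, so it is one more than the number of subdirectories. With a million files and "tons and tons of directories", the per-match form means a correspondingly large number of chmod processes.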
OK, I got it installed and it ran fine last night. I'll run it a few more times and report back. I ticked the rsync option in the options area of the program.
Thanks |
The hard drive I back up to ran out of space. I believe switching to rsync made the program recreate all the files on the backup drive. In other words, it was not an incremental backup but a new full backup. I'll have to delete my old backups to run this one, as the original folder being backed up takes up more than half the space on the backup drive.
|
Quote:
Normally they should be incremental. Even after switching to 'Full rsync mode'. But if deleting all previous snapshots is an option for you, it would be better anyways. Which filesystem do you use on source and destination? I'd recommend ext2|3|4 for dst. |
I use ext4. The files in the new snapshot all showed a link count of 1, whereas the files in my old backups had much higher numbers. So basically, this was not incremental. I am going to delete all my backups and then run this again and see what happens.
|
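Assuming the "1" reported above is the hard-link count, a quick way to check whether a snapshot is incremental is to count its files by link count. This sketch uses a made-up two-snapshot tree in a temp directory (snap1, snap2 and the file names are hypothetical), where one unchanged file is shared by hard link and one changed file is a fresh copy.

```shell
#!/bin/sh
# Hypothetical mini layout: two snapshots of the same two files.
demo=$(mktemp -d)
mkdir "$demo/snap1" "$demo/snap2"
echo "unchanged" > "$demo/snap1/a.txt"
echo "old"       > "$demo/snap1/b.txt"

ln "$demo/snap1/a.txt" "$demo/snap2/a.txt"   # unchanged file: hard link, no extra space
echo "new" > "$demo/snap2/b.txt"             # changed file: separate copy

# Files with exactly one link exist only in this snapshot (full copies).
copied=$(find "$demo/snap2" -type f -links 1 | wc -l | tr -d ' ')
# Files with more than one link are shared with another snapshot.
shared=$(find "$demo/snap2" -type f -links +1 | wc -l | tr -d ' ')

printf 'copied: %s, shared with an older snapshot: %s\n' "$copied" "$shared"
rm -rf "$demo"
```

Running the same two find commands on a real snapshot folder would show whether nearly everything has a link count of 1, which would match the disk filling up.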