LinuxQuestions.org

xmrkite 12-24-2013 01:25 PM

Lubuntu 12.10 crashing every night possibly by BackInTime program
 
1 Attachment(s)
Lubuntu 12.10 32-bit with 6 GB of RAM in the system. Two VirtualBox machines with 1 GB of RAM each run on it, and BackInTime runs each night at 10:00 PM.

It seems each night around 1:30 (give or take an hour), the server crashes with a ton of oom-killer messages in the kern.log (I check after rebooting).

Just recently I added over one million files to a directory that BackInTime backs up, so I think that could be the cause.

I also run Crashplan as well as rsync all the files to another server each night, but I disabled those two about a week ago trying to find out what is crashing the server. It still crashes with those disabled.

Attached is a portion of my kern.log file. The file is about 1 MB on the server, and it kept logging oom-killer errors every few minutes until I manually powered the server off.

Thanks for any help.

S0M30N3 12-25-2013 03:01 PM

Please compare the time of the first oom message with BackInTime's syslog messages to find out which state the new snapshot is in when this happens.

Regards,
Germar, BIT dev team
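
A minimal sketch of that comparison, assuming BackInTime and the kernel both log to the same syslog file (/var/log/syslog on a default Ubuntu install) and that BackInTime lines contain the string "backintime". The function name bit_oom_timeline is made up for illustration:

```shell
# Print BackInTime messages and oom-killer invocations from one log file,
# in file order, so the last BIT step before the first OOM is visible.
# Both search patterns are assumptions about what the log lines contain.
bit_oom_timeline() {
    grep -E 'backintime|invoked oom-killer' "$1"
}

# Default path is the usual Ubuntu syslog; pass another file (for example
# a rotated /var/log/syslog.1) as the first argument.
LOG="${1:-/var/log/syslog}"
if [ -r "$LOG" ]; then
    bit_oom_timeline "$LOG" | head -n 50
fi
```

The last backintime line printed before the first "invoked oom-killer" line shows roughly which step the snapshot had reached.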

syg00 12-25-2013 07:27 PM

Management of large memory (yes, even 6 GB) on 32-bit systems is murky at best.
Those don't look like large allocation requests.
I'd be inclined to save /proc/zoneinfo and the output of "slabtop -o -s c" just after a reboot, and again when you get bitten by the oom-killer. Might give you some ideas.
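
A rough sketch of how those two snapshots could be captured, assuming a hypothetical output directory /var/tmp/memsnap and that slabtop is run as root (it normally cannot read /proc/slabinfo otherwise). The function name save_mem_snapshot is made up:

```shell
# Save a timestamped copy of kernel memory state for later comparison.
# OUT_DIR is a hypothetical location; any disk with free space will do.
OUT_DIR="${OUT_DIR:-/var/tmp/memsnap}"

save_mem_snapshot() {
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$OUT_DIR/$stamp"
    mkdir -p "$dest" || return 1
    # Per-zone free page counts and watermarks.
    cat /proc/zoneinfo > "$dest/zoneinfo" 2>/dev/null
    # Slab caches sorted by cache size, printed once; usually needs root.
    slabtop -o -s c > "$dest/slabtop" 2>/dev/null
    echo "$dest"
}

snap_dir=$(save_mem_snapshot) || echo "could not write to $OUT_DIR" >&2
echo "saved to: $snap_dir"
```

Run it once right after boot and once when the oom-killer messages start, then diff the two directories to see which zone or slab cache has grown.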

xmrkite 12-27-2013 06:40 PM

OK, looking at a recent syslog, I see this regarding backintime:

Dec 20 23:34:53 lubuntu-server backintime (root): INFO: Command "find "/media/truebackup/backintime/server/root/daily/20131219-220002-954/backup/" -type d -exec chmod a-w {} \;" returns 0

Then a few minutes later in the same log file:

Dec 20 23:57:04 lubuntu-server kernel: [52119.257761] nxagent invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Dec 20 23:57:04 lubuntu-server kernel: [52119.257766] nxagent cpuset=/ mems_allowed=0
Dec 20 23:57:04 lubuntu-server kernel: [52119.257769] Pid: 2854, comm: nxagent Tainted: G O 3.5.0-17-generic #28-Ubuntu
Dec 20 23:57:04 lubuntu-server kernel: [52119.257771] Call Trace:
Dec 20 23:57:04 lubuntu-server kernel: [52119.257779] [<c15c01c4>] dump_header.isra.10+0x86/0x1b4
Dec 20 23:57:04 lubuntu-server kernel: [52119.257784] [<c1104a1a>] oom_kill_process+0x23a/0x270
Dec 20 23:57:04 lubuntu-server kernel: [52119.257788] [<c1104ae1>] ? select_bad_process.constprop.15+0x91/0x170
Dec 20 23:57:04 lubuntu-server kernel: [52119.257791] [<c1104f53>] out_of_memory+0x163/0x1c0
Dec 20 23:57:04 lubuntu-server kernel: [52119.257794] [<c1108abf>] __alloc_pages_nodemask+0x68f/0x750
Dec 20 23:57:04 lubuntu-server kernel: [52119.257798] [<c1108bfc>] __get_free_pages+0x1c/0x40

And so on. I have since disabled backintime and the system has not crashed or had an oom-killer entry in any of the logs. So something is going on with backintime. This all started when I added about a million files to one of the current directories that backintime backs up. Also, I rsync the files off this server onto another server and that completes with no problems.

S0M30N3 12-29-2013 02:39 PM

Which BIT version do you use?

After that line the next task would be 'chmod -R a+w <new_snapshot_folder>'. A very simple task. Quite weird if that would cause an oom-killer...

If you use BIT version >=1.0.22 you can try out 'full rsync mode' which won't do all this chmod action anymore. Maybe that would help.

xmrkite 12-29-2013 06:17 PM

I'm using 1.0.10.

How do I update it? The repository has only this version. Is there a deb file I can download to update it?

syg00 12-29-2013 07:22 PM

Quote:

Originally Posted by S0M30N3 (Post 5088629)
After that line the next task would be 'chmod -R a+w <new_snapshot_folder>'. A very simple task. Quite weird if that would cause an oom-killer...

I doubt that it (the "find ..." command string) caused the issue directly, unless there is a large number of directories. A quick strace showed a new child being clone'd for each directory, which could be an issue with a very large number of directories. I'd be looking at that python task; I presume from the PID that it belongs to BIT. Lots of small memory allocations might cause fragmentation in the slab allocator, and there have been issues with this reasonably recently (a couple of years ago). Even if the task gets killed, that may not fix the fragmentation.

Not doing all that would seem a (much) better option.

xmrkite 12-29-2013 07:24 PM

And I do have tons and tons of directories. What do you recommend I do then? Not use backintime? Or will the newer version fix this issue?

syg00 12-29-2013 07:29 PM

I always believe in trying to help anyone prepared to develop open-source.
As you have a real problem, and a potential solution, I'm sure they would appreciate you trying the new version. Whether it solves the issue or not, the feedback will be beneficial.
I would hope it does help.

xmrkite 12-29-2013 07:43 PM

OK, I'll install it tonight (I have to find out how first) and then will run the backup tonight. Hopefully my server does not freeze up. Still, I'll give it 3 days to be sure since a few times it took two nights before it froze the server.

S0M30N3 12-30-2013 11:19 AM

You can use our PPA bit-team/stable to get the current stable version (1.0.34)
Code:

sudo add-apt-repository ppa:bit-team/stable
sudo apt-get update
sudo apt-get upgrade

I agree with syg00 that it might be the 'find ...' command.
I'm pretty sure the new version will fix it because there are two changes regarding this. First we now use
Code:

find [...] -exec chmod [...] {} +
instead of
Code:

find [...] -exec chmod [...] {} \;
which will drastically reduce the number of new chmod processes (it works like xargs).

And second there is this new 'Full rsync mode' which will delegate all the work to rsync (must be selected in options to use it)
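
The difference between the two find forms can be seen with a small experiment. This is only a sketch on a scratch directory (the 200 directories are made-up test data), with a counting command standing in for chmod so nothing real is modified:

```shell
# Build a scratch tree with 200 directories (hypothetical demo data).
demo=$(mktemp -d)
i=1
while [ "$i" -le 200 ]; do
    mkdir "$demo/dir$i"
    i=$((i + 1))
done

# '\;' form: the command is started once for every directory found.
per_dir=$(find "$demo" -type d -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# '+' form: paths are batched onto as few command lines as will fit.
batched=$(find "$demo" -type d -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "started with '\\;': $per_dir, started with '+': $batched"
rm -rf "$demo"
```

Each "run" line corresponds to one child process, so on a tree with a very large number of directories the '\;' form starts that many processes while the '+' form starts only a handful.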

xmrkite 12-30-2013 11:22 AM

OK, I got it installed and it ran fine last night. I'll run it a few more times and report back. I ticked the rsync option in the options area of the program.

Thanks

xmrkite 12-30-2013 01:08 PM

The hard drive I back up to ran out of space... I believe switching to rsync made the program recreate all the files on the backup drive. In other words, it was not an incremental backup but a new full backup. I'll have to delete my old backups to run this one, as the original folder being backed up takes up more than half the space on the backup drive.

S0M30N3 12-30-2013 01:42 PM

Quote:

Originally Posted by xmrkite (Post 5089166)
it was not an incremental backup but a new full backup.

To check whether they are still incremental, please take a look at FAQ 2403.
Normally they should be incremental, even after switching to 'Full rsync mode'. But if deleting all previous snapshots is an option for you, that would be better anyway.

Which filesystem do you use on source and destination? I'd recommend ext2|3|4 for dst.

xmrkite 12-30-2013 03:33 PM

I use ext4. The files all showed a link count of 1, whereas the files in my old backups had much higher numbers. So basically, this was not incremental. I am going to delete all my backups and then run this again and see what happens.
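
Assuming the number compared here is the hard-link count that ls -l prints, it can also be summarised for a whole snapshot instead of being checked file by file. A small sketch, where link_summary is a made-up helper name and the scratch tree stands in for two real snapshots:

```shell
# Count regular files under a directory that have exactly one link
# (stored only once) versus more than one (shared through hard links).
link_summary() {
    single=$(find "$1" -type f -links 1 | wc -l | tr -d ' ')
    shared=$(find "$1" -type f -links +1 | wc -l | tr -d ' ')
    echo "single=$single shared=$shared"
}

# Self-contained demo on scratch data: snap2 shares one file with snap1
# through a hard link and has one file of its own.
demo=$(mktemp -d)
mkdir "$demo/snap1" "$demo/snap2"
echo data > "$demo/snap1/unchanged"
ln "$demo/snap1/unchanged" "$demo/snap2/unchanged"
echo new > "$demo/snap2/changed"

summary=$(link_summary "$demo/snap2")
echo "$summary"
rm -rf "$demo"
```

On a real snapshot folder, a result where almost everything is counted as single would suggest that the snapshot was written as a full copy rather than linked against the previous one.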


All times are GMT -5.