LinuxQuestions.org
Visit Jeremy's Blog.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Server
User Name
Password
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Notices


Reply
  Search this Thread
Old 12-17-2014, 10:02 AM   #1
rrije
Member
 
Registered: Jul 2010
Distribution: openSUSE 11.4
Posts: 33

Rep: Reputation: Disabled
rsyslog: ruleset queues processed too slowly during peak load


Hello,

there is a Debian Wheezy server with rsyslog 5.8.11 (from repos) that aggregates messages from two hosts, putting them into files at dynamically generated paths. A while ago we came to realization that a large portion of messages was getting dropped/not logged.

After some digging it became apparent that:

- rsyslog processes messages as it should during day time
- during evening peak hours when the number of messages grows significantly rsyslog starts eating RAM, then begins the swapping, then message processing seemingly comes to a halt. Lately in extreme cases rsyslog was known to crash
- according to statistics both ruleset queues start growing and their processing can't/won't keep up

Here's the example output of *impstats* module during the peak hour testing:

Code:
    2014-11-28T15:12:03.675001+03:00 trcoll rsyslogd-pstats: imuxsock: submitted=0 ratelimit.discarded=0 ratelimit.numratelimiters=0 
    2014-11-28T15:12:03.675305+03:00 trcoll rsyslogd-pstats: ruleset: size=52631 enqueued=1900685 full=0 maxqsize=59487 
    2014-11-28T15:12:03.675309+03:00 trcoll rsyslogd-pstats: ruleset: size=17 enqueued=1365123 full=0 maxqsize=3783 
    2014-11-28T15:12:03.675312+03:00 trcoll rsyslogd-pstats: main Q: size=3 enqueued=5 full=0 maxqsize=3
    [...]
    2014-11-28T15:38:05.031015+03:00 trcoll rsyslogd-pstats: imuxsock: submitted=129 ratelimit.discarded=0 ratelimit.numratelimiters=21 
    2014-11-28T15:38:05.031025+03:00 trcoll rsyslogd-pstats: ruleset: size=1402722 enqueued=51631288 full=0 maxqsize=1421823 
    2014-11-28T15:38:05.031030+03:00 trcoll rsyslogd-pstats: ruleset: size=33 enqueued=36496128 full=0 maxqsize=8245 
    2014-11-28T15:38:05.031035+03:00 trcoll rsyslogd-pstats: main Q: size=3 enqueued=238 full=0 maxqsize=4
It is a fact that some time ago this setup worked fine with approximately same load.

The issue likely persists for several months already, so it's hard to tell whether there were significant changes to the config. However, it is most likely that around the time the problem started a) rsyslog wasn't updated or significantly changed in any way b) the output template was slightly changed — it's a field-delimiter based template, and as the message format from the two sending hosts had been changed, the template was changed to accomodate the new format (i.e. field numbers).

For now separating the two rulesets into different rsyslog instances solved the problem, but we'd like to know the reason the problem appeared and subsequently return to the 'one rsyslog; two rulesets' setup.

So far the search revealed that there was only a couple issues with similar symptoms, one related to TLS (which we don't use), the second a bug fixed in v4.

The server with rsyslog is a Supermicro X8SIL server with Intel Xeon X3450 @ 2.67GHz and 4GB RAM. Logs are written to a RAID5 array storage. As far as I understand the server performance shouldn't be an issue.

The rsyslog config is as follows:

Code:
    $ModLoad imuxsock # provides support for local system logging
    $ModLoad imklog   # provides kernel logging support (previously done by rklogd)
    $ModLoad imudp
    
    $WorkDirectory /var/spool/rsyslog
    
    $template QHourLogs2,"/mnt/data/traf/host2/%$year%/%$month%/%$day%%$hour%%$qhour%.log.z"
    $template QHourLogs1,"/mnt/data/traf/host1/%$year%/%$month%/%$day%%$hour%%$qhour%.log.z"
    $template host_rulematch, "%TIMESTAMP% %msg:F,32+:8% %msg:F,32+:10% %msg:F,32+:16% %msg:F,32+:17% %msg:F,32+:18%\n"
    $template hosttest_rulematch, "%TIMESTAMP% %msg:F,32+:9% %msg:F,32+:11% %msg:F,32+:17% %msg:F,32+:18% %msg:F,32+:19%\n"
    $template host_cf, "%TIMESTAMP% %msg:F,32+:4% %msg:F,32+:5% %msg:F,32+:7% %msg:F,32+:8%\n"
    $template host_cf_old, "%TIMESTAMP% %msg:F,32+:8% %msg:F,32+:10% %msg:F,32+:23% %msg:F,32+:5%\n"
    
    
    $AbortOnUncleanConfig off
    
    $MainMsgQueueSize 30000000
    $MainMsgQueueSaveOnShutdown off
    $MainMsgQueueDiscardMark 350000
    $MainMsgQueueHighWaterMark 250000
    $MainMsgQueueLowWaterMark 30000
    $MainMsgQueueType FixedArray
    $MainMsgQueueWorkerThreads 8
    $MainMsgQueueMaxFileSize 5g
    
    $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
    $FileOwner root
    $FileGroup adm
    $FileCreateMode 0640
    $DirCreateMode 0755
    $Umask 0022
    $IncludeConfig /etc/rsyslog.d/*.conf
    
    
    $UDPServerAddress 127.0.0.1
    $UDPServerRun 514
    
    
    #-------------------------------defaults
    #
    # First some standard log files.  Log by facility.
    #
    auth,authpriv.*			/var/log/auth.log
    *.*;auth,authpriv.none		-/var/log/syslog
    #cron.*				/var/log/cron.log
    daemon.*			-/var/log/daemon.log
    kern.*				-/var/log/kern.log
    lpr.*				-/var/log/lpr.log
    mail.*				-/var/log/mail.log
    user.*				-/var/log/user.log
    #
    # Logging for the mail system.  Split it up so that
    # it is easy to write scripts to parse these files.
    #
    mail.info			-/var/log/mail.info
    mail.warn			-/var/log/mail.warn
    mail.err			/var/log/mail.err
    #
    # Logging for INN news system.
    #
    news.crit			/var/log/news/news.crit
    news.err			/var/log/news/news.err
    news.notice			-/var/log/news/news.notice
    #
    # Some "catch-all" log files.
    #
    *.=debug;\
    	auth,authpriv.none;\
    	news.none;mail.none	-/var/log/debug
    *.=info;*.=notice;*.=warn;\
    	auth,authpriv.none;\
    	cron,daemon.none;\
    	mail,news.none		-/var/log/messages
    #
    # Emergencies are sent to everybody logged in.
    #
    *.emerg				*
    #
    daemon.*;mail.*;\
    	news.err;\
    	*.=debug;*.=info;\
    	*.=notice;*.=warn	|/dev/xconsole
    #---------------------------------
    
    $RuleSet host1
    $RulesetCreateMainQueue on # create ruleset-specific
    $OMFileAsyncWriting on
    $OMFileZipLevel 9
    $OMFileIOBufferSize 50m
    :msg, contains, "some_marker" -?QHourLogs1;host_cf
    
    
    $RuleSet host2
    $RulesetCreateMainQueue on # create ruleset-specific 
    $OMFileAsyncWriting on
    $OMFileZipLevel 9
    $OMFileIOBufferSize 50m
    :msg, contains, "some_marker" -?QHourLogs2;host_cf
    
    
    $InputUDPServerBindRuleset host2
    $UDPServerAddress <host2_ip>
    $UDPServerRun 514
    $IMUDPSchedulingPolicy fifo
    $IMUDPSchedulingPriority 10
    $UDPServerTimeRequery 100
    
    $InputUDPServerBindRuleset host1
    $UDPServerAddress <host1_ip>
    $UDPServerRun 514
    $UDPServerTimeRequery 100
Some limited testing with different options related to file buffer sizes and such were tried, but to no avail (at least in the course of the limited time alotted for the tests the problem persisted). I'd like to have a bunch of possible measures on hand, so testing of possible solutions can be done in one batch.

Rsyslog update is most unwelcome, unless it will be proven that the issue is related to the current version's core features/bugs.

Any suggestions about the root of the problem or where to look are greatly appreciated.

Thanks in advance.
 
Old 12-18-2014, 03:55 PM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600
Quote:
Originally Posted by rrije View Post
Code:
    2014-11-28T15:12:03.675001+03:00 trcoll rsyslogd-pstats: imuxsock: submitted=0 ratelimit.discarded=0 ratelimit.numratelimiters=0 
    [...]
    2014-11-28T15:38:05.031015+03:00 trcoll rsyslogd-pstats: imuxsock: submitted=129 ratelimit.discarded=0 ratelimit.numratelimiters=21
submitted= doesn't show a sh*tload of messages submitted and ratelimit.discarded= doesn't show any messages have been discarded. On top of that


Quote:
Originally Posted by rrije View Post
Code:
    2014-11-28T15:12:03.675312+03:00 trcoll rsyslogd-pstats: main Q: size=3 enqueued=5 full=0 maxqsize=3
    [...]
    2014-11-28T15:38:05.031035+03:00 trcoll rsyslogd-pstats: main Q: size=3 enqueued=238 full=0 maxqsize=4
... the full= shows no queue was ever full.


Now this may seem the culprit:
Quote:
Originally Posted by rrije View Post
Code:
    2014-11-28T15:12:03.675305+03:00 trcoll rsyslogd-pstats: ruleset: size=52631 enqueued=1900685 full=0 maxqsize=59487 
    2014-11-28T15:12:03.675309+03:00 trcoll rsyslogd-pstats: ruleset: size=17 enqueued=1365123 full=0 maxqsize=3783 
    [...]
    2014-11-28T15:38:05.031025+03:00 trcoll rsyslogd-pstats: ruleset: size=1402722 enqueued=51631288 full=0 maxqsize=1421823 
    2014-11-28T15:38:05.031030+03:00 trcoll rsyslogd-pstats: ruleset: size=33 enqueued=36496128 full=0 maxqsize=8245
Again full= shows no queue was ever full, size= says there's little messages in the queue now and maxqsize= (max queued messages) was never that large to suspect discards.

So what is your actual evidence to support dropped messages?
 
  


Reply

Tags
rsyslog



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
CPU load peak every 10 seconds, causing mouse lag Stefanix Linux - General 11 06-28-2012 12:23 PM
POSIX Queues with message type facility of System V queues ehsan_haq98@yahoo.com Programming 2 12-27-2009 11:45 AM
IPtables failing to load: unknown ruleset bureado Linux - Software 3 06-19-2008 10:32 PM
Programs load very slowly! sdat1333 Mandriva 6 11-24-2004 02:41 PM
cannot load the accounting ruleset monsoon Linux - Newbie 1 12-14-2002 10:11 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Server

All times are GMT -5. The time now is 04:48 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration