LinuxQuestions.org

_mz 07-11-2013 02:54 AM

filesystem /var file system suddenly utilizes 100%
 
Hi,

I had an issue with a Red Hat server where the /var file system suddenly reached 100% utilization. An alert flagged it, and the problem was short-lived: the server was fine a few minutes later. There was nothing in /var/log/messages.

I suspect a large amount of data was loaded into some directory under /var earlier, but I could not confirm this, and it may be some other issue.

How can I trace what was going on at that particular time?

business_kid 07-11-2013 02:59 AM

Only thing I can think of...

If /var suddenly filled up and then emptied, my guess is something humongous was written to /var/tmp. Perhaps the process crashed when it ran out of space, or it moved the data on. Not that much writes to /var/tmp.

_mz 07-11-2013 03:24 AM

Thank you for your reply.

In /var/tmp directory:

# ll /var/tmp/
total 32
drwxrwxr-x 2 nagios nagios 4096 May 4 2012 check_logfiles
-rwxrwxrwx 1 root root 248 Feb 15 2012 rehe3_vmstat_110.log
-rwxrwxrwx 1 root root 29 Feb 15 2012 rehe3_vmstat_120.log
drwx------ 2 s22adm sapsys 4096 Mar 13 2012 yum-s22adm-whlFRZ


# ll /var/tmp/check_logfiles/
total 12
-rw-rw-r-- 1 nagios nagios 636 Jul 11 17:15 check_db2diaglog._db2_S22_db2dump_db2diag.log.messagelog
-rw-rw-r-- 1 nagios nagios 0 Jul 11 12:30 check_log_messages._var_log_messages.messagelog

Indeed, the timestamp on the "check_log_messages._var_log_messages.messagelog" file is the exact time the issue occurred. Could this be the cause? I have no idea what this file is for.

_mz 07-11-2013 04:10 AM

Hi,

I compared with another server: all the files in /var/tmp/check_logfiles/ owned by nagios use only 8.0K there. So I do not think this is the cause. I have googled around but haven't found anything yet.

Any advice is welcome :)

business_kid 07-11-2013 10:06 AM

Of course it's not there, because your space issue has resolved itself. Your usage went to 100% and then back to normal. I was just thinking back: where can a program erase files? The time to check is when usage is at 100%.
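Since the spike only lasted a few minutes, catching it "at 100%" probably means having something already watching. A minimal sketch, assuming a POSIX shell with df, du, awk and sort available; the function names, the threshold and the output directory are made up for illustration and are not from the thread:

```shell
#!/bin/sh
# Sketch: record a filesystem's biggest directories when it crosses a threshold.

# usage_pct MOUNTPOINT: print used space as a bare integer percentage.
usage_pct() {
    # df -P prints one line per filesystem; field 5 is "Capacity" (e.g. 97%).
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# snapshot_if_full MOUNTPOINT THRESHOLD OUTDIR
# Writes a timestamped report into OUTDIR only when usage >= THRESHOLD.
snapshot_if_full() {
    mp=$1 threshold=$2 outdir=$3
    pct=$(usage_pct "$mp")
    # Do nothing if usage is below the threshold (or could not be read).
    [ "$pct" -ge "$threshold" ] 2>/dev/null || return 0
    out="$outdir/full-$(date +%Y%m%d-%H%M%S).txt"
    {
        date
        df -P "$mp"
        # -x stays on this filesystem; sizes are in KiB, largest first.
        du -xk "$mp" 2>/dev/null | sort -rn | head -n 20
    } > "$out"
}
```

A wrapper script could call something like `snapshot_if_full /var 90 /root/var-snapshots` from cron every minute, or from a loop with a short sleep if one minute is too coarse. The output directory should sit on a filesystem other than /var, otherwise the report cannot be written at the moment it matters.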

_mz 07-11-2013 09:04 PM

The time was 12:30, but nothing in any log shows what was going on. It is hard to trace from the OS level.

There were no cron jobs running at the time. Logrotate was fine. I suspect the application, but I would like to check from the OS level first before asking the application team to investigate further.

business_kid 07-12-2013 02:11 PM

The way to narrow it down is to find what can or does write to /var/tmp. Most apps and the OS use /tmp.
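One rough way to see what has been writing there is to list recently modified files together with their owners. This is only a sketch: `recent_writes` is a hypothetical helper name, `-mmin` is a GNU/BSD find extension rather than POSIX, and it can only show files that still exist, so it would have to run during or soon after the event.

```shell
#!/bin/sh
# recent_writes DIR MINUTES
# List regular files under DIR modified within the last MINUTES minutes.
recent_writes() {
    # -xdev: do not descend into other mounted filesystems.
    # ls -ld shows owner, size and mtime, which hints at the writing account.
    find "$1" -xdev -type f -mmin "-$2" -exec ls -ld {} +
}
```

For example, `recent_writes /var/tmp 10` would show anything touched in /var/tmp in the last ten minutes, and `recent_writes /var 10` widens the search to the whole filesystem.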

jpollard 07-12-2013 05:50 PM

Well, one way is to turn on process accounting. That way you will get a log of the processes that ran and when each one terminated. If a process aborts because the disk is full, the space will be freed, and I believe the accounting entry will contain the reason for the exit (exit status). This is not exactly precise, as it will not identify the name of the file involved.
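A minimal sketch of turning accounting on and searching it afterwards, assuming the GNU acct tools (accton, lastcomm), which on Red Hat come from the psacct package. The function names and the /root/pacct path are illustrative assumptions. The file is deliberately kept off /var because, as far as I know, the kernel can suspend accounting when the filesystem holding the file runs low on space (see the kernel.acct sysctl), which is exactly the situation being investigated.

```shell
#!/bin/sh
# Assumed location for the accounting file; any path outside /var would do.
ACCT_FILE=${ACCT_FILE:-/root/pacct}

# start_accounting: enable kernel process accounting into $ACCT_FILE.
start_accounting() {
    command -v accton >/dev/null 2>&1 || {
        echo "accton not found; on Red Hat it ships in the psacct package" >&2
        return 1
    }
    [ "$(id -u)" -eq 0 ] || { echo "must run as root" >&2; return 1; }
    # The accounting file has to exist before accton is pointed at it.
    touch "$ACCT_FILE" && accton "$ACCT_FILE"
}

# records_matching PATTERN: print accounting records containing PATTERN,
# e.g. a user name or part of a timestamp.
records_matching() {
    lastcomm -f "$ACCT_FILE" | grep -F -- "$1"
}
```

After the next alert, something like `records_matching "Jul 11 12:3"` would narrow the list to processes around that time. As far as I know, lastcomm reports flags rather than a numeric exit code: X marks a process killed by a signal and D one that dumped core.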


All times are GMT -5.