Out of space on root partition, solved by logout-login?
I'm puzzled by this one...
Yesterday my Slackware 13.0-current box started complaining that there was no space left on the root partition and, obviously, some things started to behave strangely. My root partition is 15GB and I have separate partitions for /boot, /home and data. These 15GB should be more than enough for a standard Slackware install with some extra packages (I am used to 5~6GB total).

'df' showed 0 bytes free on /dev/root. I checked the usual suspects like /var/log, /tmp, etc., but couldn't find anything. Summing up /var, /usr, /bin, /sbin, /lib, /lib64, /etc and /opt I found only about 6GB, and saw nothing occupying all these extra gigabytes! I also checked /mnt and /media, but nothing there either.

I do not normally reboot this machine (the last time was when I updated the kernel, about two weeks ago); it stays on all the time. I finally decided to fall back to runlevel 3, leaving Xfce. Then I checked again and /dev/root had more than 8GB free, using only about 6GB - which is normal.

Does anyone have a clue what was happening? If this was some kind of stuck temporary file, why couldn't I find it? Weird.... :scratch: |
have you seen the problem happen again?
It's possible for a process to open a file, and while it is open a separate process can delete it (rm from the command line, perhaps), and the file will appear to be gone. But as long as the original process still has an open handle to the file, it can continue to read and write to it, and the file will continue to consume disk space until it is closed or the process ends. |
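The behavior described above is easy to reproduce in a small shell sketch (the paths are illustrative): hold a descriptor open in the current shell, unlink the file, and look at /proc, where the kernel marks the target "(deleted)" while the blocks remain allocated.

```shell
# Sketch of an open-but-unlinked file still consuming disk space.
f=$(mktemp)                                         # illustrative temp file
dd if=/dev/zero of="$f" bs=1M count=10 2>/dev/null  # give it some size
exec 3<"$f"                   # hold an open descriptor in this shell
rm "$f"                       # the name is gone from the directory tree...
entry=$(ls -l /proc/$$/fd/3)  # ...but the kernel still shows the open fd
echo "$entry"                 # the link target is marked "(deleted)"
exec 3<&-                     # closing the descriptor finally frees the blocks
```

Until that `exec 3<&-`, `df` counts the 10MB but `du` cannot see it, which is exactly the symptom in this thread.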
I am closely monitoring my space on / since this happened and something is eating away my disk space :(
I'll leave my computer running again tonight and will check tomorrow morning. If it continues to leak disk space, I'll try the logoff-logon process again... |
This is getting annoying...
I left my desktop on the whole night, with only Xfce and a terminal window open. I woke up this morning with 300MB gone on '/'. Going to runlevel 3 gives me everything back, but I really would like to know what is eating my hard disk space, because this way I cannot leave my computer on 24x7.

Any idea how I can find the process that's filling my root partition with 'invisible' files? |
(300 megabytes) per (8 hours) = 10.6666667 kilobytes per second
That's a fair amount of data being written overnight. Have you tried this? Code:
lsof +f -- /dev/root | grep deleted |
It is indeed a fair amount of data :)
It filled up 8GB on my drive in at most two weeks (since the last time I had booted). I am not at home right now, but I'll check at night and report back here... I tried the 'lsof' command, but it did not narrow things down to any deleted files. |
Ha, the seventh column in lsof output shows size ... I should have looked more closely :). No need for a filesystem debugger, although that works (with ext[234]) too.
This article seems wholly relevant to linux's lsof command with respect to deleted-but-open files. The deleted file stuff is about 2/3 down the page. This: Code:
watch -d -n 8 'lsof -X +f -- / | grep deleted | sort -r -n -k 7,7'
If that doesn't account for your disk space, maybe you should try checking for a rootkit? That's the way a tty/pty sniffer or keylogger could behave. |
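For reference, the `sort -r -n -k 7,7` at the end of that pipeline orders rows numerically on lsof's 7th column (SIZE/OFF), largest first. A quick demo on made-up rows standing in for lsof output:

```shell
# Demo of the sort key used above: numeric, reverse, on field 7 only.
# The rows are made-up stand-ins for lsof output lines.
printf 'fileA 1 2 3 4 5 100\nfileB 1 2 3 4 5 999\nfileC 1 2 3 4 5 5\n' \
  | sort -r -n -k 7,7 | head -n 1
# → fileB 1 2 3 4 5 999
```

So the biggest deleted-but-open file floats to the top of the watch display.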
iotop to the rescue! You could use it and see which program is performing disk writes.
Edit: the -a switch shows the accumulated amount of IO and is interesting too, if we suppose this is a single process. If there are several processes that spawn from time to time, accumulated IO will not tell you much, AFAIK. |
Oh, sorry. I didn't know it didn't work with a stock kernel. I always build mine and had to activate the corresponding options too.
|
Also, to ask the obvious, does
Code:
du -xsh /
agree with what df reports? (My xfs / partition is 12G total, 9.7G used (1 year of /tmp), and it took about 3 minutes to run; I don't like waiting for du.)

Edit: for finding large, but not hidden, files, "fsview" is useful. |
I haven't found the cause yet...
df -h shows that / has 6.1GB in use.
du -xsh / shows that it has 5.6GB.
That's a difference of about 500MB :( - a lot of space to waste... |
What I find on the 'net about big "du" and "df" disk usage inconsistencies seems to blame open, unlinked files.
I learned a more elegant way of doing these lsof commands: Code:
lsof -a -X +L1 /
Does / did lsof show changes in the sizes of open, unlinked files? lsof output likely has clues. Possibly an email client, web browser, or plugin thereof could be misbehaving, although an (average) 6-10 kB/s is a significant portion of an internet connection's download speed, so just http/html cache doesn't seem likely to write that much data. That data rate is not a whole lot less than what goes to a terminal's stdout!

The most open-yet-unlinked files on my system are related to html renderers (web browsers + email client), and their disk usage is variable and not excessive. Browsers' caches would certainly belong to a process only present in runlevel 4 and not in runlevel 3 (if you only use level 4 for the GUI). Asterisk also has a consistently small open, unlinked file. Interestingly, on a nearby computer, this: Code:
export count=0 ; for x in $(lsof -F s -a -X +L1 / | egrep '^s.*' | sed -r 's/s//') ; do count=$(( $x + $count )) ; done ; echo $count ; unset count

Have you seen this data growth happen in runlevel 3, or left it there long enough to see the increased disk usage? It's not likely to help much, but if it's your machine (i.e. nobody complains about CLI login :) ) you could try startx instead of the usual graphical login - which could either narrow the problem process down to [xkg]dm, or at least eliminate it as a culprit. Realistically, a graphical login manager is probably not the offending process. I think the offender is some GUI app, and my guess is you'll still see this problem if you startx from runlevel 3.

There are a few tricks you could, but should not, use (like debugfs's kill_file <$INODE> on open, unlinked files) to free up disk space to get more uptime. Telinit 3 is a whole lot safer than deallocating blocks at the filesystem level to free up (destroy) space.

The following doesn't apply if you're on a machine with shell access only for yourself, but if the lsof commands don't account for the disk space, I would consider looking for a rootkit. Look in rc.local for weird stuff you didn't put there, and in lsmod output for modules that look suspicious. Do you have debian-based distro users ssh'ing in? There could have been weak keys allowing certain rootkits in. |
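The size-summing loop above can also be written as a single awk pass over `lsof -F s` output, where each size record is a line starting with `s` followed by the byte count. Since actual lsof output varies per machine, the demo below feeds awk sample records instead:

```shell
# Summing lsof -F s size records ("s<bytes>") with awk instead of a shell loop.
# The printf lines are sample records standing in for real lsof output.
printf 's1024\ns2048\ns512\n' \
  | awk '/^s/ { total += substr($0, 2) } END { print total }'
# → 3584
```

On a live system you would pipe `lsof -F s -a -X +L1 /` into the same awk program to get the total bytes held by open-but-unlinked files on /.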
Well, still without clues here...
The lsof commands show some 0-byte MySQL files, one 0-byte Firefox temp file and one 300K cupsd file (probably something that got lost while printing). None of them are growing. What keeps growing is the difference between df & du: I lost another 200MB this night.

I am already running runlevel 3 + startx (since yesterday), but it doesn't seem to make a difference. I checked rc.local etc., but found nothing strange... I do have ssh access on this machine, but the key is quite strong and access is from a few Slackware boxes and one Windows machine (mine, at work).

I'll leave my computer at runlevel 3 when I go to work today and check if the problem continues... |
As previously mentioned, I would scan your machine for a rootkit using rkhunter or something similar, since you do have ssh enabled on the box. Because you have a strong key I highly doubt it is a rootkit writing data to your HD; however, it might be an idea to eliminate that possibility by scanning for an intruder.
|