LinuxQuestions.org - Slackware
Out of space on root partition, solved by logout-login? (https://www.linuxquestions.org/questions/slackware-14/out-of-space-on-root-partition-solved-by-logout-login-753074/)

niels.horn 09-06-2009 09:36 AM

Out of space on root partition, solved by logout-login?
 
I'm puzzled by this one...

Yesterday my Slackware 13.0-current box started complaining that there was no space left on the root partition and, obviously, some things started to behave strangely.

My root partition is 15GB and I have separate partitions for /boot, /home, and data.
These 15GB should be more than enough for a standard Slackware install with some extra packages (I am used to 5-6GB total).

'df' showed 0 bytes free on /dev/root

I checked the usual things like /var/log, /tmp, etc., but couldn't find anything.
Actually, summing up /var, /usr, /bin, /sbin, /lib, /lib64, /etc, and /opt I found only about 6GB, and saw nothing occupying all these extra gigabytes!
I also checked /mnt & /media, but nothing there either.
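
For reference, something like this sums them up without crossing filesystem boundaries (roughly what I did, per directory):
Code:

du -shx /var /usr /bin /sbin /lib /lib64 /etc /opt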

I do not normally reboot this machine (the last time was when I updated the kernel, about two weeks ago); it stays on all the time.

I finally decided to fall back to runlevel 3, leaving Xfce.
Then I checked again and /dev/root had more than 8GB free again, using only about 6GB - which is normal.

Does anyone have a clue what was happening?
If this was some kind of stuck temporary file, why couldn't I find it?

Weird.... :scratch:

crabboy 09-06-2009 10:19 PM

Have you seen the problem happen again?

It's possible for a process to open a file and, while it is open, for a separate process to delete it (rm from the command line, perhaps), making the file appear to be gone. But as long as the original process still has an open handle to the file, it can continue to read and write to it, and it will continue to consume disk space until the file is closed or the process ends.
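
You can see the effect with something like this (an untested sketch in bash; /tmp/demo is just an example path):
Code:

exec 3>/tmp/demo                      # open a file on descriptor 3
dd if=/dev/zero bs=1M count=100 >&3   # write 100MB through it
rm /tmp/demo                          # unlink it; du no longer sees it
df -h /                               # but df still counts the 100MB
exec 3>&-                             # close the descriptor; space is freed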

niels.horn 09-06-2009 10:45 PM

I have been closely monitoring my space on / since this happened, and something is eating away my disk space :(
I'll leave my computer running again tonight and will check tomorrow morning.

If it continues to leak disk space, I'll try the logoff-logon process again...
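
I'm keeping an eye on it with something like this:
Code:

watch -n 60 'df -h /'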

niels.horn 09-07-2009 07:19 AM

This is getting annoying...

I left my desktop on the whole night, with only Xfce and a terminal window open. I woke up this morning with 300MB gone on '/'.
Going to runlevel 3 gives me everything back, but I really would like to know what is eating my hard disk space, because this way I cannot leave my computer on 24x7.

Any idea how I can find the process that's filling my root partition with 'invisible' files?

trhodes 09-07-2009 03:08 PM

(300 megabytes) per (8 hours) = 10.6666667 kilobytes per second
That's a fair amount of data being written overnight.
Have you tried this?
Code:

lsof +f -- /dev/root | grep deleted
If I'm not mistaken, (deleted) appears next to files that have been unlinked but are still held open. The eighth column shows the inodes in use. However, given that information, I'm not sure how to determine inode disk usage. I thought a filesystem debugger (in my case I use XFS, so maybe "xfs_db -c blockuse" or "blockget") would shed some light on what is hogging space, but then I read that an XFS filesystem would probably have to be unmounted :(. debugfs (for ext[234]) appears not to require that the filesystem be unmounted, so you can probably "stat" suspect deleted files to see the number of blocks used, if you can narrow down the potential open, deleted files.
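
For example, on ext3, something like this should show the block count for a suspect inode (the inode number and device here are made up for illustration):
Code:

debugfs -R 'stat <12345>' /dev/sda1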

niels.horn 09-07-2009 03:51 PM

It is indeed a fair amount of data :)
It filled up 8GB on my drive in at most two weeks (since the last time I booted).

I am not at home right now, but I'll check tonight and report back here...
I tried 'lsof', but did not narrow it down to deleted files.

trhodes 09-07-2009 04:19 PM

Ha, the seventh column in lsof output shows the size ... I should have looked more closely :). No need for a filesystem debugger, although that works (with ext[234]) too.
This article seems wholly relevant to Linux's lsof command with respect to deleted-but-open files. The deleted-file stuff is about 2/3 down the page.

This:
Code:

watch -d -n 8 'lsof -X +f -- / | grep deleted | sort -r -n -k 7,7'
might help troubleshoot your problem when you get around to it.

If that doesn't account for your disk space, maybe you should try checking for a rootkit? That's the way a tty/pty sniffer or keylogger could behave.

rg3 09-07-2009 04:42 PM

iotop to the rescue! You could use it and see which program is performing disk writes.

Edit: the -a switch shows the accumulated amount of I/O and is interesting too if we suppose this is a single process. If there are several processes that spawn from time to time, accumulated I/O will not tell you much, AFAIK.
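
For example, something like this (assuming your iotop has the -o/--only switch to hide idle processes):
Code:

iotop -o -a

That way only processes actually doing I/O show up, with accumulated byte counts instead of per-second rates.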

bgeddy 09-07-2009 05:35 PM

Quote:

iotop to the rescue! You could use it and see which program is performing disk writes.
I was interested in this and had a look. It seems to need certain CONFIG_TASK* parameters to have been set in the running kernel (I/O accounting support built in). These are not built into the standard Slackware kernels, so the program fails. You could of course rebuild the kernel, but that seems a bit drastic!
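
To check what the running kernel was built with, something along these lines should work (the /boot/config path is a guess; some kernels expose the config at /proc/config.gz instead):
Code:

grep CONFIG_TASK /boot/config 2>/dev/null || zcat /proc/config.gz | grep CONFIG_TASK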

rg3 09-07-2009 06:33 PM

Oh, sorry. I didn't know it didn't work with a stock kernel. I always build mine and had to activate the corresponding options too.

trhodes 09-07-2009 08:16 PM

Also, to ask the obvious, does
Code:

du -xsh /
show 6GB used, or does it show more than 6GB?
(My xfs / partition is 12G total, 9.7G used (1 year of /tmp), and it took about 3 minutes to run; I don't like waiting for du)

Edit: for finding large, but not hidden, files, "fsview" is useful.

niels.horn 09-07-2009 09:23 PM

I haven't found the cause yet...

df -h shows that / has 6.1GB in use
du -xsh / shows that it has 5.6GB

That's a difference of about 500MB :( - a lot of space to waste...

trhodes 09-08-2009 05:06 AM

What I find on the 'net about big "du" and "df" disk usage inconsistencies seems to blame open, unlinked files.
I learned a more elegant way of doing these lsof commands:
Code:

lsof -a -X +L1 /
It lists all open files (excluding sockets) on the root filesystem with a link count of less than one (i.e. unlinked files), similar to the previous "grep"s of lsof output for "deleted".

Does (or did) lsof show changes in the sizes of open, unlinked files?
The lsof output likely has clues.

Possibly an email client, web browser, or a plugin thereof could be misbehaving, although an average of 6-10 kB/s is a significant portion of an internet connection's download speed, so plain http/html cache doesn't seem likely to write that much data. That data rate is not a whole lot less than what goes to a terminal's stdout! The most open-yet-unlinked files on my system belong to html renderers (web browsers + email client), and their disk usage is variable and not excessive. A browser's cache would certainly belong to a process only present in runlevel 4 and not in runlevel 3 (if you only use level 4 for the GUI). Asterisk also has a consistently small open, unlinked file.

Interestingly, on a nearby computer, this:
Code:

export count=0 ; for x in $(lsof -F s -a -X +L1 / | grep '^s' | sed 's/^s//') ; do count=$(( $x + $count )) ; done ; echo $count ; unset count
showed 1398131244 bytes, or 1.3 gigabytes' worth of data in open, unlinked files, and its user happens to use many browser plugins and addons. My computer and another nearby one, combined, have only 2MB worth of open, unlinked files, and neither has lots of browser plugins and addons.

Have you seen this data growth happen in runlevel 3, or left it there long enough to see the increased disk usage? It's not likely to help much, but if it's your machine (i.e. nobody complains about CLI login :) ), you could try startx instead of the usual graphical login -- which could either narrow the problem down to [xkg]dm or at least eliminate it as a culprit. Realistically, a graphical login manager is probably not the offending process. I think the offender is some GUI app, and my guess is you'll still see this problem if you startx from runlevel 3.

There are a few tricks you could, but should not, use (like debugfs's kill_file <$INODE> on open, unlinked files) to free up disk space and buy more uptime. telinit 3 is a whole lot safer than deallocating blocks at the filesystem level to free up (destroy) space.
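
For completeness, and emphatically not a recommendation, that trick would look something like this (made-up inode and device; deallocating blocks behind a mounted filesystem's back is a good way to corrupt it):
Code:

debugfs -w -R 'kill_file <12345>' /dev/sda1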

The following doesn't apply if you're the only one with shell access to the machine, but if the lsof commands don't account for the disk space, I would consider looking for a rootkit. Look in rc.local for weird stuff you didn't put there, and in lsmod output for modules that look suspicious. Do you have Debian-based distro users ssh'ing in? There could have been weak keys (the 2008 Debian OpenSSL bug produced predictable keys) allowing certain rootkits in.

niels.horn 09-08-2009 06:26 AM

Well, still without clues here...

The lsof commands show some 0-byte MySQL files, one 0-byte Firefox temp file, and one 300K cupsd file (probably something that got lost while printing).
None of them are growing.

What continues growing is the difference between df & du. I lost another 200MB last night.
I am already running runlevel 3 + startx (since yesterday), but it doesn't seem to make a difference.

I checked rc.local etc., but found nothing strange...

I do have ssh access on this machine, but the key is quite strong and access is from a few Slackware boxes and one Windows machine (mine, at work).
I'll leave my computer at runlevel 3 when I go to work today and check if the problem continues...

hitest 09-08-2009 08:19 AM

As previously mentioned, I would scan your machine for a rootkit using rkhunter or something similar, since you do have ssh enabled on the box. Because you have a strong key I highly doubt it is a rootkit writing data to your HD, but it might be an idea to eliminate that possibility by scanning for an intruder.
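
A typical run looks like this, assuming rkhunter is installed (chkrootkit works similarly):
Code:

rkhunter --check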

