LinuxQuestions.org

Linux - Newbie: disk full when it isn't (https://www.linuxquestions.org/questions/linux-newbie-8/disk-full-when-it-isnt-172012/)

pete-wilko 04-19-2004 07:53 AM

disk full when it isn't
 
Hi everyone,

Kinda stuck on this one, so if anyone has any clues I'd be really appreciative! :) No doubt I will leave pertinent details out, but here goes:

We're running an old Linux server here with RedHat 7.3. The former admin kept the machine well patched and it's running a 2.4 kernel.

However, I think one of the hard disks may have died or one of the file tables has been corrupted.

The current output of df is:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda4   321G  305G      0  100%  /
/dev/sda1   197M   34M   153M   19%  /boot
/dev/sda3   6.7G  5.4G   1.0G   84%  /home
none        503M     0   503M    0%  /dev/shm


sda4 should have around 85G free, and running du shows that only 244GB is being used.
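For anyone wanting to compare the two figures on their own machine, here is a rough sketch of one way to do it. The function names are made up for illustration. It assumes a df that accepts -P (single-line POSIX output) and a du that accepts -x (do not cross into other mounted filesystems such as /home or /boot).

```shell
# Used KB according to df: POSIX format, 1K blocks; line 2, field 3 is "Used".
df_used_kb() { df -P -k "$1" | awk 'NR == 2 { print $3 }'; }

# Used KB according to du: one summary line, staying on a single filesystem.
du_used_kb() { du -s -k -x "$1" 2>/dev/null | awk '{ print $1 }'; }

# Difference between the two figures (same unit in, same unit out).
gap() { echo $(( $1 - $2 )); }

# Example (run as root so du can read every directory):
#   gap "$(df_used_kb /)" "$(du_used_kb /)"
```

With the numbers from this post, gap 305 244 leaves 61, so df is counting roughly 61G that du cannot find by walking the tree.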

Also df is returning variable results. sda4 is always 100% full, but the total size fluctuates between the value above and 345G (size) / 322G (used).

Any clues would be most appreciated. About to give e2fsck a whirl once some further backups have been completed (so at present it looks like data can be copied fine, all of which is causing a bit of head scratching).

Cheers,

Pete

druuna 04-19-2004 08:04 AM

Most distros reserve 5% of the total partition space for root use only:

321 - 5*(321/100) = 304.95, or about 305

So, for non-root users the disk is full :)

Hope this helps.
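The arithmetic above can be reproduced with a small sketch like the one below. The function names are hypothetical, and the 5% figure is the usual ext2/ext3 default rather than something read from this machine.

```shell
# Space reserved for root, given a partition size and a reserved percentage.
reserved_space() {
    awk -v size="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", size * pct / 100 }'
}

# Space left over for ordinary users after the reservation.
usable_space() {
    awk -v size="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", size - size * pct / 100 }'
}

reserved_space 321 5   # size in GB, as reported by df in the first post
usable_space 321 5

# To read the real reservation on an ext2/ext3 partition (needs root):
#   tune2fs -l /dev/sda4 | grep -i 'reserved block count'
```

For a 321G partition this works out to about 16G reserved and about 305G usable, which matches the Used column in the df output.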

pete-wilko 04-19-2004 08:10 AM

Thanks for the quick response :) - don't think that is the problem though. When I left for home on Friday night df was reporting 85GB free; now it's Monday, as far as I can see nobody has even logged in over the weekend, and there is no space.

Also I'm still stumped by du telling me that there is only 244GB used at all. So there is some weirdness in the numbers floating around. /tmp is only using 14MB, and there's plenty free on /home, so not many would have had permissions to touch anything else anyhow.

The variable results of df are also quite concerning, either being 325 or 345.

That's an interesting point though, I may be able to use that 'root' space to do some fixing hopefully.

Thanks for the message :)

Cheers,

Pete

druuna 04-19-2004 08:20 AM

Quote:

don't think that is the problem though. When I left for home on Friday night df was reporting 85GB free; now it's Monday, as far as I can see nobody has even logged in over the weekend, and there is no space.

Also I'm still stumped by du telling me that there is only 244GB used at all. So there is some weirdness in the numbers floating around. /tmp is only using 14MB, and there's plenty free on /home, so not many would have had permissions to touch anything else anyhow.

The variable results of df are also quite concerning, either being 325 or 345.
What do the logfiles say? It could be that something is going wrong and its output is being written to a log or file that fills up your partition.

On the other hand, an 85GB difference between reality and what the tools say is too much (are you sure it says 85GB, not 85MB??)

PS: /tmp is part of / and /home is not, so I do not understand the 14MB remark about /tmp.

Quote:

That's an interesting point though, I may be able to use that 'root' space to do some fixing hopefully.
That's why 5% is left for root......

pete-wilko 04-19-2004 08:47 AM

Heya, thanks for the quick response again :)

The log files (in /var/*) total only 6.2M. I'm still hoping it's some rampant process that was selling discount furniture at low low prices and has since gone crazy (in other words some process causing all the bother).

Is there any chance that when running du from / it would miss any files? I mean I'm really stumped why it's still only saying 244GB used.

Are du or df known to misreport their figures? If so, any idea of the cause? That's why I'm leaning towards a corrupted file table or something similar perhaps (although if that were the case then I shouldn't be able to read anything 'normally', I'd imagine).

I'm 99% certain that 85G was the figure, but you're right, a misreading on my part is a possibility, though to be honest I doubt it.

The /tmp reference was only because I thought someone might say to check how full /tmp was (even given it's on the other partition), as I'd been asked that several times here at work :).

Once again, thanks a heap for your input and putting up with the newbie stuff, most appreciated.

Cheers,

Pete

druuna 04-19-2004 12:15 PM

Quote:

The log files (in /var/*) total only 6.2M. I'm still hoping it's some rampant process that was selling discount furniture at low low prices and has since gone crazy (in other words some process causing all the bother).
I wasn't talking about the size of the logfile, but what's in it ;)
Check the logfiles, maybe there's a clue to which program might be causing this. On the other hand, maybe no program is......

Quote:

Is there any chance that when running du from / it would miss any files? I mean I'm really stumped why it's still only saying 244GB used. Are du or df known to misreport their figures? If so, any idea of the cause?
df and du have been around for a while, and I've never heard of or encountered any strange behaviour. Neither man page mentions any (known) bugs.

Quote:

That's why I'm leaning towards a corrupted file table or something similar perhaps (although if that were the case then I shouldn't be able to read anything 'normally', I'd imagine).
Did you reboot during the time you noticed this behaviour?? If you did, did fsck mention any problems?

If nothing was mentioned during boot, it's highly unlikely that your partition was/is corrupt. Personally I think it's something you (we) overlook.

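Do you have many symbolic links? If you do, note that by default du does not follow symbolic links, so their targets are not counted. The -D option dereferences links given on the command line, and -L dereferences all of them.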

JaseP 04-19-2004 04:30 PM

I would also consider doing a rebuild of the tree with the appropriate fsck command...

I use Reiser and on some occasions, I need to rebuild the tree if I caused a bad crash (usually a Wine or WineX related incident) with reiser's fsck command (which is reiserfsck or something like that). You might need to boot from a rescue disk if you are rebuilding the tree with a root partition... since you can usually only do this on an unmounted partition.

pete-wilko 04-20-2004 11:02 AM

Hey everyone,

Thanks for all your help, it's been most helpful! The problem is now fixed; it was resolved by a reboot with fsck. The funny thing, however, is that no errors were reported.

What I think has happened is that some users use Xwin32 to log into the machine, and I think a rogue KDE has gone nuts and claimed a heap of resources. What I was unaware of, and what explains a lot, is that tools like du only count files they can find in the directory tree, so a file that has been deleted while a process still holds its handle open will not be included, even though df still counts its blocks as used. This would account for the discrepancy between du and df. I found the tool 'lsof' helpful in starting to track down what went wrong.
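The effect is easy to reproduce in a scratch directory. The following is only an illustrative sketch, not what was run on the server: the file name and the 1 MB size are arbitrary, and descriptor 3 stands in for whatever process was holding the real files open.

```shell
# A file deleted while still held open vanishes from du,
# but its blocks stay allocated until the descriptor is closed.
scratch=$(mktemp -d)
dd if=/dev/zero of="$scratch/held.log" bs=1024 count=1024 2>/dev/null  # 1 MB file
exec 3<"$scratch/held.log"      # keep the file open on descriptor 3
rm "$scratch/held.log"          # remove its directory entry

du -s -k "$scratch"             # du no longer sees the 1 MB
held_bytes=$(wc -c <&3)         # yet the data is still readable through fd 3
echo "bytes still held open: $held_bytes"

exec 3<&-                       # closing the descriptor releases the space
rmdir "$scratch"

# On a live system, lsof can list open files whose link count is below 1,
# i.e. files that have been unlinked but are still open:
#   lsof +L1
```

This also fits the reboot fixing things without fsck reporting errors: once the processes holding the files were gone, the space was released.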

Again thanks to everyone, glad that is solved and I've got that 80GB of disk back!

Fingers crossed though that it doesn't happen again! Although I know the solution now, and it forced me to get around to backing up a heap of stuff.

Cheers,

Pete

retoreto 02-20-2005 11:43 AM

Hi,
I had the same problem and then found this thread on Google.
I have a Philips HDD100 MP3 player which mounts easily in Linux, but when copying to it I always got "full disk".
'df' showed 13MB free of 14GB.

Here is my solution:
You need fsck.vfat ('apt-get install dosfstools'),
then run 'fsck -a /dev/sda1'.

fsck said something about a wrong number of blocks and repaired it.
Now everything's fine: 14GB of 14GB free.

devfreak 06-14-2006 08:55 AM

Hey guys, I'm also experiencing this problem, haven't found a solution yet, and kinda need to figure one out fast because the web server is going down every night when it apparently "runs out of space".

I have FC3 running on it, and the two 74GB Raptors that are used together in a logical volume are reporting 75% disk full in df... but then the server will go down and everything I try to do (with the exception of deleting files in a console) fails.

The funny thing is the drives are only supposed to have about 20GB used up. I tried running "du -sh /[!d,p]*" and it ran until it reached /var, where it's still trying to calculate (I think). In Nautilus the properties of /var show 9GB used; the rest of the folders were about 7GB combined.
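For readers unsure what that pattern selects: [!d,p]* matches every name that does not start with "d", a comma, or "p", so /dev and /proc are skipped (the comma is taken literally, not as a separator). A small sketch in a scratch directory, with directory names chosen only to mimic a typical root filesystem:

```shell
# Recreate a few top-level names and see which ones the pattern picks up.
scratch=$(mktemp -d)
mkdir "$scratch/dev" "$scratch/proc" "$scratch/etc" "$scratch/var" "$scratch/home"

# Expand the glob inside the scratch directory; dev and proc are left out.
matched=$(cd "$scratch" && echo [!d,p]*)
echo "$matched"

rm -r "$scratch"
```

Since the pattern would also skip any other directory starting with d or p, naming the directories to exclude explicitly may be safer on a system that has more of them.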

I ran fsck -n (-n because the disks were mounted) and it said the partition was clean.

I'm out of ideas... can you guys think of anything? I appreciate the comments big time.

Here's some server details to help:
dual opteron w/ ECC ram running 32bit FC3... apache, mysql 4.1, coldfusion 6, php4, exim, vncserver serving gnome-session, um... urchin..., about 6 constant smb mounts... I dunno... feel free to ask for more!

devfreak 06-14-2006 11:08 AM

Welp, now the server is really down for the count. I can't get it to boot into any kernel; it just sits there right after it initializes hardware... at Configuring Kernel Parameters or something like that. I have the backup web server taking orders, but it doesn't run right and I need to fix this. I really don't want to format because there's a million settings. Really, any help here would be so unbelievably appreciated.

devfreak 06-14-2006 12:09 PM

OK, I may have it. In the serenity of my downtime I pursued the stalling of du in /var. I thought this would be explained by the presence of the websites in that folder, as they have many files... but the stalling happened in the spool.

Originally I thought exim was writing too many error logs because it was the only thing I had changed recently. I thought it was misconfigured and storing tons of error messages in /var/log. I found that, in fact, it was /var/spool/exim/msglog that had too many files to du OR ls. I started rm -rvfd on them while I was on a rescue CD, and when a few thousand were deleted and my df hadn't changed (but I had plenty of room to zip things for last-minute backup), I was able to reboot into one of my previous kernels. The one I prefer and have as grub default still hangs at Configuring Kernel Parameters, which I find confusing.

This appears to be it, as I said. The files are still being deleted while the machine is up. When they're done I'm going to try and figure out the kernel thing and then I'll post my findings (for google, since that's the only way I ever find solutions on this site).
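When a directory holds too many files for ls to be practical, find is one alternative for counting and clearing it. This is a sketch of that approach, not the rm -rvfd command used above. The helper names are made up, and it assumes a find that supports the POSIX "-exec ... {} +" form.

```shell
# Count regular files. find prints names as it walks the directory, so
# nothing has to collect and sort the whole listing first the way ls does.
# (Counts one line per file, so names containing newlines would be miscounted.)
count_files() { find "$1" -type f | wc -l; }

# Delete regular files in batches; "{} +" packs as many names into each
# rm call as the system allows.
purge_files() { find "$1" -type f -exec rm -f {} +; }

# Example with the spool path from this post (run as root):
#   count_files /var/spool/exim/msglog
#   purge_files /var/spool/exim/msglog
```

Running the count first gives an idea of how long the delete is likely to take.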

devfreak 06-15-2006 06:11 AM

Yeah, that was most certainly it. The machine was deleting all night... still going when I came in to work, and it had deleted 43GB of exim bullcrap.

pete-wilko 06-17-2006 04:04 AM

Nice one, I've got another server that I've noticed is displaying behaviour similar to yours, thanks for the info :)

tkedwards 06-17-2006 10:36 AM

Quote:

Yeah, that was most certainly it. The machine was deleting all night... still going when I came in to work, and it had deleted 43GB of exim bullcrap.
At a previous job we had some really funny behaviour around this type of thing too. Try
Code:

df -i
to see the number of inodes used/available; it's possible to run out of them on ext2/3 filesystems.
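To pick out only the filesystems that are close to running out, the df output can be filtered. A minimal sketch, assuming the POSIX-style layout from df -P -i where field 5 is IUse% and field 6 is the mount point; the function name is made up.

```shell
# Print mount points whose inode usage is at or above a threshold (percent).
# Reads `df -P -i` style output on stdin and skips the header line.
# Assumes mount points contain no spaces.
full_inodes() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)               # strip the percent sign
        if (pct + 0 >= limit) print $6, $5
    }'
}

# Usage:
#   df -P -i | full_inodes 90
```

A filesystem can show free space in plain df and still refuse new files when this figure reaches 100%, which fits a spool directory full of tiny files.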


All times are GMT -5.