LinuxQuestions.org


hasi 12-15-2007 02:35 PM

Disk full -> major system meltdown. HELP!!!
 
My KDE X session (Kubuntu) suddenly crashed and I found myself at the KDM login window. Login to any account failed (without notice), even into the failsafe sessions. I found out that a reboot did not help.
I booted from the Kubuntu LiveCD and discovered that my root disk had filled up due to an undiscovered background backup process. However, even after freeing space from the drive, I still could not log on.
Booting into the failsafe mode works (kind of), I can log into the root account. However, I found out that my root drive is now surprisingly mounted as "read-only". I thus cannot do anything useful on that computer. Also, the shell after logon says "root@ (none)", so it does not recognize my computer name. Since it is read-only, there is no log file, and I do not have any idea how to debug it!!! I do have backups, but I still have some hope that it is only some very small setting that is damaged.

Questions:
1) Any idea what is going on?
2) Are there any log files in read-only mode (RAM-disk, etc)? Where would that be?
3) Why is the computer name not recognized?

Any help would be greatly appreciated.
--hasi

jailbait 12-15-2007 04:23 PM

Failsafe mode mounts your / partition read-only. If you want to make changes to the file system, go back to using the Kubuntu LiveCD.

"2) Are there any log files in read-only mode (RAM-disk, etc)? Where would that be?"

The log files are in /var/log. The one you are interested in is /var/log/dmesg, which holds the kernel boot messages from the most recent boot. Most distributions also keep several older dmesg files from previous boots. So boot once in the mode that fails, then boot into failsafe and examine the second-newest copy of the dmesg log.

"3) Why is the computer name not recognized?"

The computer name is in /etc/hostname. Check /etc/hostname to see if the name is there correctly. If not you can repair /etc/hostname from the live CD.

You did not say anything about fsck. From your symptoms it is possible that your file system is damaged. Have you run fsck during any of your attempts to boot? If fsck found orphaned files or directories, it put them into a directory called lost+found. Boot into failsafe and look in lost+found to see if any missing files are there. Since fsck does not know the original names, it names each entry after its inode number.

--------------------
Steve Stites

syg00 12-15-2007 04:46 PM

You can (usually) remount the root as "rw" from single user mode. Don't worry about the hostname - that's not a problem; the network initscripts don't get run in that mode.
The log from "this" boot is not relevant - and it may have overwritten your real logs from last time, since the system didn't shut down properly. See if rotated ".?.gz" logs from logrotate exist; they may have something useful.
Or not.

I'd just see what is using all the space with "du -x / | sort -nr | less" (it will take a while). The biggest directories will be at the top of the list.
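
A minimal sketch of the same du approach wrapped in a function, so it can be pointed at any directory and trimmed to the top few entries. The function name biggest_dirs and the default count of 20 are assumptions made for illustration, not part of the advice above.

```shell
#!/bin/sh
# biggest_dirs DIR [COUNT]: print the COUNT largest directories under DIR,
# sizes in kilobytes, largest first.
# -x keeps du on one file system, so other mounted partitions are skipped.
biggest_dirs() {
    dir=${1:-/}
    count=${2:-20}
    du -xk "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Example (not run automatically): biggest_dirs / 20
```

The first line is always the starting directory itself, since its total includes everything below it; the interesting entries start on the second line.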

hasi 12-15-2007 04:56 PM

Thanks very much, Steve.

I had actually done a fsck (forgot to mention it), and it came out clean. There is nothing in lost+found. Also, I had already checked /etc/hostname; everything there looked normal. I will now look at /var/log/dmesg to see if that tells me anything.

Is it possible that it is related to an authentication problem? (Damaged pw file or something similar?)

Best,
--hasi.

hasi 12-15-2007 05:07 PM

Dear syg00:
I suppose my HD filled up because "Keep" (a KDE backup utility based on rdiff-backup) was running in the background without my noticing. Usually it writes to my external disk. However, I found the backup files on my local disk (in the usual mount path of the external drive). Bad luck.
How do I remount the root disk as rw? "mount -a"?
Thanks for your help.
--hasi
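
The failure described above (a backup landing in the empty mount directory because the external drive was not attached) can be guarded against in a scripted backup by checking that the target really is a mount point first. This is only a sketch: run_if_mounted is a hypothetical helper, the paths in the usage line are placeholders, and whether Keep itself offers such a check is not established in this thread.

```shell
#!/bin/sh
# run_if_mounted TARGET COMMAND...: run COMMAND only when TARGET is a
# mount point; otherwise refuse, so nothing is written to the root disk.
run_if_mounted() {
    target=$1
    shift
    if mountpoint -q "$target"; then
        "$@"
    else
        echo "refusing: $target is not a mount point" >&2
        return 1
    fi
}

# Example (hypothetical paths, not run automatically):
#   run_if_mounted /media/backup rdiff-backup /home /media/backup/home
```

The mountpoint utility exits with status 0 only when the directory is a mount point, which is what makes it usable as the condition here.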

btmiller 12-15-2007 06:54 PM

Code:

mount -o remount,rw /
Should remount the root partition in read-write mode. Watch carefully for any errors printed to the screen or in dmesg when you execute this command. Good luck!
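
To confirm how the root file system is currently mounted, before and after the remount, the options can be read from /proc/mounts. The sketch below assumes the usual field order of that file (device, mount point, type, options); the function names mount_opts and is_read_only are made up for illustration.

```shell
#!/bin/sh
# mount_opts MOUNTPOINT [MOUNTS_FILE]: print the option string of the last
# entry for MOUNTPOINT in a /proc/mounts-style file. The last entry is used
# because / can be listed more than once (rootfs first, the real device later).
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# is_read_only MOUNTPOINT [MOUNTS_FILE]: succeed if "ro" is one of the options.
is_read_only() {
    case ",$(mount_opts "$@")," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

A possible use from the failsafe root shell would be `is_read_only / && mount -o remount,rw /`, followed by another `mount_opts /` to see whether the option string now starts with rw.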

syg00 12-15-2007 06:59 PM

"mount -o remount,rw /" should do it.

hasi 12-16-2007 12:13 PM

Thanks a bunch, all.
Unfortunately, I cannot try your suggestions. I had already lost patience and did a complete system re-install. Pretty bad that something as simple as a full disk would force me to do that! Admittedly, I am not yet extremely good at Linux. Anyway, I did not expect that. I read in some threads that an automatic "disk full" warning is in the works. Sounds like a good idea.
--hasi.

unSpawn 12-16-2007 12:54 PM

Quote:

Originally Posted by hasi (Post 2992333)
I had already lost patience and did a complete system re-install. Pretty bad that something as simple as a full disk would force me to do that!

No, it doesn't force you to do that; it's just that you're not yet experienced enough to know there are alternatives. There actually are only a few cases where you would resort to reinstalling GNU/Linux from scratch. Most of the time people reinstall because they (for whatever reason) don't want to get to the bottom of the problem to fix it.



Quote:

Originally Posted by hasi (Post 2992333)
I read in some threads that an automatic "disk full" warning is in the works. Sounds like a good idea.

Totally unnecessary. If you partition your system properly with separate partitions for /, /home, /tmp, /var/tmp and /var and run something like Logwatch regularly to alert you (of course you would have to read the e-mails), then you wouldn't have this problem.

hasi 12-16-2007 01:34 PM

Thank you very much, unSpawn.
I took the decision to reformat after I had already spent many hours trying to find out what was going on. I am sure after some time I would have been able to figure it out. However, as you mentioned, I am not yet that good at Linux, and the time to figure it out may have been very significant. Probably much longer than the time it took me to restart from scratch, which was the rationale behind doing it. Unfortunately, I also was under pressure to continue with the work I was using the computer for.
The system I am talking about is actually my personal notebook. I already have data (/home) and system on 2 different partitions. Are you suggesting I need to have 5 independent partitions on a notebook to have a sound configuration?
Also, my system partition was filled within probably one hour or less (from having ~25% free). Would the Logwatch technique you mention have been "fast" enough to detect it?

btmiller 12-16-2007 04:22 PM

For a personal workstation or laptop, just having /, /home, and swap is generally sufficient. However, it sounds like you did not make the partitions big enough and you did not disable the backup program even though it did not have an external drive to back up onto. Just having extra partitions wouldn't have fixed it, and it sounds like it happened so fast that unless you had logwatch running every few hours you wouldn't have noticed it. So this falls under the category of knowing what's going on in your system and checking up on it. Not knowing what's going on in this case bit you, and it's bitten every computer user and admin at one point or another ... take it as a learning experience.
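
For a disk that fills within an hour, a check has to run far more often than a daily report. A rough sketch of such a check, assuming it would be run from cron every few minutes: usage_pct and warn_if_full are hypothetical names, and the limit passed in is an arbitrary choice. It reads the "Capacity" column of `df -P` and assumes the device name contains no spaces.

```shell
#!/bin/sh
# usage_pct PATH: print how full the file system holding PATH is, as a
# whole number of percent (fifth column of the df -P data line).
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# warn_if_full PATH LIMIT: print a warning and return 1 when usage is at
# or above LIMIT percent; print nothing and return 0 otherwise.
warn_if_full() {
    pct=$(usage_pct "$1")
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: $1 is ${pct}% full"
        return 1
    fi
}

# Example (not run automatically): warn_if_full / 90
```

Cron mails any output of a job to its owner by default, so a script that stays silent below the limit only generates mail when there is something to report.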

Incidentally, this sounds like a good opportunity for you to get some practice. Some day when you have some free time, maybe find an old unused PC (not an important system), install Linux, and purposely fill up the root drive (as root, something like "dd if=/dev/zero of=/somewhere/bigfile" would do it, for example). Then try to figure out how to recover. Again, try this only on a computer that you can afford to mess up!!

unSpawn 12-16-2007 06:32 PM

Check over a period of time (say two months) where the most write activity takes place and where the highest volume of changes is made with respect to installing and updating. Chances are it won't be in / since it's for system stuff, and /lib, /etc, /sbin and /bin don't regularly increase in volume tenfold overnight. Depending on the usage of the system you'll find it's a mix of /var, /usr, /tmp and /home. Apart from user quotas and mount flags, a more elaborate partition scheme wouldn't have "fixed" a write DoS in /, but it definitely mitigates damage since a simple write DoS won't cross those boundaries. Needless to say I don't agree with the "/, /home plus swap" simplification.


All times are GMT -5.