Recurring Kernel Problem
Hi all,
I'm dealing with a recurring problem and thought someone might be able to tell me what sensible next steps would be to figure out what's causing it.
A few weeks ago I got a Lenovo SL500 and installed Debian 9 with KDE. Around a week ago, when I woke it from standby, it gave me the standard login screen and I logged in as normal. Then an error flashed up (too fast to read), and after that I only got a black screen. When I booted it up again, the GUI wouldn't load, and after some googling and experimentation I figured out the hard drive was full. I booted it with a Puppy Linux disc, found that the kernel log was pretty big, chalked it up to a kernel error and reinstalled Debian.
Now, on Debian 10, it had been working well for the past week until yesterday, when a notice popped up while I was doing some work for uni saying my drive was almost full. It was late, so I just saved and closed everything, and when it wouldn't let me open the shutdown menu, I forced a shutdown. Now (big surprise) it won't boot into the GUI.
I don't mind reinstalling Debian again, but if I have to do it every week it's going to get pretty time-consuming, so I'd like to figure out what's causing this. If I recall correctly, the first time I looked through the drive with Puppy Linux there was also around 30GB of used space unaccounted for, but I kind of stopped looking once I saw the full kernel log. I did try to read the kernel log, but after an hour of text scrolling by too fast to read, with no end in sight, I gave up. So, what should I do or look for now, and after I reinstall Debian, to figure out what's causing this? Thanks everyone! |
One way to do that (my way; I'm not saying there isn't a better one): boot with a live CD and examine that log for the issue. Once you have determined that, clean up the log space so you can boot normally. Then check for logrotate. If it is installed, it needs configuring; if not, install and configure it. That solves the space issue going forward. If you record the specific errors or messages that have been filling the log, post them here and we'll see if we can help you deal with them. I will be watching for an update. Please let me know what you find. |
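To make the clean-up step concrete, here is a minimal sketch, assuming the Debian root partition is mounted from the live CD at a path you pass in (the /mnt/sda1 in the comments is a hypothetical mount point). The function names empty_log and has_logrotate are made up for illustration. Emptying the file in place, rather than deleting it, keeps its owner and permissions; has_logrotate only checks for the binary at /usr/sbin/logrotate, where Debian's logrotate package installs it. Saving a sample of the lines first means you still have something to post here.

```shell
#!/bin/sh
# Hypothetical helpers for cleaning up from a live CD.
# The ROOT path in the usage comments is an assumed mount point.

# Empty an oversized log in place (truncate to 0 bytes) instead of
# deleting it, so the file keeps its owner and permissions.
empty_log() {
    if [ ! -f "$1" ]; then
        echo "not a regular file: $1" >&2
        return 1
    fi
    : > "$1"
}

# Succeed if the installed system under root $1 has the logrotate binary.
has_logrotate() {
    [ -x "$1/usr/sbin/logrotate" ]
}

# Example usage (not run here):
#   ROOT=/mnt/sda1                                    # hypothetical
#   head -n 1000 "$ROOT/var/log/messages.1" > ~/messages-sample.txt
#   empty_log "$ROOT/var/log/messages.1"
#   has_logrotate "$ROOT" || echo "install logrotate after booting Debian"
```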
That is not a kernel error. Either your root partition is too small, or something is writing a huge amount of data to files. It could be log files, or it could be something else. Backup files written to the internal drive can take a lot of space, and backups should not be done to the root drive. An incorrect mount to /media can cause files to be written to the root drive instead of to an external drive. There are multiple possibilities, and you need to sort out the cause instead of just reinstalling. When you boot from the Puppy drive, you can read the logs with a text editor, or pipe the output through more to read them at your leisure if you want to use the terminal. In addition to wpeckham's good advice, check the drive for directories that might be larger than normal. Some obvious places to start are /media, /var, and /usr. There should be no files in /media in most cases, just mountpoints for external drives. If all USB drives are removed and there are still files in /media, you may have found the problem. All this can take time and effort, but it's worth it to solve the problem. Reinstalling every time it happens will never solve it.
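A sketch of that directory check, assuming GNU du and sort (sort -h and du --max-depth are GNU options) and a Debian root mounted at a path you supply; biggest_dirs and stray_media_files are hypothetical names. The -x/-xdev options keep both commands on one filesystem, so a USB drive mounted somewhere underneath is not counted. Run biggest_dirs on the root first, then again on whichever directory stands out (for example $ROOT/var, then $ROOT/var/log).

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU coreutils (du --max-depth, sort -h).

# Show the largest first-level subdirectories of $1 (biggest last),
# staying on one filesystem. $2 = how many lines to show (default 10).
biggest_dirs() {
    du -xh --max-depth=1 "$1" 2>/dev/null | sort -h | tail -n "${2:-10}"
}

# Print regular files anywhere under the media tree of root $1.
# /media should normally hold only empty mount points, so with the
# Debian root mounted from a live CD, any output here is suspect.
stray_media_files() {
    find "$1/media" -xdev -type f 2>/dev/null
}

# Example usage (not run here):
#   ROOT=/mnt/sda1                 # hypothetical mount point
#   biggest_dirs "$ROOT"
#   biggest_dirs "$ROOT/var"
#   stray_media_files "$ROOT"
```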
|
Much thanks!
I know my habit of just reinstalling whenever I get stuck or frustrated isn't the best approach, but I mostly work by trying things until they work, and I manage to accidentally shoot my system to bits a lot along the way, which is what I thought had happened the first time. Only when it happened again, without my having done any experimentation since the latest reinstall, did I realize there is a bigger problem here. Anyway, I booted with Puppy now, and the big files seem to be var/log/kernel.log.1 and var/log/messages.1, at around 30GB each, which still leaves some 30GB unaccounted for. I'm currently trying to read those files, but they're still loading. But I was just told it might be because my installation is not SSD-optimized. I have never worked with an SSD before and therefore didn't consider this. Might that be the problem? |
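For the missing 30GB, it can help to list individual large files rather than whole directories. A minimal sketch, assuming a find that supports -xdev and the k/M/G size suffixes (GNU find does) plus GNU sort -h; large_files is a hypothetical helper, and its threshold is a find -size argument such as +1G.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under $1 bigger than $2
# (a find -size argument, default +1G), largest first, on one filesystem.
large_files() {
    find "$1" -xdev -type f -size "${2:-+1G}" -exec du -h {} + 2>/dev/null |
        sort -rh
}

# Example usage (not run here):
#   large_files /mnt/sda1            # hypothetical mount point
#   large_files /mnt/sda1 +500M      # lower threshold
```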
Something is wrong somewhere. There should not be 60GB of log files.
|
Ellster,
As wpeckham has mentioned, once you have discovered exactly what is causing this problem, you can move on to setting up logrotate. Scroll down to the answer beginning “This is an old question...” on: https://stackoverflow.com/questions/...files/35658810
You could also install and then run ncdu: https://www.binarytides.com/check-di...ge-linux-ncdu/ |
I hope you are not trying this in a GUI editor... Try Code:
less var/log/kernel.log.1
Even so, it will probably take many minutes. |
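Rather than paging through tens of gigabytes, it may be quicker to count which messages repeat most. A sketch, assuming the traditional rsyslog line layout (date, time, host, tag, then an optional kernel timestamp in brackets); top_messages is a hypothetical name, and a log with a different layout would need a different sed pattern. It will still take a while on a 30GB file, but it ends with a short summary instead of endless scrolling.

```shell
#!/bin/sh
# Hypothetical helper: print the $2 (default 10) most frequent messages
# in syslog-style file $1, most frequent first, with their counts.
# Assumes lines like "Mar  3 12:00:01 host kernel: [   12.345678] msg".
top_messages() {
    file="$1"
    count="${2:-10}"
    # Strip the date/time/host prefix and the bracketed kernel uptime so
    # identical messages logged at different times collapse together.
    sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]+ [^ ]+ //; s/\[ *[0-9]+\.[0-9]+] //' "$file" |
        sort | uniq -c | sort -rn | head -n "$count"
}

# Example usage (not run here):
#   top_messages var/log/kernel.log.1 20
```

Messages that differ only in a detail such as the PID are still counted as separate entries.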
Thanks all!
Life got in the way over the last few days, but logrotate is now set up.
I also had a look at the logs, and while there was a lot, most of it seemed to be one of these two warnings: Code:
WARNING: CPU: 0 PID: 0 at drivers/mtd/nand/raw/r852.c:746 r852_irq.cold.25+0xc/0x13
Code:
WARNING: CPU: 0 PID: 213 at drivers/mtd/nand/raw/r852.c:746 r852_irq.cold.25+0xc/0x13 [r852]
From what I found online, it seems a card reader might be at fault, which I don't need anyway, so I would now just disable it. But since I don't actually know much about this, I'll wait to hear what you think before I make a mess of things again. |
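If you would rather disable it from Linux than from the BIOS, one common approach is a modprobe blacklist entry. A minimal sketch, assuming r852 (the module named in the warnings above) is the one to block; blacklist_module and the file name it writes are hypothetical choices. As far as I know, a blacklist entry only stops automatic loading by device alias; it does not stop a manual modprobe or loading as another module's dependency.

```shell
#!/bin/sh
# Hypothetical helper: write "blacklist <module>" into a modprobe.d file.
# $2 = target directory, default /etc/modprobe.d (writing there needs root).
blacklist_module() {
    mod="$1"
    dir="${2:-/etc/modprobe.d}"
    printf 'blacklist %s\n' "$mod" > "$dir/blacklist-$mod.conf"
}

# Example usage (not run here, as root):
#   blacklist_module r852
#   update-initramfs -u     # in case the module is also in the initramfs
# then reboot and check that "lsmod | grep r852" prints nothing.
```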
That does not enlighten me, so I will wait with you.
|
If that is what is causing your troubles.
Consider the age of the kernel version vs. the age of the hardware: the hardware must be significantly older than the kernel! Debian Stable is very stable, but also very conservative (some might say outdated). In other words, a backported kernel might recognize newer hardware. Also, the log file tells you which kernel modules and which hardware are involved, so you can pinpoint this a little better; start with Code:
lspci -vv |
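Since lspci -vv prints a lot, it may be easier to ask which device is bound to the driver named in the warnings. lspci -k shows a "Kernel driver in use:" line under each device; the sketch below (devices_using_driver is a hypothetical helper) reads that output on stdin and prints the header line of each device whose driver matches exactly.

```shell
#!/bin/sh
# Hypothetical helper: filter "lspci -k" output (stdin) and print the
# header line of every device whose "Kernel driver in use:" equals $1.
# Device header lines are unindented; their details are indented below.
devices_using_driver() {
    awk -v drv="$1" '
        /^[^ \t]/ { dev = $0 }
        $0 ~ ("Kernel driver in use: " drv "$") { print dev }
    '
}

# Example usage (not run here):
#   lspci -k | devices_using_driver r852
```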
Thanks!
The very last lines I got from that were: Code:
Subsystem: Lenovo xD-Picture Card Controller
The dpkg command told me the package is not installed; I used the current kernel version as determined with uname -r. Not sure if I did something wrong here? |
No no no no no, that's not what I meant at all.
Please re-read my instructions. They're not well structured: each sentence stands on its own, some of it refers to output you already provided, etc. In any case, posting only the last line of output from a command we can't even identify is pointless. |
Apologies. Life has been busy so I posted a bit hastily last time.
The above readout was what I got from the lspci command. I also checked the rest of that output but couldn't identify any other section that mentioned something from the logs. That one immediately jumped out at me because it was last on the page (so the first one visible) and because the r852 driver is also mentioned in the logs. It is possible that this is an age issue, since it's a fairly new computer (I had only worked with hardware at least 10 years old before, so until now I wasn't aware this might be an issue), so I'll look into backports now. Anyway, I have now disabled the card reader in the BIOS, and that seems to have stopped the overflow. Thank you all for your help! |
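To confirm that the overflow has really stopped, one option is to measure how much the log grows over a few minutes. A sketch, assuming the live kernel log is /var/log/kern.log as on a stock Debian rsyslog setup (adjust the path to whichever file was filling up); log_growth is a hypothetical helper that prints the growth in bytes.

```shell
#!/bin/sh
# Hypothetical helper: print how many bytes file $1 grows during $2
# seconds (default 60). A negative result means it was rotated or emptied.
log_growth() {
    f="$1"
    secs="${2:-60}"
    before=$(wc -c < "$f")
    sleep "$secs"
    after=$(wc -c < "$f")
    echo $((after - before))
}

# Example usage (not run here):
#   log_growth /var/log/kern.log 300
```

A result of a few kilobytes over five minutes is normal; gigabytes would mean something is still flooding the log.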