Or: Who Would Have Thought rm -r ./e could cause so much trouble?
I've been doing this sys admin thing for about a year now. I was pretty much a complete n00b when I started. I still have a lot to learn obviously.
This morning I was relaxing in my office, poking around on my server. I say my server, but I actually inherited this server from another admin, so there are things about it that are mysterious to me. I noticed a partition called scratch. After some investigation, I determined that it was used for exactly what it sounded like and everything on it was old enough to merit deleting (I have tape backups after all). I suppose I could have reformatted it, but deleting a few folders seemed easy enough.
I started with virusmail, using the command "rm -r ./virusmail", which worked fine. Encouraged, I turned to a directory simply called "e". It was a two-year-old backup for a professor who is notorious for hoarding backups and such going back decades. In this directory, there were 2 GB of data. It was obviously a stale backup and not needed. I'm certain the individual didn't even know it was there. After a bit of thought I issued (as root) the command: rm -r ./e
And there was an uncomfortable pause.
The virusmail folder went away pretty quick, so I figured this shouldn't take more than a few seconds either. To reassure myself, I opened a second ssh terminal, and it just hung. Oh ****, what did I do?!
What if there was a softlink to the root directory in that folder? Would that have made this the equivalent of rm -r /*?
I ran upstairs to the server room, where the terminal said something about being out of memory. Somehow the command had frozen the machine.
After initiating the nail-bitingly-long reboot process (still wondering whether I had wiped out the system), everything seemed to work normally at first. Then I realized that anytime someone tried to run various programs, the program would just hang. Not good. I had little energy left for freaking out though; I was more annoyed and determined to fix it at this point. The worst they can do is fire me. Then maybe I'll move out of state and work my way into the video game industry (something I probably should have done a long time ago).
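A minimal sketch for checking that softlink worry in a throwaway directory. As far as POSIX rm goes, rm -r removes a symlink it finds inside the tree without descending through it, so a link alone should not turn the command into rm -r /*. All paths below are hypothetical and live under a temporary sandbox.

```shell
# Throwaway sandbox; every path below is made up for the demonstration.
sandbox=$(mktemp -d)
mkdir "$sandbox/target" "$sandbox/e"
echo "precious" > "$sandbox/target/file.txt"

# Put a symlink inside e that points at the other directory.
ln -s "$sandbox/target" "$sandbox/e/link"

# rm -r unlinks the symlink itself; it does not follow it into target.
rm -r "$sandbox/e"

# The linked-to directory and its contents should still be there.
ls "$sandbox/target"
```

This only covers a symlink sitting inside the directory being removed. It says nothing about what actually made the original command hang.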
I tried to kill various hanging processes. I couldn't remember if -9 was the right signal and found a neat page lamenting that so many n00bs think "kill -9 pid" is a good idea. It suggested:
Code:
kill pid          # sends a TERM, wait 5 seconds
kill pid          # yes, try again, wait 5 seconds
kill -INT pid     # wait for it
kill -INT pid     # damn, still not dead?
kill -KILL pid    # same thing as -9
kill -KILL pid    # something is wrong
Well, something was indeed wrong. Even kill -9 pid wouldn't work. A Google search turned up, what else, an LQ thread where I got the idea that an unkillable process indicates a wait for I/O. Maybe a drive was bad. That made sense after a freeze and surprise shutdown. After a bit of exploration I found that any attempt to access the /tmp partition caused hanging. I went from annoyance to complete relief.
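One way to spot processes stuck waiting on I/O is to look for a state starting with D (uninterruptible sleep) in the STAT column of ps. The sketch below assumes a Linux procps-style ps; the helper name filter_dstate is made up for illustration.

```shell
# Hypothetical helper: keep the header line plus any row whose
# second column (STAT) starts with D, i.e. uninterruptible sleep.
filter_dstate() {
  awk 'NR == 1 || $2 ~ /^D/'
}

# List PID, state and command name for every process, then filter.
ps -eo pid,stat,comm | filter_dstate
```

Processes in that state ignore signals until the I/O they are waiting on completes, which would be consistent with kill -9 appearing to do nothing.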
After a year of experience (thanks for all the help, LQ) that seemed like a really easy problem to fix. I ended up editing fstab with pico (vi tried to use /tmp and hung) so that /tmp would not be mounted on reboot. After reboot, root remade /tmp for me as I expected. The old pine program complained rather specifically that permissions needed to be 1777 on /tmp. I complied (verifying the truth of this on another server) and now everything seems to be working perfectly. Fortunately the root (/) partition has 20 GB of unused space and was only used to 1% capacity the last 4 years, so /tmp should be able to exist on it happily. The old /tmp only had about 4GB of stuff on it.
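A rough sketch of the two steps, run against a scratch directory rather than the real /tmp. The fstab line is a made-up example (device name, filesystem type and options are all assumptions), and stat -c assumes GNU coreutils.

```shell
# Hypothetical /etc/fstab entry, commented out so that /tmp is no longer
# mounted from its own partition at boot:
#   #/dev/sdXN  /tmp  ext3  defaults  1 2

# Stand-in for a freshly created /tmp directory.
demo=$(mktemp -d)

# 1777 = world-writable plus the sticky bit, so users can only
# remove their own files.
chmod 1777 "$demo"

# Print the octal mode to confirm (GNU stat).
stat -c '%a' "$demo"
```

On the real system the same chmod would be pointed at /tmp itself, as root.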
I don't guess there is anything too important in those 4GB of stuff. Probably just years of temporary files not being properly deleted. I suppose I could try to remount that partition as something else and investigate further, but I'm migrating to a new Debian server soon, so I may just leave well enough alone.
Well, that was cathartic. Now I'm going to go beat my head against a wall for being so stupid, or maybe have some lunch. I have enough problems without causing my own! Then again, I learned several valuable things, though I still don't know why the rm -r ./e command failed. I'm going to be scared to ever use rm -r again (especially as root).