Slackware 12.1: ? Non-killable, hanging commands - including "reboot"
Hi -
I was happily backing up a directory of music files to an external drive when the cp command locked up. Now the system won't respond to a reboot request either.
As root:
(1) mount -t vfat /dev/sda1 /mnt/tmp
(2) cd /mnt/tmp
(3) cp -arv /srv/public/music.orig/* .
The cp ran for a while (about 10 GB transferred), then hung. I grabbed another virtual console and tried to kill and kill -9 the cp, but it wouldn't die. Tried to umount /mnt/tmp: busy. Tried umount -l, which appeared to unmount it, but the cp command was still hung. I then did a kill -9 on the root shell that owned the cp. The shell died, but the cp process was now owned by init.
Tried "eject /dev/sda". Hung, just like the original cp. Grabbed another console window but couldn't kill the eject. Killed the root shell that owned eject. Eject now owned by init. (Are we starting to see a pattern here???!!!)
Finally just pulled the usb plug on the external drive (which appeared to be unmounted at this time anyway).
Reinserted external drive usb and attempted to remount. mount command hung. You can guess what I did next, and the result was the same. Yep, hung, couldn't kill, mount now owned by init.
OK. Try the big gun: "reboot". Of course it hung! No doubt in some tizzy over that (currently unplugged!) external disk. Gave it quite a while to see if it would eventually cooperate. Nope. The system is still alive, as I can ssh into it from a different server, but it refuses to reboot.
Any ideas on what happened? And how I can gracefully recover? ("graceful" being a relative term at this point!) I can't think of much else besides killing the power and crashing the box.
Some further information on this problem ... just for the record.
I tried "telinit s" with the intent of going to single-user mode, manually unmounting everything possible, and then crashing the box with a power kill. The telinit hung (why didn't I expect that?!). It hosed up all my other virtual console windows too. I could still get into the box via ssh from another machine, so I did that and unmounted everything I could while still in multi-user mode. I power cycled the box and it came back up perfectly (having to recover from the journals for /usr, /var, etc.). No problems.
After rebooting, I tried my original cp command for the backup to external drive again. I was able to duplicate the hung processes. I now know the steps to avoid, but not why they caused the problem in the first place.
Here's how to duplicate my failure mode:
(1) sudo su -
(2) mount -t vfat /dev/sda1 /mnt/tmp (this is the external usb drive)
(3) cp -rv /srv/public/music.orig/* /mnt/tmp/Music/.
The copy goes along nicely ... no problems
Then...
(4) from a different computer, "ssh <the_slackware_box>"
(5) cd /mnt/tmp/Music
(6) ls (I did this to check on the progress of the original cp)
That's it. The original cp command locked up as soon as I issued that ls command. Processes that touch that /mnt/tmp mountpoint lock up and cannot be killed. Shutdown doesn't work.
So now I know what NOT to do in the future! I guess a usb vfat drive just doesn't like being messed with by more than one process at a time. Weird.
The original cp works flawlessly if I don't "interrupt" it by doing an ls on the destination directory while cp is active.
If your system locks up completely and you are afraid to do a hard reboot because of the possible data loss, try this:
Alt + SysRq + s (sync)
Alt + SysRq + u (unmount)
Alt + SysRq + b ((re)boot)
This also works on Windows (where it's even more handy).
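Since the box in this thread was still reachable over ssh, the same sync / unmount / reboot requests can also be sent without a keyboard by writing the key letters to /proc/sysrq-trigger as root. This assumes a Linux kernel built with magic SysRq support. Below is a minimal sketch, not a tested recovery procedure: the function name sysrq_reboot and its --really switch are made up for illustration, and by default it only prints what it would do.

```shell
# Hypothetical helper: send the s, u, b SysRq requests via /proc/sysrq-trigger.
# Without "--really" it is a dry run that only prints the commands.
sysrq_reboot() {
    mode=${1:-dry-run}
    for key in s u b; do
        if [ "$mode" = "--really" ]; then
            # Must be root. 's' = sync, 'u' = remount read-only, 'b' = reboot now.
            echo "$key" > /proc/sysrq-trigger
            # Give sync / remount a moment before the next request.
            sleep 2
        else
            echo "would run: echo $key > /proc/sysrq-trigger"
        fi
    done
}
```

Running sysrq_reboot with no argument just lists the three writes; sysrq_reboot --really performs them, and the final 'b' reboots immediately without a clean shutdown, so it is a last resort.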
Hmmm. I am unfamiliar with those commands. Thanks for pointing them out. This is the first time I remember getting so snarled up in Linux that I had to crash the system (except during a few OS installs, but that doesn't really count for me). In my case this time, Linux was still cruising along and basically working, up until some command - any command - had to mess with that external vfat disk. "reboot", "shutdown", "telinit", etc. all had to touch that disk in some manner, so they all hung. But other things continued working.
Yes - the "magic keys". "Raising Elephants Is So Utterly Boring" is a mnemonic for the key sequence Alt+SysRq+r, e, i, s, u, b, which nearly always works when I have a lock-up (which isn't often). More information is here if you are interested.
There is a certain limit to how much the globbing expansion * can do ... too many files and it will hang. Try using tar to copy the files or don't copy all of them at once.
About how many files are there ?
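For reference, the tar approach suggested above is usually written as a pipe between two tar processes, so the shell never expands a long file list. A minimal sketch, assuming a POSIX shell and a tar that accepts "-" for stdin/stdout; the function name copy_tree is made up for illustration:

```shell
# Hypothetical helper: copy the tree under $1 into $2 using a tar pipe.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # The first tar writes an archive of $src to stdout,
    # the second unpacks that stream inside $dst.
    (cd "$src" && tar cf - .) | (cd "$dst" && tar xf -)
}
```

With the paths from this thread, the call would look like copy_tree /srv/public/music.orig /mnt/tmp/Music. Note that a vfat target cannot store Unix ownership or permissions, so tar may warn about them there.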
The * was only matching the subdirectories in that /srv/public/music.orig directory ... about eight of them. Those eight subdirectories probably had another 600 subdirs under them, and about 5000 actual files.
Normally I use tar or cpio or rsync for huge things, but in this case I knew that cp -r * should be OK since there were only eight subdirectories (and no files) at that top level of the copy.
And indeed the cp did work perfectly ... as long as I didn't mess with its target directory from a different window. I imagine this is some rarely encountered glitch in Linux's usb/vfat mounting. But that's just a guess.
This effort is part of my backup strategy. Daily (cronjob), I mount an internal backup drive, rsync to it, then unmount it. Then manually (maybe once a month) I plug in an external usb drive and back up to it. Usually I use rsync for that too, but in today's case I had just done a major rearrangement of my music files and decided I'd do a fresh copy from scratch using cp. Sometimes I further back up to DVDs, but not so much anymore. It just takes too many of them. The really important stuff routinely (OK, "semi-routinely"!) gets backed up to DVDs/CDs and put in my safe deposit box, but music files don't really qualify for that.
I have experienced such behavior with external USB devices like HDDs and some card readers - usually in slow USB 1.1 mode. It seems that an intensive copy operation saturates the USB write buffer and makes the delivery of control commands from the PC to the device very slow (remember, USB is serial - there is no means to perform data exchange in parallel, so a read command must wait until all the write ones are sent!). Then the program using the device gets stuck in the "uninterruptible i/o operation" state and cannot be killed even by the otherwise-deadly SIGKILL - just as in your case.
When you try to ls the contents of the drive while copying a large amount of data - in other words, when the write buffer is full - your ls will get stuck in the same "uninterruptible i/o" state as well; nothing bad actually happens. If you wait for some time it should show you the directory listing, but it may take quite a long time on a slow device.
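Processes in that "uninterruptible" state show up with state D in ps and in /proc. A minimal sketch for listing them, assuming a Linux /proc filesystem whose /proc/PID/status files carry Name:, State: and Pid: lines; the function name list_uninterruptible is made up for illustration:

```shell
# Hypothetical helper: print "PID NAME" for every process currently in
# state D (uninterruptible sleep), read straight from /proc.
list_uninterruptible() {
    for status in /proc/[0-9]*/status; do
        [ -r "$status" ] || continue
        # A process may exit between the glob and the read, so ignore errors.
        awk '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }
            /^Pid:/   { pid = $2 }
            END       { if (state == "D") print pid, name }
        ' "$status" 2>/dev/null || true
    done
    return 0
}
```

Calling list_uninterruptible while the copy is stuck should show the hung cp, ls or mount processes, if they are in fact blocked this way; on an idle system it normally prints nothing.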
So I'd recommend you check the actual mode of operation - is it USB 2.0 and not 1.1, is your drive functioning properly and so on. BTW, why not use "cp -v ..." if you need to watch the progress of operation?
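One way to check the negotiated speed is the "speed" attribute that Linux exposes for each USB device under /sys/bus/usb/devices: 12 means full speed (USB 1.1), 480 means high speed (USB 2.0). A minimal sketch, assuming sysfs is mounted at /sys; the function name usb_speeds is made up for illustration:

```shell
# Hypothetical helper: print each USB device's negotiated speed in Mbit/s.
usb_speeds() {
    for dev in /sys/bus/usb/devices/*; do
        # Interface entries (and an unmatched glob) have no speed file.
        [ -r "$dev/speed" ] || continue
        product=$(cat "$dev/product" 2>/dev/null)
        echo "${dev##*/}: $(cat "$dev/speed") Mbit/s $product"
    done
    return 0
}
```

If the external drive is listed at 12 rather than 480, it is running in the slow mode described above.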