Old 05-08-2009, 04:24 PM   #1
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD, Raspbian, Arch
Posts: 2,331

Rep: Reputation: 357
Slackware 12.1: ? Non-killable, hanging commands - including "reboot"


Hi -

I was happily backing up a directory of music files to an external drive when the cp command locked up. Now the system won't respond to a reboot request either.

As root:

(1) mount -t vfat /dev/sda1 /mnt/tmp
(2) cd /mnt/tmp
(3) cp -arv /srv/public/music.orig/* .

The cp ran for a while (about 10 GB transferred), then hung. I switched to another virtual console and tried to kill and kill -9 the cp, but it wouldn't die. Tried to umount /mnt/tmp: busy. Tried umount -l, which appeared to unmount it, but the cp was still hung. I then ran kill -9 on the root shell that owned the cp. The shell died, but the cp process was now owned by init.

Tried "eject /dev/sda". Hung, just like the original cp. Grabbed another console window but couldn't kill the eject. Killed the root shell that owned eject. Eject now owned by init. (Are we starting to see a pattern here???!!!)

Finally just pulled the usb plug on the external drive (which appeared to be unmounted at this time anyway).

Reinserted external drive usb and attempted to remount. mount command hung. You can guess what I did next, and the result was the same. Yep, hung, couldn't kill, mount now owned by init.

OK. Try the big gun. "reboot". Of course it hung! No doubt in some tizzy over that (currently unplugged!) external disk. Gave it quite a while to see if it would eventually cooperate. Nope. The system is still alive, as I can ssh into it from a different server, but it refuses to reboot.

Any ideas on what happened? And how I can gracefully recover? ("graceful" being a relative term at this point!) I can't think of much else besides killing the power and crashing the box.

Thanks!

Code:
ps -ef | grep root...

root      9671     1  1 14:14 ?        00:01:03 cp -arv /srv/public/music.orig/Bedroom_C_Music /srv/public/music.orig/Bedroom_F-DocAndSet-Erin_MyMusic /srv/public/music.orig/Bedroom_F-Livingroom-Backup_Music /srv/public/music.orig/Bedroom_F_Music /srv/public/music.orig/CarolLaptop_C_Music /srv/public/music.orig/CarolLaptop_D_Music /srv/public/music.orig/Livingroom_C-DocAndSet-Chris_MyMusic /srv/public/music.orig/Livingroom_C-DocAndSet-Erin_MyMusic /srv/public/music.orig/Livingroom_C_Music .

root      9817     1  0 14:59 ?        00:00:00 eject /dev/sda

root      9878     1  0 15:06 ?        00:00:00 mount /dev/sda1 /mnt/tmp

root      9899  9856  0 15:07 tty1     00:00:00 shutdown -r 0 w
Code:
head /proc/9671/status...

Name:   cp
State:  D (disk sleep)
Tgid:   9671
Pid:    9671
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 256
Groups: 0 1 2 3 4 6 10 11 17 18 19 26 83

Last edited by haertig; 05-08-2009 at 04:27 PM.
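The "State: D (disk sleep)" line in the status output above is the key detail: a task in uninterruptible sleep does not act on signals, SIGKILL included, until the I/O it is waiting for completes. A minimal sketch for listing every such task at once, assuming a procps-style ps; the function names filter_dstate and list_dstate are made up for illustration and are not from the thread:

```shell
# Keep the header line plus every row whose STAT column starts with D
# (uninterruptible sleep). Reads ps-style output on stdin, so it can be
# tried on saved output as well as on a live system.
filter_dstate() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# Live usage: the wchan column names the kernel function each task sleeps in.
list_dstate() {
    ps -eo pid,ppid,stat,wchan:32,args | filter_dstate
}
```

Running list_dstate from an ssh session should show the stuck cp, eject and mount together, without touching the mount point itself.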
 
Old 05-09-2009, 03:21 PM   #2
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD, Raspbian, Arch
Posts: 2,331

Original Poster
Rep: Reputation: 357
Some further information on this problem ... just for the record.

I tried "telinit s" with the intent of going to single user mode, manually unmounting everything possible, and then crashing the box with a power kill. The telinit hung (why didn't I expect that?!) It hosed up all my other virtual console windows too. I could still get into the box via ssh from another machine, so I did that and unmounted everything I could while still in multi-user mode. I power cycled the box and it came back up perfectly (having to recover from the journals for /usr, /var, etc.) No problems.

After rebooting, I tried my original cp command for the backup to external drive again. I was able to duplicate the hung processes. I now know the steps to avoid, but not why they caused the problem in the first place.

Here's how to duplicate my failure mode:

(1) sudo su -
(2) mount -t vfat /dev/sda1 /mnt/tmp (this is the external usb drive)
(3) cp -rv /srv/public/music.orig/* /mnt/tmp/Music/.

The copy goes along nicely ... no problems

Then...

(4) from a different computer, "ssh <the_slackware_box>"
(5) cd /mnt/tmp/Music
(6) ls (I did this to check on the progress of the original cp)

That's it. The original cp command locked up as soon as I issued that ls command. Processes that touch that /mnt/tmp mountpoint lock up and cannot be killed. Shutdown doesn't work.

So now I know what NOT to do in the future! I guess a usb vfat drive just doesn't like being messed with by more than one process at a time. Weird.

The original cp works flawlessly if I don't "interrupt" it by doing an ls on the destination directory while cp is active.
 
Old 05-09-2009, 03:54 PM   #3
adriv
Member
 
Registered: Nov 2005
Location: Diessen, The Netherlands
Distribution: Slackware 15
Posts: 700

Rep: Reputation: 43
If your system locks up completely and you are afraid to do a hard reboot because of the possible data loss, try this:
Alt + SysRq + s (sync)
Alt + SysRq + u (remount all filesystems read-only)
Alt + SysRq + b ((re)boot)
This also works on Windows (where it's even more handy )
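Since the box in this thread was still reachable over ssh, the same three keys can also be sent without a keyboard by writing them to /proc/sysrq-trigger as root. A rough sketch, with assumptions stated: sysrq_sequence is a hypothetical helper name, the 2-second default pause is arbitrary, and the "s" and "u" steps may not finish for a filesystem whose device is hung ("b" reboots regardless, without syncing).

```shell
# sysrq_sequence TRIGGER [DELAY_SECONDS]: write s, u, b to TRIGGER in order.
# TRIGGER has no default on purpose: pointing it at /proc/sysrq-trigger
# as root reboots the machine as soon as "b" is written.
sysrq_sequence() {
    trigger=${1:?usage: sysrq_sequence TRIGGER [DELAY_SECONDS]}
    delay=${2:-2}
    for key in s u b; do
        printf '%s' "$key" > "$trigger" || return 1
        # Pause so sync / remount can make progress; nothing to wait for after b.
        [ "$key" = b ] || sleep "$delay"
    done
}
```

On a real system the call would be sysrq_sequence /proc/sysrq-trigger. Trying it against a scratch file first shows the order of writes without rebooting anything.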
 
Old 05-09-2009, 04:39 PM   #4
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD, Raspbian, Arch
Posts: 2,331

Original Poster
Rep: Reputation: 357
Hmmm. I am unfamiliar with those commands. Thanks for pointing them out. This is the first time I remember getting so snarled up in Linux that I had to crash the system (except during a few OS installs, but that doesn't really count for me). In my case this time, Linux was still cruising along and basically working, up until some command - any command - had to mess with that external vfat disk. "reboot", "shutdown", "telinit", etc. all had to touch that disk in some manner, so they all hung. But other things continued working.
 
Old 05-09-2009, 05:19 PM   #5
bgeddy
Senior Member
 
Registered: Sep 2006
Location: Liverpool - England
Distribution: slackware64 13.37 and -current, Dragonfly BSD
Posts: 1,810

Rep: Reputation: 232
Yes - the "magic keys". "Raising Elephants Is So Utterly Boring" is a mnemonic for the key sequence Alt+SysRq+r, e, i, s, u, b, which nearly always works when I have a lock up (which isn't often). More information is here if you are interested.
 
Old 05-10-2009, 04:11 AM   #6
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
There is a certain limit to how much the globbing expansion * can do ... too many files and it will hang. Try using tar to copy the files or don't copy all of them at once.

About how many files are there ?
 
Old 05-10-2009, 01:20 PM   #7
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD, Raspbian, Arch
Posts: 2,331

Original Poster
Rep: Reputation: 357
Quote:
Originally Posted by H_TeXMeX_H
There is a certain limit to how much the globbing expansion * can do ... too many files and it will hang. Try using tar to copy the files or don't copy all of them at once.

About how many files are there ?
The * was only matching the subdirectories in that /srv/public/music.orig directory ... about eight of them. Those eight subdirectories probably had another 600 subdirs under them, and about 5000 actual files.

Normally I use tar or cpio or rsync for huge things, but in this case I knew that cp -r * should be OK since there were only eight subdirectories (and no files) at that top level of the copy.
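For reference, the tar approach mentioned here is usually written as a pipe between two tar processes, so the shell never has to expand a glob at all. A minimal sketch, assuming a tar that accepts -c, -x and -f - (GNU tar does); copy_tree is a made-up name, not a standard command:

```shell
# copy_tree SRC DST: stream the contents of SRC through tar into DST.
copy_tree() {
    src=${1:?usage: copy_tree SRC DST}
    dst=${2:?usage: copy_tree SRC DST}
    mkdir -p "$dst" || return 1
    # The first tar archives SRC to stdout; the second unpacks stdin inside DST.
    (cd "$src" && tar -cf - .) | (cd "$dst" && tar -xf -)
}
```

For the copy in this thread that would be something like copy_tree /srv/public/music.orig /mnt/tmp/Music. Because vfat cannot store Unix ownership, extracting as root may print ownership warnings; with GNU tar, adding --no-same-owner to the extracting side should avoid them.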

And indeed the cp did work perfectly ... as long as I didn't mess with its target directory from a different window. I imagine this is some rarely encountered glitch in Linux's usb/vfat mounting. But that's just a guess.

This effort is part of my backup strategy. Daily (cronjob), I mount an internal backup drive, rsync to it, then unmount it. Then manually (maybe once a month) I plug in an external usb drive and back up to it. Usually I use rsync for that too, but in today's case I had just done a major rearrangement of my music files and decided I'd do a fresh copy from scratch using cp. Sometimes I further back up to DVDs, but not so much anymore. It just takes too many of them. The really important stuff routinely (OK, "semi-routinely"!) gets backed up to DVDs/CDs and put in my safe deposit box, but music files don't really qualify for that.
 
Old 05-14-2009, 10:41 AM   #8
jmacloue
LQ Newbie
 
Registered: Apr 2009
Location: Kharkiv, UA
Distribution: Slackware
Posts: 18

Rep: Reputation: 7
I have experienced such behavior with external USB devices like HDDs and some card readers - usually in slow USB 1.1 mode. It seems that an intensive copy operation saturates the USB write buffer and makes the delivery of control commands from the PC to the device very slow (remember, USB is serial - there is no way to exchange data in parallel, so a read command must wait until all the queued writes are sent!). The program using the device then gets stuck in the "uninterruptible I/O" state and cannot be killed even by the otherwise deadly SIGKILL - just as in your case.

When you try to ls the contents of the drive while copying a large amount of data - in other words, when the write buffer is full - your ls will get stuck in the same "uninterruptible I/O" state as well; nothing bad actually happens. If you wait long enough it should show you the directory listing, but that may take quite a while on a slow device.
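One way to gauge how much data is still waiting to be written, without touching the mount point, is to read the Dirty and Writeback counters in /proc/meminfo. This is only a rough proxy: those counters cover the kernel's page cache for all devices, not the USB device's own buffer, and pending_kb is a made-up helper name.

```shell
# Sum the Dirty: and Writeback: lines (both in kB) of a meminfo-style stream.
pending_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { sum += $2 } END { print sum + 0 }'
}

# Live usage:
#   pending_kb < /proc/meminfo
```

If the number keeps falling while the cp appears frozen, the copy is probably still draining to the slow device and has not locked up for good.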

So I'd recommend you check the actual mode of operation - is it USB 2.0 and not 1.1, is your drive functioning properly and so on. BTW, why not use "cp -v ..." if you need to watch the progress of operation?
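A quick way to check the negotiated mode, assuming the kernel exposes the usual speed attribute under /sys/bus/usb/devices (values are in Mbit/s; both function names below are hypothetical):

```shell
# Map a sysfs "speed" value (Mbit/s) to a readable label.
usb_speed_label() {
    case "$1" in
        1.5) echo "low speed (USB 1.x)" ;;
        12)  echo "full speed (USB 1.1)" ;;
        480) echo "high speed (USB 2.0)" ;;
        *)   echo "other ($1 Mbit/s)" ;;
    esac
}

# Print one line per USB device: sysfs name, then the negotiated speed.
list_usb_speeds() {
    for f in /sys/bus/usb/devices/*/speed; do
        [ -r "$f" ] || continue
        dev=${f%/speed}
        printf '%s\t%s\n' "${dev##*/}" "$(usb_speed_label "$(cat "$f")")"
    done
}
```

If the external drive shows up at 12 rather than 480, it is running in the slow mode described above; lsusb -t, where available, reports similar information.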
 
  

