LinuxQuestions.org


nitinarora 05-06-2009 03:14 AM

Process crashes while it reads/writes a USB HDD and the USB disk is unplugged
 
Hi All,

I have an ARM-based board running a Linux kernel.
I connected a USB HDD (hard disk) to a USB port on the board and ran a
disk performance test tool (process name: performancetest).

I observed that while the tool is running (i.e. reading from and writing to the disk), if the USB disk is unplugged, the process (performancetest) crashes with "Bus error". The traces also show that a page fault occurred.

Could you please answer the following questions?

1. Is this the correct behaviour? The traces show an unhandled page fault. Why is it unhandled, and why can't it be handled? Shouldn't the kernel handle it internally and return errors from the read/write system calls?

2. Or is it the application's responsibility to implement a relevant signal handler, with the kernel doing its job correctly?
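For explicit read()/write() calls, the expectation in question 1 does hold: the kernel reports a failed transfer through the system call's return value, which is -1 with errno set (plausibly EIO once the SCSI layer starts "rejecting I/O to dead device", although the exact errno is an assumption here). A minimal sketch of checking that path; read_fully() is a hypothetical helper name, not part of any library:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes, retrying short reads and
 * EINTR. Returns the byte count (less than len only at end of file), or
 * -1 with errno set, e.g. EIO when the device behind fd has gone away. */
ssize_t read_fully(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real failure: errno tells the caller why */
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A caller checks for -1 and inspects errno (for example with perror()). A page fault on mapped memory or on the program's own code never passes through such a return value, which is the distinction the replies below turn on.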



Pid: 139, comm: performancetest
CPU: 0
PC is at 0x31ab8
LR is at 0x14a9c
pc : [<00031ab8>] lr : [<00014a9c>] Tainted: P
sp : bedbe9b8 ip : 04000000 fp : 00013c18
r10: 00013c20 r9 : 00000079 r8 : 0008034c
r7 : 00008a00 r6 : 0008037c r5 : 0002c000 r4 : 000a4000
r3 : 0007f6c0 r2 : 00000000 r1 : 00080798 r0 : 000d0000
Flags: nzCv IRQs on FIQs on Mode USER_32 Segment user
Control: C5387F
Table: 6C2C4000 DAC: 00000015
[<c0021494>] (show_regs+0x0/0x50) from [<c00273ac>] (__do_user_fault+0x60/0xec)
r4 = CB734CE0
[<c002734c>] (__do_user_fault+0x0/0xec) from [<c00276ac>] (do_page_fault+0x1f8/0x228)
r7 = C06FE128 r6 = CB734CE0 r5 = 00031000 r4 = 00000001
[<c00274b4>] (do_page_fault+0x0/0x228) from [<c0027784>] (do_translation_fault+0x24/0xd4)
[<c0027760>] (do_translation_fault+0x0/0xd4) from [<c002784c>] (do_PrefetchAbort+0x18/0x1c)
r5 = 0002C000 r4 = FFFFFFFF
[<c0027834>] (do_PrefetchAbort+0x0/0x1c) from [<c001fde0>] (ret_from_exception+0x0/0x10)
[<c0024890>] (show_stack+0x0/0x48) from [<c00273b8>] (__do_user_fault+0x6c/0xec)
[<c002734c>] (__do_user_fault+0x0/0xec) from [<c00276ac>] (do_page_fault+0x1f8/0x228)
r7 = C06FE128 r6 = CB734CE0 r5 = 00031000 r4 = 00000001
[<c00274b4>] (do_page_fault+0x0/0x228) from [<c0027784>] (do_translation_fault+0x24/0xd4)
[<c0027760>] (do_translation_fault+0x0/0xd4) from [<c002784c>] (do_PrefetchAbort+0x18/0x1c)
r5 = 0002C000 r4 = FFFFFFFF
[<c0027834>] (do_PrefetchAbort+0x0/0x1c) from [<c001fde0>] (ret_from_exception+0x0/0x10)
-----------------------------------------------------------
* dump maps on pid (139)
-----------------------------------------------------------
00008000-00078000 r-xp 00000000 08:03 65 /mnt/performancetest1
0007f000-00080000 rw-p 0006f000 08:03 65 /mnt/performancetest1
00080000-000a4000 rwxp 00080000 08:03 65
40000000-40001000 rw-p 40000000 08:03 65
bedaa000-bedbf000 rwxp bedaa000 08:03 65
-----------------------------------------------------------

task stack info : pid(139) stack area (0xbedaa000 ~ 0xbedbf000)
scsi 0:0:0:0: rejecting I/O to dead device
scsi 0:0:0:0: rejecting I/O to dead device

[24] + Bus error ./performancetest1 Guide.avi all


Thanks & Regards
Nitin Arora

titan22 05-06-2009 02:31 PM

Everything worked as expected.

1. The page fault is supposed to bring the page in from your disk. It could not complete because the VM (__do_page_fault() or handle_mm_fault()) could not read the content from disk: the disk had already been pulled out when the page fault happened. The VM sent SIGBUS to the process to indicate that the page fault failed. The process accessed a valid linear address, but unfortunately no content was available for it.

Pending signals are picked up by the application (performancetest in your case) when the CPU returns from the page-fault handler (which marks the signal pending on your application thread) back to that thread, i.e. when the thread is switching back to, or is already running in, user mode. In other words, there is no SIGBUS handling in kernel mode, only in user mode or on the way back to it. Since performancetest does not register a SIGBUS handler, the default action (dump core and exit) was executed.

A page fault is an exception that can happen at any instruction. It does not alter the faulting code's result (for example, by returning an error code). To change the execution path, you need to provide a SIGBUS handler.
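A minimal sketch of one common Linux pattern for changing the execution path: guard a single access to mapped memory with sigsetjmp() and leave the SIGBUS handler with siglongjmp(). POSIX does not define what happens if a handler for a fault-generated SIGBUS simply returns, which is why the sketch jumps out instead. guarded_read_byte() and map_scratch_page() are hypothetical names; the static jump buffer assumes a single-threaded program; and truncating a mapped temporary file stands in for the unplugged disk only so the example runs without USB hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* One jump buffer, so this sketch assumes a single-threaded program. */
static sigjmp_buf probe_env;

/* Jump back to sigsetjmp(); savemask=1 there means the signal mask is
 * restored, so SIGBUS is unblocked again afterwards. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: copy one byte from a mapped address. Returns 0 on
 * success, or -1 if touching it raised SIGBUS (backing data is gone). */
int guarded_read_byte(const volatile unsigned char *p, unsigned char *out)
{
    struct sigaction sa, old;
    int rc;

    assert(p != NULL && out != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0)
        return -1;
    if (sigsetjmp(probe_env, 1) == 0) {
        *out = *p;      /* may fault; the handler then resumes below */
        rc = 0;
    } else {
        rc = -1;        /* arrived here via siglongjmp() from on_sigbus */
    }
    sigaction(SIGBUS, &old, NULL);  /* restore the previous disposition */
    return rc;
}

/* Hypothetical demo helper: map a one-page temporary file read-only.
 * Truncating the file later simulates the data vanishing on unplug. */
unsigned char *map_scratch_page(int *fd_out, size_t *len_out)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    void *p;

    if (fd < 0)
        return NULL;
    unlink(path);                   /* file lives until fd is closed */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    *len_out = len;
    return p;
}
```

In a tool like performancetest, a -1 from guarded_read_byte() would be the point to stop issuing further I/O instead of dying.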

2. The application should provide its own SIGBUS handler if it does not want the default action; for example, the handler could skip the remaining I/O, report the results so far, and exit. The VM did its job correctly (it set the signal). The application's signal handler decides what to do once the signal is delivered.
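The "report the results and exit" option could be sketched as follows, assuming the test loop maintains a progress counter. blocks_done, fmt_ulong() and install_sigbus_reporter() are hypothetical names. Only async-signal-safe calls (write() and _exit()) are made inside the handler, which is why the number is formatted by hand rather than with printf().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical progress counter: the I/O loop would bump this after
 * every completed block so the handler can report partial results. */
static volatile sig_atomic_t blocks_done = 0;

/* printf() is not async-signal-safe, so format the number by hand. */
static size_t fmt_ulong(char *buf, unsigned long v)
{
    char tmp[3 * sizeof v];
    size_t n = 0;

    assert(buf != NULL);
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Report how far the run got, then exit without touching stdio. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    static const char prefix[] = "SIGBUS: backing storage gone after ";
    static const char suffix[] = " blocks\n";
    char num[3 * sizeof(unsigned long)];
    size_t n = fmt_ulong(num, (unsigned long)blocks_done);
    ssize_t r;

    (void)sig;
    (void)info;  /* info->si_addr holds the faulting address if needed */
    (void)ctx;
    r = write(STDERR_FILENO, prefix, sizeof prefix - 1);
    r = write(STDERR_FILENO, num, n);
    r = write(STDERR_FILENO, suffix, sizeof suffix - 1);
    (void)r;
    _exit(EXIT_FAILURE);
}

/* Hypothetical helper: install the handler. Returns 0 or -1 (errno set). */
int install_sigbus_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGBUS, &sa, NULL);
}
```

One caveat suggested by the maps dump in the first post: the executable itself (/mnt/performancetest1) is mapped from device 08:03, which appears to be the USB disk, and the trace is a prefetch abort (an instruction fetch) at 0x31ab8 inside that mapping. If a program's own code lives on the disk being removed, a handler in that code may not be able to run either, so running the tool from another filesystem is probably the more robust fix.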

nitinarora 05-07-2009 04:51 AM

Yes, I understand that the VM is doing its job correctly by sending the appropriate signal to the application, because the VM has no idea why this page is not in memory: it only detects the page fault, tries to handle it, and finally, helplessly, raises the signal.

But the kernel does know why this page fault occurred. Isn't it possible for the kernel, at the very moment it detects the USB unplug, to mark the pages in memory where the device is mapped as invalid or unavailable, so that any further access to that memory is not treated as a page fault (since the VM already knows what has happened) and an error like EUNAVAILABLE is returned instead?

This is only my own thinking; I am not sure whether it is feasible, so please tell me if I am talking crap.

Thanks & Regards
Nitin Arora

titan22 05-07-2009 01:08 PM

Yes, you are talking crap, but everybody goes through the same phase before it starts to make sense.

No OS does what you describe. A clean way to remove a device is to unmount the filesystem first. Normally, unmounting a filesystem with resources still in use fails with "device busy". Some systems allow a forced unmount, which shuts down a filesystem that still has dangling open resources (such as open file descriptors and memory mappings). An application using one of these dangling file descriptors will get an error, and touching a mapped address will get SIGBUS.
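The first step of a clean removal, unmounting before unplugging, could look like this minimal Linux sketch; try_clean_unmount() is a hypothetical name. With flags set to 0, umount2() fails with EBUSY while files on the filesystem are still open or mapped. Linux also offers MNT_FORCE (honoured only by some filesystems) and MNT_DETACH (lazy unmount), which this sketch deliberately does not use.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: attempt a clean unmount of target before the
 * device is unplugged. Returns 0 on success, -1 otherwise (errno kept). */
int try_clean_unmount(const char *target)
{
    int err;

    assert(target != NULL);
    if (umount2(target, 0) == 0)
        return 0;
    err = errno;
    if (err == EBUSY)   /* open files or mappings still reference it */
        fprintf(stderr, "%s is busy: close its files and munmap() them first\n",
                target);
    else
        fprintf(stderr, "umount2(%s): %s\n", target, strerror(err));
    errno = err;
    return -1;
}
```

Only after this returns 0 is it safe to pull the device; on EBUSY the user (or a script) has to stop the processes that still hold the filesystem.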

What you want is a feature where a forced unmount (unplugging a pluggable device) frees resources (for example, unmaps address ranges) behind the processes' backs. Should the application call mmap() again when the device is re-plugged, because the unplug freed its mapping? Or should "mount" redo the mmap() automatically? Should "mount" reacquire the resources that were forcefully freed by "umount" (the unplug), and where would that information be stored?

When an application touches a problem address, the only thing the kernel can do is send it a signal. This simple mechanism is how demand paging works.


Can you tell me what you expect in the following scenario?
address = mmap();
<-- unplug -->
var = *(address);
<-- What do you want the kernel to do when the content at address is no longer available?
<-- As you said, "an error like (EUNAVAILABLE) returned."
Are you expecting "var = EUNAVAILABLE"?
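A runnable C version of the scenario above, under one stated assumption: the "unplug" is simulated by truncating the mapped file to zero length, so the mapped page no longer has backing content. run_unplug_scenario() is a hypothetical name. The child has no way to receive a value such as EUNAVAILABLE from var = *(address); without a handler it is simply killed by SIGBUS, which the parent observes through waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "address = mmap(); <unplug>; var = *address"
 * in a child. Returns the signal that killed the child, 0 if it exited
 * normally, or -1 if the setup failed. */
int run_unplug_scenario(void)
{
    char path[] = "/tmp/unplug-demo-XXXXXX";
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    int status = 0;
    pid_t pid;
    void *map;

    assert(len > 0);
    if (fd < 0)
        return -1;
    unlink(path);                         /* file lives until fd is closed */
    if (ftruncate(fd, (off_t)len) != 0) { /* one page of real content */
        close(fd);
        return -1;
    }
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0); /* address = mmap(); */
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    if (ftruncate(fd, 0) != 0) {          /* <-- unplug --> (simulated) */
        munmap(map, len);
        close(fd);
        return -1;
    }
    pid = fork();
    if (pid == 0) {
        /* Child: default SIGBUS action, and no core file left behind. */
        struct rlimit no_core = { .rlim_cur = 0, .rlim_max = 0 };
        volatile unsigned char *address = map;
        unsigned char var;

        setrlimit(RLIMIT_CORE, &no_core);
        signal(SIGBUS, SIG_DFL);
        var = *address;                   /* var = *(address); faults here */
        _exit(var == 0 ? 0 : 1);          /* reached only if the read worked */
    }
    if (pid > 0)
        while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
            ;
    munmap(map, len);
    close(fd);
    if (pid < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_unplug_scenario() is expected to return SIGBUS: the only outcome the kernel can deliver is the signal, not a value stored into var.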

nitinarora 05-18-2009 03:37 AM

Thanks for the help.
You are correct.

