LinuxQuestions.org


aasthakm 03-18-2013 09:11 AM

cpu hung after doing insmod
 
Hello,

I am writing a block driver module that creates a kernel thread to do the actual work of servicing I/O requests. I create the thread in module_init and make it wait until there is a request to serve. However, I sometimes see the CPU hang (an NMI backtrace followed by a soft lockup) during insmod or right after it. The dump looks like the following:

Code:

kernel: [  159.893591] NMI backtrace for cpu 3
kernel: [  159.893591] CPU 3
kernel: [  159.893591] Call Trace:
kernel: [  159.893591]  [<ffffffff8159d776>] wait_for_common+0x26/0x150
kernel: [  159.893591]  [<ffffffff8159d908>] wait_for_completion_interruptible+0x18/0x30
kernel: [  159.893591]  [<ffffffffa00ed4be>] tsdd_worker_thread+0x4e/0x1e0 [tsdd]
kernel: [  159.893591]  [<ffffffff81075eee>] kthread+0x7e/0x90

[  280.033982] BUG: soft lockup - CPU#0 stuck for 22s! [blkid:1672]
[  280.034009] CPU 0
[  280.034027]
[  280.034032] Pid: 1672, comm: blkid Tainted: G        W
[  280.034034] Process blkid (pid: 1672, threadinfo ffff880037ade000, task ffff880037a4a7c0)
[  280.034034] Stack:
[  280.034034]  0000000000000000 01ff880037adfbc0 ffff880037adfbc0 ffff880037ade000
[  280.034034]  ffff880037adffd8 000000000000101d ffff88011b1ed098 ffff880116ac0800
[  280.034034]  ffff8801050d1490 ffffffff8108d3e3 ffff88011b1ed2a0 ffffffff81183180
[  280.034034] Call Trace:
[  280.034034]  [<ffffffff8108d3e3>] smp_call_function+0x33/0x60
[  280.034034]  [<ffffffff8108d443>] on_each_cpu+0x33/0xa0
[  280.034034]  [<ffffffff81189e95>] __blkdev_put+0x185/0x1f0
[  280.034034]  [<ffffffff8115570a>] __fput+0xaa/0x200
[  280.034034]  [<ffffffff81151f3f>] filp_close+0x5f/0x90
[  280.034034]  [<ffffffff810562b6>] put_files_struct.part.11+0x76/0xe0
[  280.034034]  [<ffffffff8105800f>] do_exit+0x17f/0x450
[  280.034034]  [<ffffffff81058471>] do_group_exit+0x41/0xb0
[  280.034034]  [<ffffffff810695bc>] get_signal_to_deliver+0x20c/0x480
[  280.034034]  [<ffffffff81002775>] do_signal+0x35/0x110
[  280.034034]  [<ffffffff81002a05>] do_notify_resume+0x65/0x90
[  280.034034]  [<ffffffff815a75e0>] int_signal+0x12/0x17
rtkit-daemon[1468]: The canary thread is apparently starving. Taking action.
rtkit-daemon[1468]: Demoting known real-time threads.
rtkit-daemon[1468]: Demoted 0 threads.
udevd[432]: timeout: killing '/sbin/blkid -o udev -p /dev/tsdd0' [1672]

Below is a rough sketch of my code for module init, the worker thread, and the function that hands requests to the worker:

Code:

struct completion *start = NULL;
struct completion *done = NULL;
struct request *sch_req;
my_module_init() {
        start = kzalloc();
        done = kzalloc();

        struct task_struct *task = kthread_create(my_worker_thread);
        if (task)
                wake_up_process(task);
        wait_for_completion(start);
        ...
}

int my_worker_thread() {
        complete(start);
        allow_signal(SIGINT);
        while (!kthread_should_stop()) {
                if (wait_for_completion_interruptible(done)) {
                        continue;
                }
                ...
                while (sch_req) {
                        // process request
                        // check for new request
                }
                init_completion(done);
        }
}

// fetches a request and hands it over to the worker thread
void transfer_request() {
        // fetch request
        served = 0;
        while (!served) {
                if (completion_done(done)) {
                        continue;
                }
                sch_req = req;
                served = 1;
                complete(done);
        }
}

Could someone help me figure out what in this code leads to the hang and soft lockup shown above?
Thanks in advance.


All times are GMT -5.