LinuxQuestions.org - cpu hung after doing insmod

Hello,

I am writing a block driver module that uses a kernel thread to do the actual work of servicing I/O requests. I create the thread in module_init and have it wait until there is a request to serve. However, I sometimes see a CPU hang (reported as a soft lockup) during insmod or right after it. The kernel log looks like this:

Code:

kernel: [  159.893591] NMI backtrace for cpu 3
kernel: [  159.893591] CPU 3
kernel: [  159.893591] Call Trace:
kernel: [  159.893591]  [<ffffffff8159d776>] wait_for_common+0x26/0x150
kernel: [  159.893591]  [<ffffffff8159d908>] wait_for_completion_interruptible+0x18/0x30
kernel: [  159.893591]  [<ffffffffa00ed4be>] tsdd_worker_thread+0x4e/0x1e0 [tsdd]
kernel: [  159.893591]  [<ffffffff81075eee>] kthread+0x7e/0x90

[  280.033982] BUG: soft lockup - CPU#0 stuck for 22s! [blkid:1672]
[  280.034009] CPU 0
[  280.034027]
[  280.034032] Pid: 1672, comm: blkid Tainted: G        W
[  280.034034] Process blkid (pid: 1672, threadinfo ffff880037ade000, task ffff880037a4a7c0)
[  280.034034] Stack:
[  280.034034]  0000000000000000 01ff880037adfbc0 ffff880037adfbc0 ffff880037ade000
[  280.034034]  ffff880037adffd8 000000000000101d ffff88011b1ed098 ffff880116ac0800
[  280.034034]  ffff8801050d1490 ffffffff8108d3e3 ffff88011b1ed2a0 ffffffff81183180
[  280.034034] Call Trace:
[  280.034034]  [<ffffffff8108d3e3>] smp_call_function+0x33/0x60
[  280.034034]  [<ffffffff8108d443>] on_each_cpu+0x33/0xa0
[  280.034034]  [<ffffffff81189e95>] __blkdev_put+0x185/0x1f0
[  280.034034]  [<ffffffff8115570a>] __fput+0xaa/0x200
[  280.034034]  [<ffffffff81151f3f>] filp_close+0x5f/0x90
[  280.034034]  [<ffffffff810562b6>] put_files_struct.part.11+0x76/0xe0
[  280.034034]  [<ffffffff8105800f>] do_exit+0x17f/0x450
[  280.034034]  [<ffffffff81058471>] do_group_exit+0x41/0xb0
[  280.034034]  [<ffffffff810695bc>] get_signal_to_deliver+0x20c/0x480
[  280.034034]  [<ffffffff81002775>] do_signal+0x35/0x110
[  280.034034]  [<ffffffff81002a05>] do_notify_resume+0x65/0x90
[  280.034034]  [<ffffffff815a75e0>] int_signal+0x12/0x17
rtkit-daemon[1468]: The canary thread is apparently starving. Taking action.
rtkit-daemon[1468]: Demoting known real-time threads.
rtkit-daemon[1468]: Demoted 0 threads.
udevd[432]: timeout: killing '/sbin/blkid -o udev -p /dev/tsdd0' [1672]

Below is a rough sketch of my code for module init and the worker thread:

Code:

struct completion *start = NULL;
struct completion *done = NULL;
struct request *sch_req;

my_module_init() {
        start = kzalloc();
        done = kzalloc();

        struct task_struct *task = kthread_create(my_worker_thread);
        if (task)
                wake_up_process(task);
        wait_for_completion(start);
        ...
}

int my_worker_thread() {
        complete(start);
        allow_signal(SIGINT);
        while (!kthread_should_stop()) {
                if (wait_for_completion_interruptible(done)) {
                        continue;
                }
                ...
                while (sch_req) {
                        // process request
                        // check for new request
                }
                init_completion(done);
        }
}

// function which first gets the request and hands over to the worker thread
void transfer_request() {
        // fetch request
        served = 0;
        while (!served) {
                if (completion_done(done)) {
                        continue;
                }
                sch_req = req;
                served = 1;
                complete(done);
        }
}

Could someone help me figure out what in this code leads to the lockup shown above?
Thanks in advance.
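
A small userspace model may help when reasoning about the handoff between transfer_request() and the worker thread above. It is only a sketch under stated assumptions: it keeps nothing but the `done` counter of a completion (complete() increments it, a successful wait decrements it, completion_done() reports whether it is non-zero, init_completion() resets it to zero), it runs single-threaded and replays one chosen interleaving by hand, and it assumes the elided inner loop leaves sch_req empty once it finds no more work. All `model_*` names and `simulate_lost_requests` are hypothetical and not part of the driver or the kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for struct completion: only the `done` counter.
 * The wait queue and its spinlock are deliberately left out. */
struct model_completion {
    unsigned int done;
};

/* Like init_completion(): forget any complete() not yet consumed. */
static void model_init(struct model_completion *c) { c->done = 0; }

/* Like complete(): record one pending wake-up. */
static void model_complete(struct model_completion *c) { c->done++; }

/* Like completion_done(): true while a complete() is still unconsumed. */
static bool model_done(const struct model_completion *c) { return c->done != 0; }

/* Like try_wait_for_completion(): consume one wake-up if there is one. */
static bool model_try_wait(struct model_completion *c)
{
    if (c->done == 0)
        return false;   /* a real waiter would go to sleep here */
    c->done--;
    return true;
}

/* Hypothetical shared state mirroring `done` and `sch_req` in the sketch;
 * requests are plain ints and 0 means "no request". */
static struct model_completion done_c;
static int sch_req;

/* One pass of the transfer_request() loop body: returns false where the
 * sketch would `continue` (spin), true once the request was handed over. */
static bool model_transfer(int req)
{
    if (model_done(&done_c))
        return false;
    sch_req = req;
    model_complete(&done_c);
    return true;
}

/* Replays one interleaving and returns how many accepted requests were
 * never processed by the worker. */
int simulate_lost_requests(void)
{
    int accepted = 0, processed = 0;

    model_init(&done_c);
    sch_req = 0;

    accepted += model_transfer(1);  /* producer hands over request 1 */

    if (model_try_wait(&done_c)) {  /* worker wakes up */
        processed++;                /* serves request 1 */
        sch_req = 0;                /* assumed: inner loop found no more work */
    }

    accepted += model_transfer(2);  /* arrives before the worker resets */
    model_init(&done_c);            /* worker: init_completion(done) */

    if (model_try_wait(&done_c))    /* no wake-up left: worker would sleep */
        processed++;
    assert(sch_req == 2);           /* request 2 is still parked in sch_req */

    accepted += model_transfer(3);  /* overwrites request 2 in sch_req */
    if (model_try_wait(&done_c)) {
        processed++;                /* serves request 3 only */
        sch_req = 0;
    }

    return accepted - processed;
}
```

For this interleaving simulate_lost_requests() returns 1: request 2 is accepted, its wake-up is wiped by the reset, and request 3 then overwrites it. The model says nothing about the lockup itself. It also ignores that the real init_completion() reinitializes the wait queue and its lock, so calling it while another CPU may be inside complete() on the same completion could be worth checking separately.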