kernel module, while being called from one process, writes to a page from another pro
I am writing a kernel module that is to be called by process p1 to overwrite a data page that belongs to a target process p2.
First, inside the kernel module, while responding to a write to a proc file system entry issued by p1, I used the process ID of the target process (p2) to look up its task struct (p2_task).
To find the particular page I used get_user_pages(), calling it on p2_task->mm. I then called kmap() on the page it returned. Once I had the pointer, I used the usual memory functions (e.g. memset()) to write to that memory. Finally, I called kunmap() as part of the clean-up code.
However, once the process starts running again I can see that what I did had no effect on the target process p2.
I am not sure what I did wrong. Can anyone help?
I suspect that you somehow cannot write to memory that belongs to process p2 while responding to a request coming from p1, since we are in a different context here.
Is this true? If not, what else can I check? If it is the problem, is there any way I can get around it?
You need to say more about what your module is intending to do, and in what context. I suggest that it is usually best to do such things from a virtual-file that is opened by both processes ... if an existing interprocess-communication mechanism is not actually what you need anyhow.
Remember that a process could be running on another CPU literally at the same time. I strongly prefer designs that have both processes involved, not one trying to do something "to" another.
The module is trying to create a checkpoint of the target process. You can assume the target is stopped (i.e. it has received SIGSTOP) while the module is working its magic. So, by design, the target process need not be aware of what the module is doing, which means its source code should not have to be modified.
Also, I do not care what p1 is. It is just the process that triggers the check point logic.
For now, I just need to know whether I am accessing the right pages, and this is why I am writing to them (I picked one page, which is in the data segment). Eventually, once I verify that I have reached the right set of pages, I will modify the code so that it checkpoints and restarts the process.
So, back to the step I am working on right now, writing to the data segment of the target process. I tried that with kernel 3.2 on amd64, and the target process seems to be working fine after being resumed (sent SIGCONT). That is, writing to that page has no effect. I suspect that I am doing something wrong here related to relative addressing. In other words, I did not even write to p2's data segment. To get the data segment I used get_user_pages(p2_task, p2_task->mm, p2_task->mm->start_data, ..).
In kernel 3.2 running on x86 (VMware-based VM with 1 CPU), I get an error that reads "BUG: unable to handle kernel paging request at ffdba01c" right when it tries to run memset() within the kernel module.
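One thing worth checking at this step, whatever the kernel's word size: get_user_pages() hands back whole pages and kmap() returns the address of the start of a page, while mm->start_data need not be page-aligned. The user-space sketch below only illustrates the address arithmetic; the 4 KiB page size and the demo_* names are assumptions made for the example and are not taken from the module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the usual default on x86 and amd64. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((uintptr_t)1 << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Address of the first byte of the page that contains addr. */
static uintptr_t demo_page_base(uintptr_t addr)
{
    return addr & DEMO_PAGE_MASK;
}

/* Byte offset of addr inside its page. */
static size_t demo_page_offset(uintptr_t addr)
{
    return (size_t)(addr & ~DEMO_PAGE_MASK);
}

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t demo_pages_spanned(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t first = demo_page_base(addr);
    uintptr_t last = demo_page_base(addr + len - 1);
    return (size_t)((last - first) >> DEMO_PAGE_SHIFT) + 1;
}
```

For a hypothetical start_data of 0x601040, the page base is 0x601000 and the in-page offset is 0x40, so the first data byte would sit 0x40 bytes past the pointer returned by kmap(). A 1000-byte range starting at 0x601f00 would span two pages and need two entries from get_user_pages().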
What I'm suggesting is this: even though a process is not "in running state," due to SIGSTOP or what-have-you, from a kernel perspective you really don't know what is going on, or on what CPU and so-forth. What I would suggest doing is preparing a message, in effect, that will cause the kernel to eventually switch to the context of the other process (on some CPU), and do the work that you want it to do. (Which might also eventually include putting the process back into runnable state.) Do this in a fully-serialized way because, for all you know, another CPU might be doing something in kernel-mode with regard to this other process. The swapper might be stealing the very page that you're intending to modify, and so on and on and on.
The crashes you're seeing are clearly indicative of "timing holes," and that means endless head-banging and code that simply will never be reliable.
Sorry for the late reply, I have been busy with other stuff.
I partially figured out what the problem was. My code now works to some extent on a 32-bit kernel, but it does not work on 64-bit kernels. I am guessing that different parts of the kernel code make different assumptions about addressing: some parts use the old addressing from 32-bit kernels, and other functions use the new relative addressing developed for 64-bit. This eventually confuses get_user_pages() into fetching a page other than the one I need. So when I modified the page, I was actually modifying a page that is possibly not used by the process, and that is why I was seeing that my code had no effect.
When I used a 32-bit kernel (v3.2) I saw my code working to some extent, but still not exactly the way I want it to. The new problem is that when I copy the whole page from one process to another, the target process (p2) crashes. However, if I copy the value of only one variable, it works.
To elaborate more:
Assume I want to copy the data segment, which starts at address a1, from process p1 to process p2. In that data segment there is a variable (of type e.g. unsigned long) which is located at address va2. Assume also that the data segment size is 1000 bytes. So, we have a1 < va2 < a1+1000.
Then I "kmap()"ed the pages that contain the data segments of both processes p1 and p2 (assume the addresses returned by kmap() are k_a1 and k_a2).
Now, if I memcpy(k_a2, k_a1, 1000), then p2 crashes.
But if I memcpy(k_a2 + (va2-a1), k_a1 + (va2-a1), sizeof(unsigned long)), the code works, i.e. I see that the unsigned long variable in p2 contains the value of its counterpart from p1.
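The thread does not establish why the 1000-byte copy crashes p2. One thing that may be worth ruling out, assuming k_a1 and k_a2 are page-start addresses as kmap() returns them, is that the copy starts at the wrong in-page offset or runs past the end of the single mapped page. The user-space sketch below simulates a copy that starts at the segment's in-page offset and is clamped to the page; SIM_PAGE_SIZE and sim_copy_segment are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 4 KiB pages. */
#define SIM_PAGE_SIZE 4096u

/*
 * Copy up to len bytes of a segment that begins seg_off bytes into a page.
 * dst_page and src_page stand in for the page-start pointers of the two
 * mapped pages. The copy never crosses the end of the page; the return
 * value is the number of bytes actually copied, so the caller can continue
 * with the next page if the segment is longer.
 */
static size_t sim_copy_segment(unsigned char *dst_page,
                               const unsigned char *src_page,
                               size_t seg_off, size_t len)
{
    assert(seg_off < SIM_PAGE_SIZE);
    size_t room = SIM_PAGE_SIZE - seg_off;   /* bytes left in this page */
    size_t n = len < room ? len : room;      /* clamp to the page end */
    memcpy(dst_page + seg_off, src_page + seg_off, n);
    return n;
}
```

With a segment offset of 0x40 and a length of 1000, bytes 0x40 through 0x40+999 of the destination page are replaced and everything before and after is left alone. With an offset of 4000, only the 96 bytes that fit in the page are copied, and the remaining bytes would have to come from the next page.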