LinuxQuestions.org


alphy 06-03-2011 03:53 PM

How does QEMU shut down a guest OS?
 
Hi,

I am trying to understand the internals of the shutdown procedure in a virtual machine.

The aim is to understand:
1) How the QEMU process shuts down the guest OS gracefully.

Any pointers in this regard would be useful

Thanks,
Alphonse

TobiSGD 06-03-2011 04:07 PM

It doesn't. The OS shuts itself down, as it would on a physical machine.

jefro 06-03-2011 04:37 PM

It might be more about the way you pose the question.

If you shut down QEMU itself, the guest OS gets no time to shut down properly. Depending on the VM software, you may have a choice as to how that happens. One choice may be to save state, so that on the next start the guest resumes from where you left it. Another is a hard power down. Other choices may exist too.

If you shut down the guest through its normal OS shutdown, QEMU exits at the end of it.

alphy 06-03-2011 07:17 PM

Ok. Let me rephrase my question to be specific.

* Assume that I do not have control over the guest OS, but I would still like to signal the OS to
shut down gracefully. Via the QEMU monitor, I am able to issue a shutdown command to the VM.

I would like to understand the execution steps in this case.

Thanks for the information.

-Alphonse

jefro 06-03-2011 08:17 PM

Then I think this is the correct answer: if you shut down QEMU itself, the guest OS gets no time to shut down properly. UNLESS this works, but you'd have to test it.


"system_powerdown

This has an effect similar to the physical power button on a modern PC. The VM will get an ACPI shutdown request and usually shutdown cleanly."

If you have access to the QEMU monitor, you may be able to save, stop, or take a snapshot. See this page for the monitor commands that may help (which ones are available depends on the QEMU version, build options, and architecture).

http://en.wikibooks.org/wiki/QEMU/Monitor

In no case that I know of can QEMU send a shutdown command to the guest OS directly. If ACPI is a problem, ssh, VNC, telnet or similar is used to shut the guest down from inside.

dyasny 06-04-2011 06:15 AM

In very simple terms, QEMU sends the VM an ACPI shutdown request, the same thing that happens when you press the power button on a physical machine. If that fails, you can issue the "destroy" command (a libvirt/virsh command), which acts as if you had pulled the plug on the VM.

---------- Post added 06-04-11 at 11:16 AM ----------

Note that for this to work, the VM has to be running with ACPI enabled, and the guest has to use an ACPI-enabled HAL.

Skaperen 06-06-2011 10:22 AM

Quote:

Originally Posted by dyasny (Post 4376017)
In very simple terms, QEMU sends the VM an ACPI shutdown request, the same thing that happens when you press the power button on a physical machine. If that fails, you can issue the "destroy" command (a libvirt/virsh command), which acts as if you had pulled the plug on the VM.

---------- Post added 06-04-11 at 11:16 AM ----------

Note that for this to work, the VM has to be running with ACPI enabled, and the guest has to use an ACPI-enabled HAL.

Is there a signal that could be sent in lieu of a command, to QEMU's process, so that rc scripts shutting the host down can get QEMU to send that power button ACPI signal to the guest OS?

dyasny 06-06-2011 12:14 PM

system_powerdown is the command to send to the VM's monitor of course, it has already been mentioned.

Skaperen 06-06-2011 01:11 PM

Quote:

Originally Posted by dyasny (Post 4377974)
system_powerdown is the command to send to the VM's monitor of course, it has already been mentioned.

But for host system rc scripts, a signal is the usual means. Entering a command line is not readily done, as a script would have to hunt down the appropriate pty and take over. Normally rc scripts send a SIGTERM signal to processes to have them do a graceful end. QEMU just exits when it gets SIGTERM. Maybe some other signal will make it do the guest ACPI thing?

dyasny 06-07-2011 02:01 AM

you have it completely wrong.
Every VM running with QEMU maintains a socket, through which the host communicates with the QEMU process.

This socket is called qemu monitor, and can be accessed. When you access it, you do not access the VM itself, but the actual process that keeps it running, so when you issue "system_powerdown" to that process, it will emulate an ACPI shutdown in the guest space.

If the VM itself supports ACPI, it will detect the ACPI shutdown request, and that will trigger the normal init scripts and the rest of the typical sequence.
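
A minimal sketch of the monitor route described above, for a VM started without any wrapper. It assumes QEMU was launched with its monitor bound to a UNIX socket (the `-monitor unix:PATH,server,nowait` option; QEMU does not open such a socket unless asked to) and that `socat` is installed. The socket path and the function names are hypothetical:

```shell
# Assumption: the VM was launched with its monitor on a UNIX socket, e.g.
#   qemu -hda guest.img -monitor unix:/tmp/vm1-monitor.sock,server,nowait
# The path and the function names below are made up for illustration.

# Send a single human-monitor command to the given socket.
# $1 = monitor socket path, $2 = monitor command
monitor_send() {
  printf '%s\n' "$2" | socat - "UNIX-CONNECT:$1"
}

# Emulate a press of the ACPI power button; the guest decides how to react.
# $1 = monitor socket path
vm_powerdown() {
  monitor_send "$1" system_powerdown
}

# Example call (not executed here):
#   vm_powerdown /tmp/vm1-monitor.sock
```

Whether the guest actually powers off still depends on the guest honouring the ACPI request, as noted above.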

Skaperen 06-07-2011 02:00 PM

Quote:

Originally Posted by dyasny (Post 4378493)
you have it completely wrong.
Every VM running with QEMU maintains a socket, through which the host communicates with the QEMU process.

I don't see anything in the documentation where a socket can be accessed to send the emulator a monitor command. Doing "lsof" on a running qemu emulator shows 2 pipes and 2 unix domain sockets without names (hence, there is no target to make a connection to).

Quote:

Originally Posted by dyasny (Post 4378493)
This socket is called qemu monitor, and can be accessed. When you access it, you do not access the VM itself, but the actual process that keeps it running, so when you issue "system_powerdown" to that process, it will emulate an ACPI shutdown in the guest space.

There is a monitor already in the manual command line session. Is that what you are referring to? That's where I can type in "system_powerdown" and watch the guest OS either do the right thing or not (depending on whose opinion of the right thing we are referring to).

jefro 06-07-2011 03:02 PM

I think it would boil down to a proper test on a test platform to be sure. ACPI has failed me before on qemu.

dyasny 06-08-2011 05:39 AM

Quote:

Originally Posted by Skaperen (Post 4379049)
There is a monitor already in the manual command line session. Is that what you are referring to? That's where I can type in "system_powerdown" and watch the guest OS either do the right thing or not (depending on whose opinion of the right thing we are referring to).

that's the monitor exactly. what happens when it doesn't work for you?

Skaperen 06-08-2011 09:09 AM

Quote:

Originally Posted by dyasny (Post 4379678)
that's the monitor exactly. what happens when it doesn't work for you?

Desktop systems put up a prompt asking to shutdown, just like selecting the shutdown menu item. There's usually a 60 second wait. That means I'd have to be sure the host has an extra 60 seconds shutdown process time before doing an aggressive process kill.

Server systems work in some cases (Slackware) and not at all in others (Ubuntu server just stays up).

The real issue is getting that signal to the QEMU process at the time init is sending SIGTERM to other processes. If QEMU were to handle SIGTERM by doing the power button emulation, the issue would be gone. If it could be made to handle it on another signal like SIGHUP, that would be easy to work around. But getting init to start a script that will hunt down the QEMU monitor sockets to feed in the command line? I have no clue where to start with that.
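
One possible starting point, sketched under several assumptions: every VM is launched with `-monitor unix:DIR/NAME.sock,server,nowait` and `-pidfile DIR/NAME.pid` pointing into one known directory, so a shutdown script has nothing to hunt for. The directory layout, function names and timeout are hypothetical, `socat` is assumed to be installed, and the script would have to run before init starts sending SIGTERM to everything:

```shell
# Print the pid files in directory $1 whose process is still alive.
# Assumes file names without whitespace.
running_pidfiles() {
  for pf in "$1"/*.pid; do
    [ -f "$pf" ] || continue
    pid=$(cat "$pf")
    kill -0 "$pid" 2>/dev/null && printf '%s\n' "$pf"
  done
  return 0
}

# Ask every VM with a monitor socket in $1 to power down, wait up to
# $2 seconds, then SIGKILL whatever is still running.
shutdown_all_vms() {
  dir=$1
  timeout=$2
  for sock in "$dir"/*.sock; do
    [ -S "$sock" ] || continue
    # Same effect as typing system_powerdown at the monitor prompt
    printf 'system_powerdown\n' | socat - "UNIX-CONNECT:$sock"
  done
  waited=0
  while [ -n "$(running_pidfiles "$dir")" ] && [ "$waited" -lt "$timeout" ]; do
    sleep 1
    waited=$((waited + 1))
  done
  # Equivalent of pulling the plug on guests that ignored the request
  for pf in $(running_pidfiles "$dir"); do
    kill -KILL "$(cat "$pf")"
  done
}

# Example call from a shutdown script (not executed here):
#   shutdown_all_vms /var/run/myvms 60
```

The `kill -0` liveness check only works for processes the caller is allowed to signal, so this sketch assumes it runs as root or as the user that owns the VMs.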

Slax-Dude 06-08-2011 09:12 AM

@Skaperen

I have a script, called from rc.local_shutdown on my Slackware host, which issues an ACPI shutdown command to all running guests and then waits up to 60 seconds for them to shut down cleanly.
If all the guests shut down cleanly before the timeout, the script exits and the host continues shutting down. If not, it forces off all guests that are still running, and the host then continues shutting down.

I use libvirt, one of the available management tools you are not inspired to use ;)


Code:

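#!/bin/bash

# Number of seconds to wait for VMs to shut down before killing them
MAX_TIMEOUT=60

# Names of all running guests. NOTE: the cut field position depends on the
# column layout printed by "virsh list".
RUNNING_VM_LIST=$(/usr/sbin/virsh list|grep running|cut -d " " -f 4-4)

# Ask every running guest for a clean (ACPI) shutdown
for RUNNING_VM in $RUNNING_VM_LIST; do
  /usr/sbin/virsh shutdown "$RUNNING_VM"
done

TIMEOUT_COUNTER=0
while true; do
  RUNNING_VM_LIST=$(/usr/sbin/virsh list|grep running|cut -d " " -f 4-4)
  if [ -z "$RUNNING_VM_LIST" ]; then
    # All guests are down; let the host continue shutting down
    exit 0
  else
    if [ "$TIMEOUT_COUNTER" -lt "$MAX_TIMEOUT" ]; then
      sleep 1
      TIMEOUT_COUNTER=$((TIMEOUT_COUNTER+1))
    else
      # Timeout reached: force off whatever is still running, then stop waiting
      for RUNNING_VM in $RUNNING_VM_LIST; do
        /usr/sbin/virsh destroy "$RUNNING_VM"
      done
      exit 0
    fi
  fi
done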


Skaperen 06-08-2011 12:45 PM

The command "virsh list" doesn't list any of my virtual machines. I have one running now but it doesn't list it.

dyasny 06-08-2011 02:14 PM

Quote:

Originally Posted by Skaperen (Post 4379910)
Desktop systems put up a prompt asking to shutdown

This is what we've started with! I'm no desktop expert, but I'm pretty sure this prompt can be handled, timeout set to 0 or disabled completely.


Quote:

The real issue is getting that signal to the QEMU process at the time init is sending SIGTERM to other processes.
so what you want, is to automate VM shutdown when the host goes down? I'd start by looking at K$(num)libvirt in /etc/rcX.d

---------- Post added 06-08-11 at 07:14 PM ----------

Quote:

Originally Posted by Skaperen (Post 4380129)
The command "virsh list" doesn't list any of my virtual machines. I have one running now but it doesn't list it.

were they started with libvirt or manually?

Skaperen 06-08-2011 02:25 PM

Quote:

Originally Posted by dyasny (Post 4380213)
This is what we've started with! I'm no desktop expert, but I'm pretty sure this prompt can be handled, timeout set to 0 or disabled completely.



so what you want, is to automate VM shutdown when the host goes down? I'd start by looking at K$(num)libvirt in /etc/rcX.d

---------- Post added 06-08-11 at 07:14 PM ----------



were they started with libvirt or manually?

They were started by a script that figures out which instance is needed for the current batch request, and other things like where to get and place the image files, and such. Using virsh to start them isn't an option for multiple reasons, like its requirement for root (in many cases using virtual machines was a direction meant to avoid running as root).

Hey, I have an idea. Why not just add some code to each VM engine to allow specifying that a power down will be emulated on a signal? By default not requesting the feature would keep the previous behavior (so no change to those not expecting any). The default signal would be SIGTERM so it works consistently with existing init shutdown processes. This should be quite simple to do (details vary by engine internals).

dyasny 06-08-2011 02:56 PM

Quote:

Originally Posted by Skaperen (Post 4380225)
They were started by a script that

explains why virsh list doesn't see them. libvirt serves as a wrapper for kvm-qemu after all.

Quote:

Hey, I have an idea. Why not just add some code to each VM engine to allow specifying that a power down will be emulated on a signal?
How do you see this happening? When you run the VM in a wrapper, like you would with libvirt or vdsm, you have an API command to send an ACPI shutdown, and the API also provides a way to send monitor commands, in case you need to. Without a wrapper, you're left with only the monitor to interface with _directly_ and nothing else. and since qemu is a process, it should react to sigterm and sigkill like any other process would.
You could add extra behaviour for the qemu process to react to a sigterm with an acpi shutdown in the guest, but right now this is not the case, and I'm not sure the qemu devs will set such a feature at a high priority, considering the large amount of wrappers that already do this anyway. doesn't mean you can't ask for it, or write it yourself of course

Skaperen 06-09-2011 07:52 AM

Quote:

Originally Posted by dyasny (Post 4380253)
explains why virsh list doesn't see them. libvirt serves as a wrapper for kvm-qemu after all.

But someone said this, earlier:

Quote:

Originally Posted by dyasny (Post 4378493)
you have it completely wrong.
Every VM running with QEMU maintains a socket, through which the host communicates with the QEMU process.

That sure didn't sound like a wrapper. It sounded like someone thought QEMU created a socket to which I could connect and send a message to QEMU. Ironically, that might have been workable if it existed. Of course, a script would have to know how to find all of these sockets wherever they are placed.

Another alternative would have been a message bus feature programs can connect to so they can find out when the system goes down (or other important events). Oh, wait, the kernel people already did a simpler form of this long long ago. I think they call it signals.

Quote:

Originally Posted by dyasny (Post 4380253)
How do you see this happening? When you run the VM in a wrapper, like you would with libvirt or vdsm, you have an API command to send an ACPI shutdown, and the API also provides a way to send monitor commands, in case you need to. Without a wrapper, you're left with only the monitor to interface with _directly_ and nothing else. and since qemu is a process, it should react to sigterm and sigkill like any other process would.

It would normally be expected for a process to do a graceful shutdown for SIGTERM. Of course, it can't for SIGKILL. This seems to be a difference of opinion about what is meant by "graceful". I think it should mean to carry out steps to at least try to let other running components in the process shut down gracefully, too.
Quote:

Originally Posted by dyasny (Post 4380253)
You could add extra behaviour for the qemu process to react to a sigterm with an acpi shutdown in the guest, but right now this is not the case, and I'm not sure the qemu devs will set such a feature at a high priority, considering the large amount of wrappers that already do this anyway. doesn't mean you can't ask for it, or write it yourself of course

I really don't know why it didn't have it. I'm only now wanting to have that ability. But I can't use a wrapper that requires root. Why isn't virsh itself installed as suid root so it can get that root permission with code that properly manages the original user's access rights?

Does vdsm also require root access? If so, that's a show stopper for it, too. Remember, an important cause for using virtual machines is for users that specifically do not have root access. Of course, some programs do run as root for non-root users. But those programs know the user context they are running under and manage the access rights correctly (supposedly). They get installed with the SUID bit on, owned by root. Can virsh/libvirt and/or vdsm handle the context correctly so they are not security breakage vectors? If they could, there would be no reason to not install it "suid root". But it isn't, so I suspect it isn't coded with that grade of security, and thus limits itself to just the actual root user (fine for the shutdown scripts, but not fine for normal running of the VM).

The really sad part of this is that handling SIGTERM so it also does the power down emulation would not be that hard. If the function that carries that out can't actually be called directly in the signal handler context, then the signal handler can set a flag that the main tasks in the process can check on periodically.
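
The handler-sets-a-flag pattern described above, sketched as a wrapper script rather than as a patch to QEMU itself (QEMU's internals are not shown in this thread). Assumptions: the monitor is bound to a UNIX socket via `-monitor unix:PATH,server,nowait`, `socat` is installed, SIGTERM is delivered only to the wrapper and not to the QEMU child, and all names and paths are hypothetical:

```shell
TERM_REQUESTED=0

# Signal handler: do nothing risky here, only record that SIGTERM arrived.
on_term() {
  TERM_REQUESTED=1
}

# Called from the main loop: if the flag is set, clear it and forward the
# request to the guest as an ACPI power-button press.
# $1 = monitor socket path
check_term_flag() {
  if [ "$TERM_REQUESTED" -eq 1 ]; then
    TERM_REQUESTED=0
    printf 'system_powerdown\n' | socat - "UNIX-CONNECT:$1"
  fi
}

# Start the emulator in the background and poll the flag until it exits.
# $1 = monitor socket path, remaining arguments = emulator command line
run_vm() {
  sock=$1
  shift
  "$@" -monitor "unix:$sock,server,nowait" &
  vm_pid=$!
  trap on_term TERM
  while kill -0 "$vm_pid" 2>/dev/null; do
    check_term_flag "$sock"
    sleep 1
  done
  trap - TERM
  wait "$vm_pid"
}

# Example call (not executed here):
#   run_vm /tmp/vm1-monitor.sock qemu -hda guest.img
```

This only helps when the shutdown procedure signals the wrapper's PID specifically. If init sends SIGTERM to every process at once, the QEMU child receives it too and exits as before.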

dyasny 06-09-2011 11:20 AM

Quote:

Originally Posted by Skaperen (Post 4380891)
But someone said this, earlier:

That sure didn't sound like a wrapper. It sounded like someone thought QEMU created a socket to which I could connect and send a message to QEMU. Ironically, that might have been workable if it existed. Of course, a script would have to know how to find all of these sockets wherever they are placed.

right, well, that was my mistake, and without a wrapper to do this - no easy to access socket. moreover - no accounted for and maintained socket.

Quote:

Another alternative would have been a message bus feature programs can connect to so they can find out when the system goes down (or other important events). Oh, wait, the kernel people already did a simpler form of this long long ago. I think they call it signals.
you really think virtualization is the same as running a simple forked process?

Quote:

It would normally be expected for a process to do a graceful shutdown for SIGTERM. Of course, it can't for SIGKILL. This seems to be a difference of opinion about what is meant by "graceful". I think it should mean to carry out steps to at least try to let other running components in the process shut down gracefully, too.
qemu has no way of being aware of the processes running in the guest OS and vice versa. It can be done using hypercalls, with drivers and agents, but hey, if you need it so much, why don't you write it? It sure would be nicer than complaining.

Quote:

I really don't know why it didn't have it. I'm only now wanting to have that ability. But I can't use a wrapper that requires root. Why isn't virsh itself installed as suid root so it can get that root permission with code that properly manages the original user's access rights?
again, this is open source, what is stopping you from patching a package that doesn't work the way you want it to?

Quote:

Does vdsm also require root access? If so, that's a show stopper for it, too.
no it doesn't.

Quote:

Remember, an important cause for using virtual machines is for users that specifically do not have root access.
those users can be provided a simple start/stop functionality, without access to libvirt. libvirt provides its own API, remember?


Quote:

Of course, some programs do run as root for non-root users. But those programs know the user context they are running under and manage the access rights correctly (supposedly). They get installed with the SUID bit on, owned by root. Can virsh/libvirt and/or vdsm handle the context correctly so they are not security breakage vectors? If they could, there would be no reason to not install it "suid root". But it isn't, so I suspect it isn't coded with that grade of security, and thus limits itself to just the actual root user (fine for the shutdown scripts, but not fine for normal running of the VM).
actually, there's an entire part of selinux that deals in virtualization and libvirt specifically. if you're that concerned with security - http://selinuxproject.org/page/SVirt

Quote:

The really sad part of this is that handling SIGTERM so it also does the power down emulation would not be that hard. If the function that carries that out can't actually be called directly in the signal handler context, then the signal handler can set a flag that the main tasks in the process can check on periodically.
I think you're really confused about what is happening inside qemu. When you send a SIGTERM to a normal process, that's one flow. Sending a SIGTERM to qemu (or qemu-kvm) acts on the qemu process itself. That process is not one of the processes running inside the guest; it is the process that emulates the guest's CPU. Seen from the host, that virtual CPU looks like a single process, and it cannot provide access to the structure of processes running in the guest OS on top of it. This is why a SIGKILL sent to qemu's process is roughly equivalent to pulling the plug on the VM.

qemu performs a graceful shutdown by emulating an ACPI shutdown request sent to its virtual BIOS. Whether ACPI shutdowns are respected by the guest OS is entirely up to the guest OS.

Skaperen 06-09-2011 01:13 PM

Quote:

Originally Posted by dyasny (Post 4381108)
you really think virtualization is the same as running a simple forked process?

In what context are you asking this? No two processes are the same. And if they are different executables running, they are very different.

Quote:

Originally Posted by dyasny (Post 4381108)
qemu has no way of being aware of the processes running in the guest OS and vice versa. It can be done using hypercalls, with drivers and agents, but hey, if you need it so much, why don't you write it? It sure would be nicer than complaining.

It doesn't need to be aware of what the guest OS does. It only need to present the hardware interface in some context. The suggested context in this case is to emulate the ACPI action when shutting down.

It's far easier for the original developers to add this than for someone like me, who, despite nearly 3 decades experience with C and nearly 4 decades experience programming, knows nothing about the internal organization of qemu. The time cost of adding that in could be very high for me because of the need to spend lots of time reading the source code to come up to speed. It would be lower for them because they already should have some idea about how qemu's execution flow works, its contexts, and basic organization.

But this is just a low-priority thought at the moment, one I'm only talking about because of this thread. If the need for this escalates enough, it might be worth doing it.

Quote:

Originally Posted by dyasny (Post 4381108)
again, this is open source, what is stopping you from patching a package that doesn't work the way you want it to?

See above.


Quote:

Originally Posted by dyasny (Post 4381108)
no it doesn't.

So I should look at vdsm instead of virsh/libvirt? No mention of "vdsm" in any Ubuntu package description. An independent project somewhere?

Quote:

Originally Posted by dyasny (Post 4381108)
those users can be provided a simple start/stop functionality, without access to libvirt. libvirt provides its own API, remember?

So it's really not a ready-to-go turn-key tool for managing virtualization.

Quote:

Originally Posted by dyasny (Post 4381108)
actually, there's an entire part of selinux that deals in virtualization and libvirt specifically. if you're that concerned with security - http://selinuxproject.org/page/SVirt

So we need yet another tool to manage security for something that was previously reasonably secure, but became broken by a wrapper for which the only need I have is something that could have been done by handling a signal. How many more wrappers are going to be added to make this nice?

Quote:

Originally Posted by dyasny (Post 4381108)
I think you're really confused about what is happening inside qemu. When you send a SIGTERM to a normal process, that's one flow. Sending a SIGTERM to qemu (or qemu-kvm) acts on the qemu process itself. That process is not one of the processes running inside the guest; it is the process that emulates the guest's CPU. Seen from the host, that virtual CPU looks like a single process, and it cannot provide access to the structure of processes running in the guest OS on top of it. This is why a SIGKILL sent to qemu's process is roughly equivalent to pulling the plug on the VM.

Have you even been reading what I suggested? I am not suggesting that the SIGTERM be passed to the inner processes of the guest OS. I am ONLY suggesting that when the qemu process itself receives a SIGTERM from the hosting OS, qemu simply emulate what some real hardware would do when a soft power off button is pressed. In the case of modern PCs, that would be some kind of ACPI message/status (I don't know the specific details there, nor do I need to if I'm not the one to code this) indicating the shutdown/poweroff request. As to what the guest OS does, it shall do whatever it is designed to do for that kind of power off scenario. It might be a Unix/Linux/BSD system which will trigger something to start its own shutdown sequence, which in turn sends its own SIGTERM signals to processes inside the guest OS. Or it might be some newfangled OS that has no concept of SIGTERM or of processes or whatever. But if such an OS is designed to run on a PC, it may be designed to understand ACPI. It can do with it whatever makes sense for it (and some form of graceful shutdown makes sense). If the OS doesn't do it, then it's an issue with that OS.

Again, I am not suggesting, and never have suggested, that the host OS somehow be able to send SIGTERM to the guest OS's processes (if even there are any).

Quote:

Originally Posted by dyasny (Post 4381108)
qemu performs a graceful shutdown by emulating an ACPI shutdown request sent to its virtual BIOS. Whether ACPI shutdowns are respected by the guest OS is entirely up to the guest OS.

But it does this only by a command to the monitor (yes, I have done that). I am suggesting that receiving a SIGTERM from the host OS it runs under also be another means to do the very same thing, no more, no less. At present, it does not do this. It may be doing some things like flushing buffers, but this isn't close to being graceful.

dyasny 06-10-2011 03:08 AM

ok, it seems to me like you have two issues here:
1. a missing feature to send an ACPI shutdown when a SIGTERM is received - not sure what priority such a request would receive
2. libvirt's security not being what you expected

two different projects, two bugzillas to open.

Skaperen 06-10-2011 08:53 AM

Quote:

Originally Posted by dyasny (Post 4381638)
ok, it seems to me like you have two issues here:
1. a missing feature to send an ACPI shutdown when a SIGTERM is received - not sure what priority such a request would receive
2. libvirt's security not being what you expected

two different projects, two bugzillas to open.

I would not consider either of these to be bugs. They are features. A feature request would seem to me to be more appropriate.

dyasny 06-11-2011 08:02 AM

afaik feature requests are normally filed in bugzilla with the appropriate "feature request" flags :)

Skaperen 06-13-2011 07:30 AM

Quote:

Originally Posted by dyasny (Post 4382627)
afaik feature requests are normally filed in bugzilla with the appropriate "feature request" flags :)

For the projects that ask for their feature requests in a bugzilla they operate somewhere, that makes sense. Elsewhere, email the developer mailing list if it is open, or the developers directly, or just make a formal mention of it in a forum.


All times are GMT -5.