WARNING! When deleting or moving a swap partition!

IsaacKuo · 05-18-2007, 12:22 PM

What misdirection? In the very start of the very first post, I stated that I was doing something I'd done many times with Sarge without incident--but it's no longer safe to do in Etch.

Is it really a "user error" if it's something which has worked fine many times in the past, and there's no particular reason to expect that it shouldn't work?

Would you still consider it a "user error" if the hardware were absolutely identical except for the on-board NIC's MAC address? I purposely bought several identical motherboards so that I could swap motherboards on my server in case of hardware damage. A good plan, I thought, but in Etch swapping the motherboard means breaking the on-board NIC. I figured out a way to manually repair it afterwards...well, mostly. Name resolution is still broken.

2damncommon · 05-18-2007, 09:49 PM

Okay, you're making me laugh now. You can't be serious.

Quote:

Originally Posted by IsaacKuo

What misdirection? In the very start of the very first post, I stated that I was doing something I'd done many times with Sarge without incident--but it's no longer safe to do in Etch.

Is it really a "user error" if it's something which has worked fine many times in the past, and there's no particular reason to expect that it shouldn't work?

When your original post states:

Quote:

Originally Posted by IsaacKuo

I was attempting to migrate an OS partition from a drive with a swap partition in hda2 to a file server drive with existing data. I thought I was being quite clever--first install/configure the OS on a different drive, and then copy over the OS partition to minimize downtime. Unfortunately, I put in a swap partition in hda2, whereas the file server drives have swap in hdb1.

As a result, when I copied the OS over the copy couldn't boot!

Am I really misreading this and the swap partition was not misconfigured by the user? I am understanding you copied an operating system that is configured to assume the swap partition is on hda2 when it was actually on hdb1 on the target computer. My understanding is that is what caused your problem. It is also why I fail to understand why Etch is responsible for this.
Are you stating that you always install misconfigured operating systems and have never had a problem until now?
And you just continue to list other subjects altogether asking if I consider them to be user error never once addressing what I am suggesting is?

IsaacKuo · 05-19-2007, 07:08 AM

Quote:

Originally Posted by 2damncommon

I am understanding you copied an operating system that is configured to assume the swap partition is on hda2 when it was actually on hdb1 on the target computer. My understanding is that is what caused your problem. It is also why I fail to understand why Etch is responsible for this.

Okay, I see what your problem is. You insist upon assigning blame somewhere, whereas to me it's simply a question of explaining a problem and achieving a solution.

I actually do consider it to be a serious bug that a Debian operating system is unable to boot up in single user mode if something happens to the swap partition--for example, the swap could be on a second hard drive which fails. But the developers of swsusp or initrd simply may not care about this. Obviously, you don't care.

But assigning blame really wasn't my point. My whole point was simply to warn people of what to do if you want to move or delete swap. I don't care whether it's the user's fault or the swsusp developers' fault or the initrd developers' fault or if it's no one's fault.

The basic fact about free (as in beer) software is that I'm not paying for this software, so the developer is under no obligation whatsoever to supply me with what I want or need. The developers supply whatever they feel like, and I use it at my own risk. So ultimately, EVERY problem in free software is a "user error". Essentially, it's my fault that I decided to use non-commercial software. See, assigning blame isn't helpful!

No matter how much you deride me or insult me or laugh at me for having a problem, I'm the one who actually went and solved the problem, and posted how to solve it. Jlinkels is the only one who was being helpful, suggesting a different possible solution.

2damncommon · 05-19-2007, 10:53 AM

Quote:

Originally Posted by IsaacKuo

I actually do consider it to be a serious bug that a Debian operating system is unable to boot up in single user mode if something happens to the swap partition--for example, the swap could be on a second hard drive which fails. But the developers of swsusp or initrd simply may not care about this. Obviously, you don't care.

Okay, consider that instead of misconfigured swap the boot or root partition is misconfigured. Would it boot? Would single user work?

GrueMaster · 05-19-2007, 11:30 AM

I kind of see what Isaac is getting at. While his usage model isn't the norm, if the swap partition were on a drive that was failing, how is the user/administrator to reboot into a single user mode to do some recovery operations?

I know with Mandriva and Suse, you could change the boot options for the kernel to disable suspend/resume image checking and boot to single user mode, but what are the options for this if the config is embedded in initrd? Are there boot override parameters?

This may be something as trivial as a documentation fix (for the parameters, not the usage model).

makuyl · 05-19-2007, 11:31 AM

Just out of interest, did you try the "noresume" boot option?

jlinkels · 05-19-2007, 11:40 AM

We are talking here about misconfigured. Say that the boot partition appears in /dev/hdb1 instead of /dev/hda1. I darn well would expect that I am able to boot Linux in one way or another.

Remember, the Windows philosophy is: make it easy to install, if it breaks: reinstall. (In 7 years as a Windows sysadmin (Win2k and before) I have never been able to repair an unbootable system.)

The Linux policy always was: maybe it is a bit harder to install, but once it is installed, if it breaks: work around it and fix it.

This distinguishes Linux from Windows, and it is a reason to run Linux.

The swap partition is NOT crucial for proper booting. You can work around it. Think what happens if your computer hangs during hibernation, or you have a power failure. You might very well end up with a corrupted image. Should the penalty for that be having to reinstall Linux?

Please, distinguish two different issues in this post:
(1) Is it wise to transfer an OS the way the OP did?
(2) Should Linux boot if the hibernate image is corrupted or otherwise unavailable?

For (1) I don't have an opinion. For (2) I want to state a very strong YES. There should always be a boot mode which boots with the least available installation. Resuming is an addition, not a primary requirement.

I haven't tested it yet, but if it is true that booting is impossible with a damaged swap partition, I appreciate the BIG FAT WARNING by the OP. I can only recommend recompiling the kernel without initrd and nuking any hibernation options. You should have a choice to boot or to resume. In Linux you ought to have choices.

jlinkels

IsaacKuo · 05-19-2007, 02:12 PM

Quote:

Originally Posted by makuyl

Just out of interest, did you try the "noresume" boot option?

That's a good thought, and no I didn't. I'll do a clean Etch install to test out if it works.

I wasn't even aware of the resume options on GRUB at the time I had these problems. I only fiddled with them after jlinkels suggested it. But I was fiddling with a system I had already fixed with my kludgy solution (pre-emptively zapping the resume feature out of initrd.img).

With a clean Etch install, I'll be able to see if your suggestion works. If so, then it's a better solution for someone else who has gotten into this situation. My kludge only really works if applied beforehand.

NoPane · 05-20-2007, 11:05 AM

Without entering the heated debate here, I'd simply like to thank Isaac for starting this thread and solving my problem! I've just spent the day building my ideal desktop system - and I tend to physically remove the drives with other data (just in case!). As I use XOSL as my boot manager, it happily switches drives and hides unneeded partitions. It's worked perfectly for many years, and now this change has stopped it.

It's not a problem, it's progress. But again, thanks Isaac!

2damncommon · 05-20-2007, 03:36 PM

Quote:

Originally Posted by NoPane

Without entering the heated debate here,..

I have been re-reading posts to see why my original comment resulted in this and I find I must ask IsaacKuo a question.
As I have posted before, when I read:

Quote:

I was attempting to migrate an OS partition from a drive with a swap partition in hda2 to a file server drive with existing data. I thought I was being quite clever--first install/configure the OS on a different drive, and then copy over the OS partition to minimize downtime. Unfortunately, I put in a swap partition in hda2, whereas the file server drives have swap in hdb1.

As a result, when I copied the OS over the copy couldn't boot!

I assumed the OS was still configured for the hda2 swap when copied and started in the computer set up with the hdb1 swap. I now realize since this is not really stated one way or another I must ask. This is why I pursued my "user error" comments.

The dire warnings strike me the wrong way because there is in all cases a primary cause. User configuration error, failing hard drives, and non-standard operations such as copying partitions are all primary causes. Dealing with the repercussions of those actions is secondary. I.e., blaming software for user error, failing hard disks, or non-standard operations is misplaced if the software did not really cause the problem.
A couple of good comments have already been made:

Quote:

Originally Posted by jlinkels

Ah, I see. I was not aware that swsusp2 is now part of the kernel. I am running Debian Lenny, but I compiled my own kernel instead of installing the one which came with Debian.

Quote:

Originally Posted by makuyl

Just out of interest, did you try the "noresume" boot option?

I believe these show that the software maintainers have not ignored this sort of issue altogether.
I would find further discussion of the actual feature at issue interesting. When would you need or not need it? What would you leave out of a custom kernel? What user configuration can be useful?

IsaacKuo · 05-20-2007, 06:51 PM

Quote:

Originally Posted by 2damncommon

I assumed the OS was still configured for the hda2 swap when copied and started in the computer set up with the hdb1 swap. I now realize since this is not really stated one way or another I must ask.

As far as I knew it was no longer set up for hda2 swap. I had commented out the swap partition from /etc/fstab, which in the past was enough. I didn't know anything else which would need to be changed.

But obviously part of the OS was still configured for hda2 swap!
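For anyone retracing this: as far as I know, on Etch the initramfs keeps its own record of the resume device (usually a line such as RESUME=/dev/hda2 in /etc/initramfs-tools/conf.d/resume, or whatever swap was detected when the initrd was built), and that setting is baked into initrd.img. Commenting out the swap line in /etc/fstab does not touch it. The sketch below is a minimal illustration under that assumption; the function names are made up for this post, and it only compares text you hand it, it does not look inside the initrd. If it reports a stale device, the usual Debian fix would be to correct that file and regenerate the initrd with `update-initramfs -u`.

```python
def active_swap_devices(fstab_text):
    """Devices marked as swap in the uncommented lines of an fstab."""
    devices = set()
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # a commented-out swap line no longer counts
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "swap":
            devices.add(fields[0])
    return devices


def configured_resume_device(resume_conf_text):
    """The RESUME= value from an initramfs-tools style conf file, or None.

    Assumption: the file uses plain RESUME=/dev/xxx lines.
    """
    for line in resume_conf_text.splitlines():
        line = line.strip()
        if line.startswith("RESUME="):
            return line.split("=", 1)[1].strip().strip('"')
    return None


def find_stale_resume_device(fstab_text, resume_conf_text):
    """Return the resume device if no active fstab swap entry matches it."""
    device = configured_resume_device(resume_conf_text)
    if device is not None and device not in active_swap_devices(fstab_text):
        return device
    return None


if __name__ == "__main__":
    fstab = (
        "/dev/hda1  /     ext3  defaults,errors=remount-ro  0 1\n"
        "#/dev/hda2 none  swap  sw                          0 0\n"
    )
    print(find_stale_resume_device(fstab, "RESUME=/dev/hda2\n"))  # /dev/hda2
```

This mirrors the situation in this thread: the fstab no longer mentions hda2, but the resume setting still does.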

Quote:

The dire warnings strike me the wrong way because there is in all cases a primary cause. User configuration error, failing hard drives, and non-standard operations such as copying partitions are all primary causes. Dealing with the repercussions of those actions is secondary. I.e., blaming software for user error, failing hard disks, or non-standard operations is misplaced if the software did not really cause the problem.

I fundamentally disagree. I think that dealing with the problem is primary, regardless of the cause. Playing the blame game is secondary, if it's relevant at all.

Quote:

A couple of good comments have already been made:

Originally Posted by jlinkels

Ah, I see. I was not aware that swsusp2 is now part of the kernel. I am running Debian Lenny, but I compiled my own kernel instead of installing the one which came with Debian.

Quote:

Originally Posted by makuyl

Just out of interest, did you try the "noresume" boot option?

I believe these show that the software maintainers have not ignored this sort of issue altogether.

How so? Neither of those comments has anything to do with what the software maintainers have or haven't done. Jlinkels never encountered the problem because he compiles his own kernel. Makuyl had a good suggestion, but I don't see how this is relevant unless he's one of the software maintainers.

Quote:

I would find further discussion of the actual feature at issue interesting. When would you need or not need it? What would you leave out of a custom kernel? What user configuration can be useful?

Suspend/resume features might not be useful for all Debian systems, but they should be included by default in the "Desktop Workstation" or at least the "Laptop" suite. The "Desktop Workstation" suite is what newbies will use, and a newbie shouldn't have to hunt around and discover that he needs to install something extra to get suspend/resume functionality.

That said, it ought to be possible to boot up into single user mode even if something happens to the swap partition. I don't care what the cause of the swap failure is, there simply isn't any compelling technical reason for swap failure to prevent a basic bootup!

Maybe the solution is as simple as including an extra "noresume" option in the default single user mode GRUB entry. I can't think of a situation where you'd want to use the resume feature if you're booting into single user mode. And even if someone did want to resume with single user mode, it's easier to manually remove a "noresume" boot option than it is to take a wild guess at adding a "noresume" option.
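Here is a rough sketch of that idea applied to a GRUB legacy menu.lst: append `noresume` (the kernel parameter is lowercase) to the kernel line of every entry whose title marks it as single-user. It assumes Debian's usual "(single-user mode)" wording in the title; the function name is hypothetical. Keep in mind that Debian's update-grub regenerates these entries, so if I remember its conventions correctly, a lasting change belongs in the `# altoptions=(single-user mode) single` comment line of menu.lst rather than in the generated stanzas.

```python
def add_noresume_to_single_user(menu_lst):
    """Append 'noresume' to kernel lines of single-user GRUB legacy entries."""
    out = []
    in_single_user = False
    for line in menu_lst.splitlines():
        stripped = line.strip()
        if stripped.startswith("title"):
            # Assumption: single-user entries carry "single-user" in the title.
            in_single_user = "single-user" in stripped
        elif in_single_user and stripped.startswith("kernel"):
            if "noresume" not in stripped.split():
                line = line.rstrip() + " noresume"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    menu = (
        "title Debian GNU/Linux, kernel 2.6.18-4-686\n"
        "root (hd0,0)\n"
        "kernel /boot/vmlinuz-2.6.18-4-686 root=/dev/hda1 ro\n"
        "\n"
        "title Debian GNU/Linux, kernel 2.6.18-4-686 (single-user mode)\n"
        "root (hd0,0)\n"
        "kernel /boot/vmlinuz-2.6.18-4-686 root=/dev/hda1 ro single\n"
    )
    print(add_noresume_to_single_user(menu))
```

The normal entry is left alone, so resume still works for everyday boots; only the recovery entry skips it. Running the function twice gives the same result.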

IsaacKuo · 05-20-2007, 07:05 PM

Quote:

Originally Posted by NoPane

Without entering the heated debate here, I'd simply like to thank Isaac for starting this thread and solving my problem!

You're welcome, and thanks for the response! Hopefully, the "noresume" or "RESUME=" grub boot option would work, and it would be a better solution (or at least an alternative solution).

I've been using my spare workstation for some OS transfers and reconfiguration, so I haven't yet had the chance to experiment with a new clean Etch install. I can see it could be a few days before I get the chance, and this question will bug me in the back of my head until then...

jlinkels · 05-20-2007, 07:28 PM

I forgot the exact name for this option, but indeed I believe it is NORESUME.

Please bear in mind that when you are confronted with the GRUB boot menu, you can stop the process and enter any kernel parameter you like.

So if you try booting a system with the wrong parameters and you forgot to change menu.lst, you can always add/edit those while booting.

jlinkels

2damncommon · 05-20-2007, 08:06 PM

Quote:

Originally Posted by IsaacKuo

As far as I knew it was no longer set up for hda2 swap. I had commented out the swap partition from /etc/fstab, which in the past was enough. I didn't know anything else which would need to be changed.

But obviously part of the OS was still configured for hda2 swap!

Then I do need to say I was mistaken about user error being involved in your situation.
Did you just comment it out or change it to the correct partition on the target computer?
If needing the resume image is what caused the problem then I don't think it would have mattered if the swap partitions shared the same designation if the image was not there.
So now my question is, why don't I have any problems using a shared swap partition? Obviously anything there would be overwritten booting between OSs.
I haven't installed Etch yet. Will I be cursed when I do for my replies to this post?

IsaacKuo · 05-20-2007, 10:24 PM

Quote:

Originally Posted by 2damncommon

Then I do need to say I was mistaken about user error being involved in your situation.
Did you just comment it out or change it to the correct partition on the target computer?

I just commented it out. After making a transfer and confirming everything is working, I'll add extra partition entries to /etc/fstab. While performing the transfer, I like to have /etc/fstab in a "lowest common denominator" state that will work in both the source and destination.

Quote:

If needing the resume image is what caused the problem then I don't think it would have mattered if the swap partitions shared the same designation if the image was not there.

What I found was rather interesting. It actually did not matter what the contents of hda2 were, as long as it was a primary partition and it was big enough. Whether hda2 was a swap partition, or an ext2 partition, or a FAT32 partition...it didn't care. It would read from the partition (I assume on a block level) and decide there wasn't a valid resume image on it--and the rest of the boot process would continue happily.

One of my early fix attempts was to try and squeeze in a small hda2 partition between my OS hda1 and my (already stuffed full of data) hda5. But it just wasn't big enough. Swsusp still hung, with an error saying something about attempting to access an address beyond the device (the same error verbiage as when there was no hda2).

Quote:

So now my question is, why don't I have any problems using a shared swap partition? Obviously anything there would be overwritten booting between OSs.

Presumably, there is some sort of checksum involved so swsusp can tell the difference between a valid resume image and just some random swap contents. I figure if it has no problem reading and discarding the contents of a FAT32 partition, it'll have no problem reading and discarding a swap partition left over from another OS.
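My understanding of the 2.6-era swsusp code is that the check is even simpler than a checksum: it reads the first page of the resume partition and compares a 10-byte signature in its last bytes. mkswap normally puts SWAPSPACE2 there, and swsusp overwrites it with its own marker when it writes an image. That would explain why a FAT32 or ext2 partition at hda2 was simply "not an image" and the boot carried on. The toy model below assumes 4 KiB pages and the signature strings as I recall them; treat the constants as assumptions, not as a description of the Etch initrd.

```python
PAGE_SIZE = 4096                   # assumption: 4 KiB pages (x86)
SIG_LEN = 10
SIG_OFFSET = PAGE_SIZE - SIG_LEN   # signature lives in the last 10 bytes of page 0

SWSUSP_SIG = b"S1SUSPEND\0"        # assumption: 2.6-era swsusp image marker
SWAP_SIGS = (b"SWAP-SPACE", b"SWAPSPACE2")  # mkswap v0 / v1 markers


def classify_resume_page(first_page):
    """Toy model: what a resume check would conclude from a partition's first page."""
    if len(first_page) < PAGE_SIZE:
        raise ValueError("need the partition's whole first page")
    sig = bytes(first_page[SIG_OFFSET:SIG_OFFSET + SIG_LEN])
    if sig == SWSUSP_SIG:
        return "resume image"
    if sig in SWAP_SIGS:
        return "swap"          # plain swap: no image, boot normally
    return "no image"          # random data (FAT32, ext2, ...): boot normally


if __name__ == "__main__":
    page = bytearray(PAGE_SIZE)            # all zeros, like an unused area
    print(classify_resume_page(page))      # no image
    page[SIG_OFFSET:] = b"SWAPSPACE2"
    print(classify_resume_page(page))      # swap
```

Only the resume-image case diverts the boot; every other outcome falls through to a normal startup, which matches what Isaac observed.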

Quote:

I haven't installed Etch yet. Will I be cursed when I do for my replies to this post?

I don't think you'll have any problems.