openSUSE 12.1 boot problem

warriorjames · 09-03-2012, 08:17 PM

Feel free to move this to the necessary board, cuz I have no idea where this would go.

Just a note, this happened a few months ago.

I found something pertaining to a piece of this in the Linux server section, but I'm dealing with openSUSE (12.1 to be exact). Fortunately, I took a picture of the screen.

The line "[drm:atom_get_src_int] *ERROR* ATOM: fb read beyond scratch region: 1245188 vs. 16384" shows up about 10 times. What that is, according to Robertjinx in the "Memory Leaks in CentOS 6.2" thread, is a video driver error, and to just ignore it.

Seemed simple, but then it's followed by:
-------------=---------------

Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
resume: libgcrypt version: 1.5.0
Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Waiting for device /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 to appear: ok
fsck from util-linux 2.20.1
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a -C0 /dev/sdd2
/dev/sdd2: clean, 140170/1313280 files, 1372526/5242880 blocks
fsck succeeded. Mounting root device read-write.
Mounting root /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2
mount -o rw,acl,user_xattr -t ex4 /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 /root
[7.866522] k19temp 000:00:18.3: unreliable CPU thermal sensor: monitoring disabled
[8.107836] SP5100 TCO timer: mmio address 0xfec000f0 already in use
systemd-fsck[825]: /dev/sdd3: clean, 52335/13828096 files, 8921639/55280304 blocks
Welcome to emergency mode. Use "systemctl default" or ^D to activate default mode.
Give root password for login:
--------------=-------------------

The Samsung is the hard drive Suse is on.

I know I created a RAID0 (striped) on Windows around that time, but when I went into OpenSUSE 12.1 about 3 to 4 times after the RAID was created it had no problem. From maybe the 5th day to the present, it now gives me that message. I really can't remember if it's connected to something like updates (since, while I've been trying to figure this out for at least 3 months, there have been other things grabbing my attention), but I do know that I hadn't installed anything at that time.

I also know I can't get the GUI (Init5). It just loops me back to that screen.

Any idea what's going on? Should I just wait for 12.2 to come out, pull a clean install and hope for the best?

salasi · 09-05-2012, 03:52 PM

Quote:

Originally Posted by warriorjames

Any idea what's going on? Should I just wait for 12.2 to come out, pull a clean install and hope for the best?

Well, 12.2 is out now. OTOH, understanding things is good, and there is always a danger that 12.2 just offers you an upgraded version of the same problem...

Quote:

Originally Posted by warriorjames

The line "[drm:atom_get_src_int] *ERROR* ATOM: fb read beyond scratch region: 1245188 vs. 16384" shows up about 10 times. What that is, according to Robertjinx in the "Memory Leaks in CentOS 6.2" thread, is a video driver error, and to just ignore it.

What I understand of this is that 'fb' is likely to be frame buffer, and that agrees with the possibility of a video error of some kind...

Quote:

Originally Posted by warriorjames

Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
resume: libgcrypt version: 1.5.0
Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Waiting for device /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 to appear: ok
fsck from util-linux 2.20.1
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a -C0 /dev/sdd2
/dev/sdd2: clean, 140170/1313280 files, 1372526/5242880 blocks
fsck succeeded. Mounting root device read-write.
Mounting root /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2
mount -o rw,acl,user_xattr -t ex4 /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 /root

So, a resume (the other half of suspend-and-resume) is being attempted, and then an fsck is run on the root partition. The -a flag means it would automatically repair anything it found rather than just check, but here it reports the filesystem as clean, declares success, and the boot proceeds to mount the partition. If it were finding something wrong with the disk on every boot, there would be a question about why, but either way this isn't what is stopping you from booting.
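For what it's worth, the fsck summary line can be read mechanically. This is only a rough Python sketch of my own (the function name, field names and regex are assumptions, not anything from the boot scripts), and it only handles lines shaped like the ones in the log above:

```python
import re

# Matches summary lines shaped like the ones in the boot log, e.g.
# "/dev/sdd2: clean, 140170/1313280 files, 1372526/5242880 blocks"
SUMMARY = re.compile(
    r"(?P<dev>/dev/\S+): (?P<state>\w+), "
    r"(?P<iu>\d+)/(?P<it>\d+) files, "
    r"(?P<bu>\d+)/(?P<bt>\d+) blocks"
)

def parse_fsck_summary(line):
    """Return a dict describing an fsck summary line, or None if it doesn't match."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    inodes_used, inodes_total = int(m["iu"]), int(m["it"])
    blocks_used, blocks_total = int(m["bu"]), int(m["bt"])
    return {
        "device": m["dev"],
        "state": m["state"],  # "clean" means fsck found nothing to repair
        "inodes_used": inodes_used,
        "inodes_total": inodes_total,
        "inodes_pct": 100.0 * inodes_used / inodes_total,
        "blocks_used": blocks_used,
        "blocks_total": blocks_total,
        "blocks_pct": 100.0 * blocks_used / blocks_total,
    }
```

Fed the sdd2 line from the log, this gives a state of "clean" plus the inode and block usage, which is another way of seeing that neither a damaged nor a full root filesystem is the issue here.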

Quote:

Originally Posted by warriorjames

[7.866522] k19temp 000:00:18.3: unreliable CPU thermal sensor: monitoring disabled

Odd, but probably irrelevant. The CPU temp sensor is somehow detected as unreliable, possibly from your CPU version and revision. Doesn't seem to have anything to do with the matter at hand.

Quote:

Originally Posted by warriorjames

[8.107836] SP5100 TCO timer: mmio address 0xfec000f0 already in use

Pass.

Quote:

Originally Posted by warriorjames

systemd-fsck[825]: /dev/sdd3: clean, 52335/13828096 files, 8921639/55280304 blocks
Welcome to emergency mode. Use "systemctl default" or ^D to activate default mode.
Give root password for login:

Now, this is more interesting. sdd3 is 'clean', so that shouldn't cause a problem, but the next thing that happens is that you go to emergency mode. So something is seriously wrong, usually the kind of something that you'd hope an fsck would fix, but that was tried 'automagically' and the fix didn't work. Previously sdd2 was fsck'ed, and that seems to have been fine, but there was no similar 'fsck succeeded' message for sdd3.

Is it sdd3 or a subsequent partition that is causing the problem? Pass.

Presumably, you did try to log in as root, as per the prompts. What happened?

Quote:

Originally Posted by warriorjames

The Samsung is the hard drive Suse is on.

I know I created a RAID0 (striped) on Windows around that time, but when I went into OpenSUSE 12.1 about 3 to 4 times after the RAID was created it had no problem. From maybe the 5th day to the present, it now gives me that message. I really can't remember if it's connected to something like updates (since, while I've been trying to figure this out for at least 3 months, there have been other things grabbing my attention), but I do know that I hadn't installed anything at that time.

Is this a one-disk RAID 0 array? Or are there other disks that you haven't mentioned?

It would be useful to know which partitions are used for what purpose, as that may well help to make things clearer. But if I had to make a WAG at this stage, I'd guess that one OS has touched something belonging to the other OS and subsequent attempts to boot are going 'Something unexpected/bad has happened to some of my data. Got to do something about that.'
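One way to collect that information, and to test the guess that something listed for mounting is no longer where it used to be, is to compare /etc/fstab against the device nodes that actually exist. The following is a minimal Python sketch under my own assumptions (both function names are made up, and it presumes Python is usable from the emergency shell); it only looks at entries given as /dev/... paths and ignores LABEL= or UUID= style entries:

```python
import os

def fstab_entries(text):
    """Yield (device, mountpoint, fstype) for each non-comment fstab line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            yield fields[0], fields[1], fields[2]

def missing_devices(text, exists=os.path.exists):
    """Return fstab entries whose /dev/... device path does not currently exist."""
    return [
        (dev, mnt, fs)
        for dev, mnt, fs in fstab_entries(text)
        if dev.startswith("/dev/") and not exists(dev)
    ]
```

After logging in as root at the emergency prompt, calling missing_devices(open("/etc/fstab").read()) should list any mount points whose device has gone missing or been renamed. An empty list would not rule out other causes, it would only rule out this one.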

You haven't tried to do something 'too clever by half', such as sharing a swap partition between the two OSes and then interfering with the boot order (e.g., suspending one OS and then booting the other), have you?
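The 'Trying manual resume from ...-part1' lines suggest the kernel is being told to resume from part1, which would normally be the swap partition. A small sketch for confirming which device that is, assuming it is passed as a resume= option on the kernel command line (the function name is my own):

```python
def resume_device(cmdline):
    """Return the value of the resume= kernel option, or None if it is absent."""
    for token in cmdline.split():
        if token.startswith("resume="):
            return token[len("resume="):]
    return None
```

On a running system the command line can be read from /proc/cmdline, e.g. resume_device(open("/proc/cmdline").read()). If that device is also something Windows writes to, that would fit the 'one OS has touched the other's data' guess.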

Quote:

Originally Posted by warriorjames

I also know I can't get the GUI (Init5). It just loops me back to that screen....Any idea what's going on?

If you haven't managed to get the partitions mounted (because the second set of fsck problems, the sdd3 ones, never got fixed), then that is probably to be expected. Well, it depends on what sdd3 provides: assuming that it provides something that you need, then it is to be expected, anyway.

warriorjames · 09-06-2012, 01:54 PM

Quote: