LinuxQuestions.org

warriorjames 09-03-2012 08:17 PM

OpenSuse 12.1 boot problem
 
Feel free to move this to the necessary board, cuz I have no idea where this would go.

Just a note, this happened a few months ago.

I found something pertaining to a piece of this in the Linux server section, but I'm dealing with openSUSE (12.1 to be exact). Fortunately, I took a picture of the screen.

The line "[drm:atom_get_src_int] *ERROR* ATOM: fb read beyond scratch region: 1245188 vs. 16384" shows up about 10 times. What that is, according to Robertjinx in the "Memory Leaks in CentOS 6.2" thread, is a video driver error, and to just ignore it.

Seemed simple, but then it's followed by:
-------------=---------------

Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
resume: libgcrypt version: 1.5.0
Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Waiting for device /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 to appear: ok
fsck from util-linux 2.20.1
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a -C0 /dev/sdd2
/dev/sdd2: clean, 140170/1313280 files, 1372526/5242880 blocks
fsck succeeded. Mounting root device read-write.
Mounting root /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2
mount -o rw,acl,user_xattr -t ex4 /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 /root
[7.866522] k19temp 000:00:18.3: unreliable CPU thermal sensor: monitoring disabled
[8.107836] SP5100 TCO timer: mmio address 0xfec000f0 already in use
systemd-fsck[825]: /dev/sdd3: clean, 52335/13828096 files, 8921639/55280304 blocks
Welcome to emergency mode. Use "systemctl default" or ^D to activate default mode.
Give root password for login:
--------------=-------------------

The Samsung is the hard drive Suse is on.

I know I created a RAID0 (striped) on Windows around that time, but when I went into OpenSUSE 12.1 about 3 to 4 times after the RAID was created it had no problem. From maybe the 5th day to the present, it now gives me that message. I really can't remember if it's connected to something like updates (since, while I've been trying to figure this out for at least 3 months, there have been other things grabbing my attention), but I do know that I hadn't installed anything at that time.

I also know I can't get the GUI (Init5). It just loops me back to that screen.

Any idea what's going on? Should I just wait for 12.2 to come out, pull a clean install and hope for the best?

salasi 09-05-2012 03:52 PM

Quote:

Originally Posted by warriorjames (Post 4771880)

Any idea what's going on? Should I just wait for 12.2 to come out, pull a clean install and hope for the best?

Well, 12.2 is out now. OTOH, understanding things is good, and there is always a danger that 12.2 just offers you an upgraded version of the same problem...

Quote:

Originally Posted by warriorjames (Post 4771880)

The line "[drm:atom_get_src_int] *ERROR* ATOM: fb read beyond scratch region: 1245188 vs. 16384" shows up about 10 times. What that is, according to Robertjinx in the "Memory Leaks in CentOS 6.2" thread, is a video driver error, and to just ignore it.

What I understand of this is that 'fb' is likely to be frame buffer, and that agrees with the possibility of a video error of some kind...


Quote:

Originally Posted by warriorjames (Post 4771880)
Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
resume: libgcrypt version: 1.5.0
Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Waiting for device /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 to appear: ok
fsck from util-linux 2.20.1
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a -C0 /dev/sdd2
/dev/sdd2: clean, 140170/1313280 files, 1372526/5242880 blocks
fsck succeeded. Mounting root device read-write.
Mounting root /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2
mount -o rw,acl,user_xattr -t ex4 /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 /root

So, there is a resume (the other half of a suspend 'n resume) being attempted, something is (probably) being detected as wrong with the disk, a fsck (the check part of file system check in this case actually being fix rather than check) is being run, it is successful and it proceeds on to mount the disk partition. There is a question about why exactly something is being detected as wrong with the disk (repeatedly, if it is repeatedly), but it is getting fixed, so this isn't stopping you from booting.
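For what it's worth, e2fsck prints `clean` in its one-line summary when the filesystem is already marked clean, so on that reading nothing needed repairing on this pass. A minimal Python sketch for pulling the numbers out of the two summary lines in the posted log (the name `parse_fsck_summary` is made up for illustration):

```python
import re

# Matches the one-line e2fsck summary, e.g.
# "/dev/sdd2: clean, 140170/1313280 files, 1372526/5242880 blocks"
SUMMARY = re.compile(
    r"(?P<dev>/dev/\S+): (?P<state>\w+), "
    r"(?P<inodes_used>\d+)/(?P<inodes_total>\d+) files, "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks"
)

def parse_fsck_summary(line):
    """Return a dict for an fsck summary line, or None if the line is not one."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    info = m.groupdict()
    for key in ("inodes_used", "inodes_total", "blocks_used", "blocks_total"):
        info[key] = int(info[key])
    # Share of the filesystem's blocks that are in use, as a percentage.
    info["block_use_pct"] = 100.0 * info["blocks_used"] / info["blocks_total"]
    return info

# Lines copied from the boot messages in the first post.
log = [
    "/dev/sdd2: clean, 140170/1313280 files, 1372526/5242880 blocks",
    "systemd-fsck[825]: /dev/sdd3: clean, 52335/13828096 files, 8921639/55280304 blocks",
    "fsck succeeded. Mounting root device read-write.",
]

for line in log:
    info = parse_fsck_summary(line)
    if info:
        print(f"{info['dev']}: {info['state']}, {info['block_use_pct']:.1f}% of blocks in use")
```

Both partitions report `clean`, which suggests the fsck runs themselves are routine rather than a sign of damage.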

Quote:

Originally Posted by warriorjames (Post 4771880)
[7.866522] k19temp 000:00:18.3: unreliable CPU thermal sensor: monitoring disabled

Odd, but probably irrelevant. The CPU temp sensor is somehow detected as unreliable, possibly from your CPU version and revision. Doesn't seem to have anything to do with the matter at hand.

Quote:

Originally Posted by warriorjames (Post 4771880)
[8.107836] SP5100 TCO timer: mmio address 0xfec000f0 already in use

Pass.

Quote:

Originally Posted by warriorjames (Post 4771880)
systemd-fsck[825]: /dev/sdd3: clean, 52335/13828096 files, 8921639/55280304 blocks
Welcome to emergency mode. Use "systemctl default" or ^D to activate default mode.
Give root password for login:

Now, this is more interesting. sdd3 is 'clean', so that shouldn't cause a problem, but the next thing that happens is that you go to emergency mode (so something is seriously wrong...usually the kind of something that you'd hope an fsck would fix, but that was tried 'automagically' and the fix didn't work). Previously sdd2 was fsck'ed, and that seems to have been fine, but there was no similar message about sdd3 being fine.

Is it sdd3 or a subsequent partition that is causing the problem? Pass.
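One way to narrow that down, offered as a sketch rather than a diagnosis: systemd commonly falls back to emergency mode when something listed in /etc/fstab cannot be found or mounted, so it can help to compare each fstab entry against the devices that actually exist. The Python below is a minimal illustration. `missing_fstab_devices` is a made-up helper, and the sample fstab text and device list are invented for the example, not taken from this machine:

```python
import os

def missing_fstab_devices(fstab_text, exists=os.path.exists):
    """Return (device, mountpoint) pairs whose device path does not exist.

    Only entries that name a device by path (starting with "/") are checked;
    UUID=/LABEL= entries and pseudo filesystems such as proc are skipped.
    """
    missing = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # skip blank lines and comments
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mountpoint = fields[0], fields[1]
        if device.startswith("/") and not exists(device):
            missing.append((device, mountpoint))
    return missing

# Hypothetical fstab and device list, for illustration only.
sample_fstab = """\
/dev/disk/by-id/ata-EXAMPLE-part1  swap      swap  defaults          0 0
/dev/disk/by-id/ata-EXAMPLE-part2  /         ext4  acl,user_xattr    1 1
/dev/disk/by-id/ata-OTHER-part1    /windows  ntfs  defaults          0 0
proc                               /proc     proc  defaults          0 0
"""
present = {
    "/dev/disk/by-id/ata-EXAMPLE-part1",
    "/dev/disk/by-id/ata-EXAMPLE-part2",
}

for device, mountpoint in missing_fstab_devices(sample_fstab, exists=present.__contains__):
    print(f"{mountpoint}: {device} not found")
```

On a real system the same function could be called as `missing_fstab_devices(open("/etc/fstab").read())`, which falls back to `os.path.exists` for the check.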

Presumably, you did try to log in as root, as per the prompts. What happened?


Quote:

Originally Posted by warriorjames (Post 4771880)
The Samsung is the hard drive Suse is on.

I know I created a RAID0 (striped) on Windows around that time, but when I went into OpenSUSE 12.1 about 3 to 4 times after the RAID was created it had no problem. From maybe the 5th day to the present, it now gives me that message. I really can't remember if it's connected to something like updates (since, while I've been trying to figure this out for at least 3 months, there have been other things grabbing my attention), but I do know that I hadn't installed anything at that time.

There is a 1 disk Raid 0 array? Or are there other disks that you haven't mentioned?

It would be useful to know which partitions are used for what purpose, as that may well help to make things clearer. But if I had to make a WAG at this stage, I'd guess that one OS has touched something belonging to the other OS and subsequent attempts to boot are going 'Something unexpected/bad has happened to some of my data. Got to do something about that.'
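Some of that can be read straight off the boot messages already posted: the initrd tries to resume from `-part1` (which is where a hibernation image would live, normally the swap partition) and mounts `-part2` as root. A rough Python sketch of that reading follows. The `roles_from_boot_log` helper and its labels are only for illustration, and it says nothing about a third partition, which the log never names by id:

```python
import re

def roles_from_boot_log(lines):
    """Map partition number -> role, based only on initrd resume/root messages."""
    roles = {}
    for line in lines:
        m = re.search(r"resume from \S+-part(\d+)", line)
        if m:
            roles[int(m.group(1))] = "resume device (usually swap)"
        m = re.search(r"Mounting root \S+-part(\d+)", line)
        if m:
            roles[int(m.group(1))] = "root filesystem"
    return roles

# Lines copied from the boot messages in the first post.
boot_log = [
    "Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1",
    "Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1",
    "Mounting root /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2",
]

for part, role in sorted(roles_from_boot_log(boot_log).items()):
    print(f"part{part}: {role}")
```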

You haven't tried to do something 'too clever by half' by trying to share a swap partition and then tried to interfere with the boot order (eg, suspend one OS and then try to boot the other) have you?

Quote:

Originally Posted by warriorjames (Post 4771880)
I also know I can't get the GUI (Init5). It just loops me back to that screen....Any idea what's going on?

If you haven't managed to get the partitions mounted (due to not getting the second set of 'use fsck' errors fixed (the sdd3 ones)), then that is probably to be expected (well, depending on what sdd3 provides...assuming that it provides something that you need, then it is to be expected, anyway).

warriorjames 09-06-2012 01:54 PM

Quote:

Originally Posted by salasi (Post 4773559)
Well, 12.2 is out now. OTOH, understanding things is good, and there is always a danger that 12.2 just offers you an upgraded version of the same problem...

I kind of figured as much. Uh...what does OTOH stand for?

Quote:

Originally Posted by salasi (Post 4773559)
What I understand of this is that 'fb' is likely to be frame buffer, and that agrees with the possibility of a video error of some kind...

Alright...worry about that later.

Quote:

Originally Posted by salasi (Post 4773559)
So, there is a resume (the other half of a suspend 'n resume) being attempted, something is (probably) being detected as wrong with the disk, a fsck (the check part of file system check in this case actually being fix rather than check) is being run, it is successful and it proceeds on to mount the disk partition. There is a question about why exactly something is being detected as wrong with the disk (repeatedly, if it is repeatedly), but it is getting fixed, so this isn't stopping you from booting.

I didn't think so. If that were the case then GRUB 2.0 would have a problem, right?

Quote:

Originally Posted by salasi (Post 4773559)
Odd, but probably irrelevant. The CPU temp sensor is somehow detected as unreliable, possibly from your CPU version and revision. Doesn't seem to have anything to do with the matter at hand.

I figured

Quote:

Originally Posted by salasi (Post 4773559)
Pass.

Yeay

Quote:

Originally Posted by salasi (Post 4773559)
Now, this is more interesting. sdd3 is 'clean', so that shouldn't cause a problem, but the next thing that happens is that you go to emergency mode (so something is seriously wrong...usually the kind of something that you'd hope an fsck would fix, but that was tried 'automagically' and the fix didn't work). Previously sdd2 was fsck'ed, and that seems to have been fine, but there was no similar message about sdd3 being fine.

Is it sdd3 or a subsequent partition that is causing the problem? Pass.

Is it sdd3 or a subsequent...wait..what?

I've gone through the sd thing before this happened. If I remember correctly:
sda was the 1st of 2 1.5TB Seagate drives
sdb was windows
sdc was the 2nd of 2 1.5TB Seagate drives
sdd was linux

sdd1 is the root (/) if I remember correctly, which would make sdd2 the swap and would make sdd3 where /home goes. I don't think anything happened to /home.

Quote:

Originally Posted by salasi (Post 4773559)
Presumably, you did try to log in as root, as per the prompts. What happened?

Well, it successfully took the password and I became root. Thing is, I have no idea where to go from there.


Quote:

Originally Posted by salasi (Post 4773559)
There is a 1 disk Raid 0 array? Or are there other disks that you haven't mentioned?

It would be useful to know which partitions are used for what purpose, as that may well help to make things clearer. But if I had to make a WAG at this stage, I'd guess that one OS has touched something belonging to the other OS and subsequent attempts to boot are going 'Something unexpected/bad has happened to some of my data. Got to do something about that.'

You haven't tried to do something 'too clever by half' by trying to share a swap partition and then tried to interfere with the boot order (eg, suspend one OS and then try to boot the other) have you?

It's a 2 disk RAID0, not that whole "just a bunch of disks" array. SUSE is on the Samsung HD (250GB) and has the standard 3 partitions. All the Windows partitions are on a Western Digital HD (1TB) (XP Media is C:, XP Home is D:, Server 03 is E:, F: is a blank partition I have set aside for me to learn server 08 if I can get my hands on it, and G: is Win 7). The RAID 0 HDs consist of 2 Seagates (1.5TB each) and are set as W:. There is an external Seagate I use (it comes up as H: in windows, and I believe it was sde in Suse. It is also 1.5TB), but even if it's not connected we hit the same problem (and even before this happened it didn't cause any real issues...just a slightly slower boot).

Also, I know the whole sd stuff mentioned above because I considered creating a RAID1 with the Seagate HDs in Suse, but then after researching I took a rough guess that, even if it was formatted in FAT, Windows wouldn't be able to read it...so I abandoned the idea. Is this what you mean by a 'swap partition'? If so, I've considered it, but never done it. If not, please explain it to me.


Quote:

Originally Posted by salasi (Post 4773559)
If you haven't managed to get the partitions mounted (due to not getting the second set of 'use fsck' errors fixed (the sdd3 ones)), then that is probably to be expected (well, depending on what sdd3 provides...assuming that it provides something that you need, then it is to be expected, anyway).

sdd3, as stated, is /home. I don't think it would have anything necessary.

There's one other thing I can think of that may have caused a problem, but it appeared to have been reversed when I upgraded. Back in 11.3 though, I did mess with how Suse talked with XP (C:\) in...I believe it was fstab (the whole defaults,locale=en_US thing), but 11.4 undid that and made it impossible to re-do it (I know because after the upgrade to 11.4 I found that I couldn't write to C. I went into fstab and it was back to what it would normally be...and it wouldn't allow me to change it) and so I gave up on the idea. 12.1 has had no changes done in that area.

So does that help any?

rigor 09-07-2012 02:57 PM

Fyi, fwiw
 
Hi warriorjames!

I don't know that I can provide any substantial help beyond what salasi already has. But I can address what are perhaps some of the lesser details.
  1. the log messages you posted:
    A) My feeling about kernel log messages in general, is that there are a variety of messages that either should not be present, or should be more fully identified. I believe
    it's largely due to not propagating full context information between kernel levels. There are messages that may occur as a matter of course, they are often, or
    always present. But when someone is troubleshooting, those "routine" messages can cause concern. For example, I have several "media slots", such as for an SD memory
    card. For every slot that doesn't have a card in it, I'll generally get at least one message per empty slot, stating that an open of the associated Linux device failed.
    That's the type of thing I would normally expect to be an error message. Instead it's effectively just saying the slot is empty.
    B) In the case of the attempt to resume, AFAIK, when I boot openSUSE, I see that virtually every time, even if I've completely shut the system down, requiring a "full
    cold boot" to start it again. It's as if no status is available about what sort of "shutdown" I might have done, and so the OS is trying to see if maybe it should
    resume, just in case I might not have done a full shutdown.
    C) As to the thermal sensor message, I get a similar message on my machine. But since I can use a variety of thermal monitor programs from MS-Windows on the same machine,
    and they agree with one another as to the proper readings, those programs appear to work fine. So I strongly suspect it's an issue with openSUSE.
  2. NTFS on Linux:
    Some people insist that NTFS works just fine on modern Linux systems. My experience has been somewhat different. When I've used NTFS in a primary partition, writing only
    from MS-Windows, just reading from Linux, there are no problems of which I'm aware. BUT, when I try to do anything much beyond that, combine NTFS(3g specifically) with
    other "disk handling features", I've usually seen problems, sooner or later. So if you are using NTFS with a RAID array on Linux, I would be very concerned about potential failures.
  3. openSUSE 12.1 in general:
    When it was first released, there were some known, rather fundamental problems, including packages missing. On my machine, no graphics mode install would work, even though
    I was already running 11.4. I was forced to crawl through a text install. Even then, there were crashes during the install, and corruptions of the completed install. I worked through some
    repairs on my own, and others with help from people at SUSE who worked on some trouble tickets I submitted. But even after those early repairs, and after patches had been released,
    I didn't really trust it enough to use it, day to day. Especially since I had been testing some of the milestone versions leading up to the release, and AFAIK, had reported problems that
    had not been fixed by the final release. OTOH ( On The Other Hand ) I have downloaded 12.2, and I am very hopeful about it.

Hope this Helps.

John VV 09-08-2012 05:19 AM

I think it was Truman who wanted a "one-armed economist" BECAUSE of the OTOH ( On The Other Hand ) issue

opensuse12.1 is notorious for needing some " odd " kernel setting for some hardware
some intel built in chips on laptops and some cpu's or other hardware


but this is a raid0 problem
is windows7 the ONLY os using those disks ? ( i do not think so)
Quote:

Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
resume: libgcrypt version: 1.5.0
------------
Mounting root /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2

if so open the box and unplug them

then reboot


you DO have a backup of any important data on it ? right?
if so reformat and install 12.2

warriorjames 09-09-2012 11:16 AM

It's not Windows 7 using the RAID 0, it's XP...and as I said, I initially didn't have a problem.

As for NTFS, it wasn't to the RAID but to Windows/C (as seen from root), since there were things I could only do in Linux, and I wanted to be able to just dump them to C instead of using a flash drive. That worked in 11.3, but 11.4 reversed that change and I have been unable to do it since (thus forcing the flash).

Also, I never had text run for me before this incident; I only saw our little guy with his eye moving when Suse loaded.

I detached the SATA cables to the RAID 0 drives (as John VV suggested). I wound up with this:
------------------------------------------------=-------------------
[drm:atom_get_src_int] *ERROR* ATOM: fb read beyond scratch region: 1245188 vs. 16384 (again, numerous times)

Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
resume: libgcrypt version: 1.5.0
Trying manual resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Invoking userspace resume from /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part1
Waiting for device /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 to appear: ok
fsck from util-linux 2.20.1
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a -C0 /dev/sdd2
/dev/sdd2: clean, 140170/1313280 files, 1372526/5242880 blocks
fsck succeeded. Mounting root device read-write.
Mounting root /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2
mount -o rw,acl,user_xattr -t ex4 /dev/disk/by-id/ata-SAMSUNG_SP2504C_S09QJ1MYC11612-part2 /root
[7.542394] SP5100 TCO timer: mmio address 0xfec000f0 already in use
[7.577207] k19temp 000:00:18.3: unreliable CPU thermal sensor: monitoring disabled
[19.247285] end request: I/O error, dev sr1, sector 4096
[19.247295] Buffer I/O error on dev sr1, logical block 512
[28.166138] end request: I/O error, dev sr1, sector 4096
[28.166190] Buffer I/O error on dev sr1, logical block 512
[37.883277] end request: I/O error, dev sr1, sector 4096
[37.883287] Buffer I/O error on dev sr1, logical block 512
[46.792053] end request: I/O error, dev sr1, sector 4096
[46.792061] Buffer I/O error on dev sr1, logical block 512
systemd-fsck[825]: /dev/sdd3: clean, 52335/13828096 files, 8921639/55280304 blocks
Welcome to emergency mode. Use "systemctl default" or ^D to activate default mode.
Give root password for login:
-------------------------
So...apparently we hit a new error. I'm thinking that a wipe may be the only way to make this work. I'll try booting from the flash drive I have that has a mini version of Linux on it and see what's on the SUSE drive (provided I can get to it).
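For reading those new lines: `sr1` is an optical drive (the sr devices are CD/DVD drives), not one of the hard disks, and each pair of messages looks like the same failed read reported twice. A quick Python check of that, assuming the kernel's usual 512-byte sector unit and a 4096-byte buffer block size (both are assumptions, not something the log states):

```python
SECTOR_BYTES = 512         # assumption: kernel reports sectors in 512-byte units
BUFFER_BLOCK_BYTES = 4096  # assumption: buffer cache block size for this device

def sector_offset(sector):
    """Byte offset of a sector number from an 'end request: I/O error' line."""
    return sector * SECTOR_BYTES

def block_offset(block):
    """Byte offset of a logical block from a 'Buffer I/O error' line."""
    return block * BUFFER_BLOCK_BYTES

# Values from the log: "sector 4096" and "logical block 512" on dev sr1.
same_spot = sector_offset(4096) == block_offset(512)
print(sector_offset(4096), block_offset(512), same_spot)
```

Under those assumptions both messages point at the same byte offset on the disc in that drive, so they do not by themselves say anything about the Samsung disk.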


All times are GMT -5.