LinuxQuestions.org
Old 07-28-2011, 06:26 PM   #1
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,836
Blog Entries: 1

Rep: Reputation: 1251
Hardware/CPU problems.


edit: sorry the photos did not show rotated for some reason.

I've recently flashed my BIOS and noticed something strange. My CPU is an Intel i7 920 2.66GHz.
That was the frequency I had with the previous BIOS version (F2).

I downloaded the new firmware from here:
http://www.gigabyte.com/products/pro...?pid=3251#bios
I upgraded it from F2 to F7. I get the .exe file, unpack it with 7z and flash the BIOS from a USB stick. The reason I upgraded to F7 is that the most recent versions, F8/F9, don't seem to work: QFlash in the BIOS complains about an incorrect file size and won't flash them.

Anyway, back to the issue: the BIOS doesn't seem very stable. Sometimes it hangs when I browse through the settings. The reason I flashed the BIOS in the first place is that the old firmware apparently had some issues with SATA3 controllers (I'm still waiting for my SSD). I managed to configure most of the BIOS settings as they were before, apart from the CPU frequency.

Although it "seems" to be set as 2.66GHz there:
http://s1092.photobucket.com/albums/...nt%3Dbios1.jpg
The other BIOS screen shows 2.8GHz:
http://s1092.photobucket.com/albums/...nt%3Dbios2.jpg

And when I boot the computer it displays 2.8GHz (133x21) as well:
http://s1092.photobucket.com/albums/...ootmessage.jpg

I have run memtest, which didn't report any errors. I have also run a CPU test from an Arch live CD; the output is linked below. To be honest, I don't know how to interpret it.

http://s1092.photobucket.com/albums/...%3Dcputest.jpg

I've been testing my Slackware64-current system with this configuration for a few days. It doesn't crash under normal desktop usage. I also tried to stress test it with an online bitcoin generator, and the system always freezes after less than 30 seconds. Before it freezes, the output of 'top' shows java using between 600 and 700% of CPU.
I also started testing it with kernel compilation. It compiles fine up to -j6. With -j6 and no GUI it completed successfully every time, but when I startx and run some programs (Thunderbird/Firefox) at the same time, it crashes the system. With -j7 and -j8 it always crashes the system.
This is the error I get:
http://s1092.photobucket.com/albums/...or-compile.jpg

Please note that I had been getting the "kernel hardware error no human readable mce decoding support on this cpu type" error regularly before I flashed the BIOS. Every now and then it just pops up on the console. Also, compiling with higher -j values used to crash my system before I upgraded the BIOS firmware as well, so it might not be directly connected. Besides, I found the following information on the mcelog website: http://mcelog.org/faq.html#13
Whenever this error popped up, I tried 'mcelog --ascii' but it didn't show anything. So I'm not sure if the mcelog error is a real hardware fault or just a bug in newer kernels (as the mcelog page suggests). I also don't know if any of the things I've described here are related.
I'd like to sort this out before I install my SSD, to rule out any stability issues.

Thank you for your suggestions.

Last edited by sycamorex; 07-28-2011 at 06:28 PM.
 
Old 07-28-2011, 10:00 PM   #2
afreitascs
Member
 
Registered: Aug 2004
Distribution: Debian
Posts: 443

Rep: Reputation: 30
Big problem!

When I update the BIOS, I download the same file from different servers to check that the md5sums are the same. What I mean is that you may have downloaded a corrupted file.
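As a rough sketch of that check, the two downloads could be compared with Python's standard hashlib module. The file names below are made up for illustration (they are not the real Gigabyte archive names), and matching digests only show that both mirrors served the same bytes, not that the image suits the board.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_download(path_a, path_b):
    """True when both files have identical MD5 digests."""
    return md5_of(path_a) == md5_of(path_b)


if __name__ == "__main__":
    # Hypothetical names: the same BIOS archive fetched from two mirrors.
    first = Path("bios_from_mirror_a.exe")
    second = Path("bios_from_mirror_b.exe")
    if first.exists() and second.exists():
        print("match" if same_download(first, second) else "MISMATCH, download again")
    else:
        print("put the two downloads next to this script first")
```

On the command line, running md5sum on both files and comparing the output by eye does the same job.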

You can also chat with Gigabyte support:

http://www.gigabyte.com/support-down...l-support.aspx

See also ...

http://forum.giga-byte.co.uk/index.php?topic=2753.0

I think the CPU temperature is a little high! 58 °C!

good luck
 
1 members found this post helpful.
Old 07-30-2011, 08:55 AM   #3
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,836

Original Poster
Blog Entries: 1

Rep: Reputation: 1251
Thanks for your reply.

I'll need to look for answers on some Gigabyte/OCZ forums, as it seems to be an issue with incorrect BIOS settings. For 2 days I've been running the system with 'safe' BIOS settings and it can generate bitcoins and compile the kernel (-j8) at the same time without any problems. Also, I haven't received any errors from mcelog yet. According to the Wikipedia entry, machine check exceptions may result from overclocking, but my previous BIOS settings had the RAM frequency BELOW the manufacturer's specs and the CPU at exactly 2.66GHz (which is the advertised frequency for the i7 920). Mind you, I don't know anything about voltages and a great number of other BIOS settings, so I guess it wasn't optimally configured.

I realise it's highly unlikely but has anyone got the very same specs by any chance?

Intel Core i7 920 D0 2.66GHz Socket 1366 8MB Cache
Gigabyte GA-X58A-UD7 X58 Socket 1366 8 Channel Audio ATX (rev 1.0, BIOS: F7)
OCZ 6GB (3x2GB) DDR3 2000MHz Gold Memory Kit CL10(10-10-10-30) 1.65V
 
Old 07-30-2011, 07:33 PM   #4
afreitascs
Member
 
Registered: Aug 2004
Distribution: Debian
Posts: 443

Rep: Reputation: 30
See if this helps. It is a very thorough review of this board:

http://www.hardwarecanucks.com/forum...rd-review.html
 
1 members found this post helpful.
Old 07-31-2011, 06:47 AM   #5
cascade9
Senior Member
 
Registered: Mar 2011
Location: Brisneyland
Distribution: Debian, aptosid
Posts: 3,753

Rep: Reputation: 935
Quote:
Originally Posted by sycamorex
I'll need to look for answers on some Gigabyte/OCZ forums, as it seems to be an issue with incorrect BIOS settings. For 2 days I've been running the system with 'safe' BIOS settings and it can generate bitcoins and compile the kernel (-j8) at the same time without any problems. Also, I haven't received any errors from mcelog yet. According to the Wikipedia entry, machine check exceptions may result from overclocking, but my previous BIOS settings had the RAM frequency BELOW the manufacturer's specs and the CPU at exactly 2.66GHz (which is the advertised frequency for the i7 920).
I don't think that it was due to incorrect BIOS settings.

The i7s (along with most other CPUs that have ever come out of Intel factories) should be multiplier locked, in the case of the i7 920 at x20. Your 'overclock' came from bumping the multi to x21, which shouldn't be possible (though IIRC that is how 'turbo' mode works, by bumping the multi).

I wouldn't be surprised if that was just a symptom of some more serious problem; 58C in BIOS seems *very* high.

I'd be trying to reflash the BIOS myself.

BTW, going from 2.66GHz to 2.8GHz shouldn't cause any serious problems in itself. If there was some other hidden/non-obvious problem (like the CPU voltages being too high), that could cause problems.
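For reference, the two figures follow from base clock times multiplier. A minimal sketch of the arithmetic, assuming the "133" shown in the BIOS is the nominal 133.33 MHz base clock (the function name is just for illustration):

```python
BCLK_MHZ = 400 / 3  # assumed nominal base clock, displayed as "133" in the BIOS


def core_clock_mhz(multiplier, bclk_mhz=BCLK_MHZ):
    """Core clock in MHz for a given CPU multiplier."""
    return bclk_mhz * multiplier


for multiplier in (20, 21):
    ghz = core_clock_mhz(multiplier) / 1000
    print(f"133 x {multiplier} = {ghz:.2f} GHz")
```

With x20 this comes out at roughly 2.67 GHz (sold as 2.66GHz), and with x21 at 2.80 GHz, which matches the 133x21 boot message.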

Quote:
Originally Posted by afreitascs
When I update the BIOS, I download the same file from different servers to check that the md5sums are the same. What I mean is that you may have downloaded a corrupted file.
Great idea, I'd never thought of doing that.
 
1 members found this post helpful.
Old 08-01-2011, 08:20 AM   #6
cascade9
Senior Member
 
Registered: Mar 2011
Location: Brisneyland
Distribution: Debian, aptosid
Posts: 3,753

Rep: Reputation: 935
I know, double posting is bad. I would have just edited my last post, but I don't think that posters/the OP get updates if I just edit a post.

Anyway, high CPU temps in BIOS, and a multi that makes me think you are running turbo? I'd add both of them together, and that sure looks like your CPU is under load in BIOS.

I feel really silly for not getting this yesterday. I'll blame the head injury I gave myself a few days ago.

Last edited by cascade9; 08-01-2011 at 08:22 AM.
 
Old 08-01-2011, 09:25 AM   #7
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,836

Original Poster
Blog Entries: 1

Rep: Reputation: 1251
Quote:
Originally Posted by cascade9
I know, double posting is bad. I would have just edited my last post, but I don't think that posters/the OP get updates if I just edit a post.

Anyway, high CPU temps in BIOS, and a multi that makes me think you are running turbo? I'd add both of them together, and that sure looks like your CPU is under load in BIOS.

I feel really silly for not getting this yesterday. I'll blame the head injury I gave myself a few days ago.
I managed to install the latest BIOS firmware (F9a). It hasn't changed much. Yes, you were right, I was running it in turbo mode. I switched it back to standard mode and it shows as 133x20 now.

Now I'm inclined to believe that there's something wrong with my RAM. The only variable that makes any difference in BIOS is the RAM multiplier. If I set it to 14 (=1866MHz), as I had been running it for 2 years, the CPU temperature in BIOS fluctuates between 55-58C and the computer crashes when doing memory-intensive tasks. When, however, I change the multiplier to 9 (1066MHz), the temperature in BIOS drops to around 46C and it doesn't crash when compiling/generating bitcoins. The DRAM voltage is set to 1.64V (the specifications for the RAM show 1.65V, but I have never been able to set it in BIOS to exactly 1.65: I can do 1.64 or 1.66, but there's a red warning next to 1.66, so I've never tried it). I have run memtest86 and it didn't show any errors.

BTW, I hope your head is getting better
 
Old 08-01-2011, 10:25 AM   #8
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,148
Blog Entries: 2

Rep: Reputation: 4886
Quote:
Originally Posted by sycamorex
Now I'm inclined to believe that there's something wrong with my RAM. The only variable that makes any difference in BIOS is the RAM multiplier. If I set it to 14 (=1866MHz), as I had been running it for 2 years, the CPU temperature in BIOS fluctuates between 55-58C and the computer crashes when doing memory-intensive tasks. When, however, I change the multiplier to 9 (1066MHz), the temperature in BIOS drops to around 46C and it doesn't crash when compiling/generating bitcoins. The DRAM voltage is set to 1.64V (the specifications for the RAM show 1.65V, but I have never been able to set it in BIOS to exactly 1.65: I can do 1.64 or 1.66, but there's a red warning next to 1.66, so I've never tried it). I have run memtest86 and it didn't show any errors.
Two things:
1. The maximum voltage for RAM on the Intel i7 for socket 1366 is 1.65V; Intel warns that the memory controller may be damaged when going higher.
2. AFAIK, the clock speed of the memory controller on that CPU is directly related to the clock speed of the RAM. If I remember correctly, the multiplier for the Uncore part of the CPU (which, among other things, contains the memory controller) must always be double the memory multiplier. Intel states that the maximum memory speed for that CPU is DDR3-1066. Running the memory controller with 1866 settings is a massive overclock for the memory controller, which may have led to degradation of your memory controller. But it may also be that you simply forgot to adapt the Uncore multiplier and that is causing your issues. The default value of that multiplier on an i7 920 is 16, which corresponds to 2133 MHz (16*133MHz), exactly what you need for DDR3-1066. To run your machine with DDR3-1866 you logically have to set it to 28 (about 3733 MHz, 175% of the normal clock speed).
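The numbers in point 2 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming a stock 133.33 MHz base clock and taking the "uncore multiplier is double the memory multiplier" rule from above at face value; the function names are made up for illustration.

```python
BCLK_MHZ = 400 / 3  # assumed stock base clock of 133.33 MHz


def memory_multiplier(ddr3_rating):
    """Memory multiplier needed for a DDR3 rating such as 1066 or 1866."""
    return round(ddr3_rating / BCLK_MHZ)


def uncore_multiplier(mem_multiplier):
    """Uncore multiplier under the 'double the memory multiplier' rule."""
    return 2 * mem_multiplier


def uncore_clock_mhz(unc_multiplier):
    """Resulting uncore clock in MHz."""
    return BCLK_MHZ * unc_multiplier


for rating in (1066, 1866):
    mem = memory_multiplier(rating)
    unc = uncore_multiplier(mem)
    print(f"DDR3-{rating}: memory x{mem}, uncore x{unc}, "
          f"uncore clock {uncore_clock_mhz(unc):.0f} MHz")
```

For DDR3-1066 this gives memory x8 and uncore x16 (2133 MHz); for DDR3-1866 it gives memory x14 and uncore x28 (3733 MHz), i.e. 28/16 = 1.75 times the stock uncore clock.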
 
1 members found this post helpful.
Old 08-01-2011, 12:46 PM   #9
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,836

Original Poster
Blog Entries: 1

Rep: Reputation: 1251
Quote:
Originally Posted by TobiSGD
Two things:
1. The maximum voltage for RAM on the Intel i7 for socket 1366 is 1.65V; Intel warns that the memory controller may be damaged when going higher.
2. AFAIK, the clock speed of the memory controller on that CPU is directly related to the clock speed of the RAM. If I remember correctly, the multiplier for the Uncore part of the CPU (which, among other things, contains the memory controller) must always be double the memory multiplier. Intel states that the maximum memory speed for that CPU is DDR3-1066. Running the memory controller with 1866 settings is a massive overclock for the memory controller, which may have led to degradation of your memory controller. But it may also be that you simply forgot to adapt the Uncore multiplier and that is causing your issues. The default value of that multiplier on an i7 920 is 16, which corresponds to 2133 MHz (16*133MHz), exactly what you need for DDR3-1066. To run your machine with DDR3-1866 you logically have to set it to 28 (about 3733 MHz, 175% of the normal clock speed).
Tobi, I think you've nailed the problem. I did not realise that RAM clock speed could be limited by the CPU. It makes sense now. The uncore multiplier has always been set to auto, which, as I just checked, defaults to 16 (giving 2133MHz), as you said.
I hope I haven't done much damage to my hardware by overclocking it for 1.5 years.

Thank you.
 
Old 08-04-2011, 04:41 AM   #10
cascade9
Senior Member
 
Registered: Mar 2011
Location: Brisneyland
Distribution: Debian, aptosid
Posts: 3,753

Rep: Reputation: 935
Quote:
Originally Posted by sycamorex
Tobi, I think you've nailed the problem. I did not realise that RAM clock speed could be limited by the CPU. It makes sense now. The uncore multiplier has always been set to auto, which, as I just checked, defaults to 16 (giving 2133MHz), as you said.
I hope I haven't done much damage to my hardware by overclocking it for 1.5 years.
The 'tweaky' 'overclockers' RAM has always been more fiddly than 'normal' RAM. I've worked on a number of machines owned by friends with high-rated DDR2/DDR3, and when I've checked the BIOS I've found that they are running at DDR2-667/800 or DDR3-1333, not the much faster speed the RAM is rated at.

It's mainly an SPD problem.

I doubt you've done any damage to your memory controller; I know people who have pushed just as hard as that for longer periods. I'd be far more worried about the whole 'CPU under load in BIOS' thing than about overclocking the memory controller.

Quote:
Originally Posted by sycamorex
BTW, I hope your head is getting better
Thanks, it's getting there...slowly.
 
Old 08-04-2011, 05:05 AM   #11
sycamorex
LQ Veteran
 
Registered: Nov 2005
Location: London
Distribution: Slackware64-current
Posts: 5,836

Original Poster
Blog Entries: 1

Rep: Reputation: 1251
It's been running stable since my last post. From time to time I check the CPU temperature in BIOS and it's around 42C, which seems acceptable.
 
  

