Slackware This Forum is for the discussion of Slackware Linux.
|
|
|
02-22-2014, 06:32 PM
|
#31
|
Senior Member
Registered: Nov 2008
Posts: 1,050
Original Poster
Rep:
|
Quote:
You would be amazed at the problems low voltages can create. Unfortunately, you can get memory corruption while data is in RAM, which can mean data corruption when you write that data out to disk. I'd recommend testing the power supply after you pull it to see which voltages are out of spec.
|
I prefer low voltages to voltage peaks, which can severely damage controllers and fry components. How can I test the PSU myself? I have a multimeter; do I just plug the PSU into the outlet and stick the multimeter's probes into specific ports?
I wouldn't want to electrocute myself or fry it and void the warranty. Otherwise, I am replacing it for sure; I don't want to lose $2000 of hardware!
In the meantime, Richard Cranium, can you talk about filesystems? I run ext4 right now on my server's partitions except xfs on my raid5 array where I store my music and movies.
|
|
|
02-22-2014, 07:29 PM
|
#32
|
Member
Registered: Nov 2013
Posts: 748
Rep:
|
didn't read the whole topic
heat is the biggest problem for electronics
for example when my computer heats up doing something intensive,
the PSU heats up too and thus loses efficiency
meaning it has to do more to keep up
meaning it heats up more and gives less stable current
voltages and regulating of those... its complicated
like my cpu (fx8230) has by default "turbo boost" enabled
meaning when running something it would raise frequencies
at those frequencies it is less efficient and thus draws even more power
that not only heats up the cpu, but also the motherboard that converts voltages for it and thus the PSU also
also since they wanted it to be stable at the default speeds, they put it on 1.45V when not idle
i found by trial and error that i can run it at 1.2V at almost full peak speed by turning off the boost (3.4 instead of factory 3.5 + boost to 4)
trial being luxrender at too many threads, its the heaviest program i found so far except a piece of code made just to murder the cpu (use all paths)
and error being a crash, lockup or just getting a CRC message in terminal
ofc ran it for an hour to be sure
now it runs cool (~15-20C lower than factory) while being performant enough
so just turning off boost things will help stability
lowering voltages might or might not help since the cpu uses less power (1.45V vs 1.2V at same current) thus being cooler,
but a cpu needs to make sure the voltages across all its transistors are over ~0.3V, if it can't be ensured errors will happen
more heat moves the stability boundary
btw, i'm an electrician by school
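A rough sketch of the arithmetic behind the 1.45V vs 1.2V comparison above. The two voltages and the 3.5 vs 3.4 GHz clocks come from the post; the scaling models are textbook approximations (P = V * I at fixed current, and dynamic CMOS power proportional to V^2 * f), not measurements of this CPU, and the function names are made up for illustration.

```python
# Compare CPU power at two core voltages under two simple models.
# Assumption: these are idealised scaling laws, not measured figures.

def power_ratio_constant_current(v_new: float, v_old: float) -> float:
    """P = V * I with the current held fixed, as the post assumes."""
    return v_new / v_old


def power_ratio_dynamic(v_new: float, v_old: float,
                        f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Dynamic switching power, P ~ C * V^2 * f (capacitance cancels out)."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    same_current = power_ratio_constant_current(1.2, 1.45)
    dynamic = power_ratio_dynamic(1.2, 1.45)
    dynamic_with_clock = power_ratio_dynamic(1.2, 1.45, f_new=3.4, f_old=3.5)
    print(f"fixed current:        {same_current:.3f} of original power")
    print(f"V^2 scaling:          {dynamic:.3f} of original power")
    print(f"V^2 and clock change: {dynamic_with_clock:.3f} of original power")
```

Under the fixed-current assumption the saving is roughly 17%; under the V^2 model it is closer to a third, which fits better with a 15-20C temperature drop.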
Last edited by genss; 02-22-2014 at 07:34 PM.
|
|
|
02-23-2014, 11:57 AM
|
#33
|
Senior Member
Registered: Nov 2008
Posts: 1,050
Original Poster
Rep:
|
genss thanks for these clarifications.
The server is not overheating. The case ventilation is more than sufficient, and the IPMI sensors are not reporting elevated temperatures. The server was also protected by a UPS, but I wonder how reliable and good it is (BTW it's an APC Br1500 SmartUPS).
There are no boost or overclocking features in use on this server (all disabled in the BIOS).
|
|
|
02-23-2014, 02:29 PM
|
#34
|
Member
Registered: Nov 2013
Posts: 748
Rep:
|
UPSes are fairly simple things, thus reliable (usually ofc)
when plugged into mains they basically do nothing but monitor the mains waveform,
the current passes through them normally (they have a... i forget the name, it's a switch basically)
i looked at the msgs now
i agree with the people saying it's RAID's fault as i remember lots of people complaining about hardware raid
(and it is a fairly complex piece of electronics, thus subject to have bugs as any other complex thing)
other thing it could be is the electronics on the disk itself
so ye, replacing things to check whats at fault
and looking at redhat and others bug reports concerning that RAID
gl
edit: PS it can be hard to check a PSU without a proper test case
despite what people think, they do not give 12V flat (and 5V and.. what was it 3.3V)
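For anyone who does take multimeter readings, here is a minimal sketch of checking them against rail tolerances. The limits used are the commonly cited ATX figures (plus or minus 5% on +12V, +5V, +3.3V and +5VSB, plus or minus 10% on -12V); treat them as an assumption and check the spec for your unit. The readings in the example are invented, and a PSU measured with no load may look fine while still sagging under load.

```python
# Nominal ATX rail voltages and allowed tolerance (fraction of nominal).
# Assumption: commonly cited ATX limits; verify against your PSU's spec.
ATX_RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def rail_limits(rail):
    """Return (lowest, highest) acceptable voltage for a rail."""
    nominal, tol = ATX_RAILS[rail]
    low, high = sorted((nominal * (1 - tol), nominal * (1 + tol)))
    return low, high


def out_of_spec(readings):
    """Return the rails whose measured voltage falls outside tolerance."""
    bad = {}
    for rail, measured in readings.items():
        low, high = rail_limits(rail)
        if not low <= measured <= high:
            bad[rail] = (measured, low, high)
    return bad


if __name__ == "__main__":
    # Hypothetical multimeter readings, made up for illustration.
    readings = {"+12V": 11.2, "+5V": 4.9, "+3.3V": 3.31}
    for rail, (measured, low, high) in out_of_spec(readings).items():
        print(f"{rail}: {measured:.2f} V is outside {low:.2f}..{high:.2f} V")
```

A single static reading only catches a rail that is permanently off; intermittent dips need a load tester or an oscilloscope to see.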
Last edited by genss; 02-23-2014 at 03:13 PM.
|
|
|
02-23-2014, 05:58 PM
|
#35
|
Senior Member
Registered: Nov 2008
Posts: 1,050
Original Poster
Rep:
|
genss, just to be sure, where did you read that I was using hardware RAID?
Same for RedHat. Where did you read that I was using RedHat? I am using Slackware64-14.0.
Again, I agree with you on the PSU replacement. It's fairly (if not very) hard for people to test properly.
Corsair will reply to my RMA tomorrow, I guess. Then I will explain the problem to the rep, but I will maintain: this PSU (other than the fan speeding up once in a while) was totally silent until last Thursday, when the metallic hissing noises started.
|
|
|
02-23-2014, 06:01 PM
|
#36
|
Senior Member
Registered: Apr 2009
Location: McKinney, Texas
Distribution: Slackware64 15.0
Posts: 3,860
|
I've used something like this.
|
|
|
02-23-2014, 06:03 PM
|
#37
|
Senior Member
Registered: Apr 2009
Location: McKinney, Texas
Distribution: Slackware64 15.0
Posts: 3,860
|
Quote:
Originally Posted by lpallard
This PSU (other than the fan speeding up once in a while) was totally silent until last Thursday, when the metallic hissing noises started.
|
That sounds like a leaking capacitor (leaking charge, that is).
|
|
|
02-23-2014, 06:14 PM
|
#38
|
Senior Member
Registered: Nov 2008
Posts: 1,050
Original Poster
Rep:
|
Quote:
Originally Posted by Richard Cranium
That sounds like a leaking capacitor (leaking charge, that is).
|
Would that cause electrical damage to motherboard or only data corruption?
|
|
|
02-23-2014, 07:37 PM
|
#39
|
Member
Registered: Nov 2013
Posts: 748
Rep:
|
Quote:
Originally Posted by lpallard
genss, just to be sure, where did you read that I was using hardware RAID?
Same for RedHat. Where did you read that I was using RedHat? I am using Slackware64-14.0.
Again, I agree with you on the PSU replacement. It's fairly (if not very) hard for people to test properly.
Corsair will reply to my RMA tomorrow, I guess. Then I will explain the problem to the rep, but I will maintain: this PSU (other than the fan speeding up once in a while) was totally silent until last Thursday, when the metallic hissing noises started.
|
"Ok but which one? There are 11 drives in this server.. 5 (2 as RAID1, 3 as RAID0) are using ext4, 6 (all as RAID5) are using XFS.. So if a drive is dying, I suspect one of the 5 drives that are using ext4 and assembled as RAID1 or the RAID0 array...
The 3 drives (out of the 5) used in the RAID0 array are quite old (7+years) so I wouldnt be surprised one of these may be dying. The two other drives used in the RAID1 array are brand new Seagates 2TB and are system drives so critical."
i assumed, don't know why
redhat keeps a good bugzilla, shouldn't matter what distro you use since afaik most of the patches land in the kernel eventually
metallic hissing ?
for one thing, a fan could be a bit broken
or if you hear like a constant high tone
that could be the psu struggling to put out current
either because it itself broke, or 'cuz something is drawing too much current
i would first try how it works without the old disks just to be sure
just my two cents
|
|
|
02-27-2014, 12:44 AM
|
#40
|
Senior Member
Registered: Apr 2009
Location: McKinney, Texas
Distribution: Slackware64 15.0
Posts: 3,860
|
Quote:
Originally Posted by lpallard
Would that cause electrical damage to motherboard or only data corruption?
|
My EE knowledge is from college courses taken over 30 years ago. So while I'm not the last person you should ask, I'm pretty far down the list.
genss appears to have better knowledge than I do. Computer power supplies (other than mainframes) didn't exist when I took my courses. I'd ask him.
|
|
|
02-27-2014, 12:55 AM
|
#41
|
Senior Member
Registered: Apr 2009
Location: McKinney, Texas
Distribution: Slackware64 15.0
Posts: 3,860
|
Quote:
Originally Posted by lpallard
In the meantime, Richard Cranium, can you talk about filesystems? I run ext4 right now on my server's partitions except xfs on my raid5 array where I store my music and movies.
|
Sorry for the delay on this part.
I'm running a mix of file systems on my main machine (the one I'm using to type this). I've got a couple of reiserfs partitions left over from when that was the only one that you could resize while mounted. They have been responsible for several system lock-ups during reboot. If you don't have reiserfs on your box, I think you should keep it that way.
I've had some very odd errors when expanding my xfs partitions lately. Those errors required me to unmount the partition in question and run xfs_repair on it prior to running xfs_growfs. I do not appear to have lost any data, but it has been somewhat unsettling. (For the record, those partitions are really LVM logical volumes that exist on top of software RAID-1 arrays.) I have not detected such errors with my ext3/ext4 partitions.
Even so, I still tend to use xfs which either makes me a wild and crazy guy or a complete *bleeping* idiot. Assuming there's a difference.
|
|
|
02-27-2014, 10:12 AM
|
#42
|
Member
Registered: Mar 2013
Location: Florida, USA
Distribution: Slackware, FreeBSD
Posts: 210
Rep:
|
I haven't read the definitive final update on the power supply issue, so I'm looking at kernel possibilities. Might you try kernel 3.2.29 and see if it makes any kind of improvement? The original Slackware 14.0 kernel was 3.2.29, so if the original Slackware 14.0 kernel worked just fine for you, try that one first. Otherwise, you'll want to build a new kernel from a partition on the good controller, perhaps needing to reboot if the build process stalls. [The kernel itself should still be installed to wherever LILO or GRUB can find it.]
This assumes that you haven't rebuilt glibc. When in doubt, run /lib/libc.so.6 directly (on Slackware64, /lib64/libc.so.6), and it will show you the kernel against which glibc was built. You should use a kernel no older than that.
Should I come across a semi-educated guess on a kernel after 3.2.45, I'll let you know that as well. Such things take time, and again, I'm hoping that the problem is as easy and simple as a weak/failing power supply.
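The "no older than the kernel glibc was built against" rule can be checked with a small script. This is only a sketch: the 3.2.29 value is the one mentioned above, and the helper names are made up for illustration. Versions are compared as tuples of integers because a plain string comparison would rank "3.2.9" above "3.2.29".

```python
import platform
import re


def version_tuple(release):
    """Turn '3.2.45-smp' into (3, 2, 45); suffixes such as '-smp' are ignored."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def kernel_at_least(running, minimum):
    """True if the running kernel is the same as or newer than the minimum."""
    return version_tuple(running) >= version_tuple(minimum)


if __name__ == "__main__":
    built_against = "3.2.29"      # as reported by glibc; value from the post
    running = platform.release()  # release string of the running kernel
    verdict = "ok" if kernel_at_least(running, built_against) else "too old"
    print(f"running {running}, glibc built against {built_against}: {verdict}")
```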
|
|
|
02-27-2014, 09:49 PM
|
#43
|
Senior Member
Registered: Nov 2008
Posts: 1,050
Original Poster
Rep:
|
Because all the references I have found so far pointed to software issues, I am tempted to think this is a kernel bug of some sort. Or perhaps the combination of kernel+mdadm+HDD firmware?
I had this idea in mind a few minutes ago: other than a weak/defective PSU not sending the right voltage to the components and hence causing these intermittent problems, why would this problem happen "way" after I rebooted the system? I mean several minutes (10-20+) or even hours after I reboot the machine.
I mean, if it was a defective controller, hard drive or motherboard, wouldn't the issue occur right after the system became active? Perhaps even during boot?
To me it looks like a buffer somewhere was getting full, causing HDD thrashing and, at the same time, apps starting to crash or slow down.
EDIT: Maybe a bug with Percona MySQL ??
Last edited by lpallard; 02-27-2014 at 11:10 PM.
|
|
|
02-27-2014, 11:48 PM
|
#44
|
Senior Member
Registered: Apr 2009
Location: McKinney, Texas
Distribution: Slackware64 15.0
Posts: 3,860
|
Let me put this rather bluntly. You've got a power supply that's making a hissing noise. Power supplies don't normally make a hissing noise; therefore something is wrong with that power supply. However, let's assume that the power supply isn't broken enough to just outright fail but instead gives out-of-spec voltage and amperage to its outputs. It may even deliver power within specs most of the time, but not all of the time.
So let's say that your +5V line sometimes drops below +4.5V (or further!) during a RAM bank refresh, causing a random bit or more to have a changed value. Let's also say that you've got 4G of RAM. What are the odds that the changed memory will be in a place that contains something your system is using? It depends upon how often the power supply fails to deliver and how randomly it fails to do so. Good luck finding a pattern.
*That's* the problem you might be dealing with.
If I were in your shoes, I'd change the power supply if I could afford it.
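To put a rough number on "what are the odds", here is a back-of-the-envelope sketch. It assumes every bit flip lands at a uniformly random location and that flips are independent, neither of which a failing PSU guarantees; the "fraction of RAM in use" values are made up for illustration.

```python
def p_hit(fraction_in_use, flips):
    """Chance that at least one of `flips` uniformly random, independent
    bit flips lands in memory the system is actually using."""
    if not 0.0 <= fraction_in_use <= 1.0:
        raise ValueError("fraction_in_use must be between 0 and 1")
    if flips < 0:
        raise ValueError("flips must not be negative")
    # Every flip misses the in-use region with probability (1 - fraction).
    return 1.0 - (1.0 - fraction_in_use) ** flips


if __name__ == "__main__":
    # Hypothetical scenarios: share of the 4G that holds live data or code.
    for fraction in (0.25, 0.5, 0.9):
        for flips in (1, 2, 10):
            chance = p_hit(fraction, flips)
            print(f"in use {fraction:.0%}, {flips:2d} flips: {chance:.1%}")
```

With half the RAM in use, a single flip is a coin toss and two flips already give a 75% chance of touching something live, so even rare glitches add up over days of uptime.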
|
|
|
02-28-2014, 07:39 AM
|
#45
|
Senior Member
Registered: Nov 2008
Posts: 1,050
Original Poster
Rep:
|
Hey Richard, thanks for replying.
Quote:
If I were in your shoes, I'd change the power supply if I could afford it.
|
You are absolutely right!! I am not taking any chances. The PSU was sent for RMA 3 days ago! I am replacing it for sure. I am just trying to determine what (other than the PSU) could be causing this because, like I said, if you search Google with chunks of the dmesg output I was getting, you'll ultimately end up on bug reports, kernel oops reports, RAID bugs, etc... all software related, none pointing to a hardware malfunction.
What if the new PSU comes back, I install it and the issue just keeps happening? What then? RMA the mobo? RMA the CPUs?
Last edited by lpallard; 02-28-2014 at 07:40 AM.
|
|
|
All times are GMT -5.