LinuxQuestions.org


chiggly 09-18-2005 05:43 PM

all ports down
 
Hi there,

I am hoping someone can help. I recently signed up to a three-year managed OS agreement with *****, under which they installed Red Hat Enterprise Linux.

DirectAdmin, PHP, MySQL, Apache and SSL were also configured. The server had only one website running.

Everything was OK for three months, and then one day we noticed that we could not log on via SSH or FTP.

The services were down. The next day Apache died, with no errors in the log files. Then FTP and SSH went down again. It got to the stage where the server was being rebooted several times a day.

The site had very low volume. I asked my ISP to look into it and they came back with the following:
***********************************************************
The load average is fine, there are no errors at all in any of the log files.

I will continue to monitor the server see if any process does cause a severe spike in system resource usage.

If not, and the problem persists, I would recommend scheduling a time to swap the drives into a new chassis. This will confirm or eliminate bad hardware as the culprit.
***********************************************************

Before we had the chance to look into this any further, no one could log on via SSH. The ISP couldn't get in via their serial consoles, and neither could the support staff in the data centre, though Apache was still running.

If you tried to log in you would get the login prompt and the password prompt, but then the session would just hang and not report anything back, or it would just time out.

Though 'pinging' the server got a reply.
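
A ping reply only shows that the machine answers on the network, not that any service does. A minimal sketch, assuming Python 3 and the standard default ports for FTP, SSH and HTTP (the port numbers and the function names are illustrative assumptions, not anything the ISP ran), of checking whether each service port still accepts TCP connections:

```python
import socket

# Default ports for the services mentioned in this thread (assumed, not confirmed).
SERVICE_PORTS = {"ftp": 21, "ssh": 22, "http": 80}


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_services(host, ports=SERVICE_PORTS, timeout=5.0):
    """Map each service name to True/False: does its port accept connections?"""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}
```

Calling `check_services("server.example.com")` (a hypothetical host name) returns a dictionary such as one entry per service with a True/False value. Note the limit of this kind of check: a login that shows the prompts and then hangs, as described above, would still pass, because the port itself accepts the connection.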

I was then told that they were mounting the disks to stop services from starting at boot, and then booting to see whether they would be able to log in. It was unsuccessful.

THEN..
I was told they were giving me a new chassis (cpu and ram), new disks and I should install the server from scratch again!

I am not a systems administrator but I would like to know if anyone has come across anything like this before with RedHat?

Do services going down sound like a software or hardware problem?

As they cannot tell me what went wrong, I have no confidence that it won't recur, and I do not have the money or time to set everything up again.

Can anyone shed some light? Someone with more knowledge than myself?

Any ideas or advice are much appreciated.

JimBass 09-18-2005 05:52 PM

Chances are very good that it is a hardware failure. Were it Apache or the server itself screwing up, you would have logs of it. Also, the fact that everything breaks (Apache, FTP and SSH) suggests some type of problem that is beyond the control of the OS, meaning something is broken or breaking at the hardware level.

Also, what in the hell is a managed OS agreement if they don't manage the box? Nothing you did caused this problem, so how can they sell you a support contract and then not provide a running machine? Sounds very suspicious. They are pulling the old "cover-thine-own-buttocks" approach to this, which is a fairly good indication that it could be their fault.

Peace,
JimBass

Matir 09-18-2005 06:06 PM

Sounds to me, most likely, like failing hard disk(s). Suspicious program failure usually indicates bad RAM or disk; consistent failures of the same services (after reboot) sound like disk to me.

chiggly 09-18-2005 07:23 PM

Thank you very much for your answers.


They are telling me that they are bending over backwards to help, while trying to say it is my fault and that I NEED to pay to have it all configured again as it is not part of their SLA!!!!!

They are also trying to say I got hacked into... and that I need to fork out an extra £145 a month after 3 'free' months!!

All they monitor for £300 a month is PING!!!!!

They said their hardware monitors did not show anything wrong with their disks or hardware, and that they monitor hundreds of other servers.

Do companies sue if you break out of a contract? I am not confident about setting up a new server if they don't give me a diagnosis. Though you have confirmed what I thought!!!!

Matir 09-18-2005 07:28 PM

This all depends on the contract you have with them. Some observations, though. For 300 pounds (sorry, no pound sign on my keyboard) on a dedicated server, they should damn well monitor more than that.

Who is your current host? I've heard very good things about Rackspace in the UK. Might want to talk to them.

It's possible you were hacked into. It *DOES* happen, like it or not. And if it did happen, it's not their fault. Conversely, if it did happen, all you need to do is reformat and reinstall, not move to new hardware. Something smells fishy here.

chiggly 09-18-2005 07:47 PM

VERIO UK.

I should have gone with Rackspace. I didn't, only because I had been with VERIO since December 1999.

I asked about hacking and they said there was no increase in bandwidth, no unusual activity, nothing on the server. Talking to them, though, it is like they are looking for anything to blame it on but themselves. PLUS, they still have the website up! How? It doesn't make sense if MY SOFTWARE caused the problem, as they tried to suggest.

The fact is that from the 3rd of Sept. to the 13th of Sept. it was ME telling them it was down, and ME telling them to reboot. When it first happened on the 3rd, their support staff said it could be the disks and suggested moving them to a new chassis. They didn't. Every time they rebooted, I asked them to look into the problem and stop rebooting, as it was hiding the true nature of the problem.

My colleague checked all the OS log files: nothing!!! They said their hardware monitors showed nothing. It's the fact that they were so comfortable doing nothing but reboot, then got active towards the end, then calmly said: here's a NEW empty server, start again!!!!!

They only back up data. When I asked if they could restore the server to the state it was in on the 30th of August, I was told that they are 'unable to do a bare metal restore of applications etc. to get you into a state that resembles where you were last week', as ONLY Windows servers have that feature.

So I am out of pocket in terms of re-configuring the server...

I have a high volume site still on the VPS (virtual private server), and to tell the truth, if it had been on that server I would be bankrupt now.

As they are offering no diagnosis of the problem, I do not feel confident about re-configuring (and paying for it) and starting afresh, as I have no assurance it won't happen again.

Matir 09-18-2005 08:50 PM

I would definitely move somewhere else. At the kind of prices you pay, they should offer bare metal restoration services. It's complete BS that they only back up data. On a DEDICATED server, how do they know how your filesystem is set up?

I can't imagine working with this company.

chiggly 09-19-2005 03:18 AM

So bare metal restoration is possible on a LINUX server?

The sales rep told me it was ONLY available on WINDOWS.

Matir 09-19-2005 12:39 PM

They may only provide it on Windows, but yes, it is possible. They could copy your drive, use a tool like Norton Ghost, whatever.

chiggly 10-16-2005 05:12 PM

After I insisted, they have finally told me what went wrong:


We have identified a bug with the auditd daemon that froze access via shell (console and ssh) when the /var partition was more than 80% full. /var is now 21.3% full.


Has anyone ever heard of this, considering the box was hardly used and had little to no server activity?


Thanks in advance.

BoldKiller 10-16-2005 06:09 PM

I did a quick Google. There seem to be quite a lot of known bugs with auditd on Red Hat Enterprise. But I did not see anything saying it would freeze the console and SSH when /var got filled.

Frankly, I don't see how this could happen. If it was something like "when there is less than XX MB", that would make sense. But a percentage??
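
To make the percentage-versus-megabytes distinction concrete, here is a minimal sketch, assuming Python 3. The 80% figure comes from the ISP's statement above; the 100 MB floor and the function name `space_report` are arbitrary illustration choices and have nothing to do with how auditd itself decides anything.

```python
import shutil


def space_report(path, max_percent=80.0, min_free_mb=100.0):
    """Report usage of the filesystem holding `path` against both kinds of threshold."""
    usage = shutil.disk_usage(path)  # total, used and free space in bytes
    # Guard against pseudo-filesystems that report a total size of zero.
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    free_mb = usage.free / (1024 * 1024)
    return {
        "percent_used": percent_used,
        "free_mb": free_mb,
        "over_percent": percent_used > max_percent,  # percentage-style check
        "low_on_space": free_mb < min_free_mb,       # absolute-style check
    }
```

Running `space_report("/var")` on the server would show both views at once. The two checks can disagree: a 10 GB partition that is 80% full still has about 2 GB free, while a 1 GB partition at the same percentage has only about 200 MB. The percentage here may also differ slightly from what `df` prints, because of blocks reserved for root.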

I think they really don't know, but they don't want to say it. I too would be ashamed to tell a client paying me this kind of money that I have no clue what is going on!!

Anyway, good luck.


All times are GMT -5.