LinuxQuestions.org
Old 06-11-2009, 03:47 AM   #1
elliot01
Member
 
Registered: Jun 2009
Location: UK
Distribution: CentOS / RedHat
Posts: 71

Rep: Reputation: 16
Server lockup but NIC responds


Hi all,

I have an annoying issue with an HP ProLiant ML350 server, running Mandrake:
Linux 2.6.8.1-12mdksmp #1 SMP Fri Oct 1 11:24:45 CEST 2004 i686 Intel(R) Xeon(TM) CPU 3.00GHz unknown GNU/Linux

The server's role is purely to run postfix for about 300 users.

It's been doing this task for a number of years, running the same hardware and software from original build (as far as I know).

The problem I have is that three to ten times a week (rough average) it simply locks up. I can ping it when this happens, indicating that something is still alive(?), but it no longer receives/sends mail or accepts SSH logins.
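One way to narrow this down during the next lockup is to check whether the daemons still answer at the application level. The kernel can answer ping and even complete a TCP handshake while userspace is wedged, so a plain "port open" test is not enough. Below is a minimal Python sketch, assuming the usual ports 22 (SSH) and 25 (SMTP), both of which normally send a banner as soon as a client connects. The function name and the address in the usage note are placeholders, not details of this server.

```python
import socket


def probe_banner(host, port, timeout=5.0):
    """Classify a TCP service as 'no-connect', 'no-banner', or 'banner'.

    'no-connect': the TCP connection itself failed (refused, timed out).
    'no-banner' : the handshake completed but the daemon sent nothing.
    'banner'    : the daemon sent a greeting, returned as bytes.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "no-connect", b""

    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except OSError:
            # Includes the timeout case: connected, but nothing was sent.
            return "no-banner", b""

    if not data:
        # The peer closed the connection without sending a greeting.
        return "no-banner", b""
    return "banner", data.strip()
```

Run it from another machine while the server is locked up, for example `probe_banner("10.11.3.10", 22)` and again with port 25 (the address is hypothetical). A `no-banner` result while ping still works would suggest hung userspace rather than a dead network stack.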

When I've physically travelled to site to manually reboot it, forcing a file check never really produces any significant damage or indicates drive failure.

Checking all the /var/log files doesn't suggest any major issues prior to the crashes either.

We've popped the lid and reseated cards, RAM, checked all the fans etc but the problem still persists.

It's in a temperature controlled environment with other business system/reporting/comms servers.

We have already decided to purchase some new mail servers, but I'd really like to re-use this server if possible (albeit at minimal cost!).

Any suggestions of things to try or double check would be very much appreciated.

Thanks.
 
Old 06-12-2009, 12:36 PM   #2
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
Quote:
Originally Posted by elliot01
The problem I have is that three to ten times a week (rough average) it simply locks up. I can ping it when this happens, indicating that something is still alive(?), but it no longer receives/sends mail or accepts SSH logins.
Anything interesting in /var/log/messages when this happens? (I'm not sure which /var/log files you're looking at.)

Quote:
Originally Posted by elliot01
When I've physically travelled to site to manually reboot it, forcing a file check never really produces any significant damage or indicates drive failure.
You might install the SMART utilities and run hard drive tests.

Quote:
Originally Posted by elliot01
We've popped the lid and reseated cards, RAM, checked all the fans etc but the problem still persists.
Have you tested the RAM? If not, you can give that a go with memtest86+.
 
Old 06-12-2009, 01:27 PM   #3
Xeta
LQ Newbie
 
Registered: Apr 2006
Posts: 17

Rep: Reputation: 0
Hmmm, next time this happens can you check that the machine that actually responds to the pings is your machine?
Maybe an arp check?
Could be that another machine on the network is conflicting with your server.

Just a thought.
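A rough sketch of that ARP check in Python, assuming a Linux box on the same subnet as the server (across a router you would only see the router's MAC). It parses the text of /proc/net/arp and compares the MAC recorded for the server's IP with the one you expect. The IP and MAC addresses below are made-up examples.

```python
def parse_arp_table(text):
    """Return {ip: mac} from the contents of /proc/net/arp."""
    table = {}
    for line in text.splitlines()[1:]:  # skip the column header line
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, _device = fields[:6]
        if int(flags, 16) == 0:
            continue  # incomplete entry, no MAC learned yet
        table[ip] = mac.lower()
    return table


def owner_matches(table, ip, expected_mac):
    """True if the ARP table maps ip to expected_mac."""
    seen = table.get(ip)
    return seen is not None and seen == expected_mac.lower()


# Hypothetical sample in the /proc/net/arp layout.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
10.11.3.10       0x1         0x2         00:11:22:33:44:55     *        eth0
10.11.3.1        0x1         0x2         66:77:88:99:AA:BB     *        eth0
10.11.3.20       0x1         0x0         00:00:00:00:00:00     *        eth0
"""

table = parse_arp_table(SAMPLE)
print(owner_matches(table, "10.11.3.10", "00:11:22:33:44:55"))
```

On a real machine, ping the server first so the entry is fresh, then pass `open("/proc/net/arp").read()` instead of `SAMPLE`. If the MAC differs from the server's NIC during a lockup, something else is answering for that IP.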
 
Old 06-16-2009, 05:13 AM   #4
elliot01
Member
 
Registered: Jun 2009
Location: UK
Distribution: CentOS / RedHat
Posts: 71

Original Poster
Rep: Reputation: 16

Hi Anomie,

/var/log/messages never shows anything out of the ordinary (at least from what I can see). We used to have the server under maintenance contract with a specialist company (who originally also supplied and configured it), but they were also stumped.

Here are three examples (five entries each from /var/log/messages and /var/log/syslog preceding each crash, i.e. each restart):

/var/log/syslog
Jun 8 10:00:10 mail postfix/local[20819]: 0704279C23F: to=<rodmaden@hayley-group.co.uk>, orig_to=<rod.maden@hayley-group.co.uk>, relay=local, delay=4, status=sent (delivered to command: /usr/bin/procmail)
Jun 8 10:00:10 mail postfix/qmgr[3306]: 0704279C23F: removed
Jun 8 10:00:10 mail ipop3d[22591]: Logout user=carlbosley2 host=jonest.hayley-group.co.uk [10.11.24.108] nmsgs=0 ndele=0
Jun 8 10:00:10 mail postfix/qmgr[3306]: B213179C27B: from=<colin.jarvis@hayley-group.co.uk>, size=26882, nrcpt=1 (queue active)
Jun 8 10:00:10 mail postfix/smtpd[20177]: disconnect from unknown[10.11.254.8]
Jun 8 10:10:01 mail syslogd 1.4.1: restart.

Jun 8 14:14:21 mail postfix/smtpd[25050]: 0ECA379C289: client=unknown[10.11.254.8]
Jun 8 14:14:21 mail postfix/cleanup[21957]: 0ECA379C289: message-id=<4A2D0DED00004E590001.kayleigh.ilic@hayley-group.co.uk>
Jun 8 14:14:21 mail postfix/qmgr[3289]: 0ECA379C289: from=<kayleigh.ilic@hayley-group.co.uk>, size=24918, nrcpt=1 (queue active)
Jun 8 14:14:21 mail postfix/smtpd[25050]: disconnect from unknown[10.11.254.8]
Jun 8 14:14:21 mail postfix/smtpd[25051]: connect from unknown[10.11.254.8]
Jun 8 14:27:33 mail syslogd 1.4.1: restart.

Jun 9 09:56:53 mail ipop3d[26305]: Logout user=jarrodb host=barwoodj.hayley-group.co.uk [10.11.22.149] nmsgs=0 ndele=0
Jun 9 09:56:53 mail ipop3d[26306]: pop3 service init from 10.11.12.106
Jun 9 09:56:53 mail postfix/cleanup[25338]: 7258C79C2AA: message-id=<20090609085801.DB1483A00FB@mail.georgelodge.co.uk>
Jun 9 09:56:53 mail ipop3d[26306]: Login user=sophie host=nicols.hayley-group.co.uk [10.11.12.106] nmsgs=0/0
Jun 9 09:56:53 mail ipop3d[26306]: Logout user=sophie host=nicols.hayley-group.co.uk [10.11.12.106] nmsgs=0 ndele=0
Jun 9 10:13:13 mail syslogd 1.4.1: restart.

/var/log/messages
Jun 8 09:50:00 mail CROND[21991]: (mail) CMD (/usr/bin/python -S /usr/lib/mailman/cron/gate_news)
Jun 8 09:55:00 mail CROND[22278]: (mail) CMD (/usr/bin/python -S /usr/lib/mailman/cron/gate_news)
Jun 8 10:00:00 mail CROND[22619]: (mail) CMD (/usr/bin/python -S /usr/lib/mailman/cron/gate_news)
Jun 8 10:00:00 mail CROND[22621]: (root) CMD (/bin/backup >/dev/null 2>&1)
Jun 8 10:00:01 mail kernel: end_request: I/O error, dev fd0, sector 0
Jun 8 10:10:01 mail syslogd 1.4.1: restart.

Jun 8 14:12:58 mail sudo: apache : TTY=unknown ; PWD=/u/admin ; USER=root ; COMMAND=/etc/sysconfig/oglfw/global changeuser steveams 100 Steve Crossley
Jun 8 14:12:58 mail sudo: apache : TTY=unknown ; PWD=/u/admin ; USER=root ; COMMAND=/etc/sysconfig/oglfw/installaliases
Jun 8 14:13:14 mail sudo: apache : TTY=unknown ; PWD=/u/admin ; USER=root ; COMMAND=/etc/sysconfig/oglfw/global changeuser steveams 100 Steve Crossley
Jun 8 14:13:14 mail sudo: apache : TTY=unknown ; PWD=/u/admin ; USER=root ; COMMAND=/etc/sysconfig/oglfw/global pwdset steveams BANGBANG1BANGBANGsGSAQ1zZBANGBANGxa8uAYfJD3X6VJu9ekEVX1
Jun 8 14:13:14 mail sudo: apache : TTY=unknown ; PWD=/u/admin ; USER=root ; COMMAND=/etc/sysconfig/oglfw/installaliases
Jun 8 14:27:33 mail syslogd 1.4.1: restart.

Jun 9 09:35:00 mail CROND[23903]: (mail) CMD (/usr/bin/python -S /usr/lib/mailman/cron/gate_news)
Jun 9 09:40:00 mail CROND[24783]: (mail) CMD (/usr/bin/python -S /usr/lib/mailman/cron/gate_news)
Jun 9 09:45:00 mail CROND[25553]: (mail) CMD (/usr/bin/python -S /usr/lib/mailman/cron/gate_news)
Jun 9 09:50:00 mail CROND[25879]: (mail) CMD (/usr/bin/python -S /usr/lib/mailman/cron/gate_news)
Jun 9 09:55:00 mail CROND[26180]: (mail) CMD (/usr/bin/python -S /usr/lib/mailman/cron/gate_news)
Jun 9 10:13:13 mail syslogd 1.4.1: restart.

I can post some more upon request.
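To go through more of the log without reading it by eye, here is a small Python sketch that finds each "syslogd ... restart" line and reports the last entry logged before it, plus the gap between the two. The year is not in the syslog timestamps, so 2009 is assumed. The function name is made up for illustration, and the sample lines are taken from the excerpts above.

```python
from datetime import datetime


def gaps_before_restart(lines, year=2009):
    """Return (last_entry, restart, gap) for every syslogd restart line."""
    results = []
    previous = None
    for line in lines:
        parts = line.split(None, 3)  # month, day, time, rest of line
        if len(parts) < 4:
            continue
        month, day, clock, rest = parts
        try:
            stamp = datetime.strptime(
                f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped line, e.g. a wrapped continuation
        if "syslogd" in rest and "restart" in rest and previous is not None:
            results.append((previous, stamp, stamp - previous))
        previous = stamp
    return results


SAMPLE = [
    "Jun 8 10:00:10 mail postfix/qmgr[3306]: 0704279C23F: removed",
    "Jun 8 10:10:01 mail syslogd 1.4.1: restart.",
    "Jun 8 14:14:21 mail postfix/smtpd[25051]: connect from unknown[10.11.254.8]",
    "Jun 8 14:27:33 mail syslogd 1.4.1: restart.",
]

for last, restart, gap in gaps_before_restart(SAMPLE):
    print(f"last entry {last:%b %d %H:%M:%S}, restart {restart:%H:%M:%S}, gap {gap}")
# The gaps printed for this sample are 0:09:51 and 0:13:12.
```

Pointing it at the whole file, for example `gaps_before_restart(open("/var/log/syslog"))`, gives the time of the last logged activity before every reboot, which makes it easier to see whether the lockups cluster around a particular time of day or cron job.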

/usr/lib/mailman/cron/gate_news <-- I have no idea what this is, except that it seems to run from a separate cron file, every 5 minutes of the day.

Quote:
You might install the SMART utilities and run hard drive tests.
To be honest, I really wouldn't feel comfortable installing anything onto this server, for fear of cocking anything up further (while it's still our primary mail host).

Quote:
Have you tested the RAM? If not, you can give that a go with memtest86+.
We haven't run any RAM tests that I'm aware of (the maintenance company may have before my time). This would be awkward to perform, though, as I can't afford to take the server down (I get plenty enough bitching from users when it crashes; voluntary downtime would certainly not be well received!). It's certainly an option for when the new mail servers are live.

Hi Xeta,

Pinging from my PC, or laptop or from colleagues' workstations also shows pings being returned (we always ping by IP). The second we switch off the server, the pings stop.

It's an interesting thought, but restarting always corrects the issue (this has been going on for about 12 months now). Even if we are on site (working in the server room, for instance) and manage to reboot the server immediately, it never fails to restart fine.

The mail server and firewall are on their own 10.11.3.0 subnet too. No-one would really need or want to assign a device to this subnet, and it would stick out if users were trying to do this for any reason.

Anyone have any other ideas?

Last edited by elliot01; 06-16-2009 at 05:15 AM.
 
Old 06-16-2009, 05:05 PM   #5
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
Quote:
Originally Posted by elliot01
Here are three examples (five entries each from /var/log/messages and /var/log/syslog preceding each crash, i.e. each restart)
I don't see anything alarming in the log entries you posted. It seems unlikely at this point that you're having a software issue.

Quote:
Originally Posted by elliot01
To be honest, I really wouldn't feel comfortable installing anything onto this server, for fear of cocking anything up further (while it's still our primary mail host).
...
We haven't run any RAM tests that I'm aware of (the maintenance company may have before my time). This would be awkward to perform, though, as I can't afford to take the server down (I get plenty enough bitching from users when it crashes; voluntary downtime would certainly not be well received!). It's certainly an option for when the new mail servers are live.
In that case I'd recommend keeping regular backups and getting the new servers deployed ASAP. It sounds like proper hardware analysis may have to wait.
 
Old 06-23-2009, 03:13 AM   #6
elliot01
Member
 
Registered: Jun 2009
Location: UK
Distribution: CentOS / RedHat
Posts: 71

Original Poster
Rep: Reputation: 16
Hi Anomie,

Thank you very much for the time you have spent reading my posts.

It's comforting to know that I haven't blatantly missed an easy fix for these lock-ups.

New servers should be arriving next week. Fingers crossed all goes well.

Best Regards,

Elliot
 
  

