LinuxQuestions.org
Old 12-17-2009, 02:30 AM   #1
LGX
LQ Newbie
 
Registered: Dec 2009
Posts: 3

Rep: Reputation: 0
Centos Server Troubleshoot Tips


Hey all,

I installed CentOS 5.2 on an Intel server box and it has been running smoothly for about one month. A couple of days ago, with no changes made and no new programs added, the server started acting up.

When I try to log in via SSH, there is a delay on every keystroke. The server itself does not perform correctly. When I see this, I usually just reboot the server to fix the issue and it runs perfectly again. About one week later, it does it again.

I ran memtest and all memory passed without any errors. I also ran S.M.A.R.T. tests on the HDDs to see if they are bad, but they pass. I am not sure if there is anything else I can try (Linux commands/tools) to see what is causing this issue. I am also not sure if there are system logs or if they are easy to read for a beginner like myself.

# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/md2    137G   39G    91G   31%  /
/dev/md1    145M   21M   117M   15%  /boot
/dev/md0    3.9G   73M   3.7G    2%  /tmp
tmpfs       3.9G     0   3.9G    0%  /dev/shm

# free -m
             total   used   free  shared  buffers  cached
Mem:          7900   4399   3500       0      199    1689
-/+ buffers/cache:   2510   5390
Swap:         8189      0   8189

Any help would be great.

Regards,
LGX

Last edited by LGX; 12-17-2009 at 02:34 AM.
 
Old 12-17-2009, 11:15 AM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 27,665
Blog Entries: 54

Rep: Reputation: 2952
Actually one of the worst things to do is reboot the server. Sure, it may seem like a nice stop-gap solution, but rebooting makes all process data that might help troubleshoot the issue disappear unless you log it. Only by logging system stats can you get objective numbers to measure performance.

- What services does the machine provide (application names, versions)?
- Do these hiccups occur only from your IP address or others (different networks) as well?
- Could these hiccups be network related?
- Anything interesting in any /var/log/ logs around the time the hiccups occur?
- Do you run Logwatch to keep a tab on all things reported?
- Do you have any CPU hogs? A wee script like
Code:
__genCpuhoglist() { while :; do /bin/ps -e -o pcpu= -o pid= -o args=|sort -bgr -k1|head -10|while read cpu pid command; do
 [ "${cpu%%.*}" -gt 5 ] && logger "CPU: ${command%% *}: ${cpu}%"; done; sleep 10s; done; }
could show them (currently set to a threshold of 5% CPU; matches are written to syslog via logger every 10 seconds).
- Do you run anything like Dstat or Collectl? Atsar or SAR? With one of the first two you would have a macro view of the resource usage on the box. And if you want to run 'top' I would choose Atop instead: it can save process stats which you can replay later on.
- Do you log in over SSH as root account user (BAD)?
* And if you SSH in, use 'screen'. It enables you to re-attach to broken off sessions easily.
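
As a minimal sketch of the "log system stats" advice above: the helper below turns the contents of /proc/loadavg into one line and appends it, with a timestamp, to a file. The function names and the default log path are made up for illustration, and it assumes a Linux /proc filesystem; tools like Dstat, Collectl or sar record far more than this.

```shell
# Hypothetical helper: format the contents of /proc/loadavg (read from stdin).
# A typical input line looks like: "1.20 1.58 1.63 5/273 26678"
format_loadavg() {
    read -r one five fifteen procs _last_pid
    printf 'load1=%s load5=%s load15=%s procs=%s\n' "$one" "$five" "$fifteen" "$procs"
}

# Hypothetical helper: append a timestamped snapshot to a log file.
# The default path /tmp/loadavg.log is an assumption, pick your own.
log_snapshot() {
    logfile=${1:-/tmp/loadavg.log}
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(format_loadavg < /proc/loadavg)" >> "$logfile"
}

# usage (for example from cron, once a minute):
#   log_snapshot /var/tmp/loadavg.log
```

Run from cron, something like this leaves a trail that survives a reboot, so there is something to look at after the next hiccup.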

Whatever you do please be verbose in replying: the more information the better.

Last edited by unSpawn; 12-17-2009 at 11:17 AM.
 
Old 12-17-2009, 12:46 PM   #3
lazlow
Senior Member
 
Registered: Jan 2006
Posts: 4,362

Rep: Reputation: 172
Keep in mind that CentOS only supports the most current dot release. So 5.2 has not had any support since 5.3 came out, and 5.4 is current (two years without an update?). This could be a flaw that is fixed in the later releases. Upgrading from 5.X to 5.X+1 on CentOS is generally (read the release notes first) a simple yum update away.
 
Old 12-17-2009, 01:10 PM   #4
DotHQ
Member
 
Registered: Mar 2006
Location: Ohio, USA
Distribution: Red Hat, Fedora, Knoppix,
Posts: 542

Rep: Reputation: 33
Check /var/log/sa for sar?? files. These can show you how busy the CPU was in 10-minute intervals. They come from the sysstat package, which I believe installs by default. If not, you can install sysstat with yum: yum install sysstat

Have you run "top"? It will show you in real time how busy your CPUs are.

uptime will also show you the load average. Normally you want the load average under 2.0. Top also shows the load average. Load average is key to showing how busy your system really is at any given moment.
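
A small sketch of the load average check described above. It reads a /proc/loadavg-style line and compares the 1-minute value with a threshold; the function name is hypothetical and the 2.0 default is only the rule of thumb from this post, so a box with many cores may want a higher number.

```shell
# Hypothetical helper: print "high" if the 1-minute load average (first field
# on stdin) is above the threshold given as $1, otherwise print "ok".
check_load() {
    threshold=${1:-2.0}
    awk -v t="$threshold" '{ if ($1 + 0 > t + 0) print "high"; else print "ok" }'
}

# usage:
#   check_load 2.0 < /proc/loadavg
```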

cat /etc/resolv.conf

I've seen slow logins because this file was not set up properly. But that would not help to explain the other quirkiness that you've experienced. You might have a multi-issue problem going on here.
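
To make the resolv.conf check a little more concrete, here is a sketch that pulls out just the nameserver entries so each one can be tested by hand. The function name is made up for illustration; it only parses the file and does not test whether the servers actually answer.

```shell
# Hypothetical helper: list the nameserver addresses from a resolv.conf-style
# file given on stdin, one per line. Comments and other keywords are skipped.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# usage:
#   list_nameservers < /etc/resolv.conf
```

If one of the listed servers is unreachable, name lookups can stall until they time out, which is one way a login can feel slow.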
 
Old 12-17-2009, 11:47 PM   #5
LGX
LQ Newbie
 
Registered: Dec 2009
Posts: 3

Original Poster
Rep: Reputation: 0

I will try to capture some logs when this acts up again. I will provide as much info as you requested because I'm running out of ideas on what to do. Again, I do appreciate the help you guys posted on this topic. The information below is from my last hiccup.

- What services does the machine provide (application names, versions)?
There are only basic services (server defaults) being used and only one application installed. It's a Linux remote host controller that connects to other web servers.

- Do these hiccups occur only from your IP address or others (different networks) as well?
I do have multiple IPs bound to the server, and it does impact network traffic on all the other IPs.

- Could these hiccups be network related?
I confirmed with my DC that there were no network-related issues when this happened.

- Anything interesting in any /var/log/ logs around the time the hiccups occur?

On the last issue I had, this was posted in /var/log/messages.1
===============================================================
timeout: status=0xd0 { Busy }
Dec 12 15:28:21 chi01-Fibernetservers-1 kernel: ide: failed opcode was: unknown
Dec 12 15:28:21 chi01-Fibernetservers-1 kernel: hdb: drive not ready for command
Dec 12 15:28:26 chi01-Fibernetservers-1 kernel: hdb: status timeout: status=0xd0 { Busy }
Dec 12 15:28:26 chi01-Fibernetservers-1 kernel: ide: failed opcode was: unknown
Dec 12 15:28:26 chi01-Fibernetservers-1 kernel: hdb: drive not ready for command
Dec 12 15:28:31 chi01-Fibernetservers-1 kernel: hdb: status timeout: status=0xd0 { Busy }
Dec 12 15:28:31 chi01-Fibernetservers-1 kernel: ide: failed opcode was: unknown
Dec 12 15:28:31 chi01-Fibernetservers-1 kernel: hdb: drive not ready for command
Dec 12 15:28:31 chi01-Fibernetservers-1 shutdown[8754]: shutting down for system reboot
Dec 12 15:28:31 chi01-Fibernetservers-1 init: Switching to runlevel: 6
Dec 12 15:28:32 chi01-Fibernetservers-1 smartd[4057]: smartd received signal 15: Terminated
Dec 12 15:28:32 chi01-Fibernetservers-1 smartd[4057]: smartd is exiting (exit status 0)
Dec 12 15:28:33 chi01-Fibernetservers-1 avahi-daemon[3971]: Got SIGTERM, quitting.
Dec 12 15:28:33 chi01-Fibernetservers-1 avahi-daemon[3971]: Leaving mDNS multicast group on interface eth1.IPv6 with address fe80::215:17ff:fe6a:779.
Dec 12 15:28:33 chi01-Fibernetservers-1 avahi-daemon[3971]: Leaving mDNS multicast group on interface eth1.IPv4 with address 208.100.1.1.
Dec 12 15:28:37 chi01-Fibernetservers-1 hcid[3638]: Got disconnected from the system message bus
Dec 12 15:28:38 chi01-Fibernetservers-1 rpc.statd[3500]: Caught signal 15, un-registering and exiting.
Dec 12 15:28:38 chi01-Fibernetservers-1 auditd[3395]: Error sending signal_info request etc.....
===============================================================
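
The excerpt above repeats "status timeout" and "drive not ready" for hdb. As a rough sketch, the helper below tallies such kernel messages per IDE drive so it is easier to see whether one disk is responsible. The function name is hypothetical, and it assumes the syslog layout shown above (month, day, time, host, "kernel:", device).

```shell
# Hypothetical helper: count kernel IDE complaints per drive.
# Reads syslog-style lines on stdin and prints "<device> <count>" per drive.
count_drive_errors() {
    awk '/kernel: hd[a-z]: (status timeout|drive not ready)/ {
             dev = $6              # sixth field is the device, e.g. "hdb:"
             sub(/:$/, "", dev)    # strip the trailing colon
             count[dev]++
         }
         END { for (d in count) print d, count[d] }' | sort
}

# usage:
#   count_drive_errors < /var/log/messages
```

A high count for a single device may point at that drive, its cable or its controller, but the log alone does not say which.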

- Do you run Logwatch to keep a tab on all things reported?
I believe I do not have Logwatch on. Is this something I might need? Is it a default package on the OS, or how do I install it?

- Do you run anything like Dstat or Collectl? Atsar or SAR? With one of the first two you would have a macro view of the resource usage on the box. And if you want to run 'top' I would choose Atop instead: it can save process stats which you can replay later on.

I tried to use Atop, but it looks like it is not installed.
# Atop
-bash: Atop: command not found

- Do you use screen and do you log in over SSH as root or as a regular user?

Yes, I do use screen to run multiple servers, and no, I have a user account that I use if needed.

- Keep in mind that CentOS only supports the most current dot release. So 5.2 has not had any support since 5.3 came out, and 5.4 is current (two years without an update?). This could be a flaw that is fixed in the later releases. Upgrading from 5.X to 5.X+1 on CentOS is generally (read the release notes first) a simple yum update away.

If I cannot find what is causing this issue, I might update to the current CentOS 5.x to see if it helps, but I am not sure whether that will change my current settings or impact the other servers running on this box.

- Check /var/log/sa for sar
I checked this, but I am not sure how to read it and there is a lot of info in this file.
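
One possible way to cut the sar data down to something readable: the sketch below keeps only the CPU report lines where %idle drops below a limit. The function name and the sample file name are assumptions, and it assumes %idle is the last column of the `sar -u` report, which can differ between sysstat versions.

```shell
# Hypothetical helper: from "sar -u" text on stdin, print the intervals whose
# %idle value (assumed to be the last column) is below the limit given as $1.
busy_intervals() {
    limit=${1:-20}
    awk -v limit="$limit" '
        $1 != "Average:" && $NF ~ /^[0-9]+([.][0-9]+)?$/ && $NF + 0 < limit + 0 { print }'
}

# usage (sa12 would be the binary data file for the 12th of the month):
#   sar -u -f /var/log/sa/sa12 | busy_intervals 20
```

Header lines, blank lines and the "Average:" summary are skipped, so what remains is a short list of the times when the CPU was nearly saturated.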

- Have you run "top"? It will show you in real time how busy your CPUs are.
This is top on my server when it is good.

# top
top - 22:39:29 up 3 days, 6:34, 1 user, load average: 1.20, 1.58, 1.63
Tasks: 273 total, 5 running, 268 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.5%us, 5.5%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8090436k total, 5147900k used, 2942536k free, 211696k buffers
Swap: 8385912k total, 0k used, 8385912k free, 1976912k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
25805 eco       20   0  405m 307m  12m S 13.7  3.9  26:53.94 srcds_i686
 5338 demochi   20   0  387m 273m  12m S 11.7  3.5 600:40.17 srcds_i686
 4294 Calvary   20   0  195m 106m  12m S  9.8  1.4 513:05.63 srcds_i686
 4988 delta     20   0  574m 486m  12m R  9.8  6.2 496:33.05 srcds_i686
 5978 delta     20   0  527m 431m  12m S  9.8  5.5 426:02.02 srcds_i686
20680 delta     20   0  456m 358m  12m R  9.8  4.5  79:00.87 srcds_i686
 4524 jhart17   20   0  407m 310m  12m S  7.8  3.9 405:27.87 srcds_i686
 4870 delta     20   0  271m 176m  12m R  7.8  2.2 382:54.18 srcds_i686
 5714 aquapod   20   0  206m 111m  12m R  7.8  1.4 320:28.07 srcds_i686
25074 jok3r100  20   0  203m 106m  12m S  7.8  1.4  18:28.32 srcds_i686
 5592 captthun  20   0  203m 106m  12m S  5.9  1.4 320:18.74 srcds_i686
   19 root     -51  -5     0    0    0 S  2.0  0.0   6:27.21 sirq-timer/1
   58 root     -51  -5     0    0    0 S  2.0  0.0   6:50.14 sirq-timer/4
26678 root      20   0 12740 1116  720 R  2.0  0.0   0:00.01 top
    1 root      20   0 10352  688  572 S  0.0  0.0   0:04.29 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 posixcputmr/0
    5 root     -51  -5     0    0    0 S  0.0  0.0   0:00.00 sirq-high/0
    6 root     -51  -5     0    0    0 S  0.0  0.0   7:43.71 sirq-timer/0
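
To compare a "good" snapshot like the one above with one taken during a hiccup, it may help to total %CPU per command. This is only a sketch: the function name is hypothetical, and it assumes the column layout shown above, with %CPU in the 9th field and COMMAND in the 12th.

```shell
# Hypothetical helper: sum the %CPU column per command name from top output
# on stdin. Only lines that start with a numeric PID are counted.
cpu_by_command() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 12 { cpu[$12] += $9 }
         END { for (c in cpu) printf "%s %.1f\n", c, cpu[c] }' | sort
}

# usage (one batch-mode iteration of top):
#   top -b -n 1 | cpu_by_command
```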
 
  

