Linux - Server This forum is for the discussion of Linux Software used in a server related context. |
04-17-2011, 11:55 AM
|
#1
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Rep:
|
Server unresponsive, don't see anything in logs
I had a server that wasn't pingable, and the console was just black, so it had to be hard powered down.
Looking in the messages file, these were the last three entries; 11:54 AM was when I brought it back up.
Apr 17 01:04:26 servername ntpd[25230]: kernel time discipline status change 1
Apr 17 04:02:04 servername syslogd 1.4.1: restart.
Apr 17 11:54:46 servername syslogd 1.4.1: restart.
Should I be looking somewhere else to figure out what happened?
|
|
|
04-18-2011, 02:49 AM
|
#2
|
Moderator
Registered: May 2001
Posts: 29,415
|
Since you have posted many threads by now regarding one or more problematic servers, it would be good to add references if this applies to one of those mentioned before, IMO.
That said, your log looks too clean, unless the machine saw no activity at all and no subsystems or services log to syslog, or syslog was dead for some time. If the syslog restarts correspond with any log rotation (check /etc/cron.*, /var/log/cron, /etc/logrotate.d?, /var/lib/logrotate.status), do check /var/log/[logfilename].1(.gz?) in addition to /var/log/[logfilename]. A quick way to get an overview and find any anomalies could be to run Logwatch with --archives and --range Yesterday on all logs. Do check 'lastlog; lastb; last' for any logins, just in case. Apart from /var/log/{kernel,messages,secure,cron,maillog}, which logs you check depends on what services the machine provides. Be sure to reply verbosely if you find anything "interesting".
|
|
|
04-18-2011, 06:28 AM
|
#3
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
Hey unSpawn, it is one of those servers. It just has Samba shares pushing 2TB of data.
The server is down now running file system checks, so I'll go back through those logs and see if I can find anything "interesting" when it's back up.
|
|
|
04-19-2011, 04:52 AM
|
#4
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
The server is finally back up; going to look at the logs now.
I do notice that /proc/mdstat shows the following. I'm guessing that when my system went down hard, something got messed up, but I don't understand what exactly it's doing by resyncing. All my data on md0 and md1 appears to be there. Does anyone know what it's doing exactly?
Personalities : [raid1]
read_ahead 1024 sectors
Event: 2
md0 : active raid1 sdb1[1] sda1[0]
859549696 blocks [2/2] [UU]
resync=DELAYED
md1 : active raid1 sdb2[1] sda2[0]
859533632 blocks [2/2] [UU]
[>....................] resync = 2.1% (18808576/859533632) finish=1404.8min speed=9971K/sec
unused devices: <none>
---------- Post added 04-19-11 at 05:53 AM ----------
Also, what happens if people start adding data to the server before these resyncs finish? Will that cause problems?
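For anyone reading the resync line in the mdstat output above, the finish estimate looks like plain arithmetic: blocks still to copy divided by the current speed. Here is a minimal sketch of that calculation. It assumes the counts in parentheses are 1 KiB blocks and that the speed is in KiB per second; the function name and the regular expression are my own and only written for the line format shown in this post.

```python
import re

# Resync line copied from the /proc/mdstat output in this post.
MDSTAT_LINE = (
    "[>....................] resync = 2.1% (18808576/859533632) "
    "finish=1404.8min speed=9971K/sec"
)


def resync_estimate(line):
    """Return (percent_done, minutes_remaining) for one mdstat resync line.

    Assumption: the (done/total) counts are 1 KiB blocks and speed is KiB/sec,
    so remaining_blocks / speed gives seconds.
    """
    match = re.search(r"\((\d+)/(\d+)\).*speed=(\d+)K/sec", line)
    if match is None:
        raise ValueError("no resync progress found in line")
    done, total, speed = (int(group) for group in match.groups())
    percent = 100.0 * done / total
    minutes = (total - done) / speed / 60.0
    return percent, minutes


if __name__ == "__main__":
    percent, minutes = resync_estimate(MDSTAT_LINE)
    print(f"{percent:.1f}% done, roughly {minutes:.0f} minutes left")
```

The result should land close to the finish= value the kernel prints. The speed figure is rounded, so a small difference is expected.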
|
|
|
04-19-2011, 09:30 AM
|
#5
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
Looking through my cron log, I see activity (I have a script that runs every minute) that leads me to believe the server was up until Apr 17 04:29:01, as that's the last event until after I brought it back up at 11:54. There is nothing in the Samba log other than Samba restarting around 11:54, nothing in secure, I don't have a kernel log to look in, and maillog doesn't show anything useful from what I can tell.
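Since the per-minute cron entries act as a rough heartbeat, the point where the machine stopped can be narrowed down by looking for the largest hole between consecutive timestamps. A minimal sketch, assuming standard syslog-style timestamps at the start of each line. The sample lines are illustrative only: the PIDs, the script path and the message text are hypothetical, and only the 04:29:01 and 11:54 times come from this thread.

```python
from datetime import datetime, timedelta

# Illustrative cron log lines; everything after the timestamp is made up.
SAMPLE = [
    "Apr 17 04:28:01 servername crond[4101]: (root) CMD (/usr/local/bin/minutely.sh)",
    "Apr 17 04:29:01 servername crond[4105]: (root) CMD (/usr/local/bin/minutely.sh)",
    "Apr 17 11:54:46 servername crond[2301]: (CRON) STARTUP (fork ok)",
]


def find_gaps(lines, year, threshold=timedelta(minutes=5)):
    """Yield (last_seen, next_seen) where two consecutive entries are
    further apart than threshold. Syslog timestamps carry no year, so the
    caller supplies one."""
    previous = None
    for line in lines:
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a syslog timestamp
        if previous is not None and stamp - previous > threshold:
            yield previous, stamp
        previous = stamp


if __name__ == "__main__":
    for last_seen, next_seen in find_gaps(SAMPLE, 2011):
        print(f"silent from {last_seen} to {next_seen} ({next_seen - last_seen})")
```

To use it on a real log, pass the lines of /var/log/cron (and any rotated copy) instead of SAMPLE.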
|
|
|
04-19-2011, 05:48 PM
|
#6
|
Moderator
Registered: May 2001
Posts: 29,415
|
As you haven't confirmed the cron schedule, log rotation, the existence and contents of logrotated logs, anything Logwatch could have reported, auth database records, or any info with regard to the services the machine provides, there is literally nothing for me to diagnose or help you with. Bummer. I suggest you set up watches like atop process logging and syslog markers to at least get a "ping" of it being alive, and maybe run Monit or Nagios to keep a watch on the machine and/or services.
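One way to get such an "alive" marker is a small script run from cron that writes a line to syslog, so the last marker in /var/log/messages shows roughly when the box stopped. This is only a sketch under assumptions: it needs a Python 3.6+ interpreter on the machine, and the "heartbeat" tag and message layout are hypothetical choices of mine, not anything the tools mentioned above produce.

```python
import os
import syslog


def heartbeat_message():
    """Build a one-line 'still alive' message that includes the load averages."""
    load1, load5, load15 = os.getloadavg()
    return f"heartbeat load={load1:.2f}/{load5:.2f}/{load15:.2f}"


def send_heartbeat():
    """Send the heartbeat line to syslog under the daemon facility."""
    syslog.openlog("heartbeat", syslog.LOG_PID, syslog.LOG_DAEMON)
    syslog.syslog(syslog.LOG_INFO, heartbeat_message())
    syslog.closelog()


if __name__ == "__main__":
    send_heartbeat()
```

Including the load averages means the last marker before a hang might also hint at whether the machine was struggling beforehand.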
|
|
|
04-20-2011, 09:52 AM
|
#7
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
The only thing cron'd on that server that would have run around that time is the task I mentioned in my post about the cron log. I checked all the cron files as you suggested. Doesn't logrotate just rotate the log files within /var/log? I already looked through all of them, as mentioned in my previous post.
As I stated in my first reply to you when you asked which server, this server is just a Samba server, which also has mail to send some stuff out on occasion. I guess you were looking for more specifics as to what I checked? Also, I've never used Logwatch before, so I'm not sure how to go about doing that. But I did look through every log in /var/log manually for any "interesting" stuff, as you first suggested. I was the last person to log onto the server before it happened, and all I did was a df on it.
When the server was just at a black screen, it wouldn't respond to keystrokes at the console, to SSH connections, or to pings. I forgot to mention SSH in my OP.
I guess I'm not understanding what info I haven't provided, other than whatever Logwatch can do. I still only use Linux minimally. I'm not trying to be difficult; sorry if this is bumming you out. It really just looks like "something" died and the server stopped logging events, at least in the places I know to check.
I'm really not trying to be a pain here, which, based on your last response, I'm guessing I am.
|
|
|
04-20-2011, 09:54 AM
|
#8
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
I'll just open another thread to see if someone can help me understand that /proc/mdstat stuff, as I don't get what it's actually resyncing. Don't feel like you need to reply any more, unSpawn; you've already gone above and beyond.
|
|
|
04-20-2011, 04:32 PM
|
#9
|
Moderator
Registered: May 2001
Posts: 29,415
|
No, that's not it, and I missed your comment about the basic marks cron leaves for the per-minute cronjob, sorry. The problem is that Linux doesn't come with extensive auditing features enabled out of the box to let you troubleshoot less common problems easily. If the machine is comparable with another one in terms of HW / distro / release / userland SW, then you could diff them and see if there's any clue in that; otherwise you may need additional tools to help you troubleshoot the issue. Watching service processes and logging connectivity from and towards the machine could be a first step.
|
|
|
04-20-2011, 04:52 PM
|
#10
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
Yeah, it's weird. It's like the system just stopped doing everything, at least from what I can tell. It's an older server; I was just hoping there might be something somewhere that would tell me whether it's a hardware failure or something else. I appreciate all the help with this.
|
|
|
04-24-2011, 10:29 AM
|
#11
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
unSpawn, it just happened again, second week in a row. Can you give me some advice on how to check that other stuff I don't know about? This is way over my head at this point. :'(
|
|
|
04-25-2011, 08:03 AM
|
#12
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
OK, looking in messages and messages.1, this is the last stuff that shows up before I brought the server back up shortly after noon. This time, though, the cron log, which logs an event every minute, shows the last event at 9:02 and the next at 12:09:30, which is when cron started. So I guess this time it didn't actually die until after 9:02, which is different than last time based on the cron logs.
Apr 24 04:02:04 ServerNameReplaced nmbd[2212]: [2011/04/24 04:02:04, 0] nmbd/nmbd.c:process(542)
Apr 24 04:02:04 ServerNameReplaced syslogd 1.4.1: restart.
Apr 24 04:02:04 ServerNameReplaced nmbd[2212]: Got SIGHUP dumping debug info.
Apr 24 04:02:04 ServerNameReplaced nmbd[2212]: [2011/04/24 04:02:04, 0] nmbd/nmbd_workgroupdb.c:dump_workgroups(284)
Apr 24 04:02:04 ServerNameReplaced nmbd[2212]: dump_workgroups()
Apr 24 04:02:04 ServerNameReplaced nmbd[2212]: dump workgroup on subnet 10.1.2.34: netmask= 255.255.254.0:
Apr 24 04:02:04 ServerNameReplaced nmbd[2212]: US(1) current master browser = UNKNOWN
Apr 24 04:02:04 ServerNameReplaced nmbd[2212]: ServerNameReplaced 40009a03 (Samba_photo1)
Apr 24 12:09:31 ServerNameReplaced syslogd 1.4.1: restart.
|
|
|
04-25-2011, 12:47 PM
|
#13
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
OK, I think I figured out what you meant in post #2. If I look at /etc/crontab, cron.daily runs at 4:02 every day, and within cron.daily is a logrotate job. I don't see anything unusual in the cron log, as I mentioned in my last post, other than that the timing is later than it was last Sunday.
Does anyone have any ideas where to go next with this? I'm totally at a loss.
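Given that cron.daily (and with it logrotate) runs at 4:02, the "syslogd 1.4.1: restart." lines in messages can be split into routine rotations and everything else, which here means the restart after the machine was powered back on. A minimal sketch using the two restart lines quoted in this thread; the function name, the five-minute window and the labels are my own assumptions.

```python
from datetime import datetime, timedelta

CRON_DAILY = (4, 2)  # hour and minute of the cron.daily run, per /etc/crontab

# The two syslogd restart lines quoted earlier in this thread.
LINES = [
    "Apr 24 04:02:04 ServerNameReplaced syslogd 1.4.1: restart.",
    "Apr 24 12:09:31 ServerNameReplaced syslogd 1.4.1: restart.",
]


def classify_restarts(lines, year, cron_daily=CRON_DAILY, window=timedelta(minutes=5)):
    """Label each syslogd restart as 'log rotation' if it falls within
    `window` after the cron.daily time on the same day, else 'other'."""
    results = []
    for line in lines:
        if "syslogd" not in line or "restart" not in line:
            continue
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        scheduled = stamp.replace(hour=cron_daily[0], minute=cron_daily[1], second=0)
        near = timedelta(0) <= stamp - scheduled <= window
        results.append((line[:15], "log rotation" if near else "other"))
    return results


if __name__ == "__main__":
    for when, label in classify_restarts(LINES, 2011):
        print(when, label)
```

A restart labelled "other" is the one worth lining up against the last cron entry before the hang.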
|
|
|
04-25-2011, 01:21 PM
|
#14
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
And it just happened again. Guess it isn't an every-Sunday thing then...
|
|
|
04-26-2011, 05:07 AM
|
#15
|
Senior Member
Registered: Jun 2009
Posts: 1,795
Original Poster
Rep:
|
OK, I actually found something out. I was able to look at the console while it was in the middle of going down, and it keeps repeating these messages on the screen:
aacraid: Host adapter reset request
percraid: Host adapter dead -3
|
|
|
All times are GMT -5. The time now is 10:22 AM.