LinuxQuestions.org
Red Hat: This forum is for the discussion of Red Hat Linux.

12-30-2014, 02:21 AM #1
Bouki
LQ Newbie
 
Registered: May 2011
Posts: 24

Rep: Reputation: Disabled
Analyze Boot logs


Hello guys.
About 4 days ago one of our servers had trouble during the reboot process.
I captured all the messages and dmesg logs.
How can I analyze these files?
I've also attached the dmesg log file and hope someone can help.
Attached Files
File Type: log boot_messages.log (120.9 KB, 42 views)
 
12-30-2014, 04:11 AM #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by Bouki
About 4 days ago one of our servers had trouble during the reboot process.
Please describe in detail the symptoms and the steps that were taken at the time to assess the problem.


Quote:
Originally Posted by Bouki
I captured all the messages and dmesg logs.
How can I analyze these files?
By reading them.
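When the complaint is a slow boot, one practical way to start reading a saved dmesg log is to look for long pauses between consecutive kernel timestamps. The sketch below assumes the log lines carry the usual `[   12.345678]` prefix that dmesg shows when kernel timestamps are enabled; the 5-second threshold and the function name are arbitrary choices for illustration.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Matches the "[   12.345678] message" prefix dmesg prints when kernel
# timestamps are enabled (assumption: the saved log contains them).
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def find_gaps(lines: Iterable[str], min_gap: float = 5.0) -> List[Tuple[float, str, str]]:
    """Return (gap_seconds, line_before, line_after) for pauses >= min_gap."""
    gaps = []
    prev_ts, prev_msg = None, None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a kernel timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if prev_ts is not None and ts - prev_ts >= min_gap:
            gaps.append((ts - prev_ts, prev_msg, msg))
        prev_ts, prev_msg = ts, msg
    # Longest pauses first: these are where the boot appeared to stall.
    return sorted(gaps, key=lambda g: g[0], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for gap, before, after in find_gaps(fh):
            print(f"{gap:8.1f}s pause between:\n    {before}\n    {after}")
```

The message just before a long pause is usually the best hint of what the kernel was waiting on.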


Quote:
Originally Posted by Bouki
I've also attached the dmesg log file and hope someone can help.
Because you have not (yet) said what you would like us to look at, it makes no sense for us to read it for you.
*If you're worried about the "TCP: Peer unexpectedly shrunk window" line, note that this is an informational-level message (not debug, err, warn or crit) because the kernel fixed things itself: hence the "(repaired)" part.
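To see at a glance which kernel messages are above informational level, you can keep the `<N>` priority prefix that `dmesg -r` (raw mode of the util-linux dmesg) prints and filter on it. A minimal sketch, assuming every line of interest starts with that prefix; the numbers are the standard syslog priorities, 0 (emerg) through 7 (debug), and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Standard syslog priorities: lower number means more severe.
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# Raw kernel log lines look like "<4>[   12.3] message" or "<4>message".
PRIO_RE = re.compile(r"^<(\d+)>(.*)$")


def at_or_above(lines: Iterable[str], threshold: str = "warning") -> List[Tuple[str, str]]:
    """Return (level_name, message) for lines at least as severe as threshold."""
    limit = LEVELS.index(threshold)
    hits = []
    for line in lines:
        m = PRIO_RE.match(line.rstrip("\n"))
        if not m:
            continue  # no priority prefix: cannot classify this line
        # Non-kernel entries also encode a facility in the number,
        # so keep only the low 3 bits, which hold the priority.
        level = int(m.group(1)) & 7
        if level <= limit:
            hits.append((LEVELS[level], m.group(2)))
    return hits
```

For example, save `dmesg -r` output to a file and pass its lines to `at_or_above(lines, "err")` to list only errors and worse.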
 
12-30-2014, 05:07 AM #3
Bouki
LQ Newbie
 
Registered: May 2011
Posts: 24

Original Poster
Rep: Reputation: Disabled
Thank you so much, dear.
Let me explain exactly what happened:
At first the server froze and we lost ping to it.
We tried to connect with ssh but couldn't. Then we ran a reboot on the server. After about 15 minutes with nothing happening, I pressed the reset button on the server.
The server powered off and started booting, but again the boot process took very long (about 30 minutes!!!!). I pressed the reset button again, and this time the server booted normally; it took about 5 or 6 minutes.
This happened about 6 months ago.

Would you please explain the "TCP: Peer unexpectedly shrunk window" error?

I also have the messages logs. If you need anything else, just tell me.
 
12-30-2014, 06:18 AM #4
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by Bouki
Let me explain exactly what happened:
At first the server froze and we lost ping to it.
We tried to connect with ssh but couldn't.
Then we ran a reboot on the server.
Sometimes processes consume an unusual amount of system resources, and that makes a server unresponsive. Depending on what remote monitoring is available, you can decide to act when you see values rising, or leave it be and face the consequences. When you decide to reboot a server, it comes in handy to access it locally with a screen attached, or through any out-of-band method (IPMI, console server, KVM, etc.), to see whether messages are being logged to the console.
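As a rough illustration of acting "when you see values rising", here is a minimal sketch that reads the 1-, 5- and 15-minute load averages from /proc/loadavg (Linux-specific) and flags any that exceed a limit. The per-CPU limit of 2.0 and the function names are assumptions for illustration; in practice you would feed such numbers into whatever monitoring system you already run.

```python
import os
from typing import List, Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Parse the first three fields of /proc/loadavg (1, 5, 15 minute averages)."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def load_warnings(loads: Tuple[float, float, float], cpus: int,
                  per_cpu_limit: float = 2.0) -> List[str]:
    """Return a message for each average above cpus * per_cpu_limit (assumed limit)."""
    limit = cpus * per_cpu_limit
    labels = ("1min", "5min", "15min")
    return [f"{name} load {value:.2f} exceeds {limit:.2f}"
            for name, value in zip(labels, loads) if value > limit]


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as fh:
        loads = parse_loadavg(fh.read())
    for msg in load_warnings(loads, os.cpu_count() or 1):
        print("WARNING:", msg)
```

Run from cron or a monitoring agent, a check like this gives you a chance to look at the box before it stops answering ping.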


Quote:
Originally Posted by Bouki
After about 15 minutes with nothing happening, I pressed the reset button on the server.
The server powered off and started booting,
but again the boot process took very long (about 30 minutes!!!!)
When you hard reset a server, you give it no chance to stop processes, finish writing files and reset a file system's "dirty" flag. This means (or should mean) that on the next boot a file system check could (should) be forced to ensure the integrity of the file system. If you have not configured the server beforehand to handle file system checks automatically, and you are not at the server while it checks its file systems, you may never see what caused the lengthy boot. It may have been trying to access resources it could no longer find, it may have been waiting for an answer, or it may simply have been slow checking file systems because of the size of the disks.
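For ext2/3/4 file systems, `tune2fs -l <device>` prints the superblock fields that bear on whether a boot-time check is pending, such as "Filesystem state", "Mount count" and "Maximum mount count". A minimal sketch that parses that output and explains why a check would likely run; it only looks at the state and mount-count fields (time-based checks via "Check interval" are ignored), and the function names are hypothetical.

```python
import subprocess
import sys
from typing import Dict, List


def parse_tune2fs(text: str) -> Dict[str, str]:
    """Turn 'Key:   value' lines from `tune2fs -l` into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fsck_reasons(fields: Dict[str, str]) -> List[str]:
    """List reasons e2fsck would likely check this file system at boot."""
    reasons = []
    state = fields.get("Filesystem state", "")
    if state and state != "clean":
        reasons.append(f"state is '{state}' rather than 'clean'")
    count = int(fields.get("Mount count", "0"))
    limit = int(fields.get("Maximum mount count", "-1"))
    if limit > 0 and count >= limit:  # -1 or 0 means the count check is disabled
        reasons.append(f"mount count {count} reached maximum {limit}")
    return reasons


if __name__ == "__main__" and len(sys.argv) > 1:
    # Needs root; sys.argv[1] is a block device such as the root partition.
    out = subprocess.run(["tune2fs", "-l", sys.argv[1]],
                         capture_output=True, text=True, check=True).stdout
    print(fsck_reasons(parse_tune2fs(out)) or "no pending check found")
```

Knowing in advance that a large file system is due for a check tells you whether a long boot is expected or worth investigating.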


Quote:
Originally Posted by Bouki
I pressed the reset button again, and this time the server booted normally;
it took about 5 or 6 minutes.
This happened about 6 months ago.
When you hard reset the server again, you once more gave it no chance to stop processes, finish writing files, complete its file system checks and reset a file system's "dirty" flag. If you have not investigated the cause of the problem and have not verified the integrity of the system after it booted up, you have neglected basic admin duties. Six months ago. So I do hope this is "just" some expendable personal machine without any valuable data on it.
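On an RPM-based system such as Red Hat, one common way to check integrity after unclean shutdowns is `rpm -Va`, which compares installed files against the RPM database and prints a flag string for each file that differs (for example `S` for size, `5` for digest, `T` for modification time), with an attribute marker such as `c` for configuration files. A minimal sketch that parses that output and picks out non-config files that are missing or whose size or digest changed; treating only those as suspicious is an assumption, not a rule, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# One `rpm -Va` result line: flag string, optional attribute marker
# (c = config, d = doc, g = ghost, l = license, r = readme), then the path.
LINE_RE = re.compile(r"^(\S+)\s+(?:([cdglr])\s+)?(/.*)$")


def suspicious_files(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (flags, path) for non-config files that are missing or
    whose size (S) or digest (5) no longer matches the RPM database."""
    result = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # e.g. dependency warnings or blank lines
        flags, attr, path = m.groups()
        if attr == "c":
            continue  # config files are expected to be edited by admins
        if flags == "missing" or "S" in flags or "5" in flags:
            result.append((flags, path))
    return result
```

Changed binaries or libraries on that list deserve a closer look (reinstalling the owning package is one option), while timestamp-only differences are usually harmless.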


Quote:
Originally Posted by Bouki
Would you please explain the "TCP: Peer unexpectedly shrunk window" error?
Simply put, each side of a TCP connection keeps telling the other how much data it is willing to receive in one go (its receive window). A peer is not supposed to shrink a window it has already offered, but network stack specifics and networked devices along the route can cause "odd" behaviour, and when the Linux kernel encounters that it tries to work around it smoothly, as shown in your log. The only time this is worth investigating, AFAIK, is when the message recurs frequently or when you experience unacceptable network throughput degradation.
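To decide whether the message "returns frequently", you could count it per day in /var/log/messages. A minimal sketch, assuming the traditional syslog line format that begins with a "Mon DD" date (for example `Dec 26 10:15:02 host kernel: ...`); it only counts lines containing "unexpectedly shrunk window" and does not parse the rest of the message. The function name is hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable

NEEDLE = "unexpectedly shrunk window"


def shrunk_window_per_day(lines: Iterable[str]) -> Counter:
    """Count matching kernel lines per syslog date ("Mon DD")."""
    per_day = Counter()
    for line in lines:
        if NEEDLE in line:
            # Classic syslog lines begin with e.g. "Dec 26 10:15:02 ...";
            # the first two fields are the date (no year is recorded).
            fields = line.split()
            day = " ".join(fields[:2]) if len(fields) >= 2 else "?"
            per_day[day] += 1
    return per_day


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        # Counter keeps insertion order, i.e. the order days appear in the log.
        for day, count in shrunk_window_per_day(fh).items():
            print(f"{day}: {count}")
```

A handful of hits spread over weeks is the benign case described above; a steady stream every day is what would justify a closer look at the peers involved.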
 
12-30-2014, 12:28 PM #5
Bouki
LQ Newbie
 
Registered: May 2011
Posts: 24

Original Poster
Rep: Reputation: Disabled
Thank you so much, dear unSpawn.
My last questions:
1- What is your professional opinion about this case? (Are you sure it is about "TCP: Peer unexpectedly shrunk window", or should I read more logs?)
2- How can I prevent this kind of problem?

Thank you again.
 
12-30-2014, 01:34 PM #6
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by Bouki
What is your professional opinion about this case? (Are you sure it is about "TCP: Peer unexpectedly shrunk window", or should I read more logs?)
I am not a professional. Like I said before: the only time to investigate is when that message recurs frequently or when you experience unacceptable network throughput degradation.


Quote:
Originally Posted by Bouki
How can I prevent this kind of problem?
Which problems? Responding in 6 months' time, or what?
 
12-30-2014, 06:19 PM #7
GaWdLy
Member
 
Registered: Feb 2013
Location: San Jose, CA
Distribution: RHEL/CentOS/Fedora
Posts: 457

Rep: Reputation: Disabled
It's impossible to determine the RCA (root cause analysis) 6 months later...

Also, make sure you have sosreport and sysstat installed and kdump configured. Then call Red Hat when it happens again and provide a sosreport and a vmcore file.
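Before relying on kdump to capture a vmcore next time, it is worth confirming that memory was reserved for the crash kernel and that the crash kernel is actually loaded. A minimal sketch, assuming a Linux system where the boot parameters are in /proc/cmdline and the kernel exposes /sys/kernel/kexec_crash_loaded (which reads "1" once a crash kernel is loaded); the dictionary keys and function names are our own labels. On Red Hat the kdump service itself is configured in /etc/kdump.conf.

```python
from typing import Optional


def crashkernel_param(cmdline: str) -> Optional[str]:
    """Return the value of crashkernel= from a kernel command line, if any."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def read_text(path: str) -> Optional[str]:
    """Read a small procfs/sysfs file, or return None if it is absent."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


def kdump_status() -> dict:
    """Gather the crashkernel= reservation and the 'crash kernel loaded' flag."""
    cmdline = read_text("/proc/cmdline") or ""
    return {
        "crashkernel": crashkernel_param(cmdline),
        "crash_kernel_loaded": read_text("/sys/kernel/kexec_crash_loaded") == "1",
    }


if __name__ == "__main__":
    print(kdump_status())
```

If `crashkernel` is None or the loaded flag is False, a hang or panic would most likely leave no vmcore to send to Red Hat.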
 
12-31-2014, 06:10 AM #8
Bouki
LQ Newbie
 
Registered: May 2011
Posts: 24

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by unSpawn
I am not a professional. Like I said before: the only time to investigate is when that message returns frequently or when you experience
Which problems? Responding in 6 months' time, or what?
No, I mean the server rebooting problems.
 
12-31-2014, 02:37 PM #9
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Ah, then what GaWdLy said.
 
12-31-2014, 11:42 PM #10
Bouki
LQ Newbie
 
Registered: May 2011
Posts: 24

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by GaWdLy
Impossible to determine RCA 6 months later...

Also, be sure that you have sosreport installed, sysstat, and kdump configured. Then call Red Hat when it happens again. Provide a sosreport and a vmcore file.
Thank you, dear GaWdLy.
I collected the sosreport. Is there any way to analyze it by myself?
I don't want to send it to the support representative; last time it took about 2 months!!! I have to report on the problem next week.
Thanks.
 
01-01-2015, 12:58 AM #11
GaWdLy
Member
 
Registered: Feb 2013
Location: San Jose, CA
Distribution: RHEL/CentOS/Fedora
Posts: 457

Rep: Reputation: Disabled
Red Hat has an SLA to meet, so it won't take 2 months to review. As long as you are a premium or standard subscriber, you should get some info back within a day or three.

Have you had a failed boot incident recently? Sosreports are great, but they are only so good at determining RCA, especially for boot-time issues. A sosreport is the bare minimum for troubleshooting, but with it alone it is still nearly impossible to get a clear-cut RCA.
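If you want a first look at a sosreport yourself: it is a tarball of configuration files, logs and command output, and after unpacking it (for example with `tar xf`) the collected system logs typically sit under the report's own var/log directory. A minimal sketch that walks an unpacked report and counts keyword hits in uncompressed files whose names start with "messages"; the keyword list and function name are arbitrary assumptions, and this is no substitute for a proper RCA.

```python
import os
import sys
from collections import Counter
from typing import Dict

# Assumed keywords worth a closer look; extend to taste.
KEYWORDS = ("error", "fail", "panic", "hung_task", "Machine check", "Call Trace")


def scan_sosreport(root: str) -> Dict[str, Counter]:
    """Count keyword hits (case-insensitive) per messages* file under <root>/var/log."""
    results = {}
    log_dir = os.path.join(root, "var", "log")
    for dirpath, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            if not name.startswith("messages"):
                continue
            path = os.path.join(dirpath, name)
            hits = Counter()
            with open(path, errors="replace") as fh:
                for line in fh:
                    lower = line.lower()
                    for kw in KEYWORDS:
                        if kw.lower() in lower:
                            hits[kw] += 1
            results[os.path.relpath(path, root)] = hits
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, hits in scan_sosreport(sys.argv[1]).items():
        print(path, dict(hits))
```

The counts only tell you where to start reading; the surrounding lines are what matter.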
 
01-01-2015, 01:00 AM #12
GaWdLy
Member
 
Registered: Feb 2013
Location: San Jose, CA
Distribution: RHEL/CentOS/Fedora
Posts: 457

Rep: Reputation: Disabled
Bouki, if you put your sosreport somewhere secure where you can share it with me, contact me by PM and I can look through it real quick.
 
01-01-2015, 01:06 AM #13
GaWdLy
Member
 
Registered: Feb 2013
Location: San Jose, CA
Distribution: RHEL/CentOS/Fedora
Posts: 457

Rep: Reputation: Disabled
BTW, in your dmesg I see some machine check events logged. That usually means a hardware error reported by the CPU, but I don't think that would cause startup issues per se.

Check /var/log/mcelog for details.
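When reading /var/log/mcelog, a quick tally of which CPU and bank each event was reported for can show whether errors keep coming from the same place. A minimal sketch, assuming the decoded records contain lines of the form "CPU <n> BANK <m>" as mcelog typically writes them; what a given bank means is CPU-model specific and is not interpreted here, and the function name is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumed record line as written by mcelog, e.g. "CPU 2 BANK 4".
CPU_BANK_RE = re.compile(r"\bCPU (\d+) BANK (\d+)\b")


def tally_mce(lines: Iterable[str]) -> Counter:
    """Count machine check records per (cpu, bank) pair."""
    counts = Counter()
    for line in lines:
        m = CPU_BANK_RE.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for (cpu, bank), n in tally_mce(fh).most_common():
            print(f"CPU {cpu} BANK {bank}: {n} event(s)")
```

Repeated events on the same CPU and bank are the kind of pattern worth passing on to the hardware vendor.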
 
01-01-2015, 01:38 AM #14
Bouki
LQ Newbie
 
Registered: May 2011
Posts: 24

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by GaWdLy
Bouki, if you put your sosreport somewhere secure where you can share it with me, contact me by PM and I can look through it real quick.
Thank you, GaWdLy.
I will send the sosreport to you.
I really appreciate your help.
 
  


All times are GMT -5.
