Red Hat: This forum is for the discussion of Red Hat Linux.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
Hello guys.
About 4 days ago one of our servers had trouble during the reboot process.
I captured all the messages and dmesg logs.
How can I analyze these files?
I have also attached the dmesg log file and hope someone can help.
Quote:
Originally Posted by Bouki
About 4 days ago one of our servers had trouble during the reboot process.
Please describe in detail the symptoms and the steps that were taken at the time to assess the problem.
Quote:
Originally Posted by Bouki
I captured all the messages and dmesg logs.
How can I analyze these files?
By reading them.
Quote:
Originally Posted by Bouki
I have also attached the dmesg log file and hope someone can help.
Because you have not (yet) posted anything you would like us to look at, it makes no sense to read it for you.
*If you're worried about the "TCP: Peer unexpectedly shrunk window" line, note that it is an informational-level message (not debug, err, warn or crit): the kernel fixed things itself, hence the "(repaired)" part.
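A minimal sketch of one way to start "reading them": filter kernel messages by severity. It assumes the log was saved with `dmesg -r` (on util-linux versions that support it), which keeps the raw `<N>` priority prefix on every line; a plain `dmesg` dump has no prefix, and this filter would then skip every line. The level names are the standard syslog severities 0 (emerg) to 7 (debug); the function names and the script name below are made up for illustration.

```python
import re

# Standard syslog severities; lower number = more severe.
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# Lines from `dmesg -r` start with "<N>", where N = facility * 8 + level.
RAW_PREFIX = re.compile(r"^<(\d+)>(.*)$")


def parse_line(line):
    """Return (level_name, message) for a raw dmesg line, or None if it has no prefix."""
    m = RAW_PREFIX.match(line.rstrip("\n"))
    if not m:
        return None
    level = int(m.group(1)) & 7  # the low 3 bits hold the severity
    return LEVELS[level], m.group(2)


def important(lines, max_level="warning"):
    """Yield (level, message) pairs whose severity is max_level or worse."""
    limit = LEVELS.index(max_level)
    for line in lines:
        parsed = parse_line(line)
        if parsed and LEVELS.index(parsed[0]) <= limit:
            yield parsed


def main(paths):
    for path in paths:
        with open(path, errors="replace") as fh:
            for level, msg in important(fh):
                print(f"{level:8} {msg}")


if __name__ == "__main__":
    import sys
    main(sys.argv[1:])
```

For example, `dmesg -r > dmesg.txt` followed by `python3 dmesg_filter.py dmesg.txt` (hypothetical file names) would print only warnings and worse, so an info-level line like the "shrunk window" one would not show up at all.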
Thank you so much, dear.
Let me explain exactly what happened:
First the server froze and stopped answering ping.
We tried to connect with SSH but couldn't. Then we rebooted the server. After about 15 minutes with nothing happening, I pressed the reset button on the server.
The server powered off and started booting, but again the boot process took very long (about 30 minutes!). I pressed the reset button again, and this time the server booted normally; it took about 5 or 6 minutes.
This happened about 6 months ago.
Would you please explain the "TCP: Peer unexpectedly shrunk window" error?
I also have the messages logs. If you need anything else, just tell me.
Quote:
Originally Posted by Bouki
Let me explain exactly what happened:
First the server froze and stopped answering ping.
We tried to connect with SSH but couldn't.
Then we rebooted the server.
Sometimes processes consume an unusual amount of system resources, which can make a server unresponsive. Depending on what remote monitoring is available, you can decide to act when you see values rising, or leave it be and face the consequences. When you decide to reboot a server, it helps to be able to access it locally and attach a screen, or to gain access via an out-of-band method (IPMI, console server, KVM, etc.), to see whether any messages are logged to the console.
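As a minimal sketch of watching "values rising" without any monitoring stack, the snippet below reads `/proc/loadavg` and `/proc/meminfo` and prints a warning when the 5-minute load or the available memory crosses a threshold. The thresholds (2 × CPU count for load, 10% for memory) are arbitrary assumptions, not recommendations; you would run it from cron or a loop and send the output somewhere you will actually see it.

```python
import os


def read_loadavg(text):
    """Parse /proc/loadavg content and return the 1, 5 and 15 minute load averages."""
    return tuple(float(x) for x in text.split()[:3])


def read_meminfo(text):
    """Parse /proc/meminfo content into a dict of numeric values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def check(loadavg_text, meminfo_text, cpus, load_per_cpu=2.0, min_avail_pct=10.0):
    """Return warning strings; both thresholds are illustrative assumptions."""
    warnings = []
    _, five, _ = read_loadavg(loadavg_text)
    limit = load_per_cpu * cpus
    if five > limit:
        warnings.append(f"5-minute load {five:.2f} is above {limit:.1f}")
    mem = read_meminfo(meminfo_text)
    # MemAvailable is missing on older kernels; fall back to MemFree there.
    avail = mem.get("MemAvailable", mem.get("MemFree", 0))
    pct = 100.0 * avail / mem["MemTotal"]
    if pct < min_avail_pct:
        warnings.append(f"only {pct:.1f}% of memory available")
    return warnings


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as la, open("/proc/meminfo") as mi:
        for warning in check(la.read(), mi.read(), os.cpu_count() or 1):
            print("WARNING:", warning)
```

The parsing is kept separate from the file reading so the same functions can be fed values collected remotely (for example over SSH) instead of the local `/proc`.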
Quote:
Originally Posted by Bouki
After about 15 minutes with nothing happening, I pressed the reset button on the server.
The server powered off and started booting,
but again the boot process took very long (about 30 minutes!).
When you hard reset a server, you give it no chance to close processes, finish writing files and clear the file system's "dirty" flag. This means (or should mean) that on reboot a file system check could (should) be forced to ensure the integrity of the file system. If you have not configured the server beforehand to handle file system checks automatically, and you were not at the console while the file systems were being checked, you may not have seen what caused the lengthy boot. It may have been trying to access resources it could no longer find, it may have been waiting for an answer, or it may simply have been slow checking file systems because of the size of the disks.
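To see beforehand whether an ext2/3/4 file system is likely to be checked at the next boot, you can look at its superblock with `tune2fs -l` (run as root). The sketch below parses that output and flags two common triggers: a state other than "clean", and a mount count that has reached the maximum. It is a rough indicator only: the script name and device are examples, it ignores the time-based check interval, and on journaled ext3/ext4 an unclean shutdown normally leads to a journal replay rather than a full check.

```python
import subprocess
import sys


def parse_tune2fs(text):
    """Turn `tune2fs -l` output ("Key:   value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fsck_reasons(fields):
    """List reasons a boot-time check looks likely (state and mount count only)."""
    reasons = []
    state = fields.get("Filesystem state", "")
    if state != "clean":
        reasons.append(f"state is '{state}'")
    mounts = int(fields.get("Mount count", "0"))
    max_mounts = int(fields.get("Maximum mount count", "-1"))
    # A non-positive maximum (e.g. -1) is treated here as "mount-count checks disabled".
    if max_mounts > 0 and mounts >= max_mounts:
        reasons.append(f"mount count {mounts} has reached the maximum of {max_mounts}")
    return reasons


if __name__ == "__main__":
    # Example: sudo python3 fscheck.py /dev/sda1  (hypothetical script name, example device)
    for dev in sys.argv[1:]:
        out = subprocess.run(["tune2fs", "-l", dev], capture_output=True,
                             text=True, check=True).stdout
        reasons = fsck_reasons(parse_tune2fs(out))
        print(dev, "->", "; ".join(reasons) if reasons else "no forced check expected")
```

Running this on the big data volumes before a planned reboot at least tells you whether a long check is coming, instead of discovering it with the reset button.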
Quote:
Originally Posted by Bouki
I pressed the reset button again, and this time the server booted normally.
It took about 5 or 6 minutes.
This happened about 6 months ago.
When you performed a hard reset of the server again, you once more gave it no chance to close processes, finish writing files, complete the file system checks and clear the file system's "dirty" flag. If you have not investigated the cause of the problem, and have not verified the integrity of the system after it booted, you have neglected basic admin duties. That was 6 months ago. So I do hope this is "just" some expendable personal machine without any valuable data on it.
Quote:
Originally Posted by Bouki
Would you please explain the "TCP: Peer unexpectedly shrunk window" error?
Simply put, when two networked machines talk over TCP, each side tells the other how much data it is currently willing to accept: its receive window. Network stack specifics and networked devices along the route may influence this. A receiver is not supposed to shrink a window it has already offered, but sometimes a device exhibits "odd" behaviour and does so anyway; when the Linux kernel encounters that, it tries to compensate and smooth things out, as shown by the "(repaired)" in your log. The only time this is worth investigating, AFAIK, is when the message recurs frequently or when you experience unacceptable degradation of network throughput.
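To make the "shrunk window" idea concrete: the right edge of the window a receiver offers is the acknowledged sequence number plus the advertised window size, and RFC 1122 says a receiver SHOULD NOT move that edge to the left (while a sender MUST cope if it does). The sketch below only illustrates that comparison, using 32-bit wraparound arithmetic for sequence numbers; it is not the Linux kernel's code, and it assumes the window values are already scaled.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_before(a, b):
    """True if sequence number a comes strictly before b, allowing for wraparound."""
    return a != b and (b - a) % MOD < MOD // 2


class WindowTracker:
    """Sender-side view of the receiver's window (window values assumed already scaled)."""

    def __init__(self):
        self.right_edge = None  # furthest right edge advertised so far

    def update(self, ack, window):
        """Record an ACK; return True if its right edge is left of one already advertised."""
        edge = (ack + window) % MOD
        shrunk = self.right_edge is not None and seq_before(edge, self.right_edge)
        if self.right_edge is None or seq_before(self.right_edge, edge):
            self.right_edge = edge
        return shrunk


if __name__ == "__main__":
    tracker = WindowTracker()
    for ack, window in [(1000, 500), (1200, 400), (1300, 100)]:
        print(ack, window, "shrunk" if tracker.update(ack, window) else "ok")
```

The demo prints `ok`, `ok`, then `shrunk`: the third ACK's right edge (1400) lies to the left of the 1600 already offered, even though the acknowledged number itself moved forward.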
Thank you so much, dear unSpawn.
My last questions:
1- What is your professional opinion about this case? (Are you sure about "TCP: Peer unexpectedly shrunk window", or should I read more logs?)
2- How can I prevent this kind of problem?
Quote:
Originally Posted by Bouki
What is your professional opinion about this case? (Are you sure about "TCP: Peer unexpectedly shrunk window", or should I read more logs?)
I am not a professional. Like I said before: the only time to investigate is when that message recurs frequently or when you experience unacceptable degradation of network throughput.
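One rough way to tell whether the message "returns frequently" is to count it per day in the syslog file. A minimal sketch, assuming the classic `/var/log/messages` timestamp format (`Mar  3 10:15:22 host ...`); it groups by month and day only, so logs spanning more than a year would be merged, and the script name in the usage note is hypothetical.

```python
import re
from collections import Counter

# Classic syslog timestamp at the start of each line, e.g. "Mar  3 10:15:22".
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s")
NEEDLE = "unexpectedly shrunk window"


def count_per_day(lines, needle=NEEDLE):
    """Count lines containing `needle`, grouped by their syslog date ("Mon D")."""
    counts = Counter()
    for line in lines:
        if needle in line:
            m = STAMP.match(line)
            if m:
                counts[f"{m.group(1)} {int(m.group(2))}"] += 1
    return counts


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            # Counter keeps first-seen order, which is chronological for a log file.
            for day, n in count_per_day(fh).items():
                print(f"{day}: {n}")
```

For example `python3 count_shrunk.py /var/log/messages*` (run as root) would show whether you see a handful of these lines a month or thousands a day.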
Quote:
Originally Posted by Bouki
How can I prevent this kind of problem?
Which problems? Responding in 6 months' time, or what?
Also, be sure that you have sosreport and sysstat installed and kdump configured. Then call Red Hat when it happens again, and provide a sosreport and a vmcore file.
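For the kdump part, a quick sanity check is possible from the running system: memory has to be reserved with a `crashkernel=` kernel parameter, and `/sys/kernel/kexec_crash_loaded` reads `1` once a crash kernel is actually loaded. The sketch below checks just those two things; it is an illustration, not a substitute for the kdump service's own status output.

```python
import os
import re


def crashkernel_param(cmdline):
    """Return the value of the crashkernel= parameter, or None if it is absent."""
    m = re.search(r"(?:^|\s)crashkernel=(\S+)", cmdline)
    return m.group(1) if m else None


def kdump_ready(cmdline, crash_loaded):
    """True if memory is reserved for a crash kernel and one is loaded ("1")."""
    return crashkernel_param(cmdline) is not None and crash_loaded.strip() == "1"


if __name__ == "__main__":
    flag = "/sys/kernel/kexec_crash_loaded"
    if os.path.exists("/proc/cmdline") and os.path.exists(flag):
        with open("/proc/cmdline") as fh:
            cmdline = fh.read()
        with open(flag) as fh:
            loaded = fh.read()
        print("crashkernel =", crashkernel_param(cmdline))
        print("kdump ready:", kdump_ready(cmdline, loaded))
```

If `crashkernel=` is missing, no vmcore can be captured no matter how kdump itself is configured, which is worth knowing before the next freeze rather than after it.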
Thank you, dear GaWdLy.
I collected the sosreport. Is there any way to analyze it myself?
I don't want to send it to the support representative; last time it took about 2 months! I have to report the problem next week.
Thanks.
Red Hat has an SLA to meet, so it won't take 2 months to review. As long as you are a Premium or Standard subscriber, you should get some info back within a day or three.
Have you had a failed boot incident recently? Sosreports are great, but they are only so useful for determining the root cause (RCA), especially for boot-time issues. A sosreport is the bare minimum for troubleshooting, but it is still nearly impossible to get a clear-cut RCA from one alone.
BTW, in your dmesg I see some machine check events logged. That's usually a CPU hardware error, but I don't think that would cause startup issues per se.
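As for analyzing a sosreport yourself: it is an ordinary tarball, so a first pass can simply grep the collected logs for known trouble signs, including the machine check lines mentioned above. A minimal sketch; the member paths (`var/log/messages`, `sos_commands/kernel/dmesg`) and the pattern list are assumptions, so list the archive with `tar tf` first and adjust. This will not give you an RCA, only a shortlist of lines worth reading.

```python
import re
import tarfile

# First-pass trouble signs; the list is an assumption, extend it as needed.
PATTERNS = re.compile(
    r"machine check|hardware error|i/o error|call trace|out of memory"
    r"|soft lockup|blocked for more than",
    re.IGNORECASE,
)

# Assumed locations of the collected logs inside the archive (check with `tar tf`).
DEFAULT_MEMBERS = ("var/log/messages", "sos_commands/kernel/dmesg")


def scan_sosreport(path, members=DEFAULT_MEMBERS):
    """Yield (member name, line) for matching lines in selected files of a sosreport."""
    with tarfile.open(path, "r:*") as tar:  # "r:*" auto-detects gzip/bzip2/xz
        for member in tar.getmembers():
            if not member.isfile() or not member.name.endswith(members):
                continue
            fh = tar.extractfile(member)
            if fh is None:
                continue
            for raw in fh:
                line = raw.decode("utf-8", errors="replace").rstrip("\n")
                if PATTERNS.search(line):
                    yield member.name, line


if __name__ == "__main__":
    import sys
    for archive in sys.argv[1:]:
        for name, line in scan_sosreport(archive):
            print(f"{name}: {line}")
```

Reading the archive in place avoids unpacking the whole thing, and the matches come back tagged with the file they were found in.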