Red Hat
This forum is for the discussion of Red Hat Linux.
I'm having a major problem with Red Hat AS 3 running on an HP DL760 G2 with 8 CPUs. Every night I back up files to tape with cpio. Before the cpio starts I generally have 50% or more of memory free: about 5 GB out of roughly 10 GB is in use. Once the cpio starts, sar -r shows free memory being drained until 99% of memory is used. When the cpio finishes, memory is still 99% used, and shortly afterwards the box starts generating "cannot fork" messages. Once I see those messages the box becomes unstable, and my only recourse is to power cycle it because I can't run any commands at all.

I spoke to HP, who contacted Red Hat. They first told me to upgrade to the latest SMP kernel, so I did, and I ran a complete up2date, with the same results. I was then told I should switch to the hugemem kernel because I had so much RAM, but that didn't help either. When that didn't work, I was told that even though sar -r showed all the memory as used, it wasn't in fact used and was available to the kernel if it needed it. If that were the case, why does the box become unstable afterwards?

Instead of cpio I also tried a tar to tape, with the same results, so it almost seems to be I/O related. Below is a sar -r output that I captured right before the cpio that I ran at 12:35. You can see that as soon as the cpio runs, the free memory just gets sucked away. I've been working on this problem with HP and Red Hat since December and we are still nowhere. Does anyone else have any ideas?
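For anyone trying to check the "used but really available" claim on their own box, here is a minimal Python sketch that separates page cache and buffers from the rest of the "used" figure. It assumes /proc/meminfo provides MemTotal, MemFree, Buffers and Cached lines in kB; the function names parse_meminfo and memory_summary are made up for this illustration and are not part of sar or any Red Hat tool. It only shows how the two percentages can differ. It does not explain or diagnose the "cannot fork" messages.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Lines whose first field after the colon is not a number are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Compare 'used' memory with and without buffers and page cache.

    Assumes MemTotal and MemFree are present; Buffers and Cached
    default to 0 if missing.
    """
    total = info["MemTotal"]
    free = info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    used_incl_cache = total - free
    used_excl_cache = used_incl_cache - cache
    return {
        "pct_used_incl_cache": 100.0 * used_incl_cache / total,
        "pct_used_excl_cache": 100.0 * used_excl_cache / total,
        "cache_kb": cache,
    }
```

On a live system it could be run as memory_summary(parse_meminfo(open("/proc/meminfo").read())). If pct_used_incl_cache is near 99 while pct_used_excl_cache stays much lower, most of the "used" memory is file cache left behind by the backup, which the kernel can normally reclaim.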
Problem with DL760 after CPIO - did you find a resolution?
Hi - did you ever find a resolution to this?