Linux Cluster - Random Node Crashes!
I have a peculiar problem with my Linux cluster: an application crashes my nodes at random, and I cannot identify the source of these frequent crashes. Let me start by explaining my setup.
I have a 16-node, 32-CPU Linux cluster running Red Hat 7.2. It runs an application called LS-DYNA through batch software, and this application is what causes my nodes to crash. Initially I had 1 GB of swap for 2 GB of RAM. I increased the swap to 2 GB, thinking that would solve the issue, but the crashes continue. I looked through the log files for any signs but couldn't find anything.
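A more systematic version of that log search might look like the following minimal Python sketch. It scans a syslog file for kernel messages about memory exhaustion, oopses, panics, or machine-check errors. The default path `/var/log/messages`, the `PATTERNS` table, and the helper names are assumptions for illustration, not something confirmed for this cluster. Note that a hard panic often never reaches the on-disk log, so an empty result does not rule one out.

```python
import re
import sys

# Hypothetical search patterns: kernel messages that commonly explain a node
# going down. Adjust them to the wording your kernel version actually uses.
PATTERNS = {
    "oom": re.compile(r"out of memory", re.IGNORECASE),
    "oops": re.compile(r"\bOops\b"),
    "panic": re.compile(r"kernel panic", re.IGNORECASE),
    "mce": re.compile(r"machine check", re.IGNORECASE),
}


def scan_log(lines):
    """Return (line_number, label, text) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip("\n")))
                break  # report each line once, under its first matching label
    return hits


def main(path="/var/log/messages"):
    """Print matching lines from the log at `path` (an assumed default)."""
    try:
        # errors="replace" keeps the scan going past non-UTF-8 bytes
        handle = open(path, encoding="utf-8", errors="replace")
    except OSError as err:
        print(f"cannot read {path}: {err}", file=sys.stderr)
        return 1
    with handle:
        for number, label, text in scan_log(handle):
            print(f"{path}:{number}: [{label}] {text}")
    return 0


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

Run it on each node after a crash, optionally passing a different log path as the first argument. Keeping `scan_log` separate from the file handling makes it easy to point at rotated logs or at output collected from several nodes.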
Is there any way I can find out the reason for these frequent crashes, such as specific commands to run or log files to check?
Any advice, suggestions, or comments would be very helpful.
Thanks in advance.