LinuxQuestions.org - named times out, only kill/restart helps, can't find the reason

Hi All!

I run two Dell PE650 servers, each on its own independent local network, as mail/DNS/DHCP servers.
The OS is Red Hat Linux release 9 (Shrike):
Linux **** 2.4.20-31.9 #1

bind-9.2.1-16
dhcp-3.0pl1-23
sendmail-8.12.8-9.90
mailman-2.1.1-5
imap-2001a-18
ipop3d

Each server has a single network interface with one IP address, 60 - 80 users, and 60 - 80 client machines on its local network.

They have been running without errors for months, even years. But sometimes, for a few weeks at a time, a strange error appears (never on both servers at the same time):

Connections time out, named stops answering, ssh logins take a very long time to connect, clients cannot fetch their mail, and the network seems to freeze.
When this happens I can't stop named with the 'service' command, so I kill it with kill -9 and then start it again. Sometimes I have to repeat this 2-3 times before named answers normally. On some days it happens only once, on others 3 - 4 times, with anywhere from half an hour to half a day between occurrences.

As a temporary measure I tried a script, run from cron, that kills named and restarts it when it hangs, but cron doesn't work reliably while the error is occurring: even 'crontab -l' doesn't respond.
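A watchdog of the kind described above could look like the sketch below. It is hypothetical, not the script from the post: it assumes dig is installed, named listens on 127.0.0.1, and that "killall -9 named" followed by "/usr/sbin/named -u named" is how named is normally started on this box. The names PROBE, RESTART and watchdog_check are illustrative and can be overridden for testing.

```shell
#!/bin/sh
# Hypothetical named watchdog sketch (not the original poster's script).
# Assumptions: dig is installed, named listens on 127.0.0.1, and
# "killall -9 named; /usr/sbin/named -u named" restarts it on this box.
# PROBE and RESTART may be overridden from the environment for testing.

# One query, 2 s timeout, a single try. dig exits non-zero (9) when no reply
# arrives at all; a negative answer such as NXDOMAIN still exits 0, so only
# a silent server counts as a failure.
PROBE=${PROBE:-"dig +time=2 +tries=1 @127.0.0.1 localhost"}
RESTART=${RESTART:-"killall -9 named; sleep 2; /usr/sbin/named -u named"}

watchdog_check() {
    if sh -c "$PROBE" >/dev/null 2>&1; then
        echo "ok"
    else
        # Leave a syslog trace so restarts can be matched against other logs.
        logger -t named-watchdog "named not answering, restarting" 2>/dev/null
        sh -c "$RESTART" >/dev/null 2>&1
        echo "restarted"
    fi
}

# Only act when invoked as "named-watchdog.sh --run" (e.g. from cron).
case "${1:-}" in
    --run) watchdog_check ;;
esac
```

It could be scheduled with a crontab line such as `* * * * * /usr/local/sbin/named-watchdog.sh --run` (path hypothetical). Since cron itself stops responding during these episodes, a watchdog like this may not fire exactly when it is needed.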

Maybe it's some kind of resource problem, but where should I look?

I collected data while it wasn't working, but I can't find any reason for it.
If anybody has met an error like this, please help; I'm out of ideas.

Thanks,

Geza

- there is nothing unusual in the named log or in any other log
- the load average is 0.5 - 5.0; the upper value is very rare

free
             total       used       free     shared    buffers     cached
Mem:        255252     251704       3548          0      69012     129780
-/+ buffers/cache:      52912     202340
Swap:      2104432      42080    2062352
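The "free" column above (3548 KB) looks alarming, but Linux uses otherwise idle RAM for buffers and page cache. Memory actually held by processes is used minus buffers minus cached. The sketch below computes that from the "Mem:" line; it assumes the old procps column layout shown in the post (newer free versions print different columns), and app_used_kb is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: memory held by processes = used - buffers - cached, read from the
# "Mem:" line of free output on stdin. Assumes the old procps layout
# (total used free shared buffers cached); values are in KB.
app_used_kb() {
    awk '$1 == "Mem:" { print $3 - $6 - $7 }'
}
```

Run as `free | app_used_kb`. For the snapshot above, 251704 - 69012 - 129780 = 52912 KB, the same figure as the "-/+ buffers/cache" row: about 52 MB of the 255 MB in use by processes, which by itself does not point to RAM exhaustion.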

vmstat
 procs             memory             swap       io       system      cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in   cs  us  sy  id
 1  0  0  42080   4784  64616 131696   0   0     0     0  101   16   0   0 100
 0  0  0  42080   4796  64632 131692   0   0     0    41  190   37   0   3  97
 0  0  0  42080   4796  64632 131692   0   0     0     0  103   18   0   0 100

Sometimes cs (context switches per second) goes up to 180-200.

Number of TCP sockets in each connection state:
netstat -a -n|grep -E "^(tcp)"| cut -c 68-|sort|uniq -c|sort -n
4 LAST_ACK
16 LISTEN
33 ESTABLISHED
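The pipeline above relies on "cut -c 68-", which only works while the state column starts at character 68; longer addresses shift it. A sketch that counts by field instead: in netstat -n output the state is the 6th whitespace-separated field of every tcp line. The helper name count_tcp_states is hypothetical.

```shell
#!/bin/sh
# Sketch: count TCP sockets per state by field position rather than by
# character column. Reads netstat output on stdin; lines whose first field
# starts with "tcp" (tcp, tcp6) are counted by their 6th field, the state.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print n[s], s }' | sort -n
}
```

Run as `netstat -ant | count_tcp_states`; the output has the same shape as the uniq -c listing above (count, then state), sorted by count.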