LinuxQuestions.org


Raghu140 07-27-2010 01:20 AM

Tutorial for understanding TOP Command.
 
Dear All,

Can anyone suggest a good tutorial for understanding the top? I have searched the net but am still not sure that I understand it. Also, what other commands can analyse memory usage at runtime so that we can detect a memory leak? I am facing some very serious memory-related issues and am not able to find the exact reason for them.

Please help.

Regards,
Raghu

Aquarius_Girl 07-27-2010 01:24 AM

Quote:

Originally Posted by Raghu140 (Post 4046516)
Can anyone suggest a good tutorial for understanding the top? I have searched the net but am still not sure that I understand it.

Check out the following:

http://tldp.org/LDP/sag/html/system-resources.html
and
http://www.thegeekstuff.com/2010/01/...mand-examples/

Aquarius_Girl 07-27-2010 01:37 AM

Quote:

Originally Posted by Raghu140 (Post 4046516)
Also, what other commands can analyse memory usage at runtime so that we can detect a memory leak? I am facing some very serious memory-related issues and am not able to find the exact reason for them.

http://www.faqs.org/docs/Linux-HOWTO...ind-HOWTO.html
and
http://www.cyberciti.biz/faq/linux-check-memory-usage/
and
http://www.ibm.com/developerworks/li...brary/l-debug/

salasi 07-27-2010 06:09 AM

Quote:

Originally Posted by Raghu140 (Post 4046516)
Can anyone suggest a good tutorial for understanding the top?

The top? You mean top itself, not atop, htop, etc.? As no one has mentioned 'man top', I have to. You have probably read it already, but that is where to start.

Quote:

Originally Posted by Raghu140 (Post 4046516)
Also, what other commands can analyse memory usage at runtime so that we can detect a memory leak? I am facing some very serious memory-related issues and am not able to find the exact reason for them.

Again, I'll have to mention vmstat, even though I suspect that you want something different.

I think you probably should have read this before posting, and then should be thinking about how to use the tools which don't directly do what exactly you want, to get useful information.

johnsfine 07-27-2010 07:50 AM

Quote:

Originally Posted by Raghu140 (Post 4046516)
Can anyone suggest a good tutorial for understanding the top?

Surprising with all the links people posted in this thread, no one posted this one:

http://www.linuxatemyram.com/

Quote:

analyse memory usage at runtime so that we can detect a memory leak. I am facing some very serious memory-related issues
First read the above link. Most people who think they are seeing the symptoms of a serious memory leak are really just misinterpreting normal behavior of Linux. That link might help you understand whether the "serious memory related issues" you think you have are real.

If the problem is real, there are lots of tools for digging into the details. But none of that is simple. If you post the info that makes you believe you have a memory problem, that may make it easier for us to tell you specific tools and/or documentation to understand the problem.

Aquarius_Girl 07-27-2010 07:53 AM

Quote:

Originally Posted by johnsfine (Post 4046802)
Surprising with all the links people posted in this thread, no one posted this one:

http://www.linuxatemyram.com/

That was a nice link. Thanks

Raghu140 07-29-2010 04:49 AM

All of a sudden I get this error message in the var log:

Jul 26 20:01:02 localhost kernel: mercd_write: unable to allocate memory 16128
Jul 26 20:01:02 localhost kernel: mercd_write: Unmatching Message Class 16128 and 52 35
Jul 26 20:01:02 localhost kernel: mercd_write: Current Message Class 0xfc0 Id 0x1
Jul 26 20:01:02 localhost kernel: mercd_write: Unmatching Message Class 16128 and 52 35

The system runs fine for 3 days. On the 3rd day, when the load is at its peak, it gives the above error. I took all kinds of logs at that time:

top - 20:01:18 up 5 days, 10:10, 9 users, load average: 0.13, 0.03, 0.01
Tasks: 128 total, 1 running, 127 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1% us, 0.2% sy, 0.0% ni, 99.1% id, 0.6% wa, 0.0% hi, 0.0% si
Mem: 4151264k total, 4130204k used, 21060k free, 138644k buffers
Swap: 6144820k total, 4k used, 6144816k free, 3756412k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25537 root      16   0  197m  27m 4548 S    1  0.7  10:58.00 SdpMedia
25413 root      16   0  239m  46m 6200 S    0  1.2   3:45.55 MGCv1.1
25773 root      15   0  4536 1728 1140 S    0  0.0   1:44.56 SDP_IVR
23695 root      16   0 16048  14m  14m S    0  0.4   0:08.20 rsi_lnk
 3633 root      15   0     0    0    0 S    0  0.0   0:28.33 smbiod
25762 root      16   0  4416 1524 1272 S    0  0.0   0:30.97 DB_MR
    1 root      16   0  2880  544  468 S    0  0.0   0:02.33 init


Date&Time: 2010-07-26-20:01
             total       used       free     shared    buffers     cached
Mem:          4053       4023         30          0        132       3653
-/+ buffers/cache:        237       3816
Swap:         6000          0       6000

My applications are SdpMedia, MGCv1.1, SDP_IVR and DB_MR.

Disk space is also OK. The applications show normal memory usage. I am not able to find the problem. The system hangs after some time.

Please help.

johnsfine 07-29-2010 05:52 AM

Quote:

Originally Posted by Raghu140 (Post 4048809)
localhost kernel: mercd_write: unable to allocate memory 16128

I don't know what that means. I can only explain why it doesn't mean a shortage of physical memory.

Quote:

I took all kinds of logs at this particular time:-
I assume you mean right after the failure.

There is very little free memory, so either the failing process didn't fully abort or it wasn't using much anonymous memory.

There was very little swap space used, which tends to indicate there was no recent memory pressure.

There are very high buffer and cache levels, also tending to indicate no recent memory pressure.

Top and similar tools tell you about CPU use, which is irrelevant to your problem, and about memory use which also seems to be irrelevant to the problem. So I think you need to be looking elsewhere.
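The point about buffers and cache can be checked with a little arithmetic on the top header quoted earlier in this thread. The following is only a minimal Python sketch: the function name `effective_memory` is made up for illustration, and it assumes (as the "-/+ buffers/cache" line of free does) that buffers and page cache can be given back when programs need the memory.

```python
def effective_memory(total_kb, free_kb, buffers_kb, cached_kb):
    """Return (used_by_programs_kb, available_kb), treating buffers and
    page cache as reclaimable, like the '-/+ buffers/cache' line of free."""
    used_by_programs = total_kb - free_kb - buffers_kb - cached_kb
    available = free_kb + buffers_kb + cached_kb
    return used_by_programs, available


# Values copied from the top header posted in this thread (all in kB).
used_kb, avail_kb = effective_memory(
    total_kb=4151264, free_kb=21060, buffers_kb=138644, cached_kb=3756412
)
print(f"used by programs: {used_kb} kB, available on demand: {avail_kb} kB")
```

That works out to roughly 230 MB really in use by programs and about 3.7 GB that could be handed out on demand, which is why this does not look like a shortage of ordinary memory.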

Raghu140 07-29-2010 06:04 AM

My problem is that I am not able to locate the problem. The binary seems to be doing fine in terms of both memory and CPU. But the telephony card driver (mercd) is saying it's unable to find the memory. Free says that enough memory is available. The var log messages do not say anything else. I am lost. How do I crack this problem?

johnsfine 07-29-2010 06:57 AM

Quote:

Originally Posted by Raghu140 (Post 4048874)
the telephony card driver (mercd) is saying it's unable to find the memory. Free says that enough memory is available.

That error message is more likely to mean the driver is either unable to allocate kernel virtual memory or unable to allocate a particular range of low memory for some form of DMA. It does not mean a shortage of the kind of memory reported by free.

Is this a 32 bit or 64 bit system and what distribution and version of Linux is it?

In 32 bit Linux, you might be legitimately exceeding the 1GB limit on kernel virtual memory. There might also be some small resource leak in the mercd or other driver that quickly exhausts the 1GB virtual space before it even becomes obvious that there is a resource leak.

If you were using 64 bit Linux, the kernel virtual memory is nearly unlimited (you'll run out of something else before you run out of kernel virtual memory). So if it is legitimately using over 1GB of kernel virtual, 64 bit would just work. If it is leaking a kernel resource, 64 bit would delay the crash almost indefinitely and certainly long enough to make the leak obvious.

If it is really impractical to switch to 64 bit but you think kernel virtual memory is the problem, you can probably build a new 32 bit kernel using the option that gives the kernel 2GB virtual.

One of your other posts said RHEL4. RHEL4 had a 32 bit kernel option for 4GB kernel virtual. If you have that kernel, then I'm pretty sure you're not exhausting kernel virtual memory. But that option is an ugly kludge and likely to trigger driver bugs that wouldn't occur in other kernels. So if you have that RHEL4 kernel, it would be better to switch to something else.

Unfortunately, I don't know which tools you would use to investigate the status of kernel virtual memory. The following command gives a lot of info about kernel memory use (you should post its results) but I'm not sure it covers enough uses of kernel virtual memory to see a driver problem with kernel virtual memory.
Code:

cat /proc/slabinfo
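
The raw slabinfo output is hard to read by eye, so here is a minimal Python sketch that ranks caches by approximate size. It assumes the slabinfo 2.0 column order (name, active_objs, num_objs, objsize, ...) and estimates each cache as num_objs * objsize, which ignores per-slab overhead. The function names are made up for illustration, and the sample rows are copied from the output posted further down in this thread. On a live system you would pass in the text of /proc/slabinfo instead (reading it usually needs root).

```python
def parse_slabinfo(text):
    """Return a list of (name, num_objs, objsize) from slabinfo-style text."""
    rows = []
    for line in text.splitlines():
        fields = line.split(":")[0].split()
        # Data rows start with: name active_objs num_objs objsize ...
        # The version banner and the '# name' header fail this check.
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        rows.append((fields[0], int(fields[2]), int(fields[3])))
    return rows


def top_caches(text, count=5):
    """Return the `count` largest caches as (name, approx_kb), biggest first."""
    sized = [(name, num * size // 1024) for name, num, size in parse_slabinfo(text)]
    return sorted(sized, key=lambda item: item[1], reverse=True)[:count]


# Sample rows copied from the slabinfo output posted later in this thread.
SAMPLE = """\
slabinfo - version: 2.0
radix_tree_node 21294 24388 276 14 1 : tunables 54 27 8 : slabdata 1742 1742
dentry_cache 5149 20332 152 26 1 : tunables 120 60 8 : slabdata 782 782
buffer_head 703043 902925 52 75 1 : tunables 120 60 8 : slabdata 12039 12039
"""

for name, kb in top_caches(SAMPLE):
    print(f"{name:20s} {kb:8d} kB")
```

A ranking like this only shows which slab caches are biggest. It says nothing about memory a driver allocates outside the slab allocator.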

Aquarius_Girl 07-29-2010 07:03 AM

Quote:

Originally Posted by johnsfine (Post 4048929)
Unfortunately, I don't know which tools you would use to investigate the status of kernel virtual memory.

Will this be of some help to him ?
http://docs.sun.com/app/docs/doc/816...=en&n=1&a=view

johnsfine 07-29-2010 07:07 AM

Quote:

Originally Posted by anishakaul (Post 4048941)

I don't think so. That seems to report something about "kernel threads" and a lot about non kernel virtual and physical memory. But it doesn't seem to report anything about kernel virtual memory.

Kernel virtual memory is just my wild guess at where the problem might be. I have nothing to support that.

But non kernel virtual and physical memory is the topic already investigated well earlier in this thread and pretty much ruled out as a relevant factor.

Aquarius_Girl 07-29-2010 07:09 AM

Quote:

Originally Posted by johnsfine (Post 4048948)
I don't think so. That seems to report something about "kernel threads" and a lot about non kernel virtual memory. But it doesn't seem to report anything about kernel virtual memory.

Thanks for looking!

Raghu140 07-29-2010 05:31 PM

I am using RHEL 4 update 5.
uname -a:
Linux localhost.localdomain 2.6.9-55.ELsmp #1 SMP Fri Apr 20 17:03:35 EDT 2007 i686 i686 i386 GNU/Linux.
It's a 32 bit system.
vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
 0  0    208  15956  20116 3886644    0    0     4    28   31     7  1  2 97  1

cat /proc/slabinfo:
slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchco> : slabdata <active_slabs> <num_slabs> <sharedavail>
smb_request 18 30 256 15 1 : tunables 120 60 8 : slabdata 2 2
smb_inode_cache 480 517 368 11 1 : tunables 54 27 8 : slabdata 47 47
rpc_buffers 8 8 2048 2 1 : tunables 24 12 8 : slabdata 4 4
rpc_tasks 8 20 192 20 1 : tunables 120 60 8 : slabdata 1 1
rpc_inode_cache 6 7 512 7 1 : tunables 54 27 8 : slabdata 1 1
msi_cache 3 3 3840 1 1 : tunables 24 12 8 : slabdata 3 3
fib6_nodes 5 119 32 119 1 : tunables 120 60 8 : slabdata 1 1
ip6_dst_cache 4 15 256 15 1 : tunables 120 60 8 : slabdata 1 1
ndisc_cache 1 20 192 20 1 : tunables 120 60 8 : slabdata 1 1
rawv6_sock 6 11 704 11 2 : tunables 54 27 8 : slabdata 1 1
udpv6_sock 0 0 704 11 2 : tunables 54 27 8 : slabdata 0 0
tcpv6_sock 9 15 1216 3 1 : tunables 24 12 8 : slabdata 5 5
ip_fib_alias 10 226 16 226 1 : tunables 120 60 8 : slabdata 1 1
ip_fib_hash 10 119 32 119 1 : tunables 120 60 8 : slabdata 1 1
dm_tio 0 0 16 226 1 : tunables 120 60 8 : slabdata 0 0
dm_io 0 0 20 185 1 : tunables 120 60 8 : slabdata 0 0
dm-bvec-(256) 0 0 3072 2 2 : tunables 24 12 8 : slabdata 0 0
dm-bvec-128 0 0 1536 5 2 : tunables 24 12 8 : slabdata 0 0
dm-bvec-64 0 0 768 5 1 : tunables 54 27 8 : slabdata 0 0
dm-bvec-16 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0
dm-bvec-4 0 0 64 61 1 : tunables 120 60 8 : slabdata 0 0
dm-bvec-1 0 0 16 226 1 : tunables 120 60 8 : slabdata 0 0
dm-bio 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0
ext3_inode_cache 4528 6090 552 7 1 : tunables 54 27 8 : slabdata 870 870
ext3_xattr 0 0 48 81 1 : tunables 120 60 8 : slabdata 0 0
journal_handle 101 135 28 135 1 : tunables 120 60 8 : slabdata 1 1
journal_head 628 1458 48 81 1 : tunables 120 60 8 : slabdata 18 18
revoke_table 12 290 12 290 1 : tunables 120 60 8 : slabdata 1 1
revoke_record 0 0 16 226 1 : tunables 120 60 8 : slabdata 0 0
scsi_cmd_cache 101 110 384 10 1 : tunables 54 27 8 : slabdata 11 11
uhci_urb_priv 0 0 44 88 1 : tunables 120 60 8 : slabdata 0 0
sgpool-128 32 33 2560 3 2 : tunables 24 12 8 : slabdata 11 11
sgpool-64 32 33 1280 3 1 : tunables 24 12 8 : slabdata 11 11
sgpool-32 35 36 640 6 1 : tunables 54 27 8 : slabdata 6 6
sgpool-16 36 36 320 12 1 : tunables 54 27 8 : slabdata 3 3
sgpool-8 177 180 192 20 1 : tunables 120 60 8 : slabdata 9 9
unix_sock 64 112 512 7 1 : tunables 54 27 8 : slabdata 16 16
ip_mrt_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0
tcp_tw_bucket 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0
tcp_bind_bucket 63 226 16 226 1 : tunables 120 60 8 : slabdata 1 1
tcp_open_request 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0
inet_peer_cache 2 61 64 61 1 : tunables 120 60 8 : slabdata 1 1
secpath_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0
xfrm_dst_cache 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0
ip_dst_cache 18 60 256 15 1 : tunables 120 60 8 : slabdata 4 4
arp_cache 4 20 192 20 1 : tunables 120 60 8 : slabdata 1 1
raw_sock 5 7 576 7 1 : tunables 54 27 8 : slabdata 1 1
udp_sock 20 28 576 7 1 : tunables 54 27 8 : slabdata 4 4
tcp_sock 97 98 1152 7 2 : tunables 24 12 8 : slabdata 14 14
flow_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0
mqueue_inode_cache 1 7 576 7 1 : tunables 54 27 8 : slabdata 1 1
relayfs_inode_cache 0 0 348 11 1 : tunables 54 27 8 : slabdata 0
isofs_inode_cache 0 0 372 10 1 : tunables 54 27 8 : slabdata 0 0
hugetlbfs_inode_cache 1 11 344 11 1 : tunables 54 27 8 : slabdata 1
ext2_inode_cache 0 0 488 8 1 : tunables 54 27 8 : slabdata 0 0
ext2_xattr 0 0 48 81 1 : tunables 120 60 8 : slabdata 0 0
dquot 0 0 144 27 1 : tunables 120 60 8 : slabdata 0 0
eventpoll_pwq 1 107 36 107 1 : tunables 120 60 8 : slabdata 1 1
eventpoll_epi 1 31 128 31 1 : tunables 120 60 8 : slabdata 1 1
kioctx 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0
kiocb 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0
dnotify_cache 2 185 20 185 1 : tunables 120 60 8 : slabdata 1 1
fasync_cache 1 226 16 226 1 : tunables 120 60 8 : slabdata 1 1
shmem_inode_cache 338 351 444 9 1 : tunables 54 27 8 : slabdata 39 39
posix_timers_cache 0 0 112 35 1 : tunables 120 60 8 : slabdata 0 0
uid_cache 5 61 64 61 1 : tunables 120 60 8 : slabdata 1 1
cfq_pool 107 119 32 119 1 : tunables 120 60 8 : slabdata 1 1
crq_pool 192 192 40 96 1 : tunables 120 60 8 : slabdata 2 2
deadline_drq 0 0 52 75 1 : tunables 120 60 8 : slabdata 0 0
as_arq 0 0 64 61 1 : tunables 120 60 8 : slabdata 0 0
blkdev_ioc 49 185 20 185 1 : tunables 120 60 8 : slabdata 1 1
blkdev_queue 20 32 488 8 1 : tunables 54 27 8 : slabdata 4 4
blkdev_requests 200 200 160 25 1 : tunables 120 60 8 : slabdata 8 8
biovec-(256) 256 256 3072 2 2 : tunables 24 12 8 : slabdata 128 128
biovec-128 256 260 1536 5 2 : tunables 24 12 8 : slabdata 52 52
biovec-64 256 260 768 5 1 : tunables 54 27 8 : slabdata 52 52
biovec-16 256 260 192 20 1 : tunables 120 60 8 : slabdata 13 13
biovec-4 256 305 64 61 1 : tunables 120 60 8 : slabdata 5 5
biovec-1 462 904 16 226 1 : tunables 120 60 8 : slabdata 4 4
bio 440 527 128 31 1 : tunables 120 60 8 : slabdata 17 17
file_lock_cache 8 82 96 41 1 : tunables 120 60 8 : slabdata 2 2
sock_inode_cache 202 234 448 9 1 : tunables 54 27 8 : slabdata 26 26
skbuff_head_cache 538 920 192 20 1 : tunables 120 60 8 : slabdata 46 46
sock 11 30 384 10 1 : tunables 54 27 8 : slabdata 3 3
proc_inode_cache 769 1133 360 11 1 : tunables 54 27 8 : slabdata 103 103
sigqueue 9 54 148 27 1 : tunables 120 60 8 : slabdata 2 2
radix_tree_node 21294 24388 276 14 1 : tunables 54 27 8 : slabdata 1742 1742
bdev_cache 40 42 512 7 1 : tunables 54 27 8 : slabdata 6 6
mnt_cache 34 62 128 31 1 : tunables 120 60 8 : slabdata 2 2
audit_watch_cache 0 0 48 81 1 : tunables 120 60 8 : slabdata 0 0
inode_cache 845 1199 344 11 1 : tunables 54 27 8 : slabdata 109 109
dentry_cache 5149 20332 152 26 1 : tunables 120 60 8 : slabdata 782 782
filp 1763 2080 192 20 1 : tunables 120 60 8 : slabdata 104 104
names_cache 47 47 4096 1 1 : tunables 24 12 8 : slabdata 47 47
avc_node 12 600 52 75 1 : tunables 120 60 8 : slabdata 8 8
key_jar 10 31 128 31 1 : tunables 120 60 8 : slabdata 1 1
idr_layer_cache 84 116 136 29 1 : tunables 120 60 8 : slabdata 4 4
buffer_head 703043 902925 52 75 1 : tunables 120 60 8 : slabdata 12039 12039
mm_struct 90 231 704 11 2 : tunables 54 27 8 : slabdata 21 21
vm_area_struct 3969 4815 88 45 1 : tunables 120 60 8 : slabdata 107 107
fs_cache 92 427 64 61 1 : tunables 120 60 8 : slabdata 7 7
files_cache 93 225 448 9 1 : tunables 54 27 8 : slabdata 25 25
signal_cache 145 440 192 20 1 : tunables 120 60 8 : slabdata 22 22
sighand_cache 155 183 1344 3 1 : tunables 24 12 8 : slabdata 61 61
task_struct 309 330 1408 5 2 : tunables 24 12 8 : slabdata 66 66
anon_vma 1513 2034 16 226 1 : tunables 120 60 8 : slabdata 9 9
pgd 90 476 32 119 1 : tunables 120 60 8 : slabdata 4 4
pmd 278 296 4096 1 1 : tunables 24 12 8 : slabdata 278 296
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0
size-131072 1 1 131072 1 32 : tunables 8 4 0 : slabdata 1 1
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0
size-65536 3 3 65536 1 16 : tunables 8 4 0 : slabdata 3 3
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0
size-32768 7 7 32768 1 8 : tunables 8 4 0 : slabdata 7 7
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0
size-16384 12 12 16384 1 4 : tunables 8 4 0 : slabdata 12 12
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0
size-8192 24 27 8192 1 2 : tunables 8 4 0 : slabdata 24 27
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0
size-4096 776 776 4096 1 1 : tunables 24 12 8 : slabdata 776 776
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0
size-2048 140 140 2048 2 1 : tun
size-1620(DMA) 0 0 1664 4 2 : tun
size-1620 32 32 1664 4 2 : tun
size-1024(DMA) 0 0 1024 4 1 : tun
size-1024 331 356 1024 4 1 : tun
size-512(DMA) 0 0 512 8 1 : tun
size-512 910 2472 512 8 1 : tun
size-256(DMA) 0 0 256 15 1 : tun
size-256 708 1920 256 15 1 : tun
size-128(DMA) 0 0 128 31 1 : tun
size-128 2048 4712 128 31 1 : tun
size-64(DMA) 0 0 64 61 1 : tun
size-64 7998 11468 64 61 1 : tun
size-32(DMA) 0 0 32 119 1 : tun
size-32 13668 18921 32 119 1 : tun
kmem_cache 165 165 256 15 1 : tun

The problem occurs after three days of continuous running. I have also observed that the time at which the problem occurs is the same. When the load suddenly increases and the telephony card plays wave files under high load (which happens on the third day after a restart), it hits the problem mentioned above. When the problem starts, some play requests fail while others still succeed. Then the number of failures keeps increasing. Finally the system hangs and a kernel panic message is generated. We then restart, it works fine for the next two days, and the problem repeats on the third day when the load suddenly increases. The peak call load is only 30-50% of the card's capacity, which is OK. If we restart only the card driver (without a system restart), the system also runs fine for a day.

How can I see the total size of kernel memory available?

How to increase the kernel memory size?

Will migrating to a higher version of RHEL (5.3 etc.) help?

Aquarius_Girl 07-30-2010 12:23 AM

Raghu,

Kindly put your code in code tags; it will be easier for others to read:
http://www.linuxquestions.org/questi...do=bbcode#code

johnsfine 07-30-2010 05:57 AM

Quote:

Originally Posted by Raghu140 (Post 4049533)
cat /proc/slabinfo:

A few things in there were larger than I expected, but nothing was extreme.

When was that run? You seem to have an idea of when the failure is approaching and looking at the slabinfo when the failure is about to happen or starting to happen would be more informative than when the system is healthy.
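
Comparing a healthy snapshot with one taken near the failure could be sketched like this in Python. This is only an illustration under stated assumptions: the function names are made up, the "healthy" rows are copied from the output posted above, and the "later" rows are invented numbers purely to show what a growing cache would look like. Growth is estimated as the change in num_objs times objsize.

```python
def parse_slabinfo(text):
    """Map cache name -> (num_objs, objsize) for slabinfo-style text."""
    caches = {}
    for line in text.splitlines():
        fields = line.split(":")[0].split()
        # Data rows start with: name active_objs num_objs objsize ...
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        caches[fields[0]] = (int(fields[2]), int(fields[3]))
    return caches


def slab_growth(before_text, after_text):
    """Return [(name, grown_kb)] for caches that grew, largest growth first."""
    before = parse_slabinfo(before_text)
    after = parse_slabinfo(after_text)
    growth = []
    for name, (num_after, size) in after.items():
        num_before = before.get(name, (0, size))[0]
        grown_bytes = (num_after - num_before) * size
        if grown_bytes > 0:
            growth.append((name, grown_bytes // 1024))
    return sorted(growth, key=lambda item: item[1], reverse=True)


# Rows copied from the healthy-system output posted in this thread.
HEALTHY = """\
dentry_cache 5149 20332 152 26 1 : tunables 120 60 8 : slabdata 782 782
size-8192 24 27 8192 1 2 : tunables 8 4 0 : slabdata 24 27
size-4096 776 776 4096 1 1 : tunables 24 12 8 : slabdata 776 776
"""

# HYPOTHETICAL later snapshot: the size-4096 numbers are invented for the demo.
LATER = """\
dentry_cache 5149 20332 152 26 1 : tunables 120 60 8 : slabdata 782 782
size-8192 24 27 8192 1 2 : tunables 8 4 0 : slabdata 24 27
size-4096 5200 5200 4096 1 1 : tunables 24 12 8 : slabdata 5200 5200
"""

for name, kb in slab_growth(HEALTHY, LATER):
    print(f"{name} grew by about {kb} kB")
```

With real data, the two texts would be saved copies of /proc/slabinfo taken at the two times.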

Quote:

How can I see the total size of kernel memory available?
I'm not sure. I did a web search for info about the file /proc/kcore and everything I found says that file represents a binary image of physical memory. But when I look at that file on various systems, it seems to represent a binary image of kernel virtual memory, so its size is the limit of the size of kernel virtual memory.

So what is the output of
ls -l /proc/kcore

Quote:

How to increase the kernel memory size?
It is a build time option when you recompile the kernel. Do you know how to recompile a kernel?

Quote:

Will migrating to a higher version of RHEL (5.3 etc.) help?
I know nothing about your system and the applications you run. I especially know nothing about the mercd driver that seems to be at the center of your problem.

Do you pay for support for this RHEL system? If you do, you should be asking Red Hat for some support. If you don't, you ought to be using CentOS instead of RHEL.

Maybe switching to RHEL or CentOS version 5 would help. Maybe what you're seeing is an old bug that was fixed long ago in RHEL itself or in the mercd driver. I don't know any of that stuff.

Is your hardware 64 bit capable? Do you have a good reason for running 32 bit RHEL rather than 64 bit? I think switching to 64 bit is more likely to fix the problem than switching just to version 5.

Raghu140 07-31-2010 04:55 PM

I have collected the vital stats from when the problem occurred. Please have a look at them. You may be able to deduce something from them:

Code:

free -m

             total       used       free     shared    buffers     cached
Mem:          4053       4037         16          0        158       3670
-/+ buffers/cache:        209       3844
Swap:         6000          0       6000

*************************************************************************************
df -h

Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb6              20G  619M  18G  4% /
/dev/sdb1              99M  12M  82M  13% /boot
none                  2.0G    0  2.0G  0% /dev/shm
/dev/sdb2              62G  16G  43G  27% /home
/dev/sdb7            9.7G  68M  9.1G  1% /opt
/dev/sdb5              20G  3.0G  16G  17% /usr
/dev/sdb3              20G  268M  18G  2% /var

************************************************************************************
ifconfig

eth2      Link encap:Ethernet  HWaddr 00:30:64:08:C1:A6
          inet addr:10.100.108.69  Bcast:10.100.108.95  Mask:255.255.255.224
          inet6 addr: fe80::230:64ff:fe08:c1a6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1942724 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2163264 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:196009128 (186.9 MiB)  TX bytes:637743127 (608.1 MiB)
          Base address:0xb880 Memory:fda80000-fdaa0000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:5220447 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5220447 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1693155991 (1.5 GiB)  TX bytes:1693155991 (1.5 GiB)
*************************************************************************************

route -n

Kernel IP routing table
Destination    Gateway        Genmask        Flags Metric Ref    Use Iface
10.100.108.64  0.0.0.0        255.255.255.224 U    0      0        0 eth2
169.254.0.0    0.0.0.0        255.255.0.0    U    0      0        0 eth2
0.0.0.0        10.100.108.65  0.0.0.0        UG    0      0        0 eth2
************************************************************************************

iptables -L -n -v

Chain INPUT (policy ACCEPT 10 packets, 2696 bytes)
 pkts bytes target    prot opt in    out    source              destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target    prot opt in    out    source              destination

Chain OUTPUT (policy ACCEPT 10 packets, 2696 bytes)
 pkts bytes target    prot opt in    out    source              destination

**************************************************************************************
cat /proc/cpuinfo

processor      : 0
vendor_id      : GenuineIntel
cpu family      : 6
model          : 15
model name      : Intel(R) Xeon(R) CPU            5138  @ 2.13GHz
stepping        : 11
cpu MHz        : 2133.765
cache size      : 4096 KB
physical id    : 0
siblings        : 2
core id        : 0
cpu cores      : 2
fdiv_bug        : no
hlt_bug        : no
f00f_bug        : no
coma_bug        : no
fpu            : yes
fpu_exception  : yes
cpuid level    : 10
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dtss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr
bogomips        : 4269.90

processor      : 1
vendor_id      : GenuineIntel
cpu family      : 6
model          : 15
model name      : Intel(R) Xeon(R) CPU            5138  @ 2.13GHz
stepping        : 11
cpu MHz        : 2133.765
cache size      : 4096 KB
physical id    : 0
siblings        : 2
core id        : 1
cpu cores      : 2
fdiv_bug        : no
hlt_bug        : no
f00f_bug        : no
coma_bug        : no
fpu            : yes
fpu_exception  : yes
cpuid level    : 10
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dtss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr
bogomips        : 4266.07

processor      : 2
vendor_id      : GenuineIntel
cpu family      : 6
model          : 15
model name      : Intel(R) Xeon(R) CPU            5138  @ 2.13GHz
stepping        : 11
cpu MHz        : 2133.765
cache size      : 4096 KB
physical id    : 3
siblings        : 2
core id        : 6
cpu cores      : 2
fdiv_bug        : no
hlt_bug        : no
f00f_bug        : no
coma_bug        : no
fpu            : yes
fpu_exception  : yes
cpuid level    : 10
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dtss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr
bogomips        : 4266.13

processor      : 3
vendor_id      : GenuineIntel
cpu family      : 6
model          : 15
model name      : Intel(R) Xeon(R) CPU            5138  @ 2.13GHz
stepping        : 11
cpu MHz        : 2133.765
cache size      : 4096 KB
physical id    : 3
siblings        : 2
core id        : 7
cpu cores      : 2
fdiv_bug        : no
hlt_bug        : no
f00f_bug        : no
coma_bug        : no
fpu            : yes
fpu_exception  : yes
cpuid level    : 10
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dtss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr
bogomips        : 4266.14

**************************************************************************************

cat /proc/meminfo

MemTotal:      4151264 kB
MemFree:        16284 kB
Buffers:        161828 kB
Cached:        3757932 kB
SwapCached:          0 kB
Active:        3316776 kB
Inactive:      719684 kB
HighTotal:    3276544 kB
HighFree:        1024 kB
LowTotal:      874720 kB
LowFree:        15260 kB
SwapTotal:    6144820 kB
SwapFree:      6144820 kB
Dirty:            7944 kB
Writeback:          0 kB
Mapped:        143432 kB
Slab:            74596 kB
CommitLimit:  8220452 kB
Committed_AS:  1914288 kB
PageTables:      3556 kB
VmallocTotal:  106488 kB
VmallocUsed:      6420 kB
VmallocChunk:    99316 kB
HugePages_Total:    0
HugePages_Free:      0
Hugepagesize:    2048 kB

*********************************************************************************************************************

cat /proc/net/dev

Inter-|  Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:1702919443 5247802    0    0    0    0          0        0 1702919443 5247802    0    0    0    0      0          0
  eth0:      0      0    0    0    0    0          0        0        0      0    0    0    0    0      0          0
  eth1:      0      0    0    0    0    0          0        0        0      0    0    0    0    0      0          0
  eth2:196945288 1952838    0    0    0    0          0        0 647852030 2177928    0    0    0    0      0          0
  eth3:      0      0    0    0    0    0          0        0        0      0    0    0    0    0      0          0
  sit0:      0      0    0    0    0    0          0        0        0      0    0    0    0    0      0          0

*********************************************************************************************************************

cat /proc/interrupts

          CPU0      CPU1      CPU2      CPU3
  0:  32714438  32721191  32714201  32713570    IO-APIC-edge  timer
  1:          2          3          3          1    IO-APIC-edge  i8042
  8:          0          1          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0  IO-APIC-level  acpi
 12:        17        12        12        17    IO-APIC-edge  i8042
 14:          0          0          0          0    IO-APIC-edge  libata
 15:        25          3          5          4    IO-APIC-edge  ide1
169:          0          0          0          0  IO-APIC-level  uhci_hcd, uhci_hcd
177:          0          0          0          0  IO-APIC-level  uhci_hcd
185:      1822    2524227    2846000      5483  IO-APIC-level  ehci_hcd, uhci_hcd, mercdintr
193:    455854      75408      3196    525536  IO-APIC-level  ioc0
201:    109601      20956      6495    124159  IO-APIC-level  ehci_hcd, uhci_hcd
209:      11773      2448      2324      12701  IO-APIC-level  uhci_hcd, mercdintr
233:    2982459          0          0          0        PCI-MSI  eth2
NMI:          0          0          0          0
LOC:  129134190  129134430  129137856  129137000
ERR:          0
MIS:          0

*************************************************************************************

vmstat

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd  free  buff  cache  si  so    bi    bo  in    cs us sy id wa
 0  0      0  16636 161352 3758148    0    0    2    28  23    11  1  1 97  1

*************************************************************************************

cat /proc/devices
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  7 vcs
 10 misc
 13 input
 29 fb
 36 netlink
 89 i2c
128 ptm
136 pts
162 raw
180 usb
253 mercd
254 ctimod

Block devices:
  1 ramdisk
  8 sd
  9 md
 22 ide1
 65 sd
 66 sd
 67 sd
 68 sd
 69 sd
 70 sd
 71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
253 device-mapper
254 mdp

*************************************************************************************

top - 22:13:42 up 1 day, 12:23,  7 users,  load average: 0.22, 0.20, 0.18
Tasks: 118 total,  1 running, 116 sleeping,  0 stopped,  1 zombie
Cpu(s):  1.1% us,  1.9% sy,  0.0% ni, 96.1% id,  0.8% wa,  0.0% hi,  0.0% si
Mem:  4151264k total,  4135452k used,    15812k free,  161296k buffers
Swap:  6144820k total,        0k used,  6144820k free,  3758984k cached

*************************************************************************************

cat /proc/stat
cpu  408911 0 773353 50217252 330331 2915 0
cpu0 87787 0 184459 12542431 118181 202 0
cpu1 109435 0 214128 12547286 60982 1206 0
cpu2 124588 0 217934 12558696 30879 1274 0
cpu3 87099 0 156830 12568837 120288 231 0
intr 140817575 131081868 9 0 13 7 0 0 0 1 0 6 6 58 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5388483 0 0 0 0 0 0 0 1062535 0 0 0 0 0 0 0 261647 0 0 0 0 0 0 0 29282 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 2993619 0 0 0 0 0
ctxt 134827990
btime 1280479789
processes 4263951
procs_running 1
procs_blocked 0

**************************************************************************************

I took these stats when the problem occurred. free suggests that physical memory is OK, but the telephony card driver reports that it is unable to allocate memory. I cannot work out where the issue is. Please help.
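For anyone reading along, the reason free looks "OK" even though almost all RAM is marked used can be worked out from the Mem/Swap lines of the top header above. This is only a rough sketch: the function name is made up for illustration, and it treats all buffers and page cache as reclaimable, which is an approximation.

```python
def buffers_cache_view(total_kb, free_kb, buffers_kb, cached_kb):
    """Return (used_by_programs_kb, free_plus_reclaimable_kb).

    Mirrors the "-/+ buffers/cache" line of older versions of free:
    buffers and page cache are counted as memory the kernel can give back.
    """
    used_kb = total_kb - free_kb
    used_by_programs = used_kb - buffers_kb - cached_kb
    free_plus_reclaimable = free_kb + buffers_kb + cached_kb
    return used_by_programs, free_plus_reclaimable


# Values copied from the top header in this post (all in kB).
used, avail = buffers_cache_view(
    total_kb=4151264, free_kb=15812, buffers_kb=161296, cached_kb=3758984
)
print(f"used by programs:       {used} kB")
print(f"free + buffers + cache: {avail} kB")
```

So only about 210 MB is held by programs and most of the rest is cache. Note that this is a whole-system figure and says nothing about which memory zone a driver allocates from.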

Raghu140 07-31-2010 05:57 PM

Quote:

When was that run? You seem to have an idea of when the failure is approaching and looking at the slabinfo when the failure is about to happen or starting to happen would be more informative than when the system is healthy.
It generally happens 3 days after a system restart, when the load is near its peak (30%, which is the normal load for the IVR). If I restart only the card driver, it runs for 8-9 hours and then the problem starts again. I will provide slabinfo the next time the problem occurs. It is a voice IVR application.

Code:

ls -lh /proc/kcore
-r-------- 1 root root 897M Jul 31 23:06 /proc/kcore

Quote:

It is a build time option when you recompile the kernel. Do you know how to recompile a kernel?
No, but I can do it. I have been working on Linux for the past 4 years. I am more of an application designer/developer with an understanding of telecom networks.

Quote:

Do you pay for support for this RHEL system? If you do, you should be asking Red Hat for some support. If you don't you ought to be using Centos instead of RHEL.
No, we don't. Yes, I will consider it before putting such a system into production. But the current system is at a remote location, and any hardware or software change is very difficult, so my priority is identifying the root cause.

I have faced this issue before. Last time the system was running RHEL 4.3 and my vendor suggested upgrading the OS to RHEL 4.5. I did, and we also replaced the chassis (server). The problem was resolved. But now we have RHEL 4.5 and are facing the issue again. I am looking for the root cause so that I can eliminate this issue once and for all.


Quote:

Is your hardware 64 bit capable? Do you have a good reason for running 32 bit RHEL rather than 64 bit? I think switching to 64 bit is more likely to fix the problem than switching just to version 5.
I don't think so. I will consider 64 bit from now on.
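Whether an x86 CPU is 64-bit capable can be checked from the flags line of /proc/cpuinfo: the `lm` ("long mode") flag indicates x86-64 support. A minimal sketch, with a made-up function name, using the flags line from the /proc/cpuinfo output further down in this post:

```python
def cpu_is_64bit_capable(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo text lists 'lm'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "lm" in value.split():
            return True
    return False


# Flags line copied from the cpuinfo listing in this post.
SAMPLE_CPUINFO = (
    "flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge "
    "mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx "
    "lm constant_tsc pni monitor ds_cpl est tm2 xtpr"
)
print(cpu_is_64bit_capable(SAMPLE_CPUINFO))
```

On a live system the same function could be fed the contents of /proc/cpuinfo. Since `lm` is present in the listing, the hardware looks 64-bit capable even though a 32-bit kernel is running.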


I have also taken logs similar to those posted earlier, but this time the system was working fine after the card driver restart. They are just for comparison purposes. See if you can find anything:

Code:

free -m
            total      used      free    shared    buffers    cached
Mem:          4053      4038        15          0        152      3685
-/+ buffers/cache:        200      3853
Swap:        6000          0      6000
**************************************************************************************
df -h

Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb6              20G  619M   18G   4% /
/dev/sdb1              99M   12M   82M  13% /boot
none                  2.0G     0  2.0G   0% /dev/shm
/dev/sdb2              62G   16G   43G  28% /home
/dev/sdb7             9.7G   68M  9.1G   1% /opt
/dev/sdb5              20G  3.0G   16G  17% /usr
/dev/sdb3              20G  267M   18G   2% /var

**************************************************************************************

ifconfig
eth2      Link encap:Ethernet  HWaddr 00:30:64:08:C1:A6
          inet addr:10.100.108.69  Bcast:10.100.108.95  Mask:255.255.255.224
          inet6 addr: fe80::230:64ff:fe08:c1a6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2014749 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2269242 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:202452852 (193.0 MiB)  TX bytes:716216325 (683.0 MiB)
          Base address:0xb880 Memory:fda80000-fdaa0000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:5478564 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5478564 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1769831977 (1.6 GiB)  TX bytes:1769831977 (1.6 GiB)

**************************************************************************************

route -n
Kernel IP routing table
Destination    Gateway        Genmask        Flags Metric Ref    Use Iface
10.100.108.64  0.0.0.0        255.255.255.224 U    0      0        0 eth2
169.254.0.0    0.0.0.0        255.255.0.0    U    0      0        0 eth2
0.0.0.0        10.100.108.65  0.0.0.0        UG    0      0        0 eth2

**************************************************************************************
iptables -L -n -v

Chain INPUT (policy ACCEPT 323K packets, 79M bytes)
 pkts bytes target    prot opt in    out    source              destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target    prot opt in    out    source              destination

Chain OUTPUT (policy ACCEPT 356K packets, 149M bytes)
 pkts bytes target    prot opt in    out    source              destination

**************************************************************************************

cat /proc/cpuinfo

processor      : 0
vendor_id      : GenuineIntel
cpu family      : 6
model          : 15
model name      : Intel(R) Xeon(R) CPU            5138  @ 2.13GHz
stepping        : 11
cpu MHz        : 2133.765
cache size      : 4096 KB
physical id    : 0
siblings        : 2
core id        : 0
cpu cores      : 2
fdiv_bug        : no
hlt_bug        : no
f00f_bug        : no
coma_bug        : no
fpu            : yes
fpu_exception  : yes
cpuid level    : 10
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr
bogomips        : 4269.90

processor      : 1
vendor_id      : GenuineIntel
cpu family      : 6
model          : 15
model name      : Intel(R) Xeon(R) CPU            5138  @ 2.13GHz
stepping        : 11
cpu MHz        : 2133.765
cache size      : 4096 KB
physical id    : 0
siblings        : 2
core id        : 1
cpu cores      : 2
fdiv_bug        : no
hlt_bug        : no
f00f_bug        : no
coma_bug        : no
fpu            : yes
fpu_exception  : yes
cpuid level    : 10
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr
bogomips        : 4266.07

processor      : 2
vendor_id      : GenuineIntel
cpu family      : 6
model          : 15
model name      : Intel(R) Xeon(R) CPU            5138  @ 2.13GHz
stepping        : 11
cpu MHz        : 2133.765
cache size      : 4096 KB
physical id    : 3
siblings        : 2
core id        : 6
cpu cores      : 2
fdiv_bug        : no
hlt_bug        : no
f00f_bug        : no
coma_bug        : no
fpu            : yes
fpu_exception  : yes
cpuid level    : 10
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr
bogomips        : 4266.13

processor      : 3
vendor_id      : GenuineIntel
cpu family      : 6
model          : 15
model name      : Intel(R) Xeon(R) CPU            5138  @ 2.13GHz
stepping        : 11
cpu MHz        : 2133.765
cache size      : 4096 KB
physical id    : 3
siblings        : 2
core id        : 7
cpu cores      : 2
fdiv_bug        : no
hlt_bug        : no
f00f_bug        : no
coma_bug        : no
fpu            : yes
fpu_exception  : yes
cpuid level    : 10
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr
bogomips        : 4266.14

****************************************************************************************************************************

cat /proc/meminfo

MemTotal:      4151264 kB
MemFree:        15196 kB
Buffers:        155624 kB
Cached:        3775052 kB
SwapCached:          4 kB
Active:        3262304 kB
Inactive:      776124 kB
HighTotal:    3276544 kB
HighFree:        1024 kB
LowTotal:      874720 kB
LowFree:        14172 kB
SwapTotal:    6144820 kB
SwapFree:      6144816 kB
Dirty:            1812 kB
Writeback:          0 kB
Mapped:        133872 kB
Slab:            73616 kB
CommitLimit:  8220452 kB
Committed_AS:  1907636 kB
PageTables:      3496 kB
VmallocTotal:  106488 kB
VmallocUsed:      6420 kB
VmallocChunk:    99316 kB
HugePages_Total:    0
HugePages_Free:      0
Hugepagesize:    2048 kB

*************************************************************************************************************************

cat /proc/net/dev

Inter-|  Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:1772920337 5488333    0    0    0    0          0        0 1772920337 5488333    0    0    0    0      0          0
  eth0:      0      0    0    0    0    0          0        0        0      0    0    0    0    0      0          0
  eth1:      0      0    0    0    0    0          0        0        0      0    0    0    0    0      0          0
  eth2:202792040 2018629    0    0    0    0          0        0 720153624 2274779    0    0    0    0      0          0
  eth3:      0      0    0    0    0    0          0        0        0      0    0    0    0    0      0          0
  sit0:      0      0    0    0    0    0          0        0        0      0    0    0    0    0      0          0

*************************************************************************************************************************

cat /proc/interrupts
          CPU0      CPU1      CPU2      CPU3
  0:  33404688  33413743  33414438  33407356    IO-APIC-edge  timer
  1:          2          3          3          1    IO-APIC-edge  i8042
  8:          0          1          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0  IO-APIC-level  acpi
 12:        17        12        12        17    IO-APIC-edge  i8042
 14:          0          0          0          0    IO-APIC-edge  libata
 15:        25          3          5          4    IO-APIC-edge  ide1
169:          0          0          0          0  IO-APIC-level  uhci_hcd, uhci_hcd
177:          0          0          0          0  IO-APIC-level  uhci_hcd
185:      1838    2579661    2905017      6736  IO-APIC-level  ehci_hcd, uhci_hcd, mercdintr
193:    466468      76480      3226    537463  IO-APIC-level  ioc0
201:    111765      21576      6855    126563  IO-APIC-level  ehci_hcd, uhci_hcd
209:      12700      2699      2822      13971  IO-APIC-level  uhci_hcd, mercdintr
233:    3072471          0          0          0        PCI-MSI  eth2
NMI:          0          0          0          0
LOC:  131873870  131874041  131877241  131876386
ERR:          0
MIS:          0

****************************************************************************************************************************

vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd  free  buff  cache  si  so    bi    bo  in    cs us sy id wa
 0  0    4 16292 155228 3774408    0    0    2    28  28    17  1  1 97  1

***************************************************************************************************************************

cat /proc/devices
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  7 vcs
 10 misc
 13 input
 29 fb
 36 netlink
 89 i2c
128 ptm
136 pts
162 raw
180 usb
253 mercd
254 ctimod

Block devices:
  1 ramdisk
  8 sd
  9 md
 22 ide1
 65 sd
 66 sd
 67 sd
 68 sd
 69 sd
 70 sd
 71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
253 device-mapper
254 mdp

*************************************************************************************************************************

Please notice the words in bold.
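As a side note on the /proc/meminfo listing in this post: on a 32-bit kernel with highmem, kernel and driver allocations generally come from low memory, so the Low* fields may be more relevant to a driver allocation failure than the overall free figure. The sketch below (function names are made up for illustration) just summarises those fields; whether low memory is actually the cause here is not established by these logs.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])  # value is in kB for most fields
    return info


def free_percent(info, zone):
    """Percentage of the given zone ('Low' or 'High') that is free."""
    return 100.0 * info[zone + "Free"] / info[zone + "Total"]


# Lines copied from the /proc/meminfo output in this post.
MEMINFO_SAMPLE = """\
MemTotal:      4151264 kB
MemFree:        15196 kB
HighTotal:    3276544 kB
HighFree:        1024 kB
LowTotal:      874720 kB
LowFree:        14172 kB
"""

info = parse_meminfo(MEMINFO_SAMPLE)
for zone in ("Low", "High"):
    print(f"{zone}Free: {info[zone + 'Free']} kB "
          f"({free_percent(info, zone):.2f}% of {zone}Total)")
```

Comparing these percentages between a healthy sample and one taken during the failure would show whether low memory behaves differently in the two cases.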

Code:

vm.percpu_pagelist_fraction = 0
vm.max_queue_depth = 0
vm.oom-kill = 1
vm.legacy_va_layout = 0
vm.vfs_cache_pressure = 100
vm.block_dump = 0
vm.laptop_mode = 0
vm.max_map_count = 65536
vm.min_free_kbytes = 949
vm.lower_zone_protection = 0
vm.hugetlb_shm_group = 0
vm.nr_hugepages = 0
vm.swappiness = 60
vm.nr_pdflush_threads = 2
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500
vm.dirty_ratio = 30
vm.dirty_background_ratio = 7
vm.page-cluster = 3
vm.overcommit_ratio = 50
vm.overcommit_memory = 0

I was considering setting vm.overcommit_memory = 2. I read in a Red Hat optimization guide that it increases the RAM available to the system. I don't know whether it will help or not.
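For context, vm.overcommit_memory = 2 switches the kernel to strict commit accounting: allocations that would push Committed_AS above CommitLimit are refused, so it limits what can be promised rather than adding RAM. With no huge pages configured, CommitLimit is swap plus overcommit_ratio percent of RAM. A small sketch (the function name is made up) that reproduces the CommitLimit shown in the /proc/meminfo output above from the posted values:

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100.

    Assumes no huge pages are reserved (HugePages_Total is 0 in this post).
    """
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


# Values copied from /proc/meminfo and the sysctl listing in this post.
limit = commit_limit_kb(
    mem_total_kb=4151264, swap_total_kb=6144820, overcommit_ratio=50
)
committed_as_kb = 1907636
print(f"CommitLimit:  {limit} kB")
print(f"Committed_AS: {committed_as_kb} kB")
print(f"headroom:     {limit - committed_as_kb} kB")
```

Committed_AS is far below the limit in this sample, so as far as I can tell the setting would mainly affect how user-space allocation requests are accounted, and it is not obvious that it would change anything for the driver.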

salasi 08-01-2010 02:58 PM

Quote:

Originally Posted by Raghu140 (Post 4051259)
I have faced this issue before. Last time the system was running RHEL 4.3 and my vendor suggested upgrading the OS to RHEL 4.5. I did, and we also replaced the chassis (server). The problem was resolved.

Also consider the possibility that you didn't so much resolve the problem as delay its occurrence.

Quote:

I have collected the vital stats when the problem occurred
I don't believe that you recorded this at exactly the time that the problem started; was it just before or just after the problem actually started?

Quote:

Code:

vmstat

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd  free  buff  cache  si  so    bi    bo  in    cs us sy id wa
 0  0      0  16636 161352 3758148    0    0    2    28  23    11  1  1 97  1


I'm not sure that the output of vmstat is helping much but, if vmstat were to help, you'd have to run it differently. The first line of vmstat output reports averages since boot, so it can mislead about what is currently going on; you need the multi-line output.
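Multi-line output comes from giving vmstat an interval and a count, for example `vmstat 5 3` for three samples five seconds apart. A rough sketch of handling such output follows; the function names are made up, and in the sample text only the first data line is from this thread, while the second and third data lines are hypothetical values added purely to show the shape of the output.

```python
def parse_vmstat(output):
    """Turn vmstat text into a list of {column: int} dicts, one per sample."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header = lines[1].split()          # second line holds the column names
    return [dict(zip(header, map(int, ln.split()))) for ln in lines[2:]]


def current_samples(output):
    """Drop the first sample, which is an average since boot."""
    return parse_vmstat(output)[1:]


# First data line: from the thread. Second and third: HYPOTHETICAL examples.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      0  16636 161352 3758148    0    0     2    28   23    11  1  1 97  1
 1  0      0  16500 161360 3758200    0    0     0    40  310   420  2  3 94  1
 0  0      0  16480 161360 3758220    0    0     0    12  295   401  1  2 96  1
"""

for row in current_samples(SAMPLE):
    print(row["free"], row["si"], row["so"], row["in"], row["cs"])
```

Running the interval form around the time the failure starts, and ignoring the first line, gives a picture of what the system is doing at that moment.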

Quote:

Code:

top - 22:13:42 up 1 day, 12:23,  7 users,  load average: 0.22, 0.20, 0.18
Tasks: 118 total,  1 running, 116 sleeping,  0 stopped,  1 zombie
Cpu(s):  1.1% us,  1.9% sy,  0.0% ni, 96.1% id,  0.8% wa,  0.0% hi,  0.0% si
Mem:  4151264k total,  4135452k used,    15812k free,  161296k buffers
Swap:  6144820k total,        0k used,  6144820k free,  3758984k cached


Do you know what that zombie process was? Not the driver for your telephony card, by any chance?
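One way to identify the zombie is `ps -eo pid,ppid,stat,comm` and looking for a STAT beginning with Z. The same information can be read from /proc/<pid>/stat, as in this sketch; the function names are made up for illustration and the `proc_root` parameter exists only so the code can be tried against a test directory.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state, ppid) from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' in the line.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition("(")
    fields = right.split()
    return int(pid_str), comm, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """List (pid, comm, ppid) for every process currently in state 'Z'."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue                      # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state, ppid = parse_stat_line(fh.read())
        except OSError:
            continue                      # process exited while scanning
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


print(parse_stat_line("4321 (demo) Z 1 4321 4321 0"))
```

Calling `find_zombies()` on the affected machine would give the zombie's name and its parent PID; the parent is the process that has not yet reaped it.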

At this point, I have a suspicion that there is simply a bug, or maybe an incompatibility in the card driver (did you install it from a repo, did you build it yourself from a tarball or something else?), but I have no idea how to proceed further without more information.


All times are GMT -5.