LinuxQuestions.org


linuxandtsm 10-02-2012 10:41 AM

system up and pingable but not able to login
 
Hi all,
This is a CentOS 5 server. The server was up and running and responded to ping, but I was unable to log in to it (neither over ssh nor on the direct console).
Rebooting the machine solved the issue, but I want to find out why this happened.

Yesterday I got Zabbix alerts for this system saying that it was running out of free swap space. Could this be the reason? How can I investigate this further?
Swap usage looks fine now, as shown below:

Code:

free -m
            total      used      free    shared    buffers    cached
Mem:        32183      3132      29050          0        87      2384
-/+ buffers/cache:        660      31522
Swap:        32767          0      32767

The /var/log/messages file is huge, so I am trying to figure out how to extract the information about this error from it. Any suggestions?
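One way to avoid paging through the whole file is to grep only for the markers that show up in the kernel excerpts in this thread ("invoked oom-killer", "Out of memory", "Free swap  =", "blocked for more than"). A minimal sketch, assuming the standard syslog layout of /var/log/messages; the function name `oom_lines` is hypothetical:

```shell
# oom_lines [FILE]
# Hypothetical helper: print only the kernel lines that mark an OOM kill,
# an oom-killer invocation, exhausted swap, or a hung task.
# FILE defaults to /var/log/messages.
oom_lines() {
    grep -E 'invoked oom-killer|Out of memory|Free swap +=|blocked for more than' \
        "${1:-/var/log/messages}"
}

# Usage: oom_lines | less
#        oom_lines /var/log/messages.1    # an older, rotated log, if present
```

The `Free swap +=` pattern deliberately matches only the "Free swap  = 0kB" form, so the duplicate "Free swap:" line of each dump is not printed twice. grep's `-A`/`-B` context options can be added to see the memory dump around each match.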

Thanks in advance!

linuxandtsm 10-02-2012 10:50 AM

I found the following entries in the /var/log/messages file:

Code:

Oct  2 09:57:23 lnx12 kernel: Node 0 Normal: 159*4kB 26*8kB 207*16kB 30*32kB 5*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 8764kB
Oct  2 09:57:23 lnx12 kernel: Node 0 HighMem: empty
Oct  2 09:57:23 lnx12 kernel: Node 1 DMA: empty
Oct  2 09:57:23 lnx12 kernel: Node 1 DMA32: empty
Oct  2 09:57:23 lnx12 kernel: Node 1 Normal: 23*4kB 120*8kB 289*16kB 33*32kB 6*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 11468kB
Oct  2 09:57:23 lnx12 kernel: Node 1 HighMem: empty
Oct  2 09:57:23 nx02 kernel: 820706 pagecache pages
Oct  2 09:57:23 lnx12 kernel: Swap cache: add 90291007, delete 90294335, find 27784036/36516487, race 6+9512
Oct  2 09:57:23 lnx12 kernel: Free swap  = 0kB
Oct  2 09:57:23 lnx12 kernel: Total swap = 33554424kB
Oct  2 09:57:23 lnx12 kernel: Free swap:            0kB
Oct  2 09:57:23 lnx12 kernel: 8519679 pages of RAM
Oct  2 09:57:23 lnx12 kernel: 280819 reserved pages
Oct  2 09:57:23 nx02 kernel: 31957 pages shared
Oct  2 09:57:23 lnx12 kernel: 906 pages swap cached
Oct  2 09:57:23 lnx12 kernel: Out of memory: Killed process 5998, UID 502, (oracle).
Oct  2 10:11:51 lnx12 kernel: INFO: task automount:26380 blocked for more than 120 seconds.
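The line `Out of memory: Killed process 5998, UID 502, (oracle).` above names the process the OOM killer chose. To list every such victim in a large log, a small sed filter is enough. This is a sketch that assumes exactly the message format shown in this log (newer kernels word the message differently), and `oom_victims` is a hypothetical name:

```shell
# oom_victims [FILE]
# Hypothetical helper: for every line of the form
#   "... Out of memory: Killed process PID, UID N, (NAME)."
# print "TIMESTAMP PID UID NAME". FILE defaults to /var/log/messages.
oom_victims() {
    sed -n 's/^\(... .. ..:..:..\) .*Out of memory: Killed process \([0-9]*\), UID \([0-9]*\), (\([^)]*\)).*/\1 \2 \3 \4/p' \
        "${1:-/var/log/messages}"
}
```

For the excerpt above this would print `Oct  2 09:57:23 5998 502 oracle`; a UID can then be mapped back to a user with `getent passwd 502`.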

There are also a lot of entries like these about the CPUs:

Code:

Oct  2 10:25:21 lnx12 kernel: cpu 0 hot: high 186, batch 31 used:27
Oct  2 10:25:21 lnx12 kernel: cpu 0 cold: high 62, batch 15 used:56
Oct  2 10:25:21 lnx12 kernel: cpu 1 hot: high 186, batch 31 used:0
Oct  2 10:25:21 lnx12 kernel: cpu 1 cold: high 62, batch 15 used:0
Oct  2 10:25:26 lnx12 kernel: cpu 2 hot: high 186, batch 31 used:65
Oct  2 10:25:26 lnx12 kernel: cpu 2 cold: high 62, batch 15 used:15
Oct  2 10:25:26 lnx12 kernel: cpu 3 hot: high 186, batch 31 used:0
Oct  2 10:25:26 lnx12 kernel: cpu 3 cold: high 62, batch 15 used:0

Code:

Oct  2 10:25:26 lnx12 kernel: Node 1 HighMem per-cpu: empty
Oct  2 10:25:26 lnx12 kernel: Free pages:      83180kB (0kB HighMem)
Oct  2 10:25:26 lnx12 kernel: Active:2123222 inactive:5973160 dirty:0 writeback:0 unstable:0 free:20795 slab:32938 mapped-file:1198 mapped-anon:7280824 pagetables:66147
Oct  2 10:25:26 lnx12 kernel: Node 0 DMA free:10036kB min:4kB low:4kB high:4kB active:0kB inactive:0kB present:9636kB pages_scanned:0 all_unreclaimable? yes
Oct  2 10:25:26 lnx12 kernel: lowmem_reserve[]: 0 3502 16127 16127
Oct  2 10:25:26 lnx12 kernel: Node 0 DMA32 free:52916kB min:2492kB low:3112kB high:3736kB active:1739256kB inactive:1681096kB present:3586464kB pages_scanned:6470097 all_unreclaimable? yes
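The counters on the `Active:... free:... mapped-anon:...` line are in pages, not bytes. The dump itself confirms a 4 kB page size here: `free:20795` × 4 kB = 83180 kB, exactly the "Free pages: 83180kB" line. On that basis `mapped-anon:7280824` is about 28,440 MiB (roughly 27.8 GiB) of anonymous process memory, which has no file behind it and can only be reclaimed by swapping it out, on a machine with about 32 GB of RAM and, at that moment, no free swap. A throwaway helper for the conversion (`pages_to_mib` is a hypothetical name; the 4 kB default is an assumption that holds for this log):

```shell
# pages_to_mib PAGES [PAGE_KB]
# Hypothetical helper: convert a kernel page count to MiB (integer, rounded down).
# PAGE_KB defaults to 4; `getconf PAGESIZE` reports the page size in bytes.
pages_to_mib() {
    echo $(( $1 * ${2:-4} / 1024 ))
}

# pages_to_mib 7280824   -> 28440  (mapped-anon)
# pages_to_mib 8519679   -> 33279  (pages of RAM)
# pages_to_mib 20795     -> 81     (free)
```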

Code:

Oct  2 10:43:59 lnx12 kernel: Node 0 HighMem: empty
Oct  2 10:43:59 lnx12 kernel: Node 1 DMA: empty
Oct  2 10:43:59 lnx12 kernel: Node 1 DMA32: empty
Oct  2 10:43:59 lnx12 kernel: Node 1 Normal: 41*4kB 98*8kB 289*16kB 33*32kB 6*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 11364kB
Oct  2 10:43:59 lnx12 kernel: Node 1 HighMem: empty
Oct  2 10:43:59 lnx12 kernel: 820884 pagecache pages
Oct  2 10:43:59 lnx12 kernel: Swap cache: add 90296484, delete 90299815, find 27784756/36517837, race 6+9520
Oct  2 10:43:59 lnx12 kernel: Free swap  = 0kB
Oct  2 10:43:59 lnx12 kernel: Total swap = 33554424kB
Oct  2 10:43:59 lnx12 kernel: Free swap:            0kB
Oct  2 10:43:59 lnx12 kernel: 8519679 pages of RAM
Oct  2 10:43:59 lnx12 kernel: 280819 reserved pages
Oct  2 10:43:59 lnx12 kernel: 28519 pages shared
Oct  2 10:43:59 lnx12 kernel: 903 pages swap cached
Oct  2 10:43:59 lnx12 kernel: audispd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Oct  2 10:43:59 lnx12 kernel:
Oct  2 10:43:59 lnx12 kernel: Call Trace:
Oct  2 10:43:59 lnx12 kernel:  [<ffffffff800c95e5>] out_of_memory+0x8e/0x2f3
Oct  2 10:43:59 lnx12 kernel:  [<ffffffff8002e261>] __wake_up+0x38/0x4f
Oct  2 10:43:59 lnx12 kernel:  [<ffffffff8000f569>] __alloc_pages+0x27f/0x308
Oct  2 10:43:59 lnx12 kernel:  [<ffffffff80012f32>] __do_page_cache_readahead+0x96/0x179
Oct  2 10:43:59 lnx12 kernel:  [<ffffffff8001386d>] filemap_nopage+0x14c/0x360
Oct  2 10:43:59 lnx12 kernel:  [<ffffffff8000895e>] __handle_mm_fault+0x1fb/0x1039
Oct  2 10:43:59 lnx12 kernel:  [<ffffffff8006720b>] do_page_fault+0x4cb/0x874
Oct  2 10:43:59 lnx12 kernel:  [<ffffffff8005dde9>] error_exit+0x0/0x84
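The `Node N Normal:` lines are the buddy allocator's free lists broken down by block size: `41*4kB` means 41 free 4 kB blocks, `98*8kB` means 98 free 8 kB blocks, and so on, with the zone total after the `=`. For the Node 1 line above that sums to 11364 kB, so only about 11 MB was free in that zone. A small awk sketch that recomputes the total for any such line (`buddy_free_kb` is a hypothetical name; it only understands the `COUNT*SIZEkB` tokens shown in this log):

```shell
# buddy_free_kb
# Hypothetical helper: read "Node N ZONE: a*4kB b*8kB ... = TOTALkB" lines on
# stdin and print the recomputed free total in kB for each line
# (0 for lines with no such tokens, e.g. "Node 1 DMA: empty").
buddy_free_kb() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\*[0-9]+kB$/) {
                split($i, part, "[*]")              # part[1] = count, part[2] = "SIZEkB"
                total += part[1] * (part[2] + 0)    # "+ 0" keeps the numeric prefix only
            }
        print total
    }'
}
```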


etech3 10-02-2012 12:27 PM

Did you set this computer up?

How much swap space do you have?

Maybe you need to increase the swap space.

Has it happened before?

How long has this server been running?

linuxandtsm 10-02-2012 01:11 PM

Hi etech3,

Yes, the server is up and running fine now.
From the output below, it looks like there is about 32 GB of swap space (free -g rounds it down to 31):
Code:

# free -g
            total      used      free    shared    buffers    cached
Mem:            31          6        24          0          0          5
-/+ buffers/cache:          0        30
Swap:          31          0        31

This server has been running since February this year (more than 8 months), and this is the first time we have hit this issue.

etech3 10-02-2012 01:16 PM

What software do you have on this server?

linuxandtsm 10-02-2012 02:11 PM

Only an Oracle database.

chrism01 10-03-2012 04:38 AM

It could be a memory leak in the application code (Java, C, whatever).
As the RAM fills up, the kernel uses more and more swap as an extension of RAM.
When it runs out of swap, it tries to keep running by invoking the OOM killer, which kills processes to recover memory. It does not pick them at random: it ranks them by a "badness" score that mainly reflects how much memory each process uses.
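To see which processes the kernel currently considers the best OOM candidates, you can rank them by /proc/PID/oom_score, the per-process badness value it exposes (higher means more likely to be killed). A minimal sketch; `oom_top` is a hypothetical name, and the optional second argument exists only so it can be pointed at a test directory instead of /proc:

```shell
# oom_top [N] [PROC_ROOT]
# Hypothetical helper: print the N processes (default 5) with the highest
# oom_score as "SCORE PID NAME". PROC_ROOT defaults to /proc.
# The body runs in a subshell so n and root do not leak into the caller.
oom_top() (
    n=${1:-5}
    root=${2:-/proc}
    for d in "$root"/[0-9]*; do
        # Processes can exit while we loop; skip any we can no longer read.
        score=$(cat "$d/oom_score" 2>/dev/null) || continue
        name=$(awk '/^Name:/ { print $2; exit }' "$d/status" 2>/dev/null)
        printf '%s %s %s\n' "$score" "${d##*/}" "$name"
    done | sort -rn | head -n "$n"
)
```

Running `oom_top` periodically while the database is under load shows whether one process keeps growing toward the top of the list.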

If you look carefully through the logs, you may be able to figure out what program filled the memory.
Going forward, set up some kind of monitoring that notifies you when the system starts dipping significantly into swap.
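As a minimal example of such a check (a sketch, not a replacement for the Zabbix alerting already in place), the percentage of swap in use can be computed from the SwapTotal and SwapFree fields of /proc/meminfo and compared against a threshold. The function name `swap_check` and the 50% default are arbitrary choices:

```shell
# swap_check [THRESHOLD_PCT] [MEMINFO_FILE]
# Hypothetical helper: print swap usage and exit non-zero when it reaches
# THRESHOLD_PCT (default 50). MEMINFO_FILE defaults to /proc/meminfo.
# The body runs in a subshell, so "exit" only sets the function's status.
swap_check() (
    threshold=${1:-50}
    meminfo=${2:-/proc/meminfo}
    # Integer percentage of swap in use; 0 if the machine has no swap.
    pct=$(awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 }
               END { if (t > 0) printf "%d", (t - f) * 100 / t; else printf "0" }' "$meminfo")
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: swap ${pct}% used (threshold ${threshold}%)"
        exit 1
    fi
    echo "OK: swap ${pct}% used"
)
```

Run from cron every few minutes, the non-zero exit status makes it easy to mail or log only the warnings. With the values from the OOM dump (Total swap = 33554424kB, Free swap = 0kB) it would report 100%.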


All times are GMT -5.