Hi,
I am running a Hadoop cluster on Ubuntu 10.04 LTS server. In recent days the server has been hanging; it prints an error message on the screen but writes nothing to the logs.
When the server hangs, I cannot log in either locally or remotely.
Error messages on the local monitor:
Msg 1
[641677.044778] FS:  00007f61ea646700(0000) GS:ffff8800100c0000(0000) knlGS:0000000000000000
[641677.081636] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[641677.100839] CR2: 000000000061ade0 CR3: 000000042f758000 CR4: 00000000000006
0
[641677.137188] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[641677.174348] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[641677.212210] Call Trace:
[641677.230768]  [<ffffffff810fbba0>] ? drain_local_pages+0x0/0x20
[641677.249676]  [<ffffffff8109a9c2>] ? smp_call_function+0x22/0x30
[641677.268199]  [<ffffffff8106def4>] ? on_each_cpu+0x24/0x50
[641677.286434]  [<ffffffff8101a0bc>] ? drain_all_pages+0x1c/0x20
[641677.304433]  [<ffffffff8101a585>] ? __alloc_pages_slowpath+0x3c5/0x580
[641677.322223]  [<ffffffff8101a8b1>] ? __alloc_pages_nodemask+0x171/0x180
[641677.339797]  [<ffffffff81089512>] ? hrtimer_cancel+0x22/0x30
[641677.357173]  [<ffffffff8112d6c7>] ? alloc_pages_current+0x87/0xd0
[641677.374510]  [<ffffffff810197ae>] ? __get_free_pages+0xe/0x50
[641677.391581]  [<ffffffff81064056>] ? dup_task_struct+0x46/0x170
[641677.408413]  [<ffffffff81065241>] ? copy_process+0xb1/0xe90
[641677.424899]  [<ffffffff810660b4>] ? do_fork+0x94/0x430
[641677.441042]  [<ffffffff81098b38>] ? do_futex+0xb8/0x1b0
[641677.456781]  [<ffffffff81098cab>] ? sys_futex+0x7b/0x170
[641677.472137]  [<ffffffff81011558>] ? sys_clone+0x28/0x30
[641677.487142]  [<ffffffff810134d3>] ? stub_clone+0x13/0x20
[641677.501801]  [<ffffffff810131b2>] ? system_call_fastpath+0x16/0x1b
Msg 2
000
[2842010.855788] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2842010.855794] Process kondemand/2 (pid: 135, threadinfo ffff88044b8c2000, task ffff88044b1d44d0)
[2842010.855797] Stack:
[2842010.855800]  ffff88044b8c3cc0 ffff88044b8c3c30 ffff88044b8c3d68 0000000000001111
[2842010.855805] <0> ffff88001008fb20 000000028155d896 00000000111111ff 0000000000000008
[2842010.855812] <0> 0000000000015bc0 0000000000015bc0 ffff880010181c30 0000000000015bc0
[2842010.855818] Call Trace:
[2842010.855827]  [<ffffffff8105c888>] load_balance_newidle+0xa8/0x310
[2842010.855835]  [<ffffffff8155894a>] thread_return+0x352/0x418
[2842010.855843]  [<ffffffff810806aa>] worker_thread+0xda/0x110
[2842010.855849]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
[2842010.855856]  [<ffffffff81080540>] ? worker_thread+0x0/0x110
[2842010.855862]  [<ffffffff81084416>] kthread+0x96/0xa0
[2842010.855868]  [<ffffffff810141ea>] child_rip+0xa/0x20
[2842010.855875]  [<ffffffff81084c80>] ? kthread+0x0/0xa0
[2842010.855880]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
[2842010.855883] Code: 06 89 85 c0 fe ff ff c7 85 c4 fe ff ff 01 00 00 00 e9 97 fb ff ff 90 48 8b 95 e0 fe ff ff 48 8b 45 a8 8b 72 08 48 c1 e0 0a 31 d2 <48> f7 f6 48 8b 75 b0 48 89 45 a0 31 c0 48 85 16 74 0c 48 8b 45
[2842010.855922] RIP  [<ffffffff81056284>] find_busiest_group+0x634/0x8f0
[2842010.855929]  RSP <ffff88044b8c3bb0>
[2842010.855933] ---[ end trace 81a1739d978369cb ]---
The links below contain screenshots of the error messages shown when the server hangs.
https://drive.google.com/file/d/0BxC...it?usp=sharing
https://drive.google.com/file/d/0BxC...it?usp=sharing
https://drive.google.com/file/d/0BxC...it?usp=sharing
https://drive.google.com/file/d/0BxC...it?usp=sharing
https://drive.google.com/file/d/0BxC...it?usp=sharing
https://drive.google.com/file/d/0BxC...it?usp=sharing
https://drive.google.com/file/d/0BxC...it?usp=sharing
Please let me know if you have any suggestions or ideas about where to look.