LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Software
User Name
Password
Linux - Software This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.

Notices


Reply
  Search this Thread
Old 08-22-2017, 05:33 AM   #1
abhisheks77
Member
 
Registered: Apr 2014
Posts: 63

Rep: Reputation: Disabled
Why my application is waiting at sync_page for long time ?


Hi,

We have RHEL 6.4 and there is a tool installed on this server, bip_server.
Whenever this is getting started, it is taking very long time of around 20-30 minutes and it waits at sync_page and in D1 state. There is no load on server, sufficient memory exists on server. ulimit is good (for that user which starts application).
Can somebody help me to understand, what else I need to check ?
Code:
[root@apps_db34 ~]# ps -ef | grep -i bip
xms       2594     1 23 03:13 pts/2    00:00:01 /export/xms/kle/bin/bip_server -f /export/xms/config/config.ini
root      2610  1807  0 03:13 pts/1    00:00:00 grep -i bip
[root@apps_db34 ~]# 
[root@apps_db34 ~]# ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | grep -i bip
    1  2821 xms      Dl    9.7 bip_server     sync_page
[root@apps_db34 ~]# cat /proc/2821/stack
[<ffffffff81119ded>] sync_page+0x3d/0x50
[<ffffffff81119e0e>] sync_page_killable+0xe/0x40
[<ffffffff81119d17>] __lock_page_killable+0x67/0x70
[<ffffffff8111ba64>] generic_file_aio_read+0x4b4/0x700
[<ffffffff81180eba>] do_sync_read+0xfa/0x140
[<ffffffff811817a5>] vfs_read+0xb5/0x1a0
[<ffffffff811818e1>] sys_read+0x51/0x90
[<ffffffff8104d810>] sysenter_dispatch+0x7/0x2e
[<ffffffffffffffff>] 0xffffffffffffffff
[root@apps_db34 ~]#
[root@apps_db34 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         15951       1671      14280          0         18       1228
-/+ buffers/cache:        424      15527
Swap:         7999          0       7999
[root@apps_db34 ~]# su - xms
( 03:31:19 ) apps_db34:xms ~>ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127476
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 271360
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
( 03:31:23 ) apps_db34:xms ~>
Regards
 
Old 08-22-2017, 06:40 AM   #2
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,354
Blog Entries: 13

Rep: Reputation: 4411Reputation: 4411Reputation: 4411Reputation: 4411Reputation: 4411Reputation: 4411Reputation: 4411Reputation: 4411Reputation: 4411Reputation: 4411Reputation: 4411
Process status is uninterruptible sleep https://unix.stackexchange.com/quest...state-indicate

It is likely performing I/O, however considering that it is asleep, it should not be destructively affecting the operating system.

Besides this information, did you have any particular reason to feel that the server is not fulfilling the function you have it performing?

I'd read the Oracle documentation about it or search their website for further information.
 
Old 08-22-2017, 07:22 AM   #3
!!!
Member
 
Registered: Jan 2017
Location: Fremont, CA, USA
Distribution: Trying any&ALL on old/minimal
Posts: 759

Rep: Reputation: 330Reputation: 330Reputation: 330Reputation: 330
Try strace, to see what it was doing, just before it hung.
There's probably other things in its /proc to dig into, for more clues, but Idk offhand.
Has it ever worked? Any oddities, like nfs swapspace? Vendor support?
I assume original RHEL kernel. Idk which kernel source file (anyone?)
 
Old 08-24-2017, 04:40 PM   #4
abhisheks77
Member
 
Registered: Apr 2014
Posts: 63

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by !!! View Post
Try strace, to see what it was doing, just before it hung.
There's probably other things in its /proc to dig into, for more clues, but Idk offhand.
Has it ever worked? Any oddities, like nfs swapspace? Vendor support?
I assume original RHEL kernel. Idk which kernel source file (anyone?)
Not sure, if I am using strace correctly, as I am not much knowledgeable on this. Does it give some clue ?
Code:
[root@apps_db34]# strace -s 99 -ffp 2654
Process 2654 attached with 12 threads - interrupt to quit
[ Process PID=2665 runs in 32 bit mode. ]
[pid  2665] futex(0x8d449cc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  2664] select(6, [5], NULL, NULL, {0, 135099} <unfinished ...>
[pid  2663] futex(0x8cd31cc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  2661] futex(0x8cd1b6c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  2658] futex(0x8cc4118, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
[pid  2657] futex(0x8cc4118, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
[pid  2662] futex(0x8cd2f50, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  2660] select(5, [4], NULL, NULL, {228, 5309} <unfinished ...>
[pid  2659] futex(0x8cc4118, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
[pid  2655] futex(0x8cc4118, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
[pid  2656] futex(0x8cc4118, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
[pid  2654] _llseek(9, 126976, [126976], SEEK_SET) = 0
[pid  2654] read(9, "\1\21\350:\221\0\2i\4\0\1\21\350=\335\0\2i\22\0\1\21\350B#\0\2i\34\0\1\21\351\22\201\0\2i&\0\1\21\351 \267\0\2i3\0\1\21\351\2307\0\2i?\0\1\21\351\276\255\0\2iI\0\1\21\351\322\313\0\2iT\0\1\21\352)\263\0\2i]\0\1\21\352=3\0\2i]"..., 4096) = 4096
[pid  2654] read(9, "\2\223\37\0\1\22eM\257\0\2\223.\0\1\22eh\205\0\2\223D\0\1\22ex1\0\2\223]\0\1\22e\207\t\0\2\223{\0\1\22e\215\351\0\2\223\223\0\1\22e\231\347\0\2\223\254\0\1\22e\2661\0\2\223\307\0\1\22e\310\211\0\2\223\343\0\1\22e\323u\0\2\223\377\0\1\22e\346\23"..., 4096) = 4096
[pid  2654] _llseek(9, 308625408, [308625408], SEEK_SET) = 0
[pid  2654] read(9, "\7\0\0\2\211\32idZ\213\256\0\25\0\0\2\211\32\274dZ\213\254\0\30\0\0\2\211\33\17dZ\214\240\0\6\0\0\2\211\33bdZ\213\235\0\20\0\0\2\211\33bdZ\214\221\0\3\0\0\2\211\33bdZ\214\221\0\4\0\0\2\211\35\tdZ\214\221\0\5\0\0\2\211\31VdZ\214\222\0\2\0\0"..., 4096) = 4096
[pid  2654] _llseek(9, 131072, [131072], SEEK_SET) = 0
[pid  2654] read(9, "\2\223\37\0\1\22eM\257\0\2\223.\0\1\22eh\205\0\2\223D\0\1\22ex1\0\2\223]\0\1\22e\207\t\0\2\223{\0\1\22e\215\351\0\2\223\223\0\1\22e\231\347\0\2\223\254\0\1\22e\2661\0\2\223\307\0\1\22e\310\211\0\2\223\343\0\1\22e\323u\0\2\223\377\0\1\22e\346\23"..., 4096) = 4096
[pid  2654] _llseek(9, 308633600, [308633600], SEEK_SET) = 0
[pid  2654] read(9, "\257d\373\33\32\0\17\0\0\2\243\360\231d\373\33\32\0\17\0\0\0\323\367Pd\373\33\32\0\17\0\0\0\35\336\365d\373\33-\0\34\0\0\0001\303\325d\373\33-\0\4\0\0\0N\224\352d\373\33(\0\25\0\0\0\226\257\213d\373\33-\0\34\0\0\0\235\37Sd\373\33-\0\25\0\0\0\352\261+d\373"..., 4096) = 4096
[pid  2654] _llseek(9, 131072, [131072], SEEK_SET) = 0
[pid  2654] read(9, "\2\223\37\0\1\22eM\257\0\2\223.\0\1\22eh\205\0\2\223D\0\1\22ex1\0\2\223]\0\1\22e\207\t\0\2\223{\0\1\22e\215\351\0\2\223\223\0\1\22e\231\347\0\2\223\254\0\1\22e\2661\0\2\223\307\0\1\22e\310\211\0\2\223\343\0\1\22e\323u\0\2\223\377\0\1\22e\346\23"..., 4096) = 4096
[pid  2654] _llseek(9, 308637696, [308637696], SEEK_SET) = 0
[pid  2654] read(9,  <unfinished ...>
[pid  2664] <... select resumed> )      = 0 (Timeout)
[pid  2664] select(6, [5], NULL, NULL, {5, 0} <unfinished ...>
[pid  2654] <... read resumed> "Ze\n\371y\0008\0\0\0032\310Ge\n\371y\0008\0\0\0032\310Ze\n\371y\0008\0\0\0032\310Ge\n\371y\0008\0\0\0\341!+e\n\371y\0\31\0\0\2\363\322\245e\n\371y\0\26\0\0\2\363\322\221e\n\371y\0\26\0\0\0{m\363e\n\371y\0\34\0\0\0\341!+e\n"..., 4096) = 4096
[pid  2654] _llseek(9, 131072, [131072], SEEK_SET) = 0
[pid  2654] read(9, "\2\223\37\0\1\22eM\257\0\2\223.\0\1\22eh\205\0\2\223D\0\1\22ex1\0\2\223]\0\1\22e\207\t\0\2\223{\0\1\22e\215\351\0\2\223\223\0\1\22e\231\347\0\2\223\254\0\1\22e\2661\0\2\223\307\0\1\22e\310\211\0\2\223\343\0\1\22e\323u\0\2\223\377\0\1\22e\346\23"..., 4096) = 4096
[pid  2654] _llseek(9, 308641792, [308641792], SEEK_SET) = 0
[pid  2654] read(9, "\0\0\31\367;d\353\27\216\0\n\0\0\0%.\342d\353\27\212\0\f\0\0\1I\"\1d\353\27\212\0\21\0\0\1I!\360d\353\27\212\0\21\0\0\0005>*d\353\27\216\0\n\0\0\1T\201\203d\353\27\212\0\f\0\0\1T\201sd\353\27\212\0\f\0\0\0\31\367;d\353\27|\0\10\0\0\0005"..., 4096) = 4096
[pid  2654] _llseek(9, 131072, [131072], SEEK_SET) = 0
.......................................................
.......................................................
[pid  2654] _llseek(9, 309293056, [309293056], SEEK_SET) = 0
[pid  2654] read(9, ^C <unfinished ...>
Process 2654 detached
Process 2655 detached
Process 2656 detached
Process 2657 detached
Process 2658 detached
Process 2659 detached
Process 2660 detached
Process 2661 detached
Process 2662 detached
Process 2663 detached
Process 2664 detached
Process 2665 detached
 
Old 08-24-2017, 05:28 PM   #5
!!!
Member
 
Registered: Jan 2017
Location: Fremont, CA, USA
Distribution: Trying any&ALL on old/minimal
Posts: 759

Rep: Reputation: 330Reputation: 330Reputation: 330Reputation: 330
#2 probably right: just 'normal' performance issue

Now I think (not sure tho) that: the D state might be:
'normal' reading-disk-in-progress occurance!

Can you repeat that ps ... | grep <pid of process found in D state> LOTS of times, to see IF it is ALWAYS in D state?
I'm guessing MAYBE it's NOT!!!

I use just: strace -f -o myfile -p <#>

Do your two lines of dots, just mean: more of same omitted? (I'm guessing: Yes)
**I'm guessing it keeps on doing the seek&read(9,... Which would mean:
**It's not hung at all!!! A 'false-positive' of a hang, maybe! #2 simply right
What 'certainty/proof' do you see, that some actual activity/processing has hung?
Could it be that bip application simply does a huge amounts of reading to startup?
That would be just a performance issue, like dd reading TeraBytes, bs=1K each.
(Don't be upset IF my guess on this is wrong!!!)

Interesting dozen threads; ps has a -T; any ideas on them?

I haven't web-researched FUTEX_WAIT_PRIVATE (yet). Any other LQ'er have ideas? (I couldn't find "D1" state, just D)

Last edited by !!!; 08-24-2017 at 05:58 PM.
 
Old 08-24-2017, 05:54 PM   #6
abhisheks77
Member
 
Registered: Apr 2014
Posts: 63

Original Poster
Rep: Reputation: Disabled
It is still running and trying to come in operational state, as you can see in below output. In other servers, it takes 2-3 minutes to come up, but here, it is already close to 2 hours.
Quote:
[root@apps_db34 ~]# ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | grep -i bip
1 2654 xms Dl 0.2 bip_server sync_page
[root@apps_db34 ~]# ps -ef | grep -i bip
xms 2654 1 0 13:52 pts/2 00:00:17 /export/xms/kle/bin/bip_server -f /export/xms/config/config.ini
root 6078 6053 0 15:47 pts/4 00:00:00 grep -i bip
[root@apps_db34 ~]# date
Thu Aug 24 15:47:28 PDT 2017
[root@apps_db34 ~]#
[root@apps_db34 ~]# strace -f -o myfile -p 2654
Process 2654 attached with 13 threads - interrupt to quit
[ Process PID=4842 runs in 32 bit mode. ]
^CProcess 2654 detached
Process 2655 detached
Process 2656 detached
Process 2657 detached
Process 2658 detached
Process 2659 detached
Process 2660 detached
Process 2661 detached
Process 2662 detached
Process 2663 detached
Process 2664 detached
Process 2665 detached
Process 4842 detached
[root@apps_db34 ~]#
 
Old 08-24-2017, 06:03 PM   #7
!!!
Member
 
Registered: Jan 2017
Location: Fremont, CA, USA
Distribution: Trying any&ALL on old/minimal
Posts: 759

Rep: Reputation: 330Reputation: 330Reputation: 330Reputation: 330
Ah! I see! Then, I'd look into disk performance tools, to compare systems.
iostat, plus newer ones which I'm not familiar with. Web-search... Best wishes.

p.s. Figure out (lsof maybe) what file that fd 9 is, and time cp it /dev/null
Compare with good systems. ALL system&app configs same? dmesg errors?

Last edited by !!!; 08-24-2017 at 06:15 PM.
 
Old 08-24-2017, 07:17 PM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,780

Rep: Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573
That's "Dl", not "D1".
What kernel is this running ?.
Is it under KVM ?.
Is it using a NFS mount ?.
 
1 members found this post helpful.
Old 08-24-2017, 08:26 PM   #9
abhisheks77
Member
 
Registered: Apr 2014
Posts: 63

Original Poster
Rep: Reputation: Disabled
It is not in KVM. This VM is running on VMWare. There is no NFS mount on server.
Quote:
[root@apps_db34 ~]# uname -a
Linux apps_db34 2.6.32-358.6.2.el6.x86_64 #1 SMP Tue May 14 15:48:21 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@apps_db34 ~]#
[root@apps_db34 ~]# cat /etc/*elea*
Red Hat Enterprise Linux Server release 6.4 (Santiago)
Red Hat Enterprise Linux Server release 6.4 (Santiago)
cpe:/o:redhat:enterprise_linux:6server:ga:server
[root@apps_db34 ~]#
 
Old 08-25-2017, 01:19 AM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,780

Rep: Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573Reputation: 3573
That code was a bit of a problem child apparently, and was yanked from the kernel at 2.6.39. That would be May 2011.
Even allowing for RH backporting, I'd reckon it's time to upgrade rather than chasing these sort of things. Else raise a ticket with RH if still under support.
 
Old 08-25-2017, 01:36 AM   #11
abhisheks77
Member
 
Registered: Apr 2014
Posts: 63

Original Poster
Rep: Reputation: Disabled
We do not have RH support for these servers (hosted on VMWare), so need to figure ourselves.
We can go with upgrade, but keeping that as secondary option. Because, this is not working in one of server out of other 3 nodes, while all are on same kernel level.
Were you able to find similar known issue related to this kernel level (Any link or bug file) ? If we have evidence that this kernel is problematic, we should be able to upgrade all nodes.

Last edited by abhisheks77; 08-25-2017 at 01:38 AM.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Long time member, long time *nix user, about time to say hi. stuman LinuxQuestions.org Member Intro 0 10-15-2013 07:35 AM
update long time waiting Pedroski Fedora 4 04-22-2012 10:06 AM
Waiting for a long time :( kidsin LQ Suggestions & Feedback 6 08-27-2011 05:04 PM
waiting for long time ahmed gamal Slackware 2 11-05-2008 01:37 PM
Mandrake 10.1 - long waiting before login Belegdol Mandriva 15 11-17-2004 07:27 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Software

All times are GMT -5. The time now is 10:50 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration