LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Distributions > Bedrock Linux
User Name
Password
Bedrock Linux This forum is for the discussion of Bedrock Linux.

Notices


Reply
  Search this Thread
Old 11-04-2021, 10:16 PM   #1
jr_bob_dobbs
Member
 
Registered: Mar 2009
Distribution: Bedrock, Devuan, Slackware, Linux From Scratch, Void
Posts: 602
Blog Entries: 119

Rep: Reputation: 176Reputation: 176
Bedrock on a Void system: Worker freezes for a while


Several weeks ago I set up a Void Linux install. Some programs, such as the Worker file manager, I had to compile because they are not in the Void repos. This is routine: no problems.

Today I installed Bedrock on Void. Rebooted. No obvious problems. Being cautious, I decided to not add any strata right away, just take a few days, on and off, to run things and see how well it all works. Sound, network, xorg, all seem fine.

I also have a partition with Devuan on it and a partition which is currently used as /home on the Devuan OS. On the Bedrock OS, that same partition is mounted as /other_home. Those partitions are ext4.

From within Worker on Bedrock, I navigated to /other_home to pick a video to play and ... Worker froze for a good half-minute. It hasn't done this before, ever.

In a terminal, I ran top and, as expected, saw the CPU was uncharacteristically high for worker: it was the topmost, using the most CPU. Unexpectedly, the second-highest CPU using process was ... etcfs.

My understanding is that etcfs is part of bedrock, and is unique to bedrock?

Worker had two directories open, one was the small home directory of my Void non-root user, and the other was the directory on /other_home. Worker was not touching /etc. The only thing unusual is that the /other_home directory has roughly twelve thousand files in my non-root user's home directory. Normally Worker loading that directory gives one to two seconds of pause.

I'm wondering what is going on and if anyone can clue me in please? Thank you.

p.s. Intel 64bit

p.p.s. after the long freeze, Worker and etcfs returned to the usual near-zero cpu usage.

Last edited by jr_bob_dobbs; 11-04-2021 at 10:18 PM.
 
Old 11-05-2021, 07:14 AM   #2
ParadigmComplex
Bedrock Linux Founder
 
Registered: Feb 2016
Distribution: Bedrock Linux
Posts: 173

Rep: Reputation: Disabled
Quote:
Originally Posted by jr_bob_dobbs View Post
From within Worker on Bedrock, I navigated to /other_home to pick a video to play and ... Worker froze for a good half-minute. It hasn't done this before, ever.
Was this a one-off or is it consistently reproducible? If the former, no idea. If the latter, run it through strace to see what it was doing.

Quote:
Originally Posted by jr_bob_dobbs View Post
In a terminal, I ran top and, as expected, saw the CPU was uncharacteristically high for worker: it was the topmost, using the most CPU. Unexpectedly, the second-highest CPU using process was ... etcfs.

My understanding is that etcfs is part of bedrock, and is unique to bedrock?
Yes.

Userland programs can't do stuff with/to files through the virtual filesystem layer directly; instead, they ask the kernel to do it on their behalf. This is how permissions work - the kernel can check if the program has sufficient permissions and, if not, refuse to do the requested operation. On Bedrock, requests to /etc get redirected through a Bedrock program called etcfs which does work on the calling program's behalf.

This adds a tiny bit of overhead to request into /etc. In practice this is negligible for a reasonable number of requests to /etc; on the order of a hundredth or thousandth of a second per request. However, if something makes hundreds or thousands of requests to /etc, the overhead does add up to something noticeable.

On a typical system, CPU usage tracking software such as top usually understands this and categorizes both the userland program CPU usage and the kernel CPU usage that happens on the program's behalf under just the program. However, such software isn't aware of the redirect stuff and tends to categorize work etcfs is doing on behalf of another program as etcfs' own work.

Reproduce the issue, watching top before, during, and after the event. If etcfs CPU usage is consistently higher than expected, something else is probably spamming /etc and this is unrelated to Worker. If both Worker and etcfs spike at the same time and quiet at the same time, then it's likely Worker that's spamming /etc.

Quote:
Originally Posted by jr_bob_dobbs View Post
Worker was not touching /etc.
Did you actually check for this - strace logs, working thorugh the source code - or are you just guessing here?

If you're just guessing, try actually checking. Again, I recommend strace for this. If you're unfamiliar with strace, try running Worker with:

Code:
strace -tvf -s999 -o /tmp/log <command to start Worker>
Do whatever you need to do to make Worker reproduce the issue then immediately after it stops also stop Worker. If you send me the resulting logs at /tmp/log I can help you work through them. Note strace logging slows programs down; it will take even longer to complete than it was before.

It's possible to get logging from etcfs' side, and I can walk you through that if you insist, but it's kind of a pain. I'm completely rewriting etcfs for 0.8.0 Naga which should make getting quick statistics about processes that spam etcfs much easier, but that's won't be ready for a while.
 
Old 11-05-2021, 10:25 PM   #3
jr_bob_dobbs
Member
 
Registered: Mar 2009
Distribution: Bedrock, Devuan, Slackware, Linux From Scratch, Void
Posts: 602

Original Poster
Blog Entries: 119

Rep: Reputation: 176Reputation: 176
It's reproducible. Navigating away from that directory and back, or even just clicking "reload" will cause it to happen. Each time, both Worker and etcfs have high CPU usage. This settles down, on both processes, after that directory is loaded.

I ran strace & worker. That was a very good idea. Thank you.

Oh my. Using strace was most revealing. I am very surprised at the results. It's 39 meg so I won't be pasting it here in its entirety. However, this excerpt should clarify matters considerably
Code:
6731  12:51:37 geteuid()                = 1000
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 geteuid()                = 1000
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 geteuid()                = 1000
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
6731  12:51:37 stat("/etc/localtime", {st_dev=makedev(0, 0x1b), st_ino=4465, st_mode=S_IFREG|0444, st_nlink=3, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=3536, st_atime=1636048524 /* 2021-11-04T13:55:24.366086749-0400 */, st_atime_nsec=366086749, st_mtime=1633310589 /* 2021-10-03T21:23:09-0400 */, st_mtime_nsec=0, st_ctime=1633717395 /* 2021-10-08T14:23:15.032507724-0400 */, st_ctime_nsec=32507724}) = 0
It's that, over and over again, for almost all of the 39 meg. Worker is accessing /etc/localtime over and over again, presumably for every file it finds in the current directory.

Code:
$ cat /tmp/log | grep 'stat("/etc/localtime' | wc
  53985 1544450 21682206
$ ls /other_home/bob | wc
  12548   12611  260724
$
53000 accesses of /etc/localtime by Worker in 30 (ish) seconds. Yeesh!

Last edited by jr_bob_dobbs; 11-05-2021 at 10:26 PM.
 
Old 11-06-2021, 07:59 AM   #4
ParadigmComplex
Bedrock Linux Founder
 
Registered: Feb 2016
Distribution: Bedrock Linux
Posts: 173

Rep: Reputation: Disabled
While I have some ideas to the improve performance of etcfs (including lessening the overhead of accessing /etc/localtime) in 0.8.x to a degree, I don't foresee a road where I can get it to the point where it can sustain that kind of workload without noticeable slowdown.

However, I'm doubtful Worker is similarly constrained. I suspect they could refactor things to avoid this, such fetching the time once and caching it for the duration of whatever loop is occurring here. Moreover, I wouldn't be surprised if doing so would improve performance even on non-Bedrock systems with normal /etc/localtime access overhead, although less significantly. Consider reporting your finding of excessive /etc/localtime queries (presumably caused by some current time query) to the Worker devs.
 
Old 11-15-2021, 08:57 AM   #5
jr_bob_dobbs
Member
 
Registered: Mar 2009
Distribution: Bedrock, Devuan, Slackware, Linux From Scratch, Void
Posts: 602

Original Poster
Blog Entries: 119

Rep: Reputation: 176Reputation: 176
That makes sense. Thank you.
 
Old 11-28-2021, 07:16 PM   #6
jr_bob_dobbs
Member
 
Registered: Mar 2009
Distribution: Bedrock, Devuan, Slackware, Linux From Scratch, Void
Posts: 602

Original Poster
Blog Entries: 119

Rep: Reputation: 176Reputation: 176
So I contacted the creator of Worker and after some questions and testing he made a surprisingly small change to Worker to get it to load large directories in Bedrock. The localtime function reads /etc/localtime every time whereas localtime_r() does not. That's a new one on me. Yay!
 
Old 12-07-2021, 06:56 PM   #7
ParadigmComplex
Bedrock Linux Founder
 
Registered: Feb 2016
Distribution: Bedrock Linux
Posts: 173

Rep: Reputation: Disabled
Happy to hear it
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] bedrock: bedrock poki remains on UTC no matter what jr_bob_dobbs Bedrock Linux 5 09-03-2019 06:10 AM
pthread giving error: invalid conversion from ‘void* (*)(int*)’ to ‘void* (*)(void*)’ knobby67 Programming 4 05-05-2017 11:54 AM
[SOLVED] bedrock not using brsh.conf, dropped to bedrock's /bin/sh Siljrath Linux - Distributions 2 08-25-2014 06:47 AM
[SOLVED] Warning worker http://192.168.1.5/ already used by another worker Apache problem tkmsr Linux - Server 2 06-30-2010 07:37 AM
void foo(void) and void foo() lackluster Programming 9 02-15-2003 11:57 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Distributions > Bedrock Linux

All times are GMT -5. The time now is 11:49 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration