Bedrock on a Void system: Worker freezes for a while
Several weeks ago I set up a Void Linux install. Some programs, such as the Worker file manager, I had to compile because they are not in the Void repos. This is routine: no problems.
Today I installed Bedrock on Void. Rebooted. No obvious problems. Being cautious, I decided to not add any strata right away, just take a few days, on and off, to run things and see how well it all works. Sound, network, xorg, all seem fine. I also have a partition with Devuan on it and a partition which is currently used as /home on the Devuan OS. On the Bedrock OS, that same partition is mounted as /other_home. Those partitions are ext4. From within Worker on Bedrock, I navigated to /other_home to pick a video to play and ... Worker froze for a good half-minute. It hasn't done this before, ever. In a terminal, I ran top and, as expected, saw the CPU was uncharacteristically high for worker: it was the topmost, using the most CPU. Unexpectedly, the second-highest CPU using process was ... etcfs. My understanding is that etcfs is part of bedrock, and is unique to bedrock? Worker had two directories open, one was the small home directory of my Void non-root user, and the other was the directory on /other_home. Worker was not touching /etc. The only thing unusual is that the /other_home directory has roughly twelve thousand files in my non-root user's home directory. Normally Worker loading that directory gives one to two seconds of pause. I'm wondering what is going on and if anyone can clue me in please? Thank you. p.s. Intel 64bit p.p.s. after the long freeze, Worker and etcfs returned to the usual near-zero cpu usage. |
Quote:
Quote:
Userland programs can't do stuff with/to files through the virtual filesystem layer directly; instead, they ask the kernel to do it on their behalf. This is how permissions work - the kernel can check if the program has sufficient permissions and, if not, refuse to do the requested operation. On Bedrock, requests to /etc get redirected through a Bedrock program called etcfs which does work on the calling program's behalf. This adds a tiny bit of overhead to request into /etc. In practice this is negligible for a reasonable number of requests to /etc; on the order of a hundredth or thousandth of a second per request. However, if something makes hundreds or thousands of requests to /etc, the overhead does add up to something noticeable. On a typical system, CPU usage tracking software such as top usually understands this and categorizes both the userland program CPU usage and the kernel CPU usage that happens on the program's behalf under just the program. However, such software isn't aware of the redirect stuff and tends to categorize work etcfs is doing on behalf of another program as etcfs' own work. Reproduce the issue, watching top before, during, and after the event. If etcfs CPU usage is consistently higher than expected, something else is probably spamming /etc and this is unrelated to Worker. If both Worker and etcfs spike at the same time and quiet at the same time, then it's likely Worker that's spamming /etc. Quote:
If you're just guessing, try actually checking. Again, I recommend strace for this. If you're unfamiliar with strace, try running Worker with: Code:
strace -tvf -s999 -o /tmp/log <command to start Worker> It's possible to get logging from etcfs' side, and I can walk you through that if you insist, but it's kind of a pain. I'm completely rewriting etcfs for 0.8.0 Naga which should make getting quick statistics about processes that spam etcfs much easier, but that's won't be ready for a while. |
It's reproducible. Navigating away from that directory and back, or even just clicking "reload" will cause it to happen. Each time, both Worker and etcfs have high CPU usage. This settles down, on both processes, after that directory is loaded.
I ran strace & worker. That was a very good idea. Thank you. Oh my. Using strace was most revealing. I am very surprised at the results. It's 39 meg so I won't be pasting it here in its entirety. However, this excerpt should clarify matters considerably Code:
6731 12:51:37 geteuid() = 1000 Code:
$ cat /tmp/log | grep 'stat("/etc/localtime' | wc |
While I have some ideas to the improve performance of etcfs (including lessening the overhead of accessing /etc/localtime) in 0.8.x to a degree, I don't foresee a road where I can get it to the point where it can sustain that kind of workload without noticeable slowdown.
However, I'm doubtful Worker is similarly constrained. I suspect they could refactor things to avoid this, such fetching the time once and caching it for the duration of whatever loop is occurring here. Moreover, I wouldn't be surprised if doing so would improve performance even on non-Bedrock systems with normal /etc/localtime access overhead, although less significantly. Consider reporting your finding of excessive /etc/localtime queries (presumably caused by some current time query) to the Worker devs. |
That makes sense. Thank you.
|
So I contacted the creator of Worker and after some questions and testing he made a surprisingly small change to Worker to get it to load large directories in Bedrock. The localtime function reads /etc/localtime every time whereas localtime_r() does not. That's a new one on me. Yay!
|
Happy to hear it :)
|
All times are GMT -5. The time now is 07:23 PM. |