> If my command launches 15 processes, are 14 of them sleeping at any given time?
The number of CPU cores limits how many processes can run in parallel, as does the fact that the reader of a pipe has to wait for the writer of that pipe. (And vice versa, as the pipe has a limited capacity.)
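The "vice versa" case can be observed directly. This is only a minimal sketch, assuming a Linux system where the default pipe capacity is 64 KiB (see pipe(7)); the 1 MiB payload, the 2-second delay and the temporary file are arbitrary choices for illustration. The writer cannot finish until the reader starts draining the pipe:

```shell
#!/bin/sh
# Sketch: a writer blocks once the pipe is full and nobody is reading.
# Assumption: pipe capacity is well below 1 MiB (64 KiB by default on Linux).

stamp=$(mktemp)          # the writer records its finish time here
start=$(date +%s)

# Writer: push 1 MiB into the pipe, then note when the write completed.
# Reader: wait 2 seconds before reading anything at all.
{ head -c 1048576 /dev/zero; date +%s > "$stamp"; } |
    { sleep 2; wc -c > /dev/null; }

writer_done=$(cat "$stamp")
rm -f "$stamp"

writer_delay=$((writer_done - start))
echo "writer finished ${writer_delay}s after start"
```

If the pipe could hold the whole payload, the writer would finish almost immediately instead of waiting for the reader.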
My question was whether or not the processes would block the execution of the other processes. In other words: If my command launches 15 processes, are 14 of them sleeping at any given time? If they are, it is very sub-optimal.
This is the official documentation: https://docs.kernel.org/scheduler/index.html
The people who wrote that kernel code put a lot of work into optimizing this, so it is definitely not sub-optimal. Anyway, if you think you can do a better job, just do it (or at least explain how it can be improved).
Yes. The short answer is yes, and it's been explained to you exactly how that works. How many times do you need to hear "yes" before it gets through to you?
Actually, the short answer is "no". Given a pipeline of 15 stages, 14 won't always be sleeping. The in/out data rates of the individual stages of the pipeline will determine which individual components are blocking on input or output — owing to full or empty fifo buffers — at any given time.
Assuming there is no competition for cpu resource, the overall performance of the pipeline will be dependent upon data-flow: constrained by the rate of its slowest component.
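To make the "slowest component" point concrete, here is a rough sketch. The stage names are hypothetical and sleep merely stands in for real work, so treat it as an illustration rather than a benchmark. Two of the three stages are effectively instant, yet the whole pipeline takes as long as the slow one:

```shell
#!/bin/sh
# Sketch: pipeline throughput is bounded by its slowest stage.
# The stage names are hypothetical; sleep stands in for real work.

fast_producer() { printf '%s\n' 1 2 3; }    # emits three lines immediately

slow_stage() {                               # takes about 1 s per line
    while IFS= read -r line; do
        sleep 1
        printf '%s\n' "$line"
    done
}

fast_consumer() { wc -l | tr -d ' '; }       # counts the lines it receives

start=$(date +%s)
count=$(fast_producer | slow_stage | fast_consumer)
elapsed=$(( $(date +%s) - start ))

echo "processed $count lines in about ${elapsed}s"
```

While slow_stage is busy, fast_consumer is blocked on an empty pipe and fast_producer has already exited, so at that moment only one of the three is doing any work.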
You can't just take a single example and use it as a basis for a performance discussion, unless the command sequence is one you specifically wish to optimize. But when you expand the question to cite "if 15 processes ...", then that's a different, but still specific and singular question.
Recommend reviewing the link provided by pan64 about the scheduler.
Put another way: You don't have a performance issue until you can demonstrate a measurable issue.
Whether the answer to your question is yes or no, what difference is it going to make?
If it runs quickly enough, nobody cares what core it executes on.
Imagine if there were a terabyte of data. Would it be worthwhile to consider optimizations?
Quote:
If it doesn't run quickly enough, switching to forced parallel execution is going to make the code less maintainable, and will very likely have less of an impact than optimising whatever algorithm(s) might be involved and/or using a lower-level language for the task.
Sometimes there's a better way to do something, and I don't know about it.
Quote:
It's really not.
The prior script was IO bound. The new one should be CPU bound.
Quote:
Even if you add the missing -z argument to sed, the backslash shouldn't be escaped and the group is unnecessary, but it's far simpler to use tr to replace newlines.
The backslash is needed to escape backslashes.
Quote:
But if we pretend you did use tr, you still have to consider that jq (unlike grep/sed) will wait for stdin to complete before parsing the object, so it doesn't demonstrate any meaningful simultaneous execution.
When jq is processing data, bunzip (theoretically at least) can read more data from disk and decompress it. Likewise, when sed is processing data, jq can read from stdin and process that.
In other words, if the programs do not execute simultaneously, that would be like having an assembly line with one car. In the (newer) code example, there's clearly room for eight cars.
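The assembly-line behaviour can be sketched with a plain shell read loop as the downstream stage. Whether a particular tool (jq, sed, and so on) works line by line or waits for the end of its input depends on that tool and how it is invoked; the loop below is just a stand-in that is known to stream. It timestamps each line when it is read:

```shell
#!/bin/sh
# Sketch: a downstream stage handles each line as soon as it arrives,
# without waiting for the upstream stage to finish.

stamped=$(
    { echo first; sleep 2; echo second; } |
    while IFS= read -r line; do
        # Prefix each line with the time (epoch seconds) at which it was read.
        printf '%s %s\n' "$(date +%s)" "$line"
    done
)

printf '%s\n' "$stamped"

t_first=$(printf '%s\n' "$stamped" | sed -n '1p' | cut -d ' ' -f 1)
t_second=$(printf '%s\n' "$stamped" | sed -n '2p' | cut -d ' ' -f 1)
gap=$((t_second - t_first))
echo "second line was read ${gap}s after the first"
```

The first line is stamped roughly two seconds before the second one, which shows the reader was already working while the writer was still running.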
> You can't just take a single example and use it as a basis for a performance discussion, unless the command sequence is one you specifically wish to optimize. But when you expand the question to cite "if 15 processes ...", then that's a different, but still specific and singular question.
> Recommend reviewing the link provided by pan64 about the scheduler.
The link looks potentially useful, although it's a lot to digest for a question that I had intended to be broadly generalizable about Bash programming.
> The link looks potentially useful, although it's a lot to digest for a question that I had intended to be broadly generalizable about Bash programming.
But yet you expanded your question beyond bash.
That's all I can help with here, sorry but the question continues to change scope and it's unclear what you're looking for.
Yes, it is completely unclear what this is all about.
Using a pipe chain like the one mentioned (bzip | jq | cut | rev | whatever) means that all these commands are started at the same time and executed independently of each other; only the output of one is sent to the next in the chain. They do not wait for any other member of the chain to complete, but they do wait for something to work with (= input data).
And it is actually completely independent of bash: this is handled, executed, processed and driven by the kernel. bash is just a language in which you can construct pipe chains like this, but not the only one.
(Therefore the speed of execution does not depend on bash at all.)
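A quick way to see that all members of the chain are started at the same time: in this small sketch each sleep stands in for one second of work and ignores the pipe entirely. Sequential execution would take about three seconds, but the pipeline takes about one.

```shell
#!/bin/sh
# Sketch: every stage of a pipeline is started at once, not one after another.
# Each sleep ignores its stdin/stdout; it only stands in for 1 s of work.

start=$(date +%s)
sleep 1 | sleep 1 | sleep 1
elapsed=$(( $(date +%s) - start ))

echo "three 1-second stages took about ${elapsed}s"
```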