Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
It's clear to me that each command can begin working before the others have completed, but will they block each other too? Will curl, grep, and sed run simultaneously on different logical cores?
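For reference, a minimal pipeline of the sort in question might look like this (printf stands in for curl here so the example is self-contained without network access; the HTML snippet and patterns are placeholders, not from the original post):

```shell
# Three processes connected by pipes; the question is whether they can
# run on different cores at once. printf stands in for curl so the
# example needs no network; the pattern is a placeholder.
printf '<td>alpha</td>\n<td>beta</td>\n' \
    | grep -o '<td>[^<]*</td>' \
    | sed 's/<[^>]*>//g'
# → alpha
# → beta
```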
Last edited by halfpower; 08-18-2022 at 04:24 PM.
Reason: Replace code example to better illustrate the problem
What does it matter how a CPU chooses to run processes?
When the primary factor limiting execution time is CPU power, it can affect total runtime. Theoretically, it should become a big issue when there's a large amount of data, many commands chained together, and many unutilized CPU cores.
And it goes without saying that processing HTML without a proper parser can be quite brittle. An unexpected space or line break, although valid HTML, will choke the sed script.
Quote:
Originally Posted by halfpower
When the primary factor limiting execution time is CPU power, it can affect total runtime. Theoretically, it should become a big issue when there's a large amount of data, many commands chained together, and many unutilized CPU cores.
No. Obviously the computing capabilities will limit the speed of execution, but Linux (the kernel) is able to utilize all the cores, not just one, so there will be no unutilized cores. (Don't forget that the whole system is running, with at least several hundred processes.)
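A quick sanity check of this, assuming a Linux system with procps pgrep: every stage of a pipeline is a separate process, and all of them exist at the same time, so the scheduler is free to put them on different cores.

```shell
# Each stage of a pipeline is its own process; here all three PIDs
# are alive simultaneously, ready for the scheduler to place anywhere.
sleep 2 | sleep 2 | sleep 2 &
sleep 0.5
pgrep -c -P $$ sleep    # prints 3: all three stages exist at once
wait
```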
Are you actually having a performance issue that you’re trying to solve here, or was this meant to be a model?
Keep in mind that IPC isn't fast. The typical performance-centric approach is to break up the data, distribute each chunk to a worker program that knows nothing about the others, have each worker write its results to a central location, and then wait for all of the workers to finish.
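A sketch of that split/distribute/collect pattern, assuming GNU split and xargs; a toy file stands in for the real monolithic data:

```shell
# Split the input, fan the chunks out to independent workers with
# xargs -P, then collect the per-chunk results.
printf 'hit\nmiss\nhit\nmiss\nhit\nmiss\nhit\nmiss\n' > input.txt
split -n l/4 input.txt chunk.                 # 4 line-balanced chunks
ls chunk.[a-z][a-z] | xargs -P 4 -I{} sh -c 'grep -c hit {} > {}.out'
awk '{s += $1} END {print s}' chunk.*.out     # collect: prints 4
rm -f input.txt chunk.*
```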
BTW, you do understand that the Linux kernel puts processes to "sleep" (in quotes because it's a technical word) when they're waiting for input, right?
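A Linux-specific demonstration of exactly that: a pipeline stage with nothing to read is parked by the kernel until data or EOF arrives on the pipe, and shows up as state S (interruptible sleep) in /proc.

```shell
# cat has nothing to read for 3 seconds, so it blocks in read()
# and the kernel marks it as sleeping.
sleep 3 | cat &
catpid=$!                              # PID of the last pipeline stage
sleep 0.5
awk '{print $3}' /proc/$catpid/stat    # prints: S
wait
```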
Neither example would benefit from multiple cores. The reasons have already been given many times, but jq in particular needs to wait for the entire curl or bunzip2 output before it can produce anything.
Also, you can use bzcat instead of bunzip2 --stdout.
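To illustrate the equivalence (assuming the bzip2 package is installed): bzcat is just the shorter spelling of bunzip2 --stdout; both decompress to stdout and leave the input file in place.

```shell
# Create a small compressed file, then decompress it both ways.
printf 'hello\n' | bzip2 > demo.bz2
bzcat demo.bz2              # prints: hello
bunzip2 --stdout demo.bz2   # prints: hello
rm -f demo.bz2
```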
Quote:
Originally Posted by halfpower
When the primary factor limiting execution time is CPU power
It's not. End of story.
Or have you actually seen top/iostat/sar output showing that?
Quote:
Are you actually having a performance issue that you're trying to solve here, or was this meant to be a model?
The question is more theoretical in nature. The code (which has been edited) is only intended to illustrate the issue.
Quote:
Keep in mind that IPC isn't fast. The typical performance-centric approach is to break up the data, distribute each chunk to a worker program that knows nothing about the others, have each worker write its results to a central location, and then wait for all of the workers to finish.
Some data is stored in a monolithic format. At the present time, I have no method for on-the-fly splitting.
Quote:
Originally Posted by dugan
BTW, you do understand that the Linux kernel puts processes to "sleep" (in quotes because it's a technical word) when they're waiting for input, right?
My question was whether the processes would block each other's execution. In other words: if my command launches 15 processes, are 14 of them sleeping at any given time? If so, that is very sub-optimal.
Put another way: You don't have a performance issue until you can demonstrate a measurable issue.
Whether the answer to your question is yes or no, what difference is it going to make?
If it runs quickly enough, nobody cares what core it executes on.
If it doesn't run quickly enough, switching to forced parallel execution will make the code less maintainable, and will very likely have less of an impact than optimising whatever algorithm(s) might be involved and/or using a lower-level language for the task.
Even if you add the missing -z argument to sed, the backslash shouldn't be escaped and the group is unnecessary. Either way, it's far simpler to use tr to replace the newlines.
But if we pretend you did use tr, you still have to consider that jq (unlike grep/sed) will wait for stdin to complete before parsing the object, so it doesn't demonstrate any meaningful simultaneous execution.
Quote:
My question was whether or not the processes would block the execution of the other processes. In other words: If my command launches 15 processes, are 14 of them sleeping at any given time? If they are, it is very sub-optimal.
Yes. The short answer is yes, and it's been explained to you exactly how that works. How many times do you need to hear "yes" before it gets through to you?
I'm starting to think you're deliberately ignoring the actual answers.