Do chained Bash commands block each other?
Below is an example that extracts links from a web page.
Code:
bunzip2 really_big_file.bz2 --stdout\ |
The pipes take care of it.
The output of the first one is the input to the next one, and so on. Yes, they will run at the same time, but the pipes synchronize them: each stage depends on the output of the one before it. What does it matter how the CPU chooses to schedule the processes?
They run simultaneously, and each one blocks the one to its right.
sed blocks until it gets a line from grep, and grep blocks until it gets a line from curl.
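A throwaway sketch of that blocking behaviour (not from the thread; the producer loop is invented for the demo). The reader wakes up once per line as the producer emits it, rather than after the producer exits, so both processes are alive at the same time:

```shell
# The producer emits one line per second; the reader blocks in
# read() and handles each line as it arrives, not after the
# producer has exited -- both processes run concurrently.
out=$(
    (for i in 1 2 3; do echo "line $i"; sleep 1; done) |
    while read -r line; do echo "got: $line"; done
)
printf '%s\n' "$out"
```

Run it and you can watch the "got:" lines appear one per second.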
Quote:
Right here the network speed will be the limiting factor.
And it goes without saying that processing HTML without a proper parser can be quite brittle. An unexpected space or line break, although valid HTML, will choke the sed script.
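For comparison, a minimal extraction sketch (the sample HTML is invented for the demo). Because grep -o emits each match on its own line, reflowed or line-wrapped HTML does not shift the fields a position-sensitive sed script would expect; a real HTML parser is still the robust choice:

```shell
# Sample HTML standing in for the output of `curl -s <url>`.
html='<a href="https://a.example/x">one</a>
<a href="https://b.example/y">two</a>'
# grep -o puts each href match on its own line; sed then strips
# the attribute wrapper, leaving the bare URLs.
links=$(printf '%s\n' "$html" | grep -Eo 'href="[^"]*"' | sed 's/^href="//; s/"$//')
printf '%s\n' "$links"
```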
Code:
curl https://example.com \
PS. clownflare tries to block me from posting the 'concat' part above. >:(
Quote:
Asking about CPU core allocation is a micro-optimization; if you have a performance issue, you have bigger factors to address first. Is there a larger problem that prompted this line of thinking, or is it just a "what if" thought?
Depends on the web page, but grep will do that by itself, which is what the OP is trying to do anyway. A web browser or an HTML/XML parser would do it better, though.
Code:
agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"
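Presumably that variable is meant to be passed to curl's -A flag, which sets the User-Agent header. A minimal wrapper (the function name is made up for illustration):

```shell
agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"

# Hypothetical wrapper: -A sets the User-Agent header, -s hides
# the progress meter. Nothing is fetched until it is called.
fetch() {
    curl -s -A "$agent" "$1"
}
```

Then call it as `fetch https://example.com`.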
Are you actually having a performance issue that you're trying to solve here, or was this meant as a thought experiment?
Keep in mind that IPC isn't fast. The typical performance-centric approach is to break up the data, distribute each chunk to a program that knows nothing about the others, have each program write its share of the results to a central location, and then wait for all of the individual programs to finish. BTW, you do understand that the Linux kernel puts processes to "sleep" (in quotes because it's a technical term) when they're waiting for input, right?
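That fan-out approach can be sketched like this (file names and the toy workload are invented for the demo; `split -n` is GNU coreutils):

```shell
# Split the input into 4 independent chunks, sum each chunk in a
# background worker, then block at `wait` until every worker exits.
seq 1 100 > big_input.txt
split -n l/4 big_input.txt chunk.        # 4 line-based chunks: chunk.aa..chunk.ad
for f in chunk.??; do
    awk '{s += $1} END {print s}' "$f" > "$f.out" &   # independent worker
done
wait                                     # gather point: all workers done
total=$(awk '{s += $1} END {print s}' chunk.??.out)   # combine partial sums
echo "$total"                            # prints 5050
rm -f big_input.txt chunk.??*            # tidy up the demo files
```

Each worker sees only its own chunk; the only coordination is the `wait` and the result files.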
Quote:
Code:
bunzip2 really_big_file.bz2 --stdout\ |
Neither example would benefit from multiple cores. The reasons have been given many times already, but jq in particular needs to wait for the entire curl or bunzip2 command to finish before it even starts.
Also, you can use bzcat instead of bunzip2 --stdout.

Quote:
Or have you actually seen top/iostat/sar output showing that?
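The bzcat suggestion above is easy to sanity-check with a throwaway file (the file name and contents are invented for the demo):

```shell
# bzcat FILE.bz2 is equivalent to bunzip2 --stdout FILE.bz2;
# neither touches the compressed file itself.
printf '{"a":1}\n' | bzip2 > demo.json.bz2
first=$(bzcat demo.json.bz2)
second=$(bunzip2 --stdout demo.json.bz2)
printf '%s\n' "$first"
rm -f demo.json.bz2
```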
Put another way: you don't have a performance issue until you can demonstrate a measurable one. Whether the answer to your question is yes or no, what difference is it going to make? If it runs quickly enough, nobody cares which core it executes on. If it doesn't run quickly enough, switching to forced parallel execution will make the code less maintainable, and will very likely have less of an impact than optimising whatever algorithm(s) are involved and/or using a lower-level language for the task.

Quote:
Even if you add the missing -z argument to sed, the backslash shouldn't be escaped and the group is unnecessary; it's far simpler to use tr to replace newlines. But even if you did use tr, you still have to consider that jq (unlike grep/sed) waits for stdin to complete before parsing the object, so the pipeline doesn't demonstrate any meaningful simultaneous execution.
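The tr suggestion, side by side with the GNU sed -z form it replaces (toy input, invented for the demo):

```shell
# Replace newlines with spaces: tr is the simpler tool.
tr_out=$(printf 'a\nb\nc\n' | tr '\n' ' ')
# GNU sed's -z makes NUL the record separator, so the whole input
# is one record and \n can be matched inside it.
sed_out=$(printf 'a\nb\nc\n' | sed -z 's/\n/ /g')
printf '%s|%s\n' "$tr_out" "$sed_out"
```

Both produce the same single-record output.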
Quote:
I'm starting to think you're deliberately ignoring the actual answers.