Yes, it is completely unclear what this is all about.
I'm trying to ask whether programs, when piped together as a series at the Bash prompt, are blocking or non-blocking.
Quote:
They will not wait for the completion of any other member of the chain, but each will wait for something to work with (i.e., input data). Using a pipe chain like the one mentioned, bzip | jq | cut | rev | whatever, means: all these commands are started at the same time and executed independently of each other; only the output of one is sent to the next in the chain.
So every car on the assembly line has work done on it at the same time. I'm not sure where this information is coming from (the link?), but it appears to be the answer to my question.
Quote:
And it is actually completely independent of bash; this is handled, executed, processed and driven by the kernel. bash is just a language in which you can construct pipe chains like this, but it is not the only one.
(Therefore the speed of execution does not depend on bash at all.)
This I did not know. I had been under the impression that there was more abstraction between the shell and the kernel. In many other languages, performing a comparable operation would involve multiple threads/processes and data queues. It's much more verbose.
I'm trying to ask whether programs, when piped together as a series at the Bash prompt, are blocking or non-blocking.
And you have already received the answer many times:
Blocking. They literally and technically do blocking I/O.
Each process tries to read from the one on its left, sleeps (yes, literally and technically) until it can, wakes up, does its CPU-bound thing on the data that it just read, and then tries to read again. Dot dot dot. I'm sure you can recognize this as a restatement of what pan64 just said. And what multiple others, including me, have told you many times before that.
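That sleep-until-readable behaviour can be sketched in a few lines of Python (the pipe and the delayed writer thread here are invented for illustration): a read() on an empty pipe puts the caller to sleep until data arrives.

```python
import os
import threading
import time

# Sketch: a reader on an empty pipe sleeps until the writer supplies data.
r, w = os.pipe()

def writer():
    time.sleep(0.2)          # the reader will sleep for roughly this long
    os.write(w, b"hello")

threading.Thread(target=writer).start()

start = time.monotonic()
data = os.read(r, 5)         # blocks: the kernel puts this thread to sleep
elapsed = time.monotonic() - start

print(data)                  # b'hello'
print(elapsed >= 0.15)       # True: the read waited for the writer
```

The reader spends those 0.2 seconds asleep in the kernel, not spinning on the CPU.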
Obviously, this is synchronous. Not asynchronous. It's also obvious that these are not "CPU-bound". Especially considering the fact that they probably read and write line by line.
I don't know why you're incapable of processing this reality, or why you're so determined to be told something else.
I'm trying to ask whether programs, when piped together as a series at the Bash prompt, are blocking or non-blocking.
You need to specify what you mean by that. The processes are started at the same time and run independently of each other, so in that sense they do not block each other. But each sends data to the next one in the chain, so execution can block if there is currently no data to read, or no room to write more.
And if one of them crashes, all the others will usually be stopped too; see SIGPIPE for details.
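The SIGPIPE behaviour is easy to observe (a sketch assuming bash and the coreutils yes and head are installed): when head exits after one line, the kernel kills yes with SIGPIPE, which bash reports as exit status 141 (128 + signal 13).

```python
import subprocess

# Run "yes | head -n 1" under bash and print each stage's exit status.
proc = subprocess.run(
    ["bash", "-c", 'yes | head -n 1; echo "${PIPESTATUS[@]}"'],
    capture_output=True, text=True,
)
print(proc.stdout)  # "y" from head, then "141 0": yes died of SIGPIPE (128+13)
```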
Quote:
Originally Posted by halfpower
So every car on the assembly line has work done on it at the same time. I'm not sure where this information is coming from (the link?), but it appears to be the answer to my question.
That is the nature of the operating system. bash, as a simple program, does not have that kind of process control; it is handled by the kernel.
Quote:
Originally Posted by halfpower
This I did not know. I had been under the impression that there was more abstraction between the shell and the kernel. In many other languages, performing a comparable operation would involve multiple threads/processes and data queues. It's much more verbose.
Even multithreading cannot be handled by the process itself; it is handled by the kernel (that is, the kernel decides how the threads are executed).
I'm trying to ask whether programs, when piped together as a series at the Bash prompt, are blocking or non-blocking.
I believe that pipes are both blocking and buffered.
The receiver is blocked when the buffer is empty. The sender is blocked when the buffer is full. At all other times, they run in parallel. The buffer is around 64 KB (65,536 bytes by default on Linux).
Ed
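That buffer size can be checked directly (a sketch; 65,536 bytes is the Linux default pipe capacity, other systems differ): put the write end in non-blocking mode and count how many bytes fit before a write would have to sleep.

```python
import os

# Fill a pipe with non-blocking writes to see how much the kernel buffers.
r, w = os.pipe()
os.set_blocking(w, False)

total = 0
try:
    while True:
        total += os.write(w, b"x" * 4096)
except BlockingIOError:   # buffer full; a blocking writer would sleep here
    pass

print(total)              # typically 65536 on a default Linux pipe
```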
When you use the "pipe" operator, here is what happens:
All of the processes are created simultaneously as child-processes of the shell.
The STDOUT of one process is connected, via a "pipe file," to the STDIN of the next.
(The 2> construct can be used to redirect the STDERR output stream, as well.)
Other options exist, such as tee.
Now, the shell simply waits for all of the children to finish.
A "pipe file" is a virtual file that acts as a FIFO (first in, first out) queue. It buffers the data that is put into it, up to a point: a writer can be blocked if the pipe is full, and a reader will be blocked if the pipe is empty.
The Unix/Linux world is filled with various commands which are designed to "filter" or otherwise process whatever they read from STDIN before writing it to STDOUT. They are specifically intended to be used with this "piping" arrangement, and they are often comparatively special-purpose.
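The wiring described above can be reproduced outside the shell. This sketch (command names assume coreutils on the PATH) does by hand what a shell does for printf 'a\nb\n' | wc -l:

```python
import subprocess

# Connect printf's STDOUT to wc's STDIN through an anonymous pipe,
# the same way a shell wires "printf 'a\nb\n' | wc -l".
p1 = subprocess.Popen(["printf", "a\\nb\\n"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["wc", "-l"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()            # let wc own the read end, so printf can get SIGPIPE
out, _ = p2.communicate()
print(out.strip())           # b'2'
```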
Quote:
Originally Posted by halfpower
If my command launches 15 processes, are 14 of them sleeping at any given time? If they are, it is very sub-optimal.
(Emph. mine)
Is it really "sub-optimal" for a process to be sitting in a state waiting on input from a pipe? Data needs to trickle through the pipes and the fifteenth process can't do anything until the fourteen predecessors have examined the data and passed something into the pipeline.
Yes, it is completely unclear what this is all about.
He wants to be told that it's faster to do something with 15 programs in a pipeline, than to just do it with a single program compiled from C.
Quote:
Originally Posted by halfpower
My question was whether or not the processes would block the execution of the other processes. In other words: If my command launches 15 processes, are 14 of them sleeping at any given time? If they are, it is very sub-optimal.
Pipelines are indeed very sub-optimal, and you would never use them if you actually need performance.
You can see this (and I mean literally: you can see the performance difference) if you write three programs using a slow language like Python and pipe them, and then write another Python program that does the same thing. (Yes I've done this).
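That comparison can be sketched as follows (the two stages are contrived, invented purely for illustration): a two-process pipeline versus the same work done in one process. Both give the same answer, but the pipeline pays for interpreter startup and for copying data through two pipes.

```python
import subprocess
import sys
import time

data = b"hello world\n" * 50_000

upper = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"
count = "import sys; print(len(sys.stdin.buffer.read()))"

# Two-process pipeline: upper-case the input, then count the bytes.
t0 = time.monotonic()
p1 = subprocess.Popen([sys.executable, "-c", upper],
                      stdin=subprocess.PIPE, stdout=subprocess.PIPE)
p2 = subprocess.Popen([sys.executable, "-c", count],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()            # p2 owns the read end now
p1.stdin.write(data)
p1.stdin.close()
piped = p2.communicate()[0].strip()
p1.wait()
t_pipeline = time.monotonic() - t0

# The same work in a single process.
t0 = time.monotonic()
direct = str(len(data.upper())).encode()
t_direct = time.monotonic() - t0

print(piped == direct)        # True: identical result
print(t_pipeline > t_direct)  # True: the pipeline is far slower for this job
```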
I think that the essential idea here is that "you can do a lot of things without 'writing a single program compiled from C.'" You can do it from the command-line. And, in a very short time, this becomes quite natural. The early designers of Unix®, having entirely rejected the Multics project that they were once part of, hit upon a very good idea that has withstood the pragmatic tests of time.
The concept of "little-bitty special purpose processes, 'piped' together," turns out to be extremely useful. And the Unix/Linux shell programs make this trivially easy to do.
For instance: "Find all of the files in directory X which contain the phrase 'xyz' in their filename, and then remove them." Why should I have to "write a 'C' program from scratch" in order to do such a very simple thing? I just now did that, and it took me all of ten seconds by "piping" the output of "find" into "xargs rm." Any other solution would have required the implementors of "rm" to vastly increase the complexity of their command program to anticipate all possible needs such as mine.
Multi-threading and multiple processes are both useful techniques in different contexts.
Multi-threading provides fast communication via shared memory. Shared memory also means that the overall memory footprint is smaller. The downside is that shared memory does not scale to large numbers of CPU cores.
Multi-computers can scale to very large numbers (as in supercomputers and datacenters). The downside is that communication is slow. The remedy is to duplicate some work on each node.
Both techniques are in use: large multi-computers contain tens of thousands of nodes, and each node is a shared memory multiprocessor.
Ed
Is it really "sub-optimal" for a process to be sitting in a state waiting on input from a pipe? Data needs to trickle through the pipes and the fifteenth process can't do anything until the fourteen predecessors have examined the data and passed something into the pipeline.
Take the symbolic command "cmd1 | cmd2". Say that "cmd1" produces a bit of data and passes it to "cmd2". "cmd2" is now working on the data, but what is "cmd1" doing? It could sit there and wait for "cmd2" to finish. Alternatively, it could get to work on producing more data and thereby pass the next bit of data to "cmd2" somewhat sooner. If there are 15 commands instead of two, then the performance impact is likely to be more dramatic.
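That overlap is easy to demonstrate (a sketch with a deliberately slow consumer, invented for illustration): because the kernel buffers the pipe, the producer finishes all of its writes long before the consumer has processed them.

```python
import subprocess
import sys
import time

# A deliberately slow consumer: 10 ms per line, about 1 s for 100 lines.
slow_consumer = ("import sys, time\n"
                 "for line in sys.stdin:\n"
                 "    time.sleep(0.01)\n")

p = subprocess.Popen([sys.executable, "-c", slow_consumer],
                     stdin=subprocess.PIPE)

t0 = time.monotonic()
for _ in range(100):
    p.stdin.write(b"x\n")    # returns at once while the pipe buffer has room
p.stdin.close()
producer_elapsed = time.monotonic() - t0
p.wait()

print(producer_elapsed < 0.5)   # True: the producer never had to wait
```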
A "pipe file" is a virtual file that acts as a FIFO (first in, first out) queue. It buffers the data that is put into it, up to a point: a writer can be blocked if the pipe is full, and a reader will be blocked if the pipe is empty.
Okay, so named pipes have a FIFO queue. Am I correct in thinking that this FIFO queue is absent with anonymous pipes?
For instance: "Find all of the files in directory X which contain the phrase 'xyz' in their filename, and then remove them." Why should I have to "write a 'C' program from scratch" in order to do such a very simple thing? I just now did that, and it took me all of ten seconds by "piping" the output of "find" into "xargs rm."
It's even faster, and less typing, to use "-delete" within the find command itself.