Old 08-20-2022, 12:15 PM   #31
halfpower
Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 241

Original Poster
Rep: Reputation: 31

Quote:
Originally Posted by pan64 View Post
Yes, it is completely unclear what this is all about.
I'm trying to ask whether programs, when piped together as a series at the Bash prompt, are blocking or non-blocking.

Quote:
They will not wait for the completion of any other member of the chain, but they will wait for something to work with (= input data). Using a pipe chain like the one mentioned, bzip | jq | cut | rev | whatever, means: all of these commands are started at the same time and executed independently of each other; only the output of one is sent to the next in the chain.
So every car on the assembly line has work done on it at the same time. I'm not sure where this information is coming from (the link?), but it appears to be the answer to my question.

Quote:
And it is actually completely independent of bash; this is handled, executed, processed, and driven by the kernel. bash is just a language in which you can construct pipe chains like this, but not the only one.
(Therefore the speed of execution does not depend on bash at all.)
This I did not know. I had been under the impression that there was more abstraction between the shell and the kernel. In many other languages, performing a comparable operation would involve multiple threads/processes and data queues. It's much more verbose.
 
Old 08-20-2022, 12:19 PM   #32
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,869
Blog Entries: 1

Rep: Reputation: 1870
Yes, there are processes here and message queues (called pipes) as well.
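You can see both the processes and the kernel pipe object while a pipeline runs (a quick sketch, Linux-specific):
Code:
# Two processes connected by one kernel pipe object:
cat /dev/zero | sleep 100 &
ls -l /proc/$!/fd    # fd 0 of 'sleep' shows up as pipe:[inode]
kill %1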
 
Old 08-20-2022, 12:33 PM   #33
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,241

Rep: Reputation: 5322
Quote:
Originally Posted by halfpower View Post
I'm trying to ask whether programs, when piped together as a series at the Bash prompt, are blocking or non-blocking.
And you have already received the answer many times:

Blocking. They literally and technically do blocking I/O.

Each process tries to read from the one on its left, sleeps (yes, literally and technically) until it can, wakes up, does its CPU-bound thing on the data that it just read, and then tries to read again. Dot dot dot. I'm sure you can recognize this as a restatement of what pan64 just said. And what multiple others, including me, have told you many times before that.
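Something like this minimal shell filter is exactly that loop (just a sketch; the upper-casing stands in for whatever work a real stage does):
Code:
# A pipeline stage: 'read' blocks (the process sleeps) until a line
# arrives on stdin; then we work on it, write it out, and read again.
while IFS= read -r line; do
    printf '%s\n' "${line^^}"   # stand-in for the work on the data
done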

Obviously, this is synchronous, not asynchronous. It's also obvious that these processes are not "CPU-bound", especially since they probably read and write line by line.

I don't know why you're incapable of processing this reality, or why you're so determined to be told something else.

Last edited by dugan; 08-21-2022 at 02:22 PM.
 
Old 08-21-2022, 02:56 AM   #34
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,930

Rep: Reputation: 7321
Quote:
Originally Posted by halfpower View Post
I'm trying to ask whether programs, when piped together as a series at the Bash prompt, are blocking or non-blocking.
You need to specify what you mean by that. The processes are started at the same time and run independently of each other, so they do not block each other. But each one sends data to the next in the chain, so execution can block when there is currently no data to send, or when the next process cannot receive more data.
And if one of them crashes, all the others will (usually) be stopped too; see SIGPIPE.
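A quick way to see SIGPIPE doing that (bash; PIPESTATUS is bash-specific):
Code:
# 'head' exits after one line and closes the read end of the pipe;
# the next write by 'yes' raises SIGPIPE, which kills it.
yes | head -n 1
echo "${PIPESTATUS[@]}"   # prints "141 0"; 141 = 128 + 13 (SIGPIPE)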
Quote:
Originally Posted by halfpower View Post
So every car on the assembly line has work done on it at the same time. I'm not sure where this information is coming from (the link?), but it appears to be the answer to my question.
That is the nature of the operating system. bash, as a simple program, does not have that kind of process control; it is all handled by the kernel.

Quote:
Originally Posted by halfpower View Post
This I did not know. I had been under the impression that there was more abstraction between the shell and the kernel. In many other languages, performing a comparable operation would involve multiple threads/processes and data queues. It's much more verbose.
Even multithreading is not handled by the process itself but by the kernel (that is, the kernel decides how the threads are executed).
 
Old 08-21-2022, 09:29 AM   #35
EdGr
Member
 
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998

Rep: Reputation: 471
Quote:
Originally Posted by halfpower View Post
I'm trying to ask whether programs, when piped together as a series at the Bash prompt, are blocking or non-blocking.
I believe that pipes are both blocking and buffered.

The receiver is blocked when the buffer is empty. The sender is blocked when the buffer is full. At all other times, they run in parallel. The buffer is 64KB by default on Linux.
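A sketch that shows both cases, assuming that 64KB default:
Code:
# 64KB fits in the pipe buffer: dd finishes at once, even though
# 'sleep' never reads a byte.
dd if=/dev/zero bs=1K count=64 | sleep 10

# 128KB does not fit: dd writes ~64KB, blocks until 'sleep' exits,
# then dies of SIGPIPE because the read end closes unread.
dd if=/dev/zero bs=1K count=128 | sleep 10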
Ed
 
Old 08-21-2022, 08:08 PM   #36
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,671
Blog Entries: 4

Rep: Reputation: 3945
When you use the "pipe" operator, here is what happens:
  • All of the processes are created simultaneously as child-processes of the shell.
  • The STDOUT of one process is connected, via a "pipe file," to the STDIN of the next.
  • (The 2> construct can be used to redirect the STDERR output stream, as well.)
  • Other options exist, such as tee.
Now, the shell simply waits for all of the children to finish.

A "pipe file" is a virtual file that acts as a FIFO (first in, first out) queue. It buffers the data that is put into it, up to a point: a writer can be blocked if the pipe is full, and a reader will be blocked if the pipe is empty.

The Unix/Linux world is filled with various commands which are designed to "filter" or otherwise process whatever they read from STDIN before writing it to STDOUT. They are specifically intended to be used with this "piping" arrangement, and they are often comparatively special-purpose.
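A typical chain of such filters, each one reading from STDIN and writing to STDOUT:
Code:
# Which login shells are most common? One small, special-purpose
# filter per step:
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn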

Last edited by sundialsvcs; 08-22-2022 at 09:12 AM.
 
Old 08-21-2022, 11:05 PM   #37
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,803

Rep: Reputation: 550
Quote:
Originally Posted by halfpower View Post
If my command launches 15 processes, are 14 of them sleeping at any given time? If they are, it is very sub-optimal.
(Emph. mine)

Is it really "sub-optimal" for a process to be sitting in a state waiting on input from a pipe? Data needs to trickle through the pipes and the fifteenth process can't do anything until the fourteen predecessors have examined the data and passed something into the pipeline.
 
Old 08-22-2022, 09:03 AM   #38
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,241

Rep: Reputation: 5322
Quote:
Originally Posted by pan64 View Post
Yes, it is completely unclear what this is all about.
He wants to be told that it's faster to do something with 15 programs in a pipeline than to just do it with a single program compiled from C.

Quote:
Originally Posted by halfpower View Post
My question was whether or not the processes would block the execution of the other processes. In other words: If my command launches 15 processes, are 14 of them sleeping at any given time? If they are, it is very sub-optimal.
Pipelines are indeed very sub-optimal, and you would never use them if you actually need performance.

You can see this (and I mean literally: you can see the performance difference) if you write three programs in a slow language like Python and pipe them together, and then write a single Python program that does the same thing. (Yes, I've done this.)
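The experiment looks roughly like this (a.py, b.py, c.py, and combined.py are placeholder names for the three piped stages and the merged program):
Code:
# Three processes, two pipes; the data is re-serialized at every stage:
time python3 a.py < input.txt | python3 b.py | python3 c.py > /dev/null

# One process doing all three transformations in memory:
time python3 combined.py < input.txt > /dev/null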

Last edited by dugan; 08-22-2022 at 09:22 AM.
 
Old 08-22-2022, 09:14 AM   #39
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,671
Blog Entries: 4

Rep: Reputation: 3945
I think that the essential idea here is that "you can do a lot of things without 'writing a single program compiled from C.'" You can do it from the command-line. And, in a very short time, this becomes quite natural. The early designers of Unix®, having entirely rejected the Multics project that they were once part of, hit upon a very good idea that has withstood the pragmatic tests of time.

The concept of "little-bitty special purpose processes, 'piped' together," turns out to be extremely useful. And the Unix/Linux shell programs make this trivially easy to do.

For instance: "Find all of the files in directory X which contain the phrase 'xyz' in their filename, and then remove them." Why should I have to "write a 'C' program from scratch" in order to do such a very simple thing? I just now did that, and it took me all of ten seconds by "piping" the output of "find" into "xargs rm." Any other solution would have required the implementors of "rm" to vastly increase the complexity of their command program to anticipate all possible needs such as mine.
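Spelled out, that ten-second solution is roughly this (-print0/-0 just keep odd filenames safe):
Code:
# Remove every file under X whose name contains "xyz":
find X -type f -name '*xyz*' -print0 | xargs -0 rm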

Last edited by sundialsvcs; 08-22-2022 at 09:22 AM.
 
1 member found this post helpful.
Old 08-22-2022, 09:46 AM   #40
EdGr
Member
 
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998

Rep: Reputation: 471
Multi-threading and multiple processes are both useful techniques in different contexts.

Multi-threading provides fast communication via shared memory. Shared memory also means that the overall memory footprint is smaller. The downside is that shared memory does not scale to large numbers of CPU cores.

Multi-computers can scale to very large numbers (as in supercomputers and datacenters). The downside is that communication is slow. The remedy is to duplicate some work on each node.

Both techniques are in use: large multi-computers contain tens of thousands of nodes, and each node is a shared memory multiprocessor.
Ed
 
Old 08-22-2022, 01:06 PM   #41
halfpower
Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 241

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by rnturn View Post
(Emph. mine)

Is it really "sub-optimal" for a process to be sitting in a state waiting on input from a pipe? Data needs to trickle through the pipes and the fifteenth process can't do anything until the fourteen predecessors have examined the data and passed something into the pipeline.
Take the symbolic command "cmd1 | cmd2". Say that "cmd1" produces a bit of data and passes it to "cmd2". "cmd2" is now working on the data, but what is "cmd1" doing? It could sit there and wait for "cmd2" to finish. Alternatively, it could get to work on producing more data and thereby pass the next bit of data to "cmd2" somewhat sooner. If there are 15 commands instead of two, the performance impact is likely to be more dramatic.
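You can watch which of the two happens (the stderr timestamps are just there to make the producer visible):
Code:
# The consumer sleeps for 10 seconds, yet the producer keeps printing
# a timestamp every second: it works ahead into the pipe buffer and
# would only block once the ~64KB buffer filled.
( while :; do echo data; date +%T >&2; sleep 1; done ) |
    ( sleep 10; head -n 20 > /dev/null )
# When 'head' exits, the producer is killed by SIGPIPE on its next write.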

Last edited by halfpower; 08-22-2022 at 01:26 PM. Reason: typo
 
Old 08-22-2022, 01:25 PM   #42
halfpower
Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 241

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by sundialsvcs View Post
A "pipe file" is a virtual file that acts as a FIFO (first in, first out) queue. It buffers the data that is put into it, up to a point: a writer can be blocked if the pipe is full, and a reader will be blocked if the pipe is empty.
Okay, so named pipes have a FIFO queue. Am I correct in thinking that this FIFO queue is absent with anonymous pipes?

Last edited by halfpower; 08-22-2022 at 02:46 PM.
 
Old 08-22-2022, 02:55 PM   #43
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,241

Rep: Reputation: 5322
What an incredibly strange thing to say.

A pipe (named or otherwise) is a FIFO queue.

Last edited by dugan; 08-22-2022 at 02:56 PM.
 
Old 08-22-2022, 02:56 PM   #44
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,781

Rep: Reputation: 2214
Quote:
Originally Posted by sundialsvcs View Post
For instance: "Find all of the files in directory X which contain the phrase 'xyz' in their filename, and then remove them." Why should I have to "write a 'C' program from scratch" in order to do such a very simple thing? I just now did that, and it took me all of ten seconds by "piping" the output of "find" into "xargs rm."
It's even faster, and less typing, to use "-delete" within the find command itself.
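That is, assuming GNU find:
Code:
# No pipe, no second process; find unlinks the matches itself:
find X -type f -name '*xyz*' -delete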
 
Old 08-22-2022, 03:13 PM   #45
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,671
Blog Entries: 4

Rep: Reputation: 3945
Quote:
Originally Posted by rknichols View Post
It's even faster, and less typing, to use "-delete" within the find command itself.
"TMTOWTDI = There's More Than One Way To Do It!™"
 
  


Tags
asynchronous task, blocking, command line, concurrency, pipes


