LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 08-17-2022, 06:05 PM   #1
halfpower
Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 234

Rep: Reputation: 30
Question Do chained Bash commands block each other?


Below is an example that extracts links from a web page.
Code:
bunzip2 really_big_file.bz2 --stdout \
| jq .text \
| cut -c2- | rev | cut -c2- | rev \
| sed -E 's:(\\n): :g' \
| grep -P "[a-zA-Z]"
It's clear to me that each command will start before the others have completed, but will they also block each other? Will bunzip2, jq, and sed run simultaneously on different logical cores?

Last edited by halfpower; 08-18-2022 at 04:24 PM. Reason: Replace code example to better illustrate the problem
 
Old 08-17-2022, 06:18 PM   #2
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,707
Blog Entries: 13

Rep: Reputation: 4769
The pipes take care of it.

The output of the first one is the input to the next one. And so on.

Yes, they will run at the same time, but each process depends on the one before it in the pipe for its input.
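A quick way to see the simultaneous start is a sketch like this (`stage` is a hypothetical helper function, not a real command): each stage announces itself on stderr the moment it starts, then just passes its input through.

```shell
# Each stage announces itself on stderr as soon as it is forked, then
# copies stdin to stdout. All three announcements appear immediately,
# before the data has finished flowing end-to-end.
stage() { echo "stage $1 started" >&2; cat; }

seq 100000 | stage one | stage two | stage three > /dev/null
```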

What does it matter how a CPU chooses to run processes?

Last edited by rtmistler; 08-17-2022 at 06:25 PM.
 
Old 08-17-2022, 07:15 PM   #3
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 10,632

Rep: Reputation: 5023
They run simultaneously, and each one blocks the one to its right.

Sed blocks until it gets a line from grep, and grep blocks until it gets a line from curl.
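That blocking is easy to observe with a small sketch: the consumer can do nothing until the producer writes, so the pipeline's wall time is dominated by the producer's sleep.

```shell
# sed starts immediately but then blocks reading its stdin;
# the whole pipeline takes about as long as the 2-second sleep.
t0=$(date +%s)
( sleep 2; echo hello ) | sed 's/hello/HELLO/'
t1=$(date +%s)
echo "elapsed: $((t1 - t0))s"
```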
 
Old 08-17-2022, 08:31 PM   #4
halfpower
Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 234

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by rtmistler View Post
What does it matter how a CPU chooses to run processes?
When CPU power is the primary factor limiting execution time, it can affect total runtime. Theoretically, it should become a big issue when there is a large amount of data, many commands chained together, and many unutilized CPU cores.
 
Old 08-17-2022, 11:50 PM   #5
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,352
Blog Entries: 1

Rep: Reputation: 1656
Right here the network speed will be the limiting factor.
 
1 member found this post helpful.
Old 08-18-2022, 12:35 AM   #6
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 6,414
Blog Entries: 3

Rep: Reputation: 3347
And it goes without saying that processing HTML without a proper parser can be quite brittle. An unexpected space or line break, although valid HTML, will choke the sed script.

Code:
curl https://example.com \
| tidy -numeric -asxml \
| xmlstarlet sel -N xhtml="http://www.w3.org/1999/xhtml" \
        -t -m '//xhtml:a[@href]'  -v 'concat(@href," ",.)' -n
The xmlstarlet utility is just one option. There are parsers for Perl, Python 3, and other scripting languages.

PS. clownflare tries to block me from posting the 'concat' part above. >:(
 
1 member found this post helpful.
Old 08-18-2022, 01:29 AM   #7
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 18,979

Rep: Reputation: 6447
Quote:
Originally Posted by halfpower View Post
When CPU power is the primary factor limiting execution time, it can affect total runtime. Theoretically, it should become a big issue when there is a large amount of data, many commands chained together, and many unutilized CPU cores.
No. Obviously the computing capability will limit the speed of execution, but Linux (the kernel) is able to utilize all the cores, not only one, so there will be no unutilized cores. (Don't forget that the whole system is running: at least several hundred processes.)
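One way to watch the scheduler do this (a Linux-specific sketch using procps `ps`; the pipeline is just a stand-in CPU load): start a CPU-bound pipeline in the background, then ask `ps` which processor each stage is currently on.

```shell
# Start a CPU-bound pipeline in the background...
yes | gzip | gzip -d | wc -c > /dev/null &
sleep 1
# ...then list each stage's current processor. The psr column typically
# shows the stages spread across different cores.
ps --ppid $$ -o pid,psr,comm
# Stop the generator; the downstream stages then drain and exit.
pkill -P $$ yes
```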
 
1 member found this post helpful.
Old 08-18-2022, 06:52 AM   #8
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 2,784

Rep: Reputation: 2048
Quote:
Originally Posted by halfpower View Post
Below is an example that extracts links from a web page.
Which is ugly, flawed, and unnecessarily slow.

Asking about CPU core allocation is a micro-optimization; if you have a performance issue, there are bigger factors to address first.

Is there a larger problem that prompted this line of thinking, or is it just a "what if" thought?

 
Old 08-18-2022, 08:03 AM   #9
teckk
Senior Member
 
Registered: Oct 2004
Distribution: Arch
Posts: 4,366
Blog Entries: 5

Rep: Reputation: 1505
Depends on the web page; grep will do by itself what the OP is trying to do, although a web browser or an HTML/XML parser will do it better.

Code:
agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"

url1="https://www.linuxquestions.org/questions/programming-9/do-chained-bash-commands-block-each-other-4175715802/"

lynx -useragent="$agent" -dump -source "$url1" | grep -Eo "(http|https).*" > file1.txt

lynx -useragent="$agent" -dump -listonly "$url1" > file2.txt


url2="https://en.m.wikipedia.org/wiki/Carbon"

curl -LA "$agent" "$url2" | grep -oE 'href="([^"#]+)"' > file3.txt
The more pipes, the more subshells.
 
Old 08-18-2022, 10:52 AM   #10
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 10,632

Rep: Reputation: 5023
Are you actually having a performance issue that you’re trying to solve here, or was this meant to be a model?

Keep in mind that IPC isn't fast. The typical performance-centric approach is to break up the data, distribute the chunks to worker programs that know nothing about each other, have each worker write its results to a central location, and then wait for all of the workers to finish.
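That approach can be sketched with plain coreutils (the file names and the `tr` "worker" are hypothetical stand-ins): split the input, run independent workers in parallel with `xargs -P`, then gather the results.

```shell
# Break a line-oriented input into 4 roughly equal chunks (GNU split).
split -n l/4 big_input.txt chunk.

# Run one independent worker per chunk, up to 4 at a time; tr stands in
# for whatever per-chunk processing the real job would do.
printf '%s\n' chunk.?? | xargs -P 4 -I{} sh -c 'tr a-z A-Z < "{}" > "{}.out"'

# Gather the per-chunk results into one place, then clean up.
cat chunk.??.out > result.txt
rm chunk.?? chunk.??.out
```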

BTW, you do understand that the Linux kernel puts processes to "sleep" (in quotes because it's a technical word) when they're waiting for input, right?

Last edited by dugan; 08-18-2022 at 11:50 AM.
 
Old 08-18-2022, 04:16 PM   #11
halfpower
Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 234

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by NevemTeve View Post
Right here the network speed will be the limiting factor.
I was trying to demonstrate the issue. This might be a better example:
Code:
bunzip2 really_big_file.bz2 --stdout \
| jq .text \
| sed -E 's:(\\n): :g'
 
Old 08-18-2022, 04:37 PM   #12
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 10,632

Rep: Reputation: 5023
Neither example would benefit from multiple cores. The reasons have been said many times, but jq in particular needs to wait for the entire curl or bunzip command to finish before it even starts.

Also, you can use bzcat instead of bunzip2 --stdout.

Quote:
Originally Posted by halfpower View Post
When the primary factor limiting execution time is CPU power
It's not. End of story.

Or have you actually seen top/iostat/sar output showing that?

Last edited by dugan; 08-18-2022 at 04:54 PM.
 
Old 08-18-2022, 04:57 PM   #13
halfpower
Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 234

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by dugan View Post
Are you actually having a performance issue that you're trying to solve here, or was this meant to be a model?
The question is more theoretical in nature. The code (which has been edited) is only intended to illustrate the issue.

Quote:
Keep in mind that IPC isn't fast. The typical performance-centric approach is to break up the data, distribute each chunk to a program that does not know about the others and which processes its part of the data to a central location, and then wait for all of the individual programs to finish.
Some data is stored in a monolithic format. At present, I have no method for on-the-fly splitting.

Quote:
Originally Posted by dugan View Post
BTW, you do understand that the Linux kernel puts processes to "sleep" (in quotes because it's a technical word) when they're waiting for input, right?
My question was whether the processes would block one another's execution. In other words: if my command launches 15 processes, are 14 of them sleeping at any given time? If they are, that is very suboptimal.
 
Old 08-18-2022, 05:00 PM   #14
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 2,784

Rep: Reputation: 2048
Quote:
Originally Posted by halfpower View Post
I was trying to demonstrate the issue.
What issue?

Put another way: You don't have a performance issue until you can demonstrate a measurable issue.

Whether the answer to your question is yes or no, what difference is it going to make?

If it runs quickly enough, nobody cares what core it executes on.

If it doesn't run quickly enough, switching to forced parallel execution is going to make the code less maintainable, and will very likely have less of an impact than optimising whatever algorithm(s) might be involved and/or using a lower-level language for the task.


Quote:
This might be a better example:
Code:
bunzip2 really_big_file.bz2 --stdout \
| jq .text \
| sed -E 's:(\\n): :g'
It's really not.

Even if you add the missing -z argument to sed, the backslash shouldn't be escaped and the group is unnecessary, but it's far simpler to use tr to replace newlines.

But if we pretend you did use tr, you still have to consider that jq (unlike grep/sed) will wait for stdin to complete before parsing the object, so it doesn't demonstrate any meaningful simultaneous execution.
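For completeness, the tr variant in isolation (the `printf` is a stand-in for the upstream stages, on the assumption that the goal is collapsing actual newlines into spaces):

```shell
# tr replaces every newline with a space in a single streaming pass.
printf 'first line\nsecond line\n' | tr '\n' ' '
```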


Last edited by boughtonp; 08-18-2022 at 05:02 PM.
 
Old 08-18-2022, 05:22 PM   #15
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 10,632

Rep: Reputation: 5023
Quote:
Originally Posted by halfpower View Post
My question was whether the processes would block one another's execution. In other words: if my command launches 15 processes, are 14 of them sleeping at any given time? If they are, that is very suboptimal.
Yes. The short answer is yes, and it's been explained to you exactly how that works. How many times do you need to hear "yes" before it gets through to you?

I'm starting to think you're deliberately ignoring the actual answers.

Last edited by dugan; 08-18-2022 at 07:58 PM.
 
  


Tags
asynchronous task, blocking, command line, concurrency, pipes


