LinuxQuestions.org
Old 01-14-2012, 12:51 PM   #16
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3


With subshells, each subshell inherits its state from the parent; changes never propagate back. With command lists, the state is shared with the parent but only if the command list is not part of a pipe.
Code:
x=5 ; { x=6 ;} ; echo $x
outputs 6, but
Code:
x=5 ; { x=6 ;} | { x=7 ;} ; echo $x
outputs 5.

I'm not sure if this is properly documented anywhere. I believe a future version of Bash might well output 7 in the latter case: running the last command list of a pipe in the original shell state might be a worthwhile optimization.

(Just to be clear: x=5;(x=6);echo $x will always output 5, as will x=5;(x=6)|(x=7);echo $x .)
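To check the four cases side by side, here is a minimal sketch. The variable names are only illustrative, and the values in the comments are what Bash produces with its default options; shells that run the last pipe segment in the current environment (ksh, for example) would differ on the third line.

```shell
# Each line resets x, runs one construct, and records what the parent sees.
x=5; { x=6; };            group_alone=$x        # 6: the group runs in the current shell
x=5; ( x=6 );             subshell_alone=$x     # 5: the subshell's change is discarded
x=5; { x=6; } | { x=7; }; group_in_pipe=$x      # 5: each pipe segment gets its own subshell
x=5; ( x=6 ) | ( x=7 );   subshell_in_pipe=$x   # 5: same, explicit subshells

echo "$group_alone $subshell_alone $group_in_pipe $subshell_in_pipe"
```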

Last edited by Nominal Animal; 01-14-2012 at 12:52 PM.
 
Old 01-14-2012, 02:59 PM   #17
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Quote:
Originally Posted by Nominal Animal View Post
Code:
x=5 ; { x=6 ;} | { x=7 ;} ; echo $x
outputs 5.

I'm not sure if this is properly documented anywhere.
It is in the authoritative documentation, but the behavior is sometimes unexpected. Frankly I don't quite get the rationale of creating separate subshells for both the LHS and RHS expressions.

Pipelines - Bash Reference Manual

Quote:
Originally Posted by Bashref
Each command in a pipeline is executed in its own subshell (see Command Execution Environment).
Command Execution Environment - Bash Reference Manual

Quote:
Originally Posted by Bashref
Command substitution, commands grouped with parentheses, and asynchronous commands are invoked in a subshell environment that is a duplicate of the shell environment, except that traps caught by the shell are reset to the values that the shell inherited from its parent at invocation. Builtin commands that are invoked as part of a pipeline are also executed in a subshell environment. Changes made to the subshell environment cannot affect the shell's execution environment.
Code:
$ unset a; a="parent_shell"
$ { declare -p a > /dev/stderr; a="pipe_LHS"; } |
> { declare -p a > /dev/stderr; a="pipe_RHS"; }
declare -- a="parent_shell"
declare -- a="parent_shell"
$ declare -p a
declare -- a="parent_shell"
$
  • The value of a is set to parent_shell in the parent shell.
  • A subshell environment is created on the LHS of the pipeline.
  • LHS of the pipeline inherits the value of a from the parent shell.
  • LHS of the pipeline sets a new value of a in its own environment.
  • LHS of the pipeline ends and its environment is destroyed.
  • A subshell environment is created on the RHS of the pipeline.
  • RHS of the pipeline inherits the value of a from the parent shell.
  • RHS of the pipeline sets a new value of a in its own environment.
  • RHS of the pipeline ends and its environment is destroyed.

As each command group is enclosed in {;} (curly braces), one might expect the entire command line to share the same value of a. Instead, three separate shell environments each contain their own unique a. It is the pipeline which creates the subshells and propagates a into them.

That's how I think it works, but as I said it seems tricky to me.
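
The place this most often surprises people is a `while read` loop fed by a pipe. The following is only a sketch with made-up variable names. It assumes Bash's default behavior, where every pipe segment is a subshell, and uses a here-document as one way to keep the loop in the current shell.

```shell
# Loop on the right-hand side of a pipe: it runs in a subshell,
# so the increments are lost when the pipeline finishes.
count=0
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))
done
piped_count=$count

# Same loop fed by a here-document: no pipeline, so it runs in the
# current shell and the final value of count survives.
count=0
while read -r line; do
    count=$((count + 1))
done <<EOF
a
b
c
EOF
heredoc_count=$count

echo "piped: $piped_count, here-document: $heredoc_count"
```

With default Bash settings the first count stays at 0 and the second reaches 3.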

Last edited by Telengard; 01-14-2012 at 03:13 PM. Reason: break example into multiple lines for easier reading
 
Old 01-15-2012, 01:12 AM   #18
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Hmm, I was always under the impression that the part before the first pipe ran in the current environment. But it looks like I was mistaken.

In any case, this should also mean that any time you use a (..) subshell in a pipeline you end up spawning two sub-shells for it, correct?
 
Old 01-15-2012, 01:38 AM   #19
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Quote:
Originally Posted by Telengard View Post
Frankly I don't quite get the rationale of creating separate subshells for both the LHS and RHS expressions.
Exactly. I kind of expect the Bash developers to eventually drop the subshell for the rightmost expression, because it is such an obvious optimization, and I believe some other shells do it already. Unfortunately, this means the final expression in a pipe sequence might, in the future, affect the parent shell state, if using command lists.

Quote:
Originally Posted by Telengard View Post
It is the pipeline which creates the subshells
Yes, exactly.

I decided to do some tests, and the results are a bit startling.
Code:
strace -qf bash -c '  date    |   cat    |   cat   ' 2>&1 | grep -ce 'clone('
strace -qf bash -c '( date  ) | ( cat  ) | ( cat  )' 2>&1 | grep -ce 'clone('
strace -qf bash -c '{ date ;} | { cat ;} | { cat ;}' 2>&1 | grep -ce 'clone('
These output the number of child processes created by Bash. In the first two cases it is 3 as one would expect; both date and cat are external commands, not Bash built-ins. However, Bash creates 6 child processes for the command list case!

(I believe this is related to the way Bash creates the implicit subshells. Normally, if there is only one command to run in a subshell, Bash exec's it, avoiding the unnecessary fork()/clone().)

Timing tests,
Code:
time bash -c 'for ((i=0; i<1000; i++)); do   date    |   cat    |   cat    ; done' 2>&1 >/dev/null
time bash -c 'for ((i=0; i<1000; i++)); do ( date  ) | ( cat  ) | ( cat  ) ; done' 2>&1 >/dev/null
time bash -c 'for ((i=0; i<1000; i++)); do { date ;} | { cat ;} | { cat ;} ; done' 2>&1 >/dev/null
have similar results. On my workstation, plain commands and explicit subshells produce consistently the same real time results, 1.45s to 1.51s, while command lists are definitely slower, about 2.30s real time.

Using more complex pipelines there is no difference between subshells and command lists:
Code:
strace -qf bash -c '( date ; date  ) | ( date ; cat  ) | ( date ; cat  )' 2>&1 | grep -ce 'clone('
strace -qf bash -c '{ date ; date ;} | { date ; cat ;} | { date ; cat ;}' 2>&1 | grep -ce 'clone('
time bash -c 'for ((i=0; i<1000; i++)); do ( date ; date  ) | ( date ; cat  ) | ( date ; cat  ) ; done' 2>&1 >/dev/null
time bash -c 'for ((i=0; i<1000; i++)); do { date ; date ;} | { date ; cat ;} | { date ; cat ;} ; done' 2>&1 >/dev/null
Bash does create an extra process (subshell) for each pipe segment, forking a total of 9 child processes in both of the above cases. I could not measure any real difference in the timings, either.

These tests show that, at least on my workstation, using explicit subshells in Bash pipelines is definitely a good idea. They use no extra resources compared to the alternatives, impose no syntax requirements beyond normal shell syntax, and have clear semantics.

Quote:
Originally Posted by Telengard View Post
That's how I think it works, but as I said it seems tricky to me.
I have exactly the same understanding.

You know, up to now I have avoided using command lists in Bash. Where one might use a command list, I've used a Bash function (or a subshell in a pipeline) instead. Without your posts in this thread, Telengard, I would still be relying on a hazy personal preference instead of actual knowledge. I for one have learned something new, something that I probably would not have found out on my own; thank you!
 
Old 01-15-2012, 03:12 AM   #20
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Quote:
Originally Posted by Nominal Animal View Post
Exactly. I kind of expect the Bash developers to eventually drop the subshell for the rightmost expression, because it is such an obvious optimization, and I believe some other shells do it already.
ksh runs the final expression in the current environment, and bash 4.2 has partially implemented the same behavior. The new lastpipe shell option enables it, but it only works when job control is disabled, so it's kind of inconvenient to use in interactive shells.
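
A minimal sketch of trying this out in a script, where job control is off by default. The `lastpipe_result` variable is just an illustrative name, and the guard is there because shells without the option (Bash older than 4.2, or a non-Bash `sh`) will reject `shopt -s lastpipe`.

```shell
x=5
if shopt -s lastpipe 2>/dev/null; then
    # With lastpipe set and job control inactive, the last segment
    # runs in the current shell, so its assignment is visible afterwards.
    { x=6; } | { x=7; }
    lastpipe_result=$x
else
    # The option is not available in this shell.
    lastpipe_result="unsupported"
fi
echo "$lastpipe_result"
```

Where the option is supported this should print 7; the first segment's assignment is still lost, because only the last command of the pipeline is affected.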
 
Old 01-15-2012, 03:47 AM   #21
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Quote:
Originally Posted by Telengard View Post
Frankly I don't quite get the rationale of creating separate subshells for both the LHS and RHS expressions.
The shell needs to retain its file descriptors while the commands execute, so it's intuitive from an implementation perspective. The LHS has fd 1 replaced and the RHS has fd 0 replaced.
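As a rough illustration only (this is not how Bash implements pipes internally), the same arrangement can be imitated by hand with a named pipe standing in for the anonymous one. The `dir` and `received` names are made up for the example.

```shell
dir=$(mktemp -d)
mkfifo "$dir/pipe"

# "LHS": a child process whose fd 1 is pointed at the pipe.
( echo "hello from the LHS" >"$dir/pipe" ) &

# "RHS": another child whose fd 0 is pointed at the pipe.
# Command substitution is used here only to capture what it prints.
received=$( cat <"$dir/pipe" )

wait            # reap the background writer
rm -rf "$dir"
echo "$received"
```

Both redirections happen inside child processes, so the parent shell's own fd 0 and fd 1 are never touched.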
Quote:
Originally Posted by Nominal Animal View Post
Exactly. I kind of expect the Bash developers to eventually drop the subshell for the rightmost expression, because it is such an obvious optimization, and I believe some other shells do it already. Unfortunately, this means the final expression in a pipe sequence might, in the future, affect the parent shell state, if using command lists.
I don't quite understand how this would be implemented, unless the shell temporarily copied fd 0, executed the pipeline, then copied it back. This would obviously compromise background processing and Ctrl+Z of a foreground pipeline. The other alternative would be to do what you suggest only when the last grouping consists only of built-ins, which would be horribly inconsistent. Take these two lines, for example:
Code:
while true; do echo $((val++)); sleep 1; done
while true; do echo $((val++)); sleep 1; done | while true; do head -n1; done
If you press Ctrl+Z on the first line, it will stop the sleep process (Ctrl+Z sends SIGTSTP), but when you fg it, it will no longer be in the loop. Because of the subshells in the second line, the whole process group can be stopped and, if necessary, put in the background. Without subshells you couldn't do this, because fg/bg is based on process groups. There are certainly advantages to be had in non-interactive mode (scripts) and when the shell isn't a session leader, but it can be a headache when something works on the command line and not in a script.
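
For reference, the "temporarily copy fd 0, run the command, then copy it back" idea mentioned above looks roughly like this when done by hand. It is only a sketch: a temporary file stands in for the pipe, fd 3 is an arbitrary spare descriptor, and nothing here claims to be what any shell does internally.

```shell
tmp=$(mktemp)
printf 'from-file\n' >"$tmp"

exec 3<&0          # keep a copy of the current stdin on fd 3
exec 0<"$tmp"      # point stdin at the data source
read -r line       # runs in the current shell, so $line survives
exec 0<&3 3<&-     # restore the original stdin and close the spare copy

rm -f "$tmp"
echo "$line"
```

While fd 0 is swapped out, the shell itself cannot read from the terminal, which is the kind of conflict with interactive use described above.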

All of these things make a lot of sense if you look at how a shell is written in C, but the syntax of bash makes it appear as though this behavior is idiosyncratic. In my opinion, things like this irritate people because you don't need to understand the internal limitations of bash in order to use it. Unless bash starts using the "system" idiom to call external programs or it starts routing all IPC itself, it will never get away from extensive use of subshells.
Kevin Barry
 
Old 01-15-2012, 05:09 AM   #22
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Quote:
Originally Posted by ta0kira View Post
I don't quite understand how this would be implemented, unless [..snip..]
Based on your nice analysis, it does seem like my fears about that kind of optimization screwing things up later on are unfounded.

Quote:
Originally Posted by ta0kira View Post
Unless bash starts using the "system" idiom to call external programs or it starts routing all IPC itself, it will never get away from extensive use of subshells.
I seriously hope it does not!

We're getting terribly off-topic here, but the C system() function is a major source of security problems (related to quoting and escaping), and adding yet another "framework" for IPC would severely restrict the usability of Bash. I'm sorely tempted to rant about applying modularity instead of the framework paradigm, but that would be completely off-topic and serve no real purpose here.

I thought my tests above showed that the cost of subshells in pipelines is negligible: zero for all single-command pipe segments, and only one process per pipe segment for multi-command ones. In particular,
Code:
date | # First command in the pipe,
cat  | # second command,
cat    # third command.
and
Code:
( # First command in the pipe
  date
) | (
  # Second command in the pipe
  cat
) | (
  # Third command in the pipe
  cat
)
use the same (minimum!) number of processes, CPU time, and wall clock time. The equivalent code snippet using command lists uses three extra child processes on Bash-4.2.10.

The comment style for the first code example does work in Bash (and many other shells like tcsh, too), but I have not found it explicitly documented as working anywhere. I believe it is implicit, perhaps a side effect of the way commands are parsed, rather than anything intentional.

The second code snippet, the one using subshells, is explicitly documented. (In particular, the semantics are exactly the same at least in Bash, POSIX shells, and tcsh: the state is inherited from the parent process, and changes do not propagate outside the subshell.) There are no extra syntax quirks, unlike command lists in Bash (which require the final semicolon and are whitespace-sensitive).

Let me put this in other words:

I claim that using explicit subshells in Bash pipelines, i.e. (command(s)...)|(command(s)...)|...|(command(s)...) when comments or long commands are used, makes the code easier to write and to understand, and has no extra computing cost (run time or processes). Therefore, for complex Bash pipelines, I recommend the style used in my second code example in this post.
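
A small runnable sketch of that style, with a made-up `largest` function and sample data chosen only for illustration:

```shell
# Print the largest of a few sample numbers, one commented stage per subshell.
largest() {
    (
        # Stage 1: produce some unsorted numbers
        printf '3\n10\n2\n'
    ) | (
        # Stage 2: sort them numerically
        sort -n
    ) | (
        # Stage 3: keep only the last (largest) value
        tail -n 1
    )
}

result=$(largest)
echo "$result"
```

Each stage has room for its own comments and multi-line commands, and nothing set inside a stage can leak into the calling shell.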
 
Old 01-15-2012, 12:26 PM   #23
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,164

Original Poster
This thread expanded into a more thorough exploration of the subject than anticipated.

Some languages (APL and REXX, for example) make it easy to comment in the desired fashion. Now I know it's not so easy in BASH. Okay, I can live with that. Thanks, and let's mark this one SOLVED!

Daniel B. Martin
 
Old 01-15-2012, 12:44 PM   #24
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Quote:
Originally Posted by David the H. View Post
Hmm, I was always under the impression that the part before the first pipe ran in the current environment. But it looks like I was mistaken.
As I was saying, tricky eh? To me, it would seem more natural if all components of a pipeline shared a single subshell environment.

Quote:
Originally Posted by Nominal Animal View Post
These output the number of child processes created by Bash. In the first two cases it is 3 as one would expect; both date and cat are external commands, not Bash built-ins. However, Bash creates 6 child processes for the command list case!
...
On my workstation, plain commands and explicit subshells produce consistently the same real time results, 1.45s to 1.51s, while command lists are definitely slower, about 2.30s real time.
...
These tests show that at least on my workstation, using explicit subshells in Bash pipelines is definitely a good idea.
Code:
$ echo $BASH_VERSION 
3.2.39(1)-release
$ strace -qf bash -c '  date    |   cat    |   cat   ' 2>&1 | grep -ce 'clone('
3
$ strace -qf bash -c '( date  ) | ( cat  ) | ( cat  )' 2>&1 | grep -ce 'clone('
3
$ strace -qf bash -c '{ date ;} | { cat ;} | { cat ;}' 2>&1 | grep -ce 'clone('
6
$ time bash -c 'for ((i=0; i<1000; i++)); do   date    |   cat    |   cat    ; done' 2>&1 >/dev/null

real    0m18.341s
user    0m5.344s
sys     0m4.772s
$ time bash -c 'for ((i=0; i<1000; i++)); do ( date  ) | ( cat  ) | ( cat  ) ; done' 2>&1 >/dev/null

real    0m16.456s
user    0m5.820s
sys     0m4.920s
$ time bash -c 'for ((i=0; i<1000; i++)); do { date ;} | { cat ;} | { cat ;} ; done' 2>&1 >/dev/null

real    0m20.988s
user    0m5.460s
sys     0m6.148s
$
Astounding! Your test suggests that (at least in pipelines) { list; } is slower. I'm at a loss to explain why it spawns twice as many child processes.

Quote:
Originally Posted by Nominal Animal View Post
I claim that using explicit subshells in Bash pipelines, i.e. (command(s)...)|(command(s)...)|...|(command(s)...) when comments or long commands are used, makes the code easier to write and to understand, and has no extra computing cost (run time or processes). Therefore, for complex Bash pipelines, I recommend the style used in my second code example in this post.
Barring source code analysis and stringent benchmarks, I must concede that explicit subshells win on efficiency. Congrats, Nom. It would seem you've fully justified your practice. (Not that I doubted you, but I just can't explain it.)

Quote:
We're getting terribly off-topic here
I believe danielbmartin already got what s?he wanted from this thread, so IMHO no harm in exploring these tangential topics.

Quote:
The comment style for the first code example does work in Bash (and many other shells like tcsh, too), but I have not found it explicitly documented as working anywhere.
I don't know if it is documented anywhere, but it seems to be accepted practice in more places than just the shell.

Code:
$ awk 'BEGIN {print "one", #comment
> "two"}'
one two
$
Quote:
Originally Posted by ta0kira View Post
The LHS has fd 1 replaced and the RHS has fd 0 replaced.
...
All of these things make a lot of sense if you look at how a shell is written in C, but the syntax of bash makes it appear as though this behavior is idiosyncratic. In my opinion, things like this irritate people because you don't need to understand the internal limitations of bash in order to use it.
That's a fine explanation, but doesn't make the behavior more intuitive. Still, I'd rather not see Bash's default behavior stray too far from the traditional Bourne shell. While I do want a modern shell with standards, I see value in preserving compatibility with the past. If the day comes that Bash no longer meets my needs then I can choose a more advanced modern shell.
 
Old 01-15-2012, 01:36 PM   #25
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Quote:
Originally Posted by danielbmartin View Post
Some languages (APL and REXX, for example) make it easy to comment in the desired fashion. Now I know it's not so easy in BASH.
I just started learning python and I was appalled to find out that I couldn't have blank lines within control structures. Each language has its own style, I suppose...
Kevin Barry
 
Old 01-16-2012, 08:14 AM   #26
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,164

Original Poster
Quote:
Originally Posted by Telengard View Post
I believe danielbmartin already got what s?he wanted from this thread...
Yes, my question was answered. I am a he, always have been, have no intention of changing that.

Quote:
Originally Posted by Telengard View Post
...so IMHO no harm in exploring these tangential topics.
No harm at all, but please don't do so on my behalf.

Daniel B. Martin
 
  

