LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   BASH comments (https://www.linuxquestions.org/questions/programming-9/bash-comments-923522/)

Nominal Animal 01-14-2012 11:51 AM

With subshells, each subshell inherits its state from the parent; changes never propagate back. With command lists (brace groups), the state is shared with the parent, but only if the command list is not part of a pipeline.
Code:

x=5 ; { x=6 ;} ; echo $x
outputs 6, but
Code:

x=5 ; { x=6 ;} | { x=7 ;} ; echo $x
outputs 5.

I'm not sure if this is properly documented anywhere. I believe a future version of Bash might well output 7 in the latter case: running the last command list of a pipe in the original shell state might be a worthwhile optimization.

(Just to be clear: x=5;(x=6);echo $x will always output 5, as will x=5;(x=6)|(x=7);echo $x .)
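Putting the three cases side by side (a minimal sketch; default Bash behavior without the lastpipe option is assumed):

```shell
#!/usr/bin/env bash

# Brace group in the current shell: the assignment persists.
x=5; { x=6; }; echo "group: $x"             # group: 6

# Brace groups in a pipeline: each side runs in a subshell,
# so neither assignment reaches the parent shell.
x=5; { x=6; } | { x=7; }; echo "pipe: $x"   # pipe: 5

# Explicit subshell: the assignment never persists.
x=5; (x=6); echo "subshell: $x"             # subshell: 5
```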

Telengard 01-14-2012 01:59 PM

Quote:

Originally Posted by Nominal Animal (Post 4574484)
Code:

x=5 ; { x=6 ;} | { x=7 ;} ; echo $x
outputs 5.

I'm not sure if this is properly documented anywhere.

It is in the authoritative documentation, but the behavior is sometimes unexpected. Frankly I don't quite get the rationale of creating separate subshells for both the LHS and RHS expressions.

Pipelines - Bash Reference Manual

Quote:

Originally Posted by Bashref
Each command in a pipeline is executed in its own subshell (see Command Execution Environment).

Command Execution Environment - Bash Reference Manual

Quote:

Originally Posted by Bashref
Command substitution, commands grouped with parentheses, and asynchronous commands are invoked in a subshell environment that is a duplicate of the shell environment, except that traps caught by the shell are reset to the values that the shell inherited from its parent at invocation. Builtin commands that are invoked as part of a pipeline are also executed in a subshell environment. Changes made to the subshell environment cannot affect the shell's execution environment.

Code:

$ unset a; a="parent_shell"
$ { declare -p a > /dev/stderr; a="pipe_LHS"; } |
> { declare -p a > /dev/stderr; a="pipe_RHS"; }
declare -- a="parent_shell"
declare -- a="parent_shell"
$ declare -p a
declare -- a="parent_shell"
$

  • The value of a is set to parent_shell in the parent shell.
  • A subshell environment is created on the LHS of the pipeline.
  • LHS of the pipeline inherits the value of a from the parent shell.
  • LHS of the pipeline sets a new value of a in its own environment.
  • LHS of the pipeline ends and its environment is destroyed.
  • A subshell environment is created on the RHS of the pipeline.
  • RHS of the pipeline inherits the value of a from the parent shell.
  • RHS of the pipeline sets a new value of a in its own environment.
  • RHS of the pipeline ends and its environment is destroyed.

As each command group is enclosed in {;} (curly braces), one might expect the entire command line to share the same value of a. Instead, three separate shell environments each contain their own unique a. It is the pipeline which creates the subshells and propagates a into them.

That's how I think it works, but as I said it seems tricky to me.

David the H. 01-15-2012 12:12 AM

Hmm, I was always under the impression that the part before the first pipe ran in the current environment. But it looks like I was mistaken.

In any case, this should also mean that any time you use a (..) subshell in a pipeline you end up spawning two subshells for it, correct?

Nominal Animal 01-15-2012 12:38 AM

Quote:

Originally Posted by Telengard (Post 4574558)
Frankly I don't quite get the rationale of creating separate subshells for both the LHS and RHS expressions.

Exactly. I kind of expect the Bash developers to eventually drop the subshell for the rightmost expression, because it is such an obvious optimization, and I believe some other shells do it already. Unfortunately, this means the final expression in a pipe sequence might, in the future, affect the parent shell state, if using command lists.

Quote:

Originally Posted by Telengard (Post 4574558)
It is the pipeline which creates the subshells

Yes, exactly.

I decided to do some tests, and the results are a bit startling.
Code:

strace -qf bash -c '  date    |  cat    |  cat  ' 2>&1 | grep -ce 'clone('
strace -qf bash -c '( date  ) | ( cat  ) | ( cat  )' 2>&1 | grep -ce 'clone('
strace -qf bash -c '{ date ;} | { cat ;} | { cat ;}' 2>&1 | grep -ce 'clone('

These output the number of child processes created by Bash. In the first two cases it is 3 as one would expect; both date and cat are external commands, not Bash built-ins. However, Bash creates 6 child processes for the command list case!

(I believe this is related to the way Bash creates the implicit subshells. Normally, if there is only one command to run in a subshell, Bash exec's it, avoiding the unnecessary fork()/clone().)

Timing tests,
Code:

time bash -c 'for ((i=0; i<1000; i++)); do  date    |  cat    |  cat    ; done' 2>&1 >/dev/null
time bash -c 'for ((i=0; i<1000; i++)); do ( date  ) | ( cat  ) | ( cat  ) ; done' 2>&1 >/dev/null
time bash -c 'for ((i=0; i<1000; i++)); do { date ;} | { cat ;} | { cat ;} ; done' 2>&1 >/dev/null

has similar results. On my workstation, plain commands and explicit subshells consistently produce the same real time results, 1.45s to 1.51s, while command lists are definitely slower, about 2.30s real time.

Using more complex pipelines there is no difference between subshells and command lists:
Code:

strace -qf bash -c '( date ; date  ) | ( date ; cat  ) | ( date ; cat  )' 2>&1 | grep -ce 'clone('
strace -qf bash -c '{ date ; date ;} | { date ; cat ;} | { date ; cat ;}' 2>&1 | grep -ce 'clone('
time bash -c 'for ((i=0; i<1000; i++)); do ( date ; date  ) | ( date ; cat  ) | ( date ; cat  ) ; done' 2>&1 >/dev/null
time bash -c 'for ((i=0; i<1000; i++)); do { date ; date ;} | { date ; cat ;} | { date ; cat ;} ; done' 2>&1 >/dev/null

Bash does create an extra process (subshell) for each pipe segment, forking total 9 child processes, for both above cases. I could not measure any real difference in the timings, either.

These tests show that at least on my workstation, using explicit subshells in Bash pipelines is definitely a good idea. They use no extra resources compared to the alternatives, impose no syntax requirements beyond normal shell syntax, and their semantics are clear.

Quote:

Originally Posted by Telengard (Post 4574558)
That's how I think it works, but as I said it seems tricky to me.

I have exactly the same understanding.

You know, up to now I have avoided using command lists in Bash. Where one might use a command list, I've used a Bash function (subshell in a pipeline) instead. Without your posts in this thread, Telengard, I would still be relying on a hazy personal preference, instead of actual knowledge. I for one have learned something new, something that I probably would not have found out on my own alone; thank you!

David the H. 01-15-2012 02:12 AM

Quote:

Originally Posted by Nominal Animal (Post 4574856)
Exactly. I kind of expect the Bash developers to eventually drop the subshell for the rightmost expression, because it is such an obvious optimization, and I believe some other shells do it already.

ksh runs the final expression in the current environment, and bash 4.2 has partially implemented the same behavior. The new lastpipe shell option enables it, but it only works when job control is disabled, so it's kind of inconvenient to use in interactive shells.
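In a non-interactive script, job control is off by default, so there the option works directly (a minimal sketch, assuming Bash 4.2 or later):

```shell
#!/usr/bin/env bash

shopt -s lastpipe      # run the last pipeline element in the current shell
x=5
{ x=6; } | { x=7; }    # RHS now executes in this shell, not a subshell
echo "$x"              # prints 7 with lastpipe enabled (5 without it)
```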

ta0kira 01-15-2012 02:47 AM

Quote:

Originally Posted by Telengard (Post 4574558)
Frankly I don't quite get the rationale of creating separate subshells for both the LHS and RHS expressions.

The shell needs to retain its file descriptors while the commands execute, so it's intuitive from an implementation perspective. The LHS has fd 1 replaced and the RHS has fd 0 replaced.
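On Linux you can watch this happen via /proc (a sketch; /proc/self/fd is Linux-specific):

```shell
#!/usr/bin/env bash

# The LHS sees its stdout (fd 1) replaced by the pipe; the RHS sees its
# stdin (fd 0) replaced by the same pipe. Both readlinks report the same
# pipe inode, something like: pipe:[123456]
{ readlink /proc/self/fd/1; } | { readlink /proc/self/fd/0; cat; }
```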
Quote:

Originally Posted by Nominal Animal (Post 4574856)
Exactly. I kind of expect the Bash developers to eventually drop the subshell for the rightmost expression, because it is such an obvious optimization, and I believe some other shells do it already. Unfortunately, this means the final expression in a pipe sequence might, in the future, affect the parent shell state, if using command lists.

I don't quite understand how this would be implemented, unless the shell temporarily copied fd 0, executed the pipeline, then copied it back. This would obviously compromise background processing and Ctrl+Z of a foreground pipeline. The other alternative would be to do what you suggest only when the last grouping consists only of built-ins, which would be horribly inconsistent. Take these two lines, for example:
Code:

while true; do echo $((val++)); sleep 1; done
while true; do echo $((val++)); sleep 1; done | while true; do head -n1; done

If you Ctrl+Z the first line it will SIGSTOP the sleep process, but when you fg it will no longer be in the loop. Because of the subshells in the second line the process group can be SIGSTOPed, and if necessary, put in the background. Without subshells you couldn't do this because fg/bg is based on process groups. There are certainly advantages to be had in non-interactive mode (scripts) and when the shell isn't a session leader, but it can be a headache when something works on the command line and not in a script.

All of these things make a lot of sense if you look at how a shell is written in C, but the syntax of bash makes it appear as though this behavior is idiosyncratic. In my opinion, things like this irritate people because you don't need to understand the internal limitations of bash in order to use it. Unless bash starts using the "system" idiom to call external programs or it starts routing all IPC itself, it will never get away from extensive use of subshells.
Kevin Barry

Nominal Animal 01-15-2012 04:09 AM

Quote:

Originally Posted by ta0kira (Post 4574909)
I don't quite understand how this would be implemented, unless [..snip..]

Based on your nice analysis, it does seem like my fears about that kind of optimization breaking things later on are unfounded.

Quote:

Originally Posted by ta0kira (Post 4574909)
Unless bash starts using the "system" idiom to call external programs or it starts routing all IPC itself, it will never get away from extensive use of subshells.

I seriously hope it does not!

We're getting terribly off-topic here, but the C system() function is a major source of security problems (related to quoting and escaping), and adding yet another "framework" for IPC would severely restrict the usability of Bash. I'm sorely tempted to rant about applying modularity instead of the framework paradigm, but that would be completely off-topic, and serve no purpose here really.

I thought my tests above showed that the cost of subshells in pipelines is negligible; zero for all single-command pipe segments, and only one extra process per pipe segment for multi-command ones. In particular,
Code:

date | # First command in the pipe,
cat  | # second command,
cat    # third command.

and
Code:

( # First command in the pipe
  date
) | (
  # Second command in the pipe
  cat
) | (
  # Third command in the pipe
  cat
)

use the same (minimum!) number of processes, CPU time, and wall clock time. The equivalent code snippet using command lists uses three extra child processes on Bash-4.2.10.

The comment style for the first code example does work in Bash (and many other shells like tcsh, too), but I have not found it explicitly documented as working anywhere. I believe it is implicit, perhaps a side effect of the way commands are parsed, rather than anything intentional.

The second code snippet, the one using subshells, is explicitly documented. (In particular, the semantics are exactly the same at least in Bash, POSIX shells, and tcsh: the state is inherited from the parent process, and changes do not propagate outside the subshell.) There are no extra syntax quirks, unlike command lists in Bash (which require the final semicolon and are whitespace sensitive).

Let me put this in other words:

I claim that using explicit subshells in Bash pipelines, i.e. (command(s)...)|(command(s)...)|...|(command(s)...) when comments or long commands are used, makes the code easier to write and to understand, and has no extra computing cost (run time or processes). Therefore, for complex Bash pipelines, I recommend the style used in my second code example in this post.

danielbmartin 01-15-2012 11:26 AM

This thread expanded into a more thorough exploration of the subject than anticipated.

Some languages (APL and REXX, for example) make it easy to comment in the desired fashion. Now I know it's not so easy in BASH. Okay, I can live with that. Thanks, and let's mark this one SOLVED!

Daniel B. Martin

Telengard 01-15-2012 11:44 AM

Quote:

Originally Posted by David the H. (Post 4574845)
Hmm, I was always under the impression that the part before the first pipe ran in the current environment. But it looks like I was mistaken.

As I was saying, tricky eh? To me, it would seem more natural if all components of a pipeline shared a single subshell environment.
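When a variable assigned inside a pipeline must survive, a common workaround is process substitution, which keeps the reading loop in the current shell (a sketch):

```shell
#!/usr/bin/env bash

# Instead of:  producer | while read ...   (the loop runs in a subshell,
# so its variable changes are lost), read from process substitution so
# the loop stays in the current shell:
count=0
while read -r line; do
  count=$((count + 1))
done < <(printf 'a\nb\nc\n')
echo "$count"   # prints 3
```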

Quote:

Originally Posted by Nominal Animal (Post 4574856)
These output the number of child processes created by Bash. In the first two cases it is 3 as one would expect; both date and cat are external commands, not Bash built-ins. However, Bash creates 6 child processes for the command list case!
...
On my workstation, plain commands and explicit subshells produce consistently the same real time results, 1.45s to 1.51s, while command lists is definitely slower, about 2.30s real time.
...
These tests show that at least on my workstation, using explicit subshells in Bash pipelines is definitely a good idea.

Code:

$ echo $BASH_VERSION
3.2.39(1)-release
$ strace -qf bash -c '  date    |  cat    |  cat  ' 2>&1 | grep -ce 'clone('
3
$ strace -qf bash -c '( date  ) | ( cat  ) | ( cat  )' 2>&1 | grep -ce 'clone('
3
$ strace -qf bash -c '{ date ;} | { cat ;} | { cat ;}' 2>&1 | grep -ce 'clone('
6
$ time bash -c 'for ((i=0; i<1000; i++)); do  date    |  cat    |  cat    ; done' 2>&1 >/dev/null

real    0m18.341s
user    0m5.344s
sys    0m4.772s
$ time bash -c 'for ((i=0; i<1000; i++)); do ( date  ) | ( cat  ) | ( cat  ) ; done' 2>&1 >/dev/null

real    0m16.456s
user    0m5.820s
sys    0m4.920s
$ time bash -c 'for ((i=0; i<1000; i++)); do { date ;} | { cat ;} | { cat ;} ; done' 2>&1 >/dev/null

real    0m20.988s
user    0m5.460s
sys    0m6.148s
$

Astounding! :eek: Your test suggests that (at least in pipelines) { list; } is slower. I'm at a loss to explain why it spawns twice as many child processes. :confused:

Quote:

Originally Posted by Nominal Animal (Post 4574938)
I claim that using explicit subshells in Bash pipelines, i.e. (command(s)...)|(command(s)...)|...|(command(s)...) when comments or long commands are used, makes the code easier to write and to understand, and has no extra computing cost (run time or processes). Therefore, for complex Bash pipelines, I recommend the style used in my second code example in this post.

Barring source code analysis and stringent benchmarks, I must concede that explicit subshells win on efficiency. Congrats, Nom. It would seem you've fully justified your practice. (Not that I doubted you, but I just can't explain it.) :hattip:

Quote:

We're getting terribly off-topic here
I believe danielbmartin already got what s?he wanted from this thread, so IMHO no harm in exploring these tangential topics.

Quote:

The comment style for the first code example does work in Bash (and many other shells like tcsh, too), but I have not found it explicitly documented as working anywhere.
I don't know if it is documented anywhere, but it seems to be accepted practice in more places than just the shell.

Code:

$ awk 'BEGIN {print "one", #comment
> "two"}'
one two
$

Quote:

Originally Posted by ta0kira (Post 4574909)
The LHS has fd 1 replaced and the RHS has fd 0 replaced.
...
All of these things make a lot of sense if you look at how a shell is written in C, but the syntax of bash makes it appear as though this behavior is idiosyncratic. In my opinion, things like this irritate people because you don't need to understand the internal limitations of bash in order to use it.

That's a fine explanation, but doesn't make the behavior more intuitive. Still, I'd rather not see Bash's default behavior stray too far from the traditional Bourne shell. While I do want a modern shell with standards, I see value in preserving compatibility with the past. If the day comes that Bash no longer meets my needs then I can choose a more advanced modern shell.

ta0kira 01-15-2012 12:36 PM

Quote:

Originally Posted by danielbmartin (Post 4575189)
Some languages (APL and REXX, for example) make it easy to comment in the desired fashion. Now I know it's not so easy in BASH.

I just started learning python and I was appalled to find out that I couldn't have blank lines within control structures. Each language has its own style, I suppose...
Kevin Barry

danielbmartin 01-16-2012 07:14 AM

Quote:

Originally Posted by Telengard (Post 4575202)
I believe danielbmartin already got what s?he wanted from this thread...

Yes, my question was answered. I am a he, always have been, have no intention of changing that. :D

Quote:

Originally Posted by Telengard (Post 4575202)
...so IMHO no harm in exploring these tangential topics.

No harm at all, but please don't do so on my behalf.

Daniel B. Martin

