I am seeking for possible explanations under which conditions a command
prog >file
does
not result in
file being created. Here are the special circumstances which I observed (example simplified as much as possible):
I have a bash script
~my/script.sh which essentially contains the following commands:
#!/bin/bash --norc
~my/p1.sh >foo 2>&1
cat foo
~other/p2.sh >bar 2>&1
echo $?
cat bar
~my/p3.sh >baz 2>&1
cat baz
Several processes, running under user
my, use in parallel this program in the following way:
cd ~my/Some-Directory-Unique-To-This-Process
~my/script.sh >log
This works well most of the time. Sometimes, however, we found that
cat bar complains that there exists no file called
bar (the echo statement always outputs 0), and indeed,
foo and
baz exist in these cases, but no
bar. Aside from this, the file
log shows no error message.
This puzzles me, because I had at least expected an empty file named
bar. After all, before starting a process, bash has to do the redirection, which involves opening the output file for writing. Indeed, even a no-op statement
: >something
would create an empty file. From the fact that
bar does not exist afterwards, I would conclude that either the shell could not set up the redirection (but in this case we should see some error message, don't we?), or for some reason ps.sh could not write to the file (in which case $? should not be zero). Do I miss something here? What other explanation could exist for my case?
Here some more information which might or might not be relevant:
- The processes run partially on different hosts
- The execution of myscript.sh takes around 5 minutes
- Around 100 such processes are executed during one hour
- Sometimes, the problem does not occur for weeks, but then, over a period of several hours, we see about 20% of these executions to fail in the described way
- The problem never occurs with the redirections to the files log, foo or baz; only with bar (created from a program residing on ~other instead of ~my)
- The home ~my is physically located on a different host than the processes running the script
- The home ~other is also physically located on a different host, also different from the host which homes ~my
As you can see from the last two items, I vaguely suspect that networking issues could be involved. Any idea what is happening here?