LinuxQuestions.org


hashbang#! 11-25-2010 05:10 PM

[bash] why is Process Substitution <() so much faster ?!
 
The file allids consists of 300,000 rows, each containing a 5-7 digit numeric ID.
The file newids consists of 20,000 rows of IDs.

How do you explain the following timings?

time: 0.07s:
Code:

diff <(sort allids) <(sort newids)

time: 1.6s:
Code:

sort allids >allidssorted

time: 0.07s:
Code:

diff allidssorted <(sort newids)

ntubski 11-25-2010 05:29 PM

Quote:

Originally Posted by hashbang#! (Post 4171235)
How do you explain the following timings?

Writing to disk is slow. It might be interesting to see the timing with the temp file on a ram disk.
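
One way to check this would be to time the same sort against a temp file on disk and one on a RAM-backed filesystem. A rough sketch, assuming bash, assuming /dev/shm is a tmpfs mount (common on Linux, but not guaranteed), and using seq-generated stand-in data rather than the real allids; the directory names are made up for illustration:

```shell
#!/usr/bin/env bash
# Stand-in data: 300,000 numeric ids in reverse order (NOT the original allids file).
workdir=$(mktemp -d)                      # ordinary temp dir, usually disk-backed
seq 300000 -1 1 > "$workdir/allids"

# Assumption: /dev/shm is RAM-backed; fall back to the work dir if it is missing.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    ramdir=$(mktemp -d /dev/shm/sorttest.XXXXXX)
else
    ramdir=$workdir
fi

echo "sort to disk-backed temp file:"
time sort "$workdir/allids" > "$workdir/allidssorted"

echo "sort to RAM-backed temp file:"
time sort "$workdir/allids" > "$ramdir/allidssorted"
```

If the two timings come out about the same, the disk write is probably not the bottleneck. Both temp directories can be removed afterwards.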

hashbang#! 11-25-2010 05:36 PM

Quote:

Originally Posted by ntubski (Post 4171248)
Writing to disk is slow.

Outputting to /dev/null or piping into wc -l took exactly the same time.

ntubski 11-25-2010 05:47 PM

I can't replicate your results:

Code:

~/tmp$ seq 300000 > allids
~/tmp$ seq 0 20000 2 > newids
~/tmp$ time diff <(sort newids) <(sort allids) >/dev/null

real        0m2.333s
user        0m2.260s
sys        0m0.088s
~/tmp$ time sort allids > allsorted

real        0m2.019s
user        0m1.976s
sys        0m0.040s
~/tmp$ time diff <(sort newids) allsorted  >/dev/null

real        0m0.321s
user        0m0.288s
sys        0m0.044s

Perhaps it has something to do with the specific ids you have.
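
To rule the data in or out, the test files could be made closer to the description in the first post. A sketch, assuming GNU shuf is available; the id range and the choice to draw newids from allids (so that diff has nothing new to report) are guesses at the original data, not known facts:

```shell
#!/usr/bin/env bash
datadir=$(mktemp -d)

# 300,000 distinct random ids with 5 to 7 digits (range is an assumption).
shuf -i 10000-9999999 -n 300000 > "$datadir/allids"

# 20,000 ids drawn from allids, so every new id already exists there.
shuf -n 20000 "$datadir/allids" > "$datadir/newids"

# Show the row counts of both files.
wc -l "$datadir/allids" "$datadir/newids"
```

Rerunning the three timings against these files would show whether unsorted, variable-length ids behave differently from the seq output.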

hashbang#! 11-26-2010 04:19 AM

For my performance tests I usually pipe the output into tail, whereas you were redirecting to /dev/null. I noticed that redirecting the diff output to /dev/null took far longer than piping it into tail.

When timing the sort, however, sorting to file, /dev/null, or tail made no difference.


Anyway, I repeated the tests and I realized that my diff output formatting makes a difference, too:

time: 0.07s piped into tail, 0.14s to /dev/null
Code:

diff --new-line-format=%L --old-line-format= --unchanged-line-format= allidssorted <(sort newids)

time: 0.14s piped into tail, 0.2s to /dev/null
Code:

diff allidssorted <(sort newids)

And now I feel like a total idiot:

time: 0.07s piped into tail, 1.7s to /dev/null
Code:

diff --new-line-format=%L --old-line-format= --unchanged-line-format= <(sort allids) <(sort newids)

I should add that there is no output because there are no newids that are not in allids.
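
For anyone repeating these runs, it may help to print wall-clock and CPU time separately and to time every output destination in one script. A minimal sketch, assuming bash (for the time keyword, TIMEFORMAT, arrays and <()) and stand-in seq data instead of the real id files; the variable names fmt and new_count are invented for this example:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Stand-in data (NOT the original files): 300,000 ids, 20,000 of them reused as newids.
seq 300000 -1 1 > allids
seq 2 15 300000 > newids
sort allids > allidssorted

TIMEFORMAT='real %Rs  user %Us  sys %Ss'   # bash-only: format used by the time keyword
fmt=(--new-line-format=%L --old-line-format= --unchanged-line-format=)

# diff exits 1 when the inputs differ, hence the "|| true" on the /dev/null runs.
echo "both inputs sorted on the fly, to /dev/null:"
time diff "${fmt[@]}" <(sort allids) <(sort newids) > /dev/null || true

echo "both inputs sorted on the fly, piped into tail:"
time diff "${fmt[@]}" <(sort allids) <(sort newids) | tail

echo "pre-sorted allids, to /dev/null:"
time diff "${fmt[@]}" allidssorted <(sort newids) > /dev/null || true

echo "pre-sorted allids, piped into tail:"
time diff "${fmt[@]}" allidssorted <(sort newids) | tail

# Sanity check: count the ids that appear only in newids.
new_count=$(diff "${fmt[@]}" allidssorted <(sort newids) | wc -l)
echo "ids only in newids: $new_count"
```

In bash, time placed before a pipeline times the whole pipeline, so the tail runs include tail itself. Comparing the real figure with user and sys for each variant shows which of the three numbers actually changes between runs.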


All times are GMT -5.