LinuxQuestions.org - [bash] why is Process Substitution <() so much faster ?!

hashbang#!

11-25-2010 05:10 PM

The file allids consists of 300,000 rows, each containing a 5-7 digit numeric ID.
The file newids consists of 20,000 rows of IDs.

How do you explain the following timings?

time: 0.07s:

Code:

diff <(sort allids) <(sort newids)

time: 1.6s:

Code:

sort allids >allidssorted

time: 0.07s:

Code:

diff allidssorted <(sort newids)
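
For readers unfamiliar with the construct being timed: bash replaces <(cmd) with a file name (typically /dev/fd/N on Linux) that is connected to the output of cmd, so diff reads the sorted data directly and no intermediate file is written. The following is only a minimal sketch on tiny stand-in data; the directory and the sample ids are made up for illustration and are not the poster's files.

```shell
#!/usr/bin/env bash
# Tiny stand-in data (hypothetical); the files in the post are far larger.
dir=$(mktemp -d)
printf '%s\n' 30 10 20 > "$dir/allids"
printf '%s\n' 20 40 > "$dir/newids"

# <(...) expands to a file name backed by a pipe, e.g. /dev/fd/63.
echo <(true)

# Process-substitution form: nothing is written to disk.
a=$(diff <(sort "$dir/allids") <(sort "$dir/newids"))

# Temp-file form: the same comparison via a sorted file on disk.
sort "$dir/allids" > "$dir/allidssorted"
b=$(diff "$dir/allidssorted" <(sort "$dir/newids"))

# Both forms should produce identical diff output.
[ "$a" = "$b" ] && echo "same diff output"
rm -rf "$dir"
```

Both forms run the same sort and the same diff, so any large timing gap between them is more likely to come from how the measurement was taken than from the construct itself.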

ntubski

11-25-2010 05:29 PM

Quote:

Originally Posted by hashbang#! (Post 4171235)

How do you explain the following timings?

Writing to disk is slow. It might be interesting to see the timing with the temp file on a ram disk.
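
A rough way to try the ram disk comparison is sketched below. It assumes /dev/shm is a RAM-backed tmpfs, which is common on Linux but not guaranteed, and it uses seq output as a stand-in for the real allids file. The directory names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: /dev/shm is a tmpfs; fall back to the normal temp dir otherwise.
ramdir=/dev/shm
[ -d "$ramdir" ] && [ -w "$ramdir" ] || ramdir=${TMPDIR:-/tmp}

disk=$(mktemp -d)                          # regular temp dir (may itself be a tmpfs on some systems)
ram=$(mktemp -d "$ramdir/sorttest.XXXXXX") # temp dir on the presumed ram disk

seq 300000 > "$disk/allids"                # stand-in data, not the poster's ids

# Same sort, two different output locations.
time sort "$disk/allids" > "$disk/allidssorted"
time sort "$disk/allids" > "$ram/allidssorted"

lines=$(wc -l < "$ram/allidssorted")
rm -rf "$disk" "$ram"
```

If the two timings come out roughly equal, writing the output file is probably not the bottleneck.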

hashbang#!

11-25-2010 05:36 PM

Quote:

Originally Posted by ntubski (Post 4171248)

Writing to disk is slow.

Outputting to /dev/null or piping into wc -l took exactly the same time.

ntubski

11-25-2010 05:47 PM

I can't replicate your results:

Code:

~/tmp$ seq 300000 > allids
~/tmp$ seq 0 20000 2 > newids
~/tmp$ time diff <(sort newids) <(sort allids) >/dev/null

real    0m2.333s
user    0m2.260s
sys     0m0.088s
~/tmp$ time sort allids > allsorted

real    0m2.019s
user    0m1.976s
sys     0m0.040s
~/tmp$ time diff <(sort newids) allsorted >/dev/null

real    0m0.321s
user    0m0.288s
sys     0m0.044s

Perhaps it has something to do with the specific ids you have.
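
To test with ids shaped more like the ones described in the first post, something along these lines could be used. This is only a sketch: it assumes GNU shuf is available, and the random ids are of course not the original data. It draws 300,000 distinct 5-7 digit ids and then picks 20,000 of them as newids.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)

# 300,000 distinct ids with 5 to 7 digits, in random order (shuf is GNU coreutils).
shuf -i 10000-9999999 -n 300000 > "$work/allids"
# 20,000 of those ids, so every new id also occurs in allids.
shuf -n 20000 "$work/allids" > "$work/newids"

# Both inputs sorted on the fly.
time diff <(sort "$work/allids") <(sort "$work/newids") > /dev/null

# Large file sorted once to disk, small file sorted on the fly.
sort "$work/allids" > "$work/allidssorted"
time diff "$work/allidssorted" <(sort "$work/newids") > /dev/null

nall=$(wc -l < "$work/allids")
nnew=$(wc -l < "$work/newids")
rm -rf "$work"
```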

hashbang#!

11-26-2010 04:19 AM

For my performance tests I often pipe the output into tail, whereas you redirected to /dev/null. I noticed that redirecting the diff output to /dev/null took much longer than piping it into tail.

When timing the sort, however, sorting to file, /dev/null, or tail made no difference.

Anyway, I repeated the tests and realized that my diff output formatting makes a difference, too:

time: 0.07s piped into tail, 0.14s to /dev/null

Code:

diff --new-line-format=%L --old-line-format= --unchanged-line-format= allidssorted <(sort newids)

time: 0.14s piped into tail, 0.2s to /dev/null

Code:

diff allidssorted <(sort newids)

And now I feel like a total idiot:

time: 0.07s piped into tail, 1.7s to /dev/null

Code:

diff <formatting> <(sort allids) <(sort newids)

I should add that there is no output because every id in newids is also in allids.
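
For anyone wondering what the formatting options do: with GNU diff, emptying the old-line and unchanged-line formats and setting the new-line format to %L prints only the lines that occur in the second file but not in the first. A minimal sketch on made-up data (the ids and the temporary directory are hypothetical):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf '%s\n' 10 20 30 > "$dir/allidssorted"   # already sorted
printf '%s\n' 40 20 > "$dir/newids"            # unsorted, contains one unknown id

# Print only lines that are new in the second input; suppress everything else.
only_new=$(diff --new-line-format=%L --old-line-format= --unchanged-line-format= \
    "$dir/allidssorted" <(sort "$dir/newids"))

echo "$only_new"   # prints 40
rm -rf "$dir"
```

When every id in the second file already exists in the first, as in the tests above, this prints nothing at all.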

All times are GMT -5.