LinuxQuestions.org

f1dg3t 03-18-2010 07:07 AM

Print a line at the nth position within a large file... will a pipe optimize this?
 
OK, so I am NEW to Linux. What I know I have learned by reading and asking, so if what I say or do here is incorrect, please feel free to correct me.

I have to read a single line from a file. The file may be of any size, but I know which line to read within it.

I have two commands that do the same thing:

Code:

cat filename | head -6520 | tail -1

sed -n '6520{p;q;}' filename

I like the version using cat and the pipes, but is this the best solution? It is my understanding that with this command cat will stream to head, which will in turn stream to tail... will this not "eat" a lot of memory? I like this command because I can change the head and tail values to receive more output.
So will the pipe somehow optimize this, or will the different commands fully execute in memory and only return what is needed?

I hope I made this question clear enough, since English is not my first language :)

Thank you

devnull10 03-18-2010 07:12 AM

No, it won't optimize it. Unless it really is a big file, it shouldn't use a lot of memory. I'd personally use the sed route though - it's more elegant and "correct".
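
And since you mentioned wanting to change the head and tail values to get more output: sed can print a range and still quit early. For example, for lines 6520 to 6530 (range picked just to match your example):

Code:

sed -n '6520,6530p;6530q' filename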

f1dg3t 03-18-2010 08:03 AM

Thank you for the reply, devnull10... I'll look at sed then :D

schneidz 03-18-2010 12:30 PM

Consider using time to find out the CPU/real time it takes to run your program.
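
For example, to compare the two approaches from the first post:

Code:

time cat filename | head -6520 | tail -1
time sed -n '6520{p;q;}' filename

The first run may be skewed by filesystem caching, so run each a couple of times before comparing.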

Tinkster 03-18-2010 12:42 PM

Quote:

Originally Posted by devnull10 (Post 3903077)
No, it won't optimize it. Unless it really is a big file, it shouldn't use a lot of memory. I'd personally use the sed route though - it's more elegant and "correct".

I don't think this is "quite" right. Whether or not things "happen" to the whole data stream depends on the process reading the pipe.
Code:

$ time cat biographies.list.new > /dev/null

real    0m0.143s
user    0m0.003s
sys    0m0.137s
$ time cat biographies.list.new|head -n 1 > /dev/null

real    0m0.003s
user    0m0.000s
sys    0m0.000s

And yes, I've taken caching into account - the initial run to get the large file into memory took 5 seconds.
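
If you're using bash you can see this directly (PIPESTATUS is bash-specific): once head has the line it needs it exits, the pipe closes, and cat is killed by SIGPIPE instead of reading the rest of the file.

Code:

$ cat biographies.list.new | head -n 1 > /dev/null
$ echo "${PIPESTATUS[@]}"    # typically "141 0": cat killed by SIGPIPE (128+13), head exited cleanly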



Cheers,
Tink

devnull10 03-18-2010 01:39 PM

You're only processing one line with the head command though, so it can stop once it's got that line. The OP was reading 6520 lines into it before passing that output into the tail command. I'd imagine this takes slightly more memory, albeit probably hardly noticeable at current machine speeds.

When I said it wouldn't optimize, I was referring to the fact that I didn't believe it would be smart enough to see that that sequence of commands was equivalent to just reading a specific line from a file.
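
For what it's worth, awk can take the same shortcut as the sed version; the exit stops it reading the rest of the file (a different tool, but the same idea):

Code:

awk 'NR==6520 {print; exit}' filename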

[edit]

Just done a little test - not much in it to be honest! :)

Code:

~ $ for i in $(seq 1 100); do cat /usr/share/dict/words >> bigfile; done
~ $ time cat bigfile | head -n 6520 | tail -n 1
chapters

real    0m0.003s
user    0m0.000s
sys    0m0.003s
~ $ time sed -n '6520{p;q;}' bigfile
chapters

real    0m0.002s
user    0m0.002s
sys    0m0.000s
~ $ wc -l bigfile
3861900 bigfile
~ $


f1dg3t 03-19-2010 02:41 AM

Thank you for the help, guys. I even learned about "time" :). I'll run some random tests on this end with the files I have and see what works best.

Thanks
F1DG3T

