makyo
01-06-2009 10:17 AM
Hi.
If you have a really long file, you may want to consider some optimization. I don't have any really large files, but here are results from working with a file that is around 1 GB. I assume that for a match, you want only the first hit. I have adjusted the requirement from 10 to 2 to save posting space:
Code:
#!/bin/bash -
# @(#) s1 Demonstrate obtaining a segment, piece, part of a file.
echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) sed grep cgrep
set -o nounset
echo
FILE=${1-/tmp/test-one-gb}
echo " Lines in data file $FILE:"
time wc -l "$FILE"
echo
echo " Results, sed:"
time sed -n '434,435 p' "$FILE"
echo
echo " Results, sed with quit:"
time sed -n -e '434,435 p' -e '436 q' "$FILE"
echo
echo " Results, grep, max-count:"
time grep --max-count=1 -A 1 -n "nightmare" "$FILE"
echo
echo " Results, cgrep, -N matches:"
echo " http://www.bell-labs.com/project/wwexptools/cgrep/"
time cgrep -N 1 +1 -n -D "nightmare" "$FILE"
exit 0
Code:
$ ./s1
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.11-x1, i686
Distribution : Xandros Desktop 3.0.3 Business
GNU bash 2.05b.0
GNU sed version 4.1.2
grep (GNU grep) 2.5.1
cgrep - (local: ~/executable/cgrep Sep 28 2007 )
Lines in data file /tmp/test-one-gb:
14754910 /tmp/test-one-gb
real 0m19.423s
user 0m1.049s
sys 0m1.645s
Results, sed:
nightmare to a dead sartainty. Landlord, I whispered, that aint the
harpooneer, is it? Oh, no, said he, looking a sort of diabolically funny,
real 0m5.926s
user 0m5.020s
sys 0m0.807s
Results, sed with quit:
nightmare to a dead sartainty. Landlord, I whispered, that aint the
harpooneer, is it? Oh, no, said he, looking a sort of diabolically funny,
real 0m0.001s
user 0m0.002s
sys 0m0.000s
Results, grep, max-count:
434:nightmare to a dead sartainty. Landlord, I whispered, that aint the
435-harpooneer, is it? Oh, no, said he, looking a sort of diabolically funny,
real 0m0.001s
user 0m0.000s
sys 0m0.001s
Results, cgrep, -N matches:
http://www.bell-labs.com/project/wwexptools/cgrep/
434:nightmare to a dead sartainty. Landlord, I whispered, that aint the
435:harpooneer, is it? Oh, no, said he, looking a sort of diabolically funny,
real 0m0.002s
user 0m0.001s
sys 0m0.002s
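The "sed with quit" idea can be wrapped up for any range of lines. This is only a minimal sketch: the function name print_lines and its argument order are my own invention, not part of the s1 script, and it assumes FIRST and LAST are valid line numbers with FIRST <= LAST.

```shell
#!/bin/bash
# print_lines FIRST LAST FILE
# Hypothetical helper (not part of s1): print lines FIRST..LAST of FILE,
# then quit so that sed does not read the rest of the file.
print_lines() {
    local first=$1 last=$2 file=$3
    # -n suppresses automatic printing; "p" prints the range explicitly,
    # and "q" on the last wanted line stops sed right there.
    sed -n -e "${first},${last}p" -e "${last}q" "$file"
}

# Example call (uncomment to run):
# print_lines 434 435 /tmp/test-one-gb
```

Quitting on the last wanted line rather than the one after it means sed never reads line LAST+1, which makes no measurable difference here but is a little tidier.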
Note that without the "quit", sed goes through the entire file (about 5.9 seconds here), whereas with the "quit" it stops right after the lines of interest and finishes almost instantly. If you want to match on content rather than on line number, GNU grep has a feature (--max-count) to stop at "n" hits. If you don't have GNU grep, you can obtain cgrep, which has similar features (and much more), from the site noted in the script.
cheers, makyo
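If neither GNU grep nor cgrep is available, plain awk can probably do something similar. This is a different technique from the ones timed above, and I have not timed it on the 1 GB file. The function name first_match is hypothetical, and the pattern is treated as an awk (extended) regular expression, which may differ from grep's default syntax for patterns more complicated than a plain word.

```shell
#!/bin/bash
# first_match PATTERN FILE
# Hypothetical helper: print the first line matching PATTERN and the line
# after it, with line numbers, then stop reading the file. The output format
# imitates grep --max-count=1 -A 1 -n ("N:" for the match, "N-" for context).
first_match() {
    local pattern=$1 file=$2
    awk -v pat="$pattern" '
        found    { print NR "-" $0; exit }      # line after the match: print and stop
        $0 ~ pat { print NR ":" $0; found = 1 } # first matching line
    ' "$file"
}

# Example call (uncomment to run):
# first_match nightmare /tmp/test-one-gb
```

Because awk exits as soon as the second line is printed, it should not need to read the remainder of the file, in the same spirit as the sed "quit".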