Count the lines
My question is about the same as the one posed here, but I'm not satisfied with the conclusion to use 'wc -l':
http://www.linuxquestions.org/questi...-files-583991/

I have a Zabbix monitoring server, and each minute I execute the following on each client: Code:
wc -l /proc/net/ip_conntrack

I just want to count the lines to get the number of connections. "wc -l" does more than I need, and I'm hoping a program that counts lines and does only that can do it faster.

I am not good with C. I used it 20 years ago to manipulate strings when things became too slow in the native language (Clipper). I wrote some nice things in C back then, so it puzzles me somehow that I can't do this now (maybe I should try harder). I tried to alter this program, which does a bit more than just count lines, but didn't succeed (how embarrassing):
http://www.gnu.org/software/cflow/ma...c-command.html

Could someone take a look at it? I was thinking of calling it 'lc', and it should only return the number of '\n' characters. Hopefully the binary is faster than 13 seconds... |
Well, I'm not sure about the speed comparison, but how does something like this compare:
Code:
grep -c . /proc/net/ip_conntrack |
I did manage to alter that C program and compile an 'lc', but was disappointed to see it perform this badly. I also benchmarked the 'grep', and it performed the same as 'wc -l'.

Here's my patched 'wc': http://pastebin.com/KUM0EwnN I compiled it with 'gcc lc.c -o lc' Code:
# time ./lc /proc/net/ip_conntrack |
wc has been developed over the years by clever people; all the bugs were killed long ago. I doubt very much it can be improved upon, and you are wasting your time if you think you can beat the basic Unix tools. I suspect your 13-second bottleneck is unlikely to be in wc. |
Quote:
wc can do much more than just count lines; I assume it contains code that can count words instead of lines, and it may execute a bit of code to test something that never changes.

I already found this page: http://www.stokebloke.com/wordpress/...they-are-slow/ which suggests the library function 'getc' should not be used, but fread is OK. For someone using C on a daily basis it should be a piece of cake to rewrite 'lc'.

BTW, the "wc" which I patched to make "lc" is not the one that is widely used. Maybe I should get hold of that one and try to modify it (take code out) to speed it up. I think the key is to take a big chunk of data on each call to the library function (getc, fread), put it in a piece of static memory, and count the number of '\n' bytes.

About improving code... I remember (20 years ago) speeding up a soundex() function for Clipper. They gave an example in assembly; I used my own algorithm for it and did it in C. Mine was 1000 times faster. |
May I suggest ...
Code:
cat -n /proc/net/ip_conntrack | tail -1 | awk '{print $1}' |
Quote:
They already have a separate loop for just counting lines, so I don't think it can be optimized that easily. Maybe someone can still see some possibilities? Can't it use a static piece of memory (a buffer) which is then parsed and counted? memchr is a library function; does a library function have to be used at all? Code:
/* Use a separate loop when counting only lines or lines and bytes -- |
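The memchr approach from that lines-only loop can be sketched as follows (the function name is mine; this is the general technique, not the exact coreutils code). libc's memchr is usually hand-tuned or vectorized, which is why delegating the scan to it often beats a naive byte loop:

```c
#include <string.h>

/* Count newlines in a buffer by letting memchr(3) do the scanning,
   the way coreutils' wc does in its lines-only loop. Each hit
   advances past the match and searches the remaining bytes. */
static long count_newlines_memchr(const char *buf, size_t len)
{
    long total = 0;
    const char *p = buf;
    const char *end = buf + len;

    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        total++;
        p++;                /* continue just past the match */
    }
    return total;
}
```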
Quote:
Code:
# time cat -n /test.pl | tail -n1 |
Quote:
But isn't there also room for speed improvement by just parsing the buffer in plain C, without calling a function in the C library? The problem is that I can't put it into code... |
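The "plain C, no library call in the inner loop" version looks like this (again, the name is my own; a sketch, not a benchmark winner). Modern compilers can auto-vectorize this loop, but a tuned libc memchr is often still faster, which is presumably why coreutils uses it:

```c
#include <stddef.h>

/* Count newlines by walking the buffer with a pointer: no library
   function inside the loop, just a compare and an increment per byte. */
static long count_newlines_plain(const char *buf, size_t len)
{
    long total = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end)
        if (*p++ == '\n')
            total++;
    return total;
}
```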
This is the standard wc -l:
Code:
# time /usr/bin/wc -l /1.3GB.txt

I downloaded coreutils and compiled its wc; it turned out to be slightly faster than the one that came with Ubuntu 10.4 LTS. I don't know why; it may simply be a difference in version. coreutils wc.c: http://pastebin.com/Z91ZFKrD
Code:
# time /opt/coreutil/coreutils-8.9/src/wc -l /1.3GB.txt Code:
# diff wc.c wclb.c Code:
# gcc lc.c -o lc -O3 Code:
# gcc lc.c -o lc

In the old days, when I was still writing fast functions in C (compared with native Clipper), I never used any library functions; afaik that was not possible. There were some functions meant for parameter passing and allocating memory, and I parsed those buffers in native C. I released those functions into the public domain, but that was before the Internet became popular; they were uploaded to my brother's BBS, which was part of FidoNet. I couldn't find any of my sources on the Internet...

Don't you think it's worthwhile to change the source of coreutils' wc.c instead of the other one, which uses getc? wc.c uses another library function (memchr). Is that really needed, or doesn't it give a speed improvement? Code:
/* Use a separate loop when counting only lines or lines and bytes --

PS: I googled my name in combination with Clipper and did find these files (how funny): http://members.fortunecity.com/userg...tml/summer.htm
Trudf.zip Set of MS C source UDFs (w/OBJs) that total numeric elements of an array, test null expressions, pad character strings, SOUNDEX(), & strip non-alphanumeric characters from strings - by J van Melis
That's more than 20 years ago. I wish I could get hold of that file... |
One idea: use the size of the file ('stat -c %s'), as long as the file has a constant number of characters per line. Other than that, you cannot get any faster than wc -l. |
Quote:
It's only a pseudo-file, and 'stat -c %s' returns 0 (as I found out a while ago in another situation). But I already made progress by modifying the buffer size of "wc.c" (didn't you see the results I posted?).

I'm currently in the process of recovering my 25-year-old C sources. Those sources don't contain calls to library functions. Hopefully it will all come back to me, and I may even pick up programming in C again. I still think/hope there's some room for improvement. I will keep you posted (also if I don't succeed).

Cheers and thanks for all the input, JP |