Fastest MEMMEM!?
Hi *nix guys,
here I want to ask for help in finding/sharing the fastest search function when a block is needed to be found in another block, since I am a Windows (don't ask) user and have little experience in *nix console, yes the MEMMEM which is missing in C libraries under Windows (who knows why). Simply put, do you know a better function than mine, 'Railgun_Sekireigan_Bari', I wrote an article at: http://www.codeproject.com/Articles/...-function-in-C The source is this: Code:
// Version: This is Railgun_Sekireigan revision 2, copyleft 2013-Oct-16, Kaze Also I will be pleased if someone benchmark it against (current MEMMEMs), I have no opportunity, neither modern computer nor knowledge of *nix environment. |
1 Attachment(s)
Lucky I am to come up with a new even faster function - 'Railgun_Sekireigan_Piri'.
The graph showing the superiority of 'Piri': http://www.linuxquestions.org/questi...1&d=1383501798 |
1 Attachment(s)
The fastest memmem(), known to me, comes here.
The source/benchmark is downloadable at my INTERNET drive: https://1drv.ms/u/s!AmWWFXGMzDmEglwjlUtnMJrfhosK It also is zipped (the 4 testfiles are not) and attached to the post. The speed showdown has three facets: - compares the 64bit code generated from GCC 5.10 versus Intel 15.0 compilers; - compares two datasets - search speed through English texts versus Genome ACGT-type data; - compares the tweaked Two-Way algorithm (implemented by Eric Blake) and adopted by GLIBC as memmem() versus my Railgun_Swampshine. Note1: The GLIBC memmem() was taken from latest (2016-08-05) glibc 2.24 tar: https://www.gnu.org/software/libc/ Note2: Eric Blake says that he enhanced the linearity of Two-Way by adding some sublinear paths, well, Railgun is all about sublinearity, so feel free to experiment with your own testfiles (worst-case-scenarios), just make such a file feed the compressor with it, then we will see how the LINEAR Two-Way behaves versus Railgun_Swampshine. Note3: Just copy-and-paste 'Railgun_Swampshine' or 'Railgun_Ennearch' from the benchmark's source. So the result on Core 2 Q9550s @2.83GHz: Code:
--------------------------------------------------------------------------------------------- in fact, these numbers are the compression speed using LZSS and memmem() as matchfinder. Two-Way is significantly slower than BMH Order 2, the speed-down is in range: - for TEXTUAL ANSI alphabets: 1729/226= 7.6x - for TEXTUAL UTF8 alphabets: 2898/760= 3.8x - for TEXTUAL ACGT alphabets: 458/109= 4.2x - for BINARY-esque alphabets: 580/144= 4.0x For faster RAM, than mine @666MHz, and for haystacks multimegabytes long, the speedup goes beyond 8x. The benchmark shows the real behavior (both latency and raw speed) of the memmem variants, I added also the Thierry Lecroq's Two-Way implementation: http://www-igm.univ-mlv.fr/~lecroq/s...l#SECTION00260 However, Eric Blake's one is faster, so it was chosen for the speed showdown. Once I measured the total length of traversed haystacks, and for files 100+MB long, it went ... quintillion of bytes i.e. petabytes - good torture it is. Enfun! |
1 Attachment(s)
The fastest memmem() is already the tuned Railgun_Swampshine_BailOut - avoiding second pattern comparison in BMH2 and pseudo-BMH4. It is called 'Trolldom', to signify its secondary purpose - to traverse fast via Trolllandia a.k.a. ugly/deformed haystacks causing quadratic behavior of most Boyer-Moore-Horspool variants. GNU/GLIBC memmem() is linearIC and sublinearIC at times while Railgun_Trolldom is mostly sublinearIC.
Railgun_Trolldom is slightly faster (both with Intel & GCC) than Railgun_Swampshine, this is mostly due to adding a bitwise BMH order 2 (8KB table overhead instead of 64KB) path - for haystacks <77777 bytes long - the warm-up is faster. Code:
So the result on Core 2 Q9550s @2.83GHz DDR2 @666MHz / i5-2430M @3.00GHz DDR3 @666MHz: https://1drv.ms/u/s!AmWWFXGMzDmEgl3izdwhSnIq0Lv5 www.sanmayce.com/Downloads/Nakamichi_Kintaro++_source_executables_64bit_(GCC510-vs-Intel150)_(TW-vs-RG)_BENCHMARK.zip Since it is pure C, no intrinsics used, perhaps it will work unproblematically on most *nix systems. To the attached archive I included the C source along with its listing in 28 pages in PDF format - to be printed and digested more easily from search techniques aficionados. ENFUN! |
All times are GMT -5. The time now is 06:46 AM. |