LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
12-27-2014, 11:55 PM #1
Sanmayce
Member
Registered: Jul 2010
Location: Sofia
Posts: 59
Reputation: 11
Need assistance running a 16-threaded, superheavy fuzzy search


Hi,
I wrote a superb, 100% FREE console tool, but I am short on computing power: I only have access to a laptop with a Q9550s (4 cores/4 threads), and my tool needs a 16-threaded CPU in order to scream.
My wish is for some *NIX fellow to help me benchmark it on either AMD or Intel; both would be ideal.

Despite my affinity for 100% free tools and functions written in C, I have been using the Windows platform, don't ask why. I am kind of a lost lone wolf.

The goal is to search a text file at high speed.
Just for comparison, Kazahana is 3-5 times faster than grep 2.5.4 under Windows.

The current implementation has three modes: exact, wildcard and fuzzy.
All three modes always run as 16 parallel copies, that is, one thread is always dedicated to each block.
During my quest for high speed I was lucky enough to write the fastest exact/wildcard functions; multiply that by 16 and you can see why I am eager to see the result.
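A minimal sketch of the block-per-thread idea, assuming the buffer is divided into 16 contiguous ranges of *start positions* and that a plain memcmp() stands in for Kazahana's own exact-search functions (which are not reproduced here). The names count_blocked and NBLOCKS are hypothetical. Compiled with -fopenmp each block can run on its own thread; without it the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>
#include <string.h>

#define NBLOCKS 16  /* hypothetical: one block per worker, like the 16 OMP sections */

/* Count (possibly overlapping) occurrences of pat[0..m) in buf[0..n).
 * The valid start positions 0..n-m are split into NBLOCKS contiguous
 * ranges. Each block owns the start positions in its range, so a hit
 * that begins near the end of one block and runs into the next one is
 * counted exactly once: no duplicates and no missed matches. */
size_t count_blocked(const char *buf, size_t n, const char *pat, size_t m)
{
    size_t total = 0;
    if (m == 0 || m > n)
        return 0;
    size_t starts = n - m + 1;              /* number of valid start positions */

    #pragma omp parallel for reduction(+:total) schedule(static)
    for (int b = 0; b < NBLOCKS; b++) {
        size_t lo = starts * (size_t)b / NBLOCKS;
        size_t hi = starts * (size_t)(b + 1) / NBLOCKS;
        size_t local = 0;
        for (size_t i = lo; i < hi; i++)    /* reads may extend past hi; that is fine */
            if (buf[i] == pat[0] && memcmp(buf + i, pat, m) == 0)
                local++;
        total += local;
    }
    return total;
}
```

For example, on "abcabcabcXabc" with pattern "abc" this counts 4 hits. Because the block index is just the loop variable, one function body serves all 16 blocks.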

The log below is for a fuzzy search within Levenshtein distance 4 for the pattern "Silvestor Staloune".

I guess most people would make up to 4 typos, hence 4.
"Silvestor Staloune" has:
'i' in place of 'y' (substitution)
'o' in place of 'e' (substitution)
one 'l' deleted
one 'u' added
The correct name is:
Sylvester Stallone
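Kazahana's banner says it uses the Wagner-Fischer algorithm for Levenshtein distance. As a sketch only (not Kazahana's code; the function name levenshtein is hypothetical), the classic two-row Wagner-Fischer dynamic programme below shows how the distance between the pattern and the correct name is computed.

```c
#include <stdlib.h>
#include <string.h>

/* Levenshtein distance via the Wagner-Fischer dynamic programme,
 * keeping only two rows of the table. Returns (size_t)-1 if out of memory. */
size_t levenshtein(const char *a, const char *b)
{
    size_t n = strlen(a), m = strlen(b);
    size_t *prev = malloc((m + 1) * sizeof *prev);
    size_t *cur  = malloc((m + 1) * sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return (size_t)-1;
    }
    for (size_t j = 0; j <= m; j++)
        prev[j] = j;                          /* "" -> b[0..j): j insertions */
    for (size_t i = 1; i <= n; i++) {
        cur[0] = i;                           /* a[0..i) -> "": i deletions */
        for (size_t j = 1; j <= m; j++) {
            size_t del = prev[j] + 1;         /* delete a[i-1] */
            size_t ins = cur[j - 1] + 1;      /* insert b[j-1] */
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]); /* match/substitute */
            size_t best = del < ins ? del : ins;
            cur[j] = best < sub ? best : sub;
        }
        size_t *tmp = prev; prev = cur; cur = tmp;   /* last row is now in prev */
    }
    size_t d = prev[m];
    free(prev);
    free(cur);
    return d;
}
```

levenshtein("Silvestor Staloune", "Sylvester Stallone") returns 4, matching the four edits listed above.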

In the resulting file I spotted the following misspellings (outside the redirect tag):
Sylvester Stalone
Sylvestor Stallone
Slvester Stallone
Silvester Stallone


Obviously, even Wikipedia is not fully proofread; I guess an ocean of people misspell the name of the beloved actor as well.

Quote:
// Test on laptop with Q9550s 2833MHz, 4/4 cores/threads, Windows 7 64bit:
/*
D:\_KAZE\GameraWikipediaWiktionary>type Kazahana_2014-Dec-04\Kazahana_compile_GCC.bat
gcc -O3 -funroll-loops -static -o Kazahana_Hexadecad_GCC_472 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE.c -fopenmp -DCommence_OpenMP -D_FILE_OFFSET_BITS=64 -D_gcc_mumbo_jumbo_
gcc -O3 -funroll-loops -static -o Kazahana_Monad_GCC_472 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE.c -fopenmp -D_FILE_OFFSET_BITS=64 -D_gcc_mumbo_jumbo_

D:\_KAZE\GameraWikipediaWiktionary>type Kazahana_2014-Dec-04\Kazahana_compile_Intel12_64bit.bat
icl /O3 /arch:SSE2 /QxSSE2 /Qunroll /MT Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE.c /FAcs /FeKazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE_HEXADECAD-Threads_IntelV12_SSE2_64bit /Qopenmp /Qopenmp-link:static -DCommence_OpenMP -D_icl_mumbo_jumbo_
icl /O3 /arch:SSE2 /QxSSE2 /Qunroll /MT Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE.c /FeKazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE_MONAD-Thread_IntelV12_SSE2_64bit -D_icl_mumbo_jumbo_

D:\_KAZE\GameraWikipediaWiktionary>timer32.exe Kazahana_Hexadecad_GCC_472.exe 4e "Silvestor Staloune" enwiki-20141008-pages-articles.xml 11263
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix, copyleft Kaze 2014-Nov-19.
Pattern: Silvestor Staloune
omp_get_num_procs( ) = 4
omp_get_max_threads( ) = 4
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 11263KB ... OK
\; 00,000,001,376 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 800,855,553/342,059,464,575/2,106
Kazahana: Performance: 1 KB/clock
Kazahana: Performance: 21 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 36,459,222/1,379,563
Kazahana: Performance: I/O time, i.e. fread() time, is 3 percents
Kazahana: Done.

Kernel Time = 38.345 = 0%
User Time =136250.493 = 373%
Process Time =136288.838 = 373% Virtual Memory = 14 MB
Global Time = 36460.185 = 100% Physical Memory = 16 MB

D:\_KAZE\GameraWikipediaWiktionary>dir Kazahana.txt
Volume in drive D is S640_Vol5
Volume Serial Number is 5861-9E6C

Directory of D:\_KAZE\GameraWikipediaWiktionary

12/03/2014 01:10 PM 1,064,420 Kazahana.txt
1 File(s) 1,064,420 bytes
0 Dir(s) 63,694,749,696 bytes free

D:\_KAZE\GameraWikipediaWiktionary>timer32.exe Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE_HEXADECAD-Threads_IntelV12_SSE2_64bit 4e "Silvestor Staloune" enwiki-20141008-pages-articles.xml 11263
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE, copyleft Kaze 2014-Dec-03.
Pattern: Silvestor Staloune
omp_get_num_procs( ) = 4
omp_get_max_threads( ) = 4
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 11263KB ... OK
\; Speed: 00,000,002,001 bytes/clock; Traversed: 50,144,448,379 bytes
Kazahana: Total/Checked/Dumped xgrams: 800,855,553/342,059,464,575/2,106
Kazahana: Performance: 1 KB/clock
Kazahana: Performance: 31 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 25,073,428/602,292
Kazahana: Performance: I/O time, i.e. fread() time, is 2 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,704,219,997,078 ticks
Kazahana: Done.

Kernel Time = 284.670 = 1%
User Time = 92233.204 = 367%
Process Time = 92517.875 = 368% Virtual Memory = 17 MB
Global Time = 25073.682 = 100% Physical Memory = 16 MB

D:\_KAZE\GameraWikipediaWiktionary>dir Kazahana.txt
Volume in drive D is S640_Vol5
Volume Serial Number is 5861-9E6C

Directory of D:\_KAZE\GameraWikipediaWiktionary

12/04/2014 08:51 AM 1,064,420 Kazahana.txt
1 File(s) 1,064,420 bytes
0 Dir(s) 67,609,645,056 bytes free

D:\_KAZE\GameraWikipediaWiktionary>
*/
The compile line:
Quote:
gcc -O3 -funroll-loops -static -o Kazahana_Hexadecad_GCC_472 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE.c -fopenmp -DCommence_OpenMP -D_FILE_OFFSET_BITS=64 -D_gcc_mumbo_jumbo_
The superheavy test I need run:
Quote:
time ./Kazahana_Hexadecad_GCC_472 4e "Silvestor Staloune" enwiki-20141008-pages-articles.xml 11263
On a 3rd-gen i3 with 4/4 cores/threads the above test took 533 min.
My estimate is that on a Haswell-E 5960X it will take about 1 hour.
The Wikipedia file used above can be downloaded from:
http://dumps.wikimedia.org/enwiki/20...ticles.xml.bz2

The attached file is the zipped C source. To work around the forum not accepting .c/.zip uploads, I renamed the .zip to .zip.log, so you will need to rename it back.

2014-Dec-28,
Kaze
Attached Thumbnails: Capture.jpg (251.9 KB)
Attached Files: Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE.zip.log (42.9 KB)
 
12-29-2014, 08:37 AM #2
Cybrax
LQ Newbie
Registered: Oct 2009
Posts: 24
Reputation: 0
Ok wth is up with that filename lol?

It compiled fine on Debian 7.

I am looking at your code first to make sure you didn't put in anything dodgy. 22,769 lines of code, ouch lol.

I am interested in trying this since I acquired some quad-CPU server hardware and would like to stress it and see how long it takes compared to the 533 min.

Will let you know after I have tried it.
 
12-29-2014, 10:44 AM #3
suicidaleggroll
LQ Guru
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573
Reputation: 2137
I'm looking through as well. I have a dual socket Xeon E5-2697v2 with a 6Gb/s SSD to play on, and with the holidays only 3 of the 24 available cores are being used, so there's plenty of room.
 
12-29-2014, 02:02 PM #4
Sanmayce
Member
Registered: Jul 2010
Location: Sofia
Posts: 59
Original Poster
Reputation: 11
I am so glad that not one but two fellow members are interested in Kazahana.

>Ok wth is up with that filename lol?

It is my weird naming convention: the name spells out the revision in full, so I know which ELF/EXE was executed just by looking at the console dumps, which is pretty convenient for me.
Of course, your console window has to be wide enough; see mine on Windows in the attached screenshot.

>I am looking at your code first to see if you did not put in anything dodgy 22769 Lines of code ouch lol.

Keep in mind that I am not a PRO, just an advanced amateur.
The large number of lines is due to 16x copy-paste of each function; perhaps there is some way to make a template, but I don't know how.

>I am interested in trying this since I acquired some quad CPU server hardware and would like to stress that and see how long it would do on this compared to the 533 mins.

Those were in fact 553 min, reported by my friend who ran the "Silvestor Staloune" torture on a laptop with Linux; he said the GCC used was 4.8. That is quite a lot of time compared to my Windows counterparts.
My Linux experience is very limited, but all the GCC compilations (32/64-bit) I made of my C tools showed very decent speeds compared to my everyday test environment, Windows 7 with the Intel 12.1 32-bit compiler. For this particular test (4-level nested loops), however, ICL 12.1 outperforms GCC 4.7.2 by 50%, which is too much given that I used similar options during compilation.
I would be happy to see the full load on Wikipedia, that is, all 16 threads running.
You see, for me speed is religion. That is why, at the link below, I uploaded two Windows executables: one is the one attached to my post above, the other has a small modification enforcing '#pragma parallel for' within the OMP sections. This is CRAZY: on my 4-core/4-thread laptop the runtime jumped fourfold.
I expect this boost to utilize ALL AVAILABLE THREADS (only for exhaustive fuzzy searches, though); that remains to be seen.
If you run the torture, please put 'time' before the executable so that the CPU utilization gets reported.
I fully expect ~1500% for your compile.

Fellows, if you are interested in Kazahana's latest executables (compiled with the latest Intel v15 optimizer), they can be downloaded from the Intel Developer Zone:
https://software.intel.com/en-us/forums/topic/536849
Attached Thumbnails: mydesktop.gif (169.9 KB)

12-29-2014, 02:22 PM #5
suicidaleggroll
LQ Guru
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573
Reputation: 2137
I have it running on my 2x E5-2697v2, but it's only moving through the file at a rate of about 2.25 MB/s (based on the periodic "Traversed" updates). This puts the total run time at an estimated 350 minutes. The CPU load is about 17-18: 2 for other processes and 15-16 for Kazahana. With 24 physical cores there is still some headroom available, so no throttling is taking place.
 
12-29-2014, 02:30 PM #6
Sanmayce
Member
Registered: Jul 2010
Location: Sofia
Posts: 59
Original Poster
Reputation: 11
Quote:
Originally Posted by suicidaleggroll
I'm looking through as well. I have a dual socket Xeon E5-2697v2 with a 6Gb/s SSD to play on, and with the holidays only 3 of the 24 available cores are being used, so there's plenty of room.
Nice, nice, you guys are a joy for my sore eyes, not so much because of a long-wanted, long-awaited full EnWiki fuzzy search, but because *NIX folks are going to run Kazahana.
My wish is for Kazahana to enter *NIX fields and to be widely used, and mostly edited/improved, by native users; my knowledge of POSIX (portability) is next to nothing.

I did my best to optimize the functions, but the final mix is still one step away; that is why I am reluctant to name the tool Kazahana_r1 until heavy *NIX tortures are done.
As I said on Intel's forum, my wish is for Kazahana to match the power of the upcoming Knights Landing (1 year away, next November if I am not mistaken).
Simply put, 240 threads are not so amazing in my greedy eyes, considering the heaviness of the EXHAUSTIVE FUZZY search tested with the "Silvestor Staloune" pattern.
Seeing that adding threads is only a matter of GOOD WILL from the CPU manufacturers, I expect 256 threads to become mainstream by the end of the decade.

By the way, I saw 7-Zip's 51802 MIPS on a single-socket Xeon E5-2697 v2 (on AnandTech), and comparing that to my Q9550S's 9965 MIPS, I am eager to see Wikipedia traversed fuzzily.
 
12-29-2014, 02:37 PM #7
Sanmayce
Member
Registered: Jul 2010
Location: Sofia
Posts: 59
Original Poster
Reputation: 11
Quote:
Originally Posted by suicidaleggroll
I have it running on my 2x E5-2697v2, but it's only moving through the file at a rate of about 2.25 MB/s (based on the periodic "Traversed" updates). This puts the total run time at an estimated 350 minutes. The CPU load is about 17-18: 2 for other processes and 15-16 for Kazahana. With 24 physical cores there is still some headroom available, so no throttling is taking place.
That's bad news; I have no idea why the utilization is so miserable. My 4-threaded laptop works at about the same speed, ~2100 B/millisecond or ~2 MB/s.
This revision should ALWAYS run 16 threads, so something is going wrong, perhaps with the OpenMP defaults. I don't use anything fancy regarding OpenMP, just 16 OMP sections, which in my terminology is manual multi-threading, as opposed to lazy (automatic) multi-threading.

I don't know, quite a disappointment.
 
12-29-2014, 03:00 PM #8
Sanmayce
Member
Registered: Jul 2010
Location: Sofia
Posts: 59
Original Poster
Reputation: 11
Quote:
Originally Posted by suicidaleggroll
... The CPU load is about 17-18, 2 for other processes and 15-16 for Kazahana. With 24 physical cores, there is still some overhead available, so no throttling is taking place.
Just a quick dummy calculation:
48 threads * 15/100 = 7 threads
My guess is that only 7 threads are running. Does the Task Manager (or whatever it is called there) say how many threads the task uses?

Also, could you post the OMP report? Mine looks like:
Quote:
Pattern: Silvestor Staloune
omp_get_num_procs( ) = 4
omp_get_max_threads( ) = 4
I guess the first is 24 and the second 48, no?
 
12-29-2014, 03:03 PM #9
suicidaleggroll
LQ Guru
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573
Reputation: 2137
Hyperthreading is disabled on my machine. Given its usage, HT does nothing but slow things down.

Load is not normalized for the number of cores/threads on the machine. As long as there are no bottlenecks (I/O, etc), it's 1 to 1. A load of 17 means 17 CPU cores are being utilized.

The applicable lines from top:
Code:
Tasks: 584 total,   3 running, 581 sleeping,   0 stopped,   0 zombie
Cpu0  : 98.7%us,  1.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 99.3%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 20.9%us,  0.0%sy,  0.0%ni, 78.4%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.7%us,  0.7%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 : 79.1%us,  0.0%sy,  0.0%ni, 20.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 :  0.0%us,  0.7%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu21 :  0.7%us,  0.7%sy,  0.0%ni, 98.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu22 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132123496k total, 131626396k used,   497100k free,   128252k buffers
Swap:  4095996k total,        0k used,  4095996k free, 127667128k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND              
26636 user      20   0  244m  14m  384 S 1598.3  0.0   1015:19 Kazahana_Hexade
1600% CPU means 16 threads.

OMP report:
Code:
omp_get_num_procs() = 24
omp_get_max_threads() = 24
omp_get_thread_limit() = 2147483647
omp_get_max_active_levels() = 2147483647
I should note that the CPU usage for that process is not always 1600%. It bounces around between about 1200-1600.
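The OMP report lines come from standard OpenMP runtime routines declared in <omp.h>. A sketch of printing them, with report_openmp as a hypothetical name; the _OPENMP guard lets the file compile without -fopenmp, in which case it reports a single thread. Note that omp_get_max_threads() is only the default team size (it can be set through the OMP_NUM_THREADS environment variable): a region built from 16 sections has at most 16 units of work to hand out, so on 24 cores at most 16 threads are busy, which fits the ~1600% figure from top.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Print the same OpenMP facts as the reports above and return the
 * default team size (1 when compiled without -fopenmp). */
int report_openmp(void)
{
#ifdef _OPENMP
    printf("omp_get_num_procs() = %d\n", omp_get_num_procs());
    printf("omp_get_max_threads() = %d\n", omp_get_max_threads());
    printf("omp_get_thread_limit() = %d\n", omp_get_thread_limit());
    printf("omp_get_max_active_levels() = %d\n", omp_get_max_active_levels());
    return omp_get_max_threads();
#else
    puts("OpenMP disabled (compile with -fopenmp)");
    return 1;
#endif
}
```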

Last edited by suicidaleggroll; 12-29-2014 at 03:09 PM.
 
12-29-2014, 03:12 PM #10
Sanmayce
Member
Registered: Jul 2010
Location: Sofia
Posts: 59
Original Poster
Reputation: 11
Thank you,
it seems that it runs at full speed with 16 threads, but I am not sure. Is there any way for the manager to report per-task thread stats?
I wonder what PR=20 means!
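On Linux, one way to see how many threads a process currently has is the "Threads:" line of /proc/<pid>/status; `ps -o nlwp= -p <pid>` reports the same count. A minimal, Linux-specific sketch that reads it (thread_count is a hypothetical name); for the run above it would be called with the PID shown by top, e.g. thread_count(26636).

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), handy for testing on the current process */

/* Return the "Threads:" value from /proc/<pid>/status,
 * or -1 if the file cannot be read (e.g. no such process). */
int thread_count(pid_t pid)
{
    char path[64], line[256];
    int threads = -1;
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        if (sscanf(line, "Threads: %d", &threads) == 1)  /* other lines don't match */
            break;
    fclose(f);
    return threads;
}
```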

>I should note that the CPU usage for that process is not always 1600%. It bounces around between about 1200-1600.

That's normal: after traversing the Master-Buffer, which is ~11 MB, all 16 threads become idle until the next portion is loaded. Grmbl, I still have no knowledge of asynchronous I/O techniques.
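One common way to keep the workers busy while the next portion is read is double buffering: read chunk k+1 into a second buffer while chunk k is being searched. A minimal sketch, assuming OpenMP sections (which Kazahana already uses); the 1 MiB CHUNK, the function names and the single-byte count_byte() stand-in for the real search are all hypothetical. A multi-byte pattern would additionally need the last m-1 bytes of each chunk carried over to the next one so that hits spanning a chunk boundary are not lost.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (1u << 20)   /* hypothetical chunk size (Kazahana's buffer is ~11 MB) */

/* Stand-in for the real search: count occurrences of byte c. */
static size_t count_byte(const unsigned char *p, size_t n, unsigned char c)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        k += (p[i] == c);
    return k;
}

/* Count byte c in the rest of f. While buffer `cur` is scanned, the next
 * chunk is read into the other buffer, so I/O and work overlap when
 * compiled with -fopenmp (without it the two sections simply run in order). */
size_t count_byte_stream(FILE *f, unsigned char c)
{
    unsigned char *buf[2] = { malloc(CHUNK), malloc(CHUNK) };
    if (buf[0] == NULL || buf[1] == NULL) {
        free(buf[0]);
        free(buf[1]);
        return (size_t)-1;
    }
    size_t len[2], total = 0;
    int cur = 0;
    len[cur] = fread(buf[cur], 1, CHUNK, f);
    while (len[cur] > 0) {
        int nxt = 1 - cur;
        size_t found = 0;
        #pragma omp parallel sections
        {
            #pragma omp section
            len[nxt] = fread(buf[nxt], 1, CHUNK, f);    /* I/O for the next chunk */
            #pragma omp section
            found = count_byte(buf[cur], len[cur], c);  /* work on the current one */
        }
        total += found;
        cur = nxt;
    }
    free(buf[0]);
    free(buf[1]);
    return total;
}
```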

Just for reference, my latest result for this torture (Core 2 Q9550s 2833 MHz, 4 threads, Windows 7, 32-bit compile, Intel 15):
Quote:
D:\_KAZE\GameraWikipediaWiktionary>timer32.exe Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+EX+CS_fix_DEFINE_HEXADECAD-Threads_IntelV15_Qparallel_32bit.exe 4e "Silvestor Staloune" enwiki-20141008-pages-articles.xml 11263
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE, copyleft Kaze 2014-Dec-04.
Pattern: Silvestor Staloune
omp_get_num_procs() = 4
omp_get_max_threads() = 4
omp_get_thread_limit() = 32768
omp_get_max_active_levels() = 2147483647
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 11263KB ... OK
\; Speed: 00,000,002,281 bytes/clock; Traversed: 50,144,448,379 bytes
Kazahana: Total/Checked/Dumped xgrams: 800,855,553/342,059,464,575/2,106
Kazahana: Performance: 2 KB/clock
Kazahana: Performance: 36 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 21,994,925/360,511
Kazahana: Performance: I/O time, i.e. fread() time, is 1 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,020,079,177,536 ticks
Kazahana: Done.

Kernel Time = 535.801 = 2%
User Time = 81999.944 = 372%
Process Time = 82535.745 = 375% Virtual Memory = 17 MB
Global Time = 21995.469 = 100% Physical Memory = 17 MB
Or roughly 6 hours.

Last edited by Sanmayce; 12-29-2014 at 03:20 PM.
 
12-29-2014, 03:19 PM #11
suicidaleggroll
LQ Guru
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573
Reputation: 2137
Not that I'm aware of.

PR (priority) =20 and NI (nice level) =0 are the defaults. Here's an explanation:
http://www.linux.com/learn/tutorials...ops-statistics
Quote:
In Linux 2.6.23 and above, PR is always NI plus 20.
Unless processes start fighting each other for CPU time, it has no effect.
 
12-29-2014, 03:57 PM #12
Sanmayce
Member
Registered: Jul 2010
Location: Sofia
Posts: 59
Original Poster
Reputation: 11
Thank you for your time, suicidaleggroll, quite useful.

Two quick thoughts:
- roughly speaking, the expected time should be around 6 hours/4, i.e. ~90 min, given that both CPUs (Q9550s vs 2x Xeon) run at ~2.8 GHz (4 vs 16 threads);
- I expected at most a 60% speed drop for GCC 4.7.2, as seen under Windows, so the total time should be 36460/4 = 9115 seconds, or ~152 min.

Taking twice as long as that suggests some serious blunder.
I asked some Windows users to run this very test on a 5960X, but there is still no answer from them.

I will try to obtain results for the Intel 15 compile on a 5960X and thus establish the BASELINE.
 
12-29-2014, 05:28 PM #13
Sanmayce
Member
Registered: Jul 2010
Location: Sofia
Posts: 59
Original Poster
Reputation: 11
I just looked at the bottom of the page and saw a very interesting Similar Thread:
LXer: Linux Terminal: How to do fuzzy search with tre-agrep

It contains a link to a very useful page showing a variant of grep featuring fuzzy search.

An excerpt:
Quote:
Basic Usage

The usage is best shown with some simple examples of this powerful command. Given the file example.txt that contains:

Résumé
RÉSUMÉ
resume
Resümee
rèsümê
Resume
linuxaria


These are the output of the command tre-agrep with some different option:

mint-desktop tmp # tre-agrep resume example.txt
resume

mint-desktop tmp # tre-agrep -i resume example.txt
resume
Resume


mint-desktop tmp # tre-agrep -1 -i resume example.txt
resume
Resümee
Resume


mint-desktop tmp # tre-agrep -2 -i resume example.txt
Résumé
RÉSUMÉ
resume
Resümee
Resume
Looks good; the usage in Kazahana is very similar. I didn't know tre-agrep existed until now.

One important note has to be made here, though. The examples above show FUZZY search, and the same mode is available in Kazahana (just omit the 'e' after the number). What I don't see is the really powerful mode, EXHAUSTIVE FUZZY. This mode is the same as FUZZY (code-wise) but brutally slow, because it searches for hits at EACH POSITION, not only within the entire line as a whole!

In short, Kazahana has three main modes: exact (CASE-SENSITIVE), wildcard (CASE-SENSITIVE/CASE-INSENSITIVE, also RECURSIVE/ITERATIONAL) and fuzzy (SIMPLE/EXHAUSTIVE).
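The post does not show how EXHAUSTIVE FUZZY is implemented beyond checking every position. For comparison, a standard technique for approximate substring search is Sellers' variant of Wagner-Fischer: setting the top row of the table to zero lets a match start at any text position, so a single O(m*n) pass yields, for every text position, the smallest edit distance between the pattern and some substring ending there. The sketch below (best_substring_distance is a hypothetical name, not Kazahana's code) returns the minimum over the whole text; to mimic a threshold such as the 4 in "4e", you would instead report every text position where col[m] <= 4.

```c
#include <stdlib.h>
#include <string.h>

/* Smallest edit distance between pat and ANY substring of text
 * (Sellers' algorithm). Returns (size_t)-1 if out of memory.
 * col[i] holds D[i][j]: best distance of pat[0..i) to a substring
 * of text ending at position j. */
size_t best_substring_distance(const char *pat, const char *text)
{
    size_t m = strlen(pat), n = strlen(text);
    size_t *col = malloc((m + 1) * sizeof *col);
    if (col == NULL)
        return (size_t)-1;
    for (size_t i = 0; i <= m; i++)
        col[i] = i;                 /* empty substring: drop i pattern chars */
    size_t best = col[m];
    for (size_t j = 1; j <= n; j++) {
        size_t diag = col[0];       /* D[0][j-1] */
        col[0] = 0;                 /* a match may start at any text position */
        for (size_t i = 1; i <= m; i++) {
            size_t up = col[i];                                  /* D[i][j-1] */
            size_t sub = diag + (pat[i - 1] != text[j - 1]);     /* match/substitute */
            size_t skip_pat = col[i - 1] + 1;                    /* pattern char missing */
            size_t skip_txt = up + 1;                            /* extra text char */
            size_t v = sub < skip_pat ? sub : skip_pat;
            col[i] = v < skip_txt ? v : skip_txt;
            diag = up;
        }
        if (col[m] < best)
            best = col[m];
    }
    free(col);
    return best;
}
```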

@suicidaleggroll
I just saw that you have 128 GB of RAM; if it is not too much trouble, could you share a "normal" (i.e. exact) search: Kazahana vs grep?
For example:
Quote:
./Kazahana "Sylvester Stallone" enwiki-20141008-pages-articles.xml 11263
grep "Sylvester Stallone" enwiki-20141008-pages-articles.xml
I always wanted to see the performance with the whole EnWiki (cached by the OS) being searched by a multi-threaded searcher.
Oops, does Linux allow, by default, say, a 50 GB file to be cached in main RAM? I mean privileges.
My expectation is that Kazahana will traverse EnWiki at memcpy() speed minus a few percent; for grep, no idea.
Theoretically, "Sylvester Stallone" should be sought at 16x10 GB/s in exact mode; see the attached graph.
Attached Thumbnails: Railgun_Swampshine_Full-Fledged_MEMMEM_WIKI_ragged.png (121.0 KB)

12-30-2014, 09:58 AM #14
suicidaleggroll
LQ Guru
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573
Reputation: 2137
It finished after I left last night:
Code:
$ time ./Kazahana_Hexadecad_GCC_472 4e "Silvestor Staloune" enwiki-20141008-pages-articles.xml 11263
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE, copyleft Kaze 2014-Dec-04.
Pattern: Silvestor Staloune
omp_get_num_procs() = 24
omp_get_max_threads() = 24
omp_get_thread_limit() = 2147483647
omp_get_max_active_levels() = 2147483647
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 11263KB ... OK
\; Speed: 00,000,000,000 bytes/clock; Traversed: 50,144,448,379 bytes
Kazahana: Total/Checked/Dumped xgrams: 800,855,553/342,059,464,575/2,106
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 0 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 265,039,050,001/3,177,950,000
Kazahana: Performance: I/O time, i.e. fread() time, is 1 percents
Kazahana: Done.

real    326m32.615s
user    4416m16.224s
sys     1m3.573s
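The per-clock figures in this run ("0 KB/clock", Total clocks 265,039,050,001) appear to come from the C clock() function. Microsoft's C runtime documents clock() as elapsed wall-clock time in milliseconds, which fits the Windows log (36,459,222 clocks vs a Global Time of 36,460 s). On POSIX systems, by contrast, clock() returns processor time summed over all threads of the process, and CLOCKS_PER_SEC is 1,000,000 on XSI systems. The Linux numbers are consistent with that: 265,039,050,001 µs is about 265,039 s, i.e. user + sys (264,976 s + 64 s), and the exact run's 2,826,570,001 clocks match its 2,805 s + 21 s, which is also why its Speed line reads 17 bytes/clock (50,144,448,379 / 2,826,570,001 ≈ 17.7). A minimal sketch of measuring the two kinds of time separately, assuming POSIX clock_gettime() with CLOCK_MONOTONIC; wall_seconds and cpu_seconds are hypothetical names, not Kazahana functions.

```c
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime() */
#include <time.h>

/* Elapsed wall-clock seconds from a monotonic clock; independent of
 * how many threads are running. Returns -1.0 if the clock is unavailable. */
double wall_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Processor time of the whole process on POSIX (all threads summed).
 * With 16 busy threads this grows roughly 16x faster than wall time. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}
```

Dividing bytes traversed by a difference of wall_seconds() values gives a throughput that is comparable between Windows and Linux; dividing by a difference of cpu_seconds() values gives throughput per CPU-second instead.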
Code:
$ time ./Kazahana_Hexadecad_GCC_472 "Sylvester Stallone" enwiki-20141008-pages-articles.xml 11263
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+CS_fix_DEFINE, copyleft Kaze 2014-Dec-04.
Pattern: Sylvester Stallone
omp_get_num_procs() = 24
omp_get_max_threads() = 24
omp_get_thread_limit() = 2147483647
omp_get_max_active_levels() = 2147483647
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 11263KB ... OK
\; Speed: 00,000,000,017 bytes/clock; Traversed: 50,144,448,379 bytes
Kazahana: Dumped xgrams: 2,076
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: Total/fread() clocks: 2,826,570,001/2,764,160,000
Kazahana: Performance: I/O time, i.e. fread() time, is 97 percents
Kazahana: Done.

real    2m2.163s
user    46m45.260s
sys     0m21.419s
It was consuming all 24 processor cores during those 2 minutes.

Code:
$ time grep "Sylvester Stallone" enwiki-20141008-pages-articles.xml > grep.out

real    0m31.779s
user    0m20.624s
sys     0m11.111s
It was consuming just one processor core during those 31 seconds.
 
12-30-2014, 11:17 AM #15
Sanmayce
Member
Registered: Jul 2010
Location: Sofia
Posts: 59
Original Poster
Reputation: 11
Thanks.

The results are correct, albeit much slower than expected.

The correct number of hits for Exhaustive-Fuzzy (first dump) is 2,106: OK.
The correct number of hits for Exact (second dump) is 2,076: OK.

It is unlikely that the GCC version is to blame for those 326 min, but if it is an old one, say older than 4.7, it might be.

Thank you for the Kazahana vs grep test as well.
The result surprises me; it seems to me that EnWiki had not been cached for Kazahana, no?
122 seconds (16 threads) vs 31 seconds (1 thread) points to an I/O bottleneck.
My guess is that your drive was reading at 400 MB/s during the load, so 122*400 = 48,800 MB = ~50 GB.

My tests showed that even single-threaded Kazahana (compiled without the '-DCommence_OpenMP' option) always outpaces grep.
 
  


All times are GMT -5.