LinuxQuestions.org
Download your favorite Linux distribution at LQ ISO.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 04-05-2022, 10:18 AM   #16
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 10,701

Rep: Reputation: 5062Reputation: 5062Reputation: 5062Reputation: 5062Reputation: 5062Reputation: 5062Reputation: 5062Reputation: 5062Reputation: 5062Reputation: 5062Reputation: 5062

May I suggest that if you’re really breaking the ground you say, then the best place to present your work is not here? A paper with some computer science organization might be in order.
 
1 members found this post helpful.
Old 04-05-2022, 11:36 PM   #17
Sanmayce
Member
 
Registered: Jul 2010
Location: Sofia
Posts: 73

Original Poster
Rep: Reputation: 13
Quote:
Originally Posted by pan64 View Post
In a lot of cases any kind of sort is sufficient, especially when we have only a few items to work with. A general sort algorithm is as good as any other one for this purpose.
From the other hand in case of a huge amount of data [structures] we always need a special and unique implementation where all the important details were taken into account (like data types, record size, index table, "distinctness", way of comparison and also including hardware, like cpu type, amount of ram, i/o speed).

Just curious: how will you sort all the existing/valid phone numbers based on the age of the owners (in minutes?).
Hah, ask this question on other sites, and you will be treated as a spammer, or even better, as a troll, hee-hee.

Speaking of pure sorting speed, kinda similar (sizewise) task was executed yesterday on brother's laptop (Renoir 4800H, 64GB), speaking of sorting all 8bytes long chunks within 'math.stackexchange.com_en_all_2019-02.zim' (7,798,235,442 bytes) testdataset.
Since it was grabbing all the 62212 MB some internal Windows mumbo-jumbo ruined (the resultant time fluctuated by 50s) the run, yet, was able to see 458s for Crumsort and 457s for Akkoda, for 7,798,235,435 elements (6,770,144,405 unique).
So, roughly, a modern laptop will handle it at 7,798,235,435 keys / 457 seconds = 17,063,972 Keys/s, not that bad.

I guess, you are curious how I would handle non-QWORD data, either fixed or variable-length.
The short answer, by padding to a fixed length the 'age'+'number' fields, and proceed with a custom comparator sorting the QWORD offsets (to the whole record).

I love practical benchmarks, where hashing/searching/decompressing/sorting can be put to the test, the next two posts kinda address what you ask:
https://forum.qb64.org/index.php?top...9029#msg139029

Of course, this is memory hog approach, a decade ago, I realized that a solution is missing in that regard, therefore I wrote transparent (using either malloc() or fopen() memory pool) dealing with big data, not relying on virtual RAM by the OS.

Assuming you will need fast lookpus afterwards, I would use my old B-tree order 3 iterative implementation written in C, it is the core of my phrase-ripper Leprechaun achieving 12+ million phrases/words per second (when ripping the whole XML dump of English Wikipedia, ~75GB strong). AFAIK, Leprechaun is the Fastest Phrase Ripper.
In case of such big like 7 billion keys, I would just treat Phone Numbers or People Names (around 10 billion for last century) as variable-length field - each leaf in the B-tree having a 1-byte prefix housing the length of the field.
The niftiness is in allowing ripping in passes (specified at run-time via command line), thus working ONLY in the physical RAM, but if you have access to 3D XPoint drives then can switch to a single-pass mode using Leprechauns's external RAM.
 
Old 04-05-2022, 11:42 PM   #18
Sanmayce
Member
 
Registered: Jul 2010
Location: Sofia
Posts: 73

Original Poster
Rep: Reputation: 13
Quote:
Originally Posted by dugan View Post
May I suggest that if you’re really breaking the ground you say, then the best place to present your work is not here? A paper with some computer science organization might be in order.
Nothing ground-breaking, just sharing with the community my amateurish experience. I myself am a rookie in *nix, it gets as it comes.
 
Old 10-07-2022, 11:36 PM   #19
Sanmayce
Member
 
Registered: Jul 2010
Location: Sofia
Posts: 73

Original Poster
Rep: Reputation: 13
Finally, glad to share the C source code and the Windows/Linux binaries of my console tool (100% FREE and open-source) for sorting lines - the speed rival to 'sort', the package is Schmekeriada_2022-Oct-05.tar.7z 87.2 MB (91,451,331 bytes) on my Internet drive:
https://drive.google.com/file/d/1-w_...ew?usp=sharing

For those who want to see some fast machines running 8 different Quicksort implementations, my thread "The Definitive Quicksort benchmark" at OCN forum gives a SPEED ROSTER:
https://www.overclock.net/threads/th...#post-29029160

Thanks to one Norwegian overclocker, now we can see the fastest desktop machine running the fastest Quicksort-Magnetica-Akkoda, sorting in 35 seconds 3+ billion QWORDS (being all the 8 bytes building-blocks of Human DNA genome).

Until this night, I thought my tool was faster than Linuxsort, but this happened to be true only on 4 cores machines that I tested, the parallelizability of mergesort proves superior, ugh.
Speaking of practical situations, as sorting the Japanese Wikipedia XML dump (15 GB strong file, 219+ million lines), Fedora Linux sort appears to be very well written, on a modern laptop (AMD 4800H 16 threads, 64GB DDR4) running Fedora 33 live flashboot, it outperforms my tool Schmekeriada, being 3:41/3:20=(3*60+41)/(3*60+20)= 1.10x faster, GRMBL:

First setting fastest locale for sort:
Code:
[root@localhost-live Schmekeriada_2022-Oct-05]# export LC_ALL='C'
[root@localhost-live Schmekeriada_2022-Oct-05]# locale
LANG=en_US.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
[root@localhost-live Schmekeriada_2022-Oct-05]# echo $LC_ALL
C
[root@localhost-live Schmekeriada_2022-Oct-05]#
Setting superuser and performance mode:
Code:
[liveuser@localhost-live ~]$ su
[root@localhost-live Schmekeriada_2022-Oct-05]# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
[root@localhost-live Schmekeriada_2022-Oct-05]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
[root@localhost-live Schmekeriada_2022-Oct-05]#
Running the benchmark from an external USB drive (1TB SSD Kingston KC600):
Code:
[root@localhost-live liveuser]# df -h
Filesystem           Size  Used Avail Use% Mounted on
devtmpfs              32G     0   32G   0% /dev
tmpfs                 32G     0   32G   0% /dev/shm
tmpfs                 13G  9.9M   13G   1% /run
/dev/sda1             29G  8.6G   21G  30% /run/initramfs/live
/dev/mapper/live-rw  7.9G  6.0G  1.9G  77% /
tmpfs                 32G   24K   32G   1% /tmp
vartmp                32G     0   32G   0% /var/tmp
tmpfs                6.3G  180K  6.3G   1% /run/user/1000
/dev/sdb1            954G  934G   20G  98% /run/media/liveuser/TERA-PIG_1TB
[root@localhost-live liveuser]# cd /run/media/liveuser/TERA-PIG_1TB
[root@localhost-live TERA-PIG_1TB]# cd Schmekeriada_2022-Oct-05/
Schmekeriada stats and sha1 checksum:
Code:
[root@localhost-live Schmekeriada_2022-Oct-05]# /bin/time -v ./Schmekeriada_GCC_12.1.1_TetraThread_STATIC.elf jawiki-20220920-pages-articles.xml
   _________        .__                       __                    __             .___         
  /   _____/  ____  |  |__    _____    ____  |  | __  ____ _______ |__|_____     __| _/_____    
  \_____  \ _/ ___\ |  |  \  /     \ _/ __ \ |  |/ /_/ __ \\_  __ \|  |\__  \   / __ | \__  \   
  /        \\  \___ |   Y  \|  Y Y  \\  ___/ |    < \  ___/ |  | \/|  | / __ \_/ /_/ |  / __ \_ 
 /_______  / \___  >|___|  /|__|_|  / \___  >|__|_ \ \___  >|__|   |__|(____  /\____ | (____  / 
         \/      \/      \/       \/      \/      \/     \/                 \/      \/      \/   
This build (2022-Oct-05) features Quicksort-Magnetica-Strongfool.
Current priority is -20.
Size of input file: 15,674,562,752
Allocating FILE-Buffer 14948MB ...
Counting lines ... Done in 1,390,180 clocks, 1.39 seconds.
Number of LF-ending lines: 219,648,620
Postfixing the last "line" with a LF.
Allocating Master-Buffer (Offsets+Lengths) NumberOfLFs*8*2 = 3351MB ... Aligned to 16 bytes boundary.
Assigning pairs (of pointers and lengths) to lines ... Done in 8,134,175 clocks, 8.13 seconds.
ShortestLine = 0
LongestLine = 93,680
Sorting pointers to lines with 'Strongfool' a.k.a. 'Quicksort_Magnetica_v18_BalxchonkaForte_indirect' ...
Thread #1 of 4 sorting partition size=28973203
Thread #2 of 4 sorting partition size=37169789
Thread #3 of 4 sorting partition size=80704296
Thread #4 of 4 sorting partition size=72801336
Done (just sorting) in 68 seconds.
Writing sorted lines to 'Schmekeriada.txt' ... Done (just writing) in 138 seconds.
	Command being timed: "./Schmekeriada_GCC_12.1.1_TetraThread_STATIC.elf jawiki-20220920-pages-articles.xml"
	User time (seconds): 203.74
	System time (seconds): 21.09
	Percent of CPU this job got: 101%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 3:41.30
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 18740424
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 3
	Minor (reclaiming a frame) page faults: 4684870
	Voluntary context switches: 7653042
	Involuntary context switches: 808
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
[root@localhost-live Schmekeriada_2022-Oct-05]# sha1sum Schmekeriada.txt 
b97bab60515c4bafcaf4fa6b322b199f41e6d6d1  Schmekeriada.txt
Linuxsort stats and sha1 checksum:
Code:
[root@localhost-live Schmekeriada_2022-Oct-05]# /bin/time -v sort -o Linuxsort jawiki-20220920-pages-articles.xml --parallel=8
	Command being timed: "sort -o Linuxsort jawiki-20220920-pages-articles.xml --parallel=8"
	User time (seconds): 214.59
	System time (seconds): 27.18
	Percent of CPU this job got: 120%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 3:20.17
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 11233504
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 2808058
	Voluntary context switches: 7657686
	Involuntary context switches: 1647
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
[root@localhost-live Schmekeriada_2022-Oct-05]# sha1sum Linuxsort
b97bab60515c4bafcaf4fa6b322b199f41e6d6d1  Linuxsort
The testdatafile is downloadable at:
https://dumps.wikimedia.org/jawiki/2...ticles.xml.bz2

I have troubles in creating STAND-ALONE binaries in Linux (in order to use them in live boots), when using -static in compile line, always getting complaints about missing static libraries or linker crashes (as in CLANG v14), please someone help me to create CLANG/GCC standalone elf, anyone?

Add-on: When running, I saw that 'System Monitor' the Task Manager equivalent, shows 19.6GB peak memory usage for Schmekeriada whereas 33.3GB for sort v8.32, is there a non-GUI tool showing the peak memory usage of a process, such tool is timer64.exe by Igor Pavlov, giving the physical/virtual memory footprint after finishing a task?

Last edited by Sanmayce; 10-08-2022 at 12:17 AM.
 
Old 10-09-2022, 02:19 AM   #20
Sanmayce
Member
 
Registered: Jul 2010
Location: Sofia
Posts: 73

Original Poster
Rep: Reputation: 13
The same benchmark was rerun, this time with a buffered (1GB) dump within Schmekeriada, thus, millions of fwrite() invocations were replaced by some 15, as a result it is the fastest sort tool known to me, is this so, are out there faster tools?!

With this modern laptop with Zen2 4800H and 64GB DDR4, Schmekeriada is 3:14.04/2:52.94=(3*60+14)/(2*60+52)=1.12x faster than Linux' sort.
My experience shows that CLANG generates faster code in all sorting approaches within Schmekeriada, however I couldn't make static elf (in order to run it on live boot without the GLIBC).

This is the console dump, running 'bench_JW.sh':
Code:
[liveuser@localhost-live ~]$ cd /run/media/liveuser/TERA-PIG_1TB/Schmekeriada_2022-Oct-10
[liveuser@localhost-live Schmekeriada_2022-Oct-10]$ su
[root@localhost-live Schmekeriada_2022-Oct-10]# sh bench_JW.sh 
...
              total        used        free      shared  buff/cache   available
Mem:           62Gi       901Mi        60Gi        31Mi       1.3Gi        60Gi
Swap:         4.0Gi          0B       4.0Gi

LANG=en_US.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
C
The new revison:
Code:
   _________        .__                       __                    __             .___         
  /   _____/  ____  |  |__    _____    ____  |  | __  ____ _______ |__|_____     __| _/_____    
  \_____  \ _/ ___\ |  |  \  /     \ _/ __ \ |  |/ /_/ __ \\_  __ \|  |\__  \   / __ | \__  \   
  /        \\  \___ |   Y  \|  Y Y  \\  ___/ |    < \  ___/ |  | \/|  | / __ \_/ /_/ |  / __ \_ 
 /_______  / \___  >|___|  /|__|_|  / \___  >|__|_ \ \___  >|__|   |__|(____  /\____ | (____  / 
         \/      \/      \/       \/      \/      \/     \/                 \/      \/      \/   
This build (2022-Oct-10) features Quicksort-Magnetica-Strongfool, also buffered dump of sorted data.
This tool is 100% FREE and open-source, for improvements: sanmayce@sanmayce.com, enfun!
Current priority is -20.
Size of input file: 15,674,562,752
Allocating FILE-Buffer 14948MB ...
Counting lines ... Done in 1,398,641 clocks, 1.40 seconds.
Number of LF-ending lines: 219,648,620
Postfixing the last "line" with a LF.
Allocating Master-Buffer (Offsets+Lengths) NumberOfLFs*8*2 = 3351MB ... Aligned to 16 bytes boundary.
Assigning pairs (of pointers and lengths) to lines ... Done in 9,719,383 clocks, 9.72 seconds.
ShortestLine = 0
LongestLine = 93,680
Sorting pointers to lines with 'Strongfool' a.k.a. 'Quicksort_Magnetica_v18_BalxchonkaForte_indirect' ...
Thread #1 of 4 sorting partition size=28973203
Thread #2 of 4 sorting partition size=37169789
Thread #3 of 4 sorting partition size=80704296
Thread #4 of 4 sorting partition size=72801336
Done (just sorting) in 66 seconds.
Writing sorted lines to 'Schmekeriada.txt' ... Allocating DUMP-Buffer (for 'fwrite()') 1024MB ...
Done (just writing) in 91 seconds.
Total LPS performance: 1,262,348 Lines-Per-Second
Total BPS performance: 90,083,693 Bytes-Per-Second
	Command being timed: "./Schmekeriada_GCC_12.1.1_TetraThread.elf jawiki-20220920-pages-articles.xml"
	User time (seconds): 183.10
	System time (seconds): 15.25
	Percent of CPU this job got: 114%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:52.94
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 19789028
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 13
	Minor (reclaiming a frame) page faults: 4947006
	Voluntary context switches: 3826884
	Involuntary context switches: 750
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
The version of sort supplied with Fedora 33:
Code:
sort (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.
The sort runs:
Code:
	Command being timed: "sort -o Linuxsort jawiki-20220920-pages-articles.xml --parallel=8"
	User time (seconds): 230.75
	System time (seconds): 28.78
	Percent of CPU this job got: 133%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 3:14.04
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 20214340
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 5053202
	Voluntary context switches: 7656333
	Involuntary context switches: 3329
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
b97bab60515c4bafcaf4fa6b322b199f41e6d6d1  Schmekeriada.txt
b97bab60515c4bafcaf4fa6b322b199f41e6d6d1  Linuxsort
Out of curiosity, tried 16 threads, but it turned out to be slower than 8:
Code:
	Command being timed: "sort -o Linuxsort jawiki-20220920-pages-articles.xml --parallel=16"
	User time (seconds): 285.05
	System time (seconds): 29.03
	Percent of CPU this job got: 148%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 3:30.81
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 8162408
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 9
	Minor (reclaiming a frame) page faults: 2040616
	Voluntary context switches: 7734351
	Involuntary context switches: 19051
	Swaps: 0
	File system inputs: 24
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
[root@localhost-live Schmekeriada_2022-Oct-10]#
The full C source is at my Internet drive (along with the elf), enjoy!

This is the package that is FINAL:
Schmekeriada_2022-Oct-12.tar.7z 1.62 MB (1,700,046 bytes):
https://drive.google.com/file/d/1boB...ew?usp=sharing

Last edited by Sanmayce; 10-13-2022 at 01:21 AM.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Issue with sorting using memcmp and qsort Skyrius Programming 2 05-08-2012 05:36 PM
qsort for files? nodger Programming 4 01-12-2006 08:55 PM
qsort() on linux producing the wrong sort Adelsm Programming 17 09-27-2005 02:58 PM
In C, using qsort for dynamically allocated array ntmsz Programming 7 08-23-2005 11:33 AM
qsort in C questions trutnev Programming 4 06-10-2005 09:49 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 06:16 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration