LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 05-10-2021, 10:10 PM   #136
mimorek
Member
 
Registered: Feb 2013
Distribution: Debian (jessie)
Posts: 42

Rep: Reputation: Disabled

Quote:
Originally Posted by dogpatch View Post
Here's my version 1.31, with some fixes per Mimorek's compiler messages. This is a complete upload; you don't need previous version.

WonderRing_1.31.patch.txt
Output of make:
Code:
$ make SquareRing 
g++ -Wall WonderRing.cpp -include SquareTarget_1.3.h -o SquareRing
In file included from WonderRing.cpp:10:
DataRing.h: In member function ‘bool Ring::rInit(int)’:
DataRing.h:40:53: error: lvalue required as left operand of assignment
  if (((void*)rElement = malloc(AllocSz*sizeof(rType))) == NULL)
                                                     ^
In file included from WonderRing.cpp:44:
WRpairs.h: In function ‘int BuildPairsArray()’:
WRpairs.h:242:59: error: lvalue required as left operand of assignment
  if (((void*)gWkArray = malloc((gRingSize+1)*sizeof(rType))) == NULL) { // alloc extra element
                                                           ^
WRpairs.h:247:61: error: lvalue required as left operand of assignment
  if (((void*)gBestArray = malloc((gRingSize+1)*sizeof(rType))) == NULL) {
                                                             ^
WonderRing.cpp: In function ‘void Terminate()’:
WonderRing.cpp:179:2: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
  if (!ring0.rClockwise) printf("Counter-"); printf("Clockwise");
  ^~
WonderRing.cpp:179:45: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
  if (!ring0.rClockwise) printf("Counter-"); printf("Clockwise");
                                             ^~~~~~
make: *** [makefile:2: SquareRing] Error 1
 
1 members found this post helpful.
Old 05-10-2021, 11:14 PM   #137
dogpatch
Member
 
Registered: Nov 2005
Location: Central America
Distribution: Mepis, Android
Posts: 490

Original Poster
Blog Entries: 4

Rep: Reputation: 238Reputation: 238Reputation: 238
For claiming not to know much about C, you're teaching me some valuable lessons. Had never heard that malloc must be typed as int*. Still don't really understand why. But, anyway, 1.31 was, as mentioned, a blind fix. Have now applied your method of calling malloc. It still makes zero difference in my compiled binary executable, so is still a blind fix. Here's 1.32, still a full upload, less confusing than piecemeal fixes.

WonderRing_1.32.patch.txt
 
Old 05-11-2021, 12:26 AM   #138
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 19,859

Rep: Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715
Quote:
Originally Posted by dogpatch View Post
It still makes zero difference in my compiled binary executable, so is still a blind fix.
No, that is not a blind fix, but a fix for the human being (not for the CPU). That will ensure the code will do exactly what you wrote and there are no "strange" side effects.
 
1 members found this post helpful.
Old 05-11-2021, 05:31 AM   #139
GazL
LQ Veteran
 
Registered: May 2008
Distribution: CRUX 3.7
Posts: 6,605

Rep: Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720
Quote:
Originally Posted by dogpatch View Post
For claiming not to know much about C, you're teaching me some valuable lessons. Had never heard that malloc must be typed as int*. Still don't really understand why.
C++ is very picky about asignment between types, so the cast is necessary.

For pure C, it is not necessary, and in fact some folks used to argue that it was better to leave it off, so as to help catch a missing #include <stdlib.h>. These days, gcc will throw an "Implicit declaration" warning when you do that, so that argument likely no longer stands.
 
1 members found this post helpful.
Old 05-11-2021, 07:14 AM   #140
mimorek
Member
 
Registered: Feb 2013
Distribution: Debian (jessie)
Posts: 42

Rep: Reputation: Disabled
Quote:
Originally Posted by dogpatch View Post
Here's 1.32, still a full upload, less confusing than piecemeal fixes.

WonderRing_1.32.patch.txt
It works
Code:
$ patch -p1 <WonderRing_1.32.patch.txt 
patching file README.txt
patching file makefile
patching file DataRing.h
patching file WonderRing.cpp
patching file WRpairs.h
patching file SquareTarget_1.3.h
patching file PrimeTarget.h
patching file TriangularTarget.h
$ make SquareRing 
g++ -Wall WonderRing.cpp -include SquareTarget_1.3.h -o SquareRing
In file included from WonderRing.cpp:10:
DataRing.h: In member function ‘bool Ring::rOrder(bool)’:
DataRing.h:198:8: warning: variable ‘Hi’ set but not used [-Wunused-but-set-variable]
  rType Hi, Lo;
        ^~
WonderRing.cpp: In function ‘void Terminate()’:
WonderRing.cpp:179:2: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
  if (!ring0.rClockwise) printf("Counter-"); printf("Clockwise");
  ^~
WonderRing.cpp:179:45: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
  if (!ring0.rClockwise) printf("Counter-"); printf("Clockwise");
                                             ^~~~~~
$ ls
DataRing.h  PrimeTarget.h  SquareRing          TriangularTarget.h         WonderRing.cpp
makefile    README.txt     SquareTarget_1.3.h  WonderRing_1.32.patch.txt  WRpairs.h
$ ./SquareRing 
Assembling and optimizing pairs array. done.
Generating ring(s):
15 1 8 28 21 4 32 17 19 30 6 3 13 12 24 25 11 5 31 18 7 29 20 16 9 27 22 14 2 23 26 10 

Ring stats: size=32, seed element=15, head index=3, direction=Counter-Clockwise
1 valid ring generated
37553 total recursion cycles (18792 in final pass)

Last edited by mimorek; 05-11-2021 at 07:19 AM.
 
2 members found this post helpful.
Old 05-11-2021, 11:20 AM   #141
dogpatch
Member
 
Registered: Nov 2005
Location: Central America
Distribution: Mepis, Android
Posts: 490

Original Poster
Blog Entries: 4

Rep: Reputation: 238Reputation: 238Reputation: 238
Quote:
Originally Posted by mimorek View Post
It works
Thanks for the confirmation, and thanks for flushing out some poorly written code. Will try to address those remaining warning messages sometime as well.

Quote:
Originally Posted by GazL View Post
C++ is very picky about asignment between types, so the cast is necessary.

For pure C, it is not necessary, and in fact some folks used to argue that it was better to leave it off, so as to help catch a missing #include <stdlib.h>. These days, gcc will throw an "Implicit declaration" warning when you do that, so that argument likely no longer stands.
Thanks. I ought to learn these kinds of differences between C and C++, and between various standards and how they've changed, so as to produce more portable code.

Thanks also for the c-faq.com link. Will bookmark that site.
 
Old 05-12-2021, 09:22 AM   #142
GazL
LQ Veteran
 
Registered: May 2008
Distribution: CRUX 3.7
Posts: 6,605

Rep: Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720
Ok, here we go, I finally caught my white-whale.

Code:
$ time ./circle -v -a -t4 -o /tmp/circle.out 44
Candidates Table:
1: 35 24 15 8 3
2: 34 23 14 7
3: 33 22 13 6 1
4: 32 21 12 5
5: 44 31 20 11 4
6: 43 30 19 10 3
7: 42 29 18 9 2
8: 41 28 17 1
9: 40 27 16 7
10: 39 26 15 6
11: 38 25 14 5
12: 37 24 13 4
13: 36 23 12 3
14: 35 22 11 2
15: 34 21 10 1
16: 33 20 9
17: 32 19 8
18: 31 7
19: 30 17 6
20: 44 29 16 5
21: 43 28 15 4
22: 42 27 14 3
23: 41 26 13 2
24: 40 25 12 1
25: 39 24 11
26: 38 23 10
27: 37 22 9
28: 36 21 8
29: 35 20 7
30: 34 19 6
31: 33 18 5
32: 17 4
33: 31 16 3
34: 30 15 2
35: 29 14 1
36: 28 13
37: 44 27 12
38: 43 26 11
39: 42 25 10
40: 41 24 9
41: 40 23 8
42: 39 22 7
43: 38 21 6
44: 37 20 5
Using Seed value: 1
10 jobs submitted to the solver queue:
35 1 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35 1 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35 1 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35 1 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 1 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 1 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 1 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Calculating all sequences for circle of squares: 1 to 44,
4 solver threads will be used.  Please wait...

10091 results returned.
real	0m26.851s
user	1m39.975s
sys	0m0.010s
$
My laptop is not that powerful, so those with more beefy kit will likely get better results from the multithreading.


Pthreads adds about 6% overhead when run as a single thread over how it ran pre-threading, I think this is mostly because it is checking on whether the thread has been cancelled (e.g. when you looking for only the first result some threads will need to be cancelled) each time through find_next(). I could have checked for the cancellation point less often, but that would introduce latency on the exit after the result is found. To cut down on the overhead, I'd probably just sacrifice the "find one result only option", but it was interesting trying to figure out how one is supposed to abort the solver threads.


updates:
4.3 - fix a small bug with how the number of threads was passed from main()
4.4 fix potentially memory leak on seen_values_table when a thread is cancelled
4.6 (final): reworked candidate bucket code into something a little nicer. Style clean-up, Added some comments.
4.7 (really really final): Default seed value now based on number of candidates, which allows more threads to be used.

Program source is attached. Extract with:
patch -p0 < circle-v4.7.patch.txt
Attached Files
File Type: txt circle-4.7.patch.txt (24.9 KB, 8 views)

Last edited by GazL; 05-16-2021 at 11:30 AM.
 
1 members found this post helpful.
Old 05-12-2021, 09:56 AM   #143
igadoter
Senior Member
 
Registered: Sep 2006
Location: wroclaw, poland
Distribution: many, primary Slackware
Posts: 2,717
Blog Entries: 1

Rep: Reputation: 625Reputation: 625Reputation: 625Reputation: 625Reputation: 625Reputation: 625
Great thanks @GazL - think it is time for me to do something. I plan to look for transforms of sequeneces. There are plenty ways in math to transform one sequence into another. Most known is probably Fourier transform. But there are others. I am hoping to find appropriate transformation - two-way - which applied to amazing circle give sequence easier to manage. Then procedure is to generate such sequence and transform it into amazing circle. And of course the question how many there is different sequences? However to answer this question in proper way one needs the first to tell what are different sequences. One possible way is through permutation group. Say every amazing circle is just permutation of sequence 1,2,3,4,...32. So instead of providing sequences one can provide permutations - how to rearrange 1,2,...32 to obtain amazing circle. Next step is to answer the question about structure of family all such permutations. Say simple question is do they form a group? In mathematical meaning of this word as in group theory. So one find this group - one find all amazing circles. Math here is on the level of first year study in pure math in University - so it is rather advanced. But it is really good exercise for first year math students. I know this - I am Ph. D. in math and was math lecturer at University in city I live in. And in truth I am much higher than Ph. D. - just I don't care much about titles. Knowledge matters not title.

Last edited by igadoter; 05-12-2021 at 10:22 AM.
 
1 members found this post helpful.
Old 05-12-2021, 10:19 AM   #144
GazL
LQ Veteran
 
Registered: May 2008
Distribution: CRUX 3.7
Posts: 6,605

Rep: Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720
I'm not much of a mathematician. I only have GCE O'levels in Math and "Additional Math: Pure and Stat.", and those were taken the best part of 30+ years ago, and most of it forgotten by now. I'm afraid any such discussion is going to go way over my head.
 
Old 05-13-2021, 03:51 PM   #145
dogpatch
Member
 
Registered: Nov 2005
Location: Central America
Distribution: Mepis, Android
Posts: 490

Original Poster
Blog Entries: 4

Rep: Reputation: 238Reputation: 238Reputation: 238
Downloaded and ran your latest code, and mine, for a 44 element ring. As expected, on my 933mHz Dell, got much slower results than what you posted, but your code about 3 times faster than mine. I think it used to be about twice as fast.

Quote:
Originally Posted by GazL View Post
Pthreads adds about 6% overhead when run as a single thread over how it ran pre-threading
Assume this overhead is more than compensated for when running multiple parallel thrads. When you run with, e.g. 4 parallel threads per your test run, how much faster is the pthread version over the sequential version?
 
Old 05-13-2021, 06:09 PM   #146
GazL
LQ Veteran
 
Registered: May 2008
Distribution: CRUX 3.7
Posts: 6,605

Rep: Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720
It gives a significant advantage to elapsed time, but comes at a cost in overhead.

The following was running in 3m 10sec elapsed on my v3 (pre-multi threading version.)

New multi-threaded code, but limited to a single thread:
Code:
$ time ./circle -v -a -o /dev/null 45
Calculating all sequences for circle of squares: 1 to 45,
1 solver threads will be used.  Please wait...

21931 results returned.

real	3m17.620s
user	3m17.616s
sys	0m0.002s
So, it costs me an additional 7 seconds of both elapsed and usertime (which are more or less equal).

And now on all 4 threads of my laptop
Code:
$ time ./circle -t 4 -v -a -o /dev/null 45
Calculating all sequences for circle of squares: 1 to 45,
4 solver threads will be used.  Please wait...

21931 results returned.

real	1m18.460s
user	4m45.431s
sys	0m0.347s
Runs in around 40% of the elapsed time, but 'user' is up about 40%. My laptop is 2 cores + 2 hyperthreads, which may explain why I don't get the full 1/4 of elapsed time, at least in part.


But what's really interesting is that when searching for the first result only, over-committing on the threads has a really profound effect, as one of them is likely to find a solution quickly and cancel the other threads, actually saving cpu time:
Code:
$ time ./circle 56
48 1 35 46 54 27 37 44 56 25 39 42 22 14 50 31 18 7 29 52 12 24 40 41 23 26 55 9 16 20 5 11 38 43 21 15 10 6 19 17 8 28 53 47 2 34 30 51 49 32 4 45 36 13 3 33

real	0m11.940s
user	0m11.938s
sys	0m0.001s
$ time ./circle -t32 56
48 1 15 49 51 30 34 47 53 28 36 45 55 26 38 43 21 4 32 17 19 6 10 54 46 35 29 52 12 37 44 56 8 41 40 24 25 39 42 22 27 9 16 20 5 11 14 50 31 18 7 2 23 13 3 33

real	0m0.003s
user	0m0.003s
sys	0m0.003s
$

My find_next() function currently checks for thread cancellation with pthreads_testcancel() every pass, which is likely having a bad effect on overhead here, but it means the program terminates quickly when a result if found. I could probably avoid that by just calling exit() on finding a result, but I wanted to learn how the pthreads_cancel() could be used to terminate the threads cleanly.

This is my first foray into pthreads. It's been quite educational.

update:
ok, so, I just tried ripping out the pthread_cancel() and replacing it with a call to exit()
Code:
Calculating all sequences for circle of squares: 1 to 45,
4 solver threads will be used.  Please wait...

21931 results returned.

real	1m13.281s
user	4m28.318s
sys	0m0.059s
Saved: 5 seconds real, and 17 seconds user. So, it seems threading hits usertime significantly regardless of the calls to pthread_testcancel().

That's all assuming that the usertime values being returned by time are reliable of course.

update 2:

I just tried using 2 threads, and usertime is back to what one would expect.
Code:
Calculating all sequences for circle of squares: 1 to 45,
2 solver threads will be used.  Please wait...

21931 results returned.

real	1m40.711s
user	3m18.138s
sys	0m0.006s
Which now makes me suspect the fake hyperthreaded logical cpus are what's inflating the user time, also, I notice it's only 20 seconds slower than the 4 thread run I did above, which just goes to show how poor hyperthreads really are.

Last edited by GazL; 05-13-2021 at 07:53 PM.
 
Old 05-13-2021, 07:59 PM   #147
dogpatch
Member
 
Registered: Nov 2005
Location: Central America
Distribution: Mepis, Android
Posts: 490

Original Poster
Blog Entries: 4

Rep: Reputation: 238Reputation: 238Reputation: 238
So, these results are dependent upon the machine architecture?
 
Old 05-14-2021, 01:19 AM   #148
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 19,859

Rep: Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715Reputation: 6715
Quote:
Originally Posted by GazL View Post
Which now makes me suspect the fake hyperthreaded logical cpus are what's inflating the user time, also, I notice it's only 20 seconds slower than the 4 thread run I did above, which just goes to show how poor hyperthreads really are.
What I think you cannot check if that was executed on a single core with hyperthreads or on different cores. And you cannot rule it too. But you know your hardware. You can also try to run your app several times - and you will get different results. The result will/may depend on the load of your host (so if libreoffice/youtube/flight sim or whatever is running). Result also may depend on the cache/swap. Anyway, that timing is not that accurate.
 
Old 05-14-2021, 06:26 AM   #149
GazL
LQ Veteran
 
Registered: May 2008
Distribution: CRUX 3.7
Posts: 6,605

Rep: Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720
Passing 'nosmt' as a kernel parameter at boot completely disables the use of the hyperthreaded logical cpus. So we can take them out of the picture easily. Also, all my timings were done on a completely idle system.

Based on multiple observations of finding all results for the 1-45 circle of squares, there is some variation in usertime, but my program consistently uses 3m17.5s +/- 1 second of usertime, regardless of the number of threads I throw at it: even when 5x over-committed (10 threads on a 2 core machine).

However, when hyperthreads are enabled, there's an additional 1m 28sec recorded against usertime, which is well outside the range one could ascribe to inaccuracy of timing.

Therefore, the conclusion I draw is that hyperthread logical cpus take longer to run the same workload, thus inflating user time statistic

So, that said, and ruling out the impact of hyperthreaded cpus, the multi-threaded version of the code has a ~4% overhead in usertime compared to my prior single-threaded versions.
 
Old 05-14-2021, 06:33 AM   #150
GazL
LQ Veteran
 
Registered: May 2008
Distribution: CRUX 3.7
Posts: 6,605

Rep: Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720Reputation: 4720
Quote:
Originally Posted by dogpatch View Post
So, these results are dependent upon the machine architecture?
Yes, for finding all results, the workload will always be the same, but the elapsed time can be divided by the number of cores your system has (give or take a little).

usertime figures will vary machine to machine depending on the performance of the CPU.

Last edited by GazL; 05-14-2021 at 06:34 AM.
 
  


Reply

Tags
circle, perfect square, programming challenge, ring, ring data type


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
damnit!!! i cant see what im typing.. fonts are squares iLLuSionZ Linux - Newbie 22 11-18-2003 03:36 PM
characters as squares in mozilla icyfire Linux - Software 4 09-18-2003 04:22 PM
white squares appear on screen saver? bongo.hickup Linux - Hardware 0 06-13-2003 03:51 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 04:19 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration