[SOLVED] 'Circle of Squares' programming challenge
$ make SquareRing
g++ -Wall WonderRing.cpp -include SquareTarget_1.3.h -o SquareRing
In file included from WonderRing.cpp:10:
DataRing.h: In member function ‘bool Ring::rInit(int)’:
DataRing.h:40:53: error: lvalue required as left operand of assignment
if (((void*)rElement = malloc(AllocSz*sizeof(rType))) == NULL)
^
In file included from WonderRing.cpp:44:
WRpairs.h: In function ‘int BuildPairsArray()’:
WRpairs.h:242:59: error: lvalue required as left operand of assignment
if (((void*)gWkArray = malloc((gRingSize+1)*sizeof(rType))) == NULL) { // alloc extra element
^
WRpairs.h:247:61: error: lvalue required as left operand of assignment
if (((void*)gBestArray = malloc((gRingSize+1)*sizeof(rType))) == NULL) {
^
WonderRing.cpp: In function ‘void Terminate()’:
WonderRing.cpp:179:2: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
if (!ring0.rClockwise) printf("Counter-"); printf("Clockwise");
^~
WonderRing.cpp:179:45: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
if (!ring0.rClockwise) printf("Counter-"); printf("Clockwise");
^~~~~~
make: *** [makefile:2: SquareRing] Error 1
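The fix is to put the cast on malloc's return value rather than on the pointer being assigned; a cast expression like (void*)rElement is not an lvalue, so it can't sit on the left of '='. Here's a minimal stand-alone illustration of both that and the -Wmisleading-indentation warning; the declarations are made-up stand-ins, not the actual WonderRing/DataRing code.
Code:
/* Sketch only: stand-in declarations, not the real WonderRing code. */
#include <stdio.h>
#include <stdlib.h>

typedef int rType;            /* hypothetical element type */
static rType *rElement;       /* hypothetical ring storage */

static int rInit(int AllocSz)
{
    /* Cast the right-hand side, not the left: assign malloc's result
       to the pointer, then compare against NULL. */
    if ((rElement = (rType *)malloc(AllocSz * sizeof(rType))) == NULL)
        return 0;             /* allocation failed */
    return 1;
}

static void show_direction(int rClockwise)
{
    /* Braces make it clear that only the first printf is conditional;
       the second one always runs. */
    if (!rClockwise) {
        printf("Counter-");
    }
    printf("Clockwise\n");
}

int main(void)
{
    if (rInit(32))
        show_direction(0);
    free(rElement);
    return 0;
}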
For claiming not to know much about C, you're teaching me some valuable lessons. I had never heard that malloc's return value must be cast to the pointer's type. Still don't really understand why. But, anyway, 1.31 was, as mentioned, a blind fix. Have now applied your method of calling malloc. It still makes zero difference in my compiled binary executable, so is still a blind fix. Here's 1.32, still a full upload, less confusing than piecemeal fixes.
It still makes zero difference in my compiled binary executable, so is still a blind fix.
No, that is not a blind fix, but a fix for the human being (not for the CPU). It ensures the code does exactly what you wrote and that there are no "strange" side effects.
For claiming not to know much about C, you're teaching me some valuable lessons. I had never heard that malloc's return value must be cast to the pointer's type. Still don't really understand why.
C++ is very picky about assignment between types, so the cast is necessary.
For pure C, it is not necessary, and in fact some folks used to argue that it was better to leave it off, so as to help catch a missing #include <stdlib.h>. These days, gcc will throw an "Implicit declaration" warning when you do that, so that argument likely no longer stands.
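A tiny illustration of the difference, with throwaway code rather than anything from this thread: the first allocation compiles as C but not as C++, because C++ has no implicit conversion from void* to other object pointer types.
Code:
#include <stdlib.h>

int main(void)
{
    /* Fine in C (void* converts implicitly to int*), but an error when
       compiled as C++: "invalid conversion from 'void*' to 'int*'". */
    int *p = malloc(10 * sizeof *p);

    /* Accepted by both a C compiler and a C++ compiler. */
    int *q = (int *)malloc(10 * sizeof *q);

    free(p);
    free(q);
    return 0;
}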
Thanks for the confirmation, and thanks for flushing out some poorly written code. Will try to address those remaining warning messages sometime as well.
Quote:
Originally Posted by GazL
C++ is very picky about assignment between types, so the cast is necessary.
For pure C, it is not necessary, and in fact some folks used to argue that it was better to leave it off, so as to help catch a missing #include <stdlib.h>. These days, gcc will throw an "Implicit declaration" warning when you do that, so that argument likely no longer stands.
Thanks. I ought to learn these kinds of differences between C and C++, and between various standards and how they've changed, so as to produce more portable code.
Thanks also for the c-faq.com link. Will bookmark that site.
My laptop is not that powerful, so those with more beefy kit will likely get better results from the multithreading.
Pthreads adds about 6% overhead when run as a single thread compared with how it ran pre-threading. I think this is mostly because it checks whether the thread has been cancelled (e.g. when you're looking for only the first result, some threads will need to be cancelled) each time through find_next(). I could have checked for the cancellation point less often, but that would introduce latency on the exit after the result is found. To cut down on the overhead, I'd probably just sacrifice the "find one result only" option, but it was interesting trying to figure out how one is supposed to abort the solver threads.
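In outline, the "check less often" idea would look something like this (a simplified sketch with invented names, not the code in the attached patch): a counter throttles pthread_testcancel() so the cancellation point is only hit every few thousand passes through the loop.
Code:
/* Sketch: throttled cancellation checks in a hypothetical solver loop.
   Build with: gcc -Wall -pthread throttle.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *solver(void *arg)
{
    unsigned long iter = 0;
    (void)arg;

    for (;;) {
        /* ... one step of the search would go here ... */

        if ((++iter & 0xFFFu) == 0)   /* every 4096 iterations       */
            pthread_testcancel();     /* explicit cancellation point */
    }
    return NULL;                      /* never reached */
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, solver, NULL);
    sleep(1);                         /* let it spin for a moment    */
    pthread_cancel(t);                /* honoured at the next check  */
    pthread_join(t, NULL);
    puts("solver cancelled");
    return 0;
}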
updates:
4.3 - fix a small bug with how the number of threads was passed from main().
4.4 - fix a potential memory leak on seen_values_table when a thread is cancelled.
4.6 (final) - reworked the candidate bucket code into something a little nicer; style clean-up; added some comments.
4.7 (really really final) - default seed value now based on the number of candidates, which allows more threads to be used.
Program source is attached. Extract with:
patch -p0 < circle-v4.7.patch.txt
Great, thanks @GazL - I think it is time for me to do something. I plan to look for transforms of sequences. There are plenty of ways in math to transform one sequence into another; the best known is probably the Fourier transform, but there are others. I am hoping to find an appropriate two-way transformation which, applied to an amazing circle, gives a sequence that is easier to manage. The procedure would then be to generate such a sequence and transform it back into an amazing circle.

And of course there is the question of how many different sequences there are. To answer it properly, one first has to say what counts as a different sequence. One possible way is through the permutation group: every amazing circle is just a permutation of the sequence 1, 2, 3, ..., 32, so instead of providing sequences one can provide permutations - how to rearrange 1, 2, ..., 32 to obtain an amazing circle. The next step is to ask about the structure of the family of all such permutations. A simple question: do they form a group, in the group-theory sense of the word? Find that group and you have found all the amazing circles.

The math here is at the level of first-year pure math at university, so it is rather advanced, but it is a really good exercise for first-year math students. I know this - I have a Ph.D. in math and was a math lecturer at the university in the city where I live. In truth I am well beyond Ph.D. level; I just don't care much about titles. Knowledge matters, not titles.
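To make the permutation view concrete, here is a tiny stand-alone checker (just a sketch, nothing to do with the solvers attached in this thread). It treats an arrangement of 1..32 as a permutation, and verifies both that it really is a permutation and that every adjacent pair, including the wrap-around pair, sums to a perfect square. The hard-coded arrangement is simply one example that passes the check.
Code:
/* Sketch: verify one circle-of-squares arrangement of 1..32.
   Build with: gcc -Wall checker.c -lm */
#include <math.h>
#include <stdio.h>

#define N 32

/* One arrangement of 1..32 whose neighbouring sums, including the
   wrap-around 15+1, are all perfect squares. */
static const int circle[N] = {
     1,  8, 28, 21,  4, 32, 17, 19, 30,  6,  3, 13, 12, 24, 25, 11,
     5, 31, 18,  7, 29, 20, 16,  9, 27, 22, 14,  2, 23, 26, 10, 15
};

static int is_square(int n)
{
    int r = (int)(sqrt((double)n) + 0.5);
    return r * r == n;
}

int main(void)
{
    int seen[N + 1] = {0};
    int i, ok = 1;

    for (i = 0; i < N; i++) {
        int a = circle[i], b = circle[(i + 1) % N];

        if (a < 1 || a > N || seen[a]++) {   /* must be a permutation of 1..N */
            printf("value %d repeated or out of range\n", a);
            ok = 0;
        }
        if (!is_square(a + b)) {             /* adjacency condition */
            printf("%d + %d = %d is not a square\n", a, b, a + b);
            ok = 0;
        }
    }
    puts(ok ? "valid circle of squares" : "not a valid circle");
    return ok ? 0 : 1;
}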
I'm not much of a mathematician. I only have GCE O'levels in Math and "Additional Math: Pure and Stat.", and those were taken the best part of 30+ years ago, and most of it forgotten by now. I'm afraid any such discussion is going to go way over my head.
I downloaded and ran your latest code, and mine, for a 44-element ring. As expected, on my 933MHz Dell I got much slower results than what you posted, but your code was about 3 times faster than mine. I think it used to be about twice as fast.
Quote:
Originally Posted by GazL
Pthreads adds about 6% overhead when run as a single thread over how it ran pre-threading
I assume this overhead is more than compensated for when running multiple parallel threads. When you run with, e.g., 4 parallel threads as in your test run, how much faster is the pthread version than the sequential version?
It gives a significant advantage to elapsed time, but comes at a cost in overhead.
The following ran in 3m 10s elapsed on my v3 (the pre-multithreading version).
New multi-threaded code, but limited to a single thread:
Code:
$ time ./circle -v -a -o /dev/null 45
Calculating all sequences for circle of squares: 1 to 45,
1 solver threads will be used. Please wait...
21931 results returned.
real 3m17.620s
user 3m17.616s
sys 0m0.002s
So, it costs me an additional 7 seconds of both elapsed and usertime (which are more or less equal).
And now on all 4 threads of my laptop
Code:
$ time ./circle -t 4 -v -a -o /dev/null 45
Calculating all sequences for circle of squares: 1 to 45,
4 solver threads will be used. Please wait...
21931 results returned.
real 1m18.460s
user 4m45.431s
sys 0m0.347s
It runs in around 40% of the elapsed time (1m18s vs 3m17s, roughly a 2.5x speed-up), but 'user' is up by about 44% (4m45s vs 3m17s). My laptop is 2 cores + 2 hyperthreads, which may explain, at least in part, why I don't get the full 1/4 of elapsed time.
But what's really interesting is that when searching for the first result only, over-committing on the threads has a really profound effect, as one of them is likely to find a solution quickly and cancel the other threads, actually saving cpu time.
My find_next() function currently checks for thread cancellation with pthread_testcancel() every pass, which is likely having a bad effect on overhead here, but it means the program terminates quickly when a result is found. I could probably avoid that by just calling exit() on finding a result, but I wanted to learn how pthread_cancel() could be used to terminate the threads cleanly.
This is my first foray into pthreads. It's been quite educational.
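In outline, the clean-cancellation pattern looks something like the sketch below (simplified, invented names, not the code from the patch): a handler registered with pthread_cleanup_push() frees the thread's private table whether the thread exits normally or is cancelled at a pthread_testcancel() point, which is the kind of leak the 4.4 note above was about.
Code:
/* Sketch: freeing per-thread memory even when the thread is cancelled.
   Build with: gcc -Wall -pthread cancel_clean.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void free_table(void *p)
{
    free(p);                           /* runs on cancel or normal exit */
}

static void *solver(void *arg)
{
    int *table = malloc(1000 * sizeof *table);   /* per-thread scratch */
    (void)arg;

    if (table == NULL)
        return NULL;

    pthread_cleanup_push(free_table, table);

    for (;;) {
        /* ... search work would go here ... */
        pthread_testcancel();          /* deferred cancellation point */
    }

    pthread_cleanup_pop(1);            /* 1 => run handler on normal exit too */
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, solver, NULL);
    sleep(1);
    pthread_cancel(t);                 /* handler frees the table on the way out */
    pthread_join(t, NULL);
    puts("thread cancelled cleanly");
    return 0;
}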
update:
OK, so I just tried ripping out the pthread_cancel() and replacing it with a call to exit():
Code:
Calculating all sequences for circle of squares: 1 to 45,
4 solver threads will be used. Please wait...
21931 results returned.
real 1m13.281s
user 4m28.318s
sys 0m0.059s
Saved: 5 seconds real, and 17 seconds user. So, it seems threading hits usertime significantly regardless of the calls to pthread_testcancel().
That's all assuming that the usertime values being returned by time are reliable of course.
update 2:
I just tried using 2 threads, and usertime is back to what one would expect.
Code:
Calculating all sequences for circle of squares: 1 to 45,
2 solver threads will be used. Please wait...
21931 results returned.
real 1m40.711s
user 3m18.138s
sys 0m0.006s
Which now makes me suspect the 'fake' hyperthreaded logical cpus are what's inflating the user time. I also notice it's only about 20 seconds slower than the 4-thread run I did above, which just goes to show how poor hyperthreads really are.
Quote:
Originally Posted by GazL
Which now makes me suspect the 'fake' hyperthreaded logical cpus are what's inflating the user time. I also notice it's only about 20 seconds slower than the 4-thread run I did above, which just goes to show how poor hyperthreads really are.
What I think is that you cannot check whether that ran on a single core with hyperthreads or on different cores. And you cannot rule it out, either. But you know your hardware. You can also try running your app several times - you will get different results. The result will/may depend on the load on your host (i.e. whether libreoffice/youtube/a flight sim or whatever is running). The result may also depend on cache/swap. Anyway, that timing is not all that accurate.
Passing 'nosmt' as a kernel parameter at boot completely disables the use of the hyperthreaded logical cpus. So we can take them out of the picture easily. Also, all my timings were done on a completely idle system.
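If you want to confirm what the kernel thinks, reasonably recent kernels expose the SMT state under sysfs; a quick check (just a sketch) looks like this:
Code:
/* Sketch: read the kernel's SMT status (available on reasonably
   recent kernels; prints a fallback message otherwise). */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/devices/system/cpu/smt/active", "r");
    int active = -1;

    if (f != NULL) {
        if (fscanf(f, "%d", &active) != 1)
            active = -1;
        fclose(f);
    }

    if (active < 0)
        puts("SMT status not available on this kernel");
    else
        printf("SMT (hyperthreading) is %s\n", active ? "active" : "inactive");
    return 0;
}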
Based on multiple observations of finding all results for the 1-45 circle of squares, there is some variation in usertime, but my program consistently uses 3m17.5s +/- 1 second of usertime, regardless of the number of threads I throw at it: even when 5x over-committed (10 threads on a 2 core machine).
However, when hyperthreads are enabled, there's an additional 1m 28sec recorded against usertime, which is well outside the range one could ascribe to inaccuracy of timing.
Therefore, the conclusion I draw is that the hyperthreaded logical cpus take longer to run the same workload, thus inflating the usertime statistic.
So, that said, and ruling out the impact of hyperthreaded cpus, the multi-threaded version of the code has a ~4% overhead in usertime compared to my prior single-threaded versions.
So, these results are dependent upon the machine architecture?
Yes, for finding all results, the workload will always be the same, but the elapsed time can be divided by the number of cores your system has (give or take a little).
usertime figures will vary machine to machine depending on the performance of the CPU.
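For example, one portable way to find out how many processors are online and use that as a default thread count (a sketch, not necessarily how the attached program chooses its default):
Code:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Number of processors currently online (cores plus any SMT siblings). */
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

    if (ncpus < 1)
        ncpus = 1;                    /* fall back to a single solver thread */

    printf("defaulting to %ld solver thread(s)\n", ncpus);
    return 0;
}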