ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
The following is a question that arose from a study of discrete math which has become a pleasant diversion. The original question was about the number of unique combinations of objects, of specified length, that could be formed from some set.
Being computer users, we began to think in terms of programs and algorithms that would generate such lists of combinations for us. PHP is an excellent rapid prototyping language, hence the solution below is written in PHP, but the language is really not important.
We can state the math and coding problem like this:
Code:
* Given some set with n members, generate all subsets with k members
* Code for best combined execution speed and elegance of code
* Language of your choice (speed comparison is among same language solutions only)
* Execution time does not include any output or use of resultant combinations
Relying on countability of finite sets to transition from "objects" to integers, we state our universal set to be some range of integers from zero to n-1, and extend that to represent any set of objects.
We have gone through quite a few alternative approaches and post the following code as our best current PHP solution in terms of simplicity consistent with best execution speed.
The initial solution was a much simpler (code wise) recursive approach, but that turns out to be the worst by far in terms of execution speed (with PHP), somewhat surprisingly. (This prompted us to look for faster algorithms/code methods.)
We would be interested to see alternate algorithms and alternate code solutions, including solutions using other programming languages, from any one interested.
Code:
#!/usr/bin/php
<?php
/**
* Specification:
*
* 1: Generate all k-combinations of integers in range 0 to n-1 (inclusive) for
* some k and n where 0 < k <= n
* 2: Make execution time as low as possible.
* 3: Make code as elegant as possible.
* 4: The choice of programming language is yours.
* 5: Use or output of combinations not included in determining execution speed
*
* The number of combinations is given by the function
*
* n!
* C(n,k) = --------
* k!(n-k)!
*
* Below is my solution in PHP - JASA 10/17/15.
*/
/*** Runtime confguration ***/
$n=36; // Numbers to use in combination (0 to $n-1 inclusive).
$k=8; // Combination length (0 < k <= n).
/*** End Runtime confguration ***/
// Newline.
define('NL',chr(10));
function fact($n)
{
/** Returns factorial of $n (n!). */
for($f=1;$n>1;--$n)
$f*=$n;
return $f;
}
$e=(int)(fact($n)/
(fact($k)*fact($n-$k))); // Expected number of combinations.
$x=0; // Counter.
$c=$m=array(); // $c=combination array. $m=max values of elements
// ($n-$k,$n-($k-1),$n-($k-2),...)
// Set $m and $c.
for($i=0;$i<$k;++$i)
$m[$i]=$n-($k-($c[$i]=$i));
// Last value of combination.
$v=&$c[$k-1];
// Special case where k=1 we go straight to solution set
if($k==1)
for(;$v<$n;++$v,++$x){
//Comment out next line to suppress output of all combinations
//echo join(' ',$c),NL;
}
else
// Start combinations (k>1).
while(1){
// Change last number of combination.
for(;$v<$n;++$v,++$x){
//Comment out next line to suppress output of all combinations
//echo join(' ',$c),NL;
}
// Adjust earlier values.
for($i=$k-2;++$c[$i]>$m[$i]&&--$i>-1;)
;
// At end of combinations?
if($i<0)
break;
// Adjust later values (after $i) for next loop.
while(++$i<$k)
$c[$i]=$c[$i-1]+1;
}
// For $n=36 and $k=8 this should be 30260340.
echo 'Combinations: ',$x,NL;
// Correct number of combinations?
if($x!==$e)
echo 'Wrong number of combinations (expected ',$e,', got ',$x,')!',NL;
?>
*** Some added comments:
"Combination" is used here in the strict sense - an unordered collection - not to be confused with a permutation.
I have an awk version in mind, too - but nothing to show, yet... and a relational DB approach would support direct implementation using set notation... lots of ideas to explore.
Last edited by astrogeek; 10-18-2015 at 01:38 AM.
Reason: typos, small wording improvements
Very nice! I have at various times enjoyed solving Project Euler problems.
I cobbled the following C++ program together from some Euler solutions that I have. It executes in <200 milliseconds on my machine.
Very nice indeed! I have thought about doing a c++ version but just had not gotten around to it! Well done!
*** Added comments after having time to look more closely at your code...
Your recursive permute(...) function is essentially what we had originally implemented in PHP, using arrays as you use vectors. It was surprisingly s-l-o-w...
I initially thought we were doing something incorrectly and set about trying tp improve the performance, with little effect. It turned out to be the recursion, not the array operations that were the bottleneck. Similar but non-recursive implementations were faster by a factor of 2-4 times.
Your vector / permute() function implementation meets my definition of elegant!
There is a huge potential for optimization here, but the elegance will drop very quickly. So, can you clarify ?
Yea, I thought the word elegance might be kind of subjective.
What we had in mind was readability and neatness mostly, I think. Nothing too obscure or incomprehensible!
(Our PHP example is over commented for this post, so verbose comments are not an inelegant factor)
But as coding styles can be quite different, submit your optimizations and we won't complain as long as we can make sense of it - so let's see what you have in mind.
Well, the entire problem is just that equation. The equation has many factorials, so factorials need to be done quickly. Also note that the chance of overflowing a type is high. Either you're going to use an optimized library that handles this, or you're going to make a table of factorials and use that. If you really have to calculate factorials, you can use threads.
Well, the entire problem is just that equation. The equation has many factorials, so factorials need to be done quickly...
Actually no, the factorial calculation is only used to find the number of combinations for a given n and k - but that is not part of the stated problem. We added that code before posting here only as a convenience check so anyone writing code could know how to test if theirs (or ours) actually found all possible combinations as determined by actual count.
Code:
* Given some set with n members, generate all subsets with k members
You can leave out the factorial bit of code if you want - the actual problem space is to generate all possible combinations of given length, not to calculate how many there are.
*** Also, making a table of potential combinations and using them for lookup had occurred to us but those would be considered inelegant as a solution. We would want to see the code that generated the tables.
Speaking of slow, last night I tried the following program using 'next_permutation' from the C++ standard library. Compiled without optimization it takes 21.3 sec. to process. This morning I tried compiling it with '-O3' and the result is a more acceptable 1.3 sec. but still slower than the program posted above. Maybe I'm not holding my mouth right . . .
Code:
#include <iostream>
#include <algorithm>
#include <vector>
#include <ctime>
#include <chrono>
void print(const std::vector<bool> &v, int n) {
for (int i = 0; i < n; ++i) {
if (v[i]) {
std::cout << (i+1) << " ";
}
}
std::cout << "\n";
}
int main() {
auto start_time = std::chrono::high_resolution_clock::now();
int n = 36;
int k = 8;
std::vector<bool> v(n);
std::fill(v.begin() + n - k, v.end(), true);
int count = 0;
do {
//print(v, n);
++count;
} while (std::next_permutation(v.begin(), v.end()));
std::cout << count << std::endl;
auto end_time = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);
std::cout << "Processing time= "<< double(elapsed.count()) <<" milliseconds"<< std::endl;
return 0;
}
An aside: Is it not possible to thank someone on the first post of a thread?
As an alternative, here's a somewhat naive, not-very-elegant, non-recursive solution:
Code:
#include <iostream>
#include <vector>
#include <stdio.h>
int comb(unsigned n, unsigned k) {
// check arguments
if (n < k || k < 1) return -1;
// create data
std::vector <unsigned> items;
items.reserve(k);
for (unsigned i = 0; i < k; i++) { items.push_back(i); }
// start with the last of the k items
unsigned which = k - 1;
// get combinations
for(;;) {
// print current
//for (unsigned i = 0; i < k; i++) { std::cout << items[i] << " "; }
//std::cout << std::endl;
// final state
if (items[0] == n - k) { break; }
while( (items[which] == n - 1
|| (which != k - 1
&& items[which + 1] == items[which] + 1))
) { which--; }
items[which]++;
for (unsigned i = which + 1; i < k; i++)
{ items[i] = items[which]+i-which; }
which=k-1;
}
return 0;
}
int main(int argc, char *argv[]) {
unsigned n=36,k=8;
comb(n,k);
return 0;
}
Sorry for slow responses, my new "normal" I think!
Quote:
Originally Posted by norobro
Speaking of slow, last night I tried the following program using 'next_permutation' from the C++ standard library. Compiled without optimization it takes 21.3 sec. to process. This morning I tried compiling it with '-O3' and the result is a more acceptable 1.3 sec. but still slower than the program posted above. Maybe I'm not holding my mouth right . . .
An aside: Is it not possible to thank someone on the first post of a thread?
Thanks for the variation.
On my single CPU 32 bit laptop I get 38.8 seconds without optimization, 3.6 seconds with -O2 and 3.9 seconds with -O3.
It is interesting that it is so much slower than the first one you posted, which comes in at 3.76s not optimization*, 0.85 with -O2 and 0.51s with -O3.
*Note: I had to make the following change to get correct possible value with no optimizations:
Code:
#include <cmath>
...
int possible = ceil(factorial(n) / (factorial(k)*factorial(n-k)));
It is a delight to read your code by the way!
Aside: I think it is not possible on first post for helpful clicks, but you can use the little scale icon in lower left of post near OS icon.
This will not win any speed contest compared to C++, but it occurred to me that the desired result can be most easily expressed in set notation, and that is the world of relational databases and SQL. So I have worked up a relational / SQL implementation using MariaDB (MySQL)...
First express the problem in simple set notation:
Code:
Given a set B with cardinality n
and a constant k, 0 < k <= n
Find all sets A {A is proper subset of B, |A| = k}
So first we need to construct our set B as a simple relation (table) with one column (which we will call 'id') of integer type and insert n rows of sequential integers starting with 1. (We will name the table combo and use 1 based integers instead of zero based as in other examples because it simplifies inserting the values).
I include here a simple SQL script to do this for you. You only need to do this once, or to change the number of rows (cardinality of B).
Code:
Save SQL below to a file, mkset.sql then from mysql client with database seleted:
\. mkset.sql
............................................................
/* Create and populate universal set (table)
Set the number of rows (cardinality of B) in the line...
CALL mkset(rows);
*/
DELIMITER //
DROP TABLE IF EXISTS combo;
CREATE TABLE combo(id int primary key auto_increment);
DROP PROCEDURE IF EXISTS mkset;
CREATE PROCEDURE mkset(members INT)
BEGIN
SET @membs=members;
REPEAT
INSERT INTO combo SET id=NULL;
SET @membs=@membs - 1;
UNTIL @membs<=0 END REPEAT;
END
//
CALL mkset(36);
SELECT COUNT(*) as members FROM combo;
Now, for the actual query code that generates our combinations...
Our desired subsets A, are a simple restriction of the cartesian product of the set B with itself, k times. The restriction is that each successive product term only includes higher valued rows than the previous product term.
Code:
A = B X B X B X B X B X B X B X B
Where X represents restricted product operator
A simple JOIN produces the desired cartesian product and we will express the restriction within an ON(...) condition for each JOIN, or product. (You can do the same with a WHERE clause instead of the JOIN...ON() syntax, but for MariaDB (MySQL) the optimizer produces identical execution times and I prefer the JOIN syntax.)
So our query will look like this...
Code:
Save SQL below to a file, combo.sql then from mysql client with database seleted:
\. combo.sql
............................................................
SELECT
COUNT(*) as Combinations
-- OR uncomment 'k' columns below to see result rows
-- a.id, b.id, c.id, d.id, e.id, f.id, g.id, h.id
-- 'k' is the number of products, or JOINS
FROM combo a
JOIN combo b ON(b.id>a.id)
JOIN combo c ON(c.id>b.id)
JOIN combo d ON(d.id>c.id)
JOIN combo e ON(e.id>d.id)
JOIN combo f ON(f.id>e.id)
JOIN combo g ON(g.id>f.id)
JOIN combo h ON(h.id>g.id)
;
How does this compare to other languages in terms of execution speed?
Code:
MariaDB [test]> \. combo.sql
+--------------+
| Combinations |
+--------------+
| 30260340 |
+--------------+
1 row in set (4 min 7.89 sec)
Not a speed demon, but for a product set of 30-million-plus rows on this machine... not bad either (as long as you don't print them!).
But as an implementation of the original problem in a different language, it is what it is!
As time permits I may try a PostgreSQL and maybe an SQLite versionfor meaningful speed comparisons - and maybe try to improve the speed.
Last edited by astrogeek; 10-20-2015 at 02:04 AM.
Reason: Fixed non-displayed unicode notation
Thanks. That's very interesting! I would have never thought of using SQL and stored procedures. Of course my knowledge of DB's and SQL is very basic.
SQL results on my machine:
Code:
mysql> \. combo.sql
+--------------+
| Combinations |
+--------------+
| 30260340 |
+--------------+
1 row in set (1 min 36.27 sec)
Both Python and Perl provide modules for finding permutations/combinations. I'm not very proficient in either language but here are a couple of simple programs and their execution times for comparison:
Code:
#!/usr/bin/env perl
use strict; use warnings;
use Algorithm::Combinatorics qw(combinations);
my $start = time();
my @n = (1 .. 36);
my $k = 8;
my $iter = combinations(\@n, $k);
my $count = 0;
while (my $c = $iter->next) {
# print "@$c\n";
$count = $count +1;
}
my $end = time();
my $exec_time = $end - $start;
print "$count combinations in $exec_time seconds\n";
Execution time: 49 secs.
Code:
#!/usr/bin/python
import itertools
import time
start = time.time()
iterable = range(1,37)
r = 8
y = 0;
for combo in itertools.combinations(iterable, r):
#print combo
y=y+1
print y," combinations in ",(time.time() - start)," seconds\n";
Thanks. That's very interesting! I would have never thought of using SQL and stored procedures. Of course my knowledge of DB's and SQL is very basic.
SQL results on my machine:
Code:
mysql> \. combo.sql
+--------------+
| Combinations |
+--------------+
| 30260340 |
+--------------+
1 row in set (1 min 36.27 sec)
Thanks for the test run - I kind of thought it unlikely that anyone would actually run it as it involved a little setup.
The stored procedure was only a convenience for easily re-populating the table and is not involved in getting the result set.
I just tried it on PostgreSQL, same machine, same query file, table structure as follows:
Code:
CREATE TABLE combo(id SERIAL);
INSERT INTO combo (id) VALUES(nextval('combo_id_seq')),... 36 times...;
...
# time psql testdb <combo.sql
combinations
--------------
30260340
(1 row)
real 1m19.934s
user 0m0.011s
sys 0m0.006s
That vs 4 min 7.89 sec for MariaDB/MySQL (although - the MySQL server on this machine has other users and >20 databases with ~50GB of data in filesystem. PostgreSQL is fairly quiescent).
But what is really impressive as an SQL implementation is SQLite...
Code:
sqlite3 dc-sqlite.db
CREATE TABLE combo(id INTEGER PRIMARY KEY AUTOINCREMENT);
INSERT INTO combo (id) VALUES(NULL),... 36 times ...;
...
# time sqlite3 -header dc-sqlite.db <combo.sql
Combinations
30260340
real 0m7.344s
user 0m7.255s
sys 0m0.007s
...which beats the simple, fast PHP script on same machine.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.