ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
Your problem is to compute the permutations of N elements without repetitions. I cannot think a simple solution, right now. However, it can be done by means of a recursive function. Is bash mandatory or can you accept other solutions? Here is a link that might provide some inspiration, for now.
Hmm.. if by "string order does not matter" you mean that you are only concered with substring existence and not order, that you only want one occurrence of either "apple pie" or "pie apple", then the solution is rather simple for up to 63 components (31 on 32-bit architectures):
Code:
combine() {
local limit=$[ 1 << $# ]
local args=("$@")
for ((value = 1; value < limit; value++)); do
local parts=()
for ((i = 0; i < $#; i++)); do
[ $[(1<<i) & value] -ne 0 ] && parts[${#parts[@]}]="${args[i]}"
done
echo "${parts[@]}"
done
}
This works because each combination where only the substring existence matters maps perfectly to a substring-count-digit positive binary integer -- remember, each substring is either included or not included in each combination. (Zero maps to the empty string -- no substrings at all --, and is therefore uninteresting.)
The above simply loops value over all those positive integers, then constructs the string by letting each binary digit in value select whether a substring is included (1) or not included (0).
Right now it relies on Bash integer variables, and is therefore limited to 31 or 32 substrings on 32-bit architectures, and 63 or 64 substrings on 64-bit architectures.
Output of combine apple pie cake | sort:
Code:
apple
apple cake
apple pie
apple pie cake
cake
pie
pie cake
what can i say..? just perfect!
i've been writing bash scripts for about 3 years, but i can't understand even half of the code you posted... i can just see it works!
thank you VERY much...
i can't understand even half of the code you posted...
Let's see if we can remedy that. Understanding is much more important than just having working code, in my opinion.
The code defines a Bash function. A function is similar to a external script, except that it is run within the same Bash shell.
Let's start:
Code:
combine() {
local limit=$[ 1 << $# ]
local args=("$@")
local value
Above, we define a local variable limit, and set it the value (two to the power of the number of parameters to this function). If you have three parameters, the value will be 2³ = 8. Note that this is by definition the number of combinations, including the empty string. (Therefore, we will output limit-1 combinations.)
In general, the left shift operator a<<n is mathematically equivalent to a×2ⁿ for all integral nonnegative n and all integer nonnegative a. It simply moves the binary digits left, adding zeros to the right. (It does operate the same way with negative numbers, with a couple of quirks, but that's not important here, since we only deal with positive numbers and zero -- nonnegative integers.)
The above also defines an array args containing all the parameters to the function. Note that the first parameter will be ${args[0]} . This is important because bits are counted from 0 upwards: the lowest bit is bit 0 (and not bit 1). (Remember, 2⁰=1, 2¹=1, 2²=4, 2³=8, and so on.)
Note: I forgot the local value definition from the original script. It is not really necessary for proper working, but without it, the value variable will be visible in the caller! If the caller also uses a value variable, without this line the value in the caller will be modified (to limit).
Next, to generate each combination, we need to loop over all positive integers from 1 to limit, not including limit itself, using value as the loop variable:
Code:
for ((value = 1; value < limit; value++)); do
Next, we need to loop over each bit -- each binary digit -- in value . First, though, we'll clear the parts array, which is used to hold the substrings for the current string (the combination that matches the binary representation of variable value ):
Code:
local parts=()
for ((i = 0; i < $#; i++)); do
The following line is the tricky bit. Its logic is simple: If bit i in value is set, then append the corresponding substring to the parts array.
Above, $[ (1 << i) & value ] evaluates to nonzero if and only if bit i in value is set, and to zero otherwise. (The value it evaluates to is 2ⁱ, but who cares: it makes much more sense to compare against zero.)
The expression parts[${#parts[@]}]=foo is the fastest way to append foo to the array parts. It is equivalent to parts=("${parts[@]}" foo) if the existing indices in parts run consecutively starting from zero. (Expression ${#parts[@]} evaluates to the number of elements already in the parts array.)
After the end of the inner loop, we have the combination in array parts , so we just echo it:
ok, not that bad...
i can't understand how the expression $[ 1 << $# ] could mean "two to the power of the number of parameters to this function", but that's a detail...
i got lost in the "tricky bit" as well, but i guess if i didn't it wouldn't have been "tricky"...
but i got the rest, and i thank you so much for explaining...
i can't understand how the expression $[ 1 << $# ] could mean "two to the power of the number of parameters to this function"
Well, $# tells you the number of positional parameters. In a function, it is the number of parameters given to the funtion.
The $[ expression ] is the traditional Bash way to calculate arithmetic expressions.
A << B is an arithmetic expression, "shift A left B bits". Therefore, 1 << N always evaluates to two to the power of N. That is just basic binary arithmetic: in decimal systems, you can multiply by ten by adding zero to the right, but in binary systems, you multiply by only two when you add a zero to the right.
Quote:
Originally Posted by masavini
i got lost in the "tricky bit" as well
A & B is an arithmetic expression, "A AND B", where the AND is a binary operator.
(1 << N) & B is an arithmetic expression which means "B and the bit N". It is a very common way to test if the bit N is set in B . The only thing to remember about that is that the first (lowest) bit is numbered 0, not 1.
The && is a logical operator which can be read as "and, if that succeeded, then". If its left side is true, then and only then the right side is evaluated. If its left side is false, then the right side is ignored.
In the articles, ∧ is binary AND, written & in Bash. Binary OR is written as ∨ in the articles, | in Bash. If I ever discover a new operator, I shall name it 🐱 or maybe ̐😲 .
The $[ expression ] is the traditional Bash way to calculate arithmetic expressions.
Correction: $[..] is the old, deprecated way to calculate arithmetic expressions, and it's not really a good idea to use it any more. While it is still supported for now, it has, for the longest time, been supplanted by $((..)). The newer version also has the benefit of being posix-compliant.
And if you don't need the value to be expanded, bash also provides ((..)), which only evaluates the expression and spits out a success/failure exit code, such as used in the c-style for loop.
Finally, you can more cleanly append values to an array using "array+=( value )". Also, the -a option should really be used when declaring one.
Therefore the function should more properly be written like this:
Code:
combine() {
local limit=$(( 1 << $# ))
local -a parts args=( "$@" )
for (( value = 1; value < limit; value++ )); do
parts=()
for (( i = 0; i < $#; i++ )); do
(( ( 1 << i ) & value )) && parts+=( "${args[i]}" )
done
echo "${parts[@]}"
done
}
I can't say yet that I fully understand the use of the bitwise operators, however.
Correction: $[..] is the old, deprecated way to calculate arithmetic expressions, and it's not really a good idea to use it any more. While it is still supported for now, it has, for the longest time, been supplanted by $((..)). The newer version also has the benefit of being posix-compliant.
I had not noticed POSIX shells (including dash) support $((...)) arithmetic expressions.
I have accounts on machines with only old versions of Bash available, and unfortunately the Bash Reference Manual neglects to describe the version required for each feature. This is why I recommend against switching to every new Bash features, but wait for them to be standardized, or at least stabilized. It is very annoying when you have to work with older versions of Bash occasionally.
David the H. is absolutely correct: $(( ... )) is the standard way for evaluating arithmetic expressions. Thanks!
I can't say yet that I fully understand the use of the bitwise operators, however.
I think an in-depth explanation may be in order: if you didn't follow the logic, then I don't think many other members did either.
Let's start with the initial set of words: apple pie cake. Create a table of all the combinations. Put the first word rightmost, because we are used to putting the least significant digit to the right.
From the above, we can already tell that whenever we add a new word, the table will double in size. Therefore, the size of the table is two multiplied by itself the number of words times, i.e. two to the power of the number of words.
As you can see, you can use as many words as you have bits (binary digits), and it will work exactly the same.
Now consider the bit shift operation. If we write binary numbers as say 100₂ = 4, 11₂ = 3, 1₂ = 1, and so on, then the bit shift operation is easy to understand:
Therefore, $(( 1 << i )) is the same as 2ⁱ. In binary, it has only one 1, at the i'th digit (if you consider 0 as the first digit).
Also, you can see from the table size that for i words, you have 2ⁱ combinations (including the empty combination at zero).
The obvious approach then is to have an outer loop, looping from 1 (to skip the empty combination at zero) to 2ⁱ less one (for i words), and generate each combination based on the loop counter value. (The combination for 2ⁱ would be just a new word; check the table!)
The logic is very simple: If the first binary digit is 1, "apple" is part of the combination. If the second bit is 1, "pie" is part of the combination. If the third bit is 1, "cake" is part of the combination. And so on.
In binary arithmetic, value & mask is the binary AND operation. The result will have 1 in each place where both value and mask are 1, and 0 elsewhere. The order does not matter, both sides are treated equally: mask & value yields always the same value as value & mask.
We can use the bit shift operation for picking out a single digit. Remember, $(( 1 << bit )) is a binary number with only one 1, at the bit position (starting from zero).
Therefore, $(( value & ( 1 << bit ) )) evaluates to zero if bit bit is zero, and to nonzero (two to the power of bit) otherwise.
Using a temporary array to hold the words helps because in Bash array indices start at 0. If you have word=(apple pie cake), then ${word[2]} is cake.
So. If value is the number that describes the combination we want, we loop over each bit ((i = 1; i < the-number-of-words; i++ )), and the ${word[i]} is part of the combination if and only if $(( value & (1 << i) )) evaluates to nonzero.
The textual explanation is quite dry and perhaps a bit difficult to follow, but if you switch back to looking at the table, it should help a lot. Hopefully you can see the box drawing characters correctly; it really should be a very nice table!
The logic is very simple: If the first binary digit is 1, "apple" is part of the combination. If the second bit is 1, "pie" is part of the combination. If the third bit is 1, "cake" is part of the combination. And so on.
Thank you so much for the education, NA. The extra post did help to make it much more comprehensible.
To be honest, the real reason it wasn't clear was because I had only looked over your posts quickly at that point, and hadn't yet had the time to read through them carefully. It wasn't until tonight (Sunday) that I had the time to really study them.
Actually, your posts here were quite serendipitous. I was just looking at the ARITHMETIC EVALUATION section of the bash man page the other day and wondering exactly what the bitwise mode operators listed there do. And while I (believe I) understand the basic concept now, trying to switch my mind over to thinking binary-wise is still a bit of headache for me, as is exactly how all the related operators operate. It's going to take some more research and practice to really get them down.
BTW, you can look here for a rundown of which features were introduced in which version of bash:
I agree that it's frustrating when the documentation doesn't clearly list when a feature was introduced. The gawk users guide suffers from the same weakness.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.