Printing Armstrong numbers from 1 to 500. (C Programming)


Yes, it looks like it. I have no idea why. I have an AMD FX-8320, and it supports SSE4.1.
I have just tried to debug this, and it looks like it used the real SSE; there was no emulation. But again, I probably missed something.

I was just thinking: I wonder if there's an alignment problem. I think Intel got rid of the alignment requirements, but AMD may not have.

I was thinking you can actually use 256-bit vectors on these Intel processors:

Quote:

Intel® Advanced Vector Extensions Types

The Intel® AVX intrinsic functions use three new C data types as operands, representing the new registers used as operands to the intrinsic functions. These are __m256, __m256d, and __m256i data types.

The __m256 data type is used to represent the contents of the extended SSE register, the YMM register, used by the Intel® AVX intrinsics. The __m256 data type can hold eight 32-bit floating-point values.

The __m256d data type can hold four 64-bit double precision floating-point values.

The __m256i data type can hold thirty-two 8-bit, sixteen 16-bit, eight 32-bit, or four 64-bit integer values.

The compiler aligns the __m256, __m256d, and __m256i local and global data to 32-byte boundaries on the stack. To align integer, float, or double arrays, use the __declspec(align) statement.

The Intel® AVX intrinsics also use Intel® SSE2 data types like __m128, __m128d, and __m128i for some operations. See Details of Intrinsics topic for more information.

So you can expand this to reach much higher narcissistic numbers with no performance hit whatsoever compared to SSE. I'll try to work on that later.

Let's try to get all Narcissistic numbers up to 9 digits long.

So let's say we have a 9-digit number.

9^9 = 0x17179149, i.e. a 4-byte number (UInt32)

But 9 of them together would be 0xCFD41B91, another 4-byte number (UInt32), so we should be fine with anything that can hold 9 32-bit numbers.

Unfortunately, the __m256i data type can hold only eight 32-bit numbers, so we have to lower our goal to 8-digit Narcissistic numbers, unless we use more than one vector; but we're just learning here, so let's use just one.

Now, what can hold the sum of 8 32-bit numbers? 8 * 0x17179149 = 0xB8BC8A48, another UInt32, which is very serendipitous.

So we should be able to use basically the same code to determine eight 32-bit numbers in an __m256i data variable as we did with the 128-bit variable, right?

Just for some fun, and definitely not in the same category as my C friends for speed, here is a solution in gawk that can take any number of digits. But be warned: it gets real slow real fast.

Code:

#!/usr/bin/awk -f
BEGIN {
    n = ARGV[1]
    for (i = 1; i <= 10^n - 1; i++) {
        split(sprintf("%0" n "d", i), nums, "")
        for (j in nums) {
            x += nums[j]^n
            if (x > i)
                break
        }
        if (x == i)
            print i
        x = 0
    }
}

And a Ruby one for some fun too:

Code:

#!/usr/bin/env ruby
n = ARGV[0].to_i
first = "0" * (n - 1) + "1"
last = "9" * n
(first..last).each do |num|
  puts num if num.split(//).sum { |i| i.to_i ** n } == num.to_i
end

So after a bit of sleep, it dawned on me that part of the expense is the constant calculation of the powers, so this speeds things up by a factor of 5 on my machine:

Code:

#!/usr/bin/env ruby
n = ARGV[0].to_i
totals = [0, 1]
(2..9).each { |i| totals << i**n }
(1...10**n).each do |num|
  x = 0
  (1..n).each do |i|
    x += totals[num % 10**i / 10**(i - 1)]
    break if x > num || num < 10**i
  end
  puts num if x == num
end

I also removed all string calculations and conversions (except reading the input from the command line).

After dabbling around with FORTRAN, I came up with this. I tried to make it more elegant, but I kept hitting a brick wall somewhere, so I just wanted to get something that worked. Here it is:

Code:

integer, dimension (1:3) :: v
integer :: a, b, c, value, vsum
do a = 0, 5, 1
    do b = 0, 9, 1
        do c = 0, 9, 1
            v(1) = a
            v(2) = b
            v(3) = c
            vsum = sum(v ** 3)
            value = a * 100 + b * 10 + c
            if ( value .eq. vsum ) then
                print *, value, " is an Armstrong number"
            endif
        enddo
    enddo
enddo
end

The output is this:

Code:

0 is an Armstrong number
1 is an Armstrong number
153 is an Armstrong number
370 is an Armstrong number
371 is an Armstrong number
407 is an Armstrong number
RUN FINISHED; exit value 0; real time: 10ms; user: 0ms; system: 0ms

In this solution, a^3 and b^3 are calculated several times without any benefit, so something like this will definitely speed it up:

Code:

do a = 0, 5, 1
    ! calculate a^3
    do b = 0, 9, 1
        ! calculate b^3
        do c = 0, 9, 1
            ! calculate c^3
            vsum =
            value =
            if ( value .eq. vsum ) then
                print *, value, " is an Armstrong number"
            endif

And, actually, using an array instead of a, b, c may allow any length, but then you need to handle the power as well (which is not really hard; it just needs to be set dynamically too).


Oh, yeah, there's no purposeful optimization (other than what the compiler does itself) in here.

But FORTRAN does have the built-in '**' operator, which works on whole arrays, so you can raise each member of an array to a power in one line (modern compilers use SSE instructions for this, so it's effectively one operation anyway).

The problem I ran into was trying to make it flexible to go to any length, i.e. 999999999 or something.

But that seemed elusive, always giving me weird errors. As I've said, I have almost zero experience with FORTRAN, except for reading one program and rewriting it in C (and everyone should know reading is much easier than writing!).

If you know FORTRAN, please be my guest to try to improve the code!

Edit: Also, this code goes to 599, not 500; not that it really matters, because there are no Armstrong numbers between 500 and 599. I imagine there's a way in FORTRAN to break out of the loops once the value reaches 501.

Edit again: Just to be clear, on modern processors with vector processing, it doesn't matter whether you're raising one digit to a power or three (or many more); it still takes the same number of instruction cycles. Most processors don't have a general power instruction, so for larger exponents you (or the compiler) can use logarithmic math, i.e. x^y = exp(log(x) * y), for faster results.

But there are ways to prune the data instructions intelligently, so it doesn't go down paths that obviously won't lead to a number that fits the parameters (here, an Armstrong number).

Still, the time gives 0ms for both usr and sys, so it's hard to improve on that. You'd have to run it many hundreds of thousands of times or even millions to get numbers you can really differentiate.


Allow me...

Many of the programs in this thread take the approach of starting with a large number and breaking it down to individual digits by integer division and modulo. An alternative approach is to start with separate digits and make up the large number by adding multiples of powers of ten. This turns out to be very fast if you exploit the fact that each multiple of each power of ten only needs to be calculated once, and likewise each cube (or other power) only needs to be calculated once. The nested loop structure involves no multiplications or divisions, only array lookups, additions, and a comparison.

Here is a first try, before further optimization:

Code:

// armstrong3.c
// finds Armstrong numbers with 3 digits
// May be compiled by:
//     gcc -o armstrong3 armstrong3.c -lm
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
int main()
{
    long i, j;
    long d0, d1, d2;  // the digits
    // Allocate memory for tables
    long *b1 = (long *) malloc(10*sizeof(long));
    long *b2 = (long *) malloc(10*sizeof(long));
    long *px = (long *) malloc(10*sizeof(long));
    // fill the tables
    for(i=0;i<10;++i)
    {
        b1[i] = i*10;      // array b1 is digit*base**1
        b2[i] = i*100;     // array b2 is digit*base**2
        px[i] = pow(i,3);  // array px is digit**power
    }
    printf("Armstrong numbers:\n");
    // start search
    for(d2=0;d2<10;++d2)
        for(d1=0;d1<10;++d1)
            for(d0=0;d0<10;++d0)
                if(b2[d2]+b1[d1]+d0==px[d2]+px[d1]+px[d0])
                    printf("%ld%ld%ld\n", d2, d1, d0);
    // now do it a million times over, in order to estimate the computation time
    j = 0;
    for(i=0;i<1000000;++i)
        for(d2=0;d2<10;++d2)
            for(d1=0;d1<10;++d1)
                for(d0=0;d0<10;++d0)
                    if(b2[d2]+b1[d1]+d0==px[d2]+px[d1]+px[d0])
                        ++j;  // needed to keep the compiler honest
    printf("Armstrong numbers counted in 1000000 rounds: %ld\n", j);
    // free allocated memory
    free(b1);
    free(b2);
    free(px);
    return 0;
}

When I run it on an Intel Core2 Duo 2.20GHz (executing in a single thread), this is the time report:

Code:

$ time ./armstrong3
Armstrong numbers:
000
001
153
370
371
407
Armstrong numbers counted in 1000000 rounds: 6000000
real 0m9.338s
user 0m9.328s
sys 0m0.000s

Which means that on average each pass through the innermost loop takes about 9 nanoseconds, or each scan of all numbers from 0 to 999 completes in about 9 microseconds.

To be continued...

Last edited by Beryllos; 06-26-2017 at 11:23 PM.
Reason: Giving credit to pan64

I extended the above method to 9-digit Armstrong numbers involving 9th powers of the digits:

Code:

// armstrong9.c
// finds Armstrong numbers with 9 digits and 9th power
// May be compiled by:
//     gcc -o armstrong9 armstrong9.c -lm
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
int main()
{
    long i;
    long d0, d1, d2, d3, d4, d5, d6, d7, d8;  // the digits
    // Allocate memory for tables
    long *b1 = (long *) malloc(10*sizeof(long));
    long *b2 = (long *) malloc(10*sizeof(long));
    long *b3 = (long *) malloc(10*sizeof(long));
    long *b4 = (long *) malloc(10*sizeof(long));
    long *b5 = (long *) malloc(10*sizeof(long));
    long *b6 = (long *) malloc(10*sizeof(long));
    long *b7 = (long *) malloc(10*sizeof(long));
    long *b8 = (long *) malloc(10*sizeof(long));
    long *px = (long *) malloc(10*sizeof(long));
    // fill tables
    for(i=0;i<10;++i)
    {
        b1[i] = i*10;         // b1 is digit*base**1
        b2[i] = i*100;        // b2 is digit*base**2
        b3[i] = i*1000;       // and so on
        b4[i] = i*10000;
        b5[i] = i*100000;
        b6[i] = i*1000000;
        b7[i] = i*10000000;
        b8[i] = i*100000000;
        px[i] = pow(i,9);     // px is digit**power
    }
    printf("Armstrong numbers:\n");
    // start search
    for(d8=0;d8<10;++d8)
     for(d7=0;d7<10;++d7)
      for(d6=0;d6<10;++d6)
       for(d5=0;d5<10;++d5)
        for(d4=0;d4<10;++d4)
         for(d3=0;d3<10;++d3)
          for(d2=0;d2<10;++d2)
           for(d1=0;d1<10;++d1)
            for(d0=0;d0<10;++d0)
             if(b8[d8]+b7[d7]+b6[d6]+b5[d5]+b4[d4]+b3[d3]+b2[d2]+b1[d1]+d0==px[d8]+px[d7]+px[d6]+px[d5]+px[d4]+px[d3]+px[d2]+px[d1]+px[d0])
              printf("%ld%ld%ld%ld%ld%ld%ld%ld%ld\n", d8, d7, d6, d5, d4, d3, d2, d1, d0);
    // free allocated memory
    free(b1);
    free(b2);
    free(b3);
    free(b4);
    free(b5);
    free(b6);
    free(b7);
    free(b8);
    free(px);
    return 0;
}

It tests all 10^9 combinations of the 9 digits in about 26 seconds. Not bad, but we can do much better. Consider the comparison statement in the above program: it involves 17 lookups and 16 additions. Most of the time, most digits are unchanged from the previous pass, so we needn't look up and add all the digits every time. In fact, in each of the nested loops, we only have to look up one multiple of a power of ten, look up one 9th power of a digit, and perform two additions. This is the program that does that:

Code:

// armstrong9s.c
// finds Armstrong numbers with 9 digits and 9th power
// using fewer lookups and additions in the main loop structure
// May be compiled by:
//     gcc -o armstrong9s armstrong9s.c -lm
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
int main()
{
    long i;
    long d0, d1, d2, d3, d4, d5, d6, d7, d8;  // the digits
    long bsum1, bsum2, bsum3, bsum4, bsum5, bsum6, bsum7, bsum8;  // intermediate sums
    long psum1, psum2, psum3, psum4, psum5, psum6, psum7, psum8;  // intermediate sums
    // Allocate memory for tables
    long *b1 = (long *) malloc(10*sizeof(long));
    long *b2 = (long *) malloc(10*sizeof(long));
    long *b3 = (long *) malloc(10*sizeof(long));
    long *b4 = (long *) malloc(10*sizeof(long));
    long *b5 = (long *) malloc(10*sizeof(long));
    long *b6 = (long *) malloc(10*sizeof(long));
    long *b7 = (long *) malloc(10*sizeof(long));
    long *b8 = (long *) malloc(10*sizeof(long));
    long *px = (long *) malloc(10*sizeof(long));
    // fill tables
    for(i=0;i<10;++i)
    {
        b1[i] = i*10;         // b1 is digit*base**1
        b2[i] = i*100;        // b2 is digit*base**2
        b3[i] = i*1000;       // and so on
        b4[i] = i*10000;
        b5[i] = i*100000;
        b6[i] = i*1000000;
        b7[i] = i*10000000;
        b8[i] = i*100000000;
        px[i] = pow(i,9);     // px is digit**power
    }
    printf("Armstrong numbers:\n");
    // start search
    for(d8=0;d8<10;++d8)
    {
      bsum8 = b8[d8];  // update place value of digit d8
      psum8 = px[d8];  // update n-th power of digit d8
      for(d7=0;d7<10;++d7)
      {
        bsum7 = bsum8+b7[d7];  // add place value of digit d7
        psum7 = psum8+px[d7];  // add n-th power of digit d7
        for(d6=0;d6<10;++d6)
        {
          bsum6 = bsum7+b6[d6];  // add place value of digit d6
          psum6 = psum7+px[d6];  // add n-th power of digit d6
          for(d5=0;d5<10;++d5)
          {
            bsum5 = bsum6+b5[d5];  // and so on
            psum5 = psum6+px[d5];
            for(d4=0;d4<10;++d4)
            {
              bsum4 = bsum5+b4[d4];
              psum4 = psum5+px[d4];
              for(d3=0;d3<10;++d3)
              {
                bsum3 = bsum4+b3[d3];
                psum3 = psum4+px[d3];
                for(d2=0;d2<10;++d2)
                {
                  bsum2 = bsum3+b2[d2];
                  psum2 = psum3+px[d2];
                  for(d1=0;d1<10;++d1)
                  {
                    bsum1 = bsum2+b1[d1];
                    psum1 = psum2+px[d1];
                    for(d0=0;d0<10;++d0)
                      if(bsum1+d0==psum1+px[d0])  // add last digit
                        printf("%ld%ld%ld%ld%ld%ld%ld%ld%ld\n", d8, d7, d6, d5, d4, d3, d2, d1, d0);
                  }
                }
              }
            }
          }
        }
      }
    }
    // free allocated memory
    free(b1);
    free(b2);
    free(b3);
    free(b4);
    free(b5);
    free(b6);
    free(b7);
    free(b8);
    free(px);
    return 0;
}

and the results:

Code:

$ time ./armstrong9s
Armstrong numbers:
000000000
000000001
146511208
472335975
534494836
912985153
real 0m5.274s
user 0m5.272s
sys 0m0.000s

That's roughly 5 nanoseconds per inner loop cycle.

There are two basic approaches: try to calculate every Armstrong number between 1 and 500, or check every number between 1 and 500 to see if it's an Armstrong number. I assume this has already been made clear.

Which is faster would need testing.

There are other approaches. Here is one which I found as I was reading about Armstrong numbers:

1. Pick a random number in the range (for example, between 100 and 999).
2. Calculate the sum of the third (or other) power of digits.
3. If the sum is equal to the original number, you have found an Armstrong number.
4. Now it gets interesting: If the sum is not equal to the original number, use it as the next starting number. In other words, calculate the sum of the power of its digits and iterate.
5. Break when the sum goes out of range, enters a cycle, or reaches an Armstrong number.

Examples of cycles for digits to the third power: 136 → 244 → 136 → ...; 160 → 217 → 352 → 160 → ...

I'm not sure that this method is guaranteed to save computation time, but there is at least one reason why it can: All permutations of digits yield the same sum. For example, 145, 154, 415, 451, 514, and 541 all yield 190, which leads to 730 and finally the Armstrong number 370. In addition, some of the numbers in that permutation can be obtained by other numbers with their permutations. For example, 145 comes from 809, 890, 908, and 980. So it is that Armstrong numbers descend from a "tree" of different possible ancestors. Some of these trees are large, some not so large, like that of 407, which can only come from its own permutations.

I have been playing around with this. It appears that it has an advantage for finding longer Armstrong numbers, where the number of permutations is much larger.

I have been coding it in Python, which allows arbitrarily large integers and runs with multiple cores on my machine.

Last edited by Beryllos; 06-28-2017 at 06:15 PM.
Reason: small errors


This is very interesting (really), but according to everything I've read, the only way to determine Narcissistic numbers (of which Armstrong numbers are the three-digit subset) is by brute force, i.e. testing each number to see if it is one. Even if your algorithm works, how can you make sure that you've found all of the Narcissistic numbers within a range?

Here is another idea: if we look at the 1000 three-digit combinations, there are only 220 effective combinations that need to be tested, as all the others are repeats, as per Beryllos' example of 145, 154, 415, 451, 514, and 541 all yielding the same solution.

The nice thing about working from this direction is that the growth in the numbers to be tested as the length increases is considerably less than the growth in the total count:

Code:

10^3 numbers ->   220 to test
10^4 numbers ->   715 to test
10^5 numbers ->  2002 to test
10^6 numbers ->  5005 to test
10^7 numbers -> 11440 to test

Of course, the problem with the above scenario is coming up with which numbers are in the shortlist to be checked against.
As an idea, I tried this in Ruby, and just generating the numbers above took ages; as you can see, I never got to 8 or 9 digits.

Quote:

$ time ./armstrong9s
Armstrong numbers:
000000000
000000001
146511208
472335975
534494836
912985153
real 0m5.274s
user 0m5.272s
sys 0m0.000s

I'm sorry, but that is incorrect. The leading zeroes cannot be taken into account; they change the "length", the number of digits. The valid result is just the four 9-digit numbers: 146511208, 472335975, 534494836, and 912985153.

@Beryllos -- You really don't have to explicitly free() all that memory before you quit; when the process ends, the kernel will clean it up for you. You only have to free() memory that you malloc() when you're writing a program that continues to run indefinitely and you want that memory back.
