LinuxQuestions.org
Old 06-14-2017, 05:42 AM   #46
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309

Quote:
Originally Posted by Laserbeak View Post
OMG, I actually did it with SSE!!! This is my first vector program (with some help from reading a lot of things and one post to another site):


It'd be great if you can test its speed!
Sorry guy, how should I compile/build it?
 
Old 06-14-2017, 11:07 AM   #47
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by pan64 View Post
Sorry guy, how should I compile/build it?
As long as you have an advanced enough processor and compiler, it should just build. Have you tried and gotten an error? If so, please post it.
 
Old 06-14-2017, 11:23 AM   #48
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
Code:
user@host:/tmp/armstrong$ gcc -msse4.1 -o a a.c
a.c: In function ‘main’:
a.c:20:14: error: incompatible type for argument 1 of ‘_mm_hadd_ps’
         v3 = _mm_hadd_ps(v3,v3);
              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/tmmintrin.h:31:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/smmintrin.h:32,
                 from a.c:2:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/pmmintrin.h:56:1: note: expected ‘__m128’ but argument is of type ‘__m128i’
 _mm_hadd_ps (__m128 __X, __m128 __Y)
 ^
a.c:20:14: error: incompatible type for argument 2 of ‘_mm_hadd_ps’
         v3 = _mm_hadd_ps(v3,v3);
              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/tmmintrin.h:31:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/smmintrin.h:32,
                 from a.c:2:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/pmmintrin.h:56:1: note: expected ‘__m128’ but argument is of type ‘__m128i’
 _mm_hadd_ps (__m128 __X, __m128 __Y)
 ^
a.c:21:14: error: incompatible type for argument 1 of ‘_mm_hadd_ps’
         v3 = _mm_hadd_ps(v3,v3);
              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/tmmintrin.h:31:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/smmintrin.h:32,
                 from a.c:2:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/pmmintrin.h:56:1: note: expected ‘__m128’ but argument is of type ‘__m128i’
 _mm_hadd_ps (__m128 __X, __m128 __Y)
 ^
a.c:21:14: error: incompatible type for argument 2 of ‘_mm_hadd_ps’
         v3 = _mm_hadd_ps(v3,v3);
              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/tmmintrin.h:31:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/smmintrin.h:32,
                 from a.c:2:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/pmmintrin.h:56:1: note: expected ‘__m128’ but argument is of type ‘__m128i’
 _mm_hadd_ps (__m128 __X, __m128 __Y)
 ^
user@host:/tmp/armstrong$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2

AMD FX-8320
 
Old 06-14-2017, 11:25 AM   #49
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
I looped over it 1000x from within C (not rerunning the program), not printing the results (I did print the results once and they were all correct) and got:

Code:
real	0m0.005s
user	0m0.001s
sys	0m0.001s
10000x:
Code:
real	0m0.005s
user	0m0.001s
sys	0m0.002s
100000x:
Code:
real	0m0.005s
user	0m0.001s
sys	0m0.001s
1000000x:
Code:
real	0m0.004s
user	0m0.001s
sys	0m0.001s
So basically the documentation is true, SSE commands are basically NOOPs.

This is the 1,000,000-iteration code:

Code:
#include <stdio.h>
#include <smmintrin.h>  // SSE 4.1

__m128i vcube(const __m128i v)
{
    return _mm_mullo_epi32(v, _mm_mullo_epi32(v, v));
}


int main(int argc, const char * argv[]) {
    for (unsigned int y = 0; y < 1000000; y++) {
        for (unsigned int i = 1; i <= 500; i++) {
            unsigned int firstDigit = i / 100;
            unsigned int secondDigit = (i - firstDigit * 100) / 10;
            unsigned int thirdDigit = (i - firstDigit * 100 - secondDigit * 10);
            
            __m128i v = _mm_setr_epi32(0, firstDigit, secondDigit, thirdDigit);
            __m128i v3 = vcube(v);
            
            v3 = _mm_hadd_ps(v3,v3);
            v3 = _mm_hadd_ps(v3,v3);

       /*     if (_mm_extract_epi32(v3, 0) == i)
                printf ("%d is an Armstrong number\n", i);
        */
        }
    }
    return 0;
}
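
A note on the timings: they stay at about 0.005 s whether the outer loop runs 1,000 or 1,000,000 times, which is what one would expect if the optimizer dropped the loops because nothing uses their results. That is an inference from the numbers, not something confirmed in this thread. A minimal scalar sketch of one way to keep the work observable is below; `cube_sum3` and `count_hits` are hypothetical names, and the `volatile` counter is just one common way to make the result count as a side effect.

```c
#include <stdio.h>

/* Sum of the cubes of the three decimal digits of n (assumes n < 1000).
   Hypothetical helper, not taken from the thread. */
unsigned int cube_sum3(unsigned int n)
{
    unsigned int a = n / 100, b = (n / 10) % 10, c = n % 10;
    return a * a * a + b * b * b + c * c * c;
}

/* Scan 1..500 `repeats` times and return the total number of matches.
   Each match updates a volatile counter, so the loop has an observable
   side effect and cannot simply be discarded as dead code. */
unsigned long count_hits(unsigned int repeats)
{
    volatile unsigned long sink = 0;
    for (unsigned int y = 0; y < repeats; y++)
        for (unsigned int i = 1; i <= 500; i++)
            if (cube_sum3(i) == i)
                sink = sink + 1;
    return sink;
}
```

Printing the value returned by `count_hits(1000000)` from `main` and timing that run should give a figure that grows with the repeat count, which the numbers above do not.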
 
Old 06-14-2017, 11:30 AM   #50
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by pan64 View Post
Code:
user@host:/tmp/armstrong$ gcc -msse4.1 -o a a.c
a.c: In function ‘main’:
a.c:20:14: error: incompatible type for argument 1 of ‘_mm_hadd_ps’
         v3 = _mm_hadd_ps(v3,v3);
              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/tmmintrin.h:31:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/smmintrin.h:32,
                 from a.c:2:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/pmmintrin.h:56:1: note: expected ‘__m128’ but argument is of type ‘__m128i’
 _mm_hadd_ps (__m128 __X, __m128 __Y)
 ^
a.c:20:14: error: incompatible type for argument 2 of ‘_mm_hadd_ps’
         v3 = _mm_hadd_ps(v3,v3);
              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/tmmintrin.h:31:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/smmintrin.h:32,
                 from a.c:2:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/pmmintrin.h:56:1: note: expected ‘__m128’ but argument is of type ‘__m128i’
 _mm_hadd_ps (__m128 __X, __m128 __Y)
 ^
a.c:21:14: error: incompatible type for argument 1 of ‘_mm_hadd_ps’
         v3 = _mm_hadd_ps(v3,v3);
              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/tmmintrin.h:31:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/smmintrin.h:32,
                 from a.c:2:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/pmmintrin.h:56:1: note: expected ‘__m128’ but argument is of type ‘__m128i’
 _mm_hadd_ps (__m128 __X, __m128 __Y)
 ^
a.c:21:14: error: incompatible type for argument 2 of ‘_mm_hadd_ps’
         v3 = _mm_hadd_ps(v3,v3);
              ^
In file included from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/tmmintrin.h:31:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/4.9/include/smmintrin.h:32,
                 from a.c:2:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/pmmintrin.h:56:1: note: expected ‘__m128’ but argument is of type ‘__m128i’
 _mm_hadd_ps (__m128 __X, __m128 __Y)
 ^
user@host:/tmp/armstrong$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2

AMD FX-8320
I don't really know... the one thing that sticks out is that you have AMD, and I know Intel's implementation is slightly different. Like I said, this was my first attempt at SSE instructions, so maybe someone else can help. But it certainly compiles fine on my Mac:

Code:
________________________________________________________________________________
| ~/Library/Mobile Documents/com~apple~CloudDocs/AppDevel/armstrongsse/armstrongsse @ SpocksBrain (djohnsto) 
| => cat main.c
//
//  main.c
//  armstrongsse
//
//  Created by Douglas Johnston on 6/13/17.
//  Copyright © 2017 Douglas Johnston. All rights reserved.
//

#include <stdio.h>
#include <smmintrin.h>  // SSE 4.1

__m128i vcube(const __m128i v)
{
    return _mm_mullo_epi32(v, _mm_mullo_epi32(v, v));
}


int main(int argc, const char * argv[]) {
    for (unsigned int y = 0; y < 1000000; y++) {
        for (unsigned int i = 1; i <= 500; i++) {
            unsigned int firstDigit = i / 100;
            unsigned int secondDigit = (i - firstDigit * 100) / 10;
            unsigned int thirdDigit = (i - firstDigit * 100 - secondDigit * 10);
            
            __m128i v = _mm_setr_epi32(0, firstDigit, secondDigit, thirdDigit);
            __m128i v3 = vcube(v);
            
            v3 = _mm_hadd_ps(v3,v3);
            v3 = _mm_hadd_ps(v3,v3);

       /*     if (_mm_extract_epi32(v3, 0) == i)
                printf ("%d is an Armstrong number\n", i);
        */
        }
    }
    return 0;
}
________________________________________________________________________________
| ~/Library/Mobile Documents/com~apple~CloudDocs/AppDevel/armstrongsse/armstrongsse @ SpocksBrain (djohnsto) 
| => clang -O3 -o armtestsse main.c
________________________________________________________________________________
| ~/Library/Mobile Documents/com~apple~CloudDocs/AppDevel/armstrongsse/armstrongsse @ SpocksBrain (djohnsto) 
| => time ./armtestsse

real	0m0.005s
user	0m0.001s
sys	0m0.001s
________________________________________________________________________________
| ~/Library/Mobile Documents/com~apple~CloudDocs/AppDevel/armstrongsse/armstrongsse @ SpocksBrain (djohnsto) 
| =>

Forget that copyright s***, Xcode adds that automatically.
 
Old 06-14-2017, 11:45 AM   #51
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
I changed it so that it still performs the test, but got essentially the same numbers:

Code:
#include <stdio.h>
#include <smmintrin.h>  // SSE 4.1
#include <stdbool.h>

__m128i vcube(const __m128i v)
{
    return _mm_mullo_epi32(v, _mm_mullo_epi32(v, v));
}


int main(int argc, const char * argv[]) {
    for (unsigned int y = 0; y < 1000000; y++) {
        for (unsigned int i = 1; i <= 500; i++) {
            bool isArmstrong = false;
            unsigned int firstDigit = i / 100;
            unsigned int secondDigit = (i - firstDigit * 100) / 10;
            unsigned int thirdDigit = (i - firstDigit * 100 - secondDigit * 10);
            
            __m128i v = _mm_setr_epi32(0, firstDigit, secondDigit, thirdDigit);
            __m128i v3 = vcube(v);
            
            v3 = _mm_hadd_ps(v3,v3);
            v3 = _mm_hadd_ps(v3,v3);

            if (_mm_extract_epi32(v3, 0) == i)
                isArmstrong = true;
        }
    }
    return 0;
}
 
Old 06-14-2017, 11:59 AM   #52
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
If I remember correctly, it happened on an Intel-based laptop too, but I will try to check it tomorrow. On the other hand, this CPU does support SSE4.1:
Code:
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold vmmcall bmi1
It looks like a type mismatch:
_mm_hadd_ps takes and returns __m128, not __m128i.
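
For comparison, here is a sketch that stays in integer types throughout by using `_mm_hadd_epi32`, the SSSE3 integer counterpart of `_mm_hadd_ps`. It is not the code from the thread: `digit_cube_sum` is a hypothetical name, and the `target` attribute is a GCC/Clang extension used here only so the snippet builds without `-msse4.1` on the command line.

```c
#include <smmintrin.h>  /* SSE4.1; also pulls in the SSSE3 header for _mm_hadd_epi32 */

/* Sum of the cubes of the three decimal digits of n (assumes n < 1000).
   The target attribute enables SSE4.1 (and with it SSSE3) for this one
   function only; it is a GCC/Clang extension. */
__attribute__((target("sse4.1")))
unsigned int digit_cube_sum(unsigned int n)
{
    unsigned int hundreds = n / 100;
    unsigned int tens = (n / 10) % 10;
    unsigned int ones = n % 10;

    /* Lanes: [0, hundreds, tens, ones] */
    __m128i v = _mm_setr_epi32(0, (int)hundreds, (int)tens, (int)ones);

    /* Cube every lane with 32-bit integer multiplies. */
    __m128i v3 = _mm_mullo_epi32(v, _mm_mullo_epi32(v, v));

    /* Two integer horizontal adds leave the full sum in lane 0. */
    v3 = _mm_hadd_epi32(v3, v3);
    v3 = _mm_hadd_epi32(v3, v3);

    return (unsigned int)_mm_cvtsi128_si32(v3);
}
```

With this helper, the check in the loop becomes `digit_cube_sum(i) == i`. When compiling with `-msse4.1`, the attribute line can be dropped.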
 
Old 06-14-2017, 12:22 PM   #53
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Let me try it in Solaris... rebooting...
 
Old 06-14-2017, 12:57 PM   #54
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
I ran into some compilation problems in Solaris too. I was able to fix them and this printed out the correct answers... hopefully it'll work on your compiler:

Code:
#include <stdio.h>
#include <smmintrin.h>  // SSE 4.1

__m128i vcube(const __m128i v)
{
    return _mm_mullo_epi32(v, _mm_mullo_epi32(v, v));
}


int main(int argc, const char * argv[]) {
    for (unsigned int i = 1; i <= 500; i++) {
        unsigned int firstDigit = i / 100;
        unsigned int secondDigit = (i - firstDigit * 100) / 10;
        unsigned int thirdDigit = (i - firstDigit * 100 - secondDigit * 10);

        __m128i v = _mm_setr_epi32(0, firstDigit, secondDigit, thirdDigit);
        __m128 v3 = (__m128) vcube(v);

        v3 = _mm_hadd_ps(v3, v3);
        v3 = _mm_hadd_ps(v3, v3);

        if (_mm_extract_epi32((__m128i) v3, 0) == i)
            printf ("%d is an Armstrong number\n", i);
    }
    return 0;
}
I used this to compile:

Code:
gcc -std=c99 -m64 -msse4.1 -O3 -o testsse testsse.c
Also, the same code works unchanged on the Mac. The compiler there is seemingly just more forgiving when it comes to types.
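
The C-style casts between `__m128i` and `__m128` above rely on a compiler extension for vector types. The intrinsics `_mm_castsi128_ps` and `_mm_castps_si128` express the same bit-for-bit reinterpretation, and they are different from `_mm_cvtepi32_ps`, which converts the values. A small sketch of the difference, with hypothetical function names:

```c
#include <emmintrin.h>  /* SSE2: cast and convert intrinsics */

/* Reinterpret an integer lane as float bits and back again.
   No value conversion takes place, so the bits come back unchanged. */
unsigned int roundtrip_bits(unsigned int x)
{
    __m128i vi = _mm_cvtsi32_si128((int)x);
    __m128 vf = _mm_castsi128_ps(vi);      /* same bits, float-typed */
    __m128i back = _mm_castps_si128(vf);   /* same bits, integer-typed */
    return (unsigned int)_mm_cvtsi128_si32(back);
}

/* Convert an integer lane to the float with the same numeric value. */
float convert_value(int x)
{
    __m128i vi = _mm_cvtsi32_si128(x);
    return _mm_cvtss_f32(_mm_cvtepi32_ps(vi));
}
```

In the code above, the cast only changes the type the compiler sees, so `_mm_hadd_ps` still performs floating-point additions on bit patterns that were produced as integers.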

Last edited by Laserbeak; 06-14-2017 at 01:14 PM.
 
Old 06-14-2017, 01:14 PM   #55
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
Code:
            normal   sse4.1
compiled     0.85    14.91
gcc -O3      0.31     9.00
if I did not miss anything.
 
Old 06-14-2017, 01:23 PM   #56
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by pan64 View Post
Code:
            normal   sse4.1
compiled     0.85    14.91
gcc -O3      0.31     9.00
if I did not miss anything.
So you're saying it's a lot SLOWER on your machine? Hmm... can anyone else confirm that? I get much faster rates.

What type of processor do you have? Maybe it's emulating these instructions in software, not executing them directly in hardware. My computer shows them to be much faster...

I don't mean to make this a d**k measuring contest, but it's definitely interesting that you seemingly got such different speed measurements.
 
Old 06-14-2017, 01:34 PM   #57
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
Yes, it looks like it. I have no idea why. I have an AMD FX-8320, which supports SSE4.1.
I have just tried to debug this, and it looks like it used real SSE instructions; there was no emulation. But again, I probably missed something.
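
One possible factor, offered as a hypothesis and not verified on this hardware: `_mm_hadd_ps` is a floating-point add, and small integers such as the digit cubes, when their bits are read as IEEE-754 floats, are subnormal (denormal) values. Arithmetic on subnormals can be much slower than on normal floats on some CPUs. The portable sketch below only shows that such bit patterns classify as subnormal; it assumes a 32-bit `unsigned int` and an IEEE-754 `float`, and `bits_are_subnormal` is a hypothetical name.

```c
#include <math.h>
#include <string.h>

/* Return 1 if the 32-bit pattern, read as an IEEE-754 float, is subnormal.
   Assumes unsigned int and float are both 32 bits wide. */
int bits_are_subnormal(unsigned int bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits, no conversion */
    return fpclassify(f) == FP_SUBNORMAL;
}
```

For example, `bits_are_subnormal(125)` (the cube of 5) returns 1, while the bit pattern of 1.0f, `0x3f800000`, returns 0.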
 
Old 06-14-2017, 01:38 PM   #58
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by pan64 View Post
Yes, it looks like it. I have no idea why. I have an AMD FX-8320, which supports SSE4.1.
I have just tried to debug this, and it looks like it used real SSE instructions; there was no emulation. But again, I probably missed something.
OK, well I have the Intel Core i7 2.3 GHz that came with the MacBook Pro (Retina, 15-inch, Late 2013). Maybe someone who knows more about this can shed some light on it.
 
Old 06-14-2017, 01:42 PM   #59
KenJackson
Member
 
Registered: Jul 2006
Location: Maryland, USA
Distribution: Fedora and others
Posts: 757

Rep: Reputation: 145
Quote:
Originally Posted by Laserbeak View Post
Let me try it in Solaris... rebooting...
Real Solaris? Or one of the illumos distros?
 
Old 06-14-2017, 01:54 PM   #60
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by KenJackson View Post
Real Solaris? Or one of the illumos distros?
Real Solaris x86_64 11.3
 
  

