Old 05-26-2012, 09:40 PM   #46
foodown
Member
 
Registered: Jun 2009
Location: Texas
Distribution: Slackware
Posts: 611

Rep: Reputation: 222

Quote:
Originally Posted by ejspeiro View Post
Yes, I knew it was not there when I posted the reference; however, the reference stands. It is just a matter of taking a look at the actual book's glossary. Maybe in a Google Books preview? Or at Amazon?
What kind of a-hole logic is that?

Why not just post the definition to the forum?
 
Old 05-27-2012, 12:44 PM   #47
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora
Posts: 3,935
Blog Entries: 5

Rep: Reputation: Disabled
I'm going to have to agree with sundialsvcs. This is a beautiful thread. (In fact, it is the first time I have subscribed to a thread in over a year.) Please, let's not destroy it with nitpicking; and that is not directed at any one poster.

OP's question has been answered - and then some - in a most satisfactory fashion.
 
Old 05-28-2012, 10:58 AM   #48
dogpatch
Member
 
Registered: Nov 2005
Location: Central America
Distribution: Mepis, Android
Posts: 492
Blog Entries: 4

Rep: Reputation: 238
Quote:
Originally Posted by Sergei Steshenko View Post
So, "implementation of integer math for all numeric functions eliminates rounding errors associated with floating point" is a baseless statement - with long enough mantissa floating point numbers are no worse and may be even better than integers.

On a 32-bit machine integers are typically 32 bits, while doubles have a 53-bit mantissa (IIRC), so doubles are already better out of the box. And 'gcc' already supports 'long double': http://en.wikipedia.org/wiki/Long_double .

And, of course, there is http://gmplib.org/ :
A typical Cobol shop would much more frequently issue reports that involve adding, subtracting and multiplying numeric fields holding dollars and cents, that is, dollar amounts stored as integers with the two decimal places implied in the integer. Your example would be much less common, but could still be handled by standard Cobol.

Could you do the same in C or C++? Sure, but you would either have to code your own library functions to adjust automatically for the implied decimals, or implement some other non-standard library extensions. My original point was that such common business usages are already built into standard Cobol, resulting in much simpler and more intuitive coding.
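
To make the implied-decimals bookkeeping concrete, here is a minimal C sketch (the price, quantity and tax rate are made up for illustration); it does by hand roughly what a Cobol picture such as PIC 9(7)V99 gives you for free:
Code:
#include <stdio.h>

/* Amounts held as integer cents (two implied decimals). */
int main(void)
{
    long long price_cents = 19995;            /* $199.95                 */
    long long quantity    = 3;
    long long tax_rate_bp = 825;              /* 8.25%, in basis points  */

    long long subtotal = price_cents * quantity;   /* exact: $599.85     */

    /* cents * basis points has six implied decimals; rescale back to
     * cents, rounding half up by hand. */
    long long tax = (subtotal * tax_rate_bp + 5000) / 10000;   /* $49.49 */

    printf("subtotal $%lld.%02lld, tax $%lld.%02lld\n",
           subtotal / 100, subtotal % 100, tax / 100, tax % 100);
    return 0;
}
In Cobol the rescaling and rounding would be implied by the PICTURE clauses; here every scale adjustment has to be written out explicitly.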
 
Old 05-28-2012, 11:18 AM   #49
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by dogpatch View Post
A typical Cobol shop would much more frequently issue reports that involve adding, subtracting and multiplying numeric fields holding dollars and cents, that is, dollar amounts stored as integers with the two decimal places implied in the integer. Your example would be much less common, but could still be handled by standard Cobol.

Could you do the same in C or C++? Sure, but you would either have to code your own library functions to adjust automatically for the implied decimals, or implement some other non-standard library extensions. My original point was that such common business usages are already built into standard Cobol, resulting in much simpler and more intuitive coding.
In C++ it's just a matter of appropriate classes and operator overloading. ...

http://gmplib.org/ :

"
Code:
GMP function categories

There are several categories of functions in GMP:

    High-level signed integer arithmetic functions (mpz). There are about 140 arithmetic and logic functions in this category.
    High-level rational arithmetic functions (mpq). This category consists of about 35 functions, but all signed integer arithmetic functions can be used too, by applying them to the numerator and denominator separately.
    High-level floating-point arithmetic functions (mpf). This is the GMP function category to use if the C type `double' doesn't give enough precision for an application. There are about 65 functions in this category.
    C++ class based interface to all of the above. (The C functions and types can of course be used directly from C++ too.)
.

So, I don't have to code anything - the library already exists. Such libraries exist for various languages, for example, for Perl:

http://search.cpan.org/~flora/bignum-0.29/lib/bignum.pm

http://search.cpan.org/~flora/bignum-0.29/lib/bigint.pm
http://search.cpan.org/~pjacklam/Mat...Math/BigInt.pm

http://search.cpan.org/~flora/bignum-0.29/lib/bigrat.pm
http://search.cpan.org/~pjacklam/Mat...Math/BigRat.pm

http://search.cpan.org/~pjacklam/Mat...th/BigFloat.pm
.

My point is that having math implemented on top of integers by itself does not improve accuracy.

...

Using BCDs ( http://en.wikipedia.org/wiki/Binary-coded_decimal ) is a matter of "taste" - IBM mainframes used to have them.

Again, using BCDs alone does not improve accuracy.
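
For what it's worth, a minimal sketch of the GMP C interface quoted above (made-up operands; build with -lgmp, and mpz_clears assumes GMP 5 or later):
Code:
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t a, b, sum;                 /* arbitrary-precision integers */

    mpz_init_set_str(a, "123456789012345678901234567890", 10);
    mpz_init_set_str(b, "987654321098765432109876543210", 10);
    mpz_init(sum);

    mpz_add(sum, a, b);              /* exact addition, no rounding  */
    gmp_printf("%Zd\n", sum);

    mpz_clears(a, b, sum, NULL);
    return 0;
}
The mpq (rational) and mpf (floating-point) categories listed above follow the same init/operate/clear pattern.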
 
Old 05-28-2012, 11:37 AM   #50
dogpatch
Member
 
Registered: Nov 2005
Location: Central America
Distribution: Mepis, Android
Posts: 492
Blog Entries: 4

Rep: Reputation: 238
Sergei Steshenko: My original post to this thread was in response to the (off-topic) question, why is Cobol still out there? You have not refuted my stated response to that question. You have merely asserted that you could, as a C++ guru, do the same things that a rookie (or dinosaur) Cobol programmer can do in Cobol without having to attain your level of expertise or your acquaintance with modern C++ library functions. So, do your C++ coding, and acknowledge that there may still be a place for Cobol as well.
 
Old 05-28-2012, 12:10 PM   #51
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by dogpatch View Post
Sergei Steshenko: My original post to this thread was in response to the (off-topic) question, why is Cobol still out there? You have not refuted my stated response to that question. You have merely asserted that you could, as a C++ guru, do the same things that a rookie (or dinosaur) Cobol programmer can do in Cobol without having to attain your level of expertise or your acquaintance with modern C++ library functions. So, do your C++ coding, and acknowledge that there may still be a place for Cobol as well.
My point is that stating that implementing math using integers improves accuracy is nonsense - regardless of language. Accuracy is lost either because we have an insufficient number of bits for the mantissa, or because mathematically the mantissa would need an infinite number of bits (a periodic fraction) while in reality the number of available bits is finite.

I said nothing about Cobol specifically.
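
To illustrate the periodic-fraction point with a made-up amount: splitting $1.00 three ways drops a cent whether the representation is scaled integers or binary floating point, because 1/3 has no finite expansion in either base.
Code:
#include <stdio.h>

int main(void)
{
    long long cents = 100;                      /* $1.00 as integer cents */
    long long share = cents / 3;                /* 33 cents               */
    printf("3 * %lld = %lld cents (one cent lost)\n", share, 3 * share);

    double third = 1.0 / 3.0;                   /* also inexact in binary */
    printf("1.0/3.0 is stored as %.17g\n", third);
    return 0;
}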
 
Old 05-28-2012, 01:03 PM   #52
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by Sergei Steshenko View Post
My point is that stating that implementing math using integers improves accuracy is nonsense - regardless of language.
That is correct.

The reason some financial software uses integers -- more correctly, fixed-point arithmetic -- is that, at the designed precision, over all possible values barring overflows and underflows, all transactions (signed sums) are exact. Typical cases are two or four decimal digits; i.e. 100 or 10000 corresponds to 1.0, depending on what the appropriate laws/statutes/customs dictate, and whether the application calculates daily interest or only over longer periods, for example monthly.

With floating-point values, only a very limited range provides exact transactions; basically all exponent bits are wasted. Also, IEEE-754 was officially adopted in 1985, and before that, the rules wrt. rounding and LSB error when using floating-point values depended on the hardware. The key point to note is that the range where all operations are exact when using floating-point values is only a very small fraction of the range of possible floating-point values, and the rules wrt. rounding and precision have only been standardized for about 25 years. (Obviously, the range of possible floating-point values is astronomical considering financial needs, but that is beside the point.)

The myth of integers being more precise than floating-point values is directly related. It is not about precision, really, but about having proof that all possible transactions (signed sums) are exact. Overflow is obviously possible with fixed-point arithmetic, but all CPU architectures have overflow/carry flags, and it is trivial to check for overflow even without flag support and to reject such transactions. With floating-point values, it is a lot more difficult to check whether the transaction is accounted for exactly, or whether rounding occurred while summing. If all discrepancies are flagged, then the rounding would be a complete disaster.
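
A minimal C sketch of what that means in practice (made-up amounts, two implied decimals): the integer-cents sum is either exact or detectably out of range, never silently rounded.
Code:
#include <stdio.h>
#include <stdint.h>

/* Add two amounts held as integer cents; reject the transaction
 * instead of silently losing precision. */
static int add_cents(int64_t a, int64_t b, int64_t *sum)
{
    if ((b > 0 && a > INT64_MAX - b) ||
        (b < 0 && a < INT64_MIN - b))
        return 0;                    /* would overflow: flag and reject */
    *sum = a + b;                    /* exact by construction           */
    return 1;
}

int main(void)
{
    int64_t total;

    if (add_cents(12345, 678910, &total))        /* $123.45 + $6789.10 */
        printf("total = %lld cents\n", (long long)total);

    /* A double can hold exact cent counts only up to 2^53; beyond that
     * not every integer is representable and sums round silently. */
    double big = 9007199254740993.0;             /* 2^53 + 1            */
    printf("%.1f\n", big);                       /* 9007199254740992.0  */
    return 0;
}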

Last edited by Nominal Animal; 05-28-2012 at 01:07 PM.
 
Old 05-29-2012, 07:48 AM   #53
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Posts: 952

Rep: Reputation: 217
AFAIR COBOL had a "Rounded" clause that rounded according to the format (pre-defined as a string in COBOL). Whenever accuracy was needed, we used ROUNDED with the COMPUTE statement.

The ROUNDED clause didn't need to specify a precision because the PIC clause defined it.

OK
 
Old 05-29-2012, 08:08 AM   #54
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by Sergei Steshenko View Post
My point is that stating that implementing math using integers improves accuracy is nonsense - regardless of language. Accuracy is lost either because we have an insufficient number of bits for the mantissa, or because mathematically the mantissa would need an infinite number of bits (a periodic fraction) while in reality the number of available bits is finite.
You are missing the detail that "fixed point" math typically scales by a fixed power of TEN, while floating point scales by a variable power of TWO.

In some fields, the answer you get when rounding only after scaling by a fixed power of ten is correct by definition, while a different rounding is incorrect.

When there is some other inherent advantage of floating point, it is obviously possible to pre-scale the floating-point values by a fixed power of ten, just as you pre-scale integer values by a fixed power of ten in traditional fixed-point operations. Then the comparison (made earlier in this thread) between the integer and the mantissa of the floating point would be fair.

Traditionally, floating point is more expensive than integer. So if you have to pre-scale either to get "correct" results, then you might as well pre-scale integer. But if you have other reasons to choose floating point, it is important to be aware you can pre-scale that as well. For example, if you need 48 bits of fixed-point precision on an architecture with 48-bit floating-point mantissas and 32-bit integers, going to double-wide integers might be more trouble than going to floating point.
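
A minimal C sketch of the pre-scaling idea (made-up amounts): keep the values scaled to whole cents even when the container is a double, so every sum is exact as long as magnitudes stay well below 2^53, and only the final formatting reintroduces the decimal point.
Code:
#include <stdio.h>

int main(void)
{
    double dollars = 0.0;        /* accumulating 0.10 dollars repeatedly */
    double cents   = 0.0;        /* the same amounts, pre-scaled by 100  */
    int    i;

    for (i = 0; i < 1000; i++) {
        dollars += 0.10;         /* 0.10 has no exact binary form        */
        cents   += 10.0;         /* 10 (cents) does                      */
    }

    printf("unscaled : %.17g\n", dollars);          /* slightly off 100  */

    {
        long long c = (long long)cents;             /* exactly 10000     */
        printf("prescaled: %lld.%02lld\n", c / 100, c % 100);  /* 100.00 */
    }
    return 0;
}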
 
Old 05-29-2012, 08:30 AM   #55
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 24,515

Rep: Reputation: 8064
about multiplying: http://stackoverflow.com/questions/3...-ia32-assembly
So if you want to implement a specific function and you need it to be really fast, you need to use assembler; C is not good enough. But in most cases C can be really powerful and efficient.
 
Old 05-29-2012, 09:06 AM   #56
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by johnsfine View Post
You are missing the detail that "fixed point" math typically scales by a fixed power of TEN, while floating point scales by a variable power of TWO.

In some fields, the answer you get when rounding only after scaling by a fixed power of ten is correct by definition, while a different rounding is incorrect.

When there is some other inherent advantage of floating point, it is obviously possible to pre-scale the floating-point values by a fixed power of ten, just as you pre-scale integer values by a fixed power of ten in traditional fixed-point operations. Then the comparison (made earlier in this thread) between the integer and the mantissa of the floating point would be fair.

Traditionally, floating point is more expensive than integer. So if you have to pre-scale either to get "correct" results, then you might as well pre-scale integer. But if you have other reasons to choose floating point, it is important to be aware you can pre-scale that as well. For example, if you need 48 bits of fixed-point precision on an architecture with 48-bit floating-point mantissas and 32-bit integers, going to double-wide integers might be more trouble than going to floating point.
I am not missing the detail; in fact, I was talking about BCDs (Binary Coded Decimals).

And I've been running a lot of numeric FP code - some of it requires prescaling.
 
Old 05-29-2012, 10:38 AM   #57
orgcandman
Member
 
Registered: May 2002
Location: new hampshire
Distribution: Fedora, RHEL
Posts: 600

Rep: Reputation: 110
Quote:
Originally Posted by pan64 View Post
about multiplying: http://stackoverflow.com/questions/3...-ia32-assembly
So if you want to implement a specific function and you need it to be really fast, you need to use assembler; C is not good enough. But in most cases C can be really powerful and efficient.
This is true if you know more than the optimizer (hint: on a large project, this becomes more difficult). For trivial software, the compiler writer has probably done the right thing (doubtless, you can find any number of ways of completing a task that you might deem 'more optimal,' but without actually benchmarking it's all programming by voodoo).

------------

I don't know how to put in a horizontal rule - and I've never really been a fan of double posting, so I'm going to add something here on the C vs. C++ vs. asm vs. whatever.

I tend to approach things very methodically. Any language comparison must first start with the language guarantees and anti-guarantees. It is important to understand that any intrinsics, optimizations, pragma flags, and other compiler- or platform-specific switches are NOT guaranteed by the language. That means that if you are trying to compare languages, care must be taken to choose a correct context.

As an example, I'm going to take the following:
Quote:
Originally Posted by Nominal Animal View Post
I am asking for a concrete case where C++ yields faster execution than what I can write in C. You answered with std::valarray, which I consider to be a rather tricky answer, close to trolling. Yes, currently the atomic primitives related to vector operations have not been standardized in C yet, but they are provided by all C compilers I use.
Here, you have admitted in the bolded part that std::valarray DOES provide something not provided by the C standard, which means that any fair comparison between C and C++ on vectorized maths must now be relegated to the following:

Code:
1. Code *someone* writes as a perfect language implementation
2. A library (whether compiled or not) provided by *someone*
3. Non-standard extensions
You can see that 3 should not be used as either point or counter-point. Anyone quoting non-standard extensions as the BASIS for a solution misses the fact that NSE goes both ways. I.e.: for any example you can give in language X using NSE, I can give a counter-example in language Y using NSE.

2 MAY or MAY NOT be a valid argument. Any library code which relies on NSE becomes part of 3. In this context, I choose reliance to mean it is inseparable from the NSE. As an example of what I mean, if there were a library called "SomeBigVectorizedMaths.h", but it REQUIRED that you use the GNU C compiler because it used GNU C extensions, I would say that fell under 3. However, if it could be used with ANY C compiler, and merely took advantage of NSE when they were present, I would say that does NOT fall under 3 - it is merely knowledgeable programming. It does not preclude me from using the Tiny C compiler, Microsoft's C compiler, or *insert name* C compiler.

1 is similar to 2. The idea behind 1 is the 'mythical' (I use that to mean we don't need actual working coded stuff; we accept that if relevant portions of the standards are quoted, the example could exist) code which solves the problem using ONLY GUARANTEED LANGUAGE PRIMITIVES. I.e.: this is the case where the code uses NO NSE ever.

When I gave the example of std::valarray, I was not trolling (and am offended that you would imply such, given my s:n ratio). The question posed was (paraphrased):
Quote:
Originally Posted by Nominal Animal View Post
I am asking for a concrete case where C++ yields faster execution than what I can write in C.
...
std::valarray (orgcandman) is irrelevant to this discussion: sure, it is useful, but I can provide the same functionality in C using my own code.
With this understanding and the framework I outlined above, I believe that std::valarray provides something that C lacks. I'm not going to try and say that in all cases C++ is superior to C. However, your question asks this, and I believe std::valarray represents this case (and is therefore why I picked it - making it quite relevant).

I can rely on std::valarray with ANY C++ compiler. It will always exist, and will always provide vectorized maths. So let's analyze it using either part 1 OR part 2 from above.

With part 1, I am not sure of any standardized C code which allows one to specify vectorized maths. If there is, I would be very interested. As it stands, the only way I know of to implement anything like vectorized math is to build structs and loops. Note, I'm not talking about any NSE here. If there IS a way to do this in standard C, then they are equivalent.
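
For reference, a minimal sketch of the "structs and loops" approach in strictly standard C (C99 restrict, no extensions): the optimizer is free to auto-vectorize it (e.g. gcc -O3, which enables -ftree-vectorize), but nothing in the language guarantees that it will.
Code:
/* Plain standard C: element-wise multiply-add over n floats.
 * Whether this becomes SSE/AVX code is entirely up to the compiler. */
void multiply_add(float *restrict out,
                  const float *restrict a,
                  const float *restrict b,
                  const float *restrict c,
                  int n)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}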

Let's look at part 2 then, since this is potentially more interesting. We both agree that there exists, for most C compilers, a suite of NSE intrinsics that will provide some basic vectorized math operations. This means that anyone who requires vectorized maths DOES have access, as a good bet. In the C++ case, this does NOT require studying the intrinsics - the compiler knows exactly how to use the NSE under the hood to generate the correct code. With C, it REQUIRES that someone take a lot of extra time crafting a solution to a problem that has been solved. This alone makes C++ far more efficient, programmer time wise. Code generation (and therefore execution time) looks the same (at least under GNU C, the optimizer throws away the same code it would in C). The std::valarray has only a few interfaces which _could_ throw (and all are related to memory), and if we accept NSE, we have a good bet that we can remove most, if not all, of the performance bottlenecks traditionally cited against C++. So again, we have gained programmer time from C++.

The other advantages we gain from C++ are the type-safety and code reuse. That is, if I want to reuse my vectorizing code for, say, int and float datatypes, the compiler will autogenerate the code, rather than my cut/paste. Additionally, the correct call will always be made, meaning that I can confidently turn on -Wall -Werror (something that traditional C-only projects are scared to do when combined with optimizers that perform code path analysis) from the beginning. This is not specific to std::valarray, but again the example was chosen for a purpose. The drawback of this approach is the associated code-space bloat.

In all of this, I STILL believe that a traditional C language has a valuable purpose, especially when it comes to constrained memory systems. The fact that C doesn't specify certain things should be seen as a BOON to the language, NOT a drawback. I believe that there are plenty of valid places to use either. As I've always said, specific cases are important as there exists no general case software.

Anyway, this is an attempt to elucidate my position. As for your comment here:

Quote:
Originally Posted by Nominal Animal View Post
Does it yield the optimum results? I do not think so.
...
As far as I know, apply() is limited to (one slice of) a single array, it cannot do multiple arrays in tandem. In C, I have no problem with this.
Care to post some code? I think you'll be quite surprised by the latest version of the G++ optimizer. When given complete information, it can vectorize most things impressively well. Heck, it even correctly optimized a coworker's naive usage of the class. Additionally, multiple arrays can be implemented with a single array. Heck, you can even think of a matrix as a fiction on top of a single-dimensioned array.

Quote:
Originally Posted by Nominal Animal View Post
the latter to avoid any security implications by the implicit work done by the language or standard library.
In fact, I would argue that C++, by virtue of its type-safety, implicit code-reuse, and pre-built collection schemes, far exceeds C in terms of security benefits. In C, it is possible to exploit a mis-typed *printf() routine. In C++, one is encouraged, quite audibly, to avoid the *printf suite. That's just an example, mind you. As someone well versed in security, this is one place where I cannot even fathom C offering more than C++.
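
As a minimal illustration of the mis-typed *printf() hazard (a hypothetical logging helper): if the untrusted text ever contains directives such as %x or %n, the commented-out call interprets them and can leak or corrupt memory, while the second form treats the text purely as data.
Code:
#include <stdio.h>

static void log_message(const char *untrusted)
{
    /* printf(untrusted); */    /* WRONG: caller controls the format string */
    printf("%s", untrusted);    /* right: format string is a constant       */
}

int main(void)
{
    log_message("text from an untrusted source\n");
    return 0;
}
Many compilers will warn about the unsafe form with -Wformat-security; C++ iostreams sidestep this whole class of bug, which is part of the point being made above.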
 
Old 05-29-2012, 10:52 AM   #58
orgcandman
Member
 
Registered: May 2002
Location: new hampshire
Distribution: Fedora, RHEL
Posts: 600

Rep: Reputation: 110
Quote:
Originally Posted by Nominal Animal View Post
I do understand. The point johnsfine made in a later post, about using a subset of C++ suitable for the domain at hand, I also agree on.
Also agreed.

Quote:
Originally Posted by Nominal Animal View Post
As I've mentioned in other threads, C does lack certain useful features C++ has. Quite often I wish C had standardized ..., proper atomic...
I'll send you a PM on this, but C++11 and C11 add atomics to the language, and I wanted to make sure I mentioned it. HOORAY FOR STANDARDS!
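
For the record, a minimal C11 sketch (requires a compiler and libc with <stdatomic.h> support, which was still rare when C11 was published):
Code:
#include <stdatomic.h>
#include <stdio.h>

int main(void)
{
    atomic_int counter = ATOMIC_VAR_INIT(0);   /* C11 atomic object        */

    atomic_fetch_add(&counter, 1);             /* atomic read-modify-write */
    atomic_fetch_add(&counter, 1);

    printf("%d\n", atomic_load(&counter));     /* prints 2                 */
    return 0;
}
The C++11 equivalent lives in <atomic> (std::atomic<int> and friends).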
 
Old 05-29-2012, 01:20 PM   #59
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by orgcandman View Post
Here, you have admitted in the bolded part that std::valarray DOES provide something not provided by the C standard, which means that any fair comparison between C and C++ on vectorized maths must now be relegated to the following:
You skipped the case I'm interested in: the features provided by current C (or C++) compilers, "de facto standard features", if you will.

Personally, I'm mostly interested in GCC, Intel Compiler Collection, and Pathscale C compilers, since these are the ones I've worked with in HPC environments. I can install Open64 and probably have access to Portland Group compiler, too.

Quote:
Originally Posted by orgcandman View Post
When I gave the example of std::valarray, I was not trolling (and am offended that you would imply such, given my s:n ratio). The question posed was (paraphrased):

With this understanding and the framework I outlined above, I believe that std::valarray provides something that C lacks. I'm not going to try and say that in all cases C++ is superior to C. However, your question asks this, and I believe std::valarray represents this case (and is therefore why I picked it - making it quite relevant).
I think we found the disconnect. I quite agree with your characterization above, but I was not arguing about that. Let me try explaining my concern or point in a different way.

Johnsfine stated that in the hands of an expert, C++ gives equal or occasionally faster execution speed than C. I have not yet seen the faster execution speed than C, when the C code uses the features I consider "de facto standard". (Well, I don't actually consider them that, but they are available in all the compilers I use. If you look at all C compilers, especially the MS one, you'd probably have to limit yourself to C89, which I would find basically hopeless to work in.)

I thought you were trolling, because you only described a feature, not an application of that feature; and I thought I specifically asked for cases where C++ yields faster execution speed.

This is important to me, because it impacts how I should direct my development efforts in the future. Although I prefer C now, I can change my mind if new evidence appears. If I find out that moving to C++ (in my case, most likely a tight subset, or perhaps even freestanding C++) yields clearly faster execution speed for my use cases, I'll happily do that. I just have not seen that case yet.

Quote:
Originally Posted by orgcandman View Post
I am not sure of any standardized C code which allows one to specify vectorized maths. If there is, I would be very interested.
Most compilers support these vector extensions. On x86 and x86-64, I heavily rely on these. I believe they actually originated in the Intel Compiler Collection. I am not sure how complete the AVX support is, because I don't have access to hardware supporting AVX yet.

For example,
Code:
typedef float v4sf __attribute__((vector_size (16)));

v4sf multiply_add(const v4sf a, const v4sf b, const v4sf c)
{
    return a * b + c;
}
compiled in e.g. GCC with SSE2 available, produces perfectly vectorized code (mulps %xmm1, %xmm0; addps %xmm2, %xmm0) for the function.

This is not standardized by anybody yet, as far as I know, so you could say it is a nonstandard extension and therefore invalid wrt. this discussion. I could claim it is a de facto standard, because most C compilers -- certainly the ones used for HPC -- do compile that C code just fine.

The important point to me is that the above code compiles to vectorized code (although I prefer to use the explicit built-ins instead), using the most popular C compilers used in HPC. This, and the built-in functions, are also a very stable feature; this support has existed as-is since MMX times.

Quote:
Originally Posted by orgcandman View Post
With C, it REQUIRES that someone take a lot of extra time crafting a solution to a problem that has been solved. This alone makes C++ far more efficient, programmer time wise.
No, and yes. The reason why I use these intrinsics is that the extra control yields better results. It is very taxing design work to implement a potential model in a way that vectorizes efficiently, but the end result is easily worth it. I'm talking about 25% to 50% increase in computation speed. In my world, the existing solutions, including std::valarray, are poor.

Quote:
Originally Posted by orgcandman View Post
The other advantages we gain from C++ are the type-safety
Yes, yes; I've agreed on those features already. It is not like I'm recommending new programmers to ignore C++ and concentrate on C.

Quote:
Originally Posted by orgcandman View Post
Care to post some code?
I'd rather not, but I guess I have to. My multiband EAM code is too complex for this, but I think Lennard-Jones should be simple enough. I'll include the mathematical stuff and explanations, so others can implement the same algorithm, instead of trying to translate C to some other language. But I warn you, I do have to use the vector built-in functions; it's pretty nasty to read. I'll try to add enough comments.
 
Old 05-29-2012, 05:09 PM   #60
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Actually I found a very useful real world case, which is even simpler: Calculating distances and unit vectors between one atom and all its neighbors.

The difficulty is that atom coordinates are not consecutive in memory. Instead, you have an array of atom identifiers (indexes); the coordinates must be referred to indirectly.

Neighbor data (atom index, distance, unit vector) is consecutive in memory, however. Atom index, distance, and unit vector also have room for one extra entry, and are padded to a multiple of four entries.

There is an additional twist: the distances, unit vectors, and atom numbers must be arranged so that all neighbors closer than a specified limit are listed first. Basically, the same thing as in quicksort: the distance limit is the pivot. The pivot itself belongs to the latter set. This is an important algorithmic optimization relied on elsewhere, so that we do not need to calculate known zeros.

With this code, a simulation relying on atom pair interactions (Morse, Lennard-Jones, EAM, dipole interactions between charged particles) does not need to refer to the scattered atom coordinates, but has both the distance and pair unit vectors readily available. My tests indicate this yields faster execution than the traditional model, where interatom distances and vectors are calculated from coordinates on each run. In particular, the ability to skip the atoms that are known to not interact (albeit being still neighbors), is important. So, this is not an abstract benchmark, but real-world useful code.

Here is the slow, plain, reference C code. This uses separate arrays for each coordinate, but you can use 3- or 4-component point structures or objects just as well.
Code:
int distances(const unsigned int   neighbors, /* Number of neighbors */
              unsigned int *const  neighbor,  /* Neighbor atom indexes */
              double *const        distance,  /* Consecutive output array: distance */
              double *const        deltax,    /* Consecutive output array: unit vector x components */
              double *const        deltay,    /* Consecutive output array: unit vector y components */
              double *const        deltaz,    /* Consecutive output array: unit vector z components */
              const double *const  x,         /* Atoms' x coordinates */
              const double *const  y,         /* Atoms' y coordinates */
              const double *const  z,         /* Atoms' z coordinates */
              const unsigned int   atom,      /* Index for the one atom */
              const double         limit)     /* Distance limit, "pivot" */
{
    {
        const double    x0 = x[atom],  /* This atom coordinates; */
                        y0 = y[atom],  /* subtract from the other */
                        z0 = z[atom];  /* atom to get delta vector. */
        int             i;

        for (i = 0; i < neighbors; i++) {
            const unsigned int n = neighbor[i];
            const double    dx = x[n] - x0,   /* Delta vector components */
                            dy = y[n] - y0,
                            dz = z[n] - z0;
            const double    dd = dx*dx + dy*dy + dz*dz;   /* Distance squared */
            const double    d = sqrt(dd);     /* Distance */

            /* Save unit vector, */
            deltax[i] = dx / d;
            deltay[i] = dy / d;
            deltaz[i] = dz / d;

            /* and distance. */
            distance[i] = d;
        }
    }

    /* The distance[], deltax[], deltay[], deltaz[] are guaranteed to
     * have at least one extra element, and padded to a multiple of 4 elements.
     * Use a sentinel to avoid ihead < itail checks.
    */
    distance[neighbors] = limit;

    {
        unsigned int    ihead = 0;
        unsigned int    itail = neighbors - 1;

        while (1) {

            /* Keep close neighbors at start of list */
            while (distance[ihead] < limit)
                ihead++;

            /* All under limit? */
            if (ihead >= itail)
                return ihead;

            /* Keep far neighbors at end of list */
            while (distance[itail] >= limit)
                itail--;

            /* Done? */
            if (ihead >= itail)
                return ihead;

            /* Swap. */
            {   double  dx = deltax[ihead],
                        dy = deltay[ihead],
                        dz = deltaz[ihead],
                        dd = distance[ihead];
                unsigned int nn = neighbor[ihead];

                deltax[ihead] = deltax[itail]; deltax[itail] = dx;
                deltay[ihead] = deltay[itail]; deltay[itail] = dy;
                deltaz[ihead] = deltaz[itail]; deltaz[itail] = dz;
                distance[ihead] = distance[itail]; distance[itail] = dd;
                neighbor[ihead] = neighbor[itail]; neighbor[itail] = nn;
            }

            /* This pair is now in correct order. */
            ihead++;
            itail--;
        }
    }
}
On my machine (Athlon II X4 640), using gcc-4.6.1 and -O3 -fomit-frame-pointer -msse3, the above runs in 80 clocks per neighbor, including all overhead. My test case is a unit cubic lattice, with neighbors included up to ±3 (7×7×7-1 = 342 neighbors total), and the pivot at distance 4.5. That should be roughly typical; about 10% of neighbour list members do not interact with the atom at any given point in time.

My reference implementation using SSE3 runs in 47 clocks per neighbor, including all overhead; about 41% fewer clocks than the plain C version:
Code:
typedef double v2df __attribute__((vector_size (16)));

/* TODO: See if using doubles would actually be faster.
*/
static inline void swap_doubles(void *const ptr1, void *const ptr2)
{
    const unsigned long t  = *(unsigned long *)ptr1;
    *(unsigned long *)ptr1 = *(unsigned long *)ptr2;
    *(unsigned long *)ptr2 = t;
}

static inline void swap_uints(void *const ptr1, void *const ptr2)
{
    const unsigned int t  = *(unsigned int *)ptr1;
    *(unsigned int *)ptr1 = *(unsigned int *)ptr2;
    *(unsigned int *)ptr2 = t;
}

/* Update the neighbor list for one atom.
 * Return the number of interacting neighbors.
 * The neighbor and distance lists are padded to a multiple of 4,
 * with at least one unused entry at end.
*/
int distances(const unsigned int   neighbors,
               unsigned int *const neighbor,
               double *const       distance,
               double *const       deltax,
               double *const       deltay,
               double *const       deltaz,
               const double *const x,
               const double *const y,
               const double *const z,
               const unsigned int  atom,
               const double        limit)
{
    /* Note: neighbor[] array is padded with a safe index
     *       to a multiple of four entries. */

    /* Distance calculation part. */
    {
        const v2df    x0 = { x[atom], x[atom] },
                      y0 = { y[atom], y[atom] },
                      z0 = { z[atom], z[atom] };
        unsigned int  i = 0;

        while (i < neighbors) {
            const unsigned int i1 = neighbor[i++];
            const unsigned int i2 = neighbor[i++];
            const unsigned int i3 = neighbor[i++];
            const unsigned int i4 = neighbor[i++];
            v2df               x1 = { x[i1], x[i2] },
                               y1 = { y[i1], y[i2] },
                               z1 = { z[i1], z[i2] },
                               x2 = { x[i3], x[i4] },
                               y2 = { y[i3], y[i4] },
                               z2 = { z[i3], z[i4] },
                               d1, d2, t1, t2;

            /* Subtract current atom coordinates. */

            x1 = __builtin_ia32_subpd(x1, x0);
            y1 = __builtin_ia32_subpd(y1, y0);
            z1 = __builtin_ia32_subpd(z1, z0);

            x2 = __builtin_ia32_subpd(x2, x0);
            y2 = __builtin_ia32_subpd(y2, y0);
            z2 = __builtin_ia32_subpd(z2, z0);

            /* Calculate squared distances. */

            d1 = x1;
            d2 = x2;
            d1 = __builtin_ia32_mulpd(d1, d1);
            d2 = __builtin_ia32_mulpd(d2, d2);

            t1 = y1;
            t2 = y2;
            t1 = __builtin_ia32_mulpd(t1, t1);
            t2 = __builtin_ia32_mulpd(t2, t2);

            d1 = __builtin_ia32_addpd(d1, t1);
            d2 = __builtin_ia32_addpd(d2, t2);

            t1 = z1;
            t2 = z2;
            t1 = __builtin_ia32_mulpd(t1, t1);
            t2 = __builtin_ia32_mulpd(t2, t2);

            d1 = __builtin_ia32_addpd(d1, t1);
            d2 = __builtin_ia32_addpd(d2, t2);

            /* Take square root to get distances. */

            d1 = __builtin_ia32_sqrtpd(d1);
            d2 = __builtin_ia32_sqrtpd(d2);

            /* Divide deltas by distances to get unit vectors. */

            x1 = __builtin_ia32_divpd(x1, d1);
            y1 = __builtin_ia32_divpd(y1, d1);
            z1 = __builtin_ia32_divpd(z1, d1);

            x2 = __builtin_ia32_divpd(x2, d2);
            y2 = __builtin_ia32_divpd(y2, d2);
            z2 = __builtin_ia32_divpd(z2, d2);

            /* Save coordinate deltas. */

            __builtin_ia32_storeupd(&deltax[i-4], x1);
            __builtin_ia32_storeupd(&deltay[i-4], y1);
            __builtin_ia32_storeupd(&deltaz[i-4], z1);

            __builtin_ia32_storeupd(&deltax[i-2], x2);
            __builtin_ia32_storeupd(&deltay[i-2], y2);
            __builtin_ia32_storeupd(&deltaz[i-2], z2);

            /* Save distances. */

            __builtin_ia32_storeupd(&distance[i-4], d1);
            __builtin_ia32_storeupd(&distance[i-2], d2);
        }
    }

    /* Set the distance sentinel. */
    distance[neighbors] = limit;

    /* Neighbor sorting part.
     * Move atoms closer than limit to start of list,
     * and atoms at limit or further to end of list.
     * Other than that, the order is not important.
    */
    {
        unsigned int ihead = 0;
        unsigned int itail = neighbors - 1;

        while (1) {

            /* Keep close neighbors at start of list */
            while (distance[ihead] < limit)
                ihead++;

            /* All close? */
            if (ihead >= itail)
                return ihead;

            /* Keep far neighbors at end of list */
            while (distance[itail] >= limit)
                itail--;

            /* Done? */
            if (ihead >= itail)
                return ihead;

            /* Swap. */
            swap_doubles(&deltax[ihead], &deltax[itail]);
            swap_doubles(&deltay[ihead], &deltay[itail]);
            swap_doubles(&deltaz[ihead], &deltaz[itail]);
            swap_doubles(&distance[ihead], &distance[itail]);
            swap_uints(&neighbor[ihead], &neighbor[itail]);

            /* This pair is now in correct order. */
            ihead++;
            itail--;
        }
    }
}
Let me be the first to say that my SSE3 code is butt-ugly. It belongs in a simulation library if anywhere -- which is exactly where I'll end up putting it. It is already complex enough to escape my assembly optimization skills: I can easily write it in assembly, but GCC seems to produce at least as fast code. I'd need an IDE with the ability to estimate clocks used, pipelines used, stalls, and so on; it is just too slow to find a good balance otherwise.

My point for writing it is that I still believe this code yields the fastest execution time. I really would not mind being proven wrong.

Note that I just wrote this from scratch, it is not something I've yet tried to optimize any further. Other than working in groups of four indirect atoms, it is a very straightforward implementation, I think.

I'd be very interested to see an efficient C++ implementation we could compare to my SSE3 version above. If you need to tweak the problem to solve it better, please do; just remember that neighbor atom coordinates cannot be consecutively in memory, they have to be indirectly referenced, we need both distances and unit vectors, and the neighbors closer than the specified limit should be listed first in the result.
 
  

