If you handle very large files, or files that are only rarely accessed, it turns out you can do the endianness conversion while reading the data without slowing the program down. The conversion does consume more CPU cycles than not converting, but reading a large file is I/O bound anyway; so,
if you do the conversion while still reading the file, it is practically free. I write my own low-level I/O routines, reading the data in 64k to 2M chunks, and apply any necessary conversions to each completed chunk. It does get a bit complicated, because a read does not necessarily end on a field boundary, but for large data files it is certainly worth the code complexity.
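As a rough sketch of the idea (the chunk size, function names, and field layout here are mine, purely for illustration, not my actual routines), converting each chunk right after it is read might look like this for a file of little-endian uint32_t fields:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: read little-endian uint32_t fields in large
   chunks, converting each completed chunk right after reading it.
   Names and sizes are illustrative only. */
enum { CHUNK_BYTES = 65536 };   /* one I/O request; 64k to 2M is typical */

static uint32_t get_le32_bytes(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Reads up to CHUNK_BYTES/4 values; returns the count converted.
   Real code must also carry over a partial field that straddles the
   end of the chunk -- omitted here to keep the sketch short. */
static size_t read_le32_chunk(FILE *in, uint32_t *out)
{
    unsigned char buf[CHUNK_BYTES];
    size_t have = fread(buf, 1, sizeof buf, in);
    size_t count = have / 4;        /* complete fields in this chunk */
    for (size_t i = 0; i < count; i++)
        out[i] = get_le32_bytes(buf + 4 * i);
    return count;
}
```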
If you know the data is always in little-endian order, you could use
Code:
#include <stdint.h>

static inline uint16_t get_le16(const void *const from)
{
    return ((uint16_t)(((const unsigned char *)from)[0])      )
         | ((uint16_t)(((const unsigned char *)from)[1]) << 8U);
}

static inline uint32_t get_le32(const void *const from)
{
    return ((uint32_t)(((const unsigned char *)from)[0])       )
         | ((uint32_t)(((const unsigned char *)from)[1]) <<  8U)
         | ((uint32_t)(((const unsigned char *)from)[2]) << 16U)
         | ((uint32_t)(((const unsigned char *)from)[3]) << 24U);
}

static inline uint64_t get_le64(const void *const from)
{
    return ((uint64_t)(((const unsigned char *)from)[0])       )
         | ((uint64_t)(((const unsigned char *)from)[1]) <<  8U)
         | ((uint64_t)(((const unsigned char *)from)[2]) << 16U)
         | ((uint64_t)(((const unsigned char *)from)[3]) << 24U)
         | ((uint64_t)(((const unsigned char *)from)[4]) << 32U)
         | ((uint64_t)(((const unsigned char *)from)[5]) << 40U)
         | ((uint64_t)(((const unsigned char *)from)[6]) << 48U)
         | ((uint64_t)(((const unsigned char *)from)[7]) << 56U);
}
but the above functions end up being pretty slow. Certainly they are much slower than just reversing the endianness:
Code:
#include <stdint.h>

static inline uint16_t swap_endian16(uint16_t u)
{
    return ((u >> 8U) & 0xFFU)
         | ((u & 0xFFU) << 8U);
}

static inline uint32_t swap_endian32(uint32_t u)
{
    const uint32_t m8  = (uint32_t)0x00FF00FFUL;
    const uint32_t m16 = (uint32_t)0x0000FFFFUL;
    u = ((u >>  8U) & m8 ) | ((u & m8 ) <<  8U);
    u = ((u >> 16U) & m16) | ((u & m16) << 16U);
    return u;
}

static inline uint64_t swap_endian64(uint64_t u)
{
    const uint64_t m8  = (uint64_t)0x00FF00FF00FF00FFULL;
    const uint64_t m16 = (uint64_t)0x0000FFFF0000FFFFULL;
    const uint64_t m32 = (uint64_t)0x00000000FFFFFFFFULL;
    u = ((u >>  8U) & m8 ) | ((u & m8 ) <<  8U);
    u = ((u >> 16U) & m16) | ((u & m16) << 16U);
    u = ((u >> 32U) & m32) | ((u & m32) << 32U);
    return u;
}
On 32-bit architectures, arrays of 16-bit values are fastest to convert two at a time; use a variant of swap_endian32() that only does the m8 step.
On 64-bit architectures, arrays of 16-bit values are fastest to convert four at a time; use a variant of swap_endian64() that only does the m8 step. Arrays of 32-bit values are fastest to convert two at a time; use a variant of swap_endian64() that only does the m8 and m16 steps.
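As a sketch of those variants (the function names here are my own, not standard ones):

```c
#include <stdint.h>

/* Hypothetical helpers (my naming): bulk-swap values packed into one
   64-bit word, dropping the unneeded steps of swap_endian64(). */

/* Four 16-bit values at a time: only the m8 (adjacent-byte) step. */
static inline uint64_t swap_endian16x4(uint64_t u)
{
    const uint64_t m8 = (uint64_t)0x00FF00FF00FF00FFULL;
    return ((u >> 8U) & m8) | ((u & m8) << 8U);
}

/* Two 32-bit values at a time: only the m8 and m16 steps. */
static inline uint64_t swap_endian32x2(uint64_t u)
{
    const uint64_t m8  = (uint64_t)0x00FF00FF00FF00FFULL;
    const uint64_t m16 = (uint64_t)0x0000FFFF0000FFFFULL;
    u = ((u >>  8U) & m8 ) | ((u & m8 ) <<  8U);
    u = ((u >> 16U) & m16) | ((u & m16) << 16U);
    return u;
}
```

The array itself is then walked 8 bytes at a time, with any leftover tail values swapped individually.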
There used to be certain architectures with mixed ("middle-endian") byte orders, such as CDAB; the latter conversion functions need only small modifications to convert those too.
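For example, if I have the CDAB layout right (this is my own illustration; verify against your actual data), a 32-bit value stored as bytes C D A B needs only one of the two swap steps, depending on the host:

```c
#include <stdint.h>

/* Hypothetical helpers (my naming): fix up a 32-bit value that was
   stored in CDAB "middle-endian" byte order and loaded as-is. */

/* Loaded on a little-endian host: only the m8 (adjacent-byte) step. */
static inline uint32_t cdab_to_le32(uint32_t u)
{
    const uint32_t m8 = (uint32_t)0x00FF00FFUL;
    return ((u >> 8U) & m8) | ((u & m8) << 8U);
}

/* Loaded on a big-endian host: only the m16 (half-swap) step. */
static inline uint32_t cdab_to_be32(uint32_t u)
{
    return (u >> 16U) | (u << 16U);
}
```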
On some architectures it is possible that floats (float and double) have a different byte order than integer values. I personally put prototype values in the header:
- uint16_t: 43981 (0xABCD)
- uint32_t: 67305985 (0x04030201)
- float: 721409.0/16777216.0 (0x3D302010)
- double: 66809.0/8323200.0 (0x3F80706050403020)
Note that internally, float can be treated as uint32_t, and double as uint64_t, if you remember that they may have different endianness than the integer values. The prototype values also make sure the architecture understands IEEE 754 (float AKA binary32, and double AKA binary64) floating-point values, possibly after an endianness correction.
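One way to apply that treatment (the helper names below are my own illustration, not from any library) is to memcpy the prototype's bytes into the matching unsigned type and compare the raw bits, before and after swapping, to decide what correction the file needs:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical checks (my naming): does this host store the IEEE 754
   prototype with the same byte order as the same-size unsigned type?
   If not, try again after swapping to find the float byte order. */
static int float_matches_uint32_order(void)
{
    const float proto = 721409.0f / 16777216.0f;  /* bits 0x3D302010 */
    uint32_t bits;
    memcpy(&bits, &proto, sizeof bits);  /* well-defined, unlike pointer casts */
    return bits == (uint32_t)0x3D302010UL;
}

static int double_matches_uint64_order(void)
{
    const double proto = 66809.0 / 8323200.0;  /* rounds to 0x3F80706050403020 */
    uint64_t bits;
    memcpy(&bits, &proto, sizeof bits);
    return bits == (uint64_t)0x3F80706050403020ULL;
}
```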
For those who still use Fortran, it is possible to do the conversion in Fortran too, if the compiler supports sequential raw I/O (binary, no record boundaries). That is what I originally developed this for. It was an order of magnitude faster than text I/O; conversion between strings and floating-point values is surprisingly slow.