LinuxQuestions.org
Old 08-27-2011, 12:19 AM   #1
jaepi
Member
 
Registered: Apr 2007
Location: Urban Jungle
Distribution: Ubuntu
Posts: 189
Blog Entries: 1

Rep: Reputation: 30
Converting a binary file from little endian to big endian


I'm fairly new to dealing with endianness. I created a binary file with a program compiled on Windows using Visual Studio, so the data is little-endian. My problem is that this file will be read on Linux by an application compiled for a big-endian target. I have a function that converts the unsigned long variables (the header of my bin file); it does its job well but slows the application down. My other concern is the rest of the file: I can byte swap the header, but not the entire content X_X. I think the best approach is to convert the entire file before reading it, but I have no idea how to start. I would appreciate any help and suggestions. Thank you
 
Old 08-27-2011, 01:22 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,236

Rep: Reputation: 4150
By binary, I hope you mean "binary data" and not "binary executable".
If so, you know the layout of the data - your structs after all. You need to know the type of (big-endian) hardware to know what the data-types represent. How big (as in how many bytes are occupied by ...) an int is, or a double - that sort of thing. Each field has to be handled separately.
AFAIC, it's a no-brainer that the entire file be converted prior to feeding it to the app. Where to do the conversion might be a consideration, but you'd expect Intel CPU cycles to be cheaper, so do it there.
 
Old 08-28-2011, 12:52 AM   #3
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
If you handle very large files, or files only rarely accessed, it turns out you can do the endianness conversion while reading the data without slowing down the program. It does consume more CPU cycles than not doing a conversion, but reading a large file is I/O bound anyway; so, if you do the conversion while still reading the file, the conversion is practically free. I write my own low-level I/O routines, reading the data in 64k to 2M chunks, and apply any necessary conversions for each completed chunk. It does get a bit complicated, because the read does not necessarily end with a field boundary, but for large data files it is certainly worth the code complexity.
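A minimal sketch of the chunked approach described above, assuming the file is nothing but a flat sequence of 32-bit fields. The names `swap32_fields`, `convert_stream32` and `CHUNK_SIZE` are made up for illustration, and plain stdio stands in for hand-written low-level I/O routines. The point it shows is the carry-over: a field cut in half by a short read is moved to the front of the buffer and completed by the next read.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 65536u   /* hypothetical; the post suggests 64k to 2M */

/* Reverse the bytes of every complete 32-bit field in buf[0..len).
 * Returns the number of bytes converted (a multiple of 4); up to three
 * trailing bytes of an incomplete field are left untouched. */
static size_t swap32_fields(unsigned char *buf, size_t len)
{
    size_t done = len - (len % 4u);

    for (size_t i = 0; i < done; i += 4u) {
        unsigned char t;
        t = buf[i];     buf[i]     = buf[i + 3]; buf[i + 3] = t;
        t = buf[i + 1]; buf[i + 1] = buf[i + 2]; buf[i + 2] = t;
    }
    return done;
}

/* Copy 'in' to 'out', byte-swapping each 32-bit field on the way.
 * Returns 0 on success, -1 on an I/O error or a truncated final field.
 * The static buffer makes this sketch non-reentrant. */
static int convert_stream32(FILE *in, FILE *out)
{
    static unsigned char buf[CHUNK_SIZE];
    size_t have = 0;   /* bytes carried over from the previous chunk */

    for (;;) {
        size_t got = fread(buf + have, 1, sizeof buf - have, in);
        if (got == 0)
            break;
        have += got;

        size_t done = swap32_fields(buf, have);
        if (fwrite(buf, 1, done, out) != done)
            return -1;

        /* Keep the incomplete field, if any, for the next round. */
        memmove(buf, buf + done, have - done);
        have -= done;
    }
    return (ferror(in) || have != 0) ? -1 : 0;
}
```

A real file with mixed field sizes would need a per-record conversion in place of `swap32_fields`, but the carry-over logic stays the same.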

If you know the data is always in little-endian order, you could use
Code:
#include <stdint.h>

static inline uint16_t get_le16(const void *const from)
{
        return ((uint16_t)(((const unsigned char *)from)[0])      )
             | ((uint16_t)(((const unsigned char *)from)[1]) << 8U);
}

static inline uint32_t get_le32(const void *const from)
{
        return ((uint32_t)(((const unsigned char *)from)[0])       )
             | ((uint32_t)(((const unsigned char *)from)[1]) <<  8U)
             | ((uint32_t)(((const unsigned char *)from)[2]) << 16U)
             | ((uint32_t)(((const unsigned char *)from)[3]) << 24U);
}

static inline uint64_t get_le64(const void *const from)
{
        return ((uint64_t)(((const unsigned char *)from)[0])       )
             | ((uint64_t)(((const unsigned char *)from)[1]) <<  8U)
             | ((uint64_t)(((const unsigned char *)from)[2]) << 16U)
             | ((uint64_t)(((const unsigned char *)from)[3]) << 24U)
             | ((uint64_t)(((const unsigned char *)from)[4]) << 32U)
             | ((uint64_t)(((const unsigned char *)from)[5]) << 40U)
             | ((uint64_t)(((const unsigned char *)from)[6]) << 48U)
             | ((uint64_t)(((const unsigned char *)from)[7]) << 56U);
}
but the above functions end up being pretty slow. Certainly they are much slower than just reversing the endianness:
Code:
#include <stdint.h>

static inline uint16_t swap_endian16(uint16_t u)
{
        return ((u >> 8U) & 0xFFU)
             | ((u & 0xFFU) << 8U);
}

static inline uint32_t swap_endian32(uint32_t u)
{
        const uint32_t m8  = (uint32_t)0xFF00FFUL;
        const uint32_t m16 = (uint32_t)0xFFFFUL;

        u = ((u >>  8U) & m8)  | ((u & m8)  <<  8U);
        u = ((u >> 16U) & m16) | ((u & m16) << 16U);

        return u;
}

static inline uint64_t swap_endian64(uint64_t u)
{
        const uint64_t m8  = (uint64_t)0x00FF00FF00FF00FFULL;
        const uint64_t m16 = (uint64_t)0x0000FFFF0000FFFFULL;
        const uint64_t m32 = (uint64_t)0x00000000FFFFFFFFULL;

        u = ((u >>  8U) & m8)  | ((u & m8)  <<  8U);
        u = ((u >> 16U) & m16) | ((u & m16) << 16U);
        u = ((u >> 32U) & m32) | ((u & m32) << 32U);

        return u;
}
On 32-bit architectures, arrays of 16-bit values are fastest to convert two at a time; use a variant of swap_endian32() that only does the m8 step.

On 64-bit architectures, arrays of 16-bit values are fastest to convert four at a time; use a variant of swap_endian64() that only does the m8 step. Arrays of 32-bit values are fastest to convert two at a time; use a variant of swap_endian64() that only does the m8 and m16 steps.
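The variants mentioned above could look roughly like this on a 64-bit architecture. The names `swap_endian16x4`, `swap_endian32x2` and `swap_array16` are hypothetical; the masks are the same ones used in swap_endian64(). The array helper goes through memcpy() so that it does not depend on the alignment of the data, which is an extra precaution and not something the post asks for.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only the m8 step: swaps the two bytes inside each of the four
 * 16-bit lanes packed into u. */
static inline uint64_t swap_endian16x4(uint64_t u)
{
    const uint64_t m8 = (uint64_t)0x00FF00FF00FF00FFULL;

    return ((u >> 8U) & m8) | ((u & m8) << 8U);
}

/* The m8 and m16 steps: reverses the bytes inside each of the two
 * 32-bit lanes packed into u. */
static inline uint64_t swap_endian32x2(uint64_t u)
{
    const uint64_t m8  = (uint64_t)0x00FF00FF00FF00FFULL;
    const uint64_t m16 = (uint64_t)0x0000FFFF0000FFFFULL;

    u = ((u >>  8U) & m8)  | ((u & m8)  <<  8U);
    u = ((u >> 16U) & m16) | ((u & m16) << 16U);

    return u;
}

/* Byte-swap an array of n 16-bit values in place, four at a time,
 * finishing any remainder one value at a time. */
static void swap_array16(uint16_t *a, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint64_t u;
        memcpy(&u, a + i, sizeof u);
        u = swap_endian16x4(u);
        memcpy(a + i, &u, sizeof u);
    }
    for (; i < n; i++)
        a[i] = (uint16_t)((a[i] >> 8) | (a[i] << 8));
}
```

Because each lane is swapped independently, the result is the same whichever byte order the host itself uses.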

There used to be certain architectures which had mixed byte orders (CDAB); the latter conversion functions only need small modifications to convert those too.

On some architectures it is possible that floats (float and double) have different byte order than integer values. I personally put prototype values in the header:
  • uint16_t: 43981 (0xABCD)
  • uint32_t: 67305985 (0x04030201)
  • float: 721409.0/1048576.0 (0x3f302010)
  • double: 66809.0/8323200.0 (0x3f80706050403020)
Note that internally, float can be treated as uint32_t, and double as uint64_t, if you remember that they may have different endianness than the integer values. The prototype values also make sure the architecture understands IEEE 754 (float AKA binary32, and double AKA binary64) floating-point values, possibly after an endianness correction.
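One way the prototype values might be checked when a file is opened, as a rough sketch only. The enum and the function names (`byte_order`, `bswap32`, `check_u32_prototype`, `check_float_prototype`) are invented for this example. It assumes the reader has already copied the raw header bytes into a uint32_t with memcpy(), and that float is four bytes wide on the reading machine.

```c
#include <stdint.h>
#include <string.h>

enum byte_order { ORDER_SAME, ORDER_SWAPPED, ORDER_UNKNOWN };

/* Plain 32-bit byte reversal, used only to build the "swapped" pattern. */
static uint32_t bswap32(uint32_t u)
{
    return (u >> 24) | ((u >> 8) & 0x0000FF00u)
         | ((u << 8) & 0x00FF0000u) | (u << 24);
}

/* 'stored' holds the four header bytes exactly as they were in the file. */
static enum byte_order check_u32_prototype(uint32_t stored)
{
    const uint32_t proto = UINT32_C(67305985);   /* 0x04030201 */

    if (stored == proto)
        return ORDER_SAME;
    if (stored == bswap32(proto))
        return ORDER_SWAPPED;
    return ORDER_UNKNOWN;   /* mixed order, or not this file format */
}

/* Same idea for the float prototype, compared as a bit pattern.
 * Assumes sizeof (float) == 4; ORDER_UNKNOWN also covers a machine
 * whose float is not IEEE 754 binary32. */
static enum byte_order check_float_prototype(uint32_t stored)
{
    const float proto = 721409.0f / 1048576.0f;
    uint32_t bits;

    memcpy(&bits, &proto, sizeof bits);
    if (stored == bits)
        return ORDER_SAME;
    if (stored == bswap32(bits))
        return ORDER_SWAPPED;
    return ORDER_UNKNOWN;
}
```

Checking the integer and the float prototypes separately is what catches the case where the two kinds of value use different byte orders.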

For those who still use Fortran, it is possible to do the conversion in Fortran too, if the compiler supports sequential raw I/O (binary, no record boundaries). That is what I originally developed this for. It was an order of magnitude faster than text I/O; conversion between strings and floating-point values is surprisingly slow.
 
Old 08-28-2011, 01:30 AM   #4
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
jaepi -

* Please double-check and make sure that byte ordering is even a problem. Byte ordering usually comes into play when you exchange data between two different CPU architectures (e.g. SPARC or MIPS with an Intel CPU). It's seldom an issue between Windows and Linux (assuming you're using Intel CPUs on both).

* If it *is* an issue, and if you can convert the file en masse, perhaps the fastest/easiest/most efficient way is with "dd conv=swab":

http://www.codecoffee.com/tipsforlin...icles/036.html

'Hope that helps .. PSM
 
Old 08-28-2011, 03:17 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,236

Rep: Reputation: 4150
If it's big-endian vs little-endian, it's a problem. And it's extraordinarily unlikely to be a simple byte swap that dd can help with.
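To illustrate why a plain pair swap falls short, here is a small sketch with a made-up 6-byte record (a 16-bit field followed by a 32-bit field, little-endian on disk). `swap_pairs` imitates what conv=swab does to a buffer; `get_be16` and `get_be32` are hypothetical helpers that read the result the way a big-endian machine would.

```c
#include <stddef.h>
#include <stdint.h>

/* Exchange each pair of adjacent bytes, as dd's conv=swab does.
 * An odd trailing byte is left as it is. */
static void swap_pairs(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        unsigned char t = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = t;
    }
}

/* Assemble a 16-bit value from two bytes stored most significant first. */
static uint16_t get_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Assemble a 32-bit value from four bytes stored most significant first. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}
```

Take the bytes 34 12 78 56 34 12, which hold 0x1234 and 0x12345678 in little-endian order. After `swap_pairs` the 16-bit field reads back correctly as 0x1234, but the 32-bit field reads as 0x56781234: its two halves are still in the wrong order. Only fields that are exactly two bytes wide survive the pair swap.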

When I did this, I used perl as it was a one-off for a customer I didn't have access to.
 
Old 08-28-2011, 12:32 PM   #6
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
As a general rule, when storing binary data (especially data structures) you should only expect the data to be accessible on the system it was created on unless you've standardized the file format to be usable on all systems. If you're using data structures, you can't even guarantee the member alignment will be the same (without explicit manipulation, that is). If you don't need to mmap or have random access you might consider storing the data as text and compressing the file (e.g. bzip2) to make up for the increase in size.
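A standardized format can be as simple as writing each member explicitly at a fixed offset in a fixed byte order, instead of dumping the struct as it sits in memory. The sketch below assumes a made-up two-member record and picks little-endian for the on-disk layout; `struct record`, `RECORD_DISK_SIZE` and the helper names are all hypothetical.

```c
#include <stdint.h>

/* In-memory record; the compiler is free to add padding after id. */
struct record {
    uint16_t id;
    uint32_t length;
};

/* On-disk layout chosen for this sketch: 2 + 4 bytes, no padding,
 * little-endian. */
#define RECORD_DISK_SIZE 6u

static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)(v >> 8);
}

static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)(v >> 24);
}

/* Serialize member by member, so padding and host byte order never
 * reach the file. */
static void record_pack(unsigned char out[RECORD_DISK_SIZE],
                        const struct record *r)
{
    put_le16(out, r->id);
    put_le32(out + 2, r->length);
}

/* The inverse: rebuild the struct from the fixed layout. */
static void record_unpack(struct record *r,
                          const unsigned char in[RECORD_DISK_SIZE])
{
    r->id = (uint16_t)(in[0] | ((unsigned)in[1] << 8));
    r->length = (uint32_t)in[2] | ((uint32_t)in[3] << 8)
              | ((uint32_t)in[4] << 16) | ((uint32_t)in[5] << 24);
}
```

The same source then produces and reads identical files on any host, at the cost of a copy per record.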
Kevin Barry
 
Old 09-05-2011, 09:51 PM   #7
jaepi
Member
 
Registered: Apr 2007
Location: Urban Jungle
Distribution: Ubuntu
Posts: 189

Original Poster
Blog Entries: 1

Rep: Reputation: 30
Hello, everyone. Sorry for the late reply, I've been very busy lately x_X. It turns out that I don't need to convert the content of the .bin file, since it was written using a char* buffer (my bad). The header, which contains all the information that matters to me, is converted at run time in my app, so I don't have to worry about it. The application runs a little slower because of the conversion, but the approach is still fairly efficient because the binary file remains unchanged. Thanks for all the help guys
 
  


All times are GMT -5.
