[SOLVED] GCC: why are there strings in the binary?
Hello, I'm a long-time Windows programmer but a newbie to compiling Linux apps. I wanted to port one of my freeware but closed-source applications to Linux, which I have done, and I have it all working just fine.
My concern, however, is that the resulting binary is full of strings from the source files, which amounts to releasing IP. This is a build without debugging information (no -g); in fact, I can also set -g0 to make sure there is no debug info, and the result is the same.
My app is quite large, uses many classes with some virtual functions, uses some shared libraries (math, GL, SDL, etc.), and statically links a few more permissively licensed ones such as libz.a.
Aside from the strings I would expect in the binary, such as error messages, I can also see class names, function names and even source file names. I have tried the -O2 and -Os optimization flags, and I have tried marking symbol visibility as hidden. I have used the -s (strip) option as well as running the strip utility on the executable. All to no avail.
I have also run 'strings' on some of the apps that ship with Ubuntu, such as the Chess and Quadrapassel games, and they also contain some file names and function names.
For info, I'm using Ubuntu 10.04 with its default gcc, so I am compiling with g++ 4.4.3-4, but I am doing this in preparation for moving to an ARM-based embedded PC running Debian.
So, to the questions: is it normal that gcc stuffs the binary with this information? Is there any way to remove it? Do I have to run a code obfuscator over the source before compilation to mitigate it?
This is not what I need. I'm OK with all the hard-coded strings I put into the app myself. It's the class names, private function names, and source file names that I object to being there.
If your application is not statically linked, it will have public symbols embedded which will be used to link with shared object libraries. Is that what you are talking about?
Hi Rod, and thanks for posting, but it's not the shared library strings that are at issue here. If I take the final release binary from a gcc build and type 'nm app.exe', it lists all of the symbols, including ones labelled with the source file names. All of the names of the class members are there too. OK. If I run "strip -o Stripped.exe Orig.exe", it generates a smaller file, and "nm Stripped.exe" then says "No symbols". Great! BUT... if I then run "strings Stripped.exe | grep cpp", it lists all of the file names used to make up the app again. Using strings, I can also see many function names and class names.
So it's not just member functions exported for shared libraries; many other member function names and file names are in there as well.
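Roughly speaking, `strings` never looks at the symbol table that `strip` removes; it just scans the file's bytes for runs of printable characters (GNU strings reports runs of at least 4 by default). That is why `nm` can report "No symbols" while `strings` still finds the same text: the text lives in ordinary data sections, not in the symbol table. The following is only a simplified, hypothetical sketch of that scanning idea (`findPrintableRuns` is an invented name); the real tool has many more options.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified sketch of what `strings` reports: every run of at least
// `minLen` printable characters in a raw byte buffer. It knows nothing about
// ELF symbol tables, so stripping the symbol table does not hide bytes that
// live in ordinary data sections such as .rodata.
std::vector<std::string> findPrintableRuns(const std::vector<unsigned char>& bytes,
                                           std::size_t minLen = 4) {
    assert(minLen > 0 && "a zero minimum would report empty runs");
    std::vector<std::string> runs;
    std::string current;
    for (unsigned char b : bytes) {
        if (std::isprint(b) || b == '\t') {
            current.push_back(static_cast<char>(b));  // extend the current run
        } else {
            if (current.size() >= minLen) runs.push_back(current);
            current.clear();  // a non-printable byte ends the run
        }
    }
    if (current.size() >= minLen) runs.push_back(current);  // run at end of buffer
    return runs;
}
```

Fed a buffer containing `"5Robes"` and `"3Red"` separated by NUL bytes, this reports both, just as `strings` does for the test program later in this thread.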
As an aside, it seems from many forum posts that you can try to link an app statically, but gcc will still link libgcc and libc as shared libraries, and even if you override that, it is unsafe because of ABI changes between versions as they interact with the kernel. So I'm not sure it is possible to link an app fully statically with gcc anyway.
link with:
g++ -fmax-errors=20 -Wl,-O,-s,--gc-sections objs/Redrobes.o -o HelloWorld.exe
i.e. no specific shared libs this time. Then:
strip -o Stripped.exe HelloWorld.exe
and:
strings Stripped.exe
produces at the end:
...
Hello World
5Robes
3Red
Now, "Hello World" is just fine, as that is my const string. But "Robes" and "Red" should not be in the file; they are the class names. In this example the file name does not appear, but in a larger app loads more source strings are in the binary; it's chock full of them. I think it's something to do with the ELF section naming conventions. Can anyone explain how these are generated, or how to stop source strings or source file names from being carried into the ELF section names?
Thanks, that's good advice and it does shrink my binary a little more, but it's still not removing the strings. I have downloaded and compiled the latest version of LLVM, and the strings are still in the binary it produces too. It seems that the section is necessary, but I suspect it does not have to have that name specifically. Worst case, I could obfuscate the source manually before compiling, but that seems like overkill.
What you're seeing are the mangled names from RTTI. No amount of stripping will take those out, because the compiler/linker can't determine that .name() will never be called on those types.
Don't go source-mangling; it is not worth the headaches. You're trying to use a technical solution for a legal problem, which is the same reason DRM never works out. The only way to avoid IP issues completely is to not release *anything*.
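To illustrate the RTTI explanation: GCC (and Clang) follow the Itanium C++ ABI, under which `std::type_info::name()` returns the mangled type name, a length prefix followed by the identifier. That is exactly the "3Red" / "5Robes" pattern in the strings output above. The original test program was not posted, so the `Red` and `Robes` classes below are hypothetical stand-ins, and `rawTypeName`/`demangle` are invented helper names; `abi::__cxa_demangle` from `<cxxabi.h>` is a GCC/Clang-specific extension, used here only to show the readable form.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <typeinfo>

// Hypothetical classes mirroring the "Red"/"Robes" test program.
// Because Red has virtual functions, the compiler emits a vtable for it and,
// with RTTI enabled, a std::type_info object whose name() string ("3Red")
// is placed in read-only data, which strip leaves untouched.
class Red {
public:
    virtual ~Red() = default;
    virtual const char* colour() const { return "red"; }
};

class Robes : public Red {
public:
    const char* colour() const override { return "robes"; }
};

// Returns the raw RTTI name of the dynamic type of `obj`.
// With GCC/Clang this is the mangled type name, e.g. "5Robes".
std::string rawTypeName(const Red& obj) {
    return typeid(obj).name();
}

// Demangles an Itanium-ABI type name ("5Robes" -> "Robes") using
// abi::__cxa_demangle; falls back to the input if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    std::string result = (status == 0 && out) ? out : mangled;
    std::free(out);  // __cxa_demangle allocates with malloc; free(nullptr) is safe
    return result;
}
```

Building the same file with -fno-rtti turns the `typeid(obj)` call into a compile error, and the compiler then generally stops emitting type_info objects (and their name strings) for such classes unless something else, such as exception handling, needs them.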
OK, thanks, that does sound like a likely reason. What I have noticed, as I include more and more static libraries of my own code or free libs like PNG in the main app, is that a) C-based code does not seem to appear at all, and b) the amount from any library code is very low compared to the main code. As I move some of the code out of the main core into properly placed libraries and link it back in, the amount of string info goes down. I also note that it seems to be most prevalent with virtual functions.
In terms of RTTI, though it might get used as part of normal C++ type handling or overloading, etc., I don't personally use it much, if at all. In those cases I would normally define an enum in the base class, pass one of its values to the constructor, and switch/cast on the type manually. I know that might not be 'best practice', but it keeps me in control of how the type is determined. I'll definitely try some options around RTTI and see what happens. If it were RTTI, that would account for the class names being there, but the filename.cpp names still being there seems a bit iffy. Maybe I have botched some aspect of my debug setup so that __FILE__ is being pulled into non-debug builds.
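The enum-tag approach described above can be sketched as follows. It needs no RTTI, so it keeps working under -fno-rtti, where GCC rejects `typeid` and RTTI-dependent `dynamic_cast`. All names here (`ShapeKind`, `Shape`, `Circle`, `Square`, `asCircle`, `toCircle`, `describe`) are hypothetical.

```cpp
#include <cassert>
#include <string>

// Type tag stored in the base class and supplied by each subclass constructor.
enum class ShapeKind { Circle, Square };

class Shape {
public:
    explicit Shape(ShapeKind kind) : kind_(kind) {}
    virtual ~Shape() = default;
    ShapeKind kind() const { return kind_; }
    virtual double area() const = 0;
private:
    ShapeKind kind_;
};

class Circle : public Shape {
public:
    explicit Circle(double r) : Shape(ShapeKind::Circle), radius(r) {}
    double area() const override { return 3.14159265358979 * radius * radius; }
    double radius;
};

class Square : public Shape {
public:
    explicit Square(double s) : Shape(ShapeKind::Square), side(s) {}
    double area() const override { return side * side; }
    double side;
};

// Manual replacement for dynamic_cast<const Circle*>: check the tag, then
// static_cast. Returns nullptr when the object is not a Circle.
const Circle* asCircle(const Shape& s) {
    return s.kind() == ShapeKind::Circle ? static_cast<const Circle*>(&s) : nullptr;
}

// Checked downcast for when the caller "knows" the type; the assert catches
// mistakes in debug builds.
const Circle& toCircle(const Shape& s) {
    assert(s.kind() == ShapeKind::Circle && "not a Circle");
    return static_cast<const Circle&>(s);
}

// Switch on the tag instead of comparing typeid() results.
std::string describe(const Shape& s) {
    switch (s.kind()) {
        case ShapeKind::Circle: return "circle";
        case ShapeKind::Square: return "square";
    }
    return "unknown";  // unreachable for valid tags; keeps -Wreturn-type quiet
}
```

The trade-off is that the tag is maintained by hand: adding a subclass means adding an enum value and updating every switch, which is the bookkeeping RTTI would otherwise do for you.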
While I can be sure how I use RTTI myself, I don't know whether SDL uses it. I also don't know if I can switch off RTTI just for the main code and leave it on for some libs. If it's per file, then I could add a make option for the main code or for each lib, but I suspect it would be a link option. I'll check it out.
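On the per-file question: -fno-rtti is a compile-time code generation option, not a link option, so it applies to each translation unit it is passed to and can be set per file or per library in a makefile. GCC and Clang define the macro `__GXX_RTTI` when RTTI is enabled for the current translation unit, so shared headers can adapt. A minimal sketch, with hypothetical helper names (`rttiEnabledInThisFile`, `debugTypeName`):

```cpp
#include <string>
#include <typeinfo>

// True if the translation unit including this code was compiled with RTTI
// (GCC/Clang define __GXX_RTTI unless -fno-rtti is given).
constexpr bool rttiEnabledInThisFile() {
#if defined(__GXX_RTTI)
    return true;
#else
    return false;
#endif
}

// Hypothetical logging helper: report a type name only when RTTI is
// available, so the same source builds with and without -fno-rtti.
template <typename T>
std::string debugTypeName() {
#if defined(__GXX_RTTI)
    return typeid(T).name();  // mangled name, e.g. "i" for int with GCC
#else
    return "<no RTTI>";
#endif
}
```

SDL itself is written in C, so it has no C++ RTTI to rely on. One caveat when mixing settings: a class whose vtable is emitted in a file built with -fno-rtti gets no type_info there, so deriving from it or using typeid on it in a file built with RTTI can fail at link time with an "undefined reference to typeinfo for ..." error.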
I have tried the -fno-rtti option and it has indeed removed all the unwanted strings from the binary. So it was a combination of a few options, but mainly this one, and the question can be marked [SOLVED] now. Thanks very much. I have now ported my first app successfully. I'll do the others and release a full Linux set alongside the Windows set in due course.