Sunday, May 17, 2009

32bit/64bit programming -- an interesting problem #2

...continued

I was recently looking at the source of an open-source library that is supported on all popular platforms in both 32bit and 64bit. Providing a library for both 32bit and 64bit platforms brings in a new problem: making sure that applications use the correct version of the library, i.e., a 32bit application should use the 32bit version and a 64bit application should use the 64bit version. Obviously, 32bit and 64bit binaries cannot be cross-linked, so the linker would fail if the application tried to do so. The difficult problem is to keep the application from using the wrong header files of the library. A 64bit application can inadvertently include the 32bit headers of the library and link against the 64bit version of the library, and it is quite possible that this will succeed without even a warning (although there are cases where it would not work).

Consider this function:
//
void __cdecl messup(struct my_struct *);
//
A 64bit translation unit that calls this function after #including the 32bit header would link just fine with the 64bit library that defines it. The 32bit and 64bit versions of my_struct may well be defined differently by the library, for example because of different data-alignment requirements on 32bit and 64bit chosen for performance reasons (padded with extra bytes?). The application then assumes one layout of the structure while the library expects another. This might lead to a crash. Aah!
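
To make the mismatch concrete, here is a minimal sketch. The post never shows the real definition of my_struct, so the fields below (flags, length, buffer) are hypothetical, and the two structs simply stand in for what a 32bit header and a 64bit header of the same library might each declare.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical my_struct as the 32bit header might declare it:
   the length and the pointer both occupy 4 bytes. */
struct my_struct_32bit_view {
    int32_t  flags;
    int32_t  length;  /* 32-bit field in the 32bit header */
    uint32_t buffer;  /* stands in for a 4-byte pointer */
};

/* The same hypothetical struct as the 64bit header might declare it:
   the length and the pointer both occupy 8 bytes. */
struct my_struct_64bit_view {
    int32_t  flags;
    int64_t  length;  /* 64-bit field in the 64bit header */
    uint64_t buffer;  /* stands in for an 8-byte pointer */
};

/* Number of bytes between the place where the 64bit library looks for
   `buffer` and the place where a caller compiled against the 32bit
   header stored it. Anything other than 0 means the library reads
   the wrong bytes. */
static size_t buffer_offset_mismatch(void)
{
    return offsetof(struct my_struct_64bit_view, buffer)
         - offsetof(struct my_struct_32bit_view, buffer);
}
```

The linker never sees any of this, since the symbol name messup is the same in both cases. Only the layout behind the pointer differs.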

Now that's bad. So what does it finally mean? It means that the appropriate headers are just as important as the appropriate binaries, but unfortunately the build tools offer no support for enforcing this. To take the problem one step further, given the various data models within 64bit platforms, it is not just the platform that matters but the data model.

To restate the problem in its final form: an application that is being built for data model X should use the headers and libraries that were built for data model X.

There could potentially be many ways to solve this problem. A quick answer would be to have a common header file for all data models, with ifdef'ed code for each data model in the same file. This has a few drawbacks (in my opinion): declarations for all data models need to be in the same file (clutter? maintenance?); it might be very difficult (possible?) to determine the data model in the pre-processor phase so that the right set of declarations goes in for compilation (afaik, there does not seem to be a pre-processor directive for the data models, and the platform-specific pre-processor directives one would have to depend on might be too many to handle; what about unknown platforms?).

I was actually impressed by another option, used by the library I mentioned. Among the 32bit and 64bit platforms, the predominant data models (LP64, LLP64, ILP32) differ only in the sizes of long and pointer. While generating its own headers (at build time), this library writes the sizes of long and pointer, as seen during the library's compilation, into the header file. This gives an easy and reliable way to identify later which data model the header was built for.
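
As a rough illustration of why those two sizes are enough, the sketch below maps a (size of long, size of pointer) pair to one of the three data models named above. The enum and function names are made up for this example, and it assumes 8-bit bytes.

```c
#include <stddef.h>

/* Hypothetical names, used only for this illustration. */
enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_LLP64, MODEL_UNKNOWN };

/* Classify a data model from the sizes (in bytes) of long and pointer.
   Only the three predominant models are told apart; less common ones
   such as ILP64 would need sizeof(int) as well and are not handled. */
static enum data_model classify_data_model(size_t long_size, size_t ptr_size)
{
    if (long_size == 4 && ptr_size == 4) return MODEL_ILP32;
    if (long_size == 8 && ptr_size == 8) return MODEL_LP64;
    if (long_size == 4 && ptr_size == 8) return MODEL_LLP64;
    return MODEL_UNKNOWN;
}

/* The data model of whichever compiler builds this translation unit. */
static enum data_model current_data_model(void)
{
    return classify_data_model(sizeof(long), sizeof(void *));
}
```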

The header file generation code would be something as simple as this:
//
/* The trailing \n keeps each #define on its own line in the generated header. */
fprintf(header_file, "#define MYLIB_SIZEOF_LONG %d\n", (int) sizeof(long));
fprintf(header_file, "#define MYLIB_SIZEOF_PTR %d\n", (int) sizeof(void*));
//
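
A slightly fuller sketch of such a generator step might look like the following. This is not the library's actual code: the helper names format_size_macros and write_size_macros are invented here, and the sketch assumes the generator is compiled with the same compiler and flags as the library itself, since otherwise the sizes it records would describe the wrong data model.

```c
#include <stdio.h>

/* Format the two size macros into buf. Returns 0 on success, or -1 if
   formatting failed or the buffer was too small to hold the text. */
static int format_size_macros(char *buf, size_t buf_len)
{
    int n = snprintf(buf, buf_len,
                     "#define MYLIB_SIZEOF_LONG %d\n"
                     "#define MYLIB_SIZEOF_PTR %d\n",
                     (int) sizeof(long), (int) sizeof(void *));
    return (n < 0 || (size_t) n >= buf_len) ? -1 : 0;
}

/* Append the size macros to an already opened header file.
   Returns 0 on success, -1 on any formatting or write error. */
static int write_size_macros(FILE *header_file)
{
    char text[128];

    if (format_size_macros(text, sizeof text) != 0)
        return -1;
    return fputs(text, header_file) == EOF ? -1 : 0;
}
```

Checking the return values matters more than it seems: a half-written header would silently disable the very check it is meant to support.
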
Now that we have a means to carry the metadata about the library's data model over into the headers, how do we prevent compilation under an inappropriate data model? The idea used was simple, and should be self-explanatory. The library also added the following code to its header file:
//
static char _somearray_[sizeof(long) == MYLIB_SIZEOF_LONG ? 1 : -1];
static char _somearray2_[sizeof(void*) == MYLIB_SIZEOF_PTR ? 1 : -1];
//
If it isn't obvious, these lines declare an array of size -1 (which is illegal and fails compilation) in case the sizes of long and pointer in the application don't match the ones recorded in the headers. Cool! That's what we need.

I see 2 tradeoffs with this approach:

1. Though the misuse is prevented, the error message isn't friendly. When you use the wrong header file, you get a message such as 'invalid array size', 'invalid array subscript' or 'an array should have at least one element'. One might have to turn to Google to figure out the issue.

2. Two more names (and 2 bytes) are added to the current translation unit. The underscores and uncommon names make a name collision very unlikely, but still :) I would rather use a single struct with one member for each enforcement rule, so that only 1 symbol is added to the global namespace.
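
The single-struct idea from the second point could be sketched roughly as below. The struct and member names are my own invention, and the two #define lines stand in for what a generated header from an LP64 build would contain (so on any other data model this sketch is expected to fail to compile, which is the point).

```c
/* Values assumed to have been written by the header generator during an
   LP64 build of the library (hypothetical; a real generated header
   carries whatever sizes its own build recorded). */
#define MYLIB_SIZEOF_LONG 8
#define MYLIB_SIZEOF_PTR 8

/* Only the struct tag enters the global namespace, and since no object
   of this type is ever defined, no bytes are added to the translation
   unit. Each member is one enforcement rule: its array size becomes -1,
   which does not compile, when the rule is violated. */
struct mylib_data_model_check {
    char long_size_differs_from_library_build[
        sizeof(long) == MYLIB_SIZEOF_LONG ? 1 : -1];
    char pointer_size_differs_from_library_build[
        sizeof(void *) == MYLIB_SIZEOF_PTR ? 1 : -1];
};
```

A side benefit is that many compilers mention the offending member's name in the diagnostic, so a descriptive member name softens the first tradeoff a little as well.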

Any other solution??
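
One more possibility, for compilers that support C11 or C++11, is a static assertion, which lets the header carry its own error message. This is only a sketch and not something the library did: MYLIB_STATIC_ASSERT is a made-up macro name, and the two size macros are again hard-coded to what an LP64 build would have generated.

```c
/* Values assumed to come from a header generated by an LP64 build
   (hypothetical stand-ins for the generated macros). */
#define MYLIB_SIZEOF_LONG 8
#define MYLIB_SIZEOF_PTR 8

/* C11 spells the keyword _Static_assert, C++11 spells it static_assert. */
#if defined(__cplusplus)
#define MYLIB_STATIC_ASSERT(cond, msg) static_assert(cond, msg)
#else
#define MYLIB_STATIC_ASSERT(cond, msg) _Static_assert(cond, msg)
#endif

/* No names and no bytes are added; a mismatch stops the build with
   the given message instead of a puzzling array-size error. */
MYLIB_STATIC_ASSERT(sizeof(long) == MYLIB_SIZEOF_LONG,
    "mylib headers were generated for a different size of long");
MYLIB_STATIC_ASSERT(sizeof(void *) == MYLIB_SIZEOF_PTR,
    "mylib headers were generated for a different pointer size");
```

The price is portability: older C compilers know neither keyword, so a library that must build everywhere would still need the array trick as a fallback.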

4 comments:

  1. You can use template specializations for more "meaningful" compile time error messages. The general idea is to create a template class with a bool type parameter, and then specialize it for true (or false), making the specialization non-compilable.
    I picked this up from Andrei Alexandrescu's "Modern C++ Design" - here's a link to the relevant content (http://books.google.co.in/books?id=aJ1av7UFBPwC&pg=PA23&lpg=PA23&dq=Compile-Time+Assertions+Modern+C%2B%2B+design&source=bl&ots=YQeF-uTi21&sig=KrIqRJ2Ju8MoUQT_228jfSIgfxY&hl=en&ei=AzMQSsvkB4iIkAWfz-S4BA&sa=X&oi=book_result&ct=result&resnum=2#PPA26,M1)

  2. That's a neat one. The library I was mentioning is written in C and has to be compatible with both C and C++ apps -- that explains why they had to stay away from C++ based enforcements.

  3. you can use C macro & typedefs (Btw, your solution is wrong), here's the better one:

    typedef long INT32;
    typedef long long INT64;
    typedef unsigned long UINT32;
    typedef unsigned long long UINT64;

    // or use this to achieve source code portability:

    #if __x86_64__
    #define INT INT64
    #else
    #define INT INT32
    #endif

    in your program just use INT to declare new integer variable :)
    eg: INT x = 123;

  4. I think you didn't understand the intent of this post. It is anybody's guess to ifdef appropriate types. Please read through it again.

    I had in fact mentioned your solution and why I wouldn't want it:

    "A quick answer would be to have a common header file for all data models but have ifdef'ed code for each data model in the same file. This has few drawbacks (in my opinion): declarations for all data models need to be in the same file (clutter? maintenance?); it might be very difficult (possible?) to determine the data model in the pre-processor phase, so the right set of declarations go in for compilation (afaik, there does not seem to be a pre-processor directive for the data models and depending on the pre-processor directives for the platform might be too many to handle; what about unknown platforms?)."

    Also read about LP64, LLP64 and other data models.
