December 18, 2005 8:09 PM

Universal binaries introduction for OpenOffice.org developers

I'll try to summarize the important findings from my investigation of universal binaries.

ChangeLog

2005/12/20 Problem: specifying CFLAGS and LDFLAGS on one command line (filed as issue number 4387241 in the Apple Bug Reporter database).

Why universal?

Apple is going to produce Intel-based systems really soon now (right now 87% of people on MacPolls expect it to happen in January next year). Previous systems were based on PowerPC processors.

PowerPC is big endian, Intel is little endian. They use different instruction sets. Apple wants software to run on both platforms in the future, so they invented the universal binary concept.
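
To make the byte order difference concrete, here is a minimal sketch in C. The function names is_little_endian and read_be32 are made up for this illustration and do not come from OpenOffice.org or from Apple. The first function probes the byte order of the CPU it runs on, the second reads a big endian value in a way that works on both architectures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the CPU running this code stores
   the least significant byte first (Intel), 0 when it stores the most
   significant byte first (PowerPC). */
int is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);   /* look at the byte at the lowest address */
    return first_byte == 1;
}

/* Hypothetical helper: reads a 32-bit big endian value from a byte
   buffer. It assembles the value byte by byte, so the result is the
   same on PowerPC and on Intel. */
uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Code that casts a byte buffer straight to an integer gives different results on the two architectures. Code written like read_be32 does not care which one it runs on.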

What is it?

It is best described with a small and simple example. We have two files: Makefile and example.c. The Makefile creates several executables:
macmini:~/tmp/example oo$ ls -al example_i386 example_ppc example example_all
-rwxr-xr-x    1 oo       oo          14772 Dec 18 19:35 example_i386
-rwxr-xr-x    1 oo       oo          17748 Dec 18 19:35 example_ppc
-rwxr-xr-x    1 oo       oo          38228 Dec 18 19:35 example
-rwxr-xr-x    1 oo       oo          38228 Dec 18 19:35 example_all
macmini:~/tmp/example oo$
This example was produced on a PowerPC G4 based Mac Mini because Intel-based systems are not available yet (they are available if you accept limits on your freedom, but that is unacceptable to me).

The first two files are plain binaries for their respective architectures:
macmini:~/tmp/example oo$ file example_i386 example_ppc
example_i386: Mach-O executable i386
example_ppc:  Mach-O executable ppc
macmini:~/tmp/example oo$
The file example shows one possible way to create a universal binary: gcc can do that for you (see the Makefile for details).
macmini:~/tmp/example oo$ file example example_all
example: Mach-O fat file with 2 architectures
example (for architecture i386):        Mach-O executable i386
example (for architecture ppc): Mach-O executable ppc
example_all: Mach-O fat file with 2 architectures
example_all (for architecture i386):        Mach-O executable i386
example_all (for architecture ppc): Mach-O executable ppc
macmini:~/tmp/example oo$
The second way is to prepare separate binaries for all architectures and then use the lipo tool to combine them into the final universal binary, example_all. The resulting files are identical:
macmini:~/tmp/example oo$ md5sum example example_all
1c1e96a58c9944a409fe093d7f9b436d  example
1c1e96a58c9944a409fe093d7f9b436d  example_all
macmini:~/tmp/example oo$
For my testing purposes, I decided to use the first (gcc) way.

So to sum up, from the porter's point of view, you can think of a universal binary as a native binary (PPC in my case) bundled with a cross-compiled binary for the other architecture (Intel in my case). And this "cross-compiled" part brings several interesting features ;-)
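
To show what "bundled" means at the file level, here is a minimal sketch in C that inspects the header of a fat file held in memory. It assumes the layout declared in Apple's <mach-o/fat.h>: a big endian magic number 0xcafebabe, a big endian slice count, and then one 20-byte record per architecture starting with the CPU type. The function and macro names are made up for this illustration. This is not how file or lipo are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as found in Apple's headers; the macro names are hypothetical. */
#define FAT_MAGIC_BE  0xcafebabeu  /* FAT_MAGIC in <mach-o/fat.h> */
#define CPUTYPE_I386  7u           /* CPU_TYPE_I386 in <mach/machine.h> */
#define CPUTYPE_PPC   18u          /* CPU_TYPE_POWERPC in <mach/machine.h> */

#define FAT_HEADER_SIZE 8u         /* magic + slice count */
#define FAT_ARCH_SIZE   20u        /* cputype, cpusubtype, offset, size, align */

/* The fat header is always big endian, even on Intel. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the number of architecture slices, or -1 when the buffer is
   not a fat file or is too short to hold the whole slice table. */
int fat_arch_count(const unsigned char *buf, size_t len)
{
    uint32_t n;

    assert(buf != NULL);
    if (len < FAT_HEADER_SIZE || be32(buf) != FAT_MAGIC_BE)
        return -1;
    n = be32(buf + 4);
    if (n > (len - FAT_HEADER_SIZE) / FAT_ARCH_SIZE)
        return -1;
    return (int)n;
}

/* Returns 1 when the fat file carries a slice for the given CPU type. */
int fat_has_cputype(const unsigned char *buf, size_t len, uint32_t cputype)
{
    int i;
    int n = fat_arch_count(buf, len);   /* -1 skips the loop */

    for (i = 0; i < n; i++)
        if (be32(buf + FAT_HEADER_SIZE + FAT_ARCH_SIZE * (size_t)i) == cputype)
            return 1;
    return 0;
}
```

Reading the first few dozen bytes of a binary and passing them to fat_has_cputype with CPUTYPE_I386 and CPUTYPE_PPC would be enough to tell whether both slices are present. Note that the magic number alone is not proof: Java class files start with the same four bytes.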

Potential problems, issues and ideas

I'll collect (and update) various issues and problems with universal binaries here as we reach them during the build process.
  • Not all tools and programs should be compiled as universal binaries. If a tool is used only at build time and is not bundled with the final product, it can be compiled for the native architecture only.
  • I wrote a simple script that checks whether the resulting binaries and/or libraries are universal. It is very simple and stupid, but it can help you check the resulting binaries quickly.
  • If you are building universal binaries (so your CFLAGS variable contains two -arch flags), you can't generate dependencies with gcc's -MD argument. You can often disable this by using configure's --disable-dependency-tracking option.
  • Problems with predefined macros. If you are building on PowerPC, you get the POWERPC macro defined. But this doesn't mean that the code will run on PowerPC! Do not forget about the cross-compiling part! The Intel binary is cross-compiled on your PowerPC machine, so you have to handle it properly. See the example.c source file mentioned above. It prints a different string depending on the architecture it *runs* on.
  • The problem mentioned above also affects assembler. We use assembler in *many* parts of the OpenOffice.org source code. In module sal, we use assembly in the interlck.c file. Module sndfile uses assembly to cast (!?) numbers. Mozilla also uses assembly (see Mozilla's universal binary page). All assembler parts have to be modified accordingly.
  • Bridges. This is a problem ^2. We still do not have bridges code for MacTel. But imagine we had it. Now we would have to compile both into one binary! That will be a funny task! I think we will use lipo here instead of compiling them together. It could be much easier.
  • Build prerequisites have to be universal binaries too! Do not forget that if your build depends on e.g. gtk, you must have gtk available as a universal binary too, because otherwise the cross-compiling part won't find all symbols. This is a problem right now, because fink doesn't support universal binaries. You have to use DarwinPorts.
  • You can't specify both CFLAGS and LDFLAGS on one command line (modified example):
    macmini:~/tmp/example oo$ gcc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -arch i386 -arch ppc \
                            -Wl,-syslibroot,/Developer/SDKs/MacOSX10.4u.sdk -arch i386 -arch ppc \
                            example.c -o example_allbyone
    /usr/bin/ld: -syslibroot: multiply specified
    collect2: ld returned 1 exit status
    /usr/bin/ld: -syslibroot: multiply specified
    collect2: ld returned 1 exit status
    lipo: can't open input file: /var/tmp//ccslVnnk.out (No such file or directory)
    macmini:~/tmp/example oo$
    
    I still do not know how to solve this properly :-( This affects several modules (like curl and libxml2). They fail in the configure phase.
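
The predefined macros point from the list above can be sketched in C like this. This is not the example.c mentioned earlier, only a minimal illustration. It assumes that Apple's gcc defines __ppc__ (or __POWERPC__) when it compiles the PowerPC slice and __i386__ when it compiles the Intel slice. The enum and function names are made up.

```c
#include <assert.h>

/* Hypothetical names, used only in this sketch. */
enum arch { ARCH_PPC, ARCH_I386, ARCH_UNKNOWN };

/* The preprocessor decides this once per slice. With two -arch flags
   gcc compiles the file twice, so each slice gets its own answer. */
enum arch compiled_arch(void)
{
#if defined(__ppc__) || defined(__POWERPC__)
    return ARCH_PPC;        /* PowerPC slice */
#elif defined(__i386__)
    return ARCH_I386;       /* Intel slice */
#else
    return ARCH_UNKNOWN;    /* some other compiler or platform */
#endif
}

/* Maps the enum to a printable name. */
const char *arch_name(enum arch a)
{
    switch (a) {
    case ARCH_PPC:     return "ppc";
    case ARCH_I386:    return "i386";
    case ARCH_UNKNOWN: return "unknown";
    }
    assert(!"invalid enum arch value");
    return "unknown";
}
```

Printing arch_name(compiled_arch()) from a universal binary gives the architecture of the slice that actually runs, not the architecture of the machine that did the build. A check made once at configure time on the build machine cannot do that, because its result is shared by both slices.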

Sample patches

... are included in my build system. Search in the directory Patches/SRC680* for files named UB-*. Warning: almost everything is a brutal hack. I'd like to first introduce the concept to all developers and then collect ideas on how to continue properly. Ause? Heiner? Mac OS X porters?

My build compiles many modules as universal binaries, but some are still built for the native architecture only, like python, curl, sndfile, berkeleydb, libxml2 etc. (see the check_universal script mentioned above!). If you have a patch for new modules, do not hesitate to send it to me :-)

BTW - this will be the first porting effort where OpenOffice.org's own internal code is much easier to port than the external modules it bundles ;-)

References

Universal Binary Programming Guidelines, Second Edition
Cross-Development Programming Guide
Building Universal Binaries from "configure"-based Open Source Projects - hmm, OpenOffice.org is also based on configure. Rene ? ;-)
Porting UNIX/Linux Applications to Mac OS X

Thanks

I'd like to thank Apple for a wonderful weekend!

Did I forget something? Please tell me so I can put it here for future reference.

Posted by Pavel | Permanent link | File under: OpenOffice.org, Mac OS X