Monday, August 15, 2005

Adventures with Java 5 and GCJ

A few days ago I set out to do something that had seemingly never been tried, yet the process seemed straightforward enough and I anticipated few problems (famous last words, I know). You see, I am working on a client-server systems management application that requires a small agent to be installed on a number of client machines (Windows, Linux, MacOS, etc). The server side is written in Java 5. I entertained some thoughts of writing the agent in C++, but my C++ is a little rusty and I haven't had much experience managing a full-blown cross-platform C/C++ project... besides, I like working in Java. So, throwing caution to the wind, I set out to see if a natively compiled version of my Java agent would be possible and acceptable (I don't want to contend with a huge JRE install or the memory footprint associated with it). Just to make life more interesting, my development environment is Windows XP.

So, here is the basic process I intended to follow:
  1. Install Mingw/Msys (I already use Cygwin heavily but want to be closer to the Win32 API and not have dependencies on cygwin1.dll)
  2. Use RetroWeaver to convert my Java 5 class files to Java 1.4 compatible equivalents.
  3. Use the RetroWeaver reference verifier to find references to classes/methods which are new to Java 5.
  4. Create a compatibility layer to call through to new features of Java 5 (note that from the outset I had been trying to avoid most new APIs so this step wasn't so bad. Most of the stuff I found was as simple as using Arrays.toString(...) in test classes)
  5. Use GCJ to compile my classes and dependencies natively.
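To give an idea of what step 4 looks like in practice, here is a minimal sketch of one compatibility shim. The class and method names (`Compat`, `arrayToString`) are made up for illustration and are not taken from the actual project; the point is only that a Java 5 call such as Arrays.toString(Object[]) can be replaced by a helper that sticks to APIs present in Java 1.4 (note StringBuffer rather than the Java 5 StringBuilder).

```java
// Hypothetical compatibility helper; names are illustrative only.
final class Compat {
    private Compat() {
    }

    // Produces the same text as java.util.Arrays.toString(Object[]),
    // but uses only classes and methods available in Java 1.4.
    static String arrayToString(Object[] items) {
        if (items == null) {
            return "null";
        }
        StringBuffer sb = new StringBuffer("[");
        for (int i = 0; i < items.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.valueOf(items[i]));
        }
        return sb.append(']').toString();
    }
}
```

Callers then use Compat.arrayToString(values) wherever they previously used Arrays.toString(values), so the reference verifier has one less Java 5 reference to complain about.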
Well, I hit my first snag with RetroWeaver. While I was able to convert the class files, the reference verifier was buggy and difficult to invoke from Ant. Poking around on the SourceForge site, I found a number of patches that fix the things I was having problems with. Unfortunately, the author hasn't released an updated version of RetroWeaver in over 6 months, and many of these patches conflict with each other or do not properly note which version of the source code the diff was generated against. Also, for some reason that was probably my own brain-damage, I was having a particularly bad time getting patch to actually do its thing. So, if anyone is looking for a version of RetroWeaver that has what I consider to be the right set of patches applied, you can get it here. This version seems to have a "mostly working" reference verifier, and the Ant task has been updated to make it easy to invoke, like so. (This assumes that the directory ${common.dir}/retroweaver contains all of the jars found in the RetroWeaver distribution, that retroverf.dir contains a copy of all class files, that retrort.file points to a JRE 1.4 rt.jar file, and that std.classpath and depend.classpath are valid path constructs containing all the dependencies your classes need.)

<taskdef name="retroweaver" classname="com.rc.retroweaver.ant.RetroWeaverTask">
  <classpath>
    <fileset dir="${common.dir}/retroweaver">
      <include name="*.jar"/>
    </fileset>
  </classpath>
</taskdef>
<retroweaver srcDir="${retroverf.dir}"
             verify="true">
  <classpath>
    <pathelement location="${retrort.file}"/>
    <path refid="std.classpath"/>
    <path refid="depend.classpath"/>
    <pathelement location="${retroverf.dir}"/>
  </classpath>
</retroweaver>


Having spent far too long attempting to get RetroWeaver to do everything I wanted it to, I naively thought that GCJ would just work and life would be good. I couldn't have been more mistaken! When I tried to compile my main jar file, I got an obscure error from the assembler complaining about duplicate symbols. I was on my way down the rabbit hole now! I recognized the duplicate symbol name as the mangled form of a method on one of my classes. I looked it up and saw that the method in question was an implementation of a method from a generic interface that the class implements. Looking at the byte-code revealed that the class had two methods with the same name and parameter types but differing return types. I vaguely recalled some discussion about covariant return types and bridge methods in Java 5, and the problem started to become a little less murky. The error spit out by the assembler included a mangled name that did not have the return type encoded in it, and that was apparently causing the name collision.
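If you want to see this for yourself, the following is a small sketch that reproduces the situation with reflection. The interface and class here (`Source`, `StringSource`) are invented stand-ins, not the classes from my agent. Compiling an implementation of a generic interface with a Java 5 compiler yields both the method you wrote and a compiler-generated bridge method with the same name and parameters but a different return type.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic interface, standing in for the real one.
interface Source<T> {
    T next();
}

// The implementation returns String, while the erased interface
// method returns Object, so the compiler adds a bridge method.
final class StringSource implements Source<String> {
    public String next() {
        return "hello";
    }
}

final class BridgeMethodDemo {
    // Lists the return type of every no-argument method called `name`
    // declared directly on `type`, marking compiler-generated bridges.
    static List<String> returnTypesOf(Class<?> type, String name) {
        List<String> result = new ArrayList<String>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(name) && m.getParameterTypes().length == 0) {
                String label = m.getReturnType().getName();
                result.add(m.isBridge() ? label + " (bridge)" : label);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two entries are listed for next(): one returning String and
        // one bridge returning Object (the order is not guaranteed).
        System.out.println(returnTypesOf(StringSource.class, "next"));
    }
}
```

Both methods are perfectly legal at the byte-code level, because the JVM includes the return type in a method's descriptor. A symbol name that leaves the return type out cannot tell them apart.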

Doing some googling and sifting through the GCC bug database revealed that this is officially listed as bug #9861 and that it has been open since 2003 (apparently the original bug was noticed while using one of the old prototype generics compilers). There was some comment that a new ABI was expected to fix this problem, but a look at recent snapshots of GCC revealed no fix yet. Well, it started to look like my original idea had hit a dead end.

For some reason, though, I didn't stop there. I downloaded the May 15, 2005 GCC 4.1 snapshot and started hacking. Now, the last time I actually tried to build GCC was back in 1998 on a Slackware Linux box. I don't remember much except searing pain and an eventual partial success. To be fair, the GCC build has gotten a lot easier since then... at least if you are using Linux. If you are trying to build GCC on Windows using mingw, just stop. I eventually settled on the procedure outlined by Ranjit Mathew on his website: first build a cross-compiler on Linux, then use it to build a native compiler for Windows/Mingw. He includes some scripts that work really well for this purpose. Of course, there is a reason why you can't find pre-built binaries for any GCC >= 4.0 on Windows/Mingw: there are two compilation problems that keep the suite from being built on mingw. The patches for these two problems are included in the main patch below. I'll get them to the GCC team eventually, after making sure that the fix isn't already in CVS head.

So I set my old Linux box to building the GCC compilers and went camping for the weekend. I just picked up the results today and started looking at how to actually fix the original problem. It seemed to me that the goal should be to modify the mangling routines so that they include the mangled return type in the name passed to the assembler. For GCJ, this is actually really easy to comprehend and implement... just a one-liner change. The problem, however, is that with this change the C++ and Java ABIs no longer match up. This may not seem like a problem at first until you consider that all of the native parts of libjava are written in C++ using CNI, which requires that the mangled names of methods match up on the C++ and Java sides of the house. It took me quite a bit longer to paw through the C++ compiler to figure out what to do. It would seem that including the return type in the mangled function name is already done for function templates (I think... I didn't trace this down all the way), so the fix involved just adding another condition under which to include the return type. So, if the function is a method of a class and the class is a Java class (descended from java::lang::Object), I do the same thing as is done for that special template case. There are some macros in G++ that make this a one-liner change as well... it's just a much scarier change for the uninitiated because the G++ compiler is really complicated!
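To make the collision concrete, here is a toy mangler written in Java. It is only a sketch of the idea and not the GCC code: it handles a handful of types, ignores the substitution compression that the real mangler applies, and assumes the return type is placed ahead of the parameter types, the way the C++ ABI does it for function templates. The exact names produced by my patch may differ. `ToyMangler` and `StringSource` are made-up names; the Math.floor symbol is the one quoted in the update further down.

```java
// Toy illustration of Itanium-style name mangling for Java methods.
// Not the GCC implementation: no substitutions, few types supported.
final class ToyMangler {
    private ToyMangler() {
    }

    // "java.lang.Math" -> "4java4lang4Math"
    static String nested(String dottedName) {
        StringBuilder sb = new StringBuilder();
        for (String part : dottedName.split("\\.")) {
            sb.append(part.length()).append(part);
        }
        return sb.toString();
    }

    // Single-letter codes for a few primitives; anything else is
    // treated as a class and encoded as a pointer to it.
    static String typeCode(String type) {
        if (type.equals("void")) return "v";
        if (type.equals("int")) return "i";
        if (type.equals("float")) return "f";
        if (type.equals("double")) return "d";
        return "PN" + nested(type) + "E";
    }

    // Pass returnType == null to get the name without a return type.
    static String mangle(String className, String method,
                         String returnType, String... paramTypes) {
        StringBuilder sb = new StringBuilder("_ZN");
        sb.append(nested(className));
        sb.append(method.length()).append(method).append('E');
        if (returnType != null) {
            // Assumption: return type comes first, as for templates.
            sb.append(typeCode(returnType));
        }
        if (paramTypes.length == 0) {
            sb.append('v'); // an empty parameter list is written as void
        }
        for (String p : paramTypes) {
            sb.append(typeCode(p));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Without the return type, a method and its bridge collide.
        System.out.println(mangle("StringSource", "next", null));
        // With it, the two names differ.
        System.out.println(mangle("StringSource", "next", "java.lang.String"));
        System.out.println(mangle("StringSource", "next", "java.lang.Object"));
    }
}
```

With this sketch, mangle("java.lang.Math", "floor", null, "double") gives _ZN4java4lang4Math5floorEd, the same symbol that shows up in builtins.c, and passing "double" as the return type adds one more d in front of the parameter code.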

In conclusion, I think it is a Good Thing to have a working path for moving from Java 5 source code to native binaries and I hope that this article can help make that a reality (at least in the interim, since the "official" support for Java 5 in GCJ seems to be a way off). I also think it is really important to have an up-to-date GCJ compiler for Windows and am going to pursue making actual builds available. For now, though, the patch will have to be enough for anyone interested in duplicating my work. Here it is. I haven't finished the clean build of the compiler and test suite yet, but I have visually inspected and verified the results of this patch and it seems to be OK. YMMV. Since this is a breaking ABI change, I wouldn't anticipate seeing it anytime soon in the official GCJ releases.

UPDATE 8/17/05:
It turns out that there was one additional thing that complicated the process of changing the mangling scheme. It was causing unsatisfied link errors when calling, either directly or indirectly, a number of static methods on the Math class. Since this was clearly a problem with the Java side of things, I breathed a sigh of relief (for not having to delve back into the C++ compiler). I did an objdump on the Math.o file and looked at the disassembly for the round(float) method, because it calls Math.floor(double), which was one of the unresolved symbols. I was surprised to find that even though the Java code for round(float) calls floor, this call was nowhere to be found in the disassembly. At this point I realized that the compiler must have some mechanism for inlining some operations. So I did a grep for "floor" in the gcc/java directory and found the following:
builtins.c: double_ftype_double, "_ZN4java4lang4Math5floorEd");
There was my unresolved symbol without the return type mangled into the name. There are 13 of these in the builtins.c file. Updating them with the proper signature makes everything work correctly.

4 Comments:

Blogger rmathew said...

Hi Laurenzo,

I have nothing substantial to add except to congratulate you on your efforts so far and to wish you the best of luck. This is something that is sorely needed in GCJ to support generics. Do try to bounce off your ideas and progress thus far on the main GCJ list (java@gcc.gnu.org).

Regards,
Ranjit.

3:03 AM  
Blogger rmathew said...

By the way, the "proper" way to check for the 1-arg v/s 2-arg form of mkdir() is via a configury macro. See $GCC_SRC_DIR/fastjar/acinclude.m4 for details.

3:30 AM  
Blogger TJ Laurenzo said...

Ranjit,
Thanks for the autoconf tip. I hadn't done an exhaustive search yet to see where the mkdir thing was done. Also, I uncovered a problem with this patch, so it's not ready for primetime yet. I'll announce to the gcc list when I get it going properly.

Also, thanks for your page on building the cross compilers for mingw. It was most helpful!

8:21 AM  
Blogger rmathew said...

Before you polish off the patch, you might want to discuss your approach on the GCJ list first to see if there are any objections or helpful suggestions. In particular, to discuss the change in the mangled name of methods. If nothing, this would most likely bloat up the binaries a bit. Do you have "before" and "after" sizes for the binaries (libgcj.so)? I understand the desire to not publish something until you think it is good to go, but many a time discussing your approach can be more productive and waste less time.

10:19 PM  
