Monday, May 4, 2009

Java-compatible syntax for C++

Since I first realized how much more productive you are in Java compared to C++, it has bugged me that the syntactic difference is so small. Take this code as an example:

class A {
    public void a() { }
}

Is that C++ or Java? (Hint: add ":" and ";" and it becomes another language). The difference is syntactically tiny, but huge when you think of all the things you get from Eclipse when using Java.

So, this is the idea: express C++ with a syntax that is compatible with Java. This Java-compatible syntax (JCS from now on) of course requires a program that translates it to C++, but it makes it possible to use a number of tools currently only available for Java: refactoring and a reliable code browser, for example.

Yeah, I hear you, "you can't express advanced-feature-X and meta-programming-feature-Y using a JCS". You're right; macros and advanced template programming are far beyond the reach of a JCS, but that's not my point. My point is that the majority of C++ code could easily be expressed with a JCS. If 95% of your code could be refactored or browsed using Eclipse, that's much better than if 0% of your code could be refactored properly.

Actually, I think that a lot of the C++ language is an example of not keeping the corner cases in the corner. A simple thing like writing a script that lists all functions defined in a source file is (in the general case) impossible, because macros can redefine the language. Thus all #include:ed header files have to be parsed, and thus the entire build system with all its makefiles has to be known to the script.
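To make the macro problem concrete, here is a small sketch. The macro DEFINE_GETTER and all function names are invented for illustration. A script that scans this file for function definitions by text pattern would probably find only area(), although the file defines three functions once the preprocessor has run.

```cpp
// Hypothetical example: DEFINE_GETTER and the function names are made up.
// The macro generates a complete function definition, so get_width() and
// get_height() never appear literally in the source text.
#define DEFINE_GETTER(name, value) \
    int get_##name() { return value; }

DEFINE_GETTER(width, 640)   // expands to: int get_width() { return 640; }
DEFINE_GETTER(height, 480)  // expands to: int get_height() { return 480; }

// The only definition a naive text-based scan would recognize.
int area() { return get_width() * get_height(); }
```

If the macro lived in a header instead, the script would also need the include paths from the build system just to discover that DEFINE_GETTER exists.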

I know Bjarne Stroustrup had reasons for doing this (backwards compatibility with C), but I think this was more of a marketing reason than a technical one. His new language could have been compatible with C (being able to call it, be called from it, etc.) without having to be a syntactic superset of C. Anyway, back to JCS for C++.

Friends and colleagues have told me that the new CDT for Eclipse gives you refactoring, code completion, and browsing, but in my experience it works poorly. Perhaps I've failed to configure Eclipse correctly, or I'm using a crappy indexer to index my C++ code... but I can't refactor my C++ code the way I can with Java code. (Compare the number of available refactorings in Eclipse for Java and C++ if you would like an objective measurement.)

I've implemented a prototype that proves that it is possible to create a JCS that covers the most common parts of C++. It works by traversing the Java AST (abstract syntax tree) and translating the relevant nodes to their C++ representation. Example:

class A extends B implements C {
    public int foo(@unsigned int[] a, boolean b) {
        if (b) return a[1];
        return 0;
    }
}

translates to

class A : public B, public C {
    public: virtual int foo(unsigned int* a, bool b) {
        if (b) return a[1];
        return 0;
    }
};

There is a lot that's not covered by this prototype, and it's probably riddled with bugs... but it fulfills its purpose perfectly: proving that expressing C++ using a JCS is possible. The prototype is available here.
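For readers who want to compile the translated class above, here is a minimal self-contained sketch. The post does not show B or C, so their contents below are assumptions (in particular, that C declares foo), and callFoo is a hypothetical helper added only to exercise the code.

```cpp
// Minimal stand-ins for the base class and the interface. Their contents
// are assumptions; the original example does not show them.
class B {
public:
    virtual ~B() {}
};

class C {
public:
    virtual ~C() {}
    virtual int foo(unsigned int* a, bool b) = 0;
};

// The translated class from the example above.
class A : public B, public C {
public:
    virtual int foo(unsigned int* a, bool b) {
        if (b) return a[1];
        return 0;
    }
};

// Hypothetical helper: calls foo through the interface pointer.
int callFoo(C* c, bool flag) {
    unsigned int values[] = {10, 20, 30};
    return c->foo(values, flag);
}
```

Calling callFoo with a pointer to an A and flag set to true returns the second array element; with flag set to false it returns 0.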

I'd love to make a real-world-worthy implementation of this idea, but I'm afraid it would take up all my spare time... I have other things to think about! :)


Mark said...

I've been coding in a Java-like subset of C++ for years, and even wrote a pretty significant compiler project this way. I found it made C++ much more productive.

Whoa! Mozart's Ghost! said...

One problem with this approach is that different methods of design are used in C++ that a Java programmer doesn't have to worry about. Some examples (some of these are just assumptions based on the code you've shown):

1) All of your methods are virtual. In C++ only the methods which are part of a derivable interface should be virtual. Some (like Scott Meyers) go as far as saying public methods should never be virtual, only protected and private ones, in order to keep the public interface clean and unchanging.
2) If your argument for doing (1) this way is that all classes are designed to be derivable, then you need to add dummy virtual destructors to all generated classes, or else you will start hearing complaints the first time somebody calls delete on a base pointer.
3) If you are going to embed function implementations in the class declaration, then it's probably a good idea to make something as simple as foo as inlinable as possible, i.e.: return b ? a[1] : 0;
4) Also, if you're going to embed all implementation into the class declaration then you'll run into two problems: (a) your header files miss out on the benefits of forward declaration and build time will get longer and longer; (b) at some point you will probably get circular references between two generated classes that cannot possibly be resolved without hiding one or the other's implementation in a translation unit.
5) Arrays may be typesafe in Java, but they aren't in C++. Converting uint[] into uint* is probably a bad idea, especially if the original Java code is not doing any bounds checking itself. It's probably a better idea to convert those into std::vector (or boost::array for fixed-size arrays).
6) It's questionable whether or not it's necessary to use multiple inheritance for those two classes. Often it's better and cleaner just to compose. I realise it's hard to generate correct C++ code that would do something like this, but that's kind of the reason why I'm skeptical of such transformations in the first place.
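Two of the points above, the virtual destructor and std::vector instead of a raw pointer, can be sketched as follows. All class and function names here are hypothetical, and the destructor counter exists only to make the effect observable.

```cpp
#include <vector>

// Counts destructor calls so the effect can be observed (illustration only).
static int derivedDestroyed = 0;

class Base {
public:
    virtual ~Base() {}  // virtual: delete through a Base* is well-defined
    virtual int second(const std::vector<unsigned int>& a) const = 0;
};

class Derived : public Base {
public:
    ~Derived() { ++derivedDestroyed; }
    int second(const std::vector<unsigned int>& a) const {
        // A vector knows its size, so a bounds check is possible here.
        return a.size() > 1 ? static_cast<int>(a[1]) : 0;
    }
};

// Returns how many Derived destructors ran during the delete.
int destroyThroughBase() {
    int before = derivedDestroyed;
    Base* p = new Derived();
    delete p;  // runs ~Derived() because ~Base() is virtual
    return derivedDestroyed - before;
}
```

Without the virtual destructor in Base, the delete in destroyThroughBase would be undefined behavior, which is the complaint point 2 warns about.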

scumm_fredo said...

Similar to what Vala does for C (C#-like syntax translated to C).

aberrant said...

This reminds me of GCJ, which takes the approach of treating Java as a subset of C++. It works as a front end to GCC, translating Java code into native code along the same lines as the equivalent C++ code. I haven't heard much news about it though since OpenJDK was announced.

Togge said...

Whoa! Mozart's Ghost!:
I should clarify that this is a Ja

>1) All of your methods are virtual.
That's true for the examples I've published. But the prototype I've implemented translates a final Java method to a non-virtual C++ method. So it's possible to make methods non-virtual; the only difference is that JCS makes methods virtual (non-final) by default.
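A sketch of what that mapping could produce on the C++ side. The class and method names are invented, and the Java signatures in the comments are assumptions about the input, not output of the prototype.

```cpp
class Shape {
public:
    virtual ~Shape() {}
    // Java: public int sides() { ... }             -> virtual by default
    virtual int sides() { return 0; }
    // Java: public final int dimensions() { ... }   -> non-virtual
    int dimensions() { return 2; }
};

class Triangle : public Shape {
public:
    virtual int sides() { return 3; }  // overrides Shape::sides
    // Java would reject redeclaring a final method. C++ allows it, but this
    // only hides Shape::dimensions; it is shown to make static dispatch visible.
    int dimensions() { return 99; }
};
```

Through a Shape pointer to a Triangle, sides() dispatches to the Triangle version while dimensions() resolves statically to the Shape version.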

>2) add some dummy virtual destructors
I know about this problem, and if I had time to develop the prototype further I would fix this bug. It's just a prototype, you know. :)

>3) If you are going to embed function implementations in the class declaration...
I cheated for simplification when I wrote the examples I posted. Actually, all method implementations are generated in a .cc-file and all declarations are generated to a .hh-file. Sorry, I simplified too much.

>...forward declaration and build time will get longer and longer...
>...circular references between two generated classes...
I've realized these problems, and right now the prototype does not handle these situations very well. Actually, it should be fairly easy to fix. Simply translate "import x.Z" to "namespace x { class Z; }" in the .hh file and translate it to '#include "x/Z.hh"' in the .cc-file.
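A single-file sketch of that scheme, with comments marking where each generated file would begin. The classes x::Y and x::Z and their members are hypothetical; the point is only that two classes can refer to each other as long as the headers use forward declarations and the full definitions are needed only in the .cc part.

```cpp
// ---- x/Y.hh (sketch): "import x.Z" becomes a forward declaration ----
namespace x { class Z; }

namespace x {
class Y {
public:
    Y() : partner(0) {}
    void setPartner(Z* z);
    int partnerValue() const;  // needs the full definition of Z: lives in Y.cc
private:
    Z* partner;                // a pointer only needs the forward declaration
};
}

// ---- x/Z.hh (sketch): the mirror image, forward-declaring Y ----
namespace x { class Y; }

namespace x {
class Z {
public:
    explicit Z(int v) : value(v), owner(0) {}
    int getValue() const { return value; }
    void setOwner(Y* y) { owner = y; }
    Y* getOwner() const { return owner; }
private:
    int value;
    Y* owner;
};
}

// ---- x/Y.cc (sketch): here '#include "x/Z.hh"' would make Z complete ----
namespace x {
void Y::setPartner(Z* z) { partner = z; }
int Y::partnerValue() const { return partner ? partner->getValue() : -1; }
}
```

Neither header section needs the other class's definition, so the circular reference between Y and Z never turns into a circular include.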

>It's probably a better idea to convert those into std::vector (or boost::array for fixed-size arrays)
I both agree and disagree with you on this one. I agree because using a std::vector would mimic the behavior of the Java array better. On the other hand (this is why I disagree), when you type "int[]" in a JCS for C++ you ask for the C++ data structure that corresponds to that syntax.
The point I'm trying to make is that this JCS for C++ should just be another way of expressing C++ code. "int[]" should be translated into an array, because the code "int[]" represents an array in Java.

>It's questionable whether or not it's necessary to use multiple inheritance for those two classes. Often it's better and cleaner just to compose.
I think you should take a look at the corresponding Java code that says "class A extends B implements C"; that is, class A implements an interface (C) and extends a class (B). There is no multiple inheritance in this Java code.
But OK, there is multiple inheritance in the corresponding C++ code, you're right. The way I represent Java interfaces in C++ with the prototype is like this: "class C { virtual void foo() = 0; };".
In other words, although the C++ code has multiple inheritance, it's actually only inheriting one non-pure-virtual class (class B) and one pure-virtual class (class C). Thus, there is no multiple inheritance of implemented methods.
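A compilable sketch of this representation. The method names foo and bar and their return values are assumptions made for the illustration; the static_asserts (C++11) only confirm that C behaves like an interface and that A is a complete class.

```cpp
#include <type_traits>

// Java: interface C { int foo(); }  ->  pure virtual methods only, no state.
class C {
public:
    virtual ~C() {}
    virtual int foo() = 0;
};

// Java: class B { int bar() { ... } }  ->  an ordinary class with an implementation.
class B {
public:
    virtual ~B() {}
    virtual int bar() { return 1; }
};

// Java: class A extends B implements C
class A : public B, public C {
public:
    virtual int foo() { return 2; }
};

static_assert(std::is_abstract<C>::value, "C cannot be instantiated");
static_assert(!std::is_abstract<A>::value, "A implements every pure virtual");
```

A inherits an implementation only from B; from C it inherits nothing but the obligation to implement foo.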

I would share your scepticism of transformations like this if it wasn't for this reason: I'm trying to keep the transformation simple, even trivial.
It should be "screamingly obvious" how a piece of C++ expressed using JCS should be translated into C++; otherwise it should not be translated at all. For instance, the cast in the code "Child c = (Child)b;" could be translated into a C++ dynamic_cast, because that's what the code means in Java. But that's not what that syntax means in C++. Instead, this code is translated to the C++ code "Child* c = (Child*)b;".

As a side note:
In the example above you can see that a Java reference is translated into a C++ pointer, because that's the thing in C++ that behaves like a Java reference. The syntax for Java references and C++ pointers differs, but the behavior is the same. From my point of view this was the easiest translation from JCS to C++.
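The difference between the two possible cast translations discussed above can be sketched like this. The classes and the helper names checkedCast and uncheckedCast are hypothetical.

```cpp
class Base {
public:
    virtual ~Base() {}
};

class Child : public Base {
public:
    int id() const { return 7; }
};

class Other : public Base {};

// What the Java cast means: a checked downcast. dynamic_cast yields a null
// pointer on failure (Java would throw ClassCastException instead).
Child* checkedCast(Base* b) { return dynamic_cast<Child*>(b); }

// The syntax-preserving translation: an unchecked C-style cast. Using the
// result is undefined behavior if b does not really point to a Child.
Child* uncheckedCast(Base* b) { return (Child*)b; }
```

Passing a pointer to an Other makes checkedCast return a null pointer, while uncheckedCast would hand back a pointer that must not be used.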

Togge said...

I've tried GCJ, and the performance of its generated native code was far worse than the performance of Java byte code on a HotSpot JVM. I don't know why though.

The performance of code written using this JCS for C++ should be exactly the same as code written in C++ directly.

Togge said...

Sounds interesting. I'll take a look at it.

Do you mean that you compiled the Java source using a C++ compiler, or do you mean that you only used features of Java that are also available in C++?

asdacap said...

I totally agree with that. I have been coding Java since I started programming, and now when I try to use C++ it seems too complicated to use. Maybe this will work... Hopefully.