Wednesday, January 16, 2013
I usually don't write about things that actually happen to me. Instead I focus on describing tools, cool ideas, or just telling a joke. In this post, though, I'll tell an ongoing story about bringing protest into the company where I work.
protest is a neat little unit testing framework for C++ that I started working on around September last year. It's been open-source under the Boost Software Licence since its very beginning. I started working on it for the one simple reason that has caused many hackers to start... well, hacking: scratching an itch.
The particular itch in this case is the terrible state of C++ unit testing. I've tried several testing frameworks over the years, but all of them make me say yuck, do they really want me to write that?, or should I really do that manually?, or, even worse, what... I can't do that?
I've already written about protest here several times, so I won't do that again. What I will do, however, is describe the process of introducing protest at my work. It started in November when I presented protest to my colleagues. They were positive and saw it as a good candidate for replacing UnitTest++, which we're currently using.
I'm working at a company that is very protective of its source code and information -- for good reasons. What I am worried about is that if we started using protest without explicit acceptance and knowledge from some manager(s), I might run into problems if the source is found on the internet by the "security police", since it has my name on it (my user name on Gitorious is my real name, just as here on my blog). If they found it under my name on the internet, they could (falsely) draw the conclusion that I brought the code outside of the company.
So, to make sure this wouldn't happen, I contacted a manager and explained the situation. Unfortunately, he contacted a person specializing in law, who looked into the matter in more detail. The response I got was we can't accept this, CompanyName might lose the right to use protest if this-and-that, which wasn't true at all, of course.
I got a bit put off by this, but I finally got back to the issue this week. My response went along the following lines:
Regardless of whether you acknowledge and accept the license under which protest is published, you should understand that any open-source software can be used by any employee at CompanyName at any time. I know for a fact that we/CompanyName use open-source licensed software; indeed, we rely on it daily.
I'm not sure if this was a good idea or not.
Wednesday, June 27, 2012
Computer science is not physics
I've been reading Existential Type for a while and got to the post Languages and Machines, which discusses models of computation. It all makes a lot of sense: there are different ways of modelling computation and they are good for different things. The only catch is that all but one of them model a machine -- a machine with computational units, storage, and whatnot.
A programming language is not used for manipulating registers and mutating memory cells; it's used for expressing thoughts and ideas. Thoughts and ideas do not live in the physical world (at least that's what I've heard), so why should we rely on a language (read: C/C++/Java/etc.) that is inherently bound to a (more or less) physical machine?
No one questions that the field of maths is unrelated to the physical world we live in, right? Maths would exist with or without humans discovering mathematical truths and proofs. That's because maths uses some axiomatic system (that just happens to be very useful in the real world). However, I'd hope that no one in their right mind would argue that maths is about the physical realisations of the axioms, e.g., that 1 apple + 2 apples is 3 apples. Maths is not about apples -- apples just happen to fit in the axiomatic system.
Dijkstra famously said:
Computer science is no more about computers than astronomy is about telescopes.
I guess an equivalent statement about maths would be maths is no more about numbers than astronomy is about telescopes... Let me now rephrase the previous paragraph to emphasise my point.
No one questions that the field of computer science is unrelated to the physical world we live in, right? Computer science would exist with or without humans discovering computational truths and proofs. That's because computer science uses some computational model (that just happens to be very useful in the real world). However, I'd hope that no one in their right mind would argue that computer science is about the physical realisations of the models, e.g., the NAND gate. Computer science is not about NAND gates -- NAND gates just happen to fit in the computational model.
So why not call it computation science, or automated maths? No one would question the above paragraph if I'd written automated maths instead of computer science.
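As an aside, the NAND gate fits the model precisely because it is functionally complete: every Boolean connective can be built from it. Here is a tiny sketch of that (in C++, purely for illustration; none of this code is from the original post):
#include <iostream>

// NAND is functionally complete: NOT, AND, and OR (and hence any Boolean
// function) can be expressed using NAND alone.
bool nand_(bool a, bool b) { return !(a && b); }
bool not_(bool a)          { return nand_(a, a); }
bool and_(bool a, bool b)  { return not_(nand_(a, b)); }
bool or_(bool a, bool b)   { return nand_(not_(a), not_(b)); }

int main() {
    std::cout << std::boolalpha
              << not_(true) << ' '          // false
              << and_(true, true) << ' '    // true
              << or_(false, false) << '\n'; // false
}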
Labels:
computer science,
frustration,
maths,
naming
Saturday, April 14, 2012
Java initialization chaos
I've recently started using Java again after going from C++ to Java to C++ and now Java again. What bums me out the most with Java is how its pretty surface hides a lot of chaos. Don't get me wrong, it's good that you don't have to see the chaos (too often), but it's terrible that the chaos is there at all. Java's static initialization is a particularly "interesting" topic.
What are a and b in the following code:
class A { public static int a = B.b; } // loading A triggers loading of B
class B { public static int b = A.a; } // A.a is still 0 here (assuming A loads first)
I know it's a contrived example, but non-obvious variants of this code pop up every now and then and bring havoc to the dependency graph.
So, what happens to a and b in the code above? How does the JVM resolve the cyclic dependency? It does it by initializing a and b to zero. In fact, all (static) fields are first initialized to zero (or null if the field holds a reference type) when the JVM first sees the field during class loading. Then, when the entire class is loaded, its class initializer is called, where the fields are assigned the values written in the source code, e.g., 10 in the code below:
class C { public static int c = 10; }
In other words, the code for C is equivalent to:
class C {
public static int c = 0;
static { c = 10; } // the class initializer assigns the real value
}
Which you can easily see by running javap on C's .class-file.
Now, what I've been asking myself is why it's done like this. Why are the fields initialized to zero (or null) and later assigned their proper value (10 in the case of C.c)? It would be extremely helpful to have the fields tagged as uninitialized until a proper value is assigned to them. That would not require tonnes of extra logic in the JVM, and it would capture circular dependencies upon startup of the application.
The JVM is nice and all, but it's not perfect. But then again, no one is arguing it is, right?
Thursday, October 20, 2011
std::fstream -- please leave!
During the last half year, there have been several bugs related to C++ file streams, e.g., std::ifstream. Most bugs have been resolved by adding calls to ios::clear to clear the error state of the stream, or similar fixes. Other bugs were fixed by using C's FILE instead.
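Here is a minimal sketch (not the actual code from work; the file name is made up) of the kind of gotcha behind several of these bugs: once a stream hits end-of-file its error flags stay set, and every later operation silently fails until the state is cleared:
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream in("input.txt"); // hypothetical file name
    std::string line;
    while (std::getline(in, line)) {
        // process line...
    }
    // The loop set eofbit/failbit. Without this clear(), the seekg() and
    // getline() below silently do nothing on many implementations.
    in.clear();
    in.seekg(0, std::ios::beg);
    if (std::getline(in, line))
        std::cout << "first line again: " << line << '\n';
}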
Why, why, why, is file I/O so hard to implement in a reliable way in C++? I haven't done much work with files in C, but the things I've done work well. Same thing with Python. Java's file I/O is just a joke (why do I need three classes to read a file?), but at least it's more reliable than C++'s.
I'll think twice before I use std::fstream again. Your code might be fine on one implementation of the C++ standard library, but fail on another. Sad.
Thursday, July 28, 2011
Andersson's Law
Proebsting's law states that compiler advances double computing power every 18 years --- a pretty depressing fact.
Another depressing fact is that the most used language appeared to the public in 1973 -- almost 40 years ago.
The second most used language is essentially a combination of language features developed in the '70s and '80s -- 30 to 40 years ago. This language appeared in 1995 -- 16 years ago.
The third most used language is 30 years old and is based on a 40-year-old language, with some added features developed 40 years ago.
And the list goes on... Here is a compilation of the ages of the top 10 most used programming languages:
- 38 (C)
- 16 (Java)
- 28 (C++)
- 16 (PHP)
- 16 (JavaScript)
- 20 (Python)
- 10 (C#)
- 24 (Perl)
- 37 (SQL)
- 16 (Ruby)
What bothers me, though, are the "new" languages, e.g., Java, C#, or Ruby, which don't really add any kind of innovation except new syntax and more libraries to learn. Come on, there are tonnes of more interesting problems to solve... There is still no way of automatically parallelizing a sequential program, for instance.
There seems to be a new law lurking in programming language development... I call it Andersson's Law: Modulo syntax, innovation in new programming languages approaches zero.
And here's the "proof":
Every year there are new programming languages; however, a vast majority of those are merely reiterations of features found in previous languages (except syntax). Thus, the number of unique features per new language approaches zero each year, that is, innovation approaches zero.
Labels:
annoy,
C,
C++,
compilers,
frustration,
java,
jokes,
laws,
optimizing
Saturday, May 21, 2011
Dumbing it down, intelligently
There are problems that seem to be solvable only by throwing more and more code at them -- we call it the brute force method. It works in most situations, but it rarely produces elegant code that survives bit rot. The brute force method is actually surprisingly common -- how many times have developers solved bugs by adding yet another if-statement?
Luckily, there is another way of solving such problems that results in simple and (potentially) elegant code -- here, I call it the dumb-it-down method. The dumb-it-down method achieves simplicity by solving a similar, more general, problem.
While the brute force method results in masses of code that cover every possible corner case of a very specific problem, the dumb-it-down method results in much less, more general, and simpler code. Let's take an example to clarify what I mean.
Let's assume there is a function has_python() that checks if Python is installed on a machine. This function is used for making sure that a certain Python script can be executed. How is such a function implemented? It needs to check permissions, check paths, check the Python version, etc. A lot of intelligence needs to be implemented to make sure that the script can be executed, right? This is the brute force method.
Ok, now let's rewrite the problem slightly. Remember that has_python()'s purpose in life is to make sure that a certain Python script can be executed. So, we can just as well write a function that executes the Python script and returns whether or not it succeeded, right? As it turns out, that function already exists: exec (on POSIX systems -- or system, which is built into the standard C library). No need to write a single line of code!
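A minimal sketch of the idea (the script name is made up):
#include <cstdlib>

// The dumb-it-down version: don't check whether Python is installed --
// just run the script and let the exit status answer the real question.
bool run_python_script() {
    int status = std::system("python script.py"); // hypothetical script
    return status == 0; // zero exit status: the script ran and succeeded
}

int main() {
    return run_python_script() ? 0 : 1;
}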
In the above example we rewrote the problem slightly but kept the intention: execute the script only if it can be executed. This is the pattern; this is the idea behind the dumb-it-down method -- look at what the code tries to do on a more general level and find a simple (or "dumb", if you want) way of implementing that.
I think the dumb-it-down method is an umbrella term covering many design strategies whose aim is to produce simple, long-living code. I've previously discussed this here.
It seems like software is in a never-ending spiral of complexity: software needs to grow more complex to manage other complex software. Why is an IDE larger than the operating systems of the last decade? Are today's IDEs solving more complex problems than those operating systems did? How can the plug-in APIs of such an IDE be more complex than the C standard library? Aren't the APIs supposed to make it easy to implement plugins?
I've mentioned it before in this blog, but it needs to be repeated. We software developers need to start looking at our work (and the result of our work) differently. Our skills are not measured by the complexity of the programs we write, but by their simplicity (assuming equal functionality).
Labels:
brute force,
design,
dumb-it-down,
frustration
Wednesday, January 12, 2011
When all magic goes wrong: std::vector of incomplete type
I have recently been working on an API. I put great effort into separating the implementation from the interface, which in this case means that the header file of the API strictly contains declarations. No executable code at all. This makes it easier to hide implementation details, which is something we should always aim for, especially for APIs.
In C++ there are several ways to hide implementation. One way is to forward declare types and simply use pointers and references to those types in header files. However, when you need to use a type by-value, it is not possible to use a forward-declared type. For example:
class CompleteType { };
class IncompleteType;
class HoldsTheAboveTypes {
CompleteType value0; // Ok.
IncompleteType* pointer; // Ok.
IncompleteType value1; // Compilation error!
};
In my experience, there are usually ways to avoid having types by-value that are implementation details. Usually it's a matter of thinking hard about the life-time or ownership of an object. However, when I implemented the API mentioned above, I ran into a problem that seemed to be unsolvable: I needed to have a class with a field of type std::vector of an incomplete type, that is:
class StdVectorOfIncompleteType {
std::vector<IncompleteType> value;
};
This code fails to compile, though, giving some error message about "invalid use of incomplete type" (just as the code above). However, IncompleteType isn't used anywhere! So it should compile, shouldn't it? (Well, I guess you could argue that it should compile if C++ were designed properly, but it's not, so let's not go into that...)
The reason the above code doesn't compile is that the following special methods are automagically generated by the compiler:
- zero-argument constructor
- copy constructor
- destructor
- assignment operator
So to fix the compilation error given above, we simply need to declare these special methods ourselves, and provide an implementation for them in a separate .cc-file, where the full definition of IncompleteType is available.
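A sketch of what that fix looks like (the file names are made up; this works in practice with common standard-library implementations, because the compiler no longer has to instantiate the vector's special methods in the header):
// StdVectorOfIncompleteType.h -- IncompleteType is only forward-declared.
#include <vector>

class IncompleteType;

class StdVectorOfIncompleteType {
public:
    StdVectorOfIncompleteType();            // declare the special methods...
    StdVectorOfIncompleteType(const StdVectorOfIncompleteType& other);
    StdVectorOfIncompleteType& operator=(const StdVectorOfIncompleteType& other);
    ~StdVectorOfIncompleteType();
private:
    std::vector<IncompleteType> value;
};

// StdVectorOfIncompleteType.cc -- ...and define them where the full
// definition of IncompleteType is available.
#include "StdVectorOfIncompleteType.h"
#include "IncompleteType.h"

StdVectorOfIncompleteType::StdVectorOfIncompleteType() { }
StdVectorOfIncompleteType::StdVectorOfIncompleteType(
    const StdVectorOfIncompleteType& other) : value(other.value) { }
StdVectorOfIncompleteType& StdVectorOfIncompleteType::operator=(
    const StdVectorOfIncompleteType& other) { value = other.value; return *this; }
StdVectorOfIncompleteType::~StdVectorOfIncompleteType() { }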
I've been fiddling with programming for more than 15 years (professionally much shorter, though) and I've run into this problem several times before, but never tried to understand the cause of it. Today I did.
Labels:
C++,
compilers,
frustration,
languages,
solutions
Sunday, January 9, 2011
Design patterns are to software development what McDonald's is to cooking
I remember reading the GoF design patterns book and thinking gosh, this is really good stuff. Now I can write programs like a real master! I liked the whole idea so much that I went on reading the xUnit Patterns book, and a few more like Refactoring to Patterns.
Looking back on these books now and what I learned from them, I realize that it's not the patterns described in the books that I value the most. It's the reason for their existence; the motivation for using them. For example, the Factory pattern exists because it's often desirable to separate object construction from domain logic. Why? Because it reduces coupling, which means code is easier to enhance, reuse, extend, and test. So when you understand why a pattern exists, then you know when to use it and when not to.
The problem is that you don't need to understand why a design pattern is needed in order to use a design pattern in your code. Code with a misused design pattern is worse than code without that pattern. As an example, here is some code taken basically directly from an application I worked with:
Thing t = new ThingFactory().create(arg);
with ThingFactory defined as
class ThingFactory {
Thing create(int arg) { return new Thing(arg); }
}
This is a prime example of code that misuses a design pattern. Clearly, (s)he who wrote this code did not understand why and when a Factory should be used; (s)he simply used a Factory without thinking. Probably because (s)he had just read some fancy-named design-pattern book.
This is one big problem I see with design patterns. They make it easy to write code that looks good and professional, when in fact it's horribly bad and convoluted. The Design Patterns book is the software equivalent of McDonald's Big Book of Burgers: you need to be a good cook/developer already in order to learn anything that will actually make your burger/software skills better. A less-than-good cook/developer will only learn how to make burgers/software that look good on the surface.
I recently read Object-Oriented Design Heuristics by Arthur J. Riel, and I must say that this book is much better than the Design Patterns book. First of all, it's more than just a dictionary of patterns; it's actually a proper book you can read (without being bored to death). Second, the design rules (what the author calls "heuristics") are much deeper and more applicable than design patterns. These rules are like Maxwell's equations for good software design. Understand and apply them, and your software will be well designed.
Let me illustrate with an example how I think Riel is different from GoF. Where GoF say "this is a hammer, use it on nails", Riel says "to attach to wooden objects you can use either hammer+nails or screwdriver+screws; both have pros and cons." Sure, GoF is easier to read and you'll learn some fancy words you can say if you're running out of buzz-words, but Riel actually makes you understand. Understanding is underrated.
But let's give the GoF book et al. some slack. To be honest, I actually did learn something useful and important from those books, and I couldn't do my daily programming work properly without that knowledge:
- favor composition over inheritance,
- separate construction logic from domain logic, and
- code towards an interface, not an implementation.
Oh, there is actually one more idea you should keep in your head or you will definitely screw up big time (literally).
Actually, there is one thing that I learned from all those design-patterns books (not including Riel). They taught me something important that I could only have learned from a few other places: I learned that if an author tries hard enough, (s)he can write a 500-page book consisting of some common sense and two or three good ideas repeated over and over again. The xUnit Patterns book is the prime example of this. Don't read it. Read Riel.
Thursday, June 17, 2010
The only design pattern is small solutions
I just saw this TED talk (which you need to see too!). It's essentially about how we prefer big complex solutions to any problem we face. Why? Because it makes us feel smart and important. As my head is filled with software thoughts, I started to think about how this relates to software design. We software developers really!, really!, really! like big solutions to small problems: "Oh, you got a program that needs to store some data? You better get yourself a dedicated database machine, a persistence layer, and an XML schema for the communication data format."
We don't need big solutions to small problems. Big solutions are easy to find. Big solutions need man-hours but no understanding. We need small solutions to big problems. Small solutions are hard to find. Small solutions need insight into the actual problem we're solving. The actual problem is what's left when we remove all accidental complexity, marketing buzz-words, etc., and think clearly about the original problem.
Small solutions are orthogonal to each other; big solutions are not -- they interact in non-obvious ways. Thus, big solutions create more problems, or as the American journalist Eric Sevareid said:
The chief cause of problems is solutions
which is more true in software development than in most other areas. Implement small solutions to problems and your future self will thank you. Implement big solutions and you fall for the sirens' calls of the marketeers, or your own wishes to do seemingly cool stuff while looking smart doing it. Do you really need a DSL? A database? Web interface? Reflection? Operator overloading? Meta-programming? Code generation? Ruby? SOAP?
Thinking small has a big impact.
Labels:
big vs. small,
frustration,
patterns,
solutions,
talks
Sunday, April 18, 2010
Testing generated code

I started to write this post roughly one and a half years ago. I never finished writing it until now for whatever reason. Here's the post.
<one and a half years ago>
At work I'm currently developing a little tool that generates Java code for decoding messages (deeply nested data structures) received over a network. To be more precise, the tool generates code that wraps a third-party library, which provides generic access to the messages' data structures. Slightly amusingly, that library itself consists of generated Java code.
The third-party library is... clumsy to use, to say the least. Its basic problem is that it's so generic/general that it feels like programming in assembler when using it. Ok, I'm exaggerating a bit, but you get the point: its API is so general it fits no one.
There are several reasons for hiding a library like this:
- simplifying the API such that it fits your purpose;
- removing messy boilerplate code that's hard to read, understand, and test;
- less code to write, debug, and maintain;
- introducing logging, range checks, improved exception handling, Javadoc, etc, is cheap;
- removing the direct dependency to the third-party library; and
- you get a warm fuzzy feeling knowing that 3 lines of trivial DSL code correspond to something like 60-80 lines of messy Java code.
On the bad side is that I regularly get confused about what types go where. This isn't too much of a problem from a bug point of view, since test-cases catch the mistakes I make, but it's a bit of an annoyance if you're used to developing Java with Eclipse like I am. The Ruby IDE I'm using (Eclipse RDT) is not nearly as nice as Eclipse JDT, which is natural since an IDE for a dynamic language has less information available for refactorings, content assist, etc.
I've discovered that a functional style is really the way to go when writing code generators, especially when there are no requirements on performance. This is a nice fit, because Ruby encourages a functional programming style -- or rather, it encourages me.
What keeps biting me when it comes to code generators is how to test them. Sure, the intermediate steps are fairly easy to test, that is, while things are still objects and not just a chunk of characters. But how is the output tested?
Usually I test the output (the generated code) by matching it against a few regular expressions or similar, but this isn't a very good solution, as the test-cases are hard to read. Also, if an assertion fails, the error message isn't very informative (e.g., <true> was not <false>). Furthermore, test-cases like these are still just an approximation of how the code really should look. For example, I've found no general and simple way of testing that all used classes (and no other classes) are imported. So, it is possible that a bug such as:
import tv.muppets.DanishChef;
wouldn't be found by my test-cases, even though the resulting code wouldn't even compile (d'oh! The chef's from Sweden, not Denmark!).
Ok, this could be fixed by having some test-cases that actually compile the output by invoking a Java compiler. Not very beautiful, but still possible. Even better, the generated-and-compiled code could be tested with a few hand-written test-cases. These test-cases would test the generated code, alright, but would not document the code at all (since "the code" here means the Ruby code that generated the executed Java code). This problem, I'm sad to say, I think is impossible to solve with reasonable effort.
This approach is good and covers a lot of pitfalls. However, it misses one important aspect: generated documentation. The code generator I wrote generates Javadoc for all the methods and classes, but how should such documentation be tested? The generated documentation is definitely an important part of the output, since it's basically the only human-readable part of it (in other words, the generated code that implements the methods is really hard to read and understand).
Another approach is to simply compare the output with the output from a previous "Golden Run" that is known to be good. The problem here is, of course, knowing what is "good". Also, when the Golden Run needs to be updated, the entire output of the (potential) new Golden Run has to be manually inspected to make sure it really is good/a Golden Run. The good thing with a Golden Run is that the documentation in the generated code is tested too, not only the generated code.
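Here is a minimal sketch of a golden-run check (in C++ purely for illustration -- the original tool was Ruby; all file names are made up): regenerate the output, then compare it byte-for-byte against a checked-in, known-good copy:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Read a whole file into a string.
static std::string slurp(const std::string& path) {
    std::ifstream in(path.c_str());
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}

int main() {
    // "generated/Decoder.java" is the freshly generated output;
    // "golden/Decoder.java" is the manually inspected known-good copy.
    if (slurp("generated/Decoder.java") == slurp("golden/Decoder.java")) {
        std::cout << "golden run: OK\n";
        return 0;
    }
    std::cerr << "golden run: output differs from the golden file\n";
    return 1;
}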
The approach I'm using is to have a lot of simple test-cases that exercise the entire code generator (from input to output). Each of these test-cases verifies a certain aspect of the generated code, for instance:
- number of methods;
- number of public methods;
- name of the methods;
- a bunch of key code sequences exists in method implementations (e.g., foo.doIt(), new Bar(beer), baz.ooka(boom), etc);
- for a certain method the code looks exactly like a certain string (e.g., return (Property) ((SomethingImpl) s).propertyOf(obj););
- name of the class;
- name of the implemented interfaces;
- number of imports;
- that used classes are imported;
- key phrases exist in Javadoc.
</one and a half years ago>
Looking back at this story now, I would have done things differently. First of all, I would not use Ruby. Creating (internal) DSLs with Ruby is really easy and most of the time they turn out really nice. I've tried doing similar things with Python, Java, and C++, but the syntax of those languages just isn't as forgiving as Ruby's. However, the internal DSL I created for this tool never used the power of Ruby, so it could just as well be an external DSL instead. An external DSL could just as well be implemented in some other language than Ruby.
Second, I would consider just generating helper methods instead of entire classes and packages of classes. So instead of doing:
void setValueFrom(Message m) {
this.value = new GeneratedMessageWrapper(m).fieldA().fieldB().value();
}
you would do
void setValueFrom(Message m) {
this.value = getValue(m);
}
@MessageDecoder("Message.fieldA.fieldB.value")
int getValue(Message m) {
    // body generated and inserted here when the class is built
}
where the code for the getValue method would be generated and inserted into the file when the class is built. This way, much less code would need to be generated, no DSL would be needed (annotations are used instead), and things like name collisions would not be such a big problem as they were (believe it or not).
At any rate, writing this tool was a really valuable experience for many reasons, and I wouldn't wish to have it undone. Considering how good it was (meaning how much code it saved us from writing), it was a success. On the other hand, considering how much time we spent on getting it to work, it was a huge failure. As it turns out, this tool has since been replaced with a much simpler variant written entirely in Java. Although simpler, it gives the user much more value, so it's arguably much better than that stuff I wrote 1.5 years ago. My mistake.
Labels:
code generation,
DSL,
frustration,
functional programming,
java,
languages,
learning,
memories,
mistakes,
programming,
ruby,
testing
Monday, May 4, 2009
Java-compatible syntax for C++
Since I first realized how much more productive you are in Java compared to C++, it has bugged me that the syntactic difference is so small. Take this code as an example:
class A {
public void a() { }
}
Is that C++ or Java? (Hint: add ":" and ";" and it becomes another language). The difference is syntactically tiny, but huge when you think of all the things you get from Eclipse when using Java.
So, this is the idea: express C++ with a syntax that is compatible with Java. This Java-compatible syntax (JCS from now on) of course requires a program to translate it to C++, but it will make it possible to use a number of tools currently only available for Java -- refactoring and (reliable) code browsing, for example.
Yeah, I hear you: "you can't express advanced-feature-X and meta-programming-feature-Y using a JCS". You're right; macros and advanced template-programming are far beyond the reach of a JCS, but that's not my point. My point is that the majority of C++ code could easily be expressed with a JCS. If 95% of your code could be refactored or browsed using Eclipse, that's much better than if 0% of your code could be refactored properly.
Actually, I think that a lot of the C++ language is an example of not keeping the corner-cases in the corner. Simple things like writing a script to list all defined functions in a source file are (in the general case) impossible, because macros can redefine the language... (thus all #include:ed header files have to be parsed, thus the entire build system with all its makefiles has to be known to the script).
I know Bjarne Stroustrup had reasons for doing this (backwards compatibility with C), but I think this was more of a marketing reason than a technical one. His new language could have been compatible with C (being able to call it, be called from it, etc.) without having to be a syntactical superset of C. Anyway, back to JCS for C++.
Friends and colleagues have told me that the new CDT for Eclipse gives you refactoring, code completion, and browsing, but it works poorly in my experience. Perhaps I've failed to configure Eclipse correctly, or I'm using a crappy indexer to index my C++ code... but I can't refactor my C++ code the way I can my Java code. (Compare the number of available refactorings in Eclipse for Java and C++ if you'd like an objective measurement.)
I've implemented a prototype that proves that it is possible to create a JCS that covers the most common parts of C++. It works by traversing the Java AST (abstract syntax tree) and translating relevant nodes to their C++ representation. Example:
class A extends B implements C {
public int foo(@unsigned int[] a, boolean b) {
if (b) return a[1];
return 0;
}
}
translates to
class A : public B, public C {
public: virtual int foo(unsigned int* a, bool b) {
if (b) return a[1];
return 0;
}
};
There's a lot that's not covered by this prototype, and it's probably riddled with bugs... but it fulfills its purpose perfectly: proving that expressing C++ using a JCS is possible. The prototype is available here.
I'd love to make a real-world worthy implementation of this idea, but I'm afraid it will take up my entire spare-time... I have other things to think about! :)
Labels:
C++,
code generation,
eclipse,
frustration,
java,
languages,
parser,
programming,
project: Cava,
prototype,
tools
Friday, March 20, 2009
It's 2009 and you can't read a forwarded mail
The other day, a colleague sent me a mail that I simply forwarded from Outlook to my Gmail. Today, when I finally had time to read it, I opened it in Gmail. What do you think the mail contained? Nothing, except an attached file called smime.p7m. This file contains the encrypted mail, apparently, so I can't read it.
Oh, please! Come on! Why is a simple thing like this so hard?! Really... seriously, I'm failing at forwarding an e-mail...? Are we really making progress?
Yeah, I know that I should have forwarded it without encryption. But why is this something I need to know about? The mail client should tell me that the receiver won't be able to read the mail... It's freaking 2009! Not 1979!
Who knows... in 3009, perhaps we humans have evolved enough to have figured out and understand this whole send plain stupid text to another person-thingie. It's apparently too advanced to grasp for the current generation of humans.
(I have high hopes for the next-gen humans, though... No, really. I do!)