
Friday, July 9, 2010

DSL: Domain Specific Logging

A few years ago, I was asked to design a logging framework for our new Product X 2.0. Yeah, I know, "design a logging framework?". That's what I thought too. So I didn't really do much design; all I did was say "let's use java.util.logging", and then I was done designing. Well, kind of, anyway...

I started to think back on my first impression of the logging framework in Product X 1.0. I remembered that I had a hard time understanding how it was (supposed) to be used from a developer's point of view. That is, when faced with a situation where I had to log, I could not easily answer:
  • which severity category the entry belongs to (FINEST, FINER, FINE, INFO, WARNING, SEVERE, ERROR, etc.),
  • what data should be provided in the entry and how it should be formatted, and
  • how log entries are normally expressed textually (common phrases, and other conventions used).
In other words, it was too general (i.e., complex), with too much freedom to define your own logging levels, priorities, etc.

That's why I proposed to make logging easier; thus, the logging framework in Product X 2.0 has very few severity levels. This is implemented by simply wrapping a java.util.logging.Logger in a new class that only has the most necessary methods.
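As a rough sketch, a wrapper along these lines would do (class and method names here are hypothetical, not the actual Product X code):

import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper: three severity levels instead of the seven
// that java.util.logging offers.
public final class MyLogger {
  private final Logger delegate;

  public MyLogger(final Class<?> owner) {
    this.delegate = Logger.getLogger(owner.getName());
  }

  // Routine events, mainly useful when tracing a scenario.
  public void low(final String message) {
    delegate.log(Level.FINE, message);
  }

  // Noteworthy events that an operator may want to see.
  public void medium(final String message) {
    delegate.log(Level.INFO, message);
  }

  // Something is wrong and needs attention.
  public void high(final String message) {
    delegate.log(Level.WARNING, message);
  }
}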

This simplification reduced the code noise in the logger interface, which made it a lot easier to choose a log level when writing a new log message in the source. This had an important implication (that we originally didn't think of): given that a certain message had to be logged, we (the developers) agreed more on the level at which that message should be logged. No more, "oh, this is a log message written by Robert -- he always uses a high log level for unimportant messages".

Of course, you could still hear comments like "oh, this log message is written by John -- he always uses ':' instead of '=' when printing variables". This isn't much of an issue if a somewhat intelligent being is reading the logs, e.g., a human. However, if the logs are read by a computer program, this can be a big problem -- and this was the case for Product X 2.0.

This variable-printing business could be solved quite easily; we simply added a method MyLogger variable(final String name, final Object value) to the logger class (MyLogger) that logged a variable. Example:
logger.variable("result", compResult).low("Computation completed."); 
which would result in the following log message:

2008-apr-12 11:49:30 LOW: Computation completed. [result = 37289.32];
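One way such a variable method could work is to let the logger collect name/value pairs and append them when the level method is called. Extending the hypothetical MyLogger sketch above (again, not the real implementation):

  // Collected name/value pairs for the next log entry.
  private final StringBuilder variables = new StringBuilder();

  public MyLogger variable(final String name, final Object value) {
    if (variables.length() > 0) {
      variables.append("; ");
    }
    variables.append(name).append(" = ").append(value);
    return this; // enables chaining: logger.variable(...).low(...)
  }

  public void low(final String message) {
    final String suffix =
        variables.length() == 0 ? "" : " [" + variables + "]";
    delegate.log(Level.FINE, message + suffix);
    variables.setLength(0); // reset for the next entry
  }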
When I did this, I began to think differently about log messages. It put me into a new mind-set -- the pattern matching mind-set. Patterns in log messages started to appear almost everywhere. Actually, most of our logs followed one of the following patterns:
  • Started X
  • Completed X
  • Trying to X
  • Failed to X
  • Succeeded to X
Here is an example:
try {
  myLogger.tryingTo("connect to remote host").variable(
                    "address", remoteAddress).low();
  // Code for connecting.
  myLogger.succeededTo("connect to remote host").variable(
                       "address", remoteAddress).low();
} catch (final IOException e) {
  myLogger.failedTo("connect to remote host").cause(e).medium();
}
To take this even further, it is possible to only allow certain combinations of succeededTo, tryingTo, cause, etc., by declaring them in different interfaces. Let's assume that it should only be possible to use cause if the failedTo method has been called. Here's the code for doing that:

interface MyLogger {
  // ... other logging methods.
  CauseLogger failedTo(String msg);
} 
 
interface CauseLogger {
  LevelChooser cause(Throwable cause);
  LevelChooser cause(String message);
}

interface LevelChooser {
  void low();
  void medium();
  void high();
} 
This is essentially what's called a fluent interface. It complicates the design of the logging framework, but it also makes it impossible to misuse. Good or bad? It's a matter of taste, I think, but it's a good design strategy to have up your sleeve.

Sunday, April 18, 2010

Testing generated code


I started to write this post roughly one and a half years ago. I never finished writing it until now for whatever reason. Here's the post.

<one and a half years ago>
At work I'm currently developing a little tool that generates Java code for decoding messages (deeply nested data structures) received over a network. To be more precise, the tool generates code that wraps a third-party library, which provides generic access to the messages' data structures. Slightly amusing is that the library itself consists of generated Java code.

The third-party library is... clumsy to use, to say the least. Its basic problem is that it is so generic/general that it feels like programming in assembler when using it. Ok, I'm exaggerating a bit, but you get the point: its API is so general that it fits no one.

There are several reasons for hiding a library like this:
  • simplifying the API such that it fits your purpose;
  • removing messy boilerplate code that's hard to read, understand, and test;
  • less code to write, debug, and maintain;
  • introducing logging, range checks, improved exception handling, Javadoc, etc, is cheap;
  • removing the direct dependency to the third-party library; and
  • you get a warm fuzzy feeling knowing that 3 lines of trivial DSL code corresponds to something like 60-80 lines of messy Java code.
I'm developing the code generator using Ruby, which has had its pros and cons. The good thing with Ruby is that it's easy to prototype things and you can do fairly complex stuff really easily.

On the bad side, I regularly get confused about what types go where. This isn't too much of a problem from a bug point of view, since the test-cases catch the mistakes I make, but it's a bit of an annoyance if you're used to developing Java with Eclipse like I am. The Ruby IDE I'm using (Eclipse RDT) is not nearly as nice as Eclipse JDT, which is natural since an IDE for a dynamic language has less information available for refactorings, content assist, etc.

I've discovered that a functional style is really the way to go when writing code generators, especially when there are no requirements on performance. This is a nice fit, because Ruby encourages a functional programming style -- or rather, it encourages me.

What keeps biting me when it comes to code generators is how to test them. Sure, the intermediate steps are fairly easy to test, that is, while things are still objects and not just a chunk of characters. But how is the output tested?

Usually I test the output (the generated code) by matching it against a few regular expressions or similar, but this isn't a very good solution, as the test-cases are hard to read. Also, if an assertion fails, the error message isn't very informative (e.g., <true> was not <false>). Furthermore, test-cases like these are still just an approximation of how the code really should look. For example, I've found no general and simple way of testing that all used classes (and no other classes) are imported. So, it is possible that a bug such as:
import tv.muppets.DanishChef;
wouldn't be found by my test-cases, even though the resulting code wouldn't even compile (d'oh! The Chef is from Sweden, not Denmark!).

Ok, this could be fixed by having some test-cases that actually compile the output by invoking a Java compiler. Not very beautiful, but still possible. Even better, the generated-and-compiled code could be tested with a few hand-written test-cases. These test-cases would test the generated code, alright, but would not document the code at all (since "the code" here means the Ruby code that generated the executed Java code). This problem, I'm sad to say, I think is impossible to solve with reasonable effort.
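Compiling the generated source from a test-case is straightforward with the compiler API that ships with the JDK; a minimal sketch (the file name is made up):

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class GeneratedCodeCompilesTest {
  public static void main(final String[] args) {
    // The system compiler is only available when running on a JDK.
    final JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

    // Hypothetical path to a file produced by the code generator.
    final int status =
        compiler.run(null, null, null, "generated/MessageWrapper.java");

    if (status != 0) {
      throw new AssertionError("Generated code does not compile");
    }
  }
}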

This approach is good and covers a lot of pitfalls. However, it misses one important aspect: generated documentation. The code generator I wrote generated Javadoc for all the methods and classes, but how should such documentation be tested? The generated documentation is definitely an important part of the output, since it's basically the only human-readable part of it (in other words, the generated code that implements the methods is really hard to read and understand).

Another approach is to simply compare the output with the output from a previous "Golden Run" that is known to be good. The problem here is, of course, to know what 'good' is. Also, when the Golden Run needs to be updated, the entire output of the (potential) new Golden Run has to be manually inspected to be sure it really is good, i.e., a Golden Run. The good thing with a Golden Run is that the documentation in the generated code is tested as well, not only the generated code.
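A Golden Run check can be as simple as comparing the freshly generated text against a checked-in reference file; a sketch with hypothetical paths:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class GoldenRunTest {
  public static void main(final String[] args) throws IOException {
    // Hypothetical paths: the approved golden output and the fresh output.
    final byte[] golden = Files.readAllBytes(Paths.get("golden/MessageWrapper.java"));
    final byte[] fresh = Files.readAllBytes(Paths.get("out/MessageWrapper.java"));

    if (!Arrays.equals(golden, fresh)) {
      throw new AssertionError("Generated output differs from the golden run");
    }
  }
}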

The approach I'm using is to have a lot of simple test-cases that exercise the entire code generator (from input to output). Each of these test-cases verifies a certain aspect of the generated code (a sketch of one such check follows the list), for instance:
  • number of methods;
  • number of public methods;
  • name of the methods;
  • a bunch of key code sequences exist in method implementations (e.g., foo.doIt(), new Bar(beer), baz.ooka(boom), etc);
  • for a certain method the code looks exactly like a certain string (e.g., return (Property) ((SomethingImpl) s).propertyOf(obj););
  • name of the class;
  • name of the implemented interfaces;
  • number of imports;
  • that used classes are imported;
  • key phrases exist in the Javadoc.
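For illustration, here's what one such aspect check could look like, written in Java even though the real checks lived in the Ruby test suite (the file path and the expected strings are made up):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class GeneratedCodeAspectsTest {
  public static void main(final String[] args) throws IOException {
    // Hypothetical path to a file produced by the code generator.
    final String source = new String(
        Files.readAllBytes(Paths.get("out/MessageWrapper.java")));

    // Hypothetical key code sequences that must appear in the output.
    assertContains(source, "import com.thirdparty.SomethingImpl;");
    assertContains(source,
        "return (Property) ((SomethingImpl) s).propertyOf(obj);");
  }

  // Fails with a message naming the missing sequence, instead of the
  // uninformative "<true> was not <false>".
  private static void assertContains(final String source, final String expected) {
    if (!source.contains(expected)) {
      throw new AssertionError("Generated code is missing: " + expected);
    }
  }
}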
In addition to these test-cases, there are test-cases that simply generate a lot of Java classes and compile them. The test-cases pass if everything compiles. Finally, some of these generated classes are tested using hand-written JUnit test-cases. For me, this approach worked well and I didn't experience any problems (meaning no more problems than normal) with testing the generated code. The generated documentation, however, had many more bugs. These bugs didn't get fixed either, because fixing them wasn't high priority. Too bad.
</one and a half years ago>

Looking back at this story now, I would have done things differently. First of all, I would not use Ruby. Creating (internal) DSLs with Ruby is really easy and most of the time the result turns out really nice. I've tried doing similar things with Python, Java and C++, but the syntax of those languages just isn't as forgiving as Ruby's. However, the internal DSL I created for this tool never used the power of Ruby, so it might just as well be an external DSL instead. An external DSL could just as well be implemented in some other language than Ruby.

Second, I would consider just generating helper methods instead of entire classes and packages of classes. So instead of doing:
void setValueFrom(Message m) {
  this.value = new GeneratedMessageWrapper(m).fieldA().fieldB().value();
}
you would do
void setValueFrom(Message m) {
  this.value = getValue(m);
}

@MessageDecoder("Message.fieldA.fieldB.value")
int getValue(Message m) {
  // Body generated and inserted at build time.
}
where the code for the getValue method would be generated and inserted into the file when the class is built. This way, much less code would need to be generated, no DSL would be needed (annotations are used instead), and things like name collisions would not be such a big problem as they were (believe it or not).

At any rate, writing this tool was a really valuable experience, for many reasons, and not something I would wish to have undone. Considering how good it was (meaning how much code it saved us from writing), it was a success. On the other hand, considering how much time we spent on getting it to work, it was a huge failure. As it turns out, this tool has been replaced with a much simpler variant written entirely in Java. Although simpler, it gives the user much more value, so it's arguably much better than the stuff I wrote one and a half years ago. My mistake.


Monday, May 11, 2009

Using Office as an IDE (was: Making word documents machine-readable)

Ok, I admit it: this is a bit crazy. :) I think this is a very good quote:
Get your data structures correct first, and the rest of the program will write itself.
David Jones

especially in the context of this post.

That's enough fluff; now let's get to the stuff.

Let's assume that you get some kind of specification in a simple computer-readable format, e.g., comma-separated values or XML. There are several things that you can do when provided with such a specification:

  • generate code, e.g., interfaces or test-cases
  • automatically check your code (e.g., that the states of a finite state machine handle the events they should, and no more)
  • automatic formal analysis of the specification (e.g., the finite state machine does not have any unreachable states)
  • make sure the user documentation covers all parts of the specification (e.g., a chapter for each foo and bar specified).
Nice stuff. But what if you get the specification in a less computer-friendly format like MS Word? Luckily, OpenOffice can read Word files and convert them to an easier format like HTML or plain text (this should be possible to do via the command line according to this, although I haven't tried it). OpenOffice can also do a similar thing with Excel files.

Ok, now that you've got the Word or Excel file converted to plain text, it's time to write that analyzer, code generator, or whatever you need. Code on!

Actually, when thinking about this I realized that a very cool thing to do would be to trigger the convert-to-text-and-generate-code process when the Word/Excel document is saved. This way, whoever is updating the Word/Excel file will immediately know if the document is consistent (if the process analyzed its content) or if the tests passed (if the process generated test-cases). Wouldn't that be awesome?
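A sketch of how such a save-triggered pipeline could be wired up, using the file-watching API from newer JDKs (java.nio.file.WatchService); the convertAndAnalyze step is hypothetical and would shell out to OpenOffice and then run the analyzer or code generator:

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class SpecificationWatcher {
  public static void main(final String[] args) throws Exception {
    final Path specDir = Paths.get("specifications"); // hypothetical directory
    final WatchService watcher = FileSystems.getDefault().newWatchService();
    specDir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);

    while (true) {
      final WatchKey key = watcher.take(); // blocks until a document is saved
      for (final WatchEvent<?> event : key.pollEvents()) {
        final Path changed = specDir.resolve((Path) event.context());
        convertAndAnalyze(changed);
      }
      key.reset();
    }
  }

  private static void convertAndAnalyze(final Path document) {
    // Hypothetical step: convert the document to plain text (e.g., via
    // OpenOffice), then run the analyzer/code generator and report back.
    System.out.println("Re-checking " + document);
  }
}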

In some sense, this is using MS Office (or OpenOffice for that matter) like an IDE. Sounds like madness to me, but perhaps it's useful to someone. Although I have to say that writing some document in MS Office, saving it, and a few seconds later getting an indication saying "document analysis result: 1 warning: no test-case for feature 'Foo is sent to Bar'." would be really cool. Or, when saving an Excel file, getting an indication saying "Data in cell (5, 4) made test-case 'Foo can hold configured number of Bar:s' fail."... that would be awesome.
Why? Dunno, it just would. :)

Sunday, August 3, 2008

Fear and Loathing in Parse Vegas

Martin Fowler writes about parser fear, and I have to say "guilty as charged". I've done a few DSLs, but all have been so simple that I could hand-write a parser using regular expressions and other string manipulations. To be honest, the resulting parser would probably have been easier to understand, maintain, etc., if it had been developed using a proper grammar and a parser generator. Despite (knowing) this, I kept writing those convoluted hand-written parsers.

I did the compiler class at university and I'm interested in most things programming-language related, e.g., compilers and parsers. Despite this, I had never actually done a parser (with a proper grammar) by myself. Why? I had parser fear.

Fowler writes:
So why is there an unreasonable fear of writing parsers for DSLs? I think it boils down to two main reasons.
  • You didn't do the compiler class at university and therefore think parsers are scary.
  • You did do the compiler class at university and are therefore convinced that parsers are scary.

I think the last bullet explains why I never did a proper-grammar parser by myself.

However, the last time I had to write a parser, I (finally) realized that a proper-grammar parser was a better idea than trying to hand-write something convoluted. The parser had to be implemented in Ruby, so I googled (is that a verb now?) and found a generic recursive descent parser -- all I had to do was write the grammar, which was straightforward.

There were several reasons that finally made me take the step to use a proper parser:

  • The language was complex enough to make my old approach unsuitable
  • The parser was really easy to integrate with my other Ruby code
  • No separate step for generating the parser (i.e. short turn-around time, and less complexity because there is no code generation)
In essence: it was easy to use and test. That was the cure for my (irrational) anxiety towards parsers.

Monday, April 28, 2008

The Zen of regular expressions

I'm a proud owner, and sometimes wearer, of this (scroll down to Regular Expressions Shirt). On my way home from work today, I started to think about whether I really know regular expressions. Sure, I can write expressions that match fairly complex patterns... but do I really know them? I came to the conclusion that I know regexps in the same sense as most seven-year-olds (i.e., first graders) can read and write: they know letters and short words, but not much more.

The funny thing is that if I had been asked this question a few years ago, I would have answered "of course I know regexps" without much thought. Does that mean that I know less about regular expressions now than I did then? No, I know more. I now know enough to know that I don't know them.

The Zen of regular expressions:
The first step towards knowing regular expressions is to realize you do not know them.

Since I'm just starting to reach this first step, I cannot tell what the next step will be... or how many steps there are. :)