Monday, April 28, 2008

The Zen of regular expressions

I'm a proud owner, and sometimes wearer, of this (scroll down to Regular Expressions Shirt). On my way home from work today I started to think about whether I really know regular expressions. Sure, I can write expressions that match fairly complex patterns... but do I really know them? I came to the conclusion that I know regexps in the same sense that most seven-year-olds (i.e., first graders) can read and write: they know letters and short words, but not much more.

The funny thing is that if I had been asked this question a few years ago I would have answered "of course I know regexps" without much thought. Does that mean that I know less about regular expressions now than I did then? No, I know more. I now know enough to know that I don't know them.

The Zen of regular expressions:
The first step towards knowing regular expressions is to realize you do not know them.

Since I'm just starting to reach this first step, I cannot tell what the next step will be... or how many steps there are. :)

Monday, April 21, 2008

Boring stuff you have to implement: Configuration, part 2

A while ago I wrote a post where I proposed an easy way of specifying the configuration of an application. The idea is basically to define a configuration parameter by annotating a method with information that describes the parameter. Of course, the value of the parameter is retrieved by calling the annotated method. My previous post contains an example.

To implement this easy-configuration-thingie I use dynamic proxies. If you haven't heard of dynamic proxies you have missed one of Java's powerful facilities for metaprogramming. Under the circumstances (e.g., static type-checking) I think it's pretty easy to use too.

The basic idea behind dynamic proxies is quite simple: let all calls to methods of an interface be delegated to another method. This method is called invoke and is declared in java.lang.reflect.InvocationHandler.

As you may suspect, the invoke method receives all the arguments given to the method defined in the interface (i.e., the method that delegated to invoke). It also receives an argument that describes which method was called; this is a java.lang.reflect.Method object, which, among other things, contains the method's annotations.

Back to the original topic: configuration. How can all this annotation stuff and proxy fluff be used to define and read configuration?

Well, as the example in my earlier post shows, the interface that defines the configuration is annotated with the name and the type of the configuration parameter. Since the method's annotations are available to the invoke method, invoke can use the parameter name to look up its value (in a hash map or similar) and return it. It's as simple as that!
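The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the implementation linked from this post: the annotation @Param, the interface OutputConfig, the class ConfigHandler and the helper configOf are all names made up for the example, and only String-valued parameters are handled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;

// Hypothetical annotation: names a parameter and gives its default value.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Param {
    String name();
    String defaultValue();
}

// Hypothetical configuration interface; nobody implements it by hand.
interface OutputConfig {
    @Param(name = "output.file", defaultValue = "/dev/null")
    String nameOfOutputFile();
}

// Every call on the proxy is delegated to invoke(), which reads the
// annotation of the called method and looks the parameter up in a map.
final class ConfigHandler implements InvocationHandler {
    private final Map<String, String> values;

    ConfigHandler(Map<String, String> values) {
        this.values = values;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
        Param param = method.getAnnotation(Param.class);
        if (param == null) {
            // Methods without @Param (e.g. toString) are not supported here.
            throw new UnsupportedOperationException(
                    "Not a configuration parameter: " + method.getName());
        }
        String value = values.get(param.name());
        return value != null ? value : param.defaultValue();
    }

    // Creates a proxy that implements the given configuration interface.
    static <T> T configOf(Class<T> type, Map<String, String> values) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, new ConfigHandler(values));
        return type.cast(proxy);
    }
}
```

With an empty map, ConfigHandler.configOf(OutputConfig.class, values).nameOfOutputFile() falls back to the default from the annotation; once the map contains "output.file", that value is returned instead.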

I've made a simple implementation of this available here (follow the instructions on Google Code if you wish to check out the entire Eclipse project). Note that some more development is needed before this code is useful, since it does not read any configuration from file (only default values can be read).

In general, I tend to think of annotations as simply additional arguments to the annotated method (although a bit harder to use than ordinary arguments). This way of looking at annotations is even more suitable when they are used together with dynamic proxies, I think.

You have probably already thought of this, but there are several other ways of using annotations + dynamic proxies: I've used them to parse binary messages and command line arguments (before I knew about JewelCLI), and I guess you can come up with several other examples...

Thursday, April 17, 2008

Oups, sorry.

While updating my blog (I realized that I misspelled 'programmatically') I accidentally changed the address (on www.javablogs.com) to my other blog (in Swedish), which does not discuss Java. My apologies to www.javablogs.com.

Wednesday, April 16, 2008

Inheritance is overrated

I like the object-oriented way of developing software -- especially if there is some functional flavor in it. In most languages it is fairly easy to at least emulate a functional programming language by simply changing the way you think about the problem and the solution.

When I think in a functional way about a problem I have to solve using an object-oriented language, objects become collections of related functions (by this I mean pure functions, i.e., they have no side effects). That is, I think about the program as lambdas that are passed around, rather than instances of classes. This may sound like a trivial and superficial difference but it is not.

I have found that if I solve a problem in a functional way, the components of the solution (functions, classes, etc) are less coupled than if I solve it in an object-oriented way. Why is this?

One reason is that an object A is provided with the objects B..Z that A needs for doing whatever it needs to do. That is, A only relies on getting something that is useful for its purposes, instead of relying on a particular implementation. Another reason is that the classes' methods are often pure functions, which decreases coupling because a class does not depend on the state of another class or on the order in which methods are called.

Enough rambling. Now to the point. The first reason basically says that a functional mind-set results in a structure of has-a relations between objects, instead of the "object-oriented way" of is-a. By "is-a" I mean class-inheritance (the extends keyword in Java), which is the strongest way of coupling two classes and the most difficult to reuse, refactor, and understand -- at least for me.

On the other hand, I find interface-inheritance (the implements keyword in Java) very useful and I rely on it daily.
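To make the has-a versus is-a distinction concrete, here is a small sketch. The names Greeter, PoliteGreeter and LoudGreeter are invented for the illustration. LoudGreeter is handed some Greeter to work with (has-a) instead of extending a concrete class (is-a), so it works equally well with a class or a lambda.

```java
import java.util.Locale;

// The only thing the collaborators share is this interface.
interface Greeter {
    String greet(String name);
}

// One implementation; a pure function wrapped in a class.
final class PoliteGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// has-a: LoudGreeter is given a Greeter and adds behavior around it,
// without knowing or caring which implementation it received.
final class LoudGreeter implements Greeter {
    private final Greeter delegate;

    LoudGreeter(Greeter delegate) {
        this.delegate = delegate;
    }

    @Override
    public String greet(String name) {
        return delegate.greet(name).toUpperCase(Locale.ROOT) + "!";
    }
}
```

Both new LoudGreeter(new PoliteGreeter()) and new LoudGreeter(name -> "Hi " + name) work, and neither needs the extends keyword.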

I find it a bit funny that during the years I have used object-oriented languages, I have not once used inheritance... without regretting it. I'm getting better and better, of course, and during the last year I haven't used inheritance at all... and I'm not regretting it.

Maybe it's just me, but I find inheritance overrated.

Saturday, April 12, 2008

Making deactivated logging 100 times faster

I think java.util.logging is a nice logging framework: it's easy to do simple things, yet it is not limited. You can easily tweak it using custom filters, formatters, and handlers. One thing I do not like about it, however, is its performance.
 
The problem with logging
I have no problem with the performance of java.util.logging when the logging is activated. It's the performance when logging is deactivated that is an issue for me. The problem, as I see it, is that when
  logger.info("something: " + something.toString());
is executed, the argument to info is created (by concatenating two strings, which is computationally heavy) despite logging being deactivated. This means that a string will be created and then directly thrown away without being used. To make things even worse, there is an even greater performance penalty if the toString method of something is computationally heavy.

This is not a problem with java.util.logging per se, but rather a problem with the Java language. Don't get me wrong -- I like Java -- but in certain areas Java is simply too limited/limiting. I see at least three ways of solving the problem described above:
  1. introducing some kind of macros to the language,
  2. using aspect-oriented programming, or
  3. performing string concatenation lazily.
Personally, I think that the common variant of macros (the C/C++-kind) is a Bad Thing. On the other hand, the other variant of macros (the Lisp-kind) does not fit nicely in the Java language because those kinds of macros operate on the AST (this is perfectly ok in Lisp because Lisp does not have any syntax -- you're actually creating the AST when you write the program).

The second solution to the problem is aspect-oriented programming. To be honest, I don't know enough about that to be able to discuss it here. With the limited knowledge I do have, however, I think that it should be possible to instrument the piece of code above such that you get the following semantics:
  // Only build the message if INFO is actually enabled.
  if (logger.isLoggable(Level.INFO)) {
    logger.info("something: " + something.toString());
  }

The third solution -- performing string concatenation lazily -- is the solution I will discuss for the rest of this post. I'm assuming that the methods used to create the log message, e.g., the toString method, are pure functions, i.e., have no side effects. This is a perfectly legitimate assumption because deactivated logging should have no side effects as it is.

Lazy string concatenation
Ok, so how can we make string concatenation in Java lazy? In C++ we could have overloaded the operator +, but this is not possible in Java. One hacker-ish solution would be to implement new String and StringBuilder classes (which the compiler uses to implement string concatenation) that perform concatenation lazily, but this is not trivial... (I have actually tried... (and failed)). Instead, we can implement a thin wrapper around java.util.logging.Logger with the following methods:
  MyLogger log(Object msg);
  MyLogger log(Object msg1, Object msg2);
  MyLogger log(Object msg1, Object msg2, Object msg3);
  // ... and so on.
  void info(Object msg);
  // ... and all the other levels.
which is used like this:
  myLogger.log("Received message: ", msg, " from ").info(msgProvider);
which is the equivalent of
  logger.info("Received message: " + msg + " from " + msgProvider);
when using a java.util.logging.Logger. The log methods are simply implemented by storing the references to the objects given as arguments. The info method is implemented by calling toString on its argument and on the arguments given to log if the logging is activated; otherwise it does nothing.
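A minimal sketch of such a wrapper could look like the following. It is not the Ln4j class discussed in this post. For brevity it uses a single varargs log method instead of the fixed-arity overloads listed above (which costs an array allocation per call), and it only covers the INFO level.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of a lazy wrapper around java.util.logging.Logger.
final class MyLogger {
    private static final Object[] NO_PARTS = new Object[0];

    private final Logger logger;
    private final Object[] parts;

    MyLogger(Logger logger) {
        this(logger, NO_PARTS);
    }

    private MyLogger(Logger logger, Object[] parts) {
        this.logger = logger;
        this.parts = parts;
    }

    // Only stores references; no toString() call and no concatenation here.
    // Varargs is a simplification of the fixed-arity overloads in the post.
    MyLogger log(Object... msgs) {
        return new MyLogger(logger, msgs);
    }

    // Builds the message only if INFO is actually enabled.
    void info(Object last) {
        if (!logger.isLoggable(Level.INFO)) {
            return;
        }
        StringBuilder message = new StringBuilder();
        for (Object part : parts) {
            message.append(part);
        }
        message.append(last);
        logger.info(message.toString());
    }
}
```

With this sketch, new MyLogger(logger).log("Received message: ", msg, " from ").info(msgProvider) never calls toString on msg or msgProvider while INFO is disabled.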

I have (kind of) implemented such a class; the difference is that instead of wrapping a java.util.logging.Logger my class uses a java.util.logging.Handler directly. The interface of this class, which I named Ln4j (pun definitely intended), is the same as MyLogger above, however.

Performance measurements
So, what kinds of performance numbers can we expect? <disclaimer>I'm definitely no expert in measuring performance, but I have tried my best to create fair benchmarks.</disclaimer> These are the benchmarks:
  • logging a single constant string,
  • concatenating two constant strings and logging the result,
  • concatenating a constant string and a variable string and logging the result,
  • concatenating six short (4 characters) variable strings,
  • concatenating six long (40 characters) variable strings,
  • concatenating a constant string and an int and logging the result,
  • concatenating a constant string and a List<Double> (of length 8) and logging the result.

I ran these benchmarks with and without the -server switch to the JVM, and with logging activated and deactivated. This is the result.

In summary: with logging activated ln4j performs a bit faster than java.util.logging. However, since ln4j is quite simple (e.g., it has no log levels) this small performance advantage would probably disappear if ln4j implemented all functionality provided by java.util.logging.Logger.
When running the benchmarks with logging deactivated, there are usually considerable performance gains (no, the post title is no exaggeration). Of course, exact numbers depend on what is logged. When logging a single constant, ln4j is actually somewhat slower. However, in the benchmark that logs a list, ln4j is 600-700 times faster than java.util.logging. That's optimization for ya!

I hope this post was informative and that you have learned something from reading it. I learned a lot when experimenting with lazy string concatenation; let's hope it will be native in Java 8. :)

Oh, I almost forgot, here and here are the sources used in the benchmarks.

Wednesday, April 9, 2008

Making MBean names first-class

Time for yet another problem that has annoyed me: names of MBeans. First of all, I find the something:key=value-notation noisy and non-intuitive in comparison to the dot-notation normally used in Java. This is, however, something I have gotten used to and have accepted.

What I have not accepted is that MBean names are mere java.lang.Strings, which, to use an understatement, is not good because it forces developers to keep track of naming conventions, etc.

So, how to solve this? Easy, let's make MBean names first-class. This way, IDEs will help developers by suggesting possible keys and values in MBean names. Also, refactoring tools can be used to rename keys and values, etc. Great stuff, I say!

Using some annotation tricks and reflection, I've made it possible to annotate an MBean with a special kind of annotation, which makes it possible to do:

@something(key=value)
final class MyBeanImpl implements MyBean {
  // Code goes here.
}

which means that the name of MyBean is something:key=value. My current implementation takes an annotated class and returns the distinguished name; continuing the example above, you would do this to get the name of MyBeanImpl:

final String myName =
  new DistinguishedName(MyBeanImpl.class).name();


I'm sure my implementation of this needs to be improved, but the concept is implemented by this class (see test-case for documentation), and this is what the @something looks like (well not quite, the linked code has different names, keys, etc., but I think you'll get it anyway).
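For readers who do not want to follow the links, one way such a name annotation and lookup could be put together is sketched below. It is a guess at the general shape, not the linked code. The marker annotation @NameDomain and the class ExampleBean are made up for the illustration, annotation values are plain strings, and keys are sorted alphabetically just to get a deterministic result.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical marker: tags an annotation type as an MBean name domain.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.ANNOTATION_TYPE)
@interface NameDomain {
}

// The domain "something" with one key; an IDE can complete and rename both.
@NameDomain
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface something {
    String key();
}

@something(key = "value")
final class ExampleBean {
}

// Turns the name annotation of a class into a domain:key=value string.
final class DistinguishedName {
    private final Class<?> type;

    DistinguishedName(Class<?> type) {
        this.type = type;
    }

    String name() {
        for (Annotation annotation : type.getAnnotations()) {
            Class<? extends Annotation> domain = annotation.annotationType();
            if (domain.isAnnotationPresent(NameDomain.class)) {
                return domain.getSimpleName() + ":" + keyValues(annotation, domain);
            }
        }
        throw new IllegalArgumentException(type.getName() + " has no name annotation");
    }

    // Each element of the annotation becomes one key=value pair.
    private static String keyValues(Annotation annotation, Class<? extends Annotation> domain) {
        Method[] keys = domain.getDeclaredMethods();
        Arrays.sort(keys, Comparator.comparing(Method::getName));
        StringBuilder result = new StringBuilder();
        try {
            for (Method key : keys) {
                if (result.length() > 0) {
                    result.append(',');
                }
                result.append(key.getName()).append('=').append(key.invoke(annotation));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return result.toString();
    }
}
```

In this sketch, new DistinguishedName(ExampleBean.class).name() yields the string something:key=value.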

Sunday, April 6, 2008

Approximating other people and dynamic scoping

I have a theory (or rather a hypothesis) that you can approximate how other people react, think, do, etc, in a given situation by asking yourself: what would I have done in the same situation.

Yeah, I know it sounds pretty stupid... because we're all different, right? But for small things like "It's such nice weather. I'd really like an ice cream. I wonder if I have to stand in a long queue to buy one" it works fairly well. In this example, I probably would have to wait a while to get an ice cream, because if I want an ice cream other people will as well.

To get to the point, when applying this "theory" to my latest micro-project I realized that having to call the done() method to close a dynamic scope annoys a lot of people. Why? Because it annoys me. Here are a few reasons for that:
  • it's an implementation detail that is irrelevant to the service the Scope class provides;
  • in some sense it exposes implementation;
  • it's a detail that is easy to forget;
  • forgetting to call done() will not break your code in all cases, thus, doing so is a hard-to-find bug.
In summary, the Scope class sucks. Let's make it suck a bit less.

Instead of explicitly pushing and popping objects on the Stack<Object> that Scope contains, I'm using the call stack of the current thread. That is, when the Scope.of method is called it looks at the current call stack and finds where a new scope was created. This makes the done() method redundant and fixes the problems with the Scope class that annoyed me.

Searching through the call-stack is heavier on the CPU, but it's easier on the programmer - a trade-off I'm willing to make. There is also a bit of memory overhead because Scope now contains a map that holds objects that otherwise would not exist at all or would be eligible for garbage collection. Again, a trade-off I'm willing to make.

To conclude, with the new version of Scope it's now possible to do
new Scope(someObject) { {
  methodCallingMethodCallingMethodCallingMethodUsingThatObject();
} };

instead of
new Scope(someObject) { {
  methodCallingMethodCallingMethodCallingMethodUsingThatObject();
} }.done();

which is a Good Thing.

Friday, April 4, 2008

Dynamic scoping as alternative to Singletons

The Singleton pattern is one of the most misused design patterns. Singletons are basically glorified static methods and global data, which makes the code hard to test, hard to extend/inherit/reuse, hard to multi-thread, etc.

Most of the time singletons are not necessary since a single instance of the class can be created at start up and then passed to objects that need it. This has the downside that you have to pass the used-to-be-singleton-object to a class A just because it creates a class B which needs the used-to-be-singleton-object. Yuck!

Seriously, designing software properly takes enough time as it is; I don't need more tedious details to worry about. Just make it work! Just give class B an object that provides it with the services it needs.

So, what's the alternative then (hint: title)? Dynamic scoping in Java, of course!

I think a disclaimer is in order: I don't consider dynamic scoping to be the best solution to the problem described above. If it is possible to redesign the code such that its maintainability is improved without using dynamic scoping, then that is much better. If this is not possible, however, then dynamic scoping may be the solution you seek. Now, let's discuss how to implement this scoping business.

I would like a way to express "from now on, every time I ask for a class A, give me the instance of that class that is on the top of the stack", where stack means the stack where all the variables in the dynamic scope are stored. Also, I would like to express that an object is pushed to the dynamic scoping stack like this:

// place 'object' in the dynamic scope
scope (object) {
  // code calling code calling code calling code using 'object'.
}
// 'object' is not in the dynamic scope anymore

where scope is a new keyword I made up for the sake of the discussion. Alrighty then, how can that be done in plain Java? Simple answer: it can't. Java isn't close to expressive enough to let the programmer define new control structures and keywords. We have to do it like this instead:

new Scope(object) { {
  // code calling code calling code calling code using 'object'.
} }.done();

Neat! But wait a minute... how do we get an object that is placed in the dynamic scope? It's easy, like this:

ClassOfObject object = Scope.of(ClassOfObject.class);

As you can see there is no way to say "I want that (points with finger) instance of ClassOfObject", you can only say "yeah, whatever, give me something I can do X, Y, and Z with". This may appear to be a limitation, but it's actually a feature: it keeps encapsulation and it's overridable (it's possible to push another instance of ClassOfObject to the dynamic scoping stack, which then will be returned whenever someone (inside that scope) calls Scope.of(ClassOfObject.class)).

And this is how the Scope class is implemented, and here are some simple test-cases.

The Scope class is, of course, not thread-safe, because it would be a mistake on my part to even try to accomplish thread-safety...
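For reference, a minimal version of the explicit-stack idea can be sketched as follows. This is not the linked implementation: it assumes a single global stack (an ArrayDeque rather than a java.util.Stack), and like the real thing it is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: one global stack, deliberately not thread-safe.
class Scope {
    private static final Deque<Object> STACK = new ArrayDeque<>();

    // Entering the scope. The constructor runs before the instance
    // initializer block of the anonymous subclass, so the object is
    // already in scope when the code inside the braces executes.
    Scope(Object object) {
        STACK.push(object);
    }

    // Leaving the scope; must be called explicitly by the user.
    void done() {
        STACK.pop();
    }

    // Returns the innermost object in scope that is an instance of type.
    static <T> T of(Class<T> type) {
        for (Object candidate : STACK) { // iterates from the top of the stack
            if (type.isInstance(candidate)) {
                return type.cast(candidate);
            }
        }
        throw new IllegalStateException("No " + type.getName() + " in the dynamic scope");
    }
}
```

The double braces in new Scope(object) { { ... } }.done() create an anonymous subclass whose instance initializer holds the scoped code, which is why the class in this sketch cannot be final.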

Thursday, April 3, 2008

Boring stuff you have to implement: Configuration, part 1

I don't need to tell you that configuration is a must-have for any application; if it doesn't have any configuration it is either extremely dumb, or it is extremely smart (i.e., figuring out how to configure itself at runtime).

I don't consider my applications dumb enough to not need configuration, and I don't consider myself smart enough to develop applications that don't need configuration. So, where does that leave me? In the realm of not-so-expressive syntaxes with implicit semantics and hardcoded defaults scattered and hidden deep inside the source code, of course. Fasten your seat belts -- configuration hell, here we come!

Ok, to get to the point, this series of posts will focus on how to abstractly express the configuration needed by a piece of source code within the source code itself (locality is the shit). Details, such as how to read configuration files, are way too boring for me to discuss in my spare time... yeah, really. How to handle the configuration once it has been read, on the other hand, that's interesting enough for me.

I guess that you, like me, often see code like this:

/**
 * The configuration of the result/output of the application.
 */
public interface ResultConfiguration {

  /**
   * Get the filename of the file to write the result to.
   * This is configured by the user before startup.
   * If not set, /dev/null is returned.
   */
  String nameOfOutputFile();
}

which is actually quite nice because it's an interface that can be stubbed in tests, and it's also quite well documented in a way that is understandable for someone who has not seen the code before.

What's not so very nice is that the description of the configuration is implicitly given in comments. The same is true for the description of the class, and, even worse, for the default value which is likely to change.

Ok, so documenting the configuration is good, but it's bad to use comments. How do we get the best of both worlds? We could use java.util.Properties or something similar, and specify default values in the source; but how fun is that? Not at all. Programmers just want to have fun, as Cyndi sang back in '84. Let's use annotations!

The code above can be expressed as:

@ConfigCategory(
  description = "Controls various aspects of the output.",
  name = "result/output")
public interface ResultConfiguration {


  @ConfigParam(description = "The name of the output file.",
    settable = Settable.BeforeStartUp,
    defaultValue = "/dev/null")
  String nameOfOutputFile();
}


I'll go into the details of how to actually use the annotations later; right now I'll just say that it involves reflection and dynamic proxies. Or, to paraphrase Fermat, "I have a truly marvellous implementation of this interface which this post is too short to contain." :)
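The annotation types themselves are not shown in this post, so here is one way they might be declared, together with a small helper that turns the annotations into human-readable documentation. This is only a sketch: the enum constant AtRuntime and the class ConfigDocs are invented for the illustration, and the interface is package-private here only so that everything fits in one file.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// BeforeStartUp appears in the post; AtRuntime is a hypothetical addition.
enum Settable { BeforeStartUp, AtRuntime }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ConfigCategory {
    String description();
    String name();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ConfigParam {
    String description();
    Settable settable();
    String defaultValue();
}

@ConfigCategory(
    description = "Controls various aspects of the output.",
    name = "result/output")
interface ResultConfiguration {
    @ConfigParam(description = "The name of the output file.",
        settable = Settable.BeforeStartUp,
        defaultValue = "/dev/null")
    String nameOfOutputFile();
}

// Because the description and default are data rather than comments,
// they can be read at runtime, e.g. to generate documentation.
final class ConfigDocs {
    static String describe(Class<?> configInterface) {
        ConfigCategory category = configInterface.getAnnotation(ConfigCategory.class);
        if (category == null) {
            throw new IllegalArgumentException(configInterface.getName() + " is not a config category");
        }
        StringBuilder docs = new StringBuilder();
        docs.append(category.name()).append(": ").append(category.description()).append('\n');
        for (Method method : configInterface.getMethods()) {
            ConfigParam param = method.getAnnotation(ConfigParam.class);
            if (param != null) {
                docs.append("  ").append(method.getName())
                    .append(" (default ").append(param.defaultValue()).append("): ")
                    .append(param.description()).append('\n');
            }
        }
        return docs.toString();
    }
}
```

Calling ConfigDocs.describe(ResultConfiguration.class) produces one line for the category followed by one line per annotated parameter.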

Tuesday, April 1, 2008

Programatically Speaking

This is not the first submission I make to Programatically Speaking; neither is it the first blog about programming and programming languages I have started...  well, kind of anyway.

I think the first thing I ever wrote that can be considered programming was back in 1989 when my two older brothers and I bought a Commodore 64. The Commodore, its Datassette, and I were on the floor of our living room and I typed some strange words I had found in the User's Manual into the computer. When I was done I typed RUN and hit RETURN.

Onto the blue screen came I'VE GOT THE NUMBER. WHAT'S YOUR GUESS?. I joyfully played this number-guessing-game until I got bored, which probably took about ten minutes or so. Then I started fiddling with the program instructions, which most of the time resulted in the typical ?SYNTAX ERROR IN 50 error message.

Now, about 20 years later, both I and the available programming languages have evolved and improved considerably (i.e., using agile test-first methodologies, I now develop object-oriented multi-platform number-guessing-games :))

During these years I have learned how not to develop software, and I still learn how not to develop. Luckily, I've also picked up some neat ways for how to develop software. Too bad I pick up these good things after I have learned how not to do things (or is learning how not to do things actually a good thing?).

By the way, to be completely honest this is actually the first thing I submit to PS (which, as it turns out, is actually the first blog about programming I have started). I hope I learn to write good posts about programming a bit faster than I learn how to develop, otherwise this blog will contain numerous mistakes for the next 20 years -- and, of course, many many years after that as well.