Monday, November 9, 2009

State of the art C compiler optimization tricks

I found this via Lambda the Ultimate: State of the art C compiler optimization tricks. It's a good read, though I would really like to see the presentation rather than just reading the slides. Also, is this really 'state of the art'? Sure, some of the optimizations the compiler does are pretty impressive, but some are... well... not. Is common subexpression elimination really state of the art? Seems like 1970-ish to me...

Thursday, October 29, 2009

alias norris=sudo

Joke of the day. :)

Thursday, October 22, 2009

Lecture from MIT: Cache oblivious algorithm

I just watched a lecture about cache-oblivious algorithms from MIT's course Introduction to Algorithms. It's really nice that things like this are free on the net. I found it via Good coders code, great reuse. Check it out!

Sunday, August 30, 2009

Code generation: C++ vs. Java

Code generation can be very useful because it reduces the amount of code that needs to be written, tested, debugged, maintained, etc. But if you develop the code generator yourself, code generation can be a nightmare. Getting the code generator right can be very hard, especially if the generator needs to support more than one target language (the output language).

The last code generator I wrote was for a private project. I've always wanted to design and implement my own programming language, and this spring I finally started to implement a compiler (actually it was a source-to-source translator that emitted C++ code) for my language, which I call Fit. This was the first translator/code generator I've written that generates C++ code.

I'm not a big fan of C++. I think it's too complex a language, one that forces you to think about low-level implementation details. Since my brain is limited, that leaves me with fewer brain cycles to think about the high-level design of the application.

So why did I choose to translate Fit to C++ if C++ is so hard? Well, my other alternative was Java, but since I had already written a couple of generators for Java I thought that a generator for C++ would be a fun experience.

Actually it was a really nice experience! Having access to all the low-level details (e.g., pointer arithmetic and goto) and the high-level constructs (e.g., operator overloading) made it relatively easy to translate Fit to C++. When implementing the Fit-to-C++ translator I really saw the difference in power between C++ and Java.

Java, although a good language, is not a powerful language. In Java you can do certain things very easily, but as soon as you need to do things the language wasn't designed for you will have a bad experience. Ever tried to write bit manipulations in Java? It's not fun, I tell you. In C++ at least you have the option to implement it using operator overloading and templates to make it less of a burden. In Java, that option is not available.
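A minimal sketch of what such a helper could look like. The Bits template and the bit function are hypothetical names made up for this example; they are not part of any library or of the Fit translator.

```cpp
#include <cstdint>

// Hypothetical wrapper around an unsigned integer type T that gives
// bit operations a small, readable vocabulary.
template <typename T>
struct Bits {
    T value;

    // Read bit i (0 = least significant).
    bool operator[](unsigned i) const { return ((value >> i) & 1) != 0; }

    // Combine, mask, and invert without repeating the casts at every call site.
    Bits operator|(Bits other) const { return Bits{static_cast<T>(value | other.value)}; }
    Bits operator&(Bits other) const { return Bits{static_cast<T>(value & other.value)}; }
    Bits operator~() const { return Bits{static_cast<T>(~value)}; }
};

// Build a value with only bit i set.
template <typename T>
Bits<T> bit(unsigned i) {
    return Bits<T>{static_cast<T>(T{1} << i)};
}
```

With this in place, an expression such as bit<std::uint8_t>(0) | bit<std::uint8_t>(2) builds a flag byte, and flags[2] tests a single bit.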

To take a concrete example from the Fit compiler/translator: yield is translated to (efficient) C++ using a switch and a goto. I'm not sure how to implement it in Java, but I guess you would have to do some work-around using a do-while construct.
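The general shape of such a translation can be sketched as follows. This is not the actual output of the Fit translator; Countdown, next, and the state numbering are invented for illustration, assuming a generator that yields n, n-1, ..., 1.

```cpp
// Hypothetical generated code for a generator whose body is:
//     while n > 0: yield n; n = n - 1
struct Countdown {
    int n;          // the generator's local variable, hoisted into the object
    int state = 0;  // which resume point the next call should jump to

    explicit Countdown(int start) : n(start) {}

    // Stores the next value in out and returns true,
    // or returns false once the generator is exhausted.
    bool next(int& out) {
        switch (state) {
        case 0: goto start;
        case 1: goto resume_1;
        default: return false;  // already finished
        }
    start:
        while (n > 0) {
            out = n;
            state = 1;
            return true;  // "yield n": leave, remembering where to continue
        resume_1:         // the next call lands here, inside the loop body
            --n;
        }
        state = 2;
        return false;
    }
};
```

The switch picks the resume point and the goto jumps straight into the middle of the loop, which is the kind of jump Java has no direct equivalent for.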

So what am I saying? Is C++ a good language? Well, no, it's not good if a human being writes it, but it is actually good if a code generator writes it. All the low-level stuff actually simplifies the generator.

Thursday, August 13, 2009

Everything you ever need to know about concurrent programming

I usually don't link to other blogs without adding some of my own thoughts on the topic, but this time I'll make an exception. Herb Sutter has written 25+ articles in Dr. Dobb's Journal, all about different aspects of concurrent programming. Check them out. They are a good read even if you're not really into concurrency.

Sunday, June 7, 2009

Give me a smarter compiler! How? Here's how!

In the dark ages, writing code and compiling code were two separate processes. As we get better languages and faster hardware and compilers, writing code and compiling code become more and more integrated. For instance, when programming Java in Eclipse the source is compiled in the background every time the file is saved. Very nice, because it gives fast feedback!

This means that since the code is compiled so often, there is usually only a small difference between the previously compiled version of the code and the current one. This got me thinking: would it be possible to improve the error messages from the compiler by comparing the current code with the previously compiled code?

Let's take an example. Assume that there is an interface I with a method void m(int) that is implemented by a class C. The code compiles without errors. Then you change the signature of m in C to void m(long), which breaks the code of course. In this case the compiler could give an error message something like "Changing C.m(int) to C.m(long) makes non-abstract class C abstract, because I.m(int) is no longer implemented" instead of the error message given by today's compilers which is something like "Non-abstract class C is missing implementation for I.m(int)".

For this small example it may seem like a small improvement, but I think that if the compiler has access to the latest working code, then it can help in several ways. Better error messages are only one improvement. Another could be to give warnings when the semantics of the code have changed in an uncommon way. Let's take an example of this.

Let's say an instance of the java.lang.Thread class is created and its void start() method is called. During refactoring, the call to start is removed by mistake. The compiler knows that it's a common pattern to instantiate a java.lang.Thread and then call start on it, so it gives a warning indicating the (potential) problem. For instance, something like: "Thread is created but not started because the call to Thread.start() has been removed between the two latest versions of the code." It's almost like having a conversation with the compiler.

Another nice thing with this kind of warning is that it only appears when the code has changed. In other words, it is the difference between two consecutive versions of the code that triggers the warning, not the individual versions.

If the programmer's intent is that the thread should not be started, then the warning can be ignored, and the next time the code is compiled the compiler will not warn about the "missing" call to Thread.start(), because from the compiler's perspective the call is not missing anymore.
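The core of the idea can be sketched in a few lines. This is a toy model under a big assumption: each compiled version is reduced to the set of method calls it contains. A real compiler would compare syntax trees, and CallSet, removed_call_warnings, and the watched list are names invented for this sketch.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of one compiled version: the set of method calls it contains.
using CallSet = std::set<std::string>;

// Returns one warning for each watched call that was present in the
// previous version but is absent from the current one.
std::vector<std::string> removed_call_warnings(const CallSet& previous,
                                               const CallSet& current,
                                               const CallSet& watched) {
    std::vector<std::string> warnings;
    for (const std::string& call : watched) {
        if (previous.count(call) != 0 && current.count(call) == 0) {
            warnings.push_back("call to " + call +
                               " was removed since the last compiled version");
        }
    }
    return warnings;
}
```

Comparing a version that calls Thread.start against one that does not yields a single warning. Comparing the new version against itself, which is what happens on the next compile, yields none.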

This idea can of course be useful for static code analyzers, such as FindBugs, as well.

Tuesday, June 2, 2009

The missing level of testing for software reuse

I've noticed a pattern. For software projects that start doing unit testing, unit testing is often paired with (manual) system testing only. Here's the pattern I've noticed a few times:
  1. Project members think unit testing and (traditional manual) system testing are good enough.
  2. Project members want the simplicity of unit testing for system testing.
  3. Project members try to automate system testing, but realize it's hard or impossible to come close to the simplicity of unit tests.
  4. Project members realize there is a missing level of testing between unit testing and system testing.
Here project members means developers, architects, testers, managers, etc.

The funny thing with this is that the missing level of testing has often been mentioned in discussions several times earlier in the project, but was said to be impossible to implement because there was no time to do it. However, after the project has been bitten by bugs found late, work on it is finally started.

This missing level of testing tests such a large part of the application that it can be run independently. However, hardware- and OS-dependent parts are stubbed, configuration is (thus) simplified, user interaction is replaced with test-case input and assertions, and so on. There are several names for this level of testing: subsystem testing, multi-component testing, module testing, etc.

There is an important difference between unit tests and system tests: unit tests live inside the code, while system tests live outside the code. When you write a unit test you write the code in parallel, you rewrite the code to make it testable, and you refactor the test code and the production code at the same time. System tests, on the other hand, are often written in a completely different language (if automated at all).

This missing level of testing I'm talking about here also lives inside the code. Those tests are also refactored when the production code is, for instance. This is important. Being inside the code means these tests are easy to run, update, and write. Being inside the code is the thing that makes this level of testing work.

Essentially, these tests are unit tests in many aspects except that they test much larger chunks than a 'unit' (which is often said to be a single class).

If done well, I think there is an interesting side effect of this level of testing: it's easier to adapt larger chunks of code to work under different environments or assumptions (this can be seen for unit-tested classes, but for smaller chunks). If unit testing encourages interfaces and dependency injection, then this level of testing encourages a similar mindset for larger chunks of code. For instance, configuration could be done in such a way that it is easy to configure the application to use some kind of stub (e.g., saying PROTOCOL_TO_USE=TCP instead of USE_TCP=TRUE, because then it's simple to add a stub protocol).
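A minimal sketch of that configuration style. Only PROTOCOL_TO_USE and the TCP value come from the text above; Protocol, TcpProtocol, StubProtocol, make_protocol, and the STUB value are hypothetical, and the TCP class is a placeholder that does no real networking.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// The interface the rest of the application depends on.
struct Protocol {
    virtual ~Protocol() = default;
    virtual std::string send(const std::string& message) = 0;
};

// Placeholder for the production protocol; a real one would use a socket.
struct TcpProtocol : Protocol {
    std::string send(const std::string& message) override {
        return "tcp:" + message;
    }
};

// Stub used by subsystem-level tests; no network involved.
struct StubProtocol : Protocol {
    std::string send(const std::string& message) override {
        return "stub:" + message;
    }
};

// Maps the value of PROTOCOL_TO_USE to an implementation.
// Adding a protocol is one more line here, which a boolean
// USE_TCP setting could not express.
std::unique_ptr<Protocol> make_protocol(const std::string& name) {
    if (name == "TCP") return std::unique_ptr<Protocol>(new TcpProtocol());
    if (name == "STUB") return std::unique_ptr<Protocol>(new StubProtocol());
    throw std::invalid_argument("unknown PROTOCOL_TO_USE: " + name);
}
```

A test at this level would configure PROTOCOL_TO_USE=STUB and then exercise the application through the same code path production uses.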

Seeing how much code is written that essentially reimplements existing applications just because some small part of the application does not meet some requirement, this style of testing (if it improves reusability, as I think it does) can be worth doing for more reasons than quality.

Is testability what we should really aim at if we wish to make our code reusable? If so, then we need to test code in chunks that we think are valuable for reuse. In other words, the levels of testing we have define the chunks of code that can be (easily) reused.