Monday, November 9, 2009

State of the art C compiler optimization tricks

I found this via Lambda the Ultimate: State of the art C compiler optimization tricks. It's a good read, though I would really like to see the presentation rather than just reading the slides. Also, is this really 'state of the art'? Sure, some of the optimizations the compiler does are pretty impressive, but some are... well... not. Is common subexpression elimination really state of the art? Seems like 1970-ish to me...

Thursday, October 29, 2009

alias norris=sudo

Joke of the day. :)

Thursday, October 22, 2009

Lecture from MIT: Cache oblivious algorithms

I just watched a lecture about cache oblivious algorithms from MIT's course Introduction to Algorithms. It's really nice that things like this are free on the net. I found it via Good coders code, great reuse. Check it out!

Sunday, August 30, 2009

Code generation: C++ vs. Java

Code generation can be very useful for reducing the amount of code that needs to be written, tested, debugged, maintained, etc. But if you develop the code generator yourself, code generation can be a nightmare. Getting the code generator right can be very hard, especially so if the generator needs to support more than one target language (the output language).

The last code generator I wrote was for a private project. I've always wanted to design and implement my own programming language, and this spring I finally started to implement a compiler (actually, it was a source-to-source translator that output C++ code) for my language, which I call Fit. This was the first translator/code generator I've written that generates C++ code.

I'm not a big fan of C++; I think it's too complex a language, and it forces you to think about low-level implementation details. Thus, since my brain is limited, I have fewer brain-cycles left to think about the high-level design of the application.

So why did I choose to translate Fit to C++ if C++ is so hard? Well, my other alternative was Java, but since I had already written a couple of generators for Java I thought that a generator for C++ would be a fun experience.

Actually, it was a really nice experience! Having access to all the low-level details (e.g., pointer arithmetic and goto) and the high-level constructs (e.g., operator overloading) made it relatively easy to translate Fit to C++. When implementing the Fit-to-C++ translator I really saw the difference in power between C++ and Java.

Java, although a good language, is not a powerful language. In Java you can do certain things very easily, but as soon as you need to do things the language wasn't designed for, you will have a bad experience. Ever tried to write bit manipulations in Java? It's not fun, I tell you. In C++ you at least have the option of using operator overloading and templates to make it less of a burden. In Java, that option is not available.

To take a concrete example from the Fit compiler/translator: yield is translated to (efficient) C++ using a switch and a goto. I'm not sure how to implement it in Java, but I guess you would have to do some kind of work-around using a do-while construct.
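For the curious, here is one possible sketch (my own, not taken from the Fit translator) of what a Java work-around could look like: an explicit state machine inside an Iterator, roughly what a generator body like "yield 1; yield 2; yield 3" could be compiled to when goto is not available.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: a generator "yield 1; yield 2; yield 3" compiled
// to a Java state machine. Each case label is a resume point.
class YieldThree implements Iterator<Integer> {
    private int state = 0;

    public boolean hasNext() {
        return state < 3;
    }

    public Integer next() {
        switch (state) {
            case 0: state = 1; return 1;  // resume here after first yield
            case 1: state = 2; return 2;  // ...after second yield
            case 2: state = 3; return 3;  // ...after third yield
            default: throw new NoSuchElementException();
        }
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

In C++ the same state field can drive a switch whose cases fall through to goto labels inside the original function body, which is what makes the translation efficient.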

So what am I saying? Is C++ a good language? Well, no, it's not good if a human being writes it, but it is actually good if a code generator writes it. All the low-level stuff actually simplifies the generator.

Thursday, August 13, 2009

Everything you ever need to know about concurrent programming

I usually don't link to other blogs without adding some of my own thoughts on the topic, but this time I'll make an exception. Herb Sutter has written 25+ articles in Dr. Dobb's Journal, all about different aspects of concurrent programming. Check them out. They are a good read even if you're not really into concurrency.

Sunday, June 7, 2009

Give me a smarter compiler! How? Here's how!

In the dark ages, writing code and compiling code were two separate processes. As we get better languages, faster hardware, and faster compilers, writing code and compiling code become more and more integrated. For instance, when programming Java in Eclipse the source is compiled in the background every time the file is saved. Very nice, because it gives fast feedback!

This means that since the code is compiled so often, there is usually only a small difference between the previously compiled version of the code and the current one. This got me thinking: would it be possible to improve the error messages from the compiler by comparing the current code with the previously compiled code?

Let's take an example. Assume that there is an interface I with a method void m(int) that is implemented by a class C. The code compiles without errors. Then you change the signature of m in C to void m(long), which breaks the code of course. In this case the compiler could give an error message something like "Changing C.m(int) to C.m(long) makes non-abstract class C abstract, because I.m(int) is no longer implemented" instead of the error message given by today's compilers which is something like "Non-abstract class C is missing implementation for I.m(int)".
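The scenario above, in code (my own minimal reconstruction of the example, with made-up names):

```java
// The interface/class pair from the example. Changing C.m's parameter
// type from int to long breaks the "implements I" relationship: today's
// compiler reports only the missing method, not the change that caused it.
interface I {
    void m(int x);
}

class C implements I {
    public void m(int x) { }  // change this to m(long x) and C no longer compiles
}
```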

For this small example it may seem like a small improvement, but I think that if the compiler has access to the latest working code, then it can help in several ways. Better error messages are only one improvement. Another could be to give warnings when the semantics of the code have changed in an uncommon way. Let's take an example of this.

Let's say an instance of the java.lang.Thread class is created and its void start() method is called. During refactoring, the call to start is removed by mistake. The compiler knows that it's a common pattern to instantiate a java.lang.Thread and then call start on it; thus, the compiler gives a warning indicating the (potential) problem. For instance, something like: "Thread is created but not started because the call to Thread.start() has been removed between the two latest versions of the code." It's almost like having a conversation with the compiler.

Another nice thing with this kind of warning is that it only appears when the code has changed. In other words, it is the difference between two consecutive versions of the code that triggers the warning, not the individual versions.

If the programmer's intent is that the thread should not be started, then the warning can be ignored, and the next time the code is compiled the compiler will not warn about the "missing" call to Thread.start(), because from the compiler's perspective the call is not missing anymore.

This idea can of course be useful for static code analyzers, such as FindBugs, as well.

Tuesday, June 2, 2009

The missing level of testing for software reuse

I've noticed a pattern. For software projects that start doing unit testing, unit testing is often paired with (manual) system testing only. Here's the pattern I've noticed a few times:
  1. Project members think unit testing and (traditional manual) system testing are good enough.
  2. Project members want the simplicity of unit testing for system testing.
  3. Project members try to automate system testing, but realize it's hard or impossible to come close to the simplicity of unit tests.
  4. Project members realize there is a missing level of testing between unit testing and system testing.
Here, 'project members' means developers, architects, testers, managers, etc.

The funny thing about this is that the missing level of testing has often been mentioned in discussions several times before in the project, but was said to be impossible to implement because there is no time to do it. However, after being bitten by bugs found late in the project, work on it is finally started.

This missing level of testing tests such a large part of the application that it can be run independently. However, hardware- and OS-dependent parts are stubbed, configuration is (thus) simplified, user interaction is replaced with test-case input and assertions, and so on. There are several names for this level of testing: subsystem testing, multi-component testing, module testing, etc.

There is an important difference between unit tests and system tests: unit tests live inside the code, while system tests live outside the code. When you write a unit test you write the code in parallel, you rewrite the code to make it testable, you refactor the test code and the production code at the same time. System tests, on the other hand, are often written in a completely different language (if automated at all).

The missing level of testing I'm talking about here also lives inside the code. Those tests are also refactored when the production code is, for instance. This is important. Being inside the code means these tests are easy to run, update, and write. Being inside the code is what makes this level of testing work.

Essentially, these tests are unit tests in many aspects except that they test much larger chunks than a 'unit' (which is often said to be a single class).

If done well, I think there is an interesting side-effect of this level of testing: it becomes easier to adapt larger chunks of code to work under different environments or assumptions (the same can be seen for unit-tested classes, but for smaller chunks). If unit testing encourages interfaces and dependency injection, then this level of testing encourages a similar mind-set for larger chunks of code. For instance, configuration could be done in such a way that it is easy to configure the application to use some kind of stub (e.g., saying PROTOCOL_TO_USE=TCP instead of USE_TCP=TRUE, because then it's simple to add a stub protocol).
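The PROTOCOL_TO_USE idea can be sketched like this (all names here are hypothetical, just to illustrate why naming the protocol beats a boolean flag):

```java
// Naming the protocol in configuration leaves room for a stub
// implementation; a USE_TCP=TRUE flag would not.
interface Protocol {
    String send(String msg);
}

class TcpProtocol implements Protocol {
    public String send(String msg) {
        return "tcp:" + msg;  // stands in for real network I/O
    }
}

class StubProtocol implements Protocol {
    public String send(String msg) {
        return "stub:" + msg;  // no network needed: perfect for tests
    }
}

class Protocols {
    // The value of PROTOCOL_TO_USE picks an implementation by name.
    static Protocol forName(String name) {
        if (name.equals("TCP")) return new TcpProtocol();
        if (name.equals("STUB")) return new StubProtocol();
        throw new IllegalArgumentException("Unknown protocol: " + name);
    }
}
```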

Seeing how much code is written that essentially reimplements existing applications just because some small part of the application does not meet some requirement, this style of testing (if it improves reusability, as I think it does) can be worth doing for more reasons than quality.

Is testability what we should really aim at if we wish to make our code reusable? If so, then we need to test code in chunks that we think are valuable for reuse. In other words, the levels of testing we have define the chunks of code that can be (easily) reused.

Monday, May 11, 2009

Using Office as an IDE (was: Making word documents machine-readable)

Ok, I admit it: this is a bit crazy. :) I think this is a very good quote:
Get your data structures correct first, and the rest of the program will write itself.
David Jones

especially in the context of this post.

That's enough fluff, now let's get to the stuff.

Let's assume that you get some kind of specification in a simple computer-readable format, e.g., comma-separated values or XML. There are several things you can do when provided with such a specification:

  • generate code, e.g., interfaces or test-cases
  • automatically check your code (e.g., that the states of a finite state machine handle the events they should and no more)
  • automatic formal analysis of the specification (e.g., the finite state machine does not have any unreachable states)
  • make sure the user documentation covers all parts of the specification (e.g., a chapter for each foo and bar specified).
Nice stuff. But what if you get the specification in a less computer-friendly format like MS Word? Luckily, OpenOffice can read Word files and convert them to an easier format like HTML or plain text (this should be possible to do via the command line according to this, although I haven't tried it). OpenOffice can also do a similar thing with Excel files.

Ok, now you've got the Word or Excel file converted to plain text; now it's time to write that analyzer, code generator, or whatever you need. Code on!

Actually, when thinking about this I realized that a very cool thing to do would be to trigger the convert-to-text-and-generate-code process when the Word/Excel document is saved. This way, whoever is updating the Word/Excel file will immediately know if the document is consistent (if the process analyzed its content) or if the tests passed (if the process generated test-cases). Wouldn't that be awesome?

In some sense, this is using MS Office (or OpenOffice for that matter) like an IDE. Sounds like madness to me, but perhaps it's useful to someone. Although I have to say that writing some document in MS Office, saving it, and a few seconds later getting an indication saying "document analysis result: 1 warning: no test-case for feature 'Foo is sent to Bar'." would be really cool. Or when saving an Excel file getting an indication saying "Data in cell (5, 4) made test-case 'Foo can hold configured number of Bar:s' fail."... that would be awesome.
Why? Dunno, it just would. :)

Monday, May 4, 2009

Java-compatible syntax for C++

Since I first realized how much more productive you are in Java compared to C++, it has bugged me that the syntactic difference is so small. Take this code as an example:

class A {
public void a() { }
}

Is that C++ or Java? (Hint: add ":" and ";" and it becomes another language). The difference is syntactically tiny, but huge when you think of all the things you get from Eclipse when using Java.

So, this is the idea: express C++ with a syntax that is compatible with Java. This Java-compatible syntax (JCS from now on) of course requires a program to translate it to C++, but it will make it possible to use a number of tools currently only available for Java. Reliable refactoring and code browsing, for example.

Yeah, I hear you: "you can't express advanced-feature-X and meta-programming-feature-Y using a JCS". You're right; macros and advanced template programming are far beyond the reach of a JCS, but that's not my point. My point is that the majority of C++ code could easily be expressed with a JCS. If 95% of your code could be refactored or browsed using Eclipse, that's much better than if 0% of your code could be refactored properly.

Actually, I think that a lot of the C++ language is an example of not keeping the corner-cases in the corner. Simple things like writing a script to list all defined functions in a source file are (in the general case) impossible, because macros can redefine the language... (thus all #include:ed header files have to be parsed, and thus the entire build system with all its makefiles has to be known to the script).

I know Bjarne Stroustrup had reasons for doing this (backwards compatibility with C), but I think this was more of a marketing reason than a technical one. His new language could have been compatible with C (being able to call it, be called from it, etc.) without the new language having to be a syntactic superset of C. Anyway, back to JCS for C++.

Friends and colleagues have told me that the new CDT for Eclipse gives you refactoring, code completion, and browsing, but it works poorly in my experience. Perhaps I've failed to configure Eclipse correctly, or I'm using a crappy indexer for my C++ code... but I can't refactor my C++ code the way I can with Java code. (Compare the number of available refactorings in Eclipse for Java and C++ if you'd like an objective measurement.)

I've implemented a prototype that proves that it is possible to create a JCS that covers the most common parts of C++. It works by traversing the Java AST (abstract syntax tree) and translating relevant nodes to their C++ representation. Example:

class A extends B implements C {
    public int foo(@unsigned int[] a, boolean b) {
        if (b) return a[1];
        return 0;
    }
}

translates to

class A : public B, public C {
public:
    virtual int foo(unsigned int* a, bool b) {
        if (b) return a[1];
        return 0;
    }
};

There is very much that's not covered by this prototype, and it's probably riddled with bugs... but it fulfills its purpose perfectly: proving that expressing C++ using a JCS is possible. The prototype is available here.

I'd love to make a real-world-worthy implementation of this idea, but I'm afraid it would take up my entire spare time... I have other things to think about! :)

Wednesday, April 22, 2009

Unit testing makes manually managed memory simple

I recently started working on a project that does test-first development in C++. I have mostly done TDD in Java, but I did a lot of C++ before that, so neither the language nor the methodology is new to me. However, after two or three years of unit testing in Java and a few scripting languages, I have (luckily) learned a few things. So this time around I hope to avoid one of the problems I had earlier with C++: the problem of ownership of newed objects, that is, which object owns another dynamically allocated object?

When I was rewriting some classes to be testable I, as you usually do, introduced interfaces, used factories, injected dependencies, etc. In one case I rewrote the class under test (CUT) to use a factory instead of calling new directly. This made it easy to test that the CUT allocated an object correctly. But how do you test that the CUT deletes that allocated object?

Well, it was quite simple: simply add a method to the factory called destroy that takes one argument: the object to be destroyed. The destroy method is used to tell the factory "I'm done with this object, do whatever you like with it... delete it if you want to." The destroy method corresponds to the make method, which of course allocates an object.
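A minimal sketch of the make/destroy pattern and its test, in Java for brevity (the post is about C++, and all names here are made up, but the ownership idea and the recording-fake trick translate directly):

```java
// The factory owns the objects it hands out; destroy returns ownership.
interface WidgetFactory {
    Object make();
    void destroy(Object widget);  // "I'm done with this object"
}

// The class under test: borrows a widget and must give it back.
class WidgetUser {
    private final WidgetFactory factory;

    WidgetUser(WidgetFactory factory) {
        this.factory = factory;
    }

    void doWork() {
        Object w = factory.make();
        // ... use the widget ...
        factory.destroy(w);  // hand ownership back to the factory
    }
}

// A recording fake for the test: verifies that the class under test
// destroys exactly the object it was given.
class RecordingFactory implements WidgetFactory {
    Object made;
    Object destroyed;

    public Object make() {
        made = new Object();
        return made;
    }

    public void destroy(Object widget) {
        destroyed = widget;
    }
}
```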

This is probably not some great new discovery I've made; most of you who have done test-first development in languages with manual memory management have probably already done this kind of thing. This was a new thought for me, though.

I also realized how clear the ownership of the object (created by the factory) had become. It was obvious from reading the production code and/or test-case that the factory owned the object; all it did was lend the object to some other class until destroy was called. This was an insight for me.

The fact that the factory owns the objects it creates means that it is trivial to replace the memory allocation scheme used by the factory. Only the factory needs to be changed, e.g., if there is a need for pooling objects. Awesomeness.

I guess there are cases where this approach is not possible to use, but where it is, I think it's a good pattern.

Wednesday, April 8, 2009

Movable proxy: a design pattern for hiding dirty secrets

The Proxy design pattern is a useful pattern. In my style of programming it's quite common, although I seldom actually name the proxy class SomethingProxy. Instead I try to come up with a name for its actual role or responsibility. An example of this kind of proxy could be the services provided by a package in some Java code. All classes of the package are package-private (default visibility) except for one, which is the proxy for the package (there are, of course, a bunch of public interfaces).

Traditionally, though, a proxy is an object used to access something complex in an easier way. For instance, a proxy for sending messages to, or retrieving the state of, a remote process. At work, where I'm working on a distributed Java application, we have a bunch of such proxies for accessing the various distributed subsystems/processes. All these proxies have some protocol for talking to the remote subsystem, e.g., homebrew protocol N, standard protocol M, etc.

This is all good, except for the fact that both the client subsystem and the server subsystem need to share a dirty secret: which protocol is used to communicate. A sane design will hide this dirty secret inside some class (e.g., a proxy!), such that the bulk of the code doesn't need to know the secret. But still, some part of the client subsystem must know it, otherwise it cannot establish a connection to the server.

Or does it?

Wouldn't it be great if it was possible for the client to ask the server for the proxy instance? That is, instead of doing:

final Proxy forTalkingToServer = new ServerProxy();

the client does:

final Proxy forTalkingToServer = server.proxyForTalkingToServer();

Ok, that sounds like a nice idea. But wait, how is the proxyForTalkingToServer() method implemented? Doesn't the implementation of that method need to communicate with the server? Well, yes. But this bootstrap problem is easily solved by having a standardized protocol for sending Java objects between different Java systems, e.g., RMI.
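A stripped-down, in-process sketch of the idea (my own simplification: no RMI, and the interface and method names are invented), showing that the client compiles against the Proxy interface only and never learns which protocol the returned object uses:

```java
// The only type the client ever sees.
interface Proxy {
    String request(String message);
}

class Server {
    // In the real design this object would travel to the client over RMI;
    // here it is returned directly to keep the sketch self-contained.
    Proxy proxyForTalkingToServer() {
        return new Proxy() {
            public String request(String message) {
                // The server alone decides the protocol [?] hidden in here.
                return handle(message);
            }
        };
    }

    String handle(String message) {
        return "confirm:" + message;
    }
}
```

The client side is then exactly the line from the post: `final Proxy forTalkingToServer = server.proxyForTalkingToServer();` followed by ordinary calls on the proxy.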

The sequence is something like this:

Client                         Server
------                         ------
   |      Proxy request [RMI]     |
   |----------------------------->|
   |      Proxy confirm [RMI]     |
   |<-----------------------------|
+------------------+              |
| client has proxy |              |
+------------------+              |
   |    Whatever request [?]      |
   |----------------------------->|
   |    Whatever confirm [?]      |
   |<-----------------------------|

Where Proxy request is what happens when the proxyForTalkingToServer() method is called, and Proxy confirm is the method's return value, that is, the proxy the client should use for talking to the server. The Whatever request and Whatever confirm messages are sent using some protocol (symbolized by [?]) that the server decided to use.

In addition to keeping the client completely unaware of the actual protocol used for communicating with the server, the proxy can be passed around (using, e.g., RMI) the entire distributed system and it will still be possible to access the server through it. It is also possible to update the protocol without updating the clients, or to let the proxy instance implement multiple interfaces (one new-and-improved and one legacy for backwards compatibility). It is also great for testability/debugging: you can easily replace a troublesome protocol with a simpler variant.


Wednesday, April 1, 2009

Happy new year!

One year ago I wrote my first post for this blog! I've been writing about Java and testing, published a few small programming projects, and screamed in frustration.

There have also been some (slightly) philosophical posts, seemingly trivial ideas, my old sweetheart Lisp, and my views on OO. I have also faced one of my many fears and helped you make your favourite IDE a bit better.

Lately, I've even had some insights and published one of my coolest ideas (with a crappy implementation). :)

Monday, March 23, 2009

Code generation using reflection

A couple of months ago, in late November I think, I got an idea while riding my bike home from work: using code generation to optimize code that normally relies on reflection. (To understand this post, you should at least know the basics of dynamic proxies. Check this out if you don't.)

A common pattern for me is to have an interface and generate the class(es) that implement it at runtime using a dynamic proxy. The behavior of the class is defined by code that is parameterized by annotations on the interface. Example:

interface CommandLineArguments {

    @Argument({"c", "config"})
    String configFileName();

    @Argument({"x", "max"})
    String maxValue();

    @Argument({"n", "min"})
    String minValue();
}
The annotations on the methods in the interface define what the methods should do, simply by giving the name of the corresponding command line switch. Extremely terse, readable, and flexible code. If you ask me, this (annotated interfaces + dynamic proxies) is one of the best things in the Java language.
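For readers who haven't wired this up before, here is a hedged sketch of how such an annotated interface can be brought to life with java.lang.reflect.Proxy. The @Argument annotation and the lookup logic are my simplified inventions for illustration, not the author's actual code:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

// Names a method's command line switches, e.g. @Argument({"n", "min"}).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Argument {
    String[] value();
}

interface Args {
    @Argument({"n", "min"})
    String minValue();
}

class ArgsProxyDemo {
    // Builds an implementation of Args at runtime: each method looks up
    // the value following one of its switches in the given argv.
    static Args parse(final String[] argv) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) {
                List<String> names =
                    Arrays.asList(method.getAnnotation(Argument.class).value());
                for (int i = 0; i < argv.length - 1; i++) {
                    String arg = argv[i].replaceFirst("^-+", "");  // strip - / --
                    if (names.contains(arg)) return argv[i + 1];
                }
                return null;  // switch not present
            }
        };
        return (Args) Proxy.newProxyInstance(
            Args.class.getClassLoader(), new Class[] { Args.class }, handler);
    }
}
```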

The code that actually gets executed when the methods in the interface are called often relies heavily on reflection. Reflection, as everyone who has ever used it knows, is very powerful but can also be very slow. Is there some way to make it a bit faster? Yes, there certainly is: code generation.

Let's take a simple but realistic example: for any interface, create a wrapper that prints the name of the called method and delegates to some other object that implements the same interface. The code for doing this looks something like this:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;

class InvocationPrinter implements InvocationHandler {
    private Object delegateTo;

    InvocationPrinter(Object delegateTo) {
        this.delegateTo = delegateTo;
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        System.out.println(method.getName() + " called.");
        return method.invoke(delegateTo, args);
    }
}
This is general but, unfortunately, slow. It is trivial to speed up, but that requires us to hand-write every method for every interface we wish to use this way. Another way, which gives the same speed-up, is to generate the same code dynamically. Literally the same code (except for indentation and such). Using Javassist, this code can then be compiled to a class at run-time, resulting in bytecode with the same performance as hand-written code.
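For comparison, this is roughly what the hand-written (or generated) fast version looks like for one specific interface; the interface itself is made up for illustration. The reflective Method.invoke on the hot path is replaced by a direct call:

```java
// A made-up interface to wrap.
interface Greeter {
    String greet(String name);
}

// The fast equivalent of InvocationPrinter for this one interface:
// same behavior, but a direct delegating call instead of reflection.
class GreeterPrinter implements Greeter {
    private final Greeter delegateTo;

    GreeterPrinter(Greeter delegateTo) {
        this.delegateTo = delegateTo;
    }

    public String greet(String name) {
        System.out.println("greet called.");
        return delegateTo.greet(name);  // no Method.invoke here
    }
}
```

Generating exactly this kind of class per interface, instead of writing it by hand, is what the run-time code generation buys you.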

I have prototyped this approach to generating code by providing wrapper classes that look and behave just like java.lang.reflect.Method, java.lang.reflect.Constructor, etc., except that they also store how they were used. For example, the class (called BIMethod) that corresponds to java.lang.reflect.Method stores the arguments used when invoking it and the returned object. By doing this you can write normal Java code that uses reflection (via these provided wrappers), but also generate (at run-time) the Java code that implements the same functionality. In fact, since the wrappers keep track of returned values and created objects (via Constructor.newInstance), it is possible to do fairly complex stuff like:

Object doSomeReflectionStuff(Object[] args) throws Throwable {
    DummyInterface obj = null;
    for (final BIConstructor c : factory.constructors(Dummy.class)) {
        try {
            obj = (DummyInterface) c.newInstance(args[1], args[2]);
        } catch (final Exception e) {
            // this constructor didn't match the arguments; try the next one
        }
    }
    final BIMethod someMethod = getSomeMethod();
    return someMethod.invoke(obj, 0, args[0]);
}

That is, you reflectively invoke a method on an object created using reflection. In addition, a constructor matching the arguments (the Object[] args) is found automatically by checking whether the constructor threw an exception or not. The generated Java code for this will look something like this:

Dummy variable0 = new Dummy(arg1, arg2);
return variable0.theChosenMethod(0, arg0);

If you wish to take a peek at the prototype, just go ahead. Be aware, though, that it is probably the least tested stuff I've written in quite a while... there are probably heaps of bugs... :) Despite this, I think it's worth taking a look at if you need to get more performance out of your reflection-based code. Please contact me if you have any questions or ideas.

Friday, March 20, 2009

It's 2009 and you can't read a forwarded mail

The other day, a colleague sent me a mail that I simply forwarded from Outlook to my Gmail. Today, when I finally had time to read it, I opened it in Gmail. What do you think the mail contained? Nothing, except an attached file called smime.p7m. This file contains the encrypted mail, apparently, so I can't read it.

Oh, please! Come on! Why is a simple thing like this so hard?! Really... seriously, I'm failing at forwarding an e-mail...? Are we really making progress?

Yeah, I know that I should have forwarded it without encryption. But why is this something I need to know about? The mail client should tell me that the receiver won't be able to read the mail... It's freaking 2009! Not 1979!

Who knows... by 3009, perhaps we humans will have evolved enough to have figured out this whole send-plain-stupid-text-to-another-person thingie. It's apparently too advanced for the current generation of humans to grasp.

(I have high hopes for the next-gen humans, though... No, really. I do!)

Saturday, March 14, 2009

Contracts? Test-driven? Insight!

The other day I pair-programmed with a new guy at work. He showed me a class and its tests, which he and another guy had written a few weeks earlier. I don't recall exactly what the class did, but it was quite simple; hence the tests were short and simple. Overall well-written tests, if you ask me.

As we looked at the tests we had the following conversation, which afterwards gave me a new insight into an idea I've had for a long while: tests are contracts.
He: Most of these tests are for testing that the class logs correctly...
Me: Yes, is that a problem?
He: Well, I know TDD says that you need to write a failing test before you're allowed to write any production code. But isn't testing logging a bit overkill?
Me: I understand what you mean. What kind of logging is this? Why does the class need to log?
He: We needed it to understand the code. For debugging.
Me: Is the logging needed now? Is there some script that parses these logs, for example?
He: No, it's not needed any more.
Me: In that case I'd say that these tests aren't needed.

I could go even further and say that those tests shouldn't have been written at all and should be removed. I actually think that tests like these are more confusing than anything else. I'm not saying the two guys who wrote these tests did anything wrong; they were doing TDD and were doing TDD right. What I'm saying is that, in my opinion, TDD isn't ideal.

Yeah, I hear your cries: What?! Heresy! Calm down. I'll try to explain.

My opinion is that a class's tests should define what the class has to fulfill to be considered correct. To be precise, by 'correct' I mean 'what makes all things that depend on the class behave correctly'. (I realize that this is a recursive definition of 'correct', but you're a human being so you can handle it. :))

Recall that my pair-programming partner said that the logging wasn't needed any more. This means that we could remove the part of the code that logs and the class would still be correct according to the definition above. However, the class would not pass its tests, because of the tests that test the logging. This means that the class is over-specified. This is bad. The solution? Ditch the logging tests!

And so my fellow pair-programmer did.

I often say that test methods should be named shouldFoo, since it makes you focus on the behavior that is tested instead of the part that is tested (the tested method, for example). I'm thinking of extending this tip to naming test methods shouldFooBecauseBar. If this convention were followed, then the test methods that test the logging would be named shouldLogBecauseWeNeedItForDebugging. That name sounds a bit silly, doesn't it? That's because it is silly to test it!

As I said, a class's tests define what the class has to fulfill to be correct. In other words, the tests are the contract that the class must fulfill. Having tests that define contracts is much better, I think, than having tests for every little piece of functionality a class has (i.e., TDD). One reason is that it makes it easier to understand how you can change the class without breaking anything.

Now don't get me wrong, TDD is great, really great. But is it perfect? Of course not; that would be very naive to think (not to mention boring: Nothing more to do here, we've found the ideal solution! Now, let's drink tea all day!).

Are contracts ideal? Probably not. Are contracts better? Yes, I think so.

Saturday, March 7, 2009

Pythonic parsing and keeping corner-cases in the corners

I've been fiddling with Python for a while, especially a nice library called Pyparsing. I have posted some stuff about parsers before, and I have tried ANTLR in a private project for parsing and translating Java code. Anyway, Pyparsing has to be the most intuitive and easy-to-use parser library I have ever used.

In my opinion, a common problem with many libraries, programming languages, etc., is that they are not optimized for the most common, simple cases. Rather, they make the most common cases just as hard as the weirdest corner-cases you can possibly think of. Take this Java code for reading an entire file into a String:

FileInputStream fstream = new FileInputStream("filename.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
StringBuffer content = new StringBuffer();
while ((strLine = br.readLine()) != null) {
    content.append(strLine);
}
return content.toString();

Why, oh why, do I have to write all this code when all I want is (in Python):

return open('filename.txt', 'r').read()

or (in Perl):

open FILE, "filename.txt";
$string = <FILE>;

The Java API for opening and reading files seems to be focused on covering all possible use-cases. Covering all use-cases is of course a good thing, but not at the expense of the common simple cases. It is trivial for me to add a few convenience methods/classes to cover the common cases. But why aren't these methods/classes in the API from the beginning?
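The kind of convenience method meant here could look like this (a sketch; the class and method names are my own, and wrapping the checked IOException in a RuntimeException is one debatable design choice among several):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class FileUtil {
    // One-call whole-file read: the helper the post wishes were in the API.
    static String readAll(String filename) {
        try {
            BufferedReader br = new BufferedReader(new FileReader(filename));
            StringBuilder content = new StringBuilder();
            int c;
            while ((c = br.read()) != -1) {
                content.append((char) c);
            }
            br.close();
            return content.toString();
        } catch (IOException e) {
            // spare callers the checked exception for the common case
            throw new RuntimeException(e);
        }
    }
}
```

With this in place, the Java call site shrinks to the one line the Python version has: `String s = FileUtil.readAll("filename.txt");`.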

There are several other examples of this screw-the-common-cases-and-make-the-API-super-generic mentality. Reflection in Java throws a gazillion exceptions, for example, and in most cases you don't need to know what went wrong, only that it did go wrong.

So, anyway, let's get back to the Pyparsing library. As I said, it is very easy to use and the common cases are straightforward to implement. For example, there are helper classes/methods for parsing a string while ignoring case, for matching one (and only one) of a set of grammar rules, etc. In addition, the +, ^, |, etc. operators are overloaded, so a grammar rule normally looks something like this:

greet = Word( alphas ) + "," + Word( alphas ) + "!"


So, what is this post all about? Pyparsing or bad libraries? Both. There are so many bad libraries out there that aren't meant to be used by human programmers. That is, the simplest things in the library are hard to use just because the hardest things are hard to use. Pyparsing, on the other hand, is a joy to use. I was surprised how often I thought oh, it would be nice to have such-and-such helper function now and, after looking in the Pyparsing documentation, thought yay, there it is!