Saturday, March 7, 2009

Pythonic parsing and keeping corner-cases in the corners

I've been fiddling with Python for a while, especially a nice library called Pyparsing. I have posted some stuff about parsers before, and I have tried ANTLR for a private project for parsing and translating Java code. Anyway, Pyparsing has to be the most intuitive and easy-to-use parser library I have used.

In my opinion, a common problem with many libraries, programming languages, etc., is that they are not optimized for the most common, simple cases. Rather, they make the most common cases just as hard as the weirdest corner cases you can possibly think of. Take this Java code for reading an entire file into a String:

FileInputStream fstream = new FileInputStream("filename.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
StringBuffer content = new StringBuffer();
while ((strLine = br.readLine()) != null) {
    // readLine() strips the line terminator, so add it back
    content.append(strLine).append('\n');
}
br.close();
return content.toString();

Why, oh why, do I have to write all this code when all I want is (in Python):

return open('filename.txt', 'r').read()

or (in Perl):

open FILE, "filename.txt";
local $/;  # slurp mode: read the whole file, not just the first line
$string = <FILE>;

The Java API for opening and reading files seems to be focused on covering all possible use cases. Covering all use cases is of course a good thing, but not at the expense of the common, simple cases. It is trivial for me to add a few convenience methods/classes to cover the common cases. But why aren't these methods/classes in the API from the beginning?

There are several other examples of this screw-the-common-cases-and-make-the-API-super-generic mentality. Reflection in Java throws a gazillion exceptions, for example, and in most cases you don't need to know what went wrong, only that something did go wrong.

So, anyway, let's get back to the Pyparsing library. As I said, it is very easy to use and the common cases are straightforward to implement. For example, there are helper classes/methods for parsing a string while ignoring upper/lower case, for matching one (and only one) of a set of grammar rules, etc. In addition to this, operators such as +, ^, and | are overloaded, so a grammar rule normally looks something like this:

greet = Word( alphas ) + "," + Word( alphas ) + "!"
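To make that concrete, here is a minimal sketch of the rule above in use, assuming Pyparsing is installed and using its classic camelCase method names. The `answer` rule and the input strings are my own made-up illustrations of the case-insensitive helper and the overloaded | operator; they are not taken from the Pyparsing documentation.

```python
from pyparsing import CaselessKeyword, ParseException, Word, alphas

# The greeting rule from the post: a word, a comma, a word, an exclamation mark.
greet = Word(alphas) + "," + Word(alphas) + "!"

# Whitespace between tokens is skipped by default.
tokens = greet.parseString("Hello, World!").asList()

# Hypothetical rule: accept "yes" or "no" in any letter case.
# The | operator tries the alternatives from left to right.
answer = CaselessKeyword("yes") | CaselessKeyword("no")
reply = answer.parseString("YES").asList()

# Input that does not fit the grammar raises ParseException.
try:
    greet.parseString("Hello World")
    rejected = False
except ParseException:
    rejected = True

print(tokens)    # ['Hello', ',', 'World', '!']
print(reply)     # ['yes']
print(rejected)  # True
```

Note that the caseless match reports the keyword as it was written in the grammar ('yes'), not as it appeared in the input.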


So, what is this post all about? Pyparsing or bad libraries? Both. There are so many bad libraries out there that aren't meant to be used by human programmers. That is, the simplest things in the library are hard to use just because the harder things are hard to use. Pyparsing, on the other hand, is a joy to use. I was surprised how often I thought "oh, it would be nice to have such-and-such helper function now" and, after looking in the Pyparsing documentation, thought "yay, there it is!"


Marius Gedminas said...

Please drop the ", 'w'" from your Python open(filename) example. Now you're opening it for writing (incidentally destroying the contents).

Togge said...

Oops, my mistake. Thanks for pointing it out!