Tuesday, May 28, 2013

Pretty ugly hacks

Recently I've (re-)discovered the more hacky kind of programming. It started after that I implemented my first bloom-filter. Somehow I ended up reading about all kinds of bit twiddling hacks, and finally I found myself in the realm of xor double linked lists.

Doubled linked list is a data structure where each node in the list has two pointers; one to the next node, and one to the previous node. The size of each node is thus 3 pointers. An xor linked list has the same logical function, but a compress representation -- it only requires 2/3 of the size of a normal linked list. This is achieved by observing that the only way to reach a node in a double linked list is either from the back or from the front of the list. Thus, it's enough to store the xor-sum of the pointer to the next node and the previous node, since either of those pointers is known. Anyway, the linked Wikipedia article explains it much better.

Well, the neighboring country to xor-lists is Tagged Pointer-land. In Tagger Pointer it's ok to do all kinds of crazy things to pointers as long as you know what you're doing. For instance you can exploit that dynamically allocated memory on most systems are aligned to 16-bits, 32-bits, 64-bit or even 128-bits borders. What can this be used for? Distinguishing between native integer values and arbitrary-precision integers for instance, or for putting small data structures (e.g., short strings) in the "pointer" rather than having to dynamically allocating a few bytes of memory.

It's actually pretty useful stuff, although -- as the title says -- pretty ugly. That is, the pretty kind of ugly.

Sunday, May 26, 2013

Features are social processes


For a while now I've been thinking about how human languages evolve compared to how computer languages are designed and how that relates to the features of the respective languages. In this post I will ramble at bit about how the meaning of software related terms is defined. I'll also discuss hard and soft features in programming languages and how the community surrounding the language affects, and is affected by, those features.

This is mostly a bunch of ideas and observations that I'm trying to put into words to 1) make me understand them better, and 2) make sure I don't forget them. If you expect a scientific survey, then I'm sorry to disappoint you. Maybe, though, you'll find food for your own thought and ideas.

What's a language?

As I see it, languages (be it the human or programming kind) are mutual agreement between the communicating parts of what a statement means. If everyone have a different opinion of what the following statement means, then it effectively doesn't have any meaning at all since we can't use it to communicate with anyone:
You need wood from a forest to create a fire.
or in computer-speak:
Fire fire = new Fire(forest.getWood());
On the other hand, when we agree upon what this phrase mean, then we can do all kinds of things: discuss its correctness, use it in another context, abstract it to cover more and general cases, write a compiler for it, etc.

For example, the common breed of C compiler accepts a lot of code most C programmers won't. It's the users of the C language that defines the subset of allowed-by-the-compiler-C that is acceptable for us to use. In other words, the C standard can say all it wants; its the C users who in the end defines the (practical) C language. It a bit like if physics would say "Our universe have 11 dimensions. Go use all of them!", but all known users of the universe are three-dimensional beings, thus, the accepted subset of the universe is three-dimensional. Sorry to break it to you, physics, but that's how it is.

Language features are all about what the community around the language make of the language. In a way, language features are just as much a technical aspect of the language as a social aspect of it. Example: is a language feature that's not accepted by the community really a feature, or is it just useless language complexity? More extreme example: a language without users but with extremely powerful features; effectively, does that language have any features at all?

I would also say that anyone aiming to develop a successful programming language (without the backing of a *caugh* Sun huge *caugh* Microsoft corporation) needs to have equally good eye for the technical aspect as well as the social aspect. (S)he needs to understand the social processes involved for getting a community of users who agree (hopefully with the language designer) on how the language should be used. (I think python is good example of such community, by the way).

What about software?

Developing software is also a social process. For example, you get requirements from your customer, you discuss the requirements in order to understand them, and you implement them. Implementing requirement are also a social process: you design the code by discussing it with your colleagues. And what words do you use for doing that?

You use words like object, generic, sort, inheritance, stack, tree, operation, method, message, reuse, client, algorithm, allocation, port, framework, mapping, service, channel, process, decoupled, assumption, resource, provider, input, interface... I could go on forever, but the point is that none of these words really mean anything if we humans don't agree on what they mean. The computer, framework, or programming language has no opinion on what "decouple the client's mapping algorithm from the port allocation" means, but programmers do. It's important it means that same to all programmers involved.

Soft and hard

How does this relate to programming language features? I think there two different kinds of features: hard features that was (deliberately) designed into the language, and soft features that are concepts and idioms that have evolved from using the language.

Hard features are concrete. You can make a list of hard features by reading the language specification. Soft features, on the other hand, are not. They are embedded in the community and to enumerate them you need to vibe with it for a while. Hard features are taught in classes and in books; soft features are learned by hacking, hacking, living and breathing, and some more hacking.

Example: C++ templates. Originally intended to provide better type-safety when writing generic code, like std::vector. The C++ community has then discovered that templates can be used for much, much more (like Boost.Spirit). There are a lot of template code written to implement various kinds of abstract features, e.g., compile-time if, compile-time strings, domain specific languages, etc. The hard feature is "write type-safe generic code". The soft features are "compile-time if", "embedded DSL", and even "it's hard, but you can evaluate everything at compile-time".

The D language took these soft features of C++ templates (e.g., compile-time if, embedded DSL) and integrated them into the core language. Thus, enabled more programmers to use them, because of easier syntax, error messages, compiler support, documentation, etc.

So when a C++ programmer talks about enabling or disabling some piece of code (s)he needs to think about some abstract concept like the enable-if template, while a D programming just thinks "static if". In fact, I don't think the D programmer even thinks "static if" because it's so natural to them, just as the more common "dynamic if" is so natural to all of us. The D programmer probably thinks in entirely other abstract concepts because his/her mind is free from the horrible details of C++ templates meta-programming.

You may argue that our mind is very good at abstracting and that this isn't a problem in practice, but I don't think that's true at all. Example: very few C++ programmer have every done something like an template computing the sinus of an angle, so when they're told to optimize a piece of code doing trigonometry what they'll do is to use some kind of table look-up. A D programmer, on the other hand, will simply slap together a sinus function that can be evaluated statically by the compiler because compile-time evaluation is nothing magic to her/him. (In fact, in D 2.0 match function will automatically be evaluated at compile-time if their arguments are constants).

What I'm saying here is that compile-time evaluation is a (hard) language feature of D but not of C++ (where it is an soft feature). Sure, you can in theory do compile-time evaluation of everything you need in C++ (templates are Turing complete), but not in practice because it's so hard that you actively avoid it. Thus, in most programmers conscious mind, C++ does not have compile-time evaluation. Similarly, you can do object-oriented programming in C, but you probably don't because it's hard and littered with horrible details. Thus, C is in most programmers mind not a object-oriented language. It's possible to do structured programing in Commodore BASIC, but most of us don't think of it like that. Got my point yet? Good.

Ok, so what?

By now you are probably thinking "that's all very interesting, but how is this useful?". Well, I did say that this post was a rambling, didn't I? :)

Seriously though, I don't really think any of this will make you a better software developer, but I think it could be useful if you are developing an tool and there's a community of users around it. Be sure to note what kinds of words the users uses, what features they ignore, what idioms they invent, etc. Integrate the words and idioms into the tool and it's documentation to make the experience for a new user of the application more consistent.

Friday, May 24, 2013

Small features distinguishes Protest

Protest is a unit testing framework for C++ that make testing fun, fast, and simple (see introduction). Recently, after using it on one of my own project, I've added a few more features to it.

First, I improved the compilation time for compiling a .cc file containing a test cases. This was embarrassingly bad earlier, but now I think its pretty good. The issue was that there were a lot functions defined in the header. I did a bit of preprocessing magic to avoid unnecessary function definition (in favor of function declarations) and that lower the compilation times dramatically.

Second, I made the check macro a bit more intelligent. Protest's check macro was already quite smart: it's capable of dealing with matchers, and splitting the left hand side and the right hand side in a comparison (e.g., foo != bar), which means that there is no need for several check macros as is common for C++ unit test frameworks.
Anyway, while working on some bit twiddling heavy code my tests tended to look something like this:
    test("twiddling #2") {
        check(f(10) == 0x1f);
        check(f(11) == 0x2f);
    }
and when a test failed the following was printed:
    test.cc:9: f(10) == 0x1f (30 == 31) failed. [suite/twiddling #2][].
which is all pretty nice, isn't it? Well, not really.

I found myself having to convert the decimal print-outs to hexadecimal before the error made sense to me -- bit twiddling is not easy in base-10... It didn't take long until I wished that my favorite unit test framework did this automatically for me. How nice then that I had the author of it so conveniently close. Sitting on the very same chair even!

Fast forward one hour and Protest now gives the following print-out for the example above
    test.cc:9: f(10) == 0x1f (0x1e == 0x1f) failed. [suite/twiddling #2][].
that is, Protest recognizes that I used hexadecimal literals in the check and changes the base of the integers in the print-out appropriately.

Pretty convenient.

Sunday, May 5, 2013

pythlog -- python on constraint-logic-programming steroids

I've programmed in Python for about 5 years and I've programmed in Prolog for about half a year. I know my way around python-land reasonably well, but I'm only beginning to learn Prolog. What I have learnt in the last half year, though, is how hard it is to be taken serious when you tell another developer we should really use Prolog to solve this problem. Prolog is simply a too far-off world if you're used to python-land.

A lot about learning a new tool or language is how easy it is to leverage existing knowledge. And Prolog does not fair well in this regard. No side effect? No classes? No for statement? What's this funky if? Where's my list comprehension? Recursion -- for real?? You have to be kidding me! These are all responses of a hypothetical, but likely, programmer that learns Prolog.

Even though I'm advocating learning Prolog, there are some thing that just seem to fundamentally be designed to work against me, and I keep thinking if Prolog only was a tiny bit forgiving and accepted me for the programmer I am... and my tiny brain. In addition, Prolog lacks a lot of syntactic sugar for common operation, such as string concatenation.

A snake around a piece of wood. Get it?
As an experiment of how many of Prolog good properties can be moved into python-land, I've started working on a python-to-prolog compiler. The intention is primarily very egoistic: I want a language that makes my life as a programmer better. The way I see it marrying Python with Prolog (or the other way around?) can potentially bring a very powerful language. I'm calling this marriage pythlog and here's the techy elevator pitch:

pythlog is a language that melds the power of constraint logic programming with the ease of development of Python.

What does this mean? It means that not only can a lot of traditional Python code be compiled and run, such as

    def mersenne_prime(n):
        return 2 ** n - 1
    def format_string(a, b):
        return "[" + a + b + 2 * a + "]"
 but also search/equation solving such as:

    n = free # solve this variables
    assert mersenne_prime(n) == 618970019642690137449562111
    a = free
    b = free
    assert format_string(a, b) == "[abc, abcabc]"
where n, a, and b are free variables that will be bound to values by solving the above equations. In this example n is solved to 89 and a to "abc"b to ", ".
Before you get too excited, please note that state of pythlog is proof-of-concept: there is basic integer arithmetic, string, and list support. There is also some support for user defined classes.

What can be expected to be supported of the Python language? Well, so far my prototyping indicates that at least integers, strings, lists, sets, dicts, and user-defined classes can be supported. I've done a lot of prototyping to find out what can be done and what can't be done, and I think there is a lot of potential here. In fact, by using existing tools pythlog will support compilation to native code (using the gprolog compiler), which is cool in itself.