Programmatically Speaking: logging

Showing posts with label logging. Show all posts

Saturday, September 10, 2011

Dynamic scoping in C++

I've had ideas for dynamic scoping before. There are pros and cons with dynamic scoping, as is explained at the emacs wiki.

Last time I implemented it in Java, this time I'm trying to get something more primitive (compared to the Java implementation) working in C++ (it should be straight-forward to port to C).

The use-case I had in mind when implementing this was how to distribute loggers or hierarchical configurations to every part of an application. In such code is is usually required to pass a Logger or a Configuration around to basically every function. However, with the approach described here, there is no need to pass stuff around like that, instead one uses stack-searching technique.

This little demo demonstrates the idea:

int func1() {
  TaggedInt i1(1);
  printf("i1 = %d\n", findClosestTaggedInt()); 
}

int func0() {
  printf("i0 = %d\n", findClosestTaggedInt()); 
  func1();
}

int main() {
  TaggedInt i0(0);
  func0();
}

When run, the following is printed:
i0 = 0
i1 = 1
Ok, what happens here? It's pretty simple. The class TaggedInt is a class that holds two values. A tag and an int. The function findClosestTaggedInt searches backwards through the stack until it finds the tag of a TaggedInt object and returns the corresponding int of that object.

In a real application the int would probably be replaced by something else, like a pointer to a LogConfig object or something similar. Further more, the findClosestTaggedInt would be replaced by findClosestTaggedLogConfig and probably not be used explicitly like in this demo, but rather implicitly in a constructor of some other object such as Logger. For instance:

void func() {

  Logger logger; // findClosestTaggedLogConfig is called by the ctor.

  logger.warning("Something is happening!");

}

int main() {

  LogConfig config(std::cerr); // This is a tagged object.

  func();

}

Now, there is a problem with this approach. Searching through the stack is an expensive operation, and not something you should do often. That implies that findClosetsTaggedXYZ should only be rarely called, or when the program is not in a time-critical phase. However, since findClosestTaggedXYZ is only used when distributing an object throughout the application (not when the object is used) this can usually be done once at application start-up and never again.

To sum up, here's a the rest of the code for running the demo above:
#include <string.h>
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

static const unsigned TAG_SIZE = 16;
static const char TAG[TAG_SIZE] = "my magic string";

// A tagged int is an magic tag followed by an int. The tag
// *must* be at zero offset relative to the 'this' pointer.
class TaggedInt {
char tag[TAG_SIZE + sizeof(void*)];
public:
int variable;
TaggedInt(int variable) : variable(variable) {
    // Setup the tag.
    memcpy(tag, TAG, TAG_SIZE);
    TaggedInt* self = this;
    memcpy(&tag[TAG_SIZE], &self, sizeof(self));
}
};

// Find which direction the stack grows.
int stackDirection() {
int first;
int second;
int diff = intptr_t(&second) - intptr_t(&first);
return diff > 0 ? 1 : -1;
}

// The maximum number of bytes to search backwards through
// the stack for the tagged int before giving up.
static const unsigned STACK_SIZE = 1024 * 1024;

// Search backwards through the stack for the first tag.
// Note: this might be possible to optimize by a large factor
// by knowing the alignment of stack allocated variables.
int findClosestTaggedInt() {
static const unsigned ALIGMENT = 1;
int direction = stackDirection();
char start;
for (char* ptr = &start; ptr != &start - STACK_SIZE;
       ptr -= direction * ALIGMENT) {
    if (0 == strcmp(TAG, ptr)) {
      TaggedInt* tagged;
      memcpy(&tagged, &ptr[TAG_SIZE], sizeof(tagged));
      return tagged->variable;
    }
}
assert(false);
}
With this code you should have a fairly good start for implementing something more useful, such as a distributor of Loggers or Configurations.

One last thing, I would be really interested in hearing from any anyone that figures out how to compute the ALIGMENT constant in findClosestTaggedInt to increase performance while still making sure no tag is missed.

Friday, July 9, 2010

DSL: Domain Specific Logging

A few years ago, I was asked to to design a logging framework for our new Product X 2.0. Yeah, I know, "design a logging framework?". That's what I thought too. So I didn't really do much design; all I did was to say "let's use java.util.logging", and then I was done designing. Well kind of anyway...

I started to think back on my first impression of the logging framework in Product X 1.0. I remembered that I had a hard time understanding how it was (supposed) to be used from a developer's point-of-view. That is, when faced with a situation where I had to log, I could not easily answer:

to which severity category does the entry belong (FINEST, FINER, FINE, INFO, WARNING, SEVERE, ERROR, etc)
what data should be provided in the entry and how it should be formatted, and
how log entries normally is expressed textually (common phrases, and other conventions used).

In other words, it was too general (i.e., complex) with too much freedom to define your own logging levels, priorities, etc.

That's why I proposed to make logging easier, thus, the logging framework in Product X 2.0 has very few severity levels. This is implemented by simply wrapping a java.util.logging.Logger in a new class that only has the most necessary methods.

This simplification reduced the code noise in the logger interface, which made it a lot easier to choose log-level when writing a new log message in the source. This had an important implication (that we originally didn't think of): given that a certain message had to be logged, we (the developers) agreed more on which level that message should be logged. No more, "oh, this is a log message written by Robert -- he always use high log level for unimportant messages".

Of course, you could still here comments like "oh, this log message is written by John -- he always use ':' instead of '=' when printing variables". This isn't much of an issue if an somewhat intelligent being is reading the logs, e.g., a human. However, if the logs are read by a computer program, this can be a big problem -- this was the case for Product X 2.0.

This variable-printing business could be solved quite easily; we simply added a method to the logger class (MyLogger) called MyLogger variable(final String name, final Object value) that logged a variable. Example:

logger.variable("result", compResult).low("Computation completed.");

which would result in the following log message:


2008-apr-12 11:49:30 LOW: Compuation completed. [result = 37289.32];

When I did this, I began to think differently about log messages. It put me into a new mind-set -- the pattern matching mind-set. Patterns in log messages started to appear almost everywhere. Actually, most of our logs followed one of the following patterns:

Started X
Completed X
Trying to X
Failed to X
Succeeded to X

Here is an example:

try {
  myLogger.tryingTo("connect to remote host").variable(
                    "address", remoteAddress).low();
  // Code for connecting.
  myLogger.succeededTo("connect to remote host").variable(
                       "address", remoteAddress).low();
} catch (final IOException e) {
  myLogger.failedTo("connect to remote host").cause(e).medium();
}

To take this even further, it is possible to only allow certain combinations of succeededTo, tryingTo, cause, etc, by declaring them in different interfaces. Let's assume that it should only be possible to use cause if the failedTo method have been called. Here's the code for doing that:


interface MyLogger {
  // ... other logging methods.
  CauseLogger failedTo(String msg);
}

interface CauseLogger {
  LevelChooser cause(Throwable cause);
  LevelChooser cause(String message);
}

interface LevelChooser {
  void low();
  void medium();
  void high();
}

This is essentially what's called fluent interfaces. It complicates the design of the logging framework, but also makes it impossible to misuse it. Good or bad? It's a matter of taste, I think, but it's a good design strategy to have up your sleeve.

Saturday, April 12, 2008

Making deactivated logging 100 times faster

I think the java.util.logging is a nice logging framework: it's easy to do simple things yet it is not limited. You can easily tweak it using custom filters, formatters, and handlers. One thing I do not like with it, however, is its performance.

The problem with logging
I have no problem with the performance of java.lang.logging when the logging is activated. It's the performance when logging is deactivated that is an issue for me. The problem, as I see it, is that when


  logger.info("someting: " + something.toString());

is executed, the argument to info is created (by concatinating two string, which is computationally heavy) despite logging being deactivated. This means that a string will be created and then directly thrown away without being used. To make things even worse, there is even a greater performance penalty if the toString method of something is computionally heavy.

This is not problem with java.util.logging per se, but rather a problem with the Java language. Don't get me wrong -- I like Java -- but in certain areas Java is simply too limited/limiting. I see at least three way of solving the problem described above:

introducing some kind of macros to the language,
using aspect-oriented programming, or
performing string concatination lazily.

Personly, I think that the common variant of macros (the C/C++-kind) it a Bad Thing. On the other hand, the other variant of macros (the Lisp-kind) does not fit nicely in the Java languange because those kinds of macros operate on the AST (this is perfectly ok in Lisp because Lisp does not have any syntax -- you're actually creating the AST when you write the program).

The second solution to the problem is aspect-oriented programming. To be honest, I don't know enough about that to be able to discuss it here. With the limited knowledge I do have, however, I think that it should be possible to instrument the piece of code above such that you get the following sematic:


  if (logger.logsAtLevel(Level.INFO) {
    logger.info("someting: " + something.toString());
  }

The third solution -- performing string concatination lazily -- is the solution I will discuss for the rest of this post. I'm assuming that the methods used to create the log message, e.g., the toString method, are are pure functions, i.e., has no side-effect. This is a perfectly legitimate assumption because deactivated logging should have no side effect as it is.

Lazy string concatination
Ok, so how can we make string concatination in Java lazy? In C++ we could have overloaded the operator +, but this is not possible in Java. One hacker-ish solution would be to implement new String and StringBuilder classes (which the compiler uses to implement string concatination) which performs concatination lazily, but this not trivial... (I have actually tried... (and failed)). Instead, we can implement a thin wrapper around java.util.logging.Logger with the following methods:


  MyLogger log(Object msg);
  MyLogger log(Object msg1, Object msg2);
  MyLogger log(Object msg1, Object msg2, Object msg3);
  // ... and so on.
  void info(Object msg);
  // ... and all the the other levels.

which is used like this:


  myLogger.log("Received message: ", msg, " from ").info(msgProvider);

which is the equivalent of


  logger.info("Received message: " + msg + " from " + msgProvider);

when using a java.util.logging.Logger. The log methods is simply implemented by storing the references to the objects given as arguments. The info method is implemented by calling toString on its argument and the argument given to log if the logging is actived, otherwise it does nothing.

I (kind of) have implemented such class; the difference is that instead of wrapping a java.util.logging.Logger my class uses a java.util.logging.Handler directly. The interface of this class, which I named Ln4j (pun definitely intended), is the same as MyLogger above, however.

Performance measuments
So, what kinds of performance numbers can we expect? <disclamer>I'm definitely no expert in measuring performance, but I have tried my best to create fair benchmarks.</disclamer> These are the benchmarks:

logging single constant string,
concatinating two constant string and log the result,
concatinating a constant string and a variable string and log the result,
concatinating six short (4 characters) variable strings,
concatinating six long (40 characters) variable strings,
concatinating a constant string and an int and log the result,
concatinating a constant string and a List<double> (of length 8) and log the result.

I ran these benchmars with and without the -server switch to the JVM and with logging activated and with logging deactivated. This is the result.

In summary: with logging activated ln4j performs a bit faster than java.util.logging. However, since ln4j is quite simple (e.g., it has no log levels) this small performance advantage would probably disapper if ln4j implemented all functionality provided by java.util.logging.Logger.
When running the benchmarks with logging deactivated, there is usually considerable performance gains (no, the post title is no exaggeration). Of course, exact numbers depend on what is logged. When logging a single constant ln4j is actually somewhat slower. However, in the benchmark that logs a list, ln4j is 600-700 times faster than java.util.logging. That optimzation for ya!

I hope this post was informative and that you have learned something from reading it. I learned a lot when experimenting with lazy string concatination; let's hope it will native in Java 8. :)

Oh, I almost forgot, here and here are the source used in the benchmarks.