April 2009 - Posts

Weekends are for hacking up crazy ideas (and so are evenings). This weekend was no different; amongst other things I got around to implementing an Unlambda interpreter in C#. A what? Let’s talk about that first.

Warning: Reading this post can cause permanent brain damage. It definitely helps to have some understanding of lambda calculus, SKI combinators, continuations, and more.

 

What’s Unlambda?

Unlambda is an esoteric programming language designed by David Madore. It’s based on functional programming ideas, very minimalistic (in terms of the number of operations) and very obscure (because of its minimalistic nature and the syntax). What’s it good for? Fun, nothing more. Or maybe some understanding of combinators. Let me cite the Unlambda website to summarize the above:

This means that the language was deliberately built to make programming painful and difficult (i.e. fun and challenging).

I guess I should start believing what some colleagues attribute to me: I like pain. A while back I added the creation of an Unlambda interpreter to my list of planned Crazy Sundays projects. (Unfortunately the list always grows and never shrinks…)

So, what’s in a name? Unlambda is based on functional programming, which finds its foundations in lambda calculus. We all know by now what lambda calculus embodies, don’t we? Essentially, it’s “just” a way to declare functions and manipulate them. The lambda notation of functions comes in two parts: the arguments and the body. This is shown below with a couple of samples:

λx. x
λx. λy. x
λx. λy. λz. x z (y z)

Here I have declared three functions. The first is the identity function: given x, it simply returns x. The second one takes two arguments and drops the second one, just to return the first one. The last function is more subtle. Remember that all we manipulate in lambda calculus are functions: the arguments of the last function are named x, y and z and are all functions. Functions can be applied to other functions: this is simply written by juxtaposition (putting symbols next to one another). For example xy stands for apply x to y. This happens in a left-associative manner: xyz means (xy)z, so the function resulting from evaluating the application of x to y is next applied to argument z. The last function I’ve shown above uses parentheses to change the evaluation order: x is applied to z and that result is applied to the result of applying y to z (still follow?).

But those functions are not just samples: they’re actually the basic building blocks of combinatory logic. In essence this means you can get rid of all lambdas, replacing them by combinations of those essential building blocks:

I = λx. x
K = λx. λy. x
S = λx. λy. λz. x z (y z)

I is known as the identity function. It’s actually redundant, as SKK is equal to I (to see this, evaluate SKKx = Kx(Kx) = x; therefore SKKx = Ix and thus SKK = I through extensionality).

K is the constant function: it produces a constant output (given by its first argument) no matter what it’s applied to next: Kx(whatever) always produces x. It’s as simple as that.

S is the substitution function. Instead of applying x to y directly, it first applies z to both functions (producing xz and yz) and then applies those two to one another. So the third argument, z, gets “substituted” in both x and y through application.

Having just S and K makes a language Turing-complete. All good things boil down to two: binary, duality, S and K. So, all we need to write all sorts of programs is a machine with two instructions: S and K. (Question for the reader: Can we do with less?) Needless to say this leads to obscure code, but that’s what Unlambda stands for. And now you see why the language is called Unlambda: there’s no lambda operation (abstraction) left; everything goes through the combinators.

Oh, and by the way, if you’re given a program based on lambdas, you can “compile” those away by applying a technique called “abstraction elimination”, resulting in S, K, I and parentheses:

T[x] => x
T[(E1 E2)] => (T[E1] T[E2])
T[λx. E] => (K T[E])   (if x is not free in E)
T[λx. x] => I
T[λx. λy. E] => T[λx. T[λy. E]]   (if x is free in E)
T[λx. (E1 E2)] => (S T[λx. E1] T[λx. E2])   (if x is free in E1 or E2)

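As a quick taste of those rules at work (my own worked example, not from the original), consider eliminating λx. λy. y, the function that returns its second argument:

T[λx. λy. y]
= (K T[λy. y])   (x is not free in λy. y)
= K I

And indeed, KIxy = Iy = y, just as the original function demands.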

Ignore this curiosity for now; just keep in mind it’s a “handy” way to write programs using plain old lambda calculus and make it fit in S, K and I combinators.

 

The syntax

There’s no reason for me to duplicate the Unlambda Reference here. The main operators are ` (backtick, for application), s, k and i. A few others are provided to make the language more usable (., r and @ for input and output, as well as ? and | to deal with input state). And as Unlambda is meant to be a pain in the neck, David added some interesting complications such as Call/CC (call with current continuation, known from Scheme) and promises (a way to delay evaluation).

With this in place, we can start writing applications (note it’s normal for your head to explode):

```s``s``sii`ki `k.*``s``s`ks ``s`k`s`ks``s``s`ks``s`k`s`kr``s`k`sikk `k``s`ksk

This one prints the Fibonacci numbers by means of asterisks (*, *, **, ***, *****, etc.) separated by newlines. Try to spot the output in the sea of combinators: .* prints the asterisks and r prints the newlines.

Can I explain the program above? I could try, but I’m not going to: “all” there is to it is executing it blindly using the rules for S, K and I. The backticks are applications and make it possible to drop all parentheses from the syntax (in fact parentheses are not allowed, which makes Unlambda programs very easy to parse, by the way). How was this program written? (I didn’t write it – it’s taken from the Unlambda homepage.) One necessary ingredient when working with numbers is Church numerals, which represent numbers and operations on them (we need things like addition). Below are the representations of the numbers 0, 1 and 2 as Church numerals, translated to SKI:

0 = λf. λx. x, which eliminates to KI

1 = λf. λx. f x, which (appealing to extensionality) eliminates to I

2 = λf. λx. f (f x), which eliminates, through a longer derivation elided here, to S(S(KS)K)I

Next, we also need some operators. One such operator is increment (+ 1):

inc = λn. λf. λx. f (n f x), which eliminates, through a derivation elided here, to S(S(KS)K)

It’s trivial to see that applying inc to 1 (represented as I) yields 2. I’ll leave it to the reader to verify that applying inc to 0 results in 1. Also calculate 3 from 2 using the inc function.
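For instance, assuming the SKI form inc = S(S(KS)K) shown above, the first claim reduces as follows:

inc 1 f x = S(S(KS)K) I f x
          = (S(KS)K f) (I f) x
          = ((KS f) (K f)) f x
          = (S (K f)) f x
          = (K f x) (f x)
          = f (f x)

which is exactly the Church numeral 2 applied to f and x.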

One more operator while we’re at it: plus. This one is defined in terms of inc as follows (why?):

plus = λm. λn. m inc n (SKI derivation elided)

Exercise: Apply plus to 1 and 2 and check it corresponds to the result of inc 2 (previous exercise).

 

Implementing Unlambda in C#

Now to the real topic of this post. How to implement Unlambda in its entirety (including “c” Call/CC and “d” promises)? Before we get there, we need to think about parsing input first. But let’s start with the Main method:

Main

Main is trivial. It validates arguments, parses the input file, and feeds the resulting expression tree to an Execute method. And, hacky as this is, we do some ugly, not-so-recommended exception handling as well:

static void Main(string[] args)
{
    if (args.Length != 1 || string.IsNullOrEmpty(args[0]))
    {
        Console.WriteLine("Usage: {0} <inputfile>", Assembly.GetEntryAssembly().GetName().Name);
        return;
    }

    string file = args[0];
    if (!File.Exists(file))
    {
        Console.WriteLine("File not found: {0}", file);
        return;
    }

    try
    {
        Execute(Parse(file));
    }
    catch (Exception ex)
    {
        Console.WriteLine("Error: " + ex.Message);
    }
}

Parse

Parse isn’t hard either. The entry-point to the parser simply opens the file and gets a StreamReader:

static Expression Parse(string file)
{
    using (FileStream fs = File.OpenRead(file))
    using (StreamReader sr = new StreamReader(fs))
        return Parse(sr);
}

Next, the real parser. Simply a mapping from characters onto expressions:

static Expression Parse(StreamReader sr)
{
    char ch;
    do
    {
        ch = ReadNextChar(sr);
        if (ch == '#')
        {
            sr.ReadLine();
            ch = '\n';
        }
    } while (char.IsWhiteSpace(ch));

    switch (ch)
    {
        case '`':
            return new ApplyExpression(Parse(sr), Parse(sr));
        case 'S':
        case 's':
            return new FunctionExpression(S);
        case 'K':
        case 'k':
            return new FunctionExpression(K);
        case 'I':
        case 'i':
            return new FunctionExpression(I);
        case 'V':
        case 'v':
            return new FunctionExpression(V);
        case 'D':
        case 'd':
            return new DelayedFunctionExpression(D);
        case 'C':
        case 'c':
            return new FunctionExpression(CallCC);
        case 'E':
        case 'e':
            return new FunctionExpression(Exit);
        case 'R':
        case 'r':
            return new FunctionExpression(Print('\n'));
        case '@':
            return new FunctionExpression(Read);
        case '.':
            return new FunctionExpression(Print(ReadNextChar(sr)));
        case '|':
            return new FunctionExpression(Reprint);
        case '?':
            return new FunctionExpression(Compare(ReadNextChar(sr)));
        default:
            throw new Exception("Unrecognized input symbol: " + ch);
    }
}

static char ReadNextChar(StreamReader sr)
{
    int c = sr.Read();
    if (c == -1)
        throw new Exception("Unexpected end of file.");
    return (char)c;
}

But this deserves a bit of explanation. First, the comment-and-whitespace-ignoring loop: this simply scans the input for whitespace and, if a # is encountered, drops the rest of the line.

The big switch statement recognizes all valid input symbols, case-insensitively. There are three sorts of expressions being built:

  • Function expressions correspond to built-in functions. The arguments to the constructor will be explained next.
  • The delayed function expression for “d” promises. Essentially the same as a regular function expression but a different type to recognize during evaluation.
  • The apply expression that takes two expressions and applies the first one to the second one. This corresponds to the ` operator: `AB means applying whatever A is to whatever B is.

The ReadNextChar helper tries to read one character and throws if the end of the file is reached. This is fine: it will only be called when we really expect a character in the input stream. Notice how functions like . and ? read one character ahead to get their corresponding character (to print it, for ., or compare against it, for ?, respectively).

The expression objects

So, the output of the parser is an Expression object, and we know there are three subtypes. How do those look?

abstract class Expression
{
    public abstract Task Eval(Continuation c);
}

class ApplyExpression : Expression
{
    internal ApplyExpression(Expression f, Expression arg)
    {
        Operator = f;
        Operand = arg;
    }

    public Expression Operator { get; private set; }
    public Expression Operand { get; private set; }

    public override Task Eval(Continuation c)
    {
        // … (implementation shown later)
    }
}

class FunctionExpression : Expression
{
    internal FunctionExpression(Function f)
    {
        Function = f;
    }

    public Function Function { get; private set; }

    public override Task Eval(Continuation cont)
    {
        // … (implementation shown later)
    }
}

class DelayedFunctionExpression : FunctionExpression
{
    internal DelayedFunctionExpression(Function f)
        : base(f)
    {
    }
}

Simple, isn’t it? As you can see, each expression can be evaluated through an Eval method. We’ll talk about that next, in addition to the Function and Continuation types.

Continuation-passing style

In order to allow Call/CC (call with current continuation), we need to use Continuation Passing Style (CPS). The idea of this weird-sounding animal is simple: a function takes in a “continuation function” that will get the result of the function’s execution. So, every such function can be void-returning:

void Add(int a, int b, Action<int> continueWith)
{
    continueWith(a + b);
}

This results in a tail-recursive setting. How this enables Call/CC (and what that is exactly) is something I won’t elaborate on this time; you’ll see it implemented in just a minute. The sample above could be called like:

Add(1, 2, sum =>
    Add(sum, 3, sum2 =>
        { Console.WriteLine(sum2); }));

We’ll call the continuation function (Action<int> above) a Continuation:

delegate Task Continuation(Function f);

This delegate takes in a Function (the result of every computation in Unlambda is a function, remember there’s nothing but functions) and returns a Task. Where does that come from?

Well, there’s one problem with our CPS-based solution: it exhausts the call stack. Yes, the CLR has a tail call instruction, but no, it’s not exposed through C#. We need something else here, which we’ll call tasks. The idea is that a task can be applied to yield the subsequent task. Sounds weird? Stay with me for a moment and have a look at the Execute method:

static void Execute(Expression ex)
{
    Continuation exit = f => { Environment.Exit(0); return null; };
    for (Task task = ex.Eval(exit); true; task = task())
        ;
}

Say we have an expression that computes some function: we call it by means of the Eval method. Eval wants a continuation, essentially saying what to do next after Eval completes. We begin by saying that the thing to do after the top-level expression (i.e. the one representing the whole program) completes is to exit the interpreter. The “return null” statement will never be reached. An alternative implementation would be to break from the loop on a null task and make exit simply return null.
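In sketch form, that alternative (my variant, not the code from the download) would look like this:

static void Execute(Expression ex)
{
    Continuation exit = f => null; // a null task signals completion
    for (Task task = ex.Eval(exit); task != null; task = task())
        ;
}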

Conceptually, Eval should be called recursively: to do an application of A to B, we need to evaluate A first, then B and finally call the result of evaluating A with the result of evaluating B. But this would be one source of stack exhaustion. So instead of causing recursive calls, we make Eval return a Task object. Our main loop then gets that task, and calls it. The result of such a call is the next task to carry out:

delegate Task Task();

So, how does ApplyExpression do its eval?

class ApplyExpression : Expression
{
    internal ApplyExpression(Expression f, Expression arg)
    {
        Operator = f;
        Operand = arg;
    }

    public Expression Operator { get; private set; }
    public Expression Operand { get; private set; }

    public override Task Eval(Continuation c)
    {
        return () => Operator.Eval(f => () => Operator is DelayedFunctionExpression ? c(Program.D1(Operand)) : Operand.Eval(arg => () => f(arg, c)));
    }
}

Ah, we’re getting somewhere: basically the evaluation of an application hands back to the caller a task (notice the () => … lambda to new up a Task) that will “recursively” call the Operator’s Eval method to get the function to be applied (e.g. ` (SKK) (KSK) will first evaluate the operator (SKK), to I that is, and then evaluate the operand (KSK), to S that is).

How the inner Operator.Eval works is irrelevant for now. The idea is that Operator.Eval will return a Task and the thing we return from the ApplyExpression’s Eval method needs to be a Task too. So, you see that the result of calling Eval on the application is a new task with the first thing it will do being the evaluation of the operator. This allows the main loop to continue calling through the resulting task objects, and you see the call stack never grows deeper than a single eval.

So, we know how a Task works and how a Continuation works. Let’s recapitulate:

delegate Task Task();
delegate Task Continuation(Function f);

A continuation is something that can execute a function and return a task to continue executing. A task is something that can execute, producing its successor task. We still need to know how a Function works. Well, everything is a function in Unlambda, so not surprisingly the argument to a Function is, well, a Function. And as we’re using CPS, the next argument needs to be a Continuation object:

delegate Task Function(Function arg, Continuation c);

Wow, three mutually recursively defined delegate types. A note on functions taking in only one argument: this is achieved by currying – every function of multiple arguments can be rewritten as a function that eats one argument at a time:

K x y = x
K x = y -> x
K = x -> y -> x

S x y z = x z (y z)
S x y = z -> x z (y z)
S x = y -> z -> x z (y z)
S = x -> y -> z -> x z (y z)
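As a quick C# aside (not part of the interpreter, and using int for simplicity where Unlambda only has functions), curried K and S look like this:

// Curried K: takes x, returns a function that takes y and returns x.
Func<int, Func<int, int>> k = x => y => x;
Console.WriteLine(k(1)(2)); // prints 1

// Curried S: S x y z = x z (y z).
Func<Func<int, Func<int, int>>, Func<Func<int, int>, Func<int, int>>> s =
    x => y => z => x(z)(y(z));
Console.WriteLine(s(a => b => a + b)(w => w * 2)(3)); // 3 + (3 * 2) = 9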

A look at the SKI combinators

Let’s make this whole thing concrete by looking at how the functions S, K and I are defined. First I, as that’s the simplest one:

static Function I = (x, c) => c(x);

Remember a function takes in a function as its argument (here x) and a continuation (here c). All I does is take its argument and pass it to its continuation as-is. This should be simple to understand: we just continue the computation with I’s argument, as the effect of applying I is nothing but returning its argument (identity function behavior).

K is a bit more complicated, but not too hard either:

static Function K = (x, c) => c((y, c1) => c1(x));

Here we curry K, a function that takes in two arguments x and y, to eat one argument at a time. Every argument comes with the continuation for that application. That explains the (x, c) and (y, c1) input pairs. Recall what K does: given two arguments, it simply returns the first one (x). So, once we have the two arguments (one at a time, through currying), we pass the first argument (x) to the continuation we have at hand. That’s what c1(x) does.

Notice how the first continuation is called, passing in the curried function:

(y, c1) => c1(x)

This is nothing but the curried version of K. The y parameter is irrelevant as K doesn’t look at it. So in essence: calling K with the first argument returns a Task that results from calling the supplied Continuation with a new (curried K) Function that in turn calls its continuation with x, the surviving argument from K.

Now the beauty of all: S. Here we need to curry twice:

static Function S = (x, c) => c((y, c1) => c1((z, c2) =>
{
    Expression arg = new FunctionExpression(z);
    return () => new ApplyExpression(
        new ApplyExpression(
            new FunctionExpression(x),
            arg),
        new ApplyExpression(
            new FunctionExpression(y),
            arg)
        ).Eval(c2);
}));

First x is supplied with its continuation c, then y with c1 and finally z with c2. Once we’re there we can evaluate what S is supposed to do:

S x y z = x z (y z)

This consists of three applications: first x to z, then y to z and finally the results of the first application (x z) to the result of the second one (y z). But we already have the ability to apply functions by means of our expression tree model, so we just new up an expression tree that represents the computation (x z (y z)). Then we call Eval on it, passing in the current continuation at hand, so it will eval the whole thing and pass its result (a Function, as everything in Unlambda is such a thing) to the continuation c2.

I bet this will be overwhelming, but that’s what esoteric languages are all about: an impractical nature :-).

Void

“Ignore whatever is passed to me” is what void does. It simply says: give me something and I’ll do my own thing going forward. To accomplish this, it passes itself to the continuation:

static Function V = (x, c) => c(V);

Quiz: Why does the above work? We’re declaring V and using it on the right-hand side. Would the following work? Why (not)?

void DoSomething()
{
    Function V = (x, c) => c(V);
}

Some other, simple functions

To catch our breath, let’s look at a few input/output functions which are fairly trivial. To make those work, we have the concept of an input stream (simply Console.In) and the character that was last read:

static TextReader In = Console.In;
static char? lastChar = null;

Three functions deal with this piece of global state: Read (@), Reprint (|) and Compare (?). Those are shown below:

static Function Read = (x, c) =>
{
    int ch = In.Read();
    lastChar = ch == -1 ? null : (char?)ch;
    return () => x(lastChar != null ? I : V, c);
};
static Function Reprint = (x, c) => () => x(lastChar != null ? Print(lastChar.Value) : V, c);
static FC<char> Compare = ch => (x, c) => () => x(lastChar == ch ? I : V, c);

Read takes input from the input stream, sees whether we reached the end or not, and sets the lastChar – a nullable char – accordingly. Next, it returns a task that will call the function following the Read (@) call, i.e. x. The argument to that function becomes either the identity function (causing execution to continue in essence) or void (something bad happened), depending on the input that was read. And obviously it passes on the continuation.

Reprint is very similar and defined in terms of a yet-to-be-defined built-in Print function. But again, it either passes on the Print function or void (in case no character is available yet, or anymore).

Compare is a function that takes an argument in the program code (e.g. ?b to compare the last read character to b), so we create an FC or function constructor that can be called by the Parser:

case '?':
    return new FunctionExpression(Compare(ReadNextChar(sr)));

So, the next character in the program text becomes the character to compare to, and the “real” Compare function is parameterized on it:

static FC<char> Compare = ch => (x, c) => () => x(lastChar == ch ? I : V, c);

with

delegate Function FC<T>(T t);

yields the following given ‘b’:

Function compB = (x, c) => () => x(lastChar == 'b' ? I : V, c);

which again follows the same pattern as the Read function: if equal, I is returned, otherwise V is returned. This allows for some control flow constructs.

Finally, there’s the Print function, again parameterized by source text (e.g. .* prints a ‘*’):

static FC<char> Print = ch => (x, c) => { Console.Write(ch); return c(x); };

Prints to the console as a side-effect and continues execution by calling the continuation. Notice ‘r’ is implemented in terms of Print, simply by printing ‘\n’ to write a new line:

case 'R':
case 'r':
    return new FunctionExpression(Print('\n'));

Call/CC and delay/promises

The most esoteric parts of Unlambda are maybe the Call/CC and delay functions (c and d):

static Function CallCC = (x, c) => () => x((f, c2) => c(f), c);

This is Call/CC: a function that returns a new task (() => …) that calls the given function (x) with the current continuation (c), but also causes the function’s argument to become a function that ignores its own continuation (c2) and instead uses the current one. That’s the crux at the heart of Call/CC: it ignores that passed-in continuation, and substitutes it by the current one. David has a good overview of Call/CC if that seems exotic to you.

Another one we should mention here is “d”, maybe even more mind-boggling. It essentially delays the evaluation of its argument: `FG where F evaluates to the delay function (d) does not evaluate G straight away. Instead it returns a function that, when applied, will evaluate G. E.g. if ``FGH is evaluated, and F is ‘d’, G doesn’t get evaluated till the delayed `dG function is applied to H. Therefore, this is an exception to the regular rules of evaluation in Unlambda, and not surprisingly it is a bit involved to write down:

static Function D = (f, c) => c(D1(new FunctionExpression(f)));

This is the easy part. I’ve factored out D1, the curried form, as it needs to be reused somewhere else – see further. I just wish I could explain how D1 works in a crystal-clear fashion. Let me try…

internal static FC<Expression> D1 = ex => (f1, c1) => () => ex.Eval(f => () => f(f1, c1));

In essence, D1 is again a function constructor, this time based on a FunctionExpression. Our function expression here is based on “f”, the thing to be delayed. In here, we create a new Function that given an argument f1 and a continuation c1 will produce a task (() => …) that will evaluate the delayed expression (ex), passing in a continuation. This is the most complex aspect of D1, at least to understand it (because, again it’s quite simple in its implementation). The way FunctionExpression.Eval works is shown below:

class FunctionExpression : Expression
{
    internal FunctionExpression(Function f)
    {
        Function = f;
    }

    public Function Function { get; private set; }

    public override Task Eval(Continuation cont)
    {
        return cont(Function);
    }
}

It simply calls the continuation with the supplied function. Put this together with how D defers evaluation of its argument f by new’ing up a FunctionExpression (ex) that subsequently gets called in D1 through ex.Eval, and we come to the conclusion that the argument of that continuation is none other than the deferred f:

static Function D = (f, c) => c(D1(new FunctionExpression(f)));

internal static FC<Expression> D1 = ex => (f1, c1) => () => ex.Eval(f => () => f(f1, c1));

So the task returned by the Eval’s continuation simply calls the deferred function f and passes in f1 as its argument, also asking to be continued with D1’s continuation.

ApplyExpression revisited

The final thing to look at is ApplyExpression’s Eval method (again):

class ApplyExpression : Expression
{
    internal ApplyExpression(Expression f, Expression arg)
    {
        Operator = f;
        Operand = arg;
    }

    public Expression Operator { get; private set; }
    public Expression Operand { get; private set; }

    public override Task Eval(Continuation c)
    {
        return () => Operator.Eval(f => () => Operator is DelayedFunctionExpression ? c(Program.D1(Operand)) : Operand.Eval(arg => () => f(arg, c)));
    }
}

Application is the thing that gets influenced by delayed functions, as created by the “d” operator. In essence, if the operator passed to an ApplyExpression is delayed, we should not evaluate the argument either. Rather we should create a promise – as represented by D – for it:

Operator is DelayedFunctionExpression ? c(Program.D1(Operand)) : …

Only when D1 gets called, as explained in the previous paragraph, will evaluation of the operand be forced (as promised).

If the operator passed to an ApplyExpression was not delayed, we just proceed regularly:

Operator.Eval(f => () => Operator is DelayedFunctionExpression ? … : Operand.Eval(arg => () => f(arg, c)));

That is, we evaluate the operand by passing in a continuation that subsequently will call the function f with the evaluated argument, passing in the continuation. If you read the whole (eager) evaluation from left to right you can see we first evaluated the operator, then the operand, and finally called the function. So, there are three nested tasks here (look for () => …).

Note: In theory it would be possible to eliminate the Expression types too, turning them into lambdas as well (representing their Eval method, hence of type Continuation -> Task). There’s one direct implementation worry though: the type check for an Operator against “a delayed expression type”. Delegates have no tweakable type hierarchy, and there’s no way to add properties (like IsDelayed) to them (yes, you could keep a dictionary to maintain that property for each expression). You could hack it up in IL if you want, but let’s keep that as a challenge for the reader. Essentially you want to replace Expression by something like this:

delegate Task Expr(Continuation c)
{
    public bool IsDelayed { get; set; } // Hypothetical syntax
};

static Func<Function, Expr> FuncExpr = f => c => c(f);
static Func<Expr, Expr, Expr> ApplExpr = (Operator, Operand) => c => () =>
    Operator(f => () => Operator.IsDelayed
        ? c(Program.D1b(Operand))
        : Operand(arg => () => f(arg, c)));

Testing it

Phew, never thought I’d make it this far. We’ve just written a humble 186 lines of dense C# code (no comments). Let’s see if it works, with our Fibonacci sample:

image

Sweet – looks right, doesn’t it? 1, 1, 2, 3, 5, 8, 13, 21, etc.

I tried the interpreter code on all the samples that come with the Unlambda distribution, and it seems to pass fine:

image

This said, it’s always possible I made some mistake somewhere, so don’t make your next space shuttle project depend on this code :-).

 

Download

Gentle as I am, I’ve made the code available over here. Enjoy <g>!


Introduction

Declarative language constructs like query comprehension syntax often worry imperatively trained developers. I hear this quite a bit, and the excuse of “It Just Works” is often not satisfactory for most of them :-). Combine this with interesting behavioral differences like lazy evaluation and lots of developers get lost in paradise.

Actually the perceived problem is not with LINQ itself; typically a lack of solid understanding of the query execution model is what causes grief. So one thing that can help address this problem is a better visualization of how a query executes. Advanced query “providers” like LINQ to SQL offer logging capabilities to inspect what’s going on, but LINQ to Objects lacks such a capability.

In this post, we’ll have a look at possible approaches to make debugging LINQ to Objects (and hence LINQ to XML) queries easier. At the end of the day you’ll come to the conclusion it all boils down to knowing precisely what the semantics of the various operators are and how the execution works in the face of laziness etc.

 

Visual Studio 2008 support

First things first, so let’s have a look at what Visual Studio 2008 gives us with regards to debugging LINQ queries. To begin with, you can set breakpoints on entire query comprehensions (or the statement in which they occur):

image

This is not useful for the debugging of the query itself. Even more, it’s actually distracting: breaking on the line marked above and continuing execution (F10) doesn’t do anything really interesting. That is, the query doesn’t execute here. LINQ queries are evaluated lazily: only when someone starts to consume them (like foreach or some greedy operator like ToArray, ToList, etc.) do things of interest start to happen.
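If you want to see that laziness for yourself, here’s a minimal stand-alone illustration (my snippet, not tied to the query above):

var q = new[] { 1, 2, 3 }.Where(x => { Console.WriteLine("Testing " + x); return x > 1; });
Console.WriteLine("Query declared; nothing has executed yet.");
foreach (var x in q)    // only now does the predicate run (and print)
    Console.WriteLine(x);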

The following setup is more useful:

image

Notice how the “Results View” reveals the lazy nature of the query. I’m actually breaking inside the foreach loop, but this buys us little or nothing: at that point we already observe the results, and the fact we’re debugging a query expression likely reveals that we’re not satisfied with some of the results produced. We need something in between the declaration of the query and the observation of its results…

Note: In the screenshot above you can see some weird animal – WhereSelectArrayIterator. This reflects performance optimizations carried out in the 3.5 SP1 release where consecutive where and select clauses are optimized into one iterator. In addition, the knowledge the query is carried out over an array can be used to exploit characteristics of arrays to boost performance as well.

One thing you can do, is set breakpoints on the bodies of the various clauses in the query expression, like this:

image

To set those breakpoints, put your cursor somewhere in the clause’s body and press F9.

image

Now you can see when certain clauses hit, e.g. where in the case above, and see how objects flow through the query. For example, once we get to Split we’ll first hit the where clause and then the select clause with the same object:

image

As the object has flown through the whole query it finally reaches the body of the foreach-loop:

image

While this can definitely help, together with conditional breakpoints and such, it might become tedious when dealing with more advanced queries and/or big input sequences. Also, nested queries and joins can be puzzling when you observe the execution behavior this way.

 

A naive logger

One thing that can help to get more global insight in the query’s execution is to have a global kind of logger. Essentially we want to add inspection points somewhere in the query (much like doing electronics by measuring certain connection points) and log output so we can see how each object flows through the query execution pipeline.

Let’s start by looking at the query above in a more under-the-hood manner, by expanding the query syntax into a chain of query operator method calls. This reveals little dots which you can conceptually compare to our desired measurement points (almost as in electronics again :)).

image

How can we make those dots observable? Obviously, the measurement should not change the observation (ignoring Heisenberg’s uncertainty principle here). In essence, we want to inject some observer to watch the flow of the data. One way to do this is by defining our own extension method. Here’s what you might think would be meaningful:

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        return input;
    }
}

With the following use:

var res = typeof(string).GetMethods()
    .Where(m => m.Name.StartsWith("S"))
    .Watch()
    .Select(m => m.Name)
    .Watch();

Not really useful actually, because we don’t observe objects flowing through the pipeline. We’ll just get one call to Watch, when the query expression itself is executed. We really want to participate in the query execution by pulling items from the source (whatever occurs on the left-hand side of the Watch call) and yielding them back to our downstream consumer as-is. For example, if a foreach starts pulling on “res”, it pulls on Watch, which should pull on “Select”, yielding back its results one by one, again causing the previous Watch to be pulled, and so on. Here’s a better attempt:

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        foreach (var item in input)
            yield return item;
    }
}

Next we need to do something meaningful inside the loop. Okay, we could set a breakpoint, but our starting point of this whole exercise was to create a logger so we can analyze a whole query execution run afterwards. To keep things simple, we’ll just log to the console. We could simply do a ToString or we could take in a function that allows overriding the “printf” behavior (as objects might be complicated or might not have ToString overridden in a meaningful way):

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        return Watch(input, item => item != null ? item.ToString() : "(null)");
    }

    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input, Func<T, string> toString)
    {
        foreach (var item in input)
        {
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine(toString(item));
            Console.ResetColor();
            yield return item;
        }
    }
}

This should look relatively straightforward. Let’s give our query a dry-run:

image

Not too bad; yellow is logging output and white is the program’s main output. Using our overload we could add more diagnostic information as well, and let’s go crazy by adding a watch to the output of the original source:

 

var res = typeof(string).GetMethods()
    .Watch(m => "SOURCE: " + m)
    .Where(m => m.Name.StartsWith("S"))
    .Watch(m => "WHERE:  " + m.Name)
    .Select(m => m.Name)
    .Watch(m => "SELECT: " + m /* already name, see output of Select! */);

This produces the following:

image

Sweet – we get to see where objects get dropped and such. But we want more, don’t we? A first limitation of this approach is that we need to fall back to non-LINQ syntax, which makes it hard to instrument a query “on the spot”. How to solve this? We can’t add keywords obviously, so what about piggybacking on an existing keyword? Let’s pick one that preserves the type of its input when producing output. What about Where:

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate);

The predicate isn’t very satisfactory though. We need something else there; a “string”-valued predicate more specifically. Not really a predicate after all, as we’re overloading Where for debugging purposes only. If you think of every string as “true”, you have an all-pass filter really. Let’s implement it:

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        return Where(input, item => item != null ? item.ToString() : "(null)");
    }

    public static IEnumerable<T> Where<T>(this IEnumerable<T> input, Func<T, string> toString)
    {
        foreach (var item in input)
        {
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine(toString(item));
            Console.ResetColor();
            yield return item;
        }
    }
}

Hah – just change Watch into Where and you’re done. Now you get to write something like this:

var res = from m in typeof(string).GetMethods()
          /* log */ where "SOURCE: " + m
          where m.Name.StartsWith("S")
          /* log */ where "WHERE:  " + m.Name
          select m.Name into m
          /* log */ where "SELECT: " + m
          select m;

Cool – inline logging. To put a logger at the tail of the query, I had to continue the query using “into”, but that’s a relatively small detail (and at that point it doesn’t matter much, as the results of the query will be fed to the consumer as-is). And sure enough, if you try to run this, you’ll get precisely the same output as before.

 

A sophisticated logger

Maybe the above suffices, but as usual we ask ourselves whether we can do better than this. What would “better” mean anyway? Well, what about a less invasive way of doing this kind of debugging? Typically you want everything logged to make total sense of the output, so why do we have to put the “watch” between every two clauses? In addition, we might want to log information about the query too, such as string-friendly representations of predicates. Why? If we have a lot of those and every Where call would be logged as “where”, it’s hard to distinguish them (or search for them, for that matter).

Another thing that’s useful to log is the thread of execution. LINQ is really a virtually multi-threaded execution environment. Not in the classic OS sense of the word, but in a conceptual sense. Every sequence of objects is something that can make progress independently from one another; e.g. Join can pull on two sources causing both to make progress. And those two sources by themselves might come from yet other origin sequences, such as another Join or a Concat or a SelectMany or what have you. Seeing exactly which “thread” is producing values becomes useful information to follow the flow of objects.

So, what if we could re-instrument our original query in a manner as straightforward as this?

var res = from m in typeof(string).GetMethods().Debug("source")
          where m.Name.StartsWith("S")
          select m.Name;

Here the string passed to Debug represents a friendly name for the “thread”. If we were to join in another source we could mark that one with a different name to distinguish them. We’ll come to that point later, but first: how does this work?

Obviously we need some Debug extension method on IEnumerable<T>, but how can that influence all downstream query operators? The answer is we need to get rid of the IEnumerable<T> and replace it by something more specific, so that method resolution for the query operator methods picks our hijacked versions. Doesn’t make sense yet? Let’s have a look:

public static class DebugEnumerable
{
    public static EnumerableDebugger<T> Debug<T>(this IEnumerable<T> input, string name)
    {
        return Debug<T>(input, name, Console.Out);
    }

    public static EnumerableDebugger<T> Debug<T>(this IEnumerable<T> input, string name, TextWriter logger)
    {
        return new EnumerableDebuggerSource<T>(input, logger, name);
    }

    // … the hijacked query operators (Where, Select, …) are defined below.
}

Notice the return type of the Debug<T> method. This means that the source part of our original query, as shown below, will have EnumerableDebugger<MethodInfo> as its type, instead of IEnumerable<MethodInfo>:

typeof(string).GetMethods().Debug("source")

Consecutive query operator calls will now be resolved against this EnumerableDebugger<T>, the most specific type the compiler knows about, first. Because we want the result still to be enumerable, EnumerableDebugger<T> still implements IEnumerable<T>. So all we need to do now is “overload” the query operator methods on our debugger type.
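To make that resolution game concrete, here’s roughly what the compiler sees (assuming the Where overload we define below):

EnumerableDebugger<MethodInfo> source = typeof(string).GetMethods().Debug("source");
var filtered = source.Where(m => m.Name.StartsWith("S"));
// Binds to our Where on EnumerableDebugger<T> (identity conversion on the receiver
// beats the conversion to IEnumerable<T>), not to Enumerable.Where.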

To begin with, let’s have a peek at the EnumerableDebugger<T> type itself:

public class EnumerableDebugger<T> : IEnumerable<T>
{
    protected IEnumerable<T> _source;
    protected string _name;
    private TextWriter _logger;

    internal EnumerableDebugger(IEnumerable<T> source, TextWriter logger, string name)
    {
        _source = source;
        _logger = logger;
        _name = name;
    }

    internal TextWriter Log { get { return _logger; } }
    internal string SourceName { get { return _name; } }

    public IEnumerator<T> GetEnumerator()
    {
        return GetEnumeratorCore();
    }

    protected virtual IEnumerator<T> GetEnumeratorCore()
    {
        return _source.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    internal void LogOperator(string format, params object[] args)
    {
        Log.WriteLine("~" + _name + " " + string.Format(format, args));
    }
}

A fairly straightforward type, supporting enumeration and keeping track of a few pieces of state. The important bit is that it encapsulates an existing IEnumerable<T> and surfaces it again in some way, to be specified further. The LogOperator method logs a message, prefixing it with the source name, which indicates the thread of execution (notice the ~ syntax).

But our Debug<T> method on IEnumerable<T> returned an EnumerableDebuggerSource<T>, didn’t it? Exactly, basically to override the meaning of the enumeration code in order to do some logging:

public class EnumerableDebuggerSource<T> : EnumerableDebugger<T>
{
    internal EnumerableDebuggerSource(IEnumerable<T> source, TextWriter logger, string name)
        : base(source, logger, name)
    {
    }

    protected override IEnumerator<T> GetEnumeratorCore()
    {
        foreach (var item in _source)
        {
            LogOperator("-> {0}", item);
            yield return item;
        }
    }
}

This guy doesn’t do anything but enumerate over the original source, much like our Watch method did, using the logger infrastructure to write information about items to the screen. We use an item’s ToString method here to keep things simple (and actually, specifying a special “ToString” with this approach gets harder compared to the previous “naive logger” attempt…).

So far, everything is just an all-pass-filter again. But now we need to tackle the query operators and do meaningful logging for those as well. Let’s begin with the Where operator:

public static EnumerableDebugger<T> Where<T>(this EnumerableDebugger<T> input, Expression<Func<T, bool>> predicate)
{
    string s = predicate.ToString();
    var f = (Func<T, bool>)predicate.Compile();
    Func<T, bool> log = t =>
    {
        var res = f(t);
        input.LogOperator(".Where({0}) ({1}) {2}", s, t, res ? "-> " + t : "");
        return res;
    };
    return new EnumerableDebugger<T>(Enumerable.Where(input, log), input.Log, input.SourceName);
}

Overwhelming? Not really. Follow me along. First we change the predicate to an Expression<Func<T, bool>> instead of just a Func<T, bool>. This turns the lambda expression fed in as the second argument (really the first when calling the method using extension method syntax) into an expression tree. All we need this for is to be able to “ToString” it, in order to have a recognizable print-out of the operator we’re processing (e.g. when we have multiple Where calls on different “threads”). Notice we don’t care much about performance in the context of a debugger extension…

Now the implementation itself. First we compile the predicate to a delegate so we can execute it just as if it were a Func<T, bool> to begin with. This essentially invokes the C# compiler logic at runtime (the tip of the Compiler as a Service, or CaaS, iceberg). I told you the only reason to have an expression tree was to print it here, so we get rid of the code-as-data representation as soon as we can.
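To illustrate the two representations side by side (a stand-alone snippet I’m adding, not from the original post):

Expression<Func<int, bool>> pred = x => x > 1;   // code as data (expression tree)
Console.WriteLine(pred);                         // prints "x => (x > 1)"
Func<int, bool> f = pred.Compile();              // back to executable code
Console.WriteLine(f(2));                         // True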

The lambda expression, named log, on the third line is where the magic happens. Here we close over the outer scope to evaluate the incoming item “t” (of type T) just using the predicate (“f”) the user specified, which gives us a Boolean value “res”. All we do with that is return it, but before we do so – and that’s the crux – we emit a log entry:

.Where({0}) ({1}) {2}

with {0} the string representation of the predicate (something like m => m.Name.StartsWith("S")), {1} substituted for the ToString of the item being processed, and {2} the outcome of the predicate check, followed by the item again in case it succeeded. This gives you output of the form:

.Where(m => m.Name.StartsWith("S")) (String.Split) -> String.Split

So, the “->” generally reads as “produces”. By logging through the LogOperator method, we’ll get a prefix that indicates the source we’re dependent on.

Finally, our Where method hijacked predicate is fed into the real LINQ to Objects implementation by calling Enumerable.Where:

return new EnumerableDebugger<T>(Enumerable.Where(input, log), input.Log, input.SourceName);

A few remarks here:

  • As we pass in the altered-by-adding-logging-code predicate lambda, Enumerable.Where will start calling that one, causing logging to happen. That’s what this diversion trick is all about.
  • The result of Enumerable.Where is an IEnumerable<T>, which we should not let go as-is. Why? Because in that case, subsequent query operators (like Select in the sample) would go straight to LINQ to Objects. Instead, we want to hijack that one again, so we wrap the output of Enumerable.Where in one of our EnumerableDebugger<T> objects again.
  • We pass on the source name as-is, because Where doesn’t Join existing sources or so; we’re basically still on the thread of execution that ultimately causes the original source to be pulled (more about this further on).

You can guess what Select does…

public static EnumerableDebugger<R> Select<T, R>(this EnumerableDebugger<T> input, Expression<Func<T, R>> project)
{
    string s = project.ToString();
    var f = (Func<T, R>)project.Compile();
    Func<T, R> log = t =>
    {
        var res = f(t);
        input.LogOperator(".Select({0}) ({1}) -> {2}", s, t, res);
        return res;
    };
    return new EnumerableDebugger<R>(Enumerable.Select(input, log), input.Log, input.SourceName);
}

Same deal in principle. Let’s set it in motion:

image

We hit the breakpoint for the “Split” method, so let’s see what has been printed to the screen before the breakpoint was hit:

image

Quite diagnostic, isn’t it? Here you can see that none of the previous methods passed the Where filter. Split was the first one to go through and enter the Select projection, ultimately reaching our foreach loop consuming the query. Okay, this is quite verbose, but it has everything you can dream of.

Now it becomes interesting to think about how to log other operators. After all, we need to hijack most of the standard query operators to reach full “debuggability” here. For example, a possible Skip “override” would be:

public static EnumerableDebugger<T> Skip<T>(this EnumerableDebugger<T> input, int count)
{
    string s = count.ToString();
    int n = 1;
    Func<T, T> log = t =>
    {
        input.LogOperator(".Skip({0}) #{1}{2}", s, n, (n <= count ? "" : " -> " + t));
        n++;
        return t;
    };
    return new EnumerableDebugger<T>(Enumerable.Select(input, log).Skip(count), input.Log, input.SourceName);
}

This logs the numbering of items that flow through it, indicating when the threshold is reached and when the operator starts producing results. Take is obviously the dual of this; one possible implementation is sketched below.
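Here’s that Take sketch, mirroring the Skip pattern above (my variant, not necessarily what’s in the download):

public static EnumerableDebugger<T> Take<T>(this EnumerableDebugger<T> input, int count)
{
    int n = 1;
    Func<T, T> log = t =>
    {
        // Take stops pulling after 'count' items, so everything logged here
        // is also produced downstream.
        input.LogOperator(".Take({0}) #{1} -> {2}", count, n, t);
        n++;
        return t;
    };
    return new EnumerableDebugger<T>(Enumerable.Select(input, log).Take(count), input.Log, input.SourceName);
}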

More interesting operators are the ones that cause the query source to evaluate eagerly, because all results are needed to continue processing. An example is OrderBy*. To illustrate this, we can tag the output of the sorting operation with a new “source name”, so that subsequent query operators seem to pull from the intermediate result instead:

public static EnumerableDebugger<T> OrderByDescending<T, K>(this EnumerableDebugger<T> input, Expression<Func<T, K>> keyExtractor)
{
    string s = keyExtractor.ToString();
    var f = (Func<T, K>)keyExtractor.Compile();
    Func<T, K> log = t =>
    {
        var res = f(t);
        input.LogOperator(".OrderByDescending({0}) ({1}) -> {2}", s, t, res);
        return res;
    };
    return new EnumerableDebuggerSource<T>(Enumerable.OrderByDescending(input, log), input.Log, input.SourceName + ".OrderByDescending(" + s + ")");
}

So, a source tagged “~source” will be eaten by the OrderByDescending (in order to do the ordering across all items in it), starting a new “virtual source” which is now tagged as “~source.OrderByDescending(m => m.Name)” for instance.

Similar situations occur when doing things like joins or concatenations:

public static EnumerableDebugger<T> Concat<T>(this EnumerableDebugger<T> first, EnumerableDebugger<T> second)
{
    return new EnumerableDebugger<T>(Enumerable.Concat(first, second), first.Log, first.SourceName + "&" + second.SourceName);
}

I’ll leave the implementation of other query operators to the reader. Below is a sample of a more sophisticated query debugger run:

var res2 = (from m in Enumerable.Range(0, 10).Debug("first")
                          .SkipWhile(m => m <= 5)
                          .TakeWhile(m => m <= 15)
                          .Concat(Enumerable.Range(1, 5).Debug("second"))
            where m % 2 == 0
            select m * 2)
           .Skip(2)
           .OrderByDescending(m => m)
           .Take(2)
           .Sum(x => x + 1);
Console.WriteLine(res2);

(Ugh)

image

 

Exploring LINQ further

Actually, this post is not just meant as a debugging aid; it’s also meant as a gateway to an introspection tool for LINQ queries, which I plan to refer to in future posts. One of the most exotic aspects of LINQ is its ability to have multiple range variables in scope, e.g. when having multiple from clauses, when using a join, or when using the let keyword. Here we get into the very fascinating topic of transparent identifiers. I’m not going into this right now, but suffice it to show a query using “let” with the corresponding debugger output:

var res1 = from x in Enumerable.Range(1, 2).Debug("rng")
           let y = x
           let z = x * 2
           select x + y;
Console.WriteLine(res1.Sum());

(Drums please…)

image

I’m not going to cover this today, but expect to hear more about this (and potentially how to make this debugger tool better in the face of those foreign transparent identifiers) in the future.
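For the curious, here’s roughly what the compiler makes of those “let” clauses (a sketch of the standard query translation, with the Debug call left out and compiler-generated names simplified):

var res1 = Enumerable.Range(1, 2)
    .Select(x => new { x, y = x })           // let y = x
    .Select(t => new { t, z = t.x * 2 })     // let z = x * 2 (t is the transparent identifier)
    .Select(t => t.t.x + t.t.y);             // select x + y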


Introduction

Last time around in this blog category we looked at using Z3 for testing satisfiability of simple Boolean logic expressions. Obviously Z3 is capable of doing much more, so this time we’ll reach out to the domain of integer arithmetic. Not only that, last time I subtly hinted at how this story could continue … with “a bit of LINQ”. So I’m very happy to present to you today an “esoteric LINQ binding” implementation: LINQ to Z3.

 

Esoteric LINQ bindings

According to the Merriam-Webster Online Dictionary, “esoteric” means:

1 a: designed for or understood by the specially initiated alone <a body of esoteric legal doctrine — B. N. Cardozo>
   b: requiring or exhibiting knowledge that is restricted to a small group <esoteric terminology> ; broadly : difficult to understand <esoteric subjects>
2 a: limited to a small circle <engaging in esoteric pursuits>
   b: private, confidential <an esoteric purpose>
3: of special, rare, or unusual interest <esoteric building materials>

While point 1 might be true (but you’ll only know for sure after reading this and subsequent posts), I’ll try to avoid it where possible: LINQ is a democracy after all. Point 2 has been true before the publication of this post, but as I’m writing this it becomes invalid: LINQ to Z3 for the masses. The final point is maybe the most accurate: “of special, rare, or unusual interest”.

Therefore I’d define esoteric LINQ bindings as “LINQ to the unexpected”. Most people live in the mindset of LINQ providers reaching out to “some collection of data locally or remotely”, in the IEnumerable<T> or IQueryable<T> sense of the word. Nothing is further away from the truth; LINQ is nothing more than a syntactical pattern that glues things – operators that is – together in a syntactically silent (as opposed to noisy) way. I sometimes use this comparison in my “LINQ in Breadth” talks:

image

What I’m talking about in the context of “esoteric LINQ bindings” is the Syntactose part – the use of language integrated keywords that reduce the noise that would be imposed on the developer otherwise:

from x in y where x.Bar == 123 select x;

becomes (and yes, in this case the select just falls away):

y.Where(x => x.Bar == 123);

Look how much noise (indicated in red) goes away. But what’s even more important is that the lambda expressions used in the queries can be translated into gigantic expressions trees to open up for runtime code-as-data inspection. In the world of C# 2.0, the above might have looked as ugly as this:

ParameterExpression x = Expression.Parameter(typeof(Demo), "x");
Extensions.Where(y, Expression.Lambda(Expression.Equal(Expression.Property(x, typeof(Demo).GetProperty("Bar")), Expression.Constant(123)), x));

You’ll agree that’s not very readable. So in essence, the LINQ syntax is a very convenient gateway to simple meta-programming facilities where expression trees play a central role. And the target of such “queries” is not necessarily some persistent data store. The sky is the limit and that’s what we exploit in esoteric LINQ bindings.

Notice I’m speaking about “bindings” not “providers”, the latter term being reserved for “query providers” typically (but not necessarily) in the IQueryProvider sense.

 

Our mission

With LINQ to Z3 my goal is to abstract away from the low-level Z3 APIs and provide nice syntax with rich static typing support on top of it. The basic idea is the following:

  • Define the “shape” of a theorem, e.g. what symbolic names (and their types) to use;
  • Express constraints over those symbolic names using LINQ syntax (more specifically the where keyword);
  • Ask the resulting object for a solution, resulting in an instance of the “shape” type specified at the start.

An example:

var theorem = from t in ctx.NewTheorem<Symbols<int, int, int, int, int>>()
              where t.X1 - t.X2 >= 1
              where t.X1 - t.X2 <= 3
              where t.X1 == (2 * t.X3) + t.X5
              where t.X3 == t.X5
              where t.X2 == 6 * t.X4
              select t;
var solution = theorem.Solve();
Console.WriteLine("X1 = {0}, X2 = {1}, X3 = {2}, X4 = {3}, X5 = {4}", solution.X1, solution.X2, solution.X3, solution.X4, solution.X5);

In here, ctx stands for a Z3Context object that can be used to do logging of the internal Z3 expressions being generated by the binding (just like *DataContext is used in LINQ to *). We ask the context to new up a Theorem<T> by calling NewTheorem<T>. Here we use a canned generic Symbols type that has five symbolic names (expressed as properties), each of which is typed to be int in this case.
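The Symbols type isn’t shown at this point, but given that description it presumably looks something like this (a sketch; the real type may well differ):

class Symbols<T1, T2, T3, T4, T5>
{
    public T1 X1 { get; set; }
    public T2 X2 { get; set; }
    public T3 X3 { get; set; }
    public T4 X4 { get; set; }
    public T5 X5 { get; set; }
}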

When writing the LINQ query over this we get full IntelliSense and static typing:

image

Finally we invoke the Solve method on the theorem; notice our “LINQ expression” is therefore lazy, just like “regular” LINQ expressions are. This also means theorems are composable; we could refine the theorem above by writing one or more query expressions over the existing one:

var stronger = from t in theorem
               where t.X4 > 0
               select t;

The Solve<T> method returns an instance of T, where T in our sample stands for the Symbols<int, …, int> generic type, so we can print the results to the screen:

image

As I’ve enabled logging, you can see the Z3 expressions that were translated from the query expression. You can verify the result printed on the last line satisfies the constraints of the theorem, with many thanks to the Z3 engine.

Ultimately we even want to express more complex theorem solving domains, completely statically typed, over LINQ to Z3. Examples include solving puzzles like Sudoku, Kakuro, KenKen, etc. The sample below illustrates a LINQ query used to solve a Kakuro puzzle.

var u1 = from k in ctx.NewTheorem<Kakuro>()
         where k.Cell11.IsBlack
         where k.Cell14.IsBlack
         where k.Cell15.IsBlack
         where k.Cell24.IsBlack
         where k.Cell48.IsBlack
         where k.Cell51.IsBlack
         where k.Cell61.IsBlack
         where k.Cell85.IsBlack
         where k.Cell12.VerticalSum == 23
         where k.Cell13.VerticalSum == 30
         where k.Cell16.VerticalSum == 27
         where k.Cell17.VerticalSum == 12
         where k.Cell18.VerticalSum == 16
         where k.Cell21.HorizontalSum == 16
         where k.Cell25.VerticalSum == 17
         where k.Cell25.HorizontalSum == 24
         where k.Cell31.HorizontalSum == 17
         where k.Cell34.VerticalSum == 15
         where k.Cell34.HorizontalSum == 29
         where k.Cell41.HorizontalSum == 35
         where k.Cell47.VerticalSum == 12
         where k.Cell52.HorizontalSum == 7
         where k.Cell55.VerticalSum == 7
         where k.Cell55.HorizontalSum == 8
         where k.Cell58.VerticalSum == 7
         where k.Cell62.VerticalSum == 11
         where k.Cell63.VerticalSum == 10
         where k.Cell63.HorizontalSum == 16
         where k.Cell71.HorizontalSum == 21
         where k.Cell76.HorizontalSum == 5
         where k.Cell81.HorizontalSum == 6
         where k.Cell86.HorizontalSum == 3
         select k;

The textual representation of this Kakuro is shown below:

+--------+--------+--------+--------+--------+--------+--------+--------+
| ****** | 23     | 30     | ****** | ****** | 27     | 12     | 16     |
+--------+--------+--------+--------+--------+--------+--------+--------+
|     16 |        |        | ****** | 17  24 |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+
|     17 |        |        | 15  29 |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+
|     35 |        |        |        |        |        | 12     | ****** |
+--------+--------+--------+--------+--------+--------+--------+--------+
| ****** |      7 |        |        | 7    8 |        |        | 7      |
+--------+--------+--------+--------+--------+--------+--------+--------+
| ****** | 11     | 10  16 |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+
|     21 |        |        |        |        |      5 |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+
|      6 |        |        |        | ****** |      3 |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+

And here’s the output of our solver tackling the theorem. Notice the number of constraints (the log scrolls way up!) inferred from this purely declarative Kakuro specification using LINQ:

image

By the end of this series you’ll know exactly how all of this works. We’ll go even more crazy and use MGrammar to parse textual representations of puzzles like Kakuro to turn them into an executable piece of code that dynamically builds up LINQ expression trees that are then fed to LINQ to Z3 in order to be solved. And, for the Sudoku and KenKen puzzles we’ll read the specification from regular Word tables:

image

Yes, we’ll “execute” the document above to solve the KenKen puzzle inside it using Z3:

var k3 = KenKen4By4.FromWord(ctx, @"C:\Users\Bart\Desktop\KenKen.docx");
Console.WriteLine(k3.Solve());
Console.WriteLine();

and here’s the verbose logging output with the result printed at the bottom:

image

Word-to-KenKen-to-LINQ-to-Z3 that is: composability pushed to its limits. But let’s try to walk before we start running; first things first: the query pattern.

 

The LINQ to Z3 query pattern

Today we’ll have a look at how we establish the query pattern with all IntelliSense while writing the query expression. As we’ve seen before, all we need is the where operator to add constraints to a theorem. But before we can go there, we need to know how to represent a Theorem type. Every theorem has a variable number of symbols it puts constraints on: some theorem might have two Boolean symbols, another might have five integer values, or any mix of the two. There are more types the theorem prover can deal with, such as uninterpreted functions, but to keep things simple we’ll omit those for now.

 

Instantiating Theorem objects – Option 1

In order for the user to declare the “environment” of a theorem, i.e. the symbols and their types, we can take two different approaches. One is based on nominal types, where the user creates a Theorem<T> with T being a type that defines the environment as a set of properties:

class TwoBools
{
    public bool x { get; set; }
    public bool y { get; set; }
}

I’m violating the design guidelines by having properties that do not begin with a capital letter (Pascal-casing), as theorem variables have a mathematical meaning.
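
To make this concrete, here’s a hedged sketch of how such a nominal environment type would be used, assuming the Z3Context and Theorem<T> types we build later in this post:

using (var ctx = new Z3Context())
{
    var theorem = from t in ctx.NewTheorem<TwoBools>()
                  where t.x && !t.y
                  select t;

    TwoBools result = theorem.Solve();
    Console.WriteLine("x = {0}, y = {1}", result.x, result.y);
}

The where clause becomes a lambda expression tree over TwoBools, and Solve hands back an instance with the properties set to satisfying values (or default(T) when the theorem can’t be satisfied).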

But having to define such types every time you want to write a theorem can be painful, so we want more convenient ways to set the scene. One approach is to have a couple of canned generic types with 1 to n symbols pre-defined using generic parameters, like so:

/// <summary>
/// Pre-defined environment with three symbols.
/// </summary>
/// <typeparam name="T1">Type of the first symbol.</typeparam>
/// <typeparam name="T2">Type of the second symbol.</typeparam>
/// <typeparam name="T3">Type of the third symbol.</typeparam>
public sealed class Symbols<T1, T2, T3> : Symbols
{
    /// <summary>
    /// Provides access to the first symbol.
    /// </summary>
    /// <remarks>
    /// Used within definition of theorem constraints and to print the theorem prover's result.
    /// </remarks>
    public T1 X1 { get; private set; }

    /// <summary>
    /// Provides access to the second symbol.
    /// </summary>
    /// <remarks>
    /// Used within definition of theorem constraints and to print the theorem prover's result.
    /// </remarks>
    public T2 X2 { get; private set; }

    /// <summary>
    /// Provides access to the third symbol.
    /// </summary>
    /// <remarks>
    /// Used within definition of theorem constraints and to print the theorem prover's result.
    /// </remarks>
    public T3 X3 { get; private set; }
}

The base type simply overrides ToString to provide pretty printing of all the type’s properties:

/// <summary>
/// Base class for symbol container types.
/// </summary>
public class Symbols
{
    /// <summary>
    /// Returns a friendly representation of the symbols and their values.
    /// Used to print the output of a theorem solving task.
    /// </summary>
    /// <returns>Friendly representation of the symbols and their values.</returns>
    public override string ToString()
    {
        var propertyValues = from prop in GetType().GetProperties()
                             let value = prop.GetValue(this, null)
                             select prop.Name + " = " + (value ?? "(null)");
        return "{" + string.Join(", ", propertyValues.ToArray()) + "}";
    }
}

While this is already a bit more convenient, it still gets quite verbose and the symbol names are fixed, not quite aligning with the theorem to be expressed:

var theo = from t in ctx.NewTheorem<Symbols<int, int, int, int, int>>()
           where t.X1 - t.X2 >= 1
           where t.X1 - t.X2 <= 3
           where t.X1 == (2 * t.X3) + t.X5
           where t.X3 == t.X5
           where t.X2 == 6 * t.X4
           select t;

Maybe the names X1 to X5 are quite accurate here, but we’d need any number of generic Symbols<…> types (for a Sudoku we’d need one with 81 parameters, where X10 would stand for second row, first column – kind of ugly). So we’d also like to be able to use C#’s capability to create anonymous types on the fly. (Anonymous types are somewhat the opposite of nominal types, although they’re not quite structural either – the CLR doesn’t have structural typing after all – so let’s just continue to call them anonymous types because that’s what they are :-).)

So, instead of specifying a type by its name, we’d like to be able to specify the skeleton “inline”:

ctx.NewTheorem<class { int X; int Y; }>

doesn’t work though, so we need something a bit smarter. Instead we want the type inference algorithm of C# to help us out. Say we have a NewTheorem<T> method that takes in something of that type T, then we can call the method as follows:

ctx.NewTheorem(123)

without having to specify the type T explicitly as it gets inferred from the first argument. In the sample above it would be inferred to “int”. This also allows us to specify an anonymous type without stating it, like this:

ctx.NewTheorem(new { x = default(int), y = default(int) })

Think away the default(…) parts and it almost looks like ML-style variable declarations: x : int, y : int. This has the slight disadvantage of creating an instance of the anonymous type, just to infer its type (as you’ll see further on in this series, we won’t use the parameter value at all). If you really want to get rid of that as well, you could go for a Func<T>:

Theorem<T> NewTheorem<T>(Func<T> ctor) { … }
ctx.NewTheorem(() => new { x = default(int), y = default(int) });

But that’s overkill (and adds yet a bit more noise).

 

Instantiating Theorem objects – Option 2

An alternative, but more complex to implement, way would be to use LINQ syntax “all the way” to establish a theorem proving environment context, using the let keyword. Instead of writing the following:

var theo = from t in ctx.NewTheorem(new { x = default(int), y = default(int) })
           where t.x == t.y
           select t;

we’d now write this:

var theo = from t in ctx.NewTheorem()
           let x = t.MakeInt()
           let y = t.MakeInt()
           where x == y
           select new { x, y };

In the first fragment the range variable t is of type “whatever the anonymous type specified was”, hence t.x and t.y resolve to the properties declared in there. In the second fragment, t would be some “theorem constructor assistant” type that has factory methods for the supported theorem symbol types, like Int or Bool. This has several advantages:

  • The user is helped not to introduce types that do not make sense (although the let clauses could be used to bring other types in scope);
  • The where constraint clause looks very natural.

However, the select part gets a little more involved if we want to extract the resulting values straight away. The main reason I’m not going for this approach right now is its complexity internally. The fragment above translates into:

var theo = ctx.NewTheorem()
           .Select(t => new { t, x = t.MakeInt() })
           .Select(*1 => new { *1, y = *1.t.MakeInt() })
           .Where(*2 => *2.*1.x == *2.y)
           .Select(*2 => new { x = *2.*1.x, *2.y });

What the *1, *2, you might think. What you’re seeing here are transparent identifiers at work (see the * notation in my C# 3.0 Query Expression Translation Cheat Sheet). This is a very powerful feature of C# 3.0 that allows you to write seamless query comprehensions code with multiple range variables in scope, but dealing with it from a LINQ binding’s perspective gets a bit complicated. I might go there in a future post though, but for now we’ll stick with option 1.
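
For reference, transparent identifiers pop up in far simpler queries too; nothing Z3-specific is needed to observe them. In the following LINQ to Objects query, the compiler silently pairs t and x in an anonymous type behind the scenes so that both range variables stay in scope across the where and select clauses:

var q = from t in Enumerable.Range(0, 10)
        let x = t * 2
        where x > 5
        select new { t, x };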

 

Implementing Where

As stated earlier, we’re about to use the “where” clause as a way to specify constraints. In other words, we need a suitable Where method so that LINQ can party on it. Where does that Where method need to live? Obviously on the return type of the NewTheorem<T> method, which – as we saw above – is a Theorem<T> on the given context object (we’ll see later what the context object looks like). Calling Where should put us back in the world of Theorem<T>, so that additional constraints can be specified again.

In order to distinguish the generic-driven query pattern from the underlying theorem solving implementation, we’ll have Theorem<T> derive from Theorem, where the latter will keep track of the collection of applied constraints.

Now the important bit first: applying Where on a Theorem<T> should provide a new instance of a Theorem<T> instead of modifying the existing one. This is the case for all LINQ implementations as the query pattern is arbitrarily composable:

var expensive = from p in products
                where p.Price > 100m
                select p;

var expensiveBeverages = from p in expensive
                         where p.Category == Category.Beverages
                         select p;

var expensiveFruits = from p in expensive
                      where p.Category == Category.Fruits
                      select p;

If the second query’s Where method were to mutate the “expensive” object by adding more predicates to it, the expensiveFruits query would now query over expensive beverages because expensive got mutated. This is clearly not what we want, so every application of a query operator should provide a new instance of a query object.

This brings us to the implementation part. Notice I’ve weeded out any of the theorem translating code as that will be subject of the next few posts:

/// <summary>
/// Representation of a theorem with its constraints.
/// </summary>
public class Theorem
{
    /// <summary>
    /// Z3 context under which the theorem is solved.
    /// </summary>
    private Z3Context _context;

    /// <summary>
    /// Theorem constraints.
    /// </summary>
    private IEnumerable<LambdaExpression> _constraints;

    /// <summary>
    /// Creates a new theorem for the given Z3 context.
    /// </summary>
    /// <param name="context">Z3 context.</param>
    protected Theorem(Z3Context context)
        : this(context, new List<LambdaExpression>())
    {
    }

    /// <summary>
    /// Creates a new pre-constrained theorem for the given Z3 context.
    /// </summary>
    /// <param name="context">Z3 context.</param>
    /// <param name="constraints">Constraints to apply to the created theorem.</param>
    protected Theorem(Z3Context context, IEnumerable<LambdaExpression> constraints)
    {
        _context = context;
        _constraints = constraints;
    }

    /// <summary>
    /// Gets the Z3 context under which the theorem is solved.
    /// </summary>
    protected Z3Context Context
    {
        get
        {
            return _context;
        }
    }

    /// <summary>
    /// Gets the constraints of the theorem.
    /// </summary>
    protected IEnumerable<LambdaExpression> Constraints
    {
        get
        {
            return _constraints;
        }
    }

    /// <summary>
    /// Solves the theorem using Z3.
    /// </summary>
    /// <typeparam name="T">Theorem environment type.</typeparam>
    /// <returns>Result of solving the theorem; default(T) if the theorem cannot be satisfied.</returns>
    protected T Solve<T>()
    {
        // To be covered in the next posts.
        throw new NotImplementedException();
    }

    /// <summary>
    /// Returns a comma-separated representation of the constraints embodied in the theorem.
    /// </summary>
    /// <returns>Comma-separated string representation of the theorem's constraints.</returns>
    public override string ToString()
    {
        return string.Join(", ", (from c in _constraints select c.Body.ToString()).ToArray());
    }
}

So, the non-generic Theorem type is simply a container for a set of constraints, each of which is a lambda expression, e.g. t => t.x == t.y. Its generic counterpart, Theorem<T>, provides the strongly typed query pattern – just Where in our case – on top of it:

/// <summary>
/// Strongly-typed theorem type for use with LINQ syntax.
/// </summary>
/// <typeparam name="T">Enviroment type over which the theorem is defined.</typeparam>
public class Theorem<T> : Theorem
{
    /// <summary>
    /// Creates a new theorem for the given Z3 context.
    /// </summary>
    /// <param name="context">Z3 context.</param>
    internal Theorem(Z3Context context)
        : base(context)
    {
    }

    /// <summary>
    /// Creates a new pre-constrained theorem for the given Z3 context.
    /// </summary>
    /// <param name="context">Z3 context.</param>
    /// <param name="constraints">Constraints to apply to the created theorem.</param>
    internal Theorem(Z3Context context, IEnumerable<LambdaExpression> constraints)
        : base(context, constraints)
    {
    }

    /// <summary>
    /// Where query operator, used to add constraints to the theorem.
    /// </summary>
    /// <param name="constraint">Theorem constraint expression.</param>
    /// <returns>Theorem with the new constraint applied.</returns>
    public Theorem<T> Where(Expression<Func<T, bool>> constraint)
    {
        return new Theorem<T>(base.Context, base.Constraints.Concat(new List<LambdaExpression> { constraint }));
    }

    /// <summary>
    /// Solves the theorem.
    /// </summary>
    /// <returns>Environment type instance with properties set to theorem-satisfying values.</returns>
    public T Solve()
    {
        return base.Solve<T>();
    }
}

Notice how the Where method creates a new instance of Theorem<T>, passing in the original context as well as the concatenation of the current constraints with the new constraint.
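
A quick sketch to drive the point home: two refinements of the same base theorem must not observe each other’s constraints (assuming a ctx object in scope as before):

var baseTheorem = ctx.NewTheorem(new { x = default(int) });
var positive = baseTheorem.Where(t => t.x > 0); // new Theorem<T> instance
var negative = baseTheorem.Where(t => t.x < 0); // another new instance

After this runs, baseTheorem still carries no constraints at all, while positive and negative each hold exactly one.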

 

A look at the Z3Context type

Finally, to finish up this first post, we’ll have a peek at the Z3Context type, where everything starts: Theorem<T> instances are created out of it:

/// <summary>
/// Context object for Z3 theorem proving through LINQ. Manages the configuration
/// of the theorem prover and provides centralized infrastructure for logging.
/// </summary>
public sealed class Z3Context : IDisposable
{
    /// <summary>
    /// Z3 configuration object.
    /// </summary>
    private Config _config;

    /// <summary>
    /// Creates a new Z3 context for theorem proving.
    /// </summary>
    public Z3Context()
    {
        _config = new Config();
        _config.SetParamValue("MODEL", "true");
    }

    /// <summary>
    /// Gets/sets the logger used for diagnostic output.
    /// </summary>
    public TextWriter Log { get; set; }

    /// <summary>
    /// Creates a new theorem based on the given type to establish the environment with
    /// the variables constrained by the theorem.
    /// </summary>
    /// <typeparam name="T">Theorem environment type.</typeparam>
    /// <returns>New theorem object based on the given environment.</returns>
    public Theorem<T> NewTheorem<T>()
    {
        return new Theorem<T>(this);
    }

    /// <summary>
    /// Creates a new theorem based on a skeleton object used to infer the environment
    /// type with the variables constrained by the theorem.
    /// 
    /// This overload is useful if one wants to use an anonymous type "on the fly" to
    /// create a new theorem based on the type's properties as variables.
    /// </summary>
    /// <example>
    /// <code>
    /// ctx.NewTheorem(new { x = default(int), y = default(int) }).Where(t => t.x > t.y)
    /// </code>
    /// </example>
    /// <typeparam name="T">Theorem environment type (typically inferred).</typeparam>
    /// <param name="dummy">Dummy parameter, typically an anonymous type instance.</param>
    /// <returns>New theorem object based on the given environment.</returns>
    public Theorem<T> NewTheorem<T>(T dummy)
    {
        return new Theorem<T>(this);
    }

    /// <summary>
    /// Closes the native resources held by the Z3 theorem prover.
    /// </summary>
    public void Dispose()
    {
        _config.Dispose();
    }

    /// <summary>
    /// Factory method for Z3 contexts based on the given configuration.
    /// </summary>
    /// <returns>New Z3 context.</returns>
    internal Context CreateContext()
    {
        return new Context(_config);
    }

    /// <summary>
    /// Helpers to write diagnostic log output to the registered logger, if any.
    /// </summary>
    /// <param name="s">Log output string.</param>
    internal void LogWriteLine(string s)
    {
        if (Log != null)
            Log.WriteLine(s);
    }
}

There’s the first little bit of Z3 coming in, with the constructor new’ing up a Z3 configuration object which the CreateContext factory method later turns into a native Z3 context. It’s this context that will provide the entry-point to Z3 theorem solving going forward. Given all of this machinery we can already start writing code like this:

using (var ctx = new Z3Context())
{
    ctx.Log = Console.Out;

    var theo = from t in ctx.NewTheorem(new { x = default(bool), y = default(bool) })
               where t.x ^ t.y
               select t;

    var result = theo.Solve();
    Console.WriteLine(result);
}

Oh, and by the way, we don’t have a Select method implemented on Theorem<T>. Even though the query above has a projection specified, Select is not required as the projection is a trivial one which gets compiled away by the compiler. When and why this happens is something you can read about in another blog post of mine.
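
In other words, the query from the previous fragment compiles down to a single Where call; the two forms below are exactly equivalent:

var q1 = from t in ctx.NewTheorem(new { x = default(bool), y = default(bool) })
         where t.x ^ t.y
         select t;

var q2 = ctx.NewTheorem(new { x = default(bool), y = default(bool) })
            .Where(t => t.x ^ t.y);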

Solve won’t do much yet, but that’s what we’ll be talking about next time. Oooh the suspense :-).


A while back I posted Type Theory Essentials in Pictures – Part 1 – Quiz, a series of non-annotated pictures representing concepts from type theory. Though the first part was intended to be simple, I got some feedback that people were puzzled about it… So let’s go ahead and explain it today. Next time, we’ll get into more advanced concepts; hopefully this first batch of answers will make subsequent parts easier to absorb.

Here we go:

Figure 1

image

Dotted lines indicate the abstract notion of some type, e.g. T. Given the type, we can create an instance of it (through an instance constructor that is, hence the factory in the middle). An instance therefore is indicated as a solid body. So far, we’ve just stated there is a way to go from type to instance, nothing more. We haven’t yet said what that type is. Everything is still very abstractly defined as “an instance” of “a type”, hence the black and white.

 

Figure 2

image

Here we’ve gotten more concrete (= colorful). The dotted lines have been traded for solid lines and our type T has been substituted for something real: Block. Our factory is very concrete as well, there’s a real constructor somewhere that knows how to create new blocks. The output of this is a concrete instance, in this case a blue Lego block.

 

Figure 3

image

Besides values there are functions that act upon data. The most abstract notion of this is a function going from some type T to some type R. What those types are is still unknown, and so is a possible implementation of such a function. Hence everything is abstract, still dotted. The block on the left represents type T, the one on the right type R. The sad face in the middle is a unary function: it has two arms but one is disabled (meaning it has one arm to grab something of type T and transform it somehow to produce something of type R). The function by itself is still abstract, as its type parameters are not concrete either.

 

Figure 4

image

With dotted lines gone, we enter the world of concrete types again. T has been substituted for Block and R for Pin. Now we have “a function” that takes a block and produces a pin. We haven’t said how it does that transformation though; the function is still sad as it doesn’t have any concrete value to play with.

 

Figure 5

image

First we need to concretize the function, saying how it can take “some block” and produce “some pin”. A practical implementation of such a function could be a kid that demolishes a Lego block and returns the top left pin of it. The functional kid is slightly more happy because it already knows how to play with blocks (in a deterministic way though, given any Block it will produce the same corresponding Pin), but it’s still sad enough not to have concrete blocks to play with. But that’s changing now…

 

Figure 6

image

Now our functional kid got a blue Lego block to play with: everything’s happy. Given the blue block, we get the top left (obviously still blue) pin back. So much for unary functions: we’ve gone from a parameterized type Func<T, R> to a concrete function type Func<Block, Pin>, have implemented it next and finally fed it some data to act upon.
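
To tie Figures 1 through 6 back to code, here’s a hedged sketch; Block and Pin are hypothetical types, and the TopLeftPin property is an invention for illustration purposes:

Func<Block, Pin> kid = block => block.TopLeftPin; // Figure 5: the implementation
Block blueBlock = new Block(Color.Blue);          // Figure 2: a concrete instance
Pin bluePin = kid(blueBlock);                     // Figure 6: application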

 

Figure 7

image

Same story as before, but this time for a binary function, one that takes two arguments (notice how the second arm is not disabled). Everything is still abstract.

 

Figure 8

image

Getting more concrete again, having a function that can take two blocks and produce a stack of blocks out of it. Notice the shape of the data is already known: a Stack type likely has two Block fields inside it.

 

Figure 9

image

Next we give the concrete implementation of such a function. We assume there’s a binary + operator available that acts upon Block objects to produce a Stack of the given blocks. Alternatively you could think of “new Stack(b1, b2)” as a possible implementation for the function.

 

Figure 10

image

Finally our function can play with real blocks: given a blue block and a red block, a stack of those blocks is produced with the former block placed on top of the latter block.

 

Figure 11

image

This is a bit more of a brainteaser. Here we’re currying a function: we don’t specify enough arguments for the function to produce a final value. Instead, we get a remainder function back. In this case we have a function to produce stacks but we only fed it one block. The result is a unary function that takes the remainder block and puts it below the already provided block.

The syntax in the title is not valid in C#: there’s no built-in support to apply a function partially. Instead you could write:

Func<Block, Block, Stack> playingKid = (b1, b2) => b1 + b2;
Block blueBlock = new Block(Color.Blue);
Func<Block, Stack> partiallyPlayingKid = otherBlock => playingKid(blueBlock, otherBlock);

The last line indicates the explicit burden put on the developer to carry out the currying manually.

Note: Actually none of the function titles are valid C# code. A lambda expression like “(int a) => a” doesn’t have a type by itself, for various reasons (one being that there are two ways to represent lambdas: as delegates or as expression trees).
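
The ambiguity is easy to demonstrate: the very same lambda text converts to either a delegate or an expression tree, depending solely on the target type:

Func<int, int> asDelegate = (int a) => a;         // compiled into IL
Expression<Func<int, int>> asTree = (int a) => a; // captured as data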

 

Figure 12

image

Function composition. The big circle should be read as “after”. Here it says to compose a function from some type S to some type R with another function from some type T to some type S, by feeding the output from the latter to the input of the former:

Func<S, R> O Func<T, S>

Again the different types are indicated by different line dotting styles. Notice though how the types in the middle have to match. In mathematical terms you’d say that the range (= output type) of the first should correspond with the domain (= input type) of the second:

f : T –> S
f(t) = t + 3

g : S –> R
g(s) = s * 2

g o f : T –> R
(g o f)(t) = g(f(t)) = (t + 3) * 2

Or stated otherwise, going from T to S and from S to R makes it possible to ignore the intermediate phase of S and hide it as an implementation detail through function composition: the resulting function simply goes from T to R.
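
Since the BCL has no built-in composition operator, here’s a small helper that captures “after” in C#, mirroring the f and g above (a sketch; Compose is our own invention):

static Func<T, R> Compose<T, S, R>(Func<S, R> g, Func<T, S> f)
{
    return t => g(f(t)); // "g after f"
}

Func<int, int> f = t => t + 3;
Func<int, int> g = s => s * 2;
Func<int, int> gAfterF = Compose(g, f);
Console.WriteLine(gAfterF(4)); // g(f(4)) = (4 + 3) * 2 = 14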

Note: I’ve omitted the smiley faces from this point on. Black solid arrows represent the concept of functions.

 

Figure 13

image

The last picture was an abstract notion of function composition: everything was still generically typed. To make it more concrete, we substitute the type parameters for concrete types. If we have a kid that removes the pins of a block (how it does that precisely is not stated yet; it could hammer the pins off or pull them off or set them on fire or …), leaving the “body” of the block, and another one that pulls some side (which one is not yet specified) off the body, we can make them play together in a room whose input is a Block and whose output is a Side. We don’t need to know the kids implement it by dividing the tasks among themselves.

I’ve indicated “some side” as a tilted and rotated rectangle. It’s yet unknown what the precise side will be for a particular implementation: it could be the top or the bottom, the left or the right, the front or the rear.

 

Figure 14

image

Things get concrete now. The first kid is implemented by returning the “Body” property of the block (how that one works you have to imagine yourself, as long as it has some concrete implementation), while the second one chops off the floor from the body. The result is that given a Block x, we ultimately get back its Floor side. The intermediate state of Body is irrelevant to the outsider.

Look how the types “flow” in the title.

1.                     (Block b => b.Body)(x)
2. (Body c => c.Floor)(              """"    )

Given a block, we get the body, which we feed into the function that produces a floor.

 

Figure 15

image

Feed in a blue block to this whole thing and back you get its blue floor (assuming the – mostly invisible – floors of Lego blocks have the same color as the rest of the block).

Composition is the single most important concept when dealing with complexity: implement the simpler parts and glue them together to make something more complex. We’ll see more of that in a next series.

Happy puzzling!


Introduction

Today I’m going to deviate from the typical path of mainstream technologies I normally cover and enter the domain of research instead. But very interesting and active research, with lots of practical implementations it turns out: theorem provers.

A little story first. I guess it should have been a couple of years ago now that I first came in touch with Z3, rather accidentally, when browsing the MSR website. At the time, I explored it just a little and was pretty excited about it but never got around to writing about it. On my last speaker tour however, I met Peli, a fellow Belgian here at Microsoft in Redmond, over at Microsoft Research (MSR) working on the Pex project (remind me to blog about Pex and Code Contracts in .NET 4.0). Since Pex uses Z3 internally you can guess the rest of the story and how I got inspired to dig Z3 up again and write this post.

Enrich your vocab – Satisfiability Modulo Theories

So what’s Z3 about? In two words: it’s an SMT solver. SMT stands for Satisfiability Modulo Theories, but let’s elaborate on those expensive words a bit to make things clear, going from right to left:

  • Theories are domains within which we want to verify certain properties, ideally in an automated fashion. Z3 supports a number of built-in theories such as integer arithmetic, arrays and bit-vectors, free functions, etc. In practice those theories are mapped onto domain-specific models where theorem proving is required, e.g. what are the conditions to hit a certain code path (for test code generation as in Pex), can certain predicates be violated by a method’s execution (for Hoare triples with pre- and post-conditions, or invariants as in Spec#), do certain properties have a possibility to be true simultaneously (indicating a bug, caught by verifying compilers like Boogie), or can a code block ever be hit (allowing dead code elimination by optimizing compilers);
  • Modulo is simply glue which can be read as “in the presence of certain”;
  • Satisfiability is where the fun comes in. We have our theories as an ingredient for an SMT solver, but with a theory alone we can’t do much. Well, we could feed in all possible inputs to build a table of potential outcomes. That’s madness because of exponential growth for table sizes in terms of the number of inputs. Even more, often we’re not interested in every single input: we rather want to know whether it’s even possible for a certain formula to ever hold under the given theories. This is what satisfiability stands for, and it’s a hard problem many mathematicians have dealt with (to get a broader context search for terms like “Entscheidungsproblem”, “decidability” or for names of famous mathematicians like Hilbert, Gödel, Church and Turing, Kleene and Rosser, amongst others).

To illustrate how practically this has been applied, a few samples of success stories:

  • Hyper-V, our flagship virtualization platform foundation, was built using a verifying C compiler (VCC) making it possible to catch issues with heap corruption (provability of correctness is a great deal there, to avoid discussions where guest OS’es blame the host OS for memory corruption) and such;
  • Driver verification in Windows 7 (SDV 2.0) leverages Z3;
  • Spec# and Pex use Z3 for verification and test case generation purposes respectively – this extends to the domain of Code Contracts where static verification tools are integrated with Visual Studio to catch code contract violations early;
  • Other domains include correctness checks for security stack implementations, cryptography algorithms, verification of garbage collectors (which are ultimately manipulators of large byte arrays) e.g. as used in Singularity for core OS services, program termination analysis, etc.

In exciting domains such as theorem proving, you can guess there are competitions too… SMT-COMP’08 is one such competition, where Z3 scored very high results.

Getting started with Z3

The obvious first step towards exploring Z3 is to get your hands on the bits, which you can find over here. I’ll be using the 1.3 release of Z3 throughout this post. Please read the license that goes with Z3 as use of research technology has certain restrictions (I’m not a lawyer, so I’m not trying to paraphrase it in my own non-legalese :-)).

The installer puts binaries, samples and documentation in %programfiles%\Microsoft Research\Z3-1.3.6. Two files are of particular interest to our first exploration in this post:

image

  • z3.exe is the command-line theorem prover. It can accept problem inputs in specialized formats used in the domain of SMTs.
  • Microsoft.Z3.dll is a managed wrapper around the theorem solver, allowing any .NET language to call into Z3 as a library.

Using the Z3.exe command-line tool

Before we get into doing any coding ourselves, let’s just use the built-in command-line tool first. As this is an introductory post I’ll just cover a very basic subset of the possibilities of the tool. More specifically, I’ll restrict myself to the domain of Boolean logic – you know those things with true and false, conjunctions, disjunctions and negations – and just the subset of propositional logic. This by itself is an interesting study, but be aware that the theories Z3 can tackle are much richer than just this.

Anyway, what we want to do is ask the theorem prover to try to find a set of assignments for variables that make a certain formula hold, in other words that satisfy the criterion expressed by the formula under the theory of Boolean logic. One way to express this is by using an input file in a format Z3 understands. We’ll use the SMT-LIB format and write a file that states the problem of trying to satisfy the formula:

a xor b

In other words, try to find values for a and b for which this formula evaluates to true. We can translate this into SMT-LIB format by declaring a benchmark, the domain-specific lingo for a logical formula that will be checked for satisfiability against given background theories:

(benchmark demo
:status sat
:logic QF_UF
:extrapreds ((a) (b))
:formula (xor a b)
)

This simply means we’re declaring a benchmark with name “demo” which we check for satisfiability (“sat”) using the QF_UF theory (we could use other theories here, see theories for a full list, but don’t worry too much about this for now) with predicates a and b and finally specifying our formula to be checked.

We save this specification to a .smt file and feed it in to z3 as shown below:

image

Here the /m command-line flag means to display a model, i.e. a set of inputs satisfying our formula, allowing us to check the result. Alternatively we could omit the /m flag in which case z3 will just print “sat” to signal it has been able to prove the formula. In the run above we chose to display the model and found a solution where a is chosen to be true and b chosen to be false.

Exercise for the reader: test a tautology (like a or not a) and a contradiction (like a and not a) and see what happens. Also try changing the :status to unsat and see what the effect is.

Say it with code

Z3 can also be used as a library either from native or managed code. For the purpose of this post, we’ll explore the managed code route. In a simple C# console application project, add a reference to the Microsoft.Z3.dll file in the location we’ve shown earlier. Next, import the Microsoft.Z3 namespace. With this set up, we’re ready to write code targeting Z3. Below is the simplest possible sample:

using System;
using Microsoft.Z3;

namespace Z3Demo
{
    class Program
    {
        static void Main()
        {
            using (Config config = new Config())
            {
                config.SetParamValue("MODEL", "true"); // corresponds to /m switch

                using (Context context = new Context(config))
                {
                    IntPtr a = context.MkConst("a", context.MkBoolType());
                    IntPtr b = context.MkConst("b", context.MkBoolType());

                    IntPtr xor = context.MkXor(a, b);
                    context.AssertCnstr(xor);

                    Console.WriteLine(context.ToString(xor)); // (xor a b)

                    Model model = null;
                    LBool result = context.CheckAndGetModel(ref model);
                    Console.WriteLine(result);

                    if (model != null)
                    {
                        using (model)
                        {
                            bool aVal = model.GetBoolValueBool(model.Eval(a));
                            bool bVal = model.GetBoolValueBool(model.Eval(b));
                            Console.WriteLine("a = {0}, b = {1}", aVal, bVal);
                        }
                    }
                }
            }
        }
    }
}

Notice this API reflects the nature of the unmanaged (well, native) beast underneath it; the result of creating objects through the context is handle-based, hence the IntPtr use everywhere. The flow should be pretty clear: once we have the context, we build up our formula expression to be checked. Next, we assert the created constraint to the context: this allows multiple constraints to be solved together (something we’re not doing yet in this sample). Finally we check the constraints and get a model back, assuming the result says “True” meaning satisfiability has been proven.

Executing this prints the following:

image

 

Just as we expected, didn’t we?

Note for 64-bit users. Z3 relies on the VC runtime and has unmanaged code that requires running as a 32-bit process. Just compiling your code as a platform-neutral, IL-only assembly will cause the resulting executable to fail loading on a 64-bit machine, as the CLR will JIT compile it to 64-bit at runtime. To fix this, either compile the assembly using the /platform:x86 flag (available in Visual Studio through the project properties) or run corflags.exe on the build output to force it to run as a 32-bit process:

image

Exercise to the reader: try out a few other propositions, possibly with more variables.

LINQ, you say?

Did you see the forest for the (expression) trees? What we just did was declare a Boolean-typed expression tree:

var a = context.MkConst("a", context.MkBoolType());
var b = context.MkConst("b", context.MkBoolType());

var xor = context.MkXor(a, b);

Sounds familiar? True, LINQ expression trees have similar capabilities. Simply replace context by Expression, MkConst by Parameter, MkBoolType by typeof(bool) and MkXor by ExclusiveOr and you end up with a LINQ expression tree:

var a = Expression.Parameter(typeof(bool), "a");
var b = Expression.Parameter(typeof(bool), "b");

var xor = Expression.ExclusiveOr(a, b);

Of course, this only works because we’re using a fairly trivial subset of operations Z3 supports (more about that another time), but for Boolean propositions we could go with LINQ expression trees. Wouldn’t it be nice if we could write something like this:

static void Main()
{
    Solve((a, b) => a && b);
    Solve((a, b) => !(a && b));
    Solve((a, b) => a || b);
    Solve((a, b) => !(a || b));
    Solve((a, b) => a ^ b);
    Solve((a, b) => !(a ^ b));
    Solve(a => a && !a);
    Solve(a => a || !a);
}

and have it print the parameter values for which the lambda bodies evaluate to true? Guess what, let’s give it a try now. First we’ll add a couple of convenience overloads. As lambda expressions by themselves do not have a type (they can either be delegates or expression tree representations thereof), it’d be quite tedious to have to write the full type name (Expression<Func<…>>) every time we want to solve for a particular lambda expression. So, a few overloads will help significantly:

private static void Solve(Expression<Func<bool, bool, bool>> ex)
{
    SolveCore(ex);
}

private static void Solve(Expression<Func<bool, bool>> ex)
{
    SolveCore(ex);
}

This just calls into a SolveCore method that takes in the expression tree as a LambdaExpression object, so we can treat propositions with any number of “arguments” in exactly the same way. What we do inside this method is as follows:

private static void SolveCore(LambdaExpression f)
{
    Console.WriteLine(f);
    Dictionary<ParameterExpression, bool> solution;
    if (new LogicConstraintSolver(f).Solve(out solution))
    {
        foreach (var parameter in solution.Keys)
        {
            Console.WriteLine(parameter.Name + " = " + solution[parameter]);
        }
    }
    else
    {
        Console.WriteLine("No solution.");
    }
    Console.WriteLine();
}

The goal of this code is simple: given the lambda expression representing the proposition, go ahead and solve it, returning mappings from the parameters of the lambda expression (a, b, etc) onto the satisfying Boolean values.

What does the LogicConstraintSolver look like? Essentially a visitor over the expression tree that was fed in, making sure we don’t do anything but elementary Boolean logic, and cross-compiling the expression tree into calls on a Z3 Context object.

class LogicConstraintSolver
{
    private LambdaExpression _ex;

    public LogicConstraintSolver(LambdaExpression ex)
    {
        _ex = ex;
    }

Oh well, I made nothing static for some reason. Just a proof of concept after all, and ideally you’d grab some stock visitor base type to derive from, which would require instance members anyway. The next method is more interesting, I promise:

    public bool Solve(out Dictionary<ParameterExpression, bool> solution)
    {
        solution = null;

        using (Config config = new Config())
        {
            config.SetParamValue("MODEL", "true");

            using (Context context = new Context(config))
            {
                var environment = GetEnvironment(context);

                IntPtr constraint = Visit(context, environment, _ex.Body);
                context.AssertCnstr(constraint);

                Model model = null;
                LBool result = context.CheckAndGetModel(ref model);

                if (model != null)
                {
                    using (model)
                    {
                        solution = GetSolution(model, environment);
                    }
                }

                return result == LBool.True;
            }
        }
    }

Macroscopically the same as our original hard-coded XOR implementation, but with the core replaced:

  1. First we get the environment, which will be a dictionary of ParameterExpression to IntPtr (sorry for the over-use of “var” in this sample). This maps the lambda parameters on the handles for the corresponding Z3 “constants” (returned from MkConst calls).
  2. A visitor method walks the expression tree recursively, turning it into a handle to the Z3 representation of the proposition being cross-compiled. To do so, we need the context (for use of its factory methods) and the environment from above.
  3. Finally we add the constraint, check the model, and get a solution which is again a dictionary but this time mapping the lambda parameters onto their corresponding truth values required to satisfy the proposition.

Now, let’s investigate each of those helper methods. They’re fairly trivial in fact:

    private Dictionary<ParameterExpression, IntPtr> GetEnvironment(Context context)
    {
        var environment = new Dictionary<ParameterExpression, IntPtr>();

        foreach (var parameter in _ex.Parameters)
        {
            if (parameter.Type != typeof(bool))
                throw new NotSupportedException("Non-Boolean parameter.");

            environment.Add(parameter, context.MkConst(parameter.Name, context.MkBoolType()));
        }

        return environment;
    }

No magic here; just enforcing every parameter to be Boolean-typed and calling into the context to construct constants with a Bool type, ultimately storing those in a mapping table. Next is our visitor.

    private IntPtr Visit(Context context, Dictionary<ParameterExpression, IntPtr> environment, Expression expression)
    {
        switch (expression.NodeType)
        {
            case ExpressionType.And:
            case ExpressionType.AndAlso:
                return VisitBinary(context, environment, (BinaryExpression)expression, (ctx, a, b) => ctx.MkAnd(a, b));
            case ExpressionType.Or:
            case ExpressionType.OrElse:
                return VisitBinary(context, environment, (BinaryExpression)expression, (ctx, a, b) => ctx.MkOr(a, b));
            case ExpressionType.ExclusiveOr:
                return VisitBinary(context, environment, (BinaryExpression)expression, (ctx, a, b) => ctx.MkXor(a, b));
            case ExpressionType.Not:
                return VisitUnary(context, environment, (UnaryExpression)expression, (ctx, a) => ctx.MkNot(a));
            case ExpressionType.Parameter:
                return VisitParameter(context, environment, (ParameterExpression)expression);
            default:
                throw new NotSupportedException("Invalid logic expression.");
        }
    }

Here we’re a bit functionally inspired. We just generalize the binary and unary cases, passing in a little lambda that given a context and the arguments can invoke the right factory method on the context object. Of course those arguments need to be filled in with concrete values, which is what the VisitBinary and VisitUnary methods are responsible for, based on recursion:

    private IntPtr VisitBinary(Context context, Dictionary<ParameterExpression, IntPtr> environment, BinaryExpression expression, Func<Context, IntPtr, IntPtr, IntPtr> ctor)
    {
        return ctor(context, Visit(context, environment, expression.Left), Visit(context, environment, expression.Right));
    }

    private IntPtr VisitUnary(Context context, Dictionary<ParameterExpression, IntPtr> environment, UnaryExpression expression, Func<Context, IntPtr, IntPtr> ctor)
    {
        return ctor(context, Visit(context, environment, expression.Operand));
    }

    private IntPtr VisitParameter(Context context, Dictionary<ParameterExpression, IntPtr> environment, ParameterExpression parameter)
    {
        IntPtr value;
        if (!environment.TryGetValue(parameter, out value))
            throw new NotSupportedException("Unknown parameter encountered: " + parameter.Name + ".");

        return value;
    }

All we’re doing here is basic recursion and object construction through our passed in delegates: e.g. an AndAlso node in the LINQ expression tree has a Left and Right property. Recursively we translate those into IntPtr handles into the Z3 “object model”. Given those values, the lambda “(ctx, a, b) => ctx.MkAnd(a, b)” invokes the MkAnd factory method to produce an And node in the target “object model”. The case of visiting a parameter expression simply looks up the handle to the Z3 Const node (created in GetEnvironment before) and returns it.

Finally, we can get the solution from the generated model by getting the truth values for each parameter that occurs in the environment:

    private Dictionary<ParameterExpression, bool> GetSolution(Model model, Dictionary<ParameterExpression, IntPtr> environment)
    {
        var result = new Dictionary<ParameterExpression, bool>();

        foreach (var key in environment.Keys)
        {
            result[key] = model.GetBoolValueBool(model.Eval(environment[key]));
        }

        return result;
    }
}

That’s it. Barely 100 lines of code in total. And sure enough, It Just Works (TM):

image 

Truly fascinating, isn’t it? But this is just the tip of the iceberg. Some next time, we’ll look at the domain of integer arithmetic to make things a bit more interesting...

Exercise to the reader, in the meantime: feed the output of the theorem prover back into the original LINQ expression to check the correctness of the answer. Use either interpretation (by visiting the expression tree) or compilation (if you feel lame and prefer a simple .Compile() call somewhere).
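
To give you a head start on the compilation route, here’s a hedged sketch; Verify is a hypothetical helper, and since LambdaExpression.Compile hands back a plain Delegate, we go through DynamicInvoke:

private static bool Verify(LambdaExpression f, Dictionary<ParameterExpression, bool> solution)
{
    // Order the arguments according to the lambda's parameter list.
    object[] args = f.Parameters.Select(p => (object)solution[p]).ToArray();
    return (bool)f.Compile().DynamicInvoke(args);
}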

Happy theorem proving!


Last time around in this series, I promised to talk about generic co- and contra-variance. So that’s why we’re here today. In this post I’ll explain the theoretical concepts behind these fancy-sounding terms, look at the runtime’s take on it, show how to use them in C# 4.0 and most important of all: tell you why you don’t have to worry about all of this :-).

What’s variance?

Language features with names like variance ought to come from a theoretical background, don’t you think so? Absolutely right, and that’s no different for variance, with category theory being its source of inspiration. Although we often speak about co- and contra-variance (turns out C# – amongst lots of other languages – already uses those concepts, as we’ll see later), there are three sorts of variance to be explained before moving on.

Type ordering

First we need to establish the notion of type ordering. All of you know about object-oriented programming where types are used to indicate the “kind” of data one is dealing with. On top of this, subtyping is used to specialize types. Or, the other way around, operations supported on a supertype can also be applied to its subtypes (possibly with a specialized implementation through overriding and virtual method dispatch).

For example, System.Object has a ToString method. System.DateTime is a subtype of System.Object and therefore also has a ToString method on it (either derived from the base type as-is or overridden, but that doesn’t matter for now).

Let’s use more concise notation to write down such relationships. If D(erived) is a subclass of B(ase), we’ll write D <: B. To say d is an instance of D, we’ll write d : D.

Based on this we can start formulating properties, such as subsumption, the formal notion of subtype-relationships:

If d : D and D <: B, then d : B

For instance, if we substitute d for name, D for System.String and B for System.Object, we end up with:

If name : System.String and System.String <: System.Object, then name : System.Object

We all know this: because name is declared to be a string and the string type is a subtype of object, we can say name ought to be an object. The notation <: is what we call type ordering. I’ll leave it to the math-savvy reader to map those concepts back to the properties of binary relations, such as reflexivity, (anti)-symmetry and transitivity.

Variance

What’s up with variance? In short, variance is a property of operators that act on types. This is very abstract, but we’ll make it very concrete in just a second. For now, think of it very generally: assume op(T1, T2) is an operator “that does something with/based on” T1 and T2 which are both types. For mathematicians (and programmers as it turns out) it makes perfect sense to attribute properties to that operator, capturing its behavior with respect to the types it acts on.

To make things a bit more concrete, let’s map the concept of operators onto generics. (Note for geeks only. Generics are a form of “parametric polymorphism”, which extends the lambda calculus to something known as System F.) As we know, generics allow for type parameterization. Below is a simple example introducing a Pair type:

class Pair<TFirst, TSecond>
{
    public Pair(TFirst first, TSecond second)
    {
        First = first;
        Second = second;
    }

    public TFirst First { get; private set; }
    public TSecond Second { get; private set; }
}

You can think of this type declaration as an operator that, given type values for TFirst and TSecond, can give us a constructed type back (in other words, it’s a type constructor, in the Haskell sense, not the CLR sense). Now we want to be able to express type ordering relationships over such constructed generic types. For example, how does Pair<string, int> relate to Pair<string, object> or to Pair<DateTime, int> or to … whatever. More specifically, we want to be able to infer the relationship between any Pair<T1, T2> and Pair<T3, T4> from relationships between the parameters.

For instance, can we say (this is a question, not an answer just yet…) that:

If T1 <: T3 and T2 <: T4, then Pair<T1,T2> <: Pair<T3,T4>

These are the kind of questions asked by theorists, but obviously by developers today as well.

Covariance

Time to dive into the first kind of variance: covariance. Co as a prefix means “together with”. It means an operator preserves the ordering of types, when compared to its operands. Let me give a concrete sample, based on our Pair class above.

If you know that Apple is a subtype of Fruit (Apple <: Fruit) and Tomato is a subtype of Vegetable (Tomato <: Vegetable), can you also say that Pair<Apple, Tomato> is a subtype of Pair<Fruit, Vegetable>? Looks like that ought to be valid on first sight, no? But there might be some hidden caveats… We’ll get to that soon.

The formalized notion of covariance can be stated as follows:

Operator op(T1, …, Tn) is covariant in Ti (1 <= i <= n) if Si <: Ti implies op(T1, …, Ti-1, Si, Ti+1, …, Tn) <: op(T1, …, Tn)

A very practical case where this comes up is with sequences. For instance, given a LINQ query that returns a sequence of Student objects, can we treat that sequence as one of Person objects, i.e.:

IEnumerable<Person> people = from Student s in db.Students
                             where s.Age <= 25
                             select s;

Today, in C# 3.0, you cannot do this because IEnumerable<T> is not covariant in T.
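
In C# 4.0 this becomes possible because the interface is declared covariant using the out modifier, which is exactly how the .NET 4.0 BCL declares it:

public interface IEnumerable<out T> : IEnumerable
{
    IEnumerator<T> GetEnumerator();
}

The out keyword tells the compiler that T only occurs in output positions, which – as we’ll see with the fruit farmer below – is precisely what makes covariance safe here.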

Contravariance

The opposite of “co” is “contra” which means “against”. In the context of variance it means an operator reverses the ordering of types, when compared to its operands. Again a concrete sample is more than welcome I guess.

Let’s stick with fruits. We know – just repeating this obvious fact – that Apple is a subtype of Fruit (Apple <: Fruit). Now say we want to carry out comparison between apples (IComparer<Apple>), is it possible then to use a comparison between fruits instead (IComparer<Fruit>)? Looks like that should be possible, right? Everything that can handle two pieces of fruit for comparison ought to be able to handle two apples as each apple is a piece of fruit.

The formalized notion of contravariance can be stated as follows:

Operator op(T1, …, Tn) is contravariant in Ti (1 <= i <= n) if Si <: Ti implies op(T1, …, Ti-1, Si, Ti+1, …, Tn) :> op(T1, …, Tn)

A very practical case where this comes up is with action delegates. For instance, given an action that takes in a Person, can we treat that action as one that deals with a Student object instead, i.e.:

Action<Student> submitLetter = (Person p) => { 
    SendMailTo(p.HomeAddress);
};

Update: I made a mistake in my original post, where I swapped Person and Student in the sample above. Guess it was getting late and despite double-checking all uses of the terminology, it slipped through. Thanks to Chris for pointing this out!

Today, in C# 3.0, you cannot do this because Action<T> is not contravariant in T.
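
Again C# 4.0 fixes this, now by declaring the delegate contravariant with the in modifier (as the .NET 4.0 BCL does), after which the assignment works:

public delegate void Action<in T>(T obj);

Action<Person> sendMail = p => SendMailTo(p.HomeAddress);
Action<Student> submitLetter = sendMail; // fine: every Student is a Person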

Broken array variance

First a shock to the reader: arrays in the CLR are covariant. “So what?”, you might wonder. Well, and here comes the shock, it turns out covariant use of arrays is not type-safe. (Let your heartbeat come down before reading on.) We’ll get to why covariant treatment of arrays is broken, but first some history. You might assume this “broken array variance” was put in the CLI (Common Language Infrastructure, the standardized specification of the CLR, ECMA-335) intentionally. The mission of the CLI was – and still is, bigger than ever with the whole DLR on top of it – to accommodate executing different languages on the same unified runtime with a unified type system, instruction set, runtime services, etc. One such language was Java, which has broken array covariance, and being able to accommodate that language won over fixing this behavior.

But what makes arrays unsafe for covariance? Another sample with fruits… The story of the fruit farmer.

A fruit farmer produces apples and has a contract with a local grocery store to sell the apples. To do so, the farmer hands over an array of apples to the store. The contract states the store can return the apples that haven’t been sold in the next week, and only the apples that were sold (indicated by empty spots in the array from which the pieces of fruit have been taken) will be billed. The remaining apples – possibly rotten by now – are sent to a juice factory nearby that creates sweet apple juice. This scheme is illustrated below:

[Image: the farmer hands a tray of apples (an Apple[]) to the grocery store; unsold apples go on to the juice factory]

But because arrays are covariant, the grocery store (on the right) can treat an array of Apples (Apple[]) as if it were an array of Fruits (Fruit[]). Since the cells of arrays are writeable, this opens up a potential hole in the system. What if the grocery store wants to cheat and puts some rotten tomatoes in the tray? (Let’s assume tomatoes are fruits for the sake of this discussion; if you don’t agree with this, substitute “rotten lemon” for “rotten tomato”, but that doesn’t sound as bad IMO, hence the use of tomatoes.) The contract with the farmer stated that only the number of empty places in the tray will be considered in billing the grocery store; so fill a few empty places with unsellable rotten tomatoes and the price gets reduced. This might go unnoticed if the farmer doesn’t enforce runtime fruit/vegetable type safety.

If the contract with the juice factory states that only Apple arrays can be passed in, but the farmer doesn’t check this at runtime after the return from the grocery store, their juice will now contain tomato juice as well (a strange combination, I guess). Or worse, the juice factory will blow up because the apple peeling machine expects a certain toughness from the apple being peeled and tomatoes are much softer (note: I don’t know anything at all about the factory process involved in creating juice, so I’m just fantasizing about possible horror stories).

This mishap is illustrated below.

[Image: the store treats the Apple[] as a Fruit[] and sneaks rotten tomatoes into the empty spots]

But who’s to blame? The grocery store followed the array-based contract exactly; arrays do not prevent write operations, so putting tomatoes in is not a violation. Clearly the farmer needs a tighter contract that ensures – statically – that the grocery store cannot put other fruits or vegetables in. What the farmer should use is a “pull-based enumerating device for apples” (an IEnumerable<Apple>, that is) as shown below:

[Image: a spring-loaded apple dispenser – the pull-based enumerating device]

The spring depicts the enumerating behavior – you can only get things out but can’t push things in (ignoring forceful mechanisms and the fact that the spring might crush the apples :-)). Such an IEnumerable<T> is safely covariant because you can’t push anything in, so even if the store treats the IEnumerable<Apple> as an IEnumerable<Fruit>, all it can do is get pieces of fruit (which will always happen to be apples) out.

This illustrates why arrays T[] are not safely covariant and why IEnumerable<T> is. Or in code:

namespace Rural.Fields
{
    using System.Linq;
    using Market;
    using Industry;

    internal sealed class Farm
    {
        private GroceryStore _store = …;
        private JuiceFactory _juice = …;
        private const decimal APPLE_PRICE = 0.5m;

        private Apple[] PluckAppleTrees() { … }

        public void Work()
        {
            Apple[] apples = PluckAppleTrees();
            _store.SellFruits(apples); // here the array is treated covariantly!
            int sold = apples.Count(apple => apple == null); // sold apples leave empty (null) spots in the tray
            _store.SendInvoice(sold * APPLE_PRICE);
            _juice.ProduceAppleJuice(apples); // here the array is treated invariantly
            _juice.SendInvoice((apples.Length - sold) * APPLE_PRICE * 0.8m /* price of a rotten apple */);
        }
    }
}

namespace Market
{
    public sealed class GroceryStore
    {
        public void SellFruits(Fruit[] fruits) { … }
        public void SendInvoice(decimal amount) { … }
    }
}

namespace Industry
{
    public sealed class JuiceFactory
    {
        public void ProduceAppleJuice(Apple[] apples) { … }
        public void SendInvoice(decimal amount) { … }
    }
}
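
For contrast, here’s a sketch of the tighter, enumeration-based contract the farmer should insist on – using the covariant IEnumerable<out T> we’ll meet officially in the C# 4.0 section below:

namespace Market
{
    public sealed class GroceryStore
    {
        // The store can only pull fruits out of the sequence;
        // there's no way to push tomatoes (or anything else) back in.
        public void SellFruits(IEnumerable<Fruit> fruits) { … }
        public void SendInvoice(decimal amount) { … }
    }
}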

To ensure type safety for covariant arrays, the CLR injects runtime type checks for every attempt to store an element in an array. If the object being stored has a type that’s incompatible with the runtime type of the array, an ArrayTypeMismatchException is thrown. In our case above, if the SellFruits method of GroceryStore were to write a Tomato object to the array that’s passed in (typed as Fruit[], but created as an Apple array in the PluckAppleTrees method of the Farm), the CLR would detect this and throw the exception.

A more isolated sample:

string[] names = new string[] { "Bart", "John" };
object[] things = names; // covariant treatment of the string[] array
things[0] = 123; // ArrayTypeMismatchException - 123 is not a string
Console.WriteLine(names[0].ToUpper()); // if the store above had succeeded, we'd call the non-existent ToUpper method on System.Int32

Today’s co- and contra-variance support

C# actually has places where co- and contra-variance principles are being used, more specifically in delegate types. A typical sample where this comes in useful is when dealing with events.

delegate void CancelEvent(object sender, CancelEventArgs e);
delegate void ProgressEvent(object sender, ProgressEventArgs e);

sealed class Engine
{
    public event CancelEvent Cancelled;
    public event ProgressEvent ProgressChanged;

    public void Run() { … }
    public void Cancel() { … }
}

static class Program
{
    static void Main()
    {
        var engine = new Engine();
        engine.Cancelled += Logger;
        engine.ProgressChanged += Logger;
       
        // run engine, etc.
    }

    static void Logger(object sender, EventArgs e) { … }
}

Here we’re attaching one event handler to two events which have different signatures. However, the signature of Logger uses a common supertype of both events’ event argument types, so the compiler allows this. The reason is that input parameters can safely be treated contravariantly: we’re handed a more derived type for the event arguments, but we treat it as a less derived one. As the argument appears in an input position, this is safe to do.

Here’s a little exercise for the reader. The following fragment compiles fine because of contravariant treatment for delegate parameters:

class Contra
{
    static void Main()
    {
        var b = new Bar();
        b.Event += Handler;
    }

    static void Handler(object o /* contravariant treatment when used with delegate type D */)
    {
    }
}

delegate void D(string s);

class Bar
{
    public event D Event;
}

However, if we change the parameter on D by adding a ref modifier, it doesn’t work anymore.

static void Handler(ref object o /* ??? */)

delegate void D(ref string s);

Why? Illustrate with a sample showing why contravariant treatment would be malicious if it were allowed here…
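
In case you want to check your answer afterwards, here’s one possible sketch. A ref parameter is used for both input and output, so contravariant treatment of it would be unsafe – it would let the callee write a non-string through what the caller still sees as a string variable:

static void Handler(ref object o)
{
    o = 123; // perfectly legal for a ref object parameter
}

static void Main()
{
    string s = "Hello";
    D d = Handler; // rejected by the compiler; assume for a moment it were allowed
    d(ref s);      // s would now refer to a boxed Int32...
    Console.WriteLine(s.ToUpper()); // ...and this would call a non-existent ToUpper on Int32
}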

Covariant and contravariant generic parameters in the CLI

With the second release of the CLR, as part of .NET Framework 2.0, generics were introduced based on the work done by Don Syme and Andrew Kennedy in the Gyro project. Right from the start, generic parameters have supported the declaration of desired variance behavior. Section 8 in Partition I of the ECMA-335 spec for the CLI states:

In addition, CLI supports covariant and contravariant generic parameters, with the following characteristics:

  • It is type-safe (based on purely static checking)
  • Simplicity: in particular, variance is only permitted on generic interfaces and generic delegates (not classes or value-types)
  • Languages not wishing to support variance can ignore the feature, and treat all generic types as non-variant.
  • Enable implementation of more complex covariance scheme as used in some languages, e.g. Eiffel.

but so far, C# and VB have been following the third bullet, ignoring the feature. Before we go there, we should have a look at how generic variance surfaces in IL, proving it’s supported by the CLR today. More information can be found in §9.5 of Partition II:

The CLI supports covariance and contravariance of generic parameters, but only in the signatures of interfaces and delegate classes. 
The symbol “+” is used in the syntax of §10.1.7 to denote a covariant generic parameter, while “-” is used to denote a contravariant generic parameter.

Our two main samples have been IEnumerable<T> and IComparer<T>. Let’s define our own interfaces for both (in C#) and see what they look like at the IL level:

interface IEnumerable<T>
{
    IEnumerator<T> GetEnumerator();
}

interface IEnumerator<T>
{
    bool MoveNext();
    T Current { get; }
}

interface IComparer<T>
{
    int Compare(T arg1, T arg2);
}

By default generic interface and delegate types are invariant:

[Image: ildasm showing the generic parameter T without any variance annotation]

However, we can round-trip through IL to make those types covariant (for IEnumer*<T>) or contravariant (for IComparer<T>) in their generic parameter T. The difference is subtle: adding a + (covariant) or a - (contravariant) to the generic parameter T.

.class interface private abstract auto ansi IEnumerable`1<+T>
{
  .method public hidebysig newslot abstract virtual
          instance class IEnumerator`1<!T>
          GetEnumerator() cil managed
  {
  } // end of method IEnumerable`1::GetEnumerator

} // end of class IEnumerable`1

.class interface private abstract auto ansi IEnumerator`1<+T>
{
  .method public hidebysig newslot abstract virtual
          instance bool  MoveNext() cil managed
  {
  } // end of method IEnumerator`1::MoveNext

  .method public hidebysig newslot specialname abstract virtual
          instance !T  get_Current() cil managed
  {
  } // end of method IEnumerator`1::get_Current

  .property instance !T Current()
  {
    .get instance !T IEnumerator`1::get_Current()
  } // end of property IEnumerator`1::Current
} // end of class IEnumerator`1

.class interface private abstract auto ansi IComparer`1<-T>
{
  .method public hidebysig newslot abstract virtual
          instance int32  Compare(!T arg1,
                                  !T arg2) cil managed
  {
  } // end of method IComparer`1::Compare

} // end of class IComparer`1

[Image: ildasm showing the +T and -T variance annotations on the generic parameters]

Notice that the ilasm tool doesn’t statically verify correct use of variance annotations: it’s possible to mark a generic parameter as covariant even though it’s used in input positions. It’s the responsibility of language compilers, e.g. the C# compiler, to enforce rules like the following (see the sketch right after this list):

  • Covariant parameters should only be used in output positions: method return values, get-only properties or indexers.
  • Contravariant parameters should only occur in input positions: method parameters, set-only properties or indexers.
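
To make this concrete, here’s a minimal sketch of declarations the C# 4.0 compiler rejects with an “invalid variance” error, because the annotations contradict the positions in which T is used (IProducer and IConsumer are hypothetical interfaces, just for illustration):

interface IProducer<out T>
{
    T Produce();      // OK: covariant T in an output position
    void Eat(T item); // error: covariant T used in an input position
}

interface IConsumer<in T>
{
    void Eat(T item); // OK: contravariant T in an input position
    T Produce();      // error: contravariant T used in an output position
}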

C# 4.0 support

Starting with C# 4.0, the language does support co- and contra-variance on generic delegate and interface type parameters. This feature has two sides to it:

  • As a consumer of generic interface or delegate types that behave either co- or contra-variantly, you can now do what you couldn’t do before:

IEnumerable<Person> people = from Student s in db.Students
                             where s.Age <= 25
                             select s;

IComparer<string> comp = new MyObjectComparer(); // implements IComparer<object>

  • As a provider of generic interface or delegate types you can now specify the intended behavior. This is done using the out and in keywords, which respectively stand for covariant (output positions only) and contravariant (input positions only). The compiler enforces that parameters that behave in a non-invariant way are used appropriately:

interface IEnumerable<out T>
{
    IEnumerator<T> GetEnumerator();
}

interface IEnumerator<out T>
{
    bool MoveNext();
    T Current { get; }
}

interface IComparer<in T>
{
    int Compare(T arg1, T arg2);
}

You might wonder why the compiler doesn’t infer the applicable variance annotation on the user’s behalf. The reason is that inference could easily lead to “accidental” variance changes as code evolves. Although it’s not a good idea to keep changing interfaces, during development your interface types might be in flux, and it would be disruptive if consumer code suddenly started breaking because a generic parameter’s inferred variance changed. It was felt it’d be better to have developers state variance explicitly.
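
To illustrate, assume variance were inferred. The interface below (ISource is hypothetical, just for illustration) would silently be treated as covariant in T, since T only shows up in an output position:

interface ISource<T>
{
    T Get(); // inference would conclude "out T"
}

Consumers could then rely on conversions such as ISource<Student> to ISource<Person>. Add a single member during development – say, void Put(T item); – and T silently becomes invariant, breaking all of those consumers without a single change on their side.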

Obviously a bunch of interface and delegate types in the BCL have been modified with variance annotations. IComparer<T>, IEnumerable<T> and IEnumerator<T> are just a few samples. Others include the Func and Action delegates:

delegate R Func<in T1, in T2, …, in Tn, out R>(T1 arg1, T2 arg2, …, Tn argn);
delegate void Action<in T1, in T2, …, in Tn>(T1 arg1, T2 arg2, …, Tn argn);
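
To see both annotations at work at the same time, here’s a small sketch (Person and Student are hypothetical types with Student <: Person, and the Student constructor and SendMailTo method are made up for illustration):

Func<Person, Student> enroll = p => new Student(p);
Func<Student, Person> f = enroll;
// OK in C# 4.0: a function that accepts any Person surely accepts a Student (in T1),
// and a function that returns a Student surely returns a Person (out R).

Action<Person> notify = p => SendMailTo(p.HomeAddress);
Action<Student> toStudents = notify; // anything that can act on a Person can act on a Student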

That’s it. I told’ya it was simple, no? The good thing is, you shouldn’t really know much about all of this: It Just Works (now also available for C# – inside joke).

