Introduction

In preparation for some upcoming posts related to LINQ (what else?), Windows PowerShell and Rx, I had to set up a local LDAP-capable directory service. (Hint: It will pay off to read till the very end of the post if you're wondering what I'm up to...) In this post I'll walk the reader through the installation, configuration and use of Active Directory Lightweight Directory Services (LDS), formerly known as Active Directory Application Mode (ADAM). Having used the technology several years ago, in relation to the LINQ to Active Directory project (which, as an extension to this blog series, will receive an update), it was a warm and welcome re-encounter.

 

What’s Lightweight Directory Services anyway?

Use of hierarchical storage and auxiliary services provided by technologies like Active Directory often has advantages over alternative designs, e.g. using a relational database. For example, user accounts may be stored in a directory service for an application to make use of. While Active Directory seems the natural habitat to store (and replicate, secure, etc.) additional user information, IT admins will likely point you – the poor developer – at the door when you ask to extend the schema. That's one of the places where LDS comes in, offering the ability to take advantage of the programming model of directory services while keeping your hands off "the one and only AD schema".

The LDS website quotes other use cases, which I’ll just copy here verbatim:

Active Directory Lightweight Directory Service (AD LDS), formerly known as Active Directory Application Mode, can be used to provide directory services for directory-enabled applications. Instead of using your organization’s AD DS database to store the directory-enabled application data, AD LDS can be used to store the data. AD LDS can be used in conjunction with AD DS so that you can have a central location for security accounts (AD DS) and another location to support the application configuration and directory data (AD LDS). Using AD LDS, you can reduce the overhead associated with Active Directory replication, you do not have to extend the Active Directory schema to support the application, and you can partition the directory structure so that the AD LDS service is only deployed to the servers that need to support the directory-enabled application.

  • Install from Media Generation. The ability to create installation media for AD LDS by using Ntdsutil.exe or Dsdbutil.exe.

  • Auditing. Auditing of changed values within the directory service.

  • Database Mounting Tool. Gives you the ability to view data within snapshots of the database files.

  • Active Directory Sites and Services Support. Gives you the ability to use Active Directory Sites and Services to manage the replication of the AD LDS data changes.

  • Dynamic List of LDIF files. With this feature, you can associate custom LDIF files with the existing default LDIF files used for setup of AD LDS on a server.

  • Recursive Linked-Attribute Queries. LDAP queries can follow nested attribute links to determine additional attribute properties, such as group memberships.

Obviously that last bullet point grabs my attention, though I will restrain myself from digressing here.

 

Getting started

If you’re running Windows 7, the following explanation is the right one for you. For older versions of the operating system, things are pretty similar though different downloads will have to be used. For Windows Server 2008, a server role exists for LDS. So, assuming you’re on Windows 7, start by downloading the installation media over here. After installing this, you should find an entry “Active Directory Lightweight Directory Services Setup Wizard” under the “Administrative Tools” section in “Control Panel”:

image

LDS allows you to install multiple instances of directory services on the same machine, just like SQL Server allows multiple server instances to co-exist. Each instance has a name and listens on certain ports using the LDAP protocol. Starting this wizard – which lives under %SystemRoot%\ADAM\adaminstall.exe, revealing the former product name – brings us here:

image

After clicking Next, we need to decide whether we create a new unique instance that hasn’t any ties with existing instances, or whether we want to create a replicate of an existing instance. For our purposes, the first option is what we need:

image

Next, we're asked for an instance name. The instance name will be used for the creation of a Windows Service, as well as to store some settings. Each instance will get its own Windows Service. In our sample, we'll create a directory for the Northwind Employees table, which we'll use to create accounts further on.

image

We're almost there with the baseline configuration. The next question is to specify a port number, both for plain TCP and for SSL-encrypted traffic. The default ports, 389 and 636, are fine for us. Later we'll be able to connect to the instance over LDAP on port 389, e.g. using the System.DirectoryServices namespace functionality in .NET. Notice every instance of LDS needs its own port numbers, so only one instance can use the defaults.

image

Now that we have completed the “physical administration”, the wizard moves on to a bit of “logical administration”. More specifically, we’re given the option to create a directory partition for the application. Here we choose to create such a partition, though in many concrete deployment scenarios you’ll want the application’s setup to create this at runtime. Our partition’s distinguished name will mimic a “Northwind.local” domain containing a partition called “Employees”:

image

After this bit of logical administration, some more physical configuration has to be carried out, specifying the data files location and the account to run the services under. For both, the default settings are fine. Also the administrative account assigned to manage the LDS instance can be kept as the currently logged in user, unless you feel the need to change this in your scenario:

image image

Finally, we've arrived at an interesting step where we're given the option to import LDIF files. An LDIF file, with extension .ldf, contains the definition of a class that can be added to a directory service's schema. Basically those contain things like attributes and their types. Under the %SystemRoot%\ADAM folder, a set of out-of-the-box .ldf files can be found:

image

Instead of having to run the ldifde.exe tool, the wizard gives us the option to import LDIF files directly. Those classes are documented in various places, such as RFC 2798 for inetOrgPerson. On TechNet, information is presented in a more structured manner, e.g. revealing that inetOrgPerson is a subclass of user. Custom classes can be defined and imported after setup has completed. In this post, we won't extend the schema ourselves but will simply be using the built-in User class, so let's tick that one:

image

After clicking Next, we get a last chance to revisit our settings or can confirm the installation. At this point, the wizard will create the instance – setting up the service – and import the LDIF files.

image image

Congratulations! Your first LDS instance has materialized. If everything went alright, the NorthwindEmployees service should show up:

image

 

Inspecting the directory

To inspect the newly created directory instance, a bunch of tools exist. One is ADSI Edit which you could already see in the Administrative Tools. To set it up, open the MMC-based tool and go to Action, Connect to… In the dialog that appears, specify the server name and choose Schema as the Naming Context.

image

For example, if you want to inspect the User class, simply navigate to the Schema node in the tree and show the properties of the User entry.

image

To visualize the objects in the application partition, connect using the distinguished name specified during the installation:

image

Now it’s possible to create a new object in the directory using the context menu in the content pane:

image

After specifying the class, we get to specify the “CN” name (for common name) of the object. In this case, I’ll use my full name:

image image

We can also set additional attributes, as shown below (using the “physicalDeliveryOfficeName” to specify the office number of the user):

image image

After clicking Set, closing the Attributes dialog and clicking Finish to create the object, we see it pop up in the items view of the ADSI editor snap-in:

image

 

Programmatic population of the directory

Obviously we're much more interested in a programmatic way to populate the directory. .NET supports the use of directory services and related protocols (LDAP in particular) through the System.DirectoryServices namespace. In a plain new Console Application, add a reference to the assembly with the same name (don't bother about other assemblies that deal with account management and protocol stuff):

image

For this sample, I'll also assume the reader has a Northwind SQL database sitting somewhere and knows how to get data out of its Employees table as rich objects. Below is how things look when using the LINQ to SQL designer:

image

We’ll just import a few details about the users; it’s left to the reader to map other properties onto attributes using the documentation about the user directory services class. Just a few lines of code suffice to accomplish the task (assuming the System.DirectoryServices namespace is imported):

static void Main()
{
    // Bind to the application partition we created during setup.
    var path = "LDAP://bartde-hp07/CN=Employees,DC=Northwind,DC=local";
    var root = new DirectoryEntry(path);

    // Copy each Northwind employee into the directory as a "user" object.
    var ctx = new NorthwindDataContext();
    foreach (var e in ctx.Employees)
    {
        var cn = "CN=" + e.FirstName + e.LastName;

        var u = root.Children.Add(cn, "user");
        u.Properties["employeeID"].Value = e.EmployeeID;
        u.Properties["sn"].Value = e.LastName;        // surname
        u.Properties["givenName"].Value = e.FirstName;
        u.Properties["comment"].Value = e.Notes;
        u.Properties["homePhone"].Value = e.HomePhone;
        u.Properties["photo"].Value = e.Photo.ToArray();
        u.CommitChanges();
    }
}

After running this code – obviously changing the LDAP path to reflect your setup – you should see the following in ADSI Edit (after hitting refresh):

image

Now it’s just plain easy to write an application that visualizes the employees with their data. We’ll leave that to the UI-savvy reader (just to tease that segment of my audience, I’ve also imported the employee’s photo as a byte-array).
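For readers who'd rather stay in the console than build a UI, a minimal sketch of reading the entries back could look as follows, using DirectorySearcher from the same System.DirectoryServices namespace (the LDAP path mirrors the import code above, so adjust it to your setup):

// Find all user objects under the Employees partition and print their names.
var root = new DirectoryEntry("LDAP://bartde-hp07/CN=Employees,DC=Northwind,DC=local");
using (var searcher = new DirectorySearcher(root, "(objectClass=user)"))
{
    foreach (SearchResult res in searcher.FindAll())
    {
        Console.WriteLine("{0} {1}", res.Properties["givenName"][0], res.Properties["sn"][0]);
    }
}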

 

A small preview of what’s coming up

To whet the reader’s appetite about next episodes on this blog, below is a single screenshot illustrating something – IMHO – rather cool (use of LINQ to Active Directory is just an implementation detail below):

image

Note: What’s shown here is the result of a very early experiment done as part of my current job on “LINQ to Anything” here in the “Cloud Data Programmability Team”. Please don’t fantasize about it as being a vNext feature of any product involved whatsoever. The core intent of those experiments is to emphasize the omnipresence of LINQ (and more widely, monads) in today’s (and tomorrow’s) world. While we’re not ready to reveal the “LINQ to Anything” mission in all its glory (rather think of it as “LINQ to the unimaginable”), we can drop some hints.

Stay tuned for more!


Introduction

A while ago I was explaining runtime mechanisms like the stack and the heap to some folks. (As an aside, I'm writing a debugger course on "Advanced .NET Debugging with WinDbg and SOS", which is an ongoing project. Time will tell when it's ready to hit the streets.) Since the context was functional programming, where recursion is a typical substitute (or fuel, if you will) for loops, an obvious topic for discussion is the possibility to hit a stack overflow. Armed with my favorite editor, Notepad.exe, and the C# command-line compiler, I quickly entered the following sample to show "looping with recursion" and how disaster can strike:

using System;

class Program
{
    static void Main()
    {
        Rec(0);
    }

    static void Rec(int n)
    {
        if (n % 1024 == 0)
            Console.WriteLine(n);

        Rec(n + 1);
    }
}

The modulo-based condition in there is to avoid excessive slowdowns from Console.WriteLine use, which is rather slow due to the way the Win32 console output system works. To my initial surprise, the overflow didn't come anywhere in sight and the application kept running happily:

image

I rather expected something along the following lines:

image

So, what's going on here? Though I realized pretty quickly what the root cause of this unexpectedly good behavior was, I'll walk the reader through the thought process used to "debug" the application's code.

 

I made a call, didn’t I?

The first thing to check is that we really are making a recursive call in our Rec method. Obviously ildasm is the way to go to inspect that kind of stuff, so here’s the output which we did expect.

image

In fact, the statement made above – “which we did expect” – is debatable. Couldn’t the compiler just turn the call into a jump right to the start of the method after messing around a bit with the local argument slot that holds argument value n? That way we wouldn’t have to make a call and the code would still work as expected. Essentially what we’re saying here is that the compiler could have turned the recursive call into a loop construct. And indeed, some compilers do exactly that. For example, consider the following F# sample:

#light

let rec Rec n =
   if n % 1024 = 0 then
       printfn "%d" n

   Rec (n + 1)

Rec 0

Notice the explicit indication of the recursive nature of a function by means of the “rec” keyword. After compiling this piece of code using fsc.exe, the following code is shown in Reflector (decompiling to C# syntax) for the Rec function:

image

The mechanics of the printf call are irrelevant. What matters is the code that’s executed after the n++ statement, which isn’t a recursive call to Rec itself. Instead, the compiler has figured out a loop can be used. Hence, no StackOverflowException will result.

Back to the C# sample though. What did protect the code from overflowing the stack? Let’s have some further investigations, but first … some background.

 

Tail calls

One optimization that can be carried out for recursive functions is to spot tail calls and optimize them away into looping – or at a lower level, jumps – constructs. A tail call is basically a call after which the current stack frame is no longer needed upon return from the call. For example, our simple sample can benefit from tail call optimization since the Rec method doesn’t really do anything anymore after returning from the recursive Rec call:

static void Rec(int n)
{
    if (n % 1024 == 0)
        Console.WriteLine(n);

    Rec(n + 1);
}

This kind of optimization – as carried out by F# in the sample shown earlier – can’t always take place. For example, consider the following definition of a factorial method:

static int Fac(int n)
{
    if (n == 0)
        return 1;

    return n * Fac(n - 1);
}

The above has quite a few issues such as the inability to deal with negative values and obviously the arithmetic overflow disaster that will strike when the supplied “n” parameter is too large for the resulting factorial to fit in an Int32. The BigInteger type introduced in .NET 4 (and not in .NET 3.5 as originally planned) would be a better fit for this kind of computation, but let’s ignore this fact for now.
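For completeness, here's a quick sketch of what a BigInteger-based variant could look like, assuming a .NET 4 project with a reference to System.Numerics.dll:

using System.Numerics;

static BigInteger Fac(BigInteger n)
{
    if (n == 0)
        return 1;

    // BigInteger arithmetic can't overflow; deep recursion can still
    // exhaust the stack though, which is exactly the topic of this post.
    return n * Fac(n - 1);
}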

A more relevant issue in the context of our discussion is the code's use of recursion where a regular loop would suffice, but now I'm making a value judgment of imperative control flow constructs versus a more functional style of using recursion. What's true nonetheless is that the code above is not immediately amenable to tail call optimization. To see why this is, rewrite the code as follows:

static int Fac(int n)
{
    if (n == 0)
        return 1;

    int t = Fac(n - 1);
    return n * t;
}

See what’s going on? After returning from the recursive call to Fac, we still need to have access to the value of “n” in the current call frame. As a result, we can’t reuse the current stack frame when making the recursive call. Implementing the above in F# (just for the sake of it) and decompiling it, shows the following code:

image

The culprit keeping us from employing tail call optimization is the multiplication instruction needed after the return from the recursive call to Fac. (Note: the second operand to the multiplication was pushed onto the evaluation stack in IL_0005; in fact IL_0006 could also have been a dup instruction.) C# code will be slightly different but achieve the same computation (luckily!).

Sometimes it’s possible to make a function amenable for tail call optimization by carrying out a manual rewrite. In the case of the factorial method, we can employ the following trick:

static int Fac(int n)
{
    return Fac_(n, 1);
}

static int Fac_(int n, int res)
{
    if (n == 0)
        return res;

    return Fac_(n - 1, n * res);
}

Here, we’re not only decrementing n in every recursive call, we’re also keeping the running multiplication at the same time. In my post Jumping the trampoline in C# – Stack-friendly recursion, I explained this principle in the “Don’t stand on my tail!” section. The F# equivalent of the code, shown below, results in tail call optimization once more:

let rec Fac_ n res =
   if n = 0 then
       res
   else
       Fac_ (n - 1) (n * res)

let Fac n =
   Fac_ n 1

The compilation result is shown below:

image

You can clearly see the reuse of local argument slots.

 

A smart JIT

All of this doesn't yet explain why the original C# code is just working fine, though our look at the generated IL code in the second section of this post did reveal the call instruction to really be there. One more party is involved in getting our much beloved piece of C# code to run on the bare metal of the machine: the JIT compiler.

In fact, as soon as I saw the demo not working as intended, the mental click was made to go and check this possibility. Why? Well, the C# compiler doesn’t optimize tail calls into loops, nor does it emit tail.call instructions. The one and only remaining party is the JIT compiler. And indeed, since I’m running on x64 and am using the command-line compiler, the JIT compiler is more aggressive about performing tail call optimizations.

Let's explain a few things about the previous paragraph. First of all, why does the use of the command-line compiler matter? Won't the same result pop up if I used a Console Application project in Visual Studio? Not quite, if you're using Visual Studio 2010 that is. One of the decisions made in the last release is to mark executable IL assemblies (managed .exe files) as 32-bit only. That doesn't mean the image contains 32-bit instructions (in fact, the C# compiler never emits raw assembler); all it does is tell the JIT to only emit 32-bit assembler at runtime, hence resulting in a WOW64 process on 64-bit Windows. The reasons for this are explained in Rick Byers' blog post on the subject. In our case, we're running the C# compiler without the /platform:x86 flag – which now is passed by the default settings of a Visual Studio 2010 executable (not library!) project – therefore resulting in an "AnyCPU" assembly. The corflags.exe tool can be used to verify this claim:

image

In Visual Studio 2010, a new Console Application project will have the 32-bit only flag set by default. Again, reasons for this decision are brought up in Rick’s post on the subject.

image

Indeed, when running the 32-bit only assembly, a StackOverflowException results. An alternative way to tweak the flags of a managed assembly is by using corflags.exe itself, as shown below:

image

It turns out when the 64-bit JIT is involved, i.e. when the AnyCPU Platform target is set – the default on the csc.exe compiler – tail call optimization is carried out for our piece of code. A whole bunch of conditions under which tail calls can be optimized by the various JIT flavors can be found on David Broman's blog. Grant Richins has been blogging about improvements made in .NET 4 (which don't really apply to our particular sample). One important change in .NET 4 is the fact that the 64-bit JIT now honors the "tail." prefix on call instructions, which is essential to the success of functional style languages like F# (indeed, F#'s compiler actually has a tailcalls flag, which is on by default due to the language's nature).

 

Seeing the 64-bit JIT’s work in action

In order to show the reader the generated x64 code for our recursive Rec method definition, we'll switch gears and open up WinDbg, leveraging the SOS debugger extension. Obviously this requires one to install the Debugging Tools for Windows. Also notice the section's title applies to x64. For x86 users, the same experiment can be carried out, revealing the x86 instructions generated without the tail call optimization, hence explaining the overflow observed on 32-bit executions.

Loading the ovf.exe sample (making sure the 32-bit only flag is not set!) under the WinDbg debugger – using windbg.exe ovf.exe – brings us to the first loader breakpoint as shown below. In order to load the Son Of Strike (SOS) debugger extension, set a module load breakpoint for clrjit.dll (which puts us in a convenient spot where the CLR has been sufficiently loaded to use SOS successfully). When that breakpoint hits, the extension can be loaded using .loadby sos clr:

image

Next, we need to set a breakpoint on the Rec method. In my case, the assembly’s file name is ovf.exe, the class is Program and the method is Rec, requiring me to enter the following commands:

image

The !bpmd extension command is used to set a breakpoint based on a MethodDesc – a structure used by the CLR to describe a method. Since the method hasn't been JIT compiled yet, and hence no physical address for the executable code is available yet, a pending breakpoint is added. Now we let the debugger go and end up hitting the breakpoint, which got automatically set when the JIT compiler took care of compiling the method (since it came "in sight" for execution, i.e. because of Main's call into it). Using the !U – for unassemble – command we can now see the generated code:

image

Notice the presence of code like InitializeStdOutError, which is the result of inlining the Console.WriteLine method's code. What's going on here with regards to the tail call behavior is the replacement of a call instruction with a jump simply to the beginning of the generated code. The rest of the code can be deciphered with a bit of x86/x64 knowledge. For one thing, you can recognize the 1024 value (used for our modulo arithmetic) in 3FF, which is 1023. The modulo check stretches over a few instructions that basically use a mask over the value to see whether any of the low bits is non-zero. If so, the value is not divisible by 1024; otherwise, it is. Based on this test (whose value gets stored in eax), a jump is either made or not, going through the path of calling Console.WriteLine or skipping it.
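In C# terms, the test the JIT generates boils down to the classic power-of-two trick, roughly sketched as:

// For powers of two, a modulo test reduces to a bit mask:
// n % 1024 == 0 holds exactly when the ten low bits of n are all zero.
static bool IsMultipleOf1024(int n)
{
    return (n & 0x3FF) == 0; // 0x3FF == 1023
}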

 

Contrasting with the x86 assembler being used

In the x86 setting, we’ll see different code. To show this, let’s use a Console Application in Visual Studio 2010, whose default platform target is – as mentioned earlier – 32-bit. In order to load SOS from inside the Immediate Window, enable the native debugger through the project settings:

image

Using similar motions as before, we can load the SOS extension upon hitting a breakpoint. Instead of using !bpmd, we can use !name2ee to resolve the JITTED Code Address for the given symbol, in this case the Program.Rec method:

image

Inspecting the generated code, one will encounter the following call instruction to the same method. This is the regular recursive call without any tail call optimization carried out. Obviously this will cause a StackOverflowException to occur. Also notice from the output below that the Console.WriteLine method call didn’t get inlined in this particular x86 case.

image

 

Revisiting the tail. instruction prefix

As referred to before, the IL instruction set has a tail. prefix for call instructions. Before .NET 4, this was merely a hint to the JIT compiler. For x86, it was (and still is) a request of the IL generator to the JIT compiler to perform a tail call. For x64, prior to CLR 4.0, this request was not always granted. For our x86 case, we can have a go at inserting the tail. prefix for the recursive call in the code generated by the C# compiler (which doesn’t emit this instruction by itself as explained before). Using ildasm’s /out parameter, you can export the ovf.exe IL code to a text file. Notice the COR flags have been set to “32-bit required” using either the x86 platform target flag on csc.exe or by using corflags /32bit+:

image

Now tweak the code of Rec as shown below. After a tail call instruction, no further code should execute other than a ret. If this rule isn't obeyed, the CLR will throw an exception signaling an invalid program. Hence we remove the nop instruction that resulted from a non-optimized build (Debug build or csc.exe use without the /o+ flag). To turn the call into a tail call, we add the "tail." prefix. Don't forget the space after the dot though:

image

The roundtripping session through ILDASM and ILASM, with the manual tweak in Notepad shown above, is captured here:

image

With this change in place, the ovf.exe will keep on running without overflowing the stack. Looking at the generated code through the debugger, one would see a jmp instruction instead of a call, explaining the fixed behavior.

 

Conclusion

Tail calls are the bread and butter of iterative programs written in a functional style. As such, the CLR has evolved to support tail call optimization in the JIT when the tail. prefix is present, e.g. as emitted by the F# compiler when needed (though the IL code itself may be turned into a loop by the compiler itself). One thing to know is that on x64, the JIT is more aggressive about detecting and carrying out tail recursive calls (since it has a good value proposition with regards to “runtime intelligence cost” versus “speed-up factor”). For more information, I strongly recommend you to have a look at the CLR team’s blog: Tail Call Improvements in .NET Framework 4.


Introduction

Recently I’ve been playing with Windows PowerShell 2.0 again, in the context of my day-to-day activities. One hint should suffice for the reader to get an idea of what’s going on: push-based collections. While I’ll follow up on this subject pretty soon, this precursor post explains one of the things I had to work around.

 

PowerShell: a managed application or not?

Since PowerShell is designed around the concept of managed object pipelines, one may expect powershell.exe to be a managed executable. However, it turns out this isn't completely the case. If you try to run ildasm.exe on the PowerShell executable (which lives in %windir%\system32\WindowsPowerShell\v1.0 despite the 2.0 version number, due to setup complications), you get the following message:

image

So much for the managed executable theory. What else could be going on to give PowerShell the power of managed objects? Well, it could be hosting the CLR. To check this theory, we can use the dumpbin.exe tool with the /imports flag, checking for mscoree.dll functions being called. And indeed, we encounter the CorBindToRuntimeEx function, which has been the way to host the CLR prior to .NET 4's in-process side-by-side introduction (a feature I should blog about as well since I wrote a CLR host for in-process side-by-side testing on my prior team here at Microsoft).

image

One of the parameters passed to CorBindToRuntimeEx is the version of the CLR to be loaded. Geeks can use WinDbg or cdb to set a breakpoint on this function and investigate the version parameter passed to it by the PowerShell code:

image

Notice the old code name of PowerShell still being revealed in the third stack frame (from the top). In order to hit this breakpoint on a machine that has .NET 4 installed, I've used the mscoreei.dll module rather than mscoree.dll. The latter has become a super-shim in the System32 folder, while the former is where the CLR shim really lives ("i" stands for "implementation"). This refactoring has been done to aid in servicing the CLR on different versions of Windows, where the operating system "owns" the files in the System32 folder.

Based on this experiment, it’s crystal clear the CLR is hosted by Windows PowerShell, with hardcoded affinity to v2.0.50727. This is in fact a good thing since automatic roll-forward to whatever the latest version of the CLR is on the machine could cause incompatibilities. One can expect future versions of Windows PowerShell to be based on more recent versions of the CLR, once all required testing has been carried out. (And in that case, one will likely use the new “metahost” CLR hosting APIs.)

 

Loading .NET v4 code in PowerShell v2.0

The obvious question with regards to some of the stuff I've been working on was whether or not we can run .NET v4 code in Windows PowerShell v2.0. It shouldn't be a surprise this won't work as-is, since the v2.0 CLR is loaded by the PowerShell host. Even if the hosting APIs weren't involved and the managed executable were compiled against .NET v2.0, that version's CLR would take precedence. This is in fact the case for ISE:

image

Trying to load a v4.0 assembly in Windows PowerShell v2.0 pathetically fails – as expected – with the following message:

image

So, what are the options to get this to work? Let’s have a look.

Warning:  None of those hacks are officially supported. At this point, Windows PowerShell is a CLR 2.0 application, capable of loading and executing code targeting .NET 2.0 through .NET 3.5 SP1 (all of which run on the second major version of the CLR).

 

Option 1 – Hacking the parameter passed to CorBindToRuntimeEx

If we just need an ad-hoc test of Windows PowerShell v2.0 running on CLR v4.0, we can take advantage of WinDbg once more. Simply break on CorBindToRuntimeEx and replace the v2.0.50727 string in memory with the v4.0 version, i.e. v4.0.30319. The "eu" command used for this purpose stands for "edit memory Unicode":

image

If we let the debugger go after this tweak, we'll ultimately get to see Windows PowerShell running seemingly fine, this time on CLR 4.0. One proof is the fact we can load the .NET 4 assembly we tried to load before:

image

Another proof can be found by looking at the DLL list for the PowerShell.exe instance in Process Explorer:

image

No longer we see mscorwks.dll (which is indicative of CLR 2.0 or below), but a clr.dll module appears instead. While this hack works fine for single-shot experiments, we may want to get something more usable for demo and development purposes.

Note:  Another option – not illustrated here – would be to use Detours and intercept the CorBindToRuntimeEx call programmatically, performing the same parameter substitution as the one we've shown through the lens of the debugger. Notice though that the use of CorBindToRuntimeEx is deprecated since .NET 4, so this is and remains a bit of a hack either way.

 

Option 2 – Hosting Windows PowerShell yourself

The second option we’ll explore is to host Windows PowerShell ourselves, not by hosting the CLR and mimicking what PowerShell.exe does, but by using the APIs provided for this purpose. In particular, the ConsoleShell class is of use to achieve this. Moreover, besides simply hosting PowerShell in a CLR v4 process, we can also load snap-ins out of the box. But first things first, starting with a .NET 4 Console Application, add a reference to the System.Management.Automation and Microsoft.PowerShell.ConsoleHost assemblies which can be found under %programfiles%\Reference Assemblies\Microsoft\WindowsPowerShell\v1.0:

image

The little bit of code required to get basic hosting to work is shown below:

using System;
using System.Management.Automation.Runspaces;
using Microsoft.PowerShell;

namespace PSHostCLRv4
{
    class Program
    {
        static int Main(string[] args)
        {
            var config = RunspaceConfiguration.Create();
            return ConsoleShell.Start(
                config,
                "Windows PowerShell - Hosted on CLR v4\nCopyright (C) 2010 Microsoft Corporation. All rights reserved.",
                "",
                args);
        }
    }
}

Using the RunspaceConfiguration object, it’s possible to load snap-ins if desired. Since that would reveal the reason I was doing this experiment, I won’t go into detail on that just yet :-). The tip in the introduction should suffice to get an idea of the experiment I’m referring to. Here’s the output of the above:

image
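As for loading snap-ins, a minimal sketch would add something along these lines right before the ConsoleShell.Start call; the snap-in name below is hypothetical, and the snap-in must already be registered on the machine:

// Hypothetical snap-in name, for illustration only.
PSSnapInException warning;
config.AddPSSnapIn("MyDataSnapIn", out warning);
if (warning != null)
{
    Console.WriteLine(warning.Message);
}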

While this hosting on .NET 4 is all done using legitimate APIs, it's better to be conservative when it comes to using this in production, since PowerShell hasn't been blessed to be hosted on .NET 4. Compatibility between CLR versions and for the framework assemblies has been a huge priority for the .NET teams (I was there when it happened), so everything should be fine. But the slightest bit of pixie dust (e.g. changes in timing for threading, a classic!) could reveal some issue. Till further notice, use this technique only for testing and experimentation.

Enjoy and stay tuned for more PowerShell fun (combined with other technologies)!


A quick update to my readers on a few little subjects. First of all, some people have noticed my blog welcomed readers with a not-so-sweet 404 error message the last few days. It turned out my monthly bandwidth was exceeded, which was enough reason for my hosting provider to take the thing offline.

image

Since this is quite inconvenient I’ve started some migration of image content to another domain, which is work in progress and should (hopefully) prevent the issue from occurring again. Other measures will be taken to limit the download volumes.

Secondly, many others have noticed it's been quite silent on my blog lately. As my colleague Wes warned me, once you start enjoying every day of functional programming hacking on Erik's team, time for blogging steadily decreases. What we call "hacking" has been applied to many projects we've been working on over here in the Cloud Programmability Team, some of which are yet undisclosed. The most visible one today is obviously the Reactive Extensions, both for .NET and for JavaScript, which I've been evangelizing both within and outside the company. Another one, for which I can only give the name, is dubbed "LINQ to Anything"; as you can imagine, it's keeping me busy and inspired on a daily and nightly basis. On top of all of this, I've got some other writing projects going on that are nearing completion (finally).

Anyway, the big plan is to break the silence and start blogging again about our established technologies, including Rx in all its glory. Subjects will include continuation passing style, duality between IEnumerable<T> and IObservable<T>, parameterization for concurrency, discussion of the plethora of operators available, a good portion of monads for sure, the IQbservable<T> interface (no, I won’t discuss the color of the bikeshed) and one of its applications (LINQ to WMI Events), etc. Stay tuned for a series on those subjects starting in the hopefully very near future.

See you soon!


During my last tour I've been collecting quite a few fundamental and introductory Rx samples as illustrations for my presentations on the topic. As promised, I'm sharing those out through my blog. More Rx content is to follow in the (hopefully near) future, with an exhaustive discussion of various design principles and choices, the underlying theoretical foundation of Rx and coverage of lots of operators.

In the meantime, download the sample project here. While the project targets Visual Studio 2010 RTM, you can simply take the Program.cs file and build a Visual Studio 2008 project around it, referencing the necessary Rx assemblies (which you can download from DevLabs).

Enjoy!


It's been a long time since I've written epic blog posts over here, but for a good reason. We've been working very hard on getting a new Rx release out the door and I'm proud to announce it's available now through http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx. Notice we've got a .NET 4 RC compatible download available as well, so you can play with the latest and greatest of technologies in one big jar :-). More goodness will follow later, so stay tuned!

At some point in the foreseeable future, I'll start a series on how Rx works and what its operators are as well. If you have any particular topics you'd like to see covered, don't hesitate to let me know through my blog. In the meantime, make sure to evaporate all your feedback on the forums at http://social.msdn.microsoft.com/Forums/en-US/rx/threads. We love to hear what you think, what operators you believe are missing, any bugs you find, etc.

Update: We also have published a video on the new release at http://channel9.msdn.com/posts/J.Van.Gogh/Your-RxNET-Prescription-Has-Been-Refilled.

Have fun!
Bart @ Rx


Slightly over two years after arriving here in Redmond to work on the WPF team, time has come for me to make a switch and pursue other opportunities within the company. Starting January 13th, I'll be working on the SQL Cloud Data Programmability Team on various projects related to democratizing the cloud. While we have many more rabbits sitting in our magician hats, Rx is the first big deliverable we're working on.

For my blog, there won’t be much change as I’ve always written on topics related to what I’ll be working on: language innovation, data access, LINQ, type systems, lambda fun, etc. I’m planning to stay committed to blogging and other evangelism activities, including speaking engagements from time to time, so feel free to ping me if I’m in your proximity (or if you’re visiting our campus). Next up and confirmed are TechDays “low lands” in Belgium and the Netherlands, end of March.

Needless to say, I’m thrilled to have this opportunity of working together with a relatively small group of smart and passionate people, on the things I’d spend all my free time on anyway. Having this one-to-one alignment between day-to-day professional activities at work and all sorts of free time hacking projects is like a dream coming true. Thanks Danny, Erik, Jeffrey, Mark and Wes for taking me on board.

Expect to see more Rx blogging love over here, and watch out for more goodness to come your way in the foreseeable future. In the meantime, check out the following resources on the matter:

Please keep the feedback on Rx coming: help us, help you!


With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you'll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A wealth of resources is available on Channel 9. In this series, I'll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today's installment we'll talk about EnumerableEx's facilities to tame side-effects in a functionally inspired manner:

image

 

To side effect or not to side effect?

Being rooted in query comprehensions as seen in various functional programming languages (including (the) pure one(s)), one would expect LINQ to have a very functional basis. Indeed it has, but being hosted in various functionally impure languages like C# and Visual Basic, all bets are off when it comes to reasoning about side-effects in a meaningful and doable manner. As we've seen before, when talking about the Do and Run operators, it's perfectly possible for a query to exhibit side-effects during iteration. You don't even have to look that far, since every lambda passed to a query operator is an opportunity to introduce effects. The delayed execution nature of LINQ queries means those effects appear at the point of query execution. So far, nothing new.
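As a quick refresher on those two operators, here's a minimal sketch (assuming the EnumerableEx extension methods from System.Interactive are in scope):

// Do attaches a side-effect to every element flowing through the sequence;
// Run forces enumeration, triggering those side-effects.
Enumerable.Range(0, 3)
    .Do(x => Console.WriteLine("side-effect for " + x))
    .Run(x => Console.WriteLine("got " + x));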

So, the philosophical question ought to be whether or not we should embrace side-effects or go for absolute purity. While the latter would be preferable for various reasons, it’s not enforceable through the hosting languages for LINQ, so maybe we should exploit side-effects if we really want to do so. The flip side of this train of thought is that those side-effects could come and get us if we’re not careful, especially when queries get executed multiple times, potentially as part of a bigger query. In such a case, you’d likely not want effects to be duplicated. Below is a sample of such a problematic query expression:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _);
xrs.Zip(xrs, (l, r) => l + r).Take(10).Run(Console.WriteLine);

Using Generate, we generate a sequence of random numbers. Recall the first argument is the state of the anamorphic Generate operator, which gets passed to the lambdas following it: once to produce an output sequence (just a single random number in our case) and once to iterate (just keeping the same random number generator here). What's more important is we're relying on the side-effect of reading the random number generator which, as the name implies, provides random answers to the Next inquiry every time it gets called. In essence, the side-effect cannot be seen by looking at the signature of Random.Next, which merely says it returns an int. In .NET this means the method may return the same int every time it gets called, but there are no guarantees whatsoever (as there would be in pure functional programming languages).

This side-effect, innocent and intentional as it may seem, comes and gets us if we perform a Zip on the sequence with itself. Since Zip iterates both sides, we’re really triggering separate enumeration (“GetEnumerator”) over the same sequence two times. Though it’s the same sequence object, each of its iterations will produce different results. As a result, the expected invariant of the Zip’s output being only even numbers (based on the assumption l and r would be the same as they’re produced by the same sequence) doesn’t hold:

52
114
112
103
41
135
78
114
59
137

While random number generation is a pretty innocent side-effect, not having it under control properly can lead to unexpected results as shown above. We can visualize this nicely using another side-effect introduced by Do:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr));
xrs.Zip(xrs, (l, r) => l + r).Take(10).Run(Console.WriteLine);

This will print a message for every number flowing out of the random number generating sequence, as shown below:

! -> 97
! -> 78
175
! -> 11
! -> 6
17
! -> 40
! -> 17
57
! -> 92
! -> 63
155
! -> 70
! -> 13
83
! -> 41
! -> 1
42
! -> 64
! -> 76
140
! -> 30
! -> 71
101
! -> 1
! -> 81
82
! -> 65
! -> 45
110

If we look a bit further to the original query, we come to the conclusion we can’t apply any form of equational reasoning anymore: it seems that the common subexpression “xrs” is not “equal” (as in exposing the same results) in both use sites. The immediate reason in the case of LINQ is the delayed execution, which is a good thing as our Generate call produces an infinite sequence. More broadly, it’s the side-effect that lies at the heart of the problem as equational reasoning breaks down in such a setting. For that very reason, side-effect permitting languages have a much harder time carrying out optimizations to code and need to be very strict about specifying the order in which operations are performed (e.g. in C#, arguments to a method call – which is always “call-by-value” – are evaluated in a left-to-right order).
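As a tiny illustration of that last point, the following sketch always prints "left" before "right", no matter what:

static int Trace(string name, int value)
{
    Console.WriteLine(name);
    return value;
}

static void Main()
{
    // C# evaluates the operands left to right, by language specification.
    Console.WriteLine(Trace("left", 1) + Trace("right", 2));
}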

Moving Take(10) up doesn’t change the delayed characteristic either:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Take(10)
    .Do(xr => Console.WriteLine("! -> " + xr));
xrs.Zip(xrs, (l, r) => l + r).Run(Console.WriteLine);

What would help is forcing the common subexpression’s query to execute, persisting (= caching) its results in memory, before feeding them in to the expression using it multiple times:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Take(10).ToArray()
    .Do(xr => Console.WriteLine("! -> " + xr));
xrs.Zip(xrs, (l, r) => l + r).Run(Console.WriteLine);

Don’t forget the Take(10) call though, as calling ToArray (or ToList) on an infinite sequence is not quite advised on today’s machines with finite amounts of memory. It’s clear such hacking is quite brittle and it breaks the delayed execution nature of the query expression. In other words, you can’t really hand out the resulting expression to a caller for it to call when it needs results (if it ever does). We’re too eager about evaluating (part of) the query, just to be able to tame the side-effect:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Take(10).ToArray();

var randomEvens = xrs.Zip(xrs, (l, r) => l + r);
// What if the consumer of randomEvens expects different results on each enumeration... Hard cheese! 
randomEvens.Run(Console.WriteLine);
randomEvens.Run(Console.WriteLine);

It’s clear that we need some more tools in our toolbox to tame desired side-effects when needed. That’s exactly what this post focuses on.

 

Option 1: Do nothing with Let

A first way to approach side-effects is to embrace them as-is. We just allow multiple enumerations of the same sequence to yield different results (or more generally, replicate side-effects). However, we can provide a bit more syntactical convenience in writing queries that reuse the same common subexpression in multiple places. In the above, we had to introduce an intermediate variable to store the common expression in, ready for reuse further on:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _);
xrs.Zip(xrs, (l, r) => l + r).Take(10).Run(Console.WriteLine);

Can’t we somehow write this more fluently? The answer is yes, using the Let operator which passes its left-hand side to a lambda expression that can potentially use it multiple times:

EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Let(xrs => xrs.Zip(xrs, (l, r) => l + r)).Take(10).Run(Console.WriteLine);

You can guess the signature of Let just by looking at the use above, but let’s include it for completeness:

public static IEnumerable<TResult> Let<TSource, TResult>(this IEnumerable<TSource> source, Func<IEnumerable<TSource>, IEnumerable<TResult>> function);

Because of the call-by-value nature of the languages we’re talking about, the expression used for the source parameter will be fully evaluated (not the same as enumerated!) before Let gets called, so we can feed it (again in a call-by-value manner) to the function which then can refer to it multiple times by means of its lambda expression parameter (in the sample above this is “xrs”). Let comes from the world of functional languages where it takes the following form:

let x = y in z

means (in C#-ish syntax)

(x => z)(y)

In other words, there’s a hidden function x => z sitting in a let-expression and the “value” for x (which is y in the sample) gets passed to it, providing the result for the entire let-expression. In EnumerableEx.Let, the function is clear as the second parameter, and the role of “y” is fulfilled by the source parameter. One could create a Let-form for any object as follows (not recommended because of the unrestricted extension method):

public static R Let<T, R>(this T t, Func<T, R> f)
{
    return f(t);
}

With this, you can write things like the following:

Console.WriteLine(DateTime.Now.Let(x => x - x).Ticks);

This will print 0 ticks for sure, since the same DateTime.Now is used for x on both sides of the subtraction. If we were to expand this expression by substituting DateTime.Now for x, we’d get something different due to the duplicate evaluation of DateTime.Now, exposing the side-effect of reading from the system clock:

Console.WriteLine((DateTime.Now - DateTime.Now).Ticks);

(Pop quiz: What sign will the above Ticks result have? Is it possible for the above to return 0 sometimes?)

 

Option 2: Cache on demand a.k.a. MemoizeAll

As we've seen before, one way to get rid of the side-effect replication is by forcing eager evaluation of the sequence through operators like ToArray or ToList. However, those are a bit too eager in various ways:

  • They persist the whole sequence, which won’t work for infinite sequences.
  • They do so on the spot, i.e. the eagerness can't be delayed till a later point ("on demand").

The last problem can be worked around using the Defer operator, but the first one is still a problem requiring another operator. Both those things are what MemoizeAll provides for, essentially persisting the sequence bit-by-bit upon consumption. This is achieved by exposing the enumerable while only maintaining a single enumerator to its source:

image

In the figure above, this is illustrated. Red indicates a fetch operation where the original source’s iterator makes progress as an element is requested that hasn’t been fetched before. Green indicates persisted (cached, memoized) objects. Gray indicates elements in the source that have been fetched and hence belong to the past from the (single) source-iterators point of view: MemoizeAll won’t ever request those again. Applying this operator to our running sample using Zip will produce results with the expected invariant:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr))
    .MemoizeAll();
xrs.Do(xr => Console.WriteLine("L -> " + xr)).Zip(
    xrs.Do(xr => Console.WriteLine("R -> " + xr)),
    (l, r) => l + r).Take(10).Run(Console.WriteLine);

Now we’ll see the xrs-triggered Do messages being printed only 10 times since the same element will be consumed by the two uses of xrs within Zip. The result looks as follows, showing how the right consumer of Zip never causes a fetch back to the random number generating source due to the internal caching by MemoizeAll:

! -> 71
L -> 71
R -> 71
142
! -> 18
L -> 18
R -> 18
36
! -> 12
L -> 12
R -> 12
24
! -> 96
L -> 96
R -> 96
192
! -> 1
L -> 1
R -> 1
2
! -> 54
L -> 54
R -> 54
108
! -> 9
L -> 9
R -> 9
18
! -> 87
L -> 87
R -> 87
174
! -> 18
L -> 18
R -> 18
36
! -> 12
L -> 12
R -> 12
24

What about the lifetime of the source's single enumerator? As soon as one of the consumers reaches the end of the underlying sequence, we've got all elements cached and are prepared for any possible inquiry for elements on the output side of the MemoizeAll operator, hence it's possible to dispose of the original enumerator. It should also be noted that memoization operators use materialization internally to capture the behavior of the sequence to expose to all consumers. This means exceptions are captured as Notification<T> so they're repeatable to all consumers:

var xes = EnumerableEx.Throw<int>(new Exception()).StartWith(1).MemoizeAll();
xes.Catch((Exception _) => EnumerableEx.Return(42)).Run(Console.WriteLine);
xes.Catch((Exception _) => EnumerableEx.Return(42)).Run(Console.WriteLine);

The above will therefore print 1, 42 twice. In other words, the source blowing up during fetching by MemoizeAll doesn’t terminate other consumers that haven’t reached the faulty state yet (but if they iterate long enough, they’ll eventually see it exactly as the original consumer did).

Finally, what’s All about MemoizeAll? In short: the cache used by the operator can grow infinitely large. The difference compared to ToArray and ToList has been explained before, but it’s worth repeating it: MemoizeAll doesn’t fetch its source’s results on the spot but only makes progress through the source’s enumerator when one of the consumers requests an element that hasn’t been retrieved yet. Call it a piecemeal ToList if you want.
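To make that piecemeal behavior concrete, below is a much simplified sketch of how such an operator could be implemented (single-threaded consumers only, ignoring enumerator disposal and the Notification<T>-based exception capture mentioned above); this is not the actual System.Interactive implementation:

using System.Collections.Generic;

static class MemoizeSketch
{
    public static IEnumerable<T> MemoizeAllSketch<T>(this IEnumerable<T> source)
    {
        var cache = new List<T>();
        var e = source.GetEnumerator(); // the one and only source enumerator
        return MemoizeAllIterator(e, cache);
    }

    static IEnumerable<T> MemoizeAllIterator<T>(IEnumerator<T> e, List<T> cache)
    {
        // Every consumer walks the shared cache by index; only when it runs
        // past the end of the cache does the single source enumerator advance.
        for (int i = 0; ; i++)
        {
            if (i == cache.Count)
            {
                if (!e.MoveNext())
                    yield break;

                cache.Add(e.Current);
            }

            yield return cache[i];
        }
    }
}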

 

Option 3: Memoize, but less “conservative”

While MemoizeAll does the trick to avoid repetition of side-effects, it’s quite conservative in its caching as it never throws away elements it has retrieved. You never know whether someone – like a slow enumerator or a whole new enumerator over the memoized result – will request the data again, so a general-purpose Memoize can’t throw away a thing. However, if you know the behavior of the consumers of the memoized source, you can be more efficient about it and use Memoize specifying a buffer size. In our running sample of Zip we know that both uses of the source for the left and right inputs to Zip will be enumerated at the same pace, so it suffices to keep the last element in the cache in order for the right enumerator to be able to see the element the left enumerator just saw. Memoize with buffer size 1 does exactly that:

var xrs = EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr))
    .Memoize(1);
xrs.Do(xr => Console.WriteLine("L -> " + xr)).Zip(
    xrs.Do(xr => Console.WriteLine("R -> " + xr)),
    (l, r) => l + r).Take(10).Run(Console.WriteLine);

In pictures, this looks as follows:

image

Another valid buffer size – also the default – is zero. It’s left to the reader, as an exercise, to come up with a plausible theory for what that one’s behavior should be and to depict this case graphically.

(Question: Would it be possible to provide a “smart” memoization operator that knows exactly when it can abandon items in the front of its cache? Why (not)?)

 

Derived forms

The difference between Let and the Memoize operators is that the former feeds in a view on an IEnumerable<T> source to a function, allowing that one to refer to the source multiple times in the act of producing a source in return. Let is, as we saw, nothing but fancy function application in a “fluent” left-to-right dataflowy way. Derived forms of Memoize exist that have the same form where a function is fed a memoized data source:

  • Replay is Memoize on steroids
  • Publish is MemoizeAll on steroids

The following snippets show just what those operators do (modulo parameter checks):

public static IEnumerable<TResult> Publish<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<IEnumerable<TSource>, IEnumerable<TResult>> function)
{
    return function(source.MemoizeAll());
}

public static IEnumerable<TResult> Publish<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<IEnumerable<TSource>, IEnumerable<TResult>> function,
    TSource initialValue)
{
    return function(source.MemoizeAll().StartWith(initialValue));
}

public static IEnumerable<TResult> Replay<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<IEnumerable<TSource>, IEnumerable<TResult>> function)
{
    return function(source.Memoize());
}

public static IEnumerable<TResult> Replay<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<IEnumerable<TSource>, IEnumerable<TResult>> function,
    int bufferSize)
{
    return function(source.Memoize(bufferSize));
}

So we could rewrite our Zip sample in a variety of ways, the following being the cleanest one-sized buffer variant:

EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr))
    .Replay(xrs => xrs.Do(xr => Console.WriteLine("L -> " + xr)).Zip(
                   xrs.Do(xr => Console.WriteLine("R -> " + xr)),
                   (l, r) => l + r),
            1)
    .Take(10).Run(Console.WriteLine);

 

Option 4: Fair (?) sharing with Share and Prune

The Share operator shares an IEnumerator<T> for any number of consumers of an IEnumerable<T>, hence avoiding duplication of side-effects. In addition, it also guarantees that no two consumers can see the same element, so in effect the Share operator has the potential of distributing elements across different consumers. Looking at it from another angle, one consumer can steal elements from the source, preventing another consumer from seeing it. Prune is derived from Share as follows:

public static IEnumerable<TResult> Prune<TSource, TResult>(this IEnumerable<TSource> source,
    Func<IEnumerable<TSource>, IEnumerable<TResult>> function)
{
    return function(source.Share());
}
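Share's own implementation isn't shown in this post, but a minimal single-threaded sketch conveys the element-stealing idea. ShareSketch below is a hypothetical stand-in, not the actual System.Interactive code:

// All enumerators handed out pull from one shared source enumerator, so a
// consumer that advances the cursor "steals" that element from the others.
public static IEnumerable<T> ShareSketch<T>(this IEnumerable<T> source)
{
    return PullFrom(source.GetEnumerator());
}

private static IEnumerable<T> PullFrom<T>(IEnumerator<T> shared)
{
    // Every GetEnumerator call on the result spawns a fresh iterator, but all
    // of them capture and advance the very same underlying enumerator.
    while (shared.MoveNext())
        yield return shared.Current;
}

A real implementation additionally has to care about thread safety and disposal of the underlying enumerator, which the sketch above happily ignores.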

The naming for Prune follows from the effect consumers inside the function have on the sequence being shared: each one consuming data effectively prunes elements from the head of the sequence, so that others cannot see those anymore. The example below shows another way a Zip could go wrong (practical scenarios for this operator would exploit the sharing deliberately), since the left and right consumers both advance the cursor of the same shared enumerator under the hood:

EnumerableEx.Generate(new Random(), rnd => EnumerableEx.Return(rnd.Next(100)), /* iterate */ _ => _)
    .Do(xr => Console.WriteLine("! -> " + xr))
    .Prune(xrs => xrs.Do(xr => Console.WriteLine("L -> " + xr)).Zip(
                   xrs.Do(xr => Console.WriteLine("R -> " + xr)),
                   (l, r) => l + r))
    .Take(10).Run(Console.WriteLine);

The result of this is of interest since the logging reveals the sharing characteristic. Looking at the first Do's output, we'll see it gets triggered by either consumer on the inside of Prune:

! -> 37
L -> 37
! -> 51
R -> 51
88
! -> 98
L -> 98
! -> 89
R -> 89
187
! -> 4
L -> 4
! -> 71
R -> 71
75
! -> 43
L -> 43
! -> 30
R -> 30
73
! -> 18
L -> 18
! -> 24
R -> 24
42
! -> 17
L -> 17
! -> 41
R -> 41
58
! -> 45
L -> 45
! -> 68
R -> 68
113
! -> 83
L -> 83
! -> 53
R -> 53
136
! -> 64
L -> 64
! -> 69
R -> 69
133
! -> 0
L -> 0
! -> 22
R -> 22
22

In pictures, this looks as follows:

[Figure: marble diagram of the Prune sample – the left and right consumers advance the cursor of one shared enumerator, each stealing elements from the other]

Exercise: Can you guess how Memoize(0) differs from Share?

Quiz: What should be the behavior of the following fragment? (Tip: you've got to know what two from clauses result in and how they execute)

Enumerable.Range(0, 10)
    .Prune(xs => from x in xs.Zip(xs, (l, r) => l + r)
                 from y in xs
                 select x + y)
    .Run(Console.WriteLine);

 

Next on More LINQ

A look at the Asynchronous and Remotable operators, dealing with some infrastructure-related concepts, wrapping up this series for now.


select top 9 [Subject] from dbo.cs_Posts
where postlevel = 1 and usertime < '01/01/2010' and usertime >= '01/01/2009'
order by TotalViews desc

Forgive me for the classic SQL, but here are the results with some short annotations inline:

  1. (Mis)using C# 4.0 Dynamic – Type-Free Lambda Calculus, Church Numerals, and more

    Uses the new C# 4.0 dynamic feature to implement the type-free lambda calculus consisting of an abstraction and application operator. Besides talking about the fundamentals of lambda calculus, this post shows how to implement the SKI combinators and Church Booleans, Church numerals and even recursive functions.
     
  2. LINQ to Ducks – Bringing Back The Duck-Typed foreach Statement To LINQ

    Since LINQ to Objects is layered on top of IEnumerable<T>, it doesn’t work against objects that just happen to implement the enumeration pattern consisting of GetEnumerator, MoveNext and Current. Since the foreach statement actually does work against such data sources, we bring back this duck typing to LINQ using AsDuckEnumerable<T>().
     
  3. Type-Free Lambda Calculus in C#, Pre-4.0 – Defining the Lambda Language Runtime (LLR)

    We repeat the exercise of the first blog post, but now without C# 4.0 dynamic features, encoding application and abstraction operators using none less than exceptions. Those primitives define what I call the Lambda Language Runtime (LLR), which we use subsequently to implement a bunch of samples similar to the ones in the first post.
     
  4. Taming Your Sequence’s Side-Effects Through IEnumerable.Let

    Enumerable sequences can exhibit side-effects for various reasons ranging from side-effecting filter predicates to iterators with side-effecting imperative code interwoven in them. The Let operator introduced in this post helps you to keep those side-effects under control when multiple “stable” enumerations over the sequence are needed.
     
  5. Statement Trees With Less Pain – Follow-Up on System.Linq.Expressions v4.0

    The introduction of the DLR in the .NET 4 release brings us not only dynamic typing but also full-fledged statement trees as an upgrade to the existing LINQ expression trees. Here we realize a prime number generator using statement trees and runtime compilation, reusing expression trees emitted by the C# compiler where possible.
     
  6. LINQ to Z3 – Theorem Solving on Steroids – Part 1

    LINQifying Microsoft Research’s Z3 theorem solver has been one of my long-running side-projects. This most recent write-up on the matter illustrates the concept of a LINQ-enabled Theorem<T> and the required visitor implementation to interop with the Z3 libraries. Finally, we show a Sudoku and Kakuro solver expressed in LINQ.
     
  7. Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

    Just like post 5, we have a look at the .NET 4 expression tree support, now including statement trees. Besides pointing out the new tree node types, we show dynamic compilation and inspect the generated IL code using the SOS debugger’s dumpil command. In post 5, we follow up by showing how to reuse C# 3.0 expression tree support.
     
  8. Unlambda .NET – With a Big Dose of C# 3.0 Lambdas

    Esoteric programming languages are good topics for Crazy Sundays posts. In this one we had a look at how to implement the Unlambda language – based on SKI combinators and with “little” complications like Call/CC – using C# 3.0 with lots of lambda expressions. To satisfy our curiosity, we run a Fibonacci sample program.
     
  9. C# 4.0 Feature Focus – Part 4 – Co- and Contra-Variance for Generic Delegate and Interface Types

    Generic co- and contra-variance is most likely the most obscure C# 4.0 feature, so I decided to give it a bit more attention using samples of the everyday world (like apples and tomatoes). We explain why arrays are unsafe for covariance and how generic variance gets things right, also increasing your expressiveness.

In conclusion, it seems esoteric and foundational posts are quite popular, but then again that's what I write about most. For 2010, I hope to cater to my readers' interests even further with the occasional “stunt coding”, “brain pain” and “mind bending” (based on Twitter quotes in 2009). If there are particular topics you'd like to see covered, feel free to let me know. So, thanks again for reading in 2009 (good for slightly over 1TB – no, that's not a typo – of data transfer from my hoster) and hope to see you back in 2010!


The first release of this project, introduced in my previous blog post on The Essence of LINQ – MinLINQ, is now available for reference at the LINQSQO CodePlex website at http://linqsqo.codeplex.com. Compared to the write-up in my previous post, there are a few small differences and caveats:

  • Only FEnumerable functionality is available currently; the FObservable dual may follow later.
  • Option<T> has been renamed to Maybe<T>, to be CLS compliant and avoid clashes with the VB keyword.
  • Some operators are not provided, in particular GroupBy, GroupJoin and Join. They’re left as an exercise.
  • A few operator implementations are categorized as “cheaters” since they roundtrip through System.Linq.
  • Don’t nag about performance. The MinLINQ code base is by no means optimal and so be it.
  • Very few System.Interactive operators are included since those often require extra foundations (such as concurrency).

A few highlights:

  • FEnumerable.Essentials.cs is where the fun starts. Here the three primitives – Ana, Bind and Cata – form the ABC of LINQ (a quick sketch of the three follows below).
  • There’s a Naturals() constructor function generating an infinite sequence of natural numbers, used by operators that deal with indexes.
  • OrderBy and ThenBy are supported through roundtripping to System.Linq with a handy trick to keep track of IOrderedEnumerable<T>.
  • As a sample, I’ve included Luke Hoban’s LINQified RayTracer with AsFEnumerable and AsEnumerable roundtripping. It works just fine.
  • Creating an architectural diagram in Visual Studio 2010 yields the following result (not meant to be zoomed in), where I’ve used the following colors:
    • Green = Ana
    • Blue = Bind
    • Red = Cata

[Figure: Visual Studio 2010 architectural diagram of the MinLINQ code base, with operators colored green (Ana), blue (Bind) and red (Cata)]
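For readers who haven't seen the MinLINQ post, here's a rough sketch of what those three primitives stand for, written over plain IEnumerable<T> for readability; MinLINQ itself defines them over a function-based FEnumerable<T> representation, so the actual signatures differ:

// Ana(morphism) unfolds a seed into a sequence, Bind flattens nested
// sequences (the essence of SelectMany), and Cata(morphism) folds a
// sequence back into a single value (the essence of Aggregate).
static IEnumerable<T> Ana<T>(T seed, Func<T, bool> condition, Func<T, T> next)
{
    for (var t = seed; condition(t); t = next(t))
        yield return t;
}

static IEnumerable<R> Bind<T, R>(IEnumerable<T> source, Func<T, IEnumerable<R>> f)
{
    foreach (var t in source)
        foreach (var r in f(t))
            yield return r;
}

static R Cata<T, R>(IEnumerable<T> source, R seed, Func<R, T, R> f)
{
    var acc = seed;
    foreach (var t in source)
        acc = f(acc, t);
    return acc;
}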

Obviously, all sorts of warnings apply. People familiar with my blog adventures will know this already, but just in case:

//
// This project is meant as an illustration of how an academically satisfying layering
// of a LINQ to Objects implementation can be realized using monadic concepts and only
// three primitives: anamorphism, bind and catamorphism.
//
// The code in this project is not meant to be used in production and no guarantees are
// made about its functionality. Use it for academic stimulation purposes only. To use
// LINQ for real, use System.Linq in .NET 3.5 or higher.
//
// All of the source code may be used in presentations of LINQ or for other educational
// purposes, but references to http://www.codeplex.com/LINQSQO and the blog post referred
// to above - "The Essence of LINQ - MinLINQ" - are required.
//

Either way, if you find LINQ interesting and can stand some “brain pain of the highest quality” (a Twitter quote by dahlbyk), this will likely be something for you.

