December 2009 - Posts

With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A wealth of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about new combinator operators provided by EnumerableEx:

image

 

Combine and conquer?

Combinators are at the heart of LINQ’s expressive power, allowing sequences to be combined into new ones. In earlier posts, I’ve shown the essence of monadic computation through the following illustration:

image

It’s fair to say that SelectMany (or Bind) is the mother of all combinators, as many others can be derived from it (Exercise: implement Where and Select using SelectMany and a limited number of auxiliary operators like Return). In today’s post we’ll look at various new combinators added to the IEnumerable<T> set of operators.
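As a hint at that exercise, here is one possible sketch of Select derived purely from SelectMany plus a minimal Return helper (the name Select2 is mine, chosen to avoid clashing with the real operator; this is not the library's implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Select expressed in terms of SelectMany: map each element to a
// one-element sequence and let SelectMany flatten the results.
foreach (var s in Select2(new[] { 1, 2, 3 }, x => x * x))
    Console.WriteLine(s); // 1, 4, 9

// Minimal Return: a lazy one-element sequence.
static IEnumerable<T> Return<T>(T value)
{
    yield return value;
}

static IEnumerable<R> Select2<T, R>(IEnumerable<T> source, Func<T, R> selector)
    => source.SelectMany(x => Return(selector(x)));
```

The Where part of the exercise follows the same pattern, substituting an empty sequence when the predicate fails.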

So, what’s a combinator? In one world view (the one we’re using), it’s an operator that combines one or more instances of a given entity into a new such entity. For example, in functional programming we have the S, K and I combinators that act on functions:

S x y z = x z (y z)
K x y = x
I x = x

A more precise definition can be found on http://en.wikipedia.org/wiki/Combinator, for those interested in more foundational stuff. In our case, we’ll combine one or more IEnumerable<T> instances into a new IEnumerable<R> (where R can be different from T).

 

Concat, now with more arguments

LINQ to Objects has always had a Concat operator, with the following signature:

public static IEnumerable<TSource> Concat<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second);

However, this is merely a special case of a more general version of Concat, introduced in EnumerableEx:

public static IEnumerable<TSource> Concat<TSource>(params IEnumerable<TSource>[] sources);
public static IEnumerable<TSource> Concat<TSource>(this IEnumerable<IEnumerable<TSource>> sources);

The second one is the core operator we’re talking about here, with the first overload provided for convenience due to the lack of a “params enumerable” feature in the language. The Concat operator is simple to understand: it yields all TSource objects from all sequences in the sources parameter, in order. If an error occurs while enumerating any of the sequences, the resulting concatenated sequence terminates with that error as well. In that respect the operator is similar to OnErrorResumeNext, except that OnErrorResumeNext ignores the error condition and moves on to the next sequence.

Below is a sample illustrating the main scenarios:

new[] {
    new[] { 1, 2 },
    new[] { 3, 4 },
    new[] { 5, 6 }
}
.Concat()
.Materialize(/* for pretty printing */)
.Run(Console.WriteLine);

new[] {
    new[] { 1, 2 },
    new[] { 3, 4 }.Concat(EnumerableEx.Throw<int>(new Exception())),
    new[] { 5, 6 }
}
.Concat()
.Materialize(/* for pretty printing */)
.Run(Console.WriteLine);

The first sample will print numbers 1 through 6, while the second one will print 1 through 4 and an error notification.

image
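A minimal re-implementation of the general Concat overload could look as follows (my own sketch; the real operator adds argument validation):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Yield all elements of all inner sequences, in order. An exception thrown
// by any inner enumeration simply propagates, terminating the result.
Console.WriteLine(string.Join(" ",
    Concat(new[] { new[] { 1, 2 }, new[] { 3, 4 }, new[] { 5, 6 } })));
// 1 2 3 4 5 6

static IEnumerable<T> Concat<T>(IEnumerable<IEnumerable<T>> sources)
{
    foreach (var source in sources)
        foreach (var item in source)
            yield return item;
}
```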

 

Merge, a parallel Concat

Where Concat will proceed through the sources collection sequentially, guaranteeing in-order retrieval of data, one could get all the data from the sources in a parallel manner as well. To do so, Merge spawns workers that drain all of the sources in parallel, flattening or “sinking” all the results to the caller:

public static IEnumerable<TSource> Merge<TSource>(params IEnumerable<TSource>[] sources);
public static IEnumerable<TSource> Merge<TSource>(this IEnumerable<IEnumerable<TSource>> sources);
public static IEnumerable<TSource> Merge<TSource>(this IEnumerable<TSource> leftSource, IEnumerable<TSource> rightSource);

The three overloads share the same signatures as the Concat equivalents, with the second one being the most general overload again. The same sample as for Concat can be used to illustrate how it works:

new[] {
    new[] { 1, 2 },
    new[] { 3, 4 },
    new[] { 5, 6 }
}
.Merge()
.Materialize(/* for pretty printing */)
.Run(Console.WriteLine);

new[] {
    new[] { 1, 2 },
    new[] { 3, 4 }.Concat(EnumerableEx.Throw<int>(new Exception())),
    new[] { 5, 6 }
}
.Merge()
.Materialize(/* for pretty printing */)
.Run(Console.WriteLine);

What the results are will depend on the mood of your task scheduler. Either way, for the first sample you should get to see all of the numbers from 1 through 6 getting printed, in any order (though 1 will come before 2, 3 before 4 and 5 before 6). On my machine I got 1, 3, 5, 4, 2, 6 in my first run. For the second sample, it’s entirely possible to see 5 and 6 getting printed before the exception for the second source is reached. But then that’s what you expect from parallel computation, don’t you?

Merge can speed up your data retrieval operations significantly, if you don’t care about the order in which results are returned. For example, you could cause two LINQ to SQL queries that provide stock quotes to run in parallel by using Merge, followed by a client-side duplicate entry elimination technique:

var stocks =
    from quote in
        EnumerableEx.Merge(
            (from quote in t1 select quote).Do(q => Console.WriteLine("t1: " + q)),
            (from quote in t2 select quote).Do(q => Console.WriteLine("t2: " + q))
        )
    group quote by quote.Symbol into g
    select new { g.Key, Price = g.Average(p => p.Price) };

stocks.Run(Console.WriteLine);

Results could look as follows, with the main idea being the parallel retrieval of both query results:

Query: SELECT Symbol, Price FROM Trader1
Query: SELECT Symbol, Price FROM Trader2
t2: { Symbol = MSFT, Price = 30.94 }
t1: { Symbol = MSFT, Price = 30.99 }
t1: { Symbol = ORCL, Price = 24.92 }
t1: { Symbol = GOOG, Price = 618.35 }
t1: { Symbol = AAPL, Price = 209.10 }
t2: { Symbol = ORCL, Price = 25.06 }
t2: { Symbol = GOOG, Price = 610.25 }
t2: { Symbol = AAPL, Price = 204.99 }
{ Key = MSFT, Price = 30.965 }
{ Key = ORCL, Price = 24.99 }
{ Key = GOOG, Price = 614.30 }
{ Key = AAPL, Price = 207.045 }

image

(Note: behavior in face of an exception will depend on timing and is not included in the diagram.)
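To make the "spawning workers" idea concrete, here is a simplified sketch of how a Merge could be built on the thread pool. Note this sketch is eager (the workers start as soon as Merge is called) and omits error propagation and cancellation, which the real operator handles:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Order across sources is nondeterministic, so we sort before printing.
var sources = new[] { new[] { 1, 2 }, new[] { 3, 4 }, new[] { 5, 6 } };
Console.WriteLine(string.Join(" ", Merge(sources).OrderBy(x => x)));
// 1 2 3 4 5 6

static IEnumerable<T> Merge<T>(IEnumerable<IEnumerable<T>> sources)
{
    // One blocking queue sinks results from all workers to the consumer.
    var queue = new BlockingCollection<T>();

    var workers = sources
        .Select(source => Task.Run(() =>
        {
            foreach (var item in source)
                queue.Add(item);
        }))
        .ToList();

    // When all workers are done, mark the queue complete so the
    // consuming enumerable terminates.
    Task.WhenAll(workers).ContinueWith(_ => queue.CompleteAdding());

    return queue.GetConsumingEnumerable();
}
```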

 

Amb, a racing game

Amb is the ambiguous operator as introduced by McCarthy in 1963. Because of its nostalgic background, the name has been preserved as-is instead of being expanded. What’s so ambiguous about this operator? Well, the idea is that Amb lets two sequences race to provide the first result; the winning sequence is elected as the one that provides the resulting sequence from the operator call. The operator’s signatures make this clear:

public static IEnumerable<TSource> Amb<TSource>(params IEnumerable<TSource>[] sources);
public static IEnumerable<TSource> Amb<TSource>(this IEnumerable<IEnumerable<TSource>> sources);
public static IEnumerable<TSource> Amb<TSource>(this IEnumerable<TSource> leftSource, IEnumerable<TSource> rightSource);

Again, there are three overloads, just like for Concat and Merge. To illustrate the operator’s behavior, use the following simple implementation of a Delay operator:

public static IEnumerable<TSource> Delay<TSource>(this IEnumerable<TSource> source, int delay)
{
    return EnumerableEx.Defer(() => { Thread.Sleep(delay); return source; });
}

Now we can write the following two test cases:

var src1 = new[] { 1, 2 }.Delay(300);
var src2 = new[] { 3, 4 }.Delay(400);
src1.Amb(src2).Run(Console.WriteLine);

var src3 = new[] { 5, 6 }.Delay(400);
var src4 = new[] { 7, 8 }.Delay(300);
src3.Amb(src4).Run(Console.WriteLine);

The expected result will be that src1 and src4 win their Amb battles against src2 and src3, respectively. One practical use for this operator is to have two or more redundant data sources, all containing the same data, fight to provide the quickest answer to a query. Here’s a sample illustrating this:

var stocks =
    EnumerableEx.Amb(
        (from quote in t1 select quote).Do(q => Console.WriteLine("t1: " + q)),
        (from quote in t2 select quote).Do(q => Console.WriteLine("t2: " + q))
    );

stocks.Run(Console.WriteLine);

Results could look as follows, assuming t2 was the quickest to provide an answer:

Query: SELECT Symbol, Price FROM Trader1
Query: SELECT Symbol, Price FROM Trader2
t2: { Symbol = MSFT, Price = 30.94 }
t2: { Symbol = ORCL, Price = 25.06 }
t2: { Symbol = GOOG, Price = 610.25 }
t2: { Symbol = AAPL, Price = 204.99 }
{ Key = MSFT, Price = 30.94 }
{ Key = ORCL, Price = 25.06 }
{ Key = GOOG, Price = 610.25 }
{ Key = AAPL, Price = 204.99 }

image
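The racing behavior can be sketched with two enumerators whose first MoveNext calls run on the thread pool; whichever completes first wins. This is my own simplified sketch (a real implementation would also dispose the loser's enumerator):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// The 100ms source beats the 2000ms source, so we see 1 and 2.
foreach (var x in Amb(Delay(new[] { 1, 2 }, 100), Delay(new[] { 3, 4 }, 2000)))
    Console.WriteLine(x);

// Sleeps on the first MoveNext call, roughly like Defer + Thread.Sleep.
static IEnumerable<T> Delay<T>(IEnumerable<T> source, int ms)
{
    Thread.Sleep(ms);
    foreach (var item in source)
        yield return item;
}

static IEnumerable<T> Amb<T>(IEnumerable<T> left, IEnumerable<T> right)
{
    var l = left.GetEnumerator();
    var r = right.GetEnumerator();

    // Race both sequences to their first element.
    var tl = Task.Run(() => l.MoveNext());
    var tr = Task.Run(() => r.MoveNext());
    bool leftWins = Task.WaitAny(tl, tr) == 0;

    // Stream the remainder from the winner only.
    var winner = leftWins ? l : r;
    bool moved = leftWins ? tl.Result : tr.Result;
    while (moved)
    {
        yield return winner.Current;
        moved = winner.MoveNext();
    }
}
```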

 

Repeat, again and (maybe) again

The purpose of Repeat is self-explanatory and could be seen as a constructor function as well. Two categories of overloads exist: one that takes a single element and an optional repeat count (unspecified = infinite), and another that takes a sequence and an optional repeat count. While the former is more of a constructor, the latter is more of a combinator over a single input sequence:

public static IEnumerable<TSource> Repeat<TSource>(this IEnumerable<TSource> source);
public static IEnumerable<TSource> Repeat<TSource>(TSource value);
public static IEnumerable<TSource> Repeat<TSource>(this IEnumerable<TSource> source, int repeatCount);
public static IEnumerable<TSource> Repeat<TSource>(TSource value, int repeatCount);

Samples don’t need much further explanation either:

EnumerableEx.Repeat(1).Take(5).Run(Console.WriteLine);
EnumerableEx.Repeat(2, 5).Run(Console.WriteLine);

new[] { 3, 4 }.Repeat().Take(4).Run(Console.WriteLine);
new[] { 5, 6 }.Repeat(2).Run(Console.WriteLine);

It goes almost without saying that an input sequence causing an exception will also terminate the enumeration of a repeated form of the same sequence:

new[] { 5, 6 }.Concat(EnumerableEx.Throw<int>(new Exception())).Repeat(2).Run(Console.WriteLine);

image
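The sequence-repeating overloads boil down to re-enumerating the source, which is why an exception in the source cuts the whole repetition short. A minimal sketch (local-function names are mine):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine(string.Join(" ", Repeat(new[] { 5, 6 }, 2)));            // 5 6 5 6
Console.WriteLine(string.Join(" ", RepeatForever(new[] { 3, 4 }).Take(4))); // 3 4 3 4

// Each pass starts a fresh enumeration of the source, so any exception
// raised by the source terminates the repeated result as well.
static IEnumerable<T> Repeat<T>(IEnumerable<T> source, int repeatCount)
{
    for (int i = 0; i < repeatCount; i++)
        foreach (var item in source)
            yield return item;
}

// Infinite variant: repeats until the consumer stops pulling.
static IEnumerable<T> RepeatForever<T>(IEnumerable<T> source)
{
    while (true)
        foreach (var item in source)
            yield return item;
}
```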

 

Zip ‘em together

Introduced in .NET 4.0, I’ve covered the new Zip operator already in my earlier post on C# 4.0 Feature Focus - Part 3 - Intermezzo: LINQ's new Zip operator. Rx ports back this operator to the .NET 3.5 System.Interactive library for consistency. In summary, Zip walks two sequences hand-in-hand, combining their respective yielded elements using a given function to produce a result. The signature is as follows:

public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(this IEnumerable<TFirst> first, IEnumerable<TSecond> second, Func<TFirst, TSecond, TResult> resultSelector);

A simple example is shown below:

Enumerable.Range(1, 26).Zip(
    "abcdefghijklmnopqrstuvwxyz",
    (i, c) => "alpha[" + i + "] = " + c
).Run(Console.WriteLine);

Here, the first sequence is an IEnumerable<int> and the second one is a string, hence an IEnumerable<char>. The result is a table of mappings between numbers and letters. As an exercise, implement the following overload of Select using Zip and Generate, in terms of the more commonly used overload of Select that doesn’t take a position in the selector function:

public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, int, TResult> selector);
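One possible shape of the answer, sketched with a hand-rolled infinite index sequence standing in for Generate (names are mine, not the library's):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

foreach (var s in SelectIndexed(new[] { "a", "b", "c" }, (x, i) => i + ":" + x))
    Console.WriteLine(s); // 0:a 1:b 2:c

// Stand-in for EnumerableEx.Generate: the infinite sequence 0, 1, 2, …
static IEnumerable<int> Naturals()
{
    for (int i = 0; ; i++)
        yield return i;
}

// Indexed Select from Zip plus the plain (unindexed) Select: pair each
// element with its position, then project the pair. Zip stops as soon as
// the (finite) source stops, so the infinite index sequence is harmless.
static IEnumerable<R> SelectIndexed<T, R>(IEnumerable<T> source, Func<T, int, R> selector)
    => source.Zip(Naturals(), (x, i) => (x, i)).Select(p => selector(p.x, p.i));
```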

One thing that’s interesting about the interactive version of Zip is its left-to-right characteristic with regards to enumeration of first and second. Internally, it does something along the following lines:

while (first.MoveNext() && second.MoveNext())
    …

In other words, “first” is dominant in that it can prevent a MoveNext call on second from happening, e.g. because of an exception getting thrown, non-termination (stuck forever) and termination (returning false). The following matrix shows the implications of this:

image

It’s left as an exercise to the reader to implement the right-hand side behavior (notice the transposition symmetry!) for fun, where a Zip could fetch results from both sources simultaneously, combining their results or exceptions into produced results. What are advantages and disadvantages of such an approach? As an additional question, think about ways to detect and report an asymmetric zip, where one of both sides still has an element while the other side has signaled termination.

Finally, the diagram illustrating some of the regular operations of Zip. Other combinations of behavior can be read from the matrix above.

image

 

Scan, a running aggregation operator

Readers familiar with the LINQ to Objects APIs will know about the Aggregate operator, which we also mentioned before when talking about the new Generate operator (as the opposite of Aggregate). Aggregate “folds” or reduces a sequence of elements into a single value, eating the elements one by one using a specified function. However, sometimes you may not want to lose the intermediate results, e.g. when computing a running sum. Scan allows you to do exactly that:

public static IEnumerable<TSource> Scan<TSource>(this IEnumerable<TSource> source, Func<TSource, TSource, TSource> accumulator);
public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(this IEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> accumulator);

You’ll see big similarities with the existing Aggregate operator when looking at the signatures above, and use of the operator is straightforward as well:

Enumerable.Range(1, 10)
.Scan((sum, i) => sum + i)
.Run(Console.WriteLine);

Enumerable.Range(2, 9).Reverse()
.Scan(3628800, (prod, i) => prod / i)
.Run(Console.WriteLine);

The first sample will simply print 1, 1+2 = 3, 3+3 = 6, 6+4 = 10, … In the second sample, a seed value is used to illustrate an inverse factorial computation, dividing a given value by subsequent descending values (from 10 to 2).
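A minimal sketch of the seeded Scan overload shows how close it is to Aggregate, the only difference being that every intermediate accumulator value is yielded (my own sketch, not the library source):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Running sum of 1..5: each intermediate accumulation is yielded.
Console.WriteLine(string.Join(" ", Scan(Enumerable.Range(1, 5), 0, (sum, x) => sum + x)));
// 1 3 6 10 15

static IEnumerable<A> Scan<T, A>(IEnumerable<T> source, A seed, Func<A, T, A> f)
{
    var acc = seed;
    foreach (var item in source)
    {
        acc = f(acc, item);
        yield return acc; // Aggregate would only return the final acc
    }
}
```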

image

 

SelectMany

Finally, in honor of the monadic bind operator, a new overload was added for SelectMany :-). Its signature is shown below, and it’s left to the reader to figure out what it does (it’s simple):

public static IEnumerable<TOther> SelectMany<TSource, TOther>(this IEnumerable<TSource> source, IEnumerable<TOther> other);
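If you want to check your answer: the overload can be understood as SelectMany with a constant selector, yielding the whole "other" sequence once per source element. A sketch under that reading (the SelectManyConst name is mine):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// For every element of the source, the entire "other" sequence is yielded,
// ignoring the source's values.
Console.WriteLine(string.Join(" ", SelectManyConst(new[] { 1, 2, 3 }, new[] { "a", "b" })));
// a b a b a b

static IEnumerable<R> SelectManyConst<T, R>(IEnumerable<T> source, IEnumerable<R> other)
    => source.SelectMany(_ => other);
```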

 

Next on More LINQ

Functionally inspired constructs allowing to share enumerables and tame their side-effects.


With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A wealth of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about the materialization and dematerialization operators provided by EnumerableEx:

image

 

von Neumann was right

Code and data are very similar animals, much more similar than you may expect them to be. We can approach this observation from two different angles, one being a machine-centric view. Today’s computers are realizations of von Neumann machines where instructions and data are treated on the same footing from a memory storage point of view. While this is very useful, it’s also the source of various security-related headaches such as script or SQL injection and data execution through e.g. stack overruns (Data Execution Prevention is one mitigation).

Another point of view goes back to the foundational nature of programming, in particular the essentials of functional programming, where functions are used to represent data. An example is Church numerals: functions that are behaviorally equivalent to natural numbers (realized by repeated application of a function, equal in number to the natural number being represented). This illustrates how something that seems exclusively code-driven can be used to represent or mimic data.
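To make the Church numeral idea tangible in C# (specialized to int functions for readability; true Church numerals are polymorphic):

```csharp
using System;

// A Church numeral n is a function that applies f to its argument n times.
Func<Func<int, int>, Func<int, int>> zero = f => x => x;
Func<Func<int, int>, Func<int, int>> three = f => x => f(f(f(x)));

// "Decode" a numeral by counting applications of the successor function.
Console.WriteLine(zero(n => n + 1)(0));  // 0
Console.WriteLine(three(n => n + 1)(0)); // 3
```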

If the above samples seem far-fetched or esoteric, there are a variety of more familiar grounds where the “code as data” paradigm is used or exploited. One such sample is LISP where code and data representation share the same syntactical form and where the technique of quotation can be used to represent a code snippet as data for runtime inspection and/or manipulation. This is nothing other than meta-programming in its earliest form. Today we find exactly the same principle back in C#, and other languages, through expression trees. The core property here is so-called homo-iconicity, where code can be represented as data without having to resort to a different syntax (homo = same; iconic = appearance):

Func<int, int> twiceD = x => x * 2;
Expression<Func<int, int>> twiceE = x => x * 2;

What does all of this have to do with enumerable sequences? Sequences may seem to be a very data-centric concept, and sure they are. However, the behavior and realization of such sequences, e.g. through iterators, can be very code-intensive as well, to such an extent that we introduced means to deal with exceptions (Catch for instance) and termination (Repeat, restarting after completing). This reveals that it’s useful to deal with all possible states a sequence can go through. Guess what: state is data.

 

The holy trinity of IEnumerator<T> and IObserver<T> states

In all the marble diagrams I’ve shown before, there was a legend consisting of three potential states an enumerable sequence can go through as a result of iteration. Those three states reflect possible responses to a call to MoveNext caused by the consumer of the sequence:

image

In the world of IObserver<T>, the dual to IEnumerator<T> as we saw in earlier episodes, those three states are reflected in the interface definition directly, with three methods:

// Summary:
//     Supports push-style iteration over an observable sequence.
public interface IObserver<T>
{
    // Summary:
    //     Notifies the observer of the end of the sequence.
    void OnCompleted();
    //
    // Summary:
    //     Notifies the observer that an exception has occurred.
    void OnError(Exception exception);
    //
    // Summary:
    //     Notifies the observer of a new value in the sequence.
    void OnNext(T value);
}

Instead of having an observer getting called on any of those three methods, we could equally well record the states “raised” by the observable, which turns calls (code) into object instances (data) of type Notification<T>. This operation is called materialization. Thanks to dualization, the use of Notification<T> can be extended to the world of enumerables as well.

image

Notification<T> is a discriminated union with three notification kinds, reflecting the three states we talked about earlier:

public enum NotificationKind
{
    OnNext = 0,
    OnError = 1,
    OnCompleted = 2,
}

 

It’s a material dual world

Materialization is the act of taking a plain enumerable and turning it into a data-centric view based on Notification<T>. Dematerialization reverses this operation, going back to the code-centric world. Thanks to this back-and-forth ability between the two worlds of code and data, we get the ability to use LINQ over notification sequences and put the result back into the regular familiar IEnumerable<T> world. A figure makes this clear:

image

The power of this lies in the ability to use whatever domain is more convenient to perform operations over a sequence. Maybe you want to do thorough analysis of error conditions, corresponding to the Error notification kind, or maybe it’s more convenient to create a stream of notification objects before turning them into a “regular” sequence of objects that could exhibit certain additional behavior (like error conditions). This is exactly the same as the tricks played in various other fields, like mathematics where one can do Fourier analysis either in the time or frequency domain. Sometimes one is more convenient than the other; all that counts is to know there are reliable ways to go back and forth.

image

(Note: For the Queryable sample, you may want to end up in the bottom-right corner, so the AsQueryable call is often omitted.)

 

Materialize and Dematerialize

What remains to be said in this post are the signatures of the operators and a few samples. First, the signatures:

public static IEnumerable<Notification<TSource>> Materialize<TSource>(this IEnumerable<TSource> source);
public static IEnumerable<TSource> Dematerialize<TSource>(this IEnumerable<Notification<TSource>> source);
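To make the code/data round-trip concrete, here is a self-contained sketch that models notifications as plain tuples, with kind codes 0 = OnNext, 1 = OnError, 2 = OnCompleted matching NotificationKind. This is my own simplification, not the real Notification<T> type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var notes = Materialize(new[] { 1, 2 }).ToList();
Console.WriteLine(string.Join(" ", notes.Select(n => n.Kind)));   // 0 0 2
Console.WriteLine(string.Join(" ", Dematerialize(notes)));        // 1 2

// A failing sequence doesn't blow up: the exception becomes passive data.
Console.WriteLine(string.Join(" ", Materialize(Fail()).Select(n => n.Kind))); // 0 1

static IEnumerable<int> Fail()
{
    yield return 1;
    throw new InvalidOperationException();
}

// Turn calls (MoveNext outcomes) into data: one note per state transition.
static IEnumerable<(int Kind, T Value, Exception Error)> Materialize<T>(IEnumerable<T> source)
{
    using var e = source.GetEnumerator();
    while (true)
    {
        (int Kind, T Value, Exception Error) note;
        try
        {
            if (e.MoveNext())
                note = (0, e.Current, null);       // OnNext
            else
                note = (2, default(T), null);      // OnCompleted
        }
        catch (Exception ex)
        {
            note = (1, default(T), ex);            // OnError
        }
        yield return note;
        if (note.Kind != 0)
            yield break;                           // terminal notification
    }
}

// Turn data back into calls: replay until the first terminal notification.
static IEnumerable<T> Dematerialize<T>(IEnumerable<(int Kind, T Value, Exception Error)> notes)
{
    foreach (var n in notes)
    {
        if (n.Kind == 0) yield return n.Value;
        else if (n.Kind == 1) throw n.Error;
        else yield break;
    }
}
```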

An example of materialization is shown below, where we take a simple range generator to materialize. We expect to see OnNext notifications for all the numeric values emitted, terminated by a single OnCompleted call:

Enumerable.Range(1, 10)
.Materialize()
.Run(Console.WriteLine);

This prints:

OnNext(1)
OnNext(2)
OnNext(3)
OnNext(4)
OnNext(5)
OnNext(6)
OnNext(7)
OnNext(8)
OnNext(9)
OnNext(10)
OnCompleted()

A sample where an exception is triggered by the enumerator is shown below. Notice the code won’t blow up when enumerating over the materialized sequence: the exception is materialized as a passive exception object instance in an error notification.

Enumerable.Range(1, 10).Concat(EnumerableEx.Throw<int>(new Exception()))
.Materialize()
.Run(Console.WriteLine);

The result is as follows:

OnNext(1)
OnNext(2)
OnNext(3)
OnNext(4)
OnNext(5)
OnNext(6)
OnNext(7)
OnNext(8)
OnNext(9)
OnNext(10)
OnError(System.Exception)

Starting from a plain IEnumerable<T> the grammar of notifications to be expected is as follows:

OnNext* ( OnCompleted | OnError )?

In the other direction, starting from the world of IEnumerable<Notification<T>>, one can write a richer set of sequences, defined by the following grammar:

( OnNext | OnCompleted | OnError )*

For example:

var ns = new Notification<int>[] {
    new Notification<int>.OnNext(1),
    new Notification<int>.OnNext(2),
    new Notification<int>.OnCompleted(),
    new Notification<int>.OnNext(3),
    new Notification<int>.OnNext(4),
    new Notification<int>.OnError(new Exception()),
    new Notification<int>.OnNext(5),
};

Dematerializing this sequence of notifications will produce an enumerable sequence that will run no further than the first OnCompleted or OnError:

ns
.Dematerialize()
.Run(Console.WriteLine);

This prints 1 and 2 and then terminates. The reason this can still be useful is to create a stream of notifications that will be pre-filtered before doing any dematerialization operation on it. For example, a series of batches could be represented in the following grammar:

( OnNext* OnCompleted )*

If the user requests to run n batches, the first n – 1 OnCompleted notifications can be filtered out using some LINQ query expression, before doing dematerialization.

Finally, a sample of some error-filtering code going back and forth between IEnumerable<T> and IEnumerable<Notification<T>> showing practical use for those operators when doing sophisticated error handling:

var xs1 = new[] { 1, 2 }.Concat(EnumerableEx.Throw<int>(new InvalidOperationException()));
var xs2 = new[] { 3, 4 }.Concat(EnumerableEx.Throw<int>(new ArgumentException()));
var xs3 = new[] { 5, 6 }.Concat(EnumerableEx.Throw<int>(new OutOfMemoryException()));
var xs4 = new[] { 7, 8 }.Concat(EnumerableEx.Throw<int>(new ArgumentException()));

var xss = new[] { xs1, xs2, xs3, xs4 };
var xns = xss.Select(xs => xs.Materialize()).Concat();

var res = from xn in xns
          let isError = xn.Kind == NotificationKind.OnError
          let exception = isError ? ((Notification<int>.OnError)xn).Exception : null
          where !isError || exception is OutOfMemoryException
          select xn;

res.Dematerialize().Run(Console.WriteLine);

Given some input sequences, we materialize and concatenate all of them into sequence xns. Now we write a LINQ query over the notifications to filter out exceptions, unless the exception is a critical OOM one (you could add others to this list). The result is we see 1 through 6 being printed to the screen. (Question: What’s the relationship to OnErrorResumeNext that we saw in the previous post? What’s similar, what’s different?)

 

Exercises

As an exercise, try to implement the following operators in a notification-oriented manner:

  1. Catch
    (tip: use SelectMany and lots of conditional logic)
  2. Finally
    (tip: use SelectMany and Defer)
  3. OnErrorResumeNext – overload taking two IEnumerable<TSource> sequences
    (tip: use TakeWhile)
  4. Retry – overload with a retry count
    (tip: recursion, ignore stack overflow conditions)

The skeleton code for those operators is shown below:

return source
    .Materialize()
    // Your stuff here
    .Dematerialize();
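Before tackling the notification-based versions, it may help to see the target semantics of exercise 3 in plain iterator form. This is my own sketch of what OnErrorResumeNext should do, not the library code; the exercise asks you to achieve the same effect with Materialize, TakeWhile and Dematerialize instead:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Drain the first sequence until it ends or fails (swallowing the error),
// then continue with the second.
foreach (var x in OnErrorResumeNext(Fail(new[] { 1, 2 }), new[] { 3, 4 }))
    Console.WriteLine(x); // 1 2 3 4

static IEnumerable<int> Fail(IEnumerable<int> xs)
{
    foreach (var x in xs)
        yield return x;
    throw new InvalidOperationException();
}

static IEnumerable<T> OnErrorResumeNext<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    using (var e = first.GetEnumerator())
    {
        while (true)
        {
            T current;
            try
            {
                if (!e.MoveNext())
                    break;
                current = e.Current;
            }
            catch
            {
                break; // ignore the error and move on to the second sequence
            }
            yield return current;
        }
    }
    foreach (var item in second)
        yield return item;
}
```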

All-inclusive unit test:

new[] { 1, 2 }
    .Finally(() => Console.WriteLine("Finally inner"))
    .Concat(EnumerableEx.Throw<int>(new InvalidOperationException()))
    .Catch((InvalidOperationException _) => new[] { 3, 4 }.Concat(EnumerableEx.Throw<int>(new Exception())))
    .Finally(() => Console.WriteLine("Finally outer"))
    .OnErrorResumeNext(new[] { 5, 6 })
    .Concat(EnumerableEx.Throw<int>(new ArgumentException()))
    .Retry(2)
    .Run(Console.WriteLine);

This should produce the same results with the built-in operators and with your implementation of those operators. More specifically, the result has to be:

1
2
Finally inner
3
4
Finally outer
5
6
1
2
Finally inner
3
4
Finally outer
5
6

with no exception leaking to the surface at the call site (the behavior of Retry after the retry count has been exceeded).

 

Next on More LINQ

Various combinators to combine or transform existing observable sources into others.


With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A wealth of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about constructor operators provided by EnumerableEx:

image

 

Constructing sequences

In order to perform operations over sequences using various combinators and operators, it’s obviously a prerequisite to have such sequences available. While collection types in the .NET Framework implement IEnumerable<T> (or the non-generic counterpart, bridgeable to LINQ using the Cast<T> Standard Query Operator), one often wants to construct sequences on the spot. Moreover, sequences often should have a lazy nature as their persistence in memory may be problematic or infeasible (infinite sequences). For all those reasons, constructor operators come in handy.

LINQ to Objects already has a constructor function called Enumerable.Range to produce a sequence of integral numbers starting from a certain value, lazily returning the requested count of numbers:

// Imperative
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(i);
}

// LINQish
Enumerable.Range(start: 0, count: 10).Run(Console.WriteLine);

The lazy nature should not be underestimated, as one can create infinite sequences representing the potential to produce a certain (ordered) set of objects. When combined with restriction operators it becomes possible to use composition to limit the produced results in a manner very close to the domain we’re talking about. For example, the natural numbers are the integers greater than or equal to zero. The numbers starting from 5 are those same numbers with the first few dropped by means of a Skip operation or something similar, and taking a number of elements can be done using Take. Without deviating too much from today’s blogging mission, here’s what I’m alluding to:

static IEnumerable<int> Integer()
{
    for (int i = int.MinValue; i < int.MaxValue; i++)
        yield return i;

    yield return int.MaxValue;
}
var ints = Integer();
var nats = from i in ints where i >= 0 select i;
var some = nats.Skip(5).Take(5); // Good luck :-)
some.Run(Console.WriteLine);

I’ll leave it to the reader as a challenge to come up with a variety of ways to optimize this whilst preserving the declarative nature at the use site (i.e. make the sarcastic “Good luck” go away).
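One trivial direction for that challenge, assuming only the naturals are ever needed: start the generator where the domain starts instead of filtering its way up from int.MinValue, keeping the Skip/Take composition at the use site untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same declarative use site as before, but no crawl from int.MinValue.
Console.WriteLine(string.Join(" ", Nat().Skip(5).Take(5))); // 5 6 7 8 9

// Infinite ascending naturals; consumers bound it via Take/TakeWhile.
// (Overflow at int.MaxValue is ignored in this sketch.)
static IEnumerable<int> Nat()
{
    for (int i = 0; ; i++)
        yield return i;
}
```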

Back to Rx: in today’s installment we’ll look at various constructor functions in EnumerableEx.

 

Return and the cruel return of the monad

The simplest constructor function is Return, simply yielding the single value specified on demand. It’s similar to a one-element array and that’s about it from a practical point of view:

public static IEnumerable<TSource> Return<TSource>(TSource value);

You should be able to guess the implementation of the operator for yourself. Use is straightforward as shown below:

EnumerableEx.Return(42).Run(Console.WriteLine);

One interesting thing about this constructor function is its signature, going from TSource to IEnumerable<TSource>. This is nothing but the return function (sometimes referred to as unit) used on a monad, with a more general signature of T to M<T>, the little brother to the bind function, which has signature M<T> -> (T -> M<R>) -> M<R>, also known as SelectMany in LINQ. The triplet (known as a Kleisli triple) of the type constructor M (in LINQ, the particular cases of IEnumerable<T> and IQueryable<T> are used, i.e. not a general type constructor), the unit function and the bind function forms a monad.
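The implementation you were asked to guess is a one-line iterator, and the monadic connection can even be spot-checked: the left identity law says binding a returned value into f is the same as applying f directly. A quick sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine(string.Join(" ", Return(42))); // 42

// Left identity law: Return(x).SelectMany(f) behaves the same as f(x).
Func<int, IEnumerable<int>> f = x => new[] { x, x + 1 };
Console.WriteLine(Return(42).SelectMany(f).SequenceEqual(f(42))); // True

// The whole implementation: a lazy one-element sequence.
static IEnumerable<T> Return<T>(T value)
{
    yield return value;
}
```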

image

For a great overview of Language Integrated Monads, have a look at Wes Dyer’s The Marvels of Monads post. For a more foundational paper (with lots of applications though), have a look at Philip Wadler’s Monads for Functional Programming paper.

 

Throw me an exception please

Another singleton constructor is the Throw function that we’ve seen repeatedly in the previous post on exception handling over sequences. Its role is to provide an enumerable that will throw an exception upon the first MoveNext call during enumeration:

public static IEnumerable<TSource> Throw<TSource>(Exception exception);

In fact, this is a lazily thrown exception constructor. Use is simple again:

EnumerableEx.Throw<int>(new Exception()).Run();

Notice you have to specify the element type for the returned (never-yielding) sequence, as we’re constructing an IEnumerable<T> and there’s no information to infer T from. Obviously, the resulting sequence can be combined with other sequences of the same type in various places, e.g. using Concat. Below is a sample of how to use the Throw constructor with SelectMany to forcefully reject even numbers in a sequence (rather than filtering them out):

var src = Enumerable.Range(1, 10);//.Where(i => i % 2 != 0);
var res = src.SelectMany(i =>
    i % 2 == 0
    ? EnumerableEx.Throw<int>(new Exception("No evens please!"))
    : EnumerableEx.Return(i)
);
res.Run(Console.WriteLine);

Here we use the conditional operator to decide between an exception throwing sequence or a singleton element sequence (in this case, “Many” in “SelectMany” has “Single” semantics).
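The "lazily thrown" aspect is the essence of the implementation: an iterator's body doesn't run until the first MoveNext call, so construction is side-effect free. A minimal sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Constructing the sequence doesn't throw...
var boom = Throw<int>(new InvalidOperationException("later!"));
Console.WriteLine("constructed without throwing");

// ...only enumerating it does.
try
{
    foreach (var x in boom)
        Console.WriteLine(x);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine("caught: " + ex.Message); // caught: later!
}

static IEnumerable<T> Throw<T>(Exception exception)
{
    // This check only runs on the first MoveNext call, not at construction.
    if (exception != null)
        throw exception;
    yield break;
}
```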

 

Empty completing the triad

Since the introduction of LINQ in .NET 3.5 (thanks to reader Keith for reminding me about my heritage), there’s been an Empty constructor as well, with the following signature and implementation:

public static IEnumerable<TSource> Empty<TSource>()
{
    yield break;
}

There seems to be little use for this one, though I challenge the reader to use it to build the Where operator using SelectMany. The reason I say it “completes the triad” is illustrated below:
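If you want to check your answer to that challenge, here is one possible sketch (Where2 is my name, to avoid clashing with the real operator): keep an element by mapping it to a singleton, drop it by mapping it to the empty sequence, and let SelectMany flatten.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine(string.Join(" ", Where2(Enumerable.Range(1, 10), x => x % 2 == 0)));
// 2 4 6 8 10

static IEnumerable<T> Where2<T>(IEnumerable<T> source, Func<T, bool> predicate)
    => source.SelectMany(x => predicate(x) ? new[] { x } : Enumerable.Empty<T>());
```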

image

 

StartWith = Snoc (or Cons in disguise)

People familiar with LISP, ML, Scala, and many other functional languages, will know the concept of cons by heart. Cons is nothing but the abbreviation for “construct” used to create a bigger list (in LISP lingo) out of an existing list and an element to be prepended:

(cons 1 (cons 2 nil))

The above creates a list with 1 as the head and (cons 2 nil) as the tail, which by itself expands into a cell containing 2 and a tail with the nil (null) value. The underlying pair of the head value and tail “reference” to the tail list is known as a cons cell. Decomposition operators exist, known as car and cdr (from old IBM machine terminology where cons cells were realized in machine words consisting of a so called “address” and “decrement” register, explaining the a and d in car and cdr – c and r stand for content and register respectively):

(car (cons 1 2)) == 1
(cdr (cons 1 2)) == 2

The StartWith operator is none other than Cons in reverse (sometimes jokingly referred to as “Snoc” by functional programmers):

public static IEnumerable<TSource> StartWith<TSource>(this IEnumerable<TSource> source, params TSource[] first);
public static IEnumerable<TSource> StartWith<TSource>(this IEnumerable<TSource> source, TSource first);

Focus on the second one first. See how the “first” parameter is taken in as the second argument to StartWith. The reason is that it would be very invasive to put the extension method’s this parameter on the “first” parameter, as that would pollute all types in the framework with a “Cons” method:

public static IEnumerable<TSource> Cons<TSource>(this TSource head, IEnumerable<TSource> tail);

So, StartWith has to be read in reverse as illustrated below:

EnumerableEx.StartWith(
    EnumerableEx.StartWith(
        EnumerableEx.Return(3),
        2
    ),
    1
).Run(Console.WriteLine);

This prints 1, 2, 3 since 2 is put in front of 3 and 1 in front of that { 2, 3 } result. An overload exists to start a sequence with multiple elements in front of it:

EnumerableEx.StartWith(
    EnumerableEx.Return(3),
    1, 2
).Run(Console.WriteLine);

image

 

Generate is your new anamorphism

Generate is the most general constructor function for sequences you can imagine. It’s the dual of Aggregate in various ways. Where Aggregate folds a sequence into a single object by combining elements in the input sequence onto a final value in a step-by-step way, the Generate function unfolds a sequence out of a generator function also in a step-by-step way. To set the scene, let’s show the power of Aggregate by refreshing its signature and showing how to implement a bunch of other LINQ combinators in terms of it:

public static TResult Aggregate<TSource, TAccumulate, TResult>(this IEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> func, Func<TAccumulate, TResult> resultSelector);

Given a seed value and a function to combine an element of the input sequence with the current accumulator value into a new accumulator value, the Aggregate function can produce a result that’s the result of (left-)folding all elements in the sequence one-by-one. For example, a sum is nothing but a left-fold thanks to left associativity of the numerical addition operation:

1 + 2 + 3 + 4 + 5 = ((((1 + 2) + 3) + 4) + 5)

The accumulated value is the running sum of everything to the left of the current element. Seeing the elements of a sequence being eaten one-by-one is quite a shocking catastrophic event for the sequence, hence the name catamorphism. Below are implementations of Sum, Product, Min, Max, FirstOrDefault, LastOrDefault, Any and All:

var src = Enumerable.Range(1, 10);

Console.WriteLine("Sum = " + src.Aggregate(0, (sum, i) => sum + i));
Console.WriteLine("Prd = " + src.Aggregate(1, (prd, i) => prd * i));
Console.WriteLine("Min = " + src.Aggregate(int.MaxValue, (min, i) => i < min ? i : min));
Console.WriteLine("Max = " + src.Aggregate(int.MinValue, (max, i) => i > max ? i : max));
Console.WriteLine("Fst = " + src.Aggregate((int?)null, (fst, i) => fst == null ? i : fst));
Console.WriteLine("Lst = " + src.Aggregate((int?)null, (lst, i) => i));
Console.WriteLine("AlE = " + src.Aggregate(true, (all, i) => all && i % 2 == 0));
Console.WriteLine("AnE = " + src.Aggregate(false, (any, i) => any || i % 2 == 0));

As the dual to catamorphisms we find anamorphisms, where one starts from an initial state and generates elements for the resulting sequence. I leave it to the reader to draw parallels with others words starting with ana- (from the Greek “up”). The most elaborate signature of Generate is shown below:

public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, bool> condition, Func<TState, IEnumerable<TResult>> resultSelector, Func<TState, TState> iterate);

To see this is the dual to Aggregate, you have to use a bit of imagination, but the parallels are there. Where Aggregate takes in an IEnumerable<TSource> and produces a TResult, the Generate function produces an IEnumerable<TResult> from a given TState (and a bunch of other things). On both sides, there’s room for an initial state and a way to make progress (“func” versus “iterate”), both staying in their respective domains for the accumulation type (TAccumulate and TState). To select the result (that will end up in the output sequence), the overload above allows multiple TResult values to be produced per TState. And finally, there’s a stop condition, which is implicit in the case of a catamorphism since the “remaining tail of the sequence is empty” condition can be used for it (i.e. MoveNext returns false).

Another way to look at Generate is to draw the parallel with a for loop’s three parts: initialization, termination condition, update. In fact, Generate is implemented as some for-loops. More signatures exist:

public static IEnumerable<TValue> Generate<TValue>(this Func<Notification<TValue>> function);
public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, IEnumerable<TResult>> resultSelector, Func<TState, TState> iterate);
public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, Notification<TResult>> resultSelector, Func<TState, TState> iterate);
public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, bool> condition, Func<TState, IEnumerable<TResult>> resultSelector, Func<TState, TState> iterate);
public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, bool> condition, Func<TState, TResult> resultSelector, Func<TState, TState> iterate);

We’ll discuss the ones with Notification<T> types in the next episode, titled “Code = Data”, but the remaining three are all straightforward to understand. Some lack a terminating condition, while others lack the ability to yield multiple results per intermediate state. Below is a sample of Generate to produce the same results as Enumerable.Range:

Func<int, int, IEnumerable<int>> range = (start, count) => EnumerableEx.Generate(0, i => i < count, i => i + start, i => i + 1);
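As another sketch of unfolding (assuming the condition/result/iterate overload shown above, and the System.Interactive assembly being referenced), an anonymous type can carry a pair of values through Generate’s state to produce the Fibonacci sequence:

```csharp
// Unfolding Fibonacci with Generate; the anonymous type holds the (current, next) pair.
var fibs = EnumerableEx.Generate(
    new { Cur = 0, Nxt = 1 },                        // initial state
    s => true,                                       // never stop: an infinite sequence
    s => s.Cur,                                      // result selected from the state
    s => new { Cur = s.Nxt, Nxt = s.Cur + s.Nxt });  // iterate: shift the pair

fibs.Take(8).Run(Console.WriteLine); // 0, 1, 1, 2, 3, 5, 8, 13
```

Type inference over the anonymous type keeps TState implicit, which is handy since anonymous types can’t be spelled out in a generic argument list.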

The other constructors we’ve seen can be written in terms of Generate as well:

Func<IEnumerable<int>> empty = () => EnumerableEx.Generate<object, int>(null, o => false, o => null, o => o);
Func<int, IEnumerable<int>> @return = i => EnumerableEx.Generate<int, int>(0, n => n < 1, o => new [] { i }, n => n + 1);
Func<Exception, IEnumerable<int>> @throw = ex => EnumerableEx.Generate<object, int>(null, o => true, o => { throw ex; return null; }, o => o);
Func<int, IEnumerable<int>, IEnumerable<int>> cons = (a, d) => EnumerableEx.Generate<int, int>(0, n => n < 2, o => o == 0 ? new [] { a } : d, n => n + 1);

@return(1).Run(Console.WriteLine);
@throw(new Exception()).Catch((Exception ex) => @return(22)).Run(Console.WriteLine);
cons(1, cons(2, cons(3, empty()))).Run(Console.WriteLine);

 

Defer what you can do now till later

The intrinsic lazy nature of sequences with regard to enumeration allows us to push more delayed effects into the sequence’s iteration code. In particular, the construction of a sequence can be hidden behind a sequence of the same type. Let’s show a signature to make this more clear:

public static IEnumerable<TSource> Defer<TSource>(Func<IEnumerable<TSource>> enumerableFactory);

In here, an IEnumerable<TSource> is created out of a factory function. What’s handed back from the call to Defer is a stub IEnumerable<TSource> that will only call its factory function (getting the real intended result sequence) upon a triggered enumeration. An example is shown below:

var xs = EnumerableEx.Defer(() =>
{
    Console.WriteLine("Factory!");
    return EnumerableEx.Return(1);
});

Console.ReadLine();

xs.Run(Console.WriteLine);
xs.Run(Console.WriteLine);

In here, the Factory message won’t be printed till something starts enumerating the xs sequence. Both calls to Run do so, meaning the factory will be called twice (and could in fact return a different sequence each time).
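The behavior of Defer can be sketched with a trivial iterator: because an iterator’s body doesn’t run till the first MoveNext call, the factory invocation itself gets deferred, and it re-executes for every fresh enumeration (a sketch under the name MyDefer, not the actual implementation):

```csharp
// Sketch of Defer: the factory runs once per enumeration, not at construction time.
static IEnumerable<T> MyDefer<T>(Func<IEnumerable<T>> enumerableFactory)
{
    foreach (var item in enumerableFactory())
        yield return item;
}
```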

image

 

Next on More LINQ

More duality, this time between “code and data” views on sequences, introducing Notification<T>.


With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A great amount of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about exception handling operators provided by EnumerableEx:

image

 

Iterating with and without exceptions

Under regular circumstances, one expects sequences to produce data in response to iteration. However, it’s perfectly possible for an iterator (or any enumerable object) to throw an exception in response to a MoveNext call. For example:

Enumerable.Range(0, 10)
    .Reverse()
    .Select(x => 100 / x)
    .Run(Console.WriteLine);

This piece of code produces the following output:

11
12
14
16
20
25
33
50
100

Unhandled Exception: System.DivideByZeroException: Attempted to divide by zero.
   at Demo.Program.<Main>b__0(Int32 x) in Program.cs:line 15
   at System.Linq.Enumerable.<>c__DisplayClass12`3.<CombineSelectors>b__11(TSource x)
   at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
   at System.Linq.EnumerableEx.Run[TSource](IEnumerable`1 source)
   at Demo.Program.Main(String[] args) in Program.cs:line 13

Only when the Select operator’s iterator hits 0 for its input does its projection function throw a DivideByZeroException, causing the iterator to come to an abrupt stop as seen above. In the connected world, where iterators may reach out to external services that can signal error conditions, the ability to handle such sequences in a better and composable way becomes increasingly important.

In this post, we’ll have a look at the exception handling primitives for enumerable sequences provided by Rx in System.Interactive.EnumerableEx. A related constructor operator, Throw, will be discussed in more detail later but is simple enough to reveal in this context because of its relevance:

var oops = EnumerableEx.Throw<int>(new Exception("Oops"));
oops.Run();

The Throw operator simply creates an iterator that throws the specified exception upon the first MoveNext call on its enumerator. It’s the counterpart to the Return operator creating a single-element iterator. Logically both correspond to the OnNext and OnError methods of IObserver<T> in the reactive world. In addition, we’ll see the relation between those operators and Notification<T> later on, when covering “Code = Data” discussing the Materialize and Dematerialize operators.

 

Catch it and move on

First up is the Catch operator, which is available with the following signatures:

public static IEnumerable<TSource> Catch<TSource>(IEnumerable<IEnumerable<TSource>> sources);
public static IEnumerable<TSource> Catch<TSource, TException>(this IEnumerable<TSource> source, Func<TException, IEnumerable<TSource>> handler) where TException : Exception;

The second overload is the one used directly for exception handling as you’re used to it in your favorite imperative language. While you normally associate a handler code block with a “protected code block”, here a handler consists of a function producing a sequence in response to an exceptional iteration over the corresponding “protected sequence”. A sample will make things clearer. Consider the following iterator:

static IEnumerable<int> CouldThrow()
{
    yield return 1;
    yield return 2;
    throw new InvalidOperationException("Oops!");
}

Assume you can’t handle the exceptional condition from the inside and you got the iterator from somewhere else, so the following is impossible to achieve:

static IEnumerable<int> CouldThrow()
{
    try
    {
        yield return 1;
        yield return 2;
        throw new InvalidOperationException("Oops!");
    }
    catch (InvalidOperationException)
    {
        yield return 3;
        yield return 4;
        yield return 5;
    }
}

In fact, the above is invalid C# since you can’t yield from a try-block that’s associated with a catch clause, and neither can you yield from a catch clause. Either way, this illustrates basically what we want to achieve from a conceptual point of view, but on the consuming side of the iterator. This is what Catch allows us to do, as follows:

CouldThrow()
    .Catch((InvalidOperationException ex) => new[] { 3, 4, 5 })
    .Run(Console.WriteLine);

This simply prints the numbers 1 through 5 on the screen, where the last three values originate from the exception handler. Obviously one could inspect the exception object in the handler. Just like with regular block-based exception handling constructs, one can have multiple “nested” catch clauses associated with the same source sequence. This is achieved by simply chaining Catch operator calls:

new [] {
    /* yield return */ 1,
    /* yield return */ 2 }.Concat(
    /* throw */        EnumerableEx.Throw<int>(new InvalidOperationException("Oops!")))
.Catch((InvalidOperationException ex) => new [] {
    /* yield return */ 3,
    /* yield return */ 4 }.Concat(
    /* throw */        EnumerableEx.Throw<int>(new FormatException("Aargh!"))))
.Catch((FormatException ex) => new [] {
    /* yield return */ 5 })
.Run(Console.WriteLine);

Here, the first catch clause throws an exception by itself, being caught by the next catch clause. This is completely similar to regular exception handling. In summary, the Catch operator allows iteration of a sequence to continue with another one when an exception occurs during the first’s iteration. The handler function provided to Catch isn’t evaluated till an exception occurs, so if the resulting sequence isn’t iterated far enough for an exception to be triggered, the handler obviously won’t execute.
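To see why the handler only runs upon a matching exception during iteration, here’s a hand-rolled sketch of the exception-handling overload (a hypothetical MyCatch, not the actual Rx implementation). Note the yield statements live outside the try-catch, since C# forbids yielding from a try block that has a catch clause:

```csharp
// Sketch of Catch: iterate the source manually so exceptions thrown from
// MoveNext can be intercepted; continue with the handler's sequence if one occurs.
static IEnumerable<T> MyCatch<T, TException>(this IEnumerable<T> source, Func<TException, IEnumerable<T>> handler)
    where TException : Exception
{
    IEnumerable<T> continuation = null;
    using (var e = source.GetEnumerator())
    {
        while (true)
        {
            bool hasNext;
            T current = default(T);
            try
            {
                hasNext = e.MoveNext();
                if (hasNext)
                    current = e.Current;
            }
            catch (TException ex)
            {
                continuation = handler(ex); // the handler is evaluated only now
                break;
            }
            if (!hasNext)
                yield break; // regular completion: the handler never runs
            yield return current;
        }
    }
    foreach (var item in continuation)
        yield return item;
}
```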

The first overload of Catch allows specifying a sequence of sequences (IEnumerable<IEnumerable<T>>), continuing a sequence that has terminated with an exception with the sequence following it. For example:

var ex = EnumerableEx.Throw<int>(new Exception());
EnumerableEx.Catch(new[]
    {
        new [] { 1, 2 }.Concat(ex),
        new [] { 3, 4 }.Concat(ex),
        new [] { 5 },
        new [] { 6 },
    }).Run(Console.WriteLine);

This again will print the numbers 1 through 5, but not 6. The reason is that the first sequence blew up after yielding 1 and 2, causing the next sequence yielding 3 and 4 to be looped in, again causing an exception followed by a hand-over to the third sequence yielding 5. This third sequence finishes regularly (as opposed to exceptionally), so the story ends. I leave it to the reader to write down the block-structured nested try-catch statements this corresponds to from a conceptual angle.

Exercise: how would you implement a rethrow operation?

image

 

Finally, too

Now we’ve seen the Catch operator, Finally should come as no surprise. From the signature alone, we can see what it does:

public static IEnumerable<TSource> Finally<TSource>(this IEnumerable<TSource> source, Action finallyAction);

No matter how enumeration over the source terminates, the finallyAction will be executed. Obviously this can be illustrated using two cases, one for the regular case and one for the exceptional case. For the latter, we use EnumerableEx.Throw again. First, the regular case:

/* try { */ new [] {
    /* yield return */ 1,
    /* yield return */ 2 }
.Finally(() =>
    Console.WriteLine("Finally"))
.Run(Console.WriteLine);

This will print 1 and 2, followed by the Finally message. In case of an exception, let’s show the similarity to the lexical nesting of exception handler blocks in C#:

/* try { */
    /* try { */ new[] {
        /* yield return */ 1,
        /* yield return */ 2 }.Concat(
        /* throw */        EnumerableEx.Throw<int>(new Exception()))
    .Finally(() =>
        Console.WriteLine("Finally"))
.Catch((Exception ex) => new[] {
    /* yield return */ 3,
    /* yield return */ 4,
    /* yield return */ 5 })
.Run(Console.WriteLine);

Here the innermost enumerable yields 1 and 2, followed by the throwing of an exception. The Finally operator ensures the printing action is executed no matter how this sequence terminates. In this case, the exception will be caught downstream by the Catch operator, so the end result on the screen will be 1, 2, Finally, 3, 4, 5. As a simple exercise, think about what the following code will and should print:

/* try { */
    /* try { */ new[] {
        /* yield return */ 1,
        /* yield return */ 2 }.Concat(
        /* throw */        EnumerableEx.Throw<int>(new Exception()))
    .Finally(() =>
        Console.WriteLine("Finally"))
.Catch((Exception ex) => new[] {
    /* yield return */ 3,
    /* yield return */ 4,
    /* yield return */ 5 })
.Take(2)
.Run(Console.WriteLine);

image 

(Note: break happens when a consumer stops iterating over the resulting sequence.)

 

OnErrorResumeNext as in VB

Visual Basic fans will recognize this operator without doubt. Its operation is fairly straightforward: given a sequence of sequences, those are enumerated one by one, yielding their result to the caller. This is pretty much the same as the Concat operator we’ll see when talking about combinators, with the main difference being that an exceptional termination of any of the sequences does not bubble up. Instead, the OnErrorResumeNext operator simply moves on to the next sequence it can “yield foreach”. A sample will make this clear, but first the signatures:

public static IEnumerable<TSource> OnErrorResumeNext<TSource>(params IEnumerable<TSource>[] sources);
public static IEnumerable<TSource> OnErrorResumeNext<TSource>(this IEnumerable<IEnumerable<TSource>> sources);
public static IEnumerable<TSource> OnErrorResumeNext<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> next);

The following sample prints numbers 1 through 9, with no exception surfacing, even though the third sequence did terminate exceptionally. Replacing the OnErrorResumeNext call with the use of the Concat operator would surface that exception, terminating the resulting sequence after 1 through 7 have been yielded:

EnumerableEx.OnErrorResumeNext(
    new [] { 1, 2 },
    new [] { 3, 4, 5 },
    new [] { 6, 7 }.Concat(EnumerableEx.Throw<int>(new Exception())),
    new [] { 8, 9 }
).Run(Console.WriteLine);

Use of this operator can be useful for batch processing of records where an exceptional return is tolerable.
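A sketch of how such an operator could be written by hand (a hypothetical MyOnErrorResumeNext): both regular and exceptional termination of a source simply cause a hand-over to the next one:

```csharp
// Sketch of OnErrorResumeNext: exceptions thrown during MoveNext are swallowed
// and iteration resumes with the next source sequence.
static IEnumerable<T> MyOnErrorResumeNext<T>(params IEnumerable<T>[] sources)
{
    foreach (var source in sources)
    {
        using (var e = source.GetEnumerator())
        {
            while (true)
            {
                bool hasNext;
                T current = default(T);
                try
                {
                    hasNext = e.MoveNext();
                    if (hasNext)
                        current = e.Current;
                }
                catch
                {
                    break; // swallow the exception and move on to the next source
                }
                if (!hasNext)
                    break; // regular completion: also move on
                yield return current;
            }
        }
    }
}
```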

image

 

Using resources

Just like C#’s and VB’s using statements are related to exceptions due to their “finally”-alike guarantees for cleanup, System.Interactive’s Using operator is used for proper resource cleanup, this time in the face of delayed execution of a sequence. The signature for Using is as follows:

public static IEnumerable<TSource> Using<TSource, TResource>(Func<TResource> resourceSelector, Func<TResource, IEnumerable<TSource>> resourceUsage) where TResource : IDisposable;

The idea is to create a sequence that acquires a resource when its iteration is started (by running resourceSelector), which is subsequently used to provide a data sequence “using the resource” (obtained through resourceUsage). It’s only when the resulting sequence terminates (exceptionally or regularly) that the resource is disposed by calling its Dispose method. To illustrate this, let’s cook up our own Action-based disposable:

class ActionDisposable : IDisposable
{
    private Action _a;

    public ActionDisposable(Action a)
    {
        _a = a;
    }

    public void Dispose()
    {
        _a();
    }
}

Now we can write the following two samples:

EnumerableEx.Using<int, ActionDisposable>(() => new ActionDisposable(() => Console.WriteLine("Gone")), a =>
{
    // Now we could be using a to get data back...
    Console.WriteLine(a is ActionDisposable);
    // ... but let's just return some stock data.
    return new[] { 1, 2, 3 };
})
.Run(Console.WriteLine);

EnumerableEx.Using<int, ActionDisposable>(() => new ActionDisposable(() => Console.WriteLine("Gone")), a =>
{
    // Now we could be using a to get data back...
    Console.WriteLine(a is ActionDisposable);
    // ... which may result in an exception.
    return new[] { 1, 2 }.Concat(EnumerableEx.Throw<int>(new Exception()));
})
.Catch((Exception ex) => new [] { 4, 5, 6 })
.Run(Console.WriteLine);

The first one will nicely obtain the Gone-printing resource when enumeration is triggered by Run, returning values 1, 2 and 3, before Using calls dispose on the resource, causing it to print “Gone”. In the second example, the results produced under the acquired resource scope trigger an exception, so upon leaving Using the resource will be disposed again (printing “Gone”), putting us in the Catch operator’s body as we saw before. Now the output will be 1, 2, Gone, 4, 5, 6. Again, as an exercise, think about the following one (easy, just stressing the point…):

EnumerableEx.Using<int, ActionDisposable>(() => new ActionDisposable(() => Console.WriteLine("Gone")), a =>
{
    // Now we could be using a to get data back...
    Console.WriteLine(a is ActionDisposable);
    // ... but let's just return some stock data.
    return new[] { 1, 2, 3 };
})
.Take(2)
.Run(Console.WriteLine);

image

(Note: break is caused by the consumer’s termination of iteration over the resulting sequence.)
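Because C# does allow yield return inside a try block that only has a finally clause (which is what a using statement expands to), the semantics described above can be sketched very concisely (a hypothetical MyUsing, not the actual implementation):

```csharp
// Sketch of Using: the resource is acquired upon the first MoveNext and disposed
// when iteration terminates regularly, exceptionally, or through an early break
// on the consumer's side (which disposes the iterator, running the finally).
static IEnumerable<T> MyUsing<T, TResource>(Func<TResource> resourceSelector, Func<TResource, IEnumerable<T>> resourceUsage)
    where TResource : IDisposable
{
    using (var resource = resourceSelector())
        foreach (var item in resourceUsage(resource))
            yield return item;
}
```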

 

Retry till you succeed

A final operator in the exception handling operators category we’re discussing in this post, is Retry. The idea of Retry is to retry enumerating and yielding a sequence till it terminates successfully:

public static IEnumerable<TValue> Retry<TValue>(this IEnumerable<TValue> source);
public static IEnumerable<TValue> Retry<TValue>(this IEnumerable<TValue> source, int retryCount);

Obviously, Retry has no effect if the source sequence iterates without an exception being triggered:

// A no-op.
new [] { 1, 2, 3 }
.Retry()
.Run(Console.WriteLine);

On the other hand, if an exception occurs, a new enumerator over the source sequence is obtained (using GetEnumerator) and iteration is retried. If the exception condition is persistent, this may cause the iteration to retry forever:

// Will go forever...
new [] { 1, 2, 3 }.Concat(EnumerableEx.Throw<int>(new Exception()))
.Retry()
.Run(Console.WriteLine);

The overload taking a retryCount can be used to cap the number of retries. If the exception condition is dependent on dynamic factors (e.g. network connectivity to a stream of data), use of Retry will eventually make the iteration succeed:

static int s_count = 0;
static IEnumerable<int> MayGetNumbers()
{
    try
    {
        yield return 4;
        if (s_count == 0)
            throw new Exception();
        yield return 5;
        if (s_count == 1)
            throw new Exception();
        yield return 6;
    }
    finally
    {
        s_count++;
    }
}

The iterator above will make a bit more progress every time it’s called, the first time getting stuck after yielding 4, the second time after yielding 4 and 5, and finally succeed to yield 4, 5 and 6. Using Retry on this one will produce the following result:

// 4, (!), 4, 5, (!), 4, 5, 6
MayGetNumbers()
.Retry()
.Run(Console.WriteLine);

I’ll leave it as an exercise to the reader to come up with a diagram for this operator, introducing a distinction between IEnumerable and IEnumerator, the latter being potentially different for every time the GetEnumerator method is called. It’s because of the potential different enumeration results that Retry has a chance to be effective.
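That distinction shows up directly in a hand-rolled sketch of the unbounded overload (a hypothetical MyRetry): every retry asks the single IEnumerable for a brand new IEnumerator:

```csharp
// Sketch of Retry: upon an exception, the current enumerator is dropped and a
// fresh one is obtained, restarting iteration from scratch.
static IEnumerable<T> MyRetry<T>(this IEnumerable<T> source)
{
    while (true) // one pass per (re)try
    {
        bool faulted = false;
        using (var e = source.GetEnumerator()) // a fresh enumerator every time
        {
            while (true)
            {
                bool hasNext = false;
                T current = default(T);
                try
                {
                    hasNext = e.MoveNext();
                    if (hasNext)
                        current = e.Current;
                }
                catch
                {
                    faulted = true; // remember to restart
                }
                if (faulted)
                    break;
                if (!hasNext)
                    yield break; // regular completion: done retrying
                yield return current;
            }
        }
    }
}
```

Note that elements yielded before a failure get yielded again on the next attempt, consistent with the 4, 4, 5, 4, 5, 6 output shown earlier.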

 

Next on More LINQ

Constructor operators, producing (sometimes trivial) sequences.


With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A great amount of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about the imperative style operators provided on EnumerableEx:

image

 

Laziness and side-effecting iterators

LINQ can be quite deceptive on a first encounter due to the lazy island it provides in an otherwise eagerly evaluated language like C# and Visual Basic. Simply writing down a query doesn’t cause it to be executed, assuming no eager operators like ToArray, ToList or ToDictionary are used. In fact, the composition of sequences lies at the heart of this since sequences can evaluate lazily, on demand when calling MoveNext on an enumerator. Iterators are a simple means to provide such a sequence, potentially capturing a sea of side-effects interleaved with the act of producing (or “yielding”) values.

Let’s start with a quite subtle kind of side-effect, reading from a random number generator:

static Random s_random = new Random();

static IEnumerable<int> GetRandomNumbers(int maxValue)
{
    while (true)
    {
        yield return s_random.Next(maxValue);
    }
}

Every time you execute this, you’ll get to see different numbers. What’s more important in this context though is the fact that every yield return point in the code is a place where the iterator suspends till the next call to MoveNext occurs, causing it to run till the next yield return is encountered. In other words, the whole loop sits suspended till a consumer comes along. To visualize this a bit more, let’s add some Console.WriteLine output calls as an additional side-effect:

static Random s_random = new Random();

static IEnumerable<int> GetRandomNumbers(int maxValue)
{
    while (true)
    {
        Console.WriteLine("Next");
        yield return s_random.Next(maxValue);
    }
}

The following code fragment illustrates the point in time where the sequence executes:

var res = GetRandomNumbers(100).Take(10);
Console.WriteLine("Before iteration");
foreach (var x in res)
    Console.WriteLine(x);

The result is the following:

Before iteration
Next
16
Next
56
Next
46
Next
58
Next
22
Next
91
Next
77
Next
20
Next
91
Next
92

 

Run, run, run

System.Interactive’s Run operator in EnumerableEx allows execution of the sequence on the spot, in a fashion equivalent to having a foreach-loop. Two overloads exist, one discarding the elements consumed from the sequence and another one feeding them into an Action<T>:

public static void Run<TSource>(this IEnumerable<TSource> source);
public static void Run<TSource>(this IEnumerable<TSource> source, Action<TSource> action);

Rewriting the code above using the second overload will produce similar results:

var res = GetRandomNumbers(100).Take(10);
Console.WriteLine("Before iteration");
res.Run(x => Console.WriteLine(x)); // equivalent to res.Run(Console.WriteLine);

Since Run returns void, it’s only used for its side-effects, which can be useful from time to time. Previously, a similar effect could be achieved by calling ToArray or ToList, at the cost of burning memory for no good reason. In the above, those wouldn’t even be viable options in case you simply want to print random numbers ad infinitum, as an infinite sequence would cause the system to run out of memory in a ToArray or ToList context.
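Both overloads are nothing but a foreach loop in disguise; a sketch (hypothetical MyRun names):

```csharp
// Sketch of both Run overloads as plain foreach loops over the source.
static void MyRun<T>(this IEnumerable<T> source)
{
    foreach (var item in source)
    {
        // discard the element; only the iteration's side-effects matter
    }
}

static void MyRun<T>(this IEnumerable<T> source, Action<T> action)
{
    foreach (var item in source)
        action(item);
}
```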

Let’s assume for the continuation of this post that GetRandomNumbers doesn’t exhibit a printing side-effect in and of itself:

static IEnumerable<int> GetRandomNumbers(int maxValue)
{
    while (true)
    {
        yield return s_random.Next(maxValue);
    }
}

In this setting, our Run call above effectively adds the side-effect of printing to the screen “from the outside”, at the (consuming) end of the “query”. Using the Do operator, one can inject a side-effect in a lazily evaluated sequence composed of different combinators.

image

 

Adding side-effects using Do

The Do method has the following signature:

public static IEnumerable<TSource> Do<TSource>(this IEnumerable<TSource> source, Action<TSource> action);

Taking in an IEnumerable<T> and producing one, it simply iterates over the source, executing the specified action before yielding the result to the consumer. Other than producing the side-effect during iteration, it doesn’t touch the sequence at all. You can write this operator in a straightforward manner yourself:

static IEnumerable<T> Do<T>(this IEnumerable<T> source, Action<T> action)
{
    foreach (var item in source)
    {
        action(item);
        yield return item;
    }
}

Or you could build it out of other combinator primitives, in particular Select:

static IEnumerable<T> Do<T>(this IEnumerable<T> source, Action<T> action)
{
    return source.Select(item =>
    {
        action(item);
        return item;
    });
}

This is useful primarily for debugging purposes, where you want to “probe” different points of execution in a query. For example, consider the following query expression:

var res = from x in GetRandomNumbers(100).Take(10)
          where x % 2 == 0
          orderby x
          select x + 1;
res.Run(x => Console.WriteLine(x));

Don’t know why it produces the results you’re seeing? Using Do, you can inject “checkpoints”. First, realize the above query desugars into:

var res = GetRandomNumbers(100).Take(10)
          .Where(x => x % 2 == 0)
          .OrderBy(x => x)
          .Select(x => x + 1);

Now we can put Do calls “on the dots” to see the values flowing through the pipeline during consumption of the query result.

var res = GetRandomNumbers(100).Take(10)
          .Do(x => Console.WriteLine("Source  -> {0}", x))
          .Where(x => x % 2 == 0)
          .Do(x => Console.WriteLine("Where   -> {0}", x))
          .OrderBy(x => x)
          .Do(x => Console.WriteLine("OrderBy -> {0}", x))
          .Select(x => x + 1)
          .Do(x => Console.WriteLine("Select  -> {0}", x));

The below shows what’s triggered by the call to Run:

Source  -> 96
Where   -> 96
Source  -> 25
Source  -> 8
Where   -> 8
Source  -> 79
Source  -> 25
Source  -> 3
Source  -> 36
Where   -> 36
Source  -> 51
Source  -> 53
Source  -> 81
OrderBy -> 8
Select  -> 9
9
OrderBy -> 36
Select  -> 37
37
OrderBy -> 96
Select  -> 97
97

For example, 25 produced by the source didn’t survive the Where operator filtering. From the output one can also see that all Where and Source consumption calls precede any OrderBy calls, since the ordering operator eagerly drains its source before carrying out the ordering and passing the results to its consumer.

Looking at the output before the first result, 9, is printed, you can observe the effect of the first MoveNext call on the resulting sequence: the whole source is consulted and fed through the Where operator in order for OrderBy to produce the first (smallest) result. A conceptual diagram illustrating the interception of sequences using Do is shown below:

image

In fact, one can make Do surface through query syntax as well, by providing an extension method overload for e.g. Where (note: this is purely for illustration purposes, and admittedly over-overloading and misusing existing operators :-)):

public static class DoEnumerable
{
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Action<T> action)
    {
        return source.Do(action);
    }
}

The resulting usage pattern is the following:

var res = from x in GetRandomNumbers(100).Take(10)
          /*do*/ where Console.WriteLine("Source  -> {0}", x)
          where x % 2 == 0
          /*do*/ where Console.WriteLine("Where   -> {0}", x)
          orderby x
          /*do*/ where Console.WriteLine("OrderBy -> {0}", x)
          select x + 1 into x
          /*do*/ where Console.WriteLine("Select  -> {0}", x)
          select x;

image

 

A lame semi-cooperative scheduler

Let’s say up front there’s no good justification for this sample (that’s the lame part) other than educational purposes: it shows the use of a sequence purely for its side-effects. The idea of the below is to declare a worker thread with varying priorities for portions of its code. Sure, we could have set thread priorities directly in the code, but the special part is feeding the desired priorities back to the driver loop (“Start”) of the scheduler, which can decide how to implement this prioritization scheme. The cooperative nature lies in the fact that the worker threads yield their run by signaling a new priority, effectively handing control back to the driver loop. I’m calling it semi-cooperative only because the sample implementation below relies on the preemptive scheduling provided by the Thread class; the reader challenge will be to shake off that part.

First of all, work is declared by an iterator that yields priorities followed by the work that will run under that priority. The driver can decide whether or not to call MoveNext, effectively causing the iterator to proceed till the next yield return statement. For example:

static IEnumerable<ThreadPriority> Work1()
{
    int i = 0;
    Action print = () =>
    {
        Console.WriteLine("{0} @ {1} -> {2}", Thread.CurrentThread.ManagedThreadId, Thread.CurrentThread.Priority, i++);
        for (int j = 0; j < 10000000; j++)
            ;
    };
    yield return ThreadPriority.Normal;
    {
        print();
    }
    yield return ThreadPriority.Lowest;
    {
        print();
    }
    yield return ThreadPriority.Normal;
    {
        print();
    }
    yield return ThreadPriority.Highest;
    {
        print();
    }
    yield return ThreadPriority.Highest;
    {
        print();
    }
}

The block-based work item declaration after a yield syntactically groups work items and their priorities. Obviously, the work is faked to illustrate the point. A driver loop, called Start, can be implemented as lamely as relying on the managed Thread type:

static void Start(IEnumerable<ThreadPriority> work)
{
    new Thread(() =>
    {
        work.Do(p => Thread.CurrentThread.Priority = p).Run();
    }).Start();
}

Here, we’re using Run and Do to, respectively, run the work and cause the side-effect of adjusting the priority of the thread hosting the work. The reader is invited to cook up their own dispatcher with the following signature:

static void Start(params IEnumerable<ThreadPriority>[] workers);

The idea of this one is to implement a prioritization scheme by hand – just for fun and definitely no profit other than intellectual stimulus: run all the work on the same thread, with each MoveNext call standing for an uninterruptible quantum. During a MoveNext call, the worker proceeds till the next yield return is encountered, so an unfair worker may run away and do work forever. This pinpoints the very nature of cooperative scheduling: you need to trust the individual workers. But when you regain control, having retrieved the priority for the next work item the worker plans to do, you can decide whether to let it go for another quantum (by calling MoveNext) or let another worker from the worker list take a turn (tip: use an ordering operator to select the next worker that gets a chance to run). This process continues till all workers have run out of work items, indicated by MoveNext returning false (tip: keep a list of “schedulable” items).
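If you want to check your approach against something concrete, here’s one possible shape of such a driver – a minimal sketch only, which ignores fairness among equal priorities and enumerator disposal:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class CooperativeScheduler
{
    public static void Start(params IEnumerable<ThreadPriority>[] workers)
    {
        // The list of "schedulable" items: one enumerator per worker that
        // still has work left. Priming with MoveNext positions each worker
        // at its first yielded priority without running any work yet.
        var active = workers
            .Select(w => w.GetEnumerator())
            .Where(e => e.MoveNext())
            .ToList();

        while (active.Count > 0)
        {
            // Use an ordering operator to pick the worker whose pending
            // quantum has the highest priority (ties go to the first one).
            var next = active.OrderByDescending(e => e.Current).First();

            // One uninterruptible quantum: run till the next yield return.
            if (!next.MoveNext())
                active.Remove(next); // MoveNext returned false: worker's done
        }
    }
}
```

Note how all work runs on the caller’s thread; the yielded ThreadPriority values merely steer the scheduling decision rather than being applied to a Thread object.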

In the scope of this post, the sole reason I showed this sample is because of the use of Do and Run to drive home the point of those operators. Sure, you can achieve the same result (if desired at all) by tweaking the managed thread priority directly in each worker.

 

Next on More LINQ

Dealing with exceptions caused by sequence iteration.


With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A great amount of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about getting started with System.Interactive, also touching briefly on the deep duality.

 

Where to get it?

To get the Reactive Extensions, which include System.Interactive, visit the landing page on DevLabs over here. Downloads are available for .NET Framework 3.5 SP1, .NET Framework 4.0 Beta 2 and Silverlight 3. In this series, I’ll be using the “desktop CLR” distributions from Visual Studio 2008 and Visual Studio 2010.

The differences between the various distributions are of a technical nature and have to do with backporting certain essentials Rx relies on to the .NET Framework 3.5 SP1 stack. For instance, the IObservable&lt;T&gt; and IObserver&lt;T&gt; interfaces exist in .NET 4.0 but not in .NET 3.5. Similarly, the Task Parallel Library (TPL) is available in .NET 4.0’s System.Threading namespace, while Rx redistributes it to run on .NET 3.5 SP1.

 

What’s in it?

Once you’ve installed it, have a look at your Program Files (x86) folder, under Microsoft Reactive Extensions. I’m using the “DesktopV2” version here, which refers to CLR 2.0 and the .NET Framework 3.5 SP1 package. The main difference with the “DesktopV4” version is the presence of System.Threading, which contains the Parallel Extensions that ship in .NET 4.0:

image

A brief introduction to the remaining assemblies:

  • System.CoreEx.dll contains some commonly used types like Action and Func delegates with bigger arities (up to 16 parameters), new Property<T> primitives, a Unit type, an Event type wrapping “object sender, EventArgs e” pairs, a Notification<T> (which will be discussed extensively) and some notions of time in the form of TimeInterval<T> and Timestamped<T>.
  • System.Interactive.dll, the subject of this new series, contains extension methods for IEnumerable<T> and additional LINQ to Objects operators, provided in a type called EnumerableEx.
  • System.Reactive.dll, which is where Rx gets its name from and which will be discussed in future series, is the home for reactive programming tools. It contains IObservable&lt;T&gt; and IObserver&lt;T&gt;, as well as various combinators over them (sometimes referred to as “LINQ to Events”). In addition, it provides primitives like subjects and contains a join library (more about this in a separate installment).

 

Duality? Help!

As we like to use expensive words like “mathematical dual”, it makes sense to provide an easy-to-grasp introduction to the subject. The first thing to look at is the distinction between interactive and reactive programming, illustrated in the diagram below:

image

In the world of interactive programming, the application asks for more information. It pulls data out of a sequence that represents some data source, in particular by calling MoveNext on an enumerator object. The application is quite active in the data retrieval process: besides getting an enumerator (by calling GetEnumerator on an enumerable), it also decides about the pace of the retrieval by calling MoveNext at its own convenience.

In the world of reactive programming, the application is told about more information. Data is pushed to it from a data source by getting called on the OnNext method of an observer object. The application is quite passive in the data retrieval process: apart from subscribing to an observable source, it can’t do anything but reacting to the data pushed to it by means of OnNext calls.

The nice thing about those two worlds is that they’re dual. The highlighted words in the paragraphs above have dual meanings. Because of this observation, it’s desirable to search for dualities on a more formal and technical level as well. In particular, the interfaces being used here are the exact duals of one another: IEnumerable&lt;T&gt; is to IObservable&lt;T&gt; as IEnumerator&lt;T&gt; is to IObserver&lt;T&gt;. Dualization can be achieved by turning inputs (e.g. method parameters) into outputs (e.g. return values):

image

Lots of dualities exist in various disciplines, providing for great knowledge transfers between different domains. For example, in formal logic, De Morgan’s laws allow converting expressions built from conjunctions into ones built from disjunctions, and vice versa. In electronics, similarities exist between the behavior of capacitors and inductors: know one, and how to go back and forth between the domains, and you know the other. Fourier calculus provides duals between the time and frequency domains.

One thing all those have in common is a way to go back and forth between the domains. Such a mechanism exists in the worlds of System.Reactive and System.Interactive as well. Every observable collection can be turned into an enumerable one and vice versa, using operators called ToEnumerable and ToObservable. To get a feel for how those work, imagine an enumerable collection first. The only thing one can do to retrieve its data is enumerate over it; for all the values received, signal them on the resulting observable’s observer. In the opposite direction, you subscribe to an observable collection to receive the values thrown at you and keep them around, so that the resulting enumerable can fetch them.
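To make the duality concrete, here’s a sketch using simplified, renamed stand-ins for the interfaces (the real IObservable&lt;T&gt;/IObserver&lt;T&gt; add subscription disposal and threading concerns this ignores), together with a naive ToObservable-style conversion:

```csharp
using System;
using System.Collections.Generic;

// Simplified re-declarations for illustration only; .NET 4.0 ships the real
// System.IObservable<T>/IObserver<T>, and Rx backports them to 3.5 SP1.
interface IMyObserver<T>
{
    void OnNext(T value);       // dual of MoveNext/Current: data is pushed in
    void OnError(Exception ex); // dual of an exception thrown from MoveNext
    void OnCompleted();         // dual of MoveNext returning false
}

interface IMyObservable<T>
{
    // Dual of GetEnumerator: there, an enumerator comes out as the return
    // value; here, an observer goes in as a parameter.
    void Subscribe(IMyObserver<T> observer);
}

// A naive ToObservable: pull every value from the enumerable and push it
// at the observer, translating the three possible outcomes along the way.
class EnumerableObservable<T> : IMyObservable<T>
{
    private readonly IEnumerable<T> _source;
    public EnumerableObservable(IEnumerable<T> source) { _source = source; }

    public void Subscribe(IMyObserver<T> observer)
    {
        try
        {
            foreach (var item in _source)
                observer.OnNext(item);
            observer.OnCompleted();
        }
        catch (Exception ex)
        {
            observer.OnError(ex);
        }
    }
}

// A trivial observer that records what it's told, for demonstration.
class RecordingObserver<T> : IMyObserver<T>
{
    public readonly List<string> Log = new List<string>();
    public void OnNext(T value) { Log.Add("OnNext(" + value + ")"); }
    public void OnError(Exception ex) { Log.Add("OnError(" + ex.Message + ")"); }
    public void OnCompleted() { Log.Add("OnCompleted"); }
}
```

Notice how Subscribe above pushes eagerly on the caller’s thread; the real ToObservable involves a scheduler precisely to avoid that.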

In this series, we’ll not look over the garden wall to the reactive world just yet. Instead, we’ll get our hands dirty in the world of System.Interactive, a logical extension to .NET 3.5’s IEnumerable<T> extension methods, known as the Standard Query Operators.

 

Operators overview

The System.Linq.EnumerableEx static class in System.Interactive contains various (extension) methods that operate on IEnumerable&lt;T&gt; collections. It should be seen as a logical extension to the System.Linq.Enumerable class in System.Core. In the illustration below, I’ve summarized the various categories those new operators fall into. Some could be considered to fall in multiple categories, so take this with a grain of salt. Nevertheless, we’ll look at those big buckets in subsequent posts in this series:

  • Imperative use – provides operators that execute a sequence (Run) and inject side-effecting Actions in a chain of query operator calls (Do), which is handy for debugging.
  • Exceptions – enumeration of sequences can cause exceptions (e.g. if you write an iterator, but also by other means – see later), which may need to be handled somehow. Methods like Catch, Finally, Using, OnErrorResumeNext and Retry provide means to make a sequence resilient in face of exceptions.
  • Constructors – instead of creating an iterator yourself, it’s possible to let the system create a sequence on your behalf, e.g. by providing it a generator function (Generate), by composing sequences and elements (Return, StartWith, Throw), or by triggering the call of a deferred constructor function when a client starts enumerating (Defer).
  • Code = Data – the triplet of OnNext, OnError and OnCompleted seen on IObserver&lt;T&gt; is a very code-centric way of signaling the various outcomes of data consumption. An alternative view is to treat those outcomes as pieces of data, called notifications (Notification&lt;T&gt;). Using Materialize and Dematerialize, one can transfer back and forth between those two domains.
  • Combinators – producing sequences out of one or more existing sequences is what combinators generally do. One can repeat a sequence a number of times (Repeat), zip two sequences together (Zip), let two sequences battle to provide a result the fastest (Amb), and more. Those operators are most “in line” with what you already know from System.Linq today.
  • Functional – while the imperative and exception categories acknowledge the possibility for a sequence to exhibit side-effects, the functional category is meant to tame those side-effects, typically in one-producer-many-consumer scenarios. When a sequence may produce side-effects during iteration, it may be desirable to avoid duplicating those effects when multiple consumers iterate.
  • Miscellaneous – just that, miscellaneous.

image
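To illustrate the deferred-construction idea from the Constructors bucket, Defer could plausibly be implemented as a trivial iterator (the real EnumerableEx version may differ, e.g. in argument validation):

```csharp
using System;
using System.Collections.Generic;

static class DeferSketch
{
    // Because this method is an iterator, its body - including the factory
    // call - doesn't run until the resulting sequence is enumerated, i.e.
    // on the first MoveNext call of a consumer.
    public static IEnumerable<T> Defer<T>(Func<IEnumerable<T>> factory)
    {
        foreach (var item in factory())
            yield return item;
    }
}
```

Each call to GetEnumerator on the deferred sequence re-invokes the factory, so every consumer gets a freshly constructed underlying sequence.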

Next time, we’ll start by looking at the “Imperative use” category. Download the libraries today and start exploring!


The CLR’s exception handling facilities provide for protected blocks (“try”) one can associate a handler with. There are four kinds of handlers, and exactly one can be associated with a protected block (but nesting can be used to associate multiple handlers with a block of code):

  • A finally handler is executed whenever the block is exited, regardless of whether this happened by normal control flow or an unhandled exception. C# exposes this using the finally keyword.
  • A type-filtered handler handles an exception of a specified class or any of its subclasses. Better known as a “catch block”, C# provides this through its catch keyword.
  • A user-filtered handler runs user-specified code to determine whether the exception should be ignored, handled by the associated handler, or passed on to the next protected block. C# doesn’t expose this, but Visual Basic does by means of its When keyword.
  • A fault handler is executed if an exception occurs, but not on completion of normal control flow. Neither C# nor Visual Basic provide a fault handler language feature.

In this reader challenge, we’re going to focus on fault handlers. Due to their lack of language surface, their effect is often mimicked by using some local state to determine whether the protected block exited gracefully or not:

bool success = false;
try
{
    // Do stuff
    success = true;
}
finally
{
   if (!success)
   {
       // There was a fault. Do something special.
   }
   // Fault or not; this is what finally does.
}

If an exception happens during “Do stuff”, we end up in the finally block and conclude that success was never set to true. This indicates an error happened, and we should handle the fault case. However, this technique gets tricky when there are different paths exiting the try block: one could return from the enclosing method in various places, requiring the “success = true” assignment to be sprinkled around. This is exactly what exception handling was designed for: reducing the clutter in your code that has to do with error condition/code tracking. So, we’re defeating that purpose.

Today’s challenge is to create a true fault handler in C#, just for the sake of it. This is merely a brain teaser, encouraging readers to find out what happens behind the scenes of compiled C# code. We won’t be addressing certain concerns like non-local return (the case I mentioned above) but will be hunting for the true “fault” handler treasure hidden deeply in the C# compiler’s IL code emitter. The operational specification is the following:

var f = Fault(() => Console.WriteLine("Okay"),
              () => Console.WriteLine("Fault"));
f();
Console.WriteLine();

var g = Fault(() => { throw new Exception("Oops"); },
              () => Console.WriteLine("Fault"));
try
{
    g();
}
catch (Exception ex)
{
    Console.WriteLine(ex);
}

The above should produce the following output:

Okay

Fault
System.Exception: Oops
   at Program.<Main>b__2()
   (I won’t reveal the secrets here yet…)
   at Program.Main()

Action f illustrates the non-exceptional case where the fault handler is not invoked (a finally handler would get invoked). Action g illustrates the exceptional case where the fault handler gets invoked and the exception bubbles up to the catch-block surrounding its invocation.

It’s strictly forbidden to use local state in Fault (or a method it calls) to track the successful execution of the protected block. Therefore, the below is an invalid solution:

static Action Fault(Action protectedBlock, Action faultHandler)
{
    return () =>
    {
        bool success = false;
        try
        {
            protectedBlock();
            success = true;
        }
        finally
        {
            if (!success)
                faultHandler();
        }
    };
}

Moreover, execution of your Fault method should really use a fault handler as encountered in IL code. It should be a fault handler, not mimic one. In addition, you should not go for a solution where you write a Fault method in ILASM by hand and link it as a netmodule in a C# project, using al.exe:

.class private FaultClosure
{
  .field class [System.Core]System.Action protectedBlock
  .field class [System.Core]System.Action faultHandler

  .method void .ctor()
  {
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }

  .method void Do()
  {
    .try
    {
      ldarg.0
      ldfld class [System.Core]System.Action Program/FaultClosure::protectedBlock
      callvirt instance void [System.Core]System.Action::Invoke()
      leave.s END
    }
    fault
    {
      ldarg.0
      ldfld class [System.Core]System.Action Program/FaultClosure::faultHandler
      callvirt instance void [System.Core]System.Action::Invoke()
      endfault
    }
    END: ret
  }
}

.method static class [System.Core]System.Action Fault(class [System.Core]System.Action protectedBlock, class [System.Core]System.Action faultHandler)
{
  .locals init (class Program/FaultClosure V_0)
  newobj void Program/FaultClosure::.ctor()
  stloc.0
  ldloc.0
  ldarg.0
  stfld class [System.Core]System.Action Program/FaultClosure::protectedBlock
  ldloc.0
  ldarg.1
  stfld class [System.Core]System.Action Program/FaultClosure::faultHandler
  ldloc.0
  ldftn instance void Program/FaultClosure::Do()
  newobj void [System.Core]System.Action::.ctor(object, native int)
  ret
}

Again, this exercise is just for fun with no profit other than brain stimulation. Hint: what C# 2.0 or later feature may cause a “fault” block to be emitted (i.e. if you ildasm a compiled valid C# application, you can find a “fault” keyword)?

Happy holidays!
