Tuesday, December 29, 2009 12:30 AM bart

More LINQ with System.Interactive – Exploiting the code = data relationship

With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A great amount of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about the materialization and dematerialization operators provided by EnumerableEx:

image

 

von Neumann was right

Code and data are very similar animals, much more similar than you may expect them to be. We can approach this observation from two different angles, one being a machine-centric view. Today’s computers are realizations of von Neumann machines, where instructions and data are treated on the same footing from a memory storage point of view. While this is very useful, it’s also the source of various security-related headaches such as script or SQL injection and data execution through e.g. stack overruns (Data Execution Prevention is one mitigation).

Another point of view goes back to the foundational nature of programming, in particular the essentials of functional programming, where functions are used to represent data. Church numerals are an example: functions that are behaviorally equivalent to natural numbers (realized by repeated application of a function, equal in number to the natural number being represented). This illustrates how something that seems exclusively code-driven can be used to represent or mimic data.
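To make the Church numeral idea a bit more tangible, here is a minimal sketch in C#. The names Numeral, Zero, Succ, Add and ToInt are mine, chosen for illustration, and the encoding is fixed to functions over int to keep the types simple; treat it as one possible rendering rather than a canonical one.

```csharp
using System;

// A numeral takes a function f and returns "f applied n times".
// Fixing the element type to int is a simplifying assumption of this sketch.
public delegate Func<int, int> Numeral(Func<int, int> f);

public static class ChurchNumerals
{
    // Zero applies f no times at all: the argument comes back unchanged.
    public static readonly Numeral Zero = f => x => x;

    // Succ(n) applies f once more than n does.
    public static Numeral Succ(Numeral n) => f => x => f(n(f)(x));

    // Add(m, n) applies f n times, and then m more times.
    public static Numeral Add(Numeral m, Numeral n) => f => x => m(f)(n(f)(x));

    // Turn the function back into ordinary data by counting applications of "+1".
    public static int ToInt(Numeral n) => n(x => x + 1)(0);

    public static void Main()
    {
        var two = Succ(Succ(Zero));
        var three = Succ(two);
        Console.WriteLine(ToInt(Add(two, three))); // prints 5
    }
}
```

The only way to observe such a number is to run it: ToInt hands it an increment function and a starting value of 0, and reads off the result.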

If the above samples seem far-fetched or esoteric, there is plenty of more familiar ground where the “code as data” paradigm is used or exploited. One such sample is LISP, where code and data share the same syntactic form and where the technique of quotation can be used to represent a code snippet as data for runtime inspection and/or manipulation. This is nothing other than meta-programming in its earliest form. Today we find the same principle in C#, and other languages, through expression trees. The core property here is so-called homoiconicity, where code can be represented as data without having to resort to a different syntax (homo = same; iconic = appearance):

Func<int, int> twiceD = x => x * 2;
Expression<Func<int, int>> twiceE = x => x * 2;
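The difference between the two declarations shows up once you try to look inside them. The sketch below is only an illustration (the CodeAsData class and its Describe helper are hypothetical names of mine); it uses the standard System.Linq.Expressions API to inspect the tree as data and then compile it back into code.

```csharp
using System;
using System.Linq.Expressions;

public static class CodeAsData
{
    // Walks the top of the tree: a lambda whose body is a binary operation.
    // Assumes the body really is a BinaryExpression, as in x => x * 2.
    public static string Describe(Expression<Func<int, int>> e)
    {
        var body = (BinaryExpression)e.Body;
        return $"{body.NodeType}({body.Left}, {body.Right})";
    }

    public static void Main()
    {
        Func<int, int> twiceD = x => x * 2;              // opaque code: can only be invoked
        Expression<Func<int, int>> twiceE = x => x * 2;  // data: can be inspected

        Console.WriteLine(twiceD(21));            // prints 42
        Console.WriteLine(Describe(twiceE));      // prints Multiply(x, 2)
        Console.WriteLine(twiceE.Compile()(21));  // back to code: prints 42
    }
}
```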

But what does all of this have to do with enumerable sequences? Spot on! Sequences seem to be a very data-intensive concept, and sure they are. However, the behavior and realization of such sequences, e.g. through iterators, can be very code-intensive as well, to such an extent that we introduced means to deal with exceptions (Catch, for instance) and termination (Repeat, restarting after completion). This reveals that it’s useful to deal with all possible states a sequence can go through. Guess what: state is data.

 

The holy trinity of IEnumerator<T> and IObserver<T> states

In all the marble diagrams I’ve shown before, there was a legend consisting of three potential states an enumerable sequence can go through as a result of iteration. Those three states reflect possible responses to a call to MoveNext caused by the consumer of the sequence:

image

In the world of IObserver<T>, the dual to IEnumerator<T> as we saw in earlier episodes, those three states are reflected in the interface definition directly, with three methods:

// Summary:
//     Supports push-style iteration over an observable sequence.
public interface IObserver<T>
{
    // Summary:
    //     Notifies the observer of the end of the sequence.
    void OnCompleted();
    //
    // Summary:
    //     Notifies the observer that an exception has occurred.
    void OnError(Exception exception);
    //
    // Summary:
    //     Notifies the observer of a new value in the sequence.
    void OnNext(T value);
}

Instead of having an observer getting called on any of those three methods, we could equally well record the states “raised” by the observable, which turns calls (code) into object instances (data) of type Notification<T>. This operation is called materialization. Thanks to dualization, the use of Notification<T> can be extended to the world of enumerables as well.

image

Notification<T> is a discriminated union with three notification kinds, reflecting the three states we talked about earlier:

public enum NotificationKind
{
    OnNext = 0,
    OnError = 1,
    OnCompleted = 2,
}
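C# has no built-in discriminated unions, so such a type is typically emulated with a closed class hierarchy. What follows is a hypothetical stand-in that I wrote for illustration, not the library’s actual source: the Sketch namespace, the private constructor trick and the ToString formats are all assumptions, shaped only to match how Notification<T> is used further down in this post.

```csharp
using System;
using System.Collections.Generic;

namespace Sketch
{
    public enum NotificationKind { OnNext = 0, OnError = 1, OnCompleted = 2 }

    // The private constructor closes the hierarchy: only the nested
    // classes below can derive from Notification<T>.
    public abstract class Notification<T>
    {
        private Notification() { }

        public abstract NotificationKind Kind { get; }

        public sealed class OnNext : Notification<T>
        {
            public OnNext(T value) { Value = value; }
            public T Value { get; }
            public override NotificationKind Kind => NotificationKind.OnNext;
            public override string ToString() => $"OnNext({Value})";
        }

        public sealed class OnError : Notification<T>
        {
            public OnError(Exception exception) { Exception = exception; }
            public Exception Exception { get; }
            public override NotificationKind Kind => NotificationKind.OnError;
            public override string ToString() => $"OnError({this.Exception.GetType().FullName})";
        }

        public sealed class OnCompleted : Notification<T>
        {
            public override NotificationKind Kind => NotificationKind.OnCompleted;
            public override string ToString() => "OnCompleted()";
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var ns = new List<Notification<int>>
            {
                new Notification<int>.OnNext(1),
                new Notification<int>.OnError(new InvalidOperationException()),
                new Notification<int>.OnCompleted(),
            };

            // Prints OnNext(1), OnError(System.InvalidOperationException), OnCompleted()
            foreach (var n in ns)
                Console.WriteLine(n);
        }
    }
}
```

Each case carries exactly the payload that the corresponding IObserver<T> method takes as an argument: a value, an exception, or nothing.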

 

It’s a material dual world

Materialization is the act of taking a plain enumerable and turning it into a data-centric view based on Notification<T>. Dematerialization reverses this operation, going back to the code-centric world. Thanks to this back-and-forth ability between the two worlds of code and data, we get the ability to use LINQ over notification sequences and put the result back into the regular familiar IEnumerable<T> world. A figure makes this clear:

image

The power of this lies in the ability to use whatever domain is more convenient to perform operations over a sequence. Maybe you want to do thorough analysis of error conditions, corresponding to the OnError notification kind, or maybe it’s more convenient to create a stream of notification objects before turning them into a “regular” sequence of objects that could exhibit certain additional behavior (like error conditions). This is exactly the same as the tricks played in various other fields, like mathematics, where one can do Fourier analysis either in the time or the frequency domain. Sometimes one is more convenient than the other; all that counts is to know there are reliable ways to go back and forth.

image

(Note: For the Queryable sample, you may want to end up in the bottom-right corner, so the AsQueryable call is often omitted.)

 

Materialize and Dematerialize

What remains to be said in this post are the signatures of the operators and a few samples. First, the signatures:

public static IEnumerable<Notification<TSource>> Materialize<TSource>(this IEnumerable<TSource> source);
public static IEnumerable<TSource> Dematerialize<TSource>(this IEnumerable<Notification<TSource>> source);
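To give a feel for what might sit behind those two signatures, here is a rough sketch built on plain iterators. It is not the EnumerableEx implementation: Note<T>, NoteKind, MaterializeSketch, DematerializeSketch and Fail are all hypothetical names, and the notification type is reduced to the bare minimum. The one wrinkle is that C# does not allow yield return inside a try block that has a catch clause, so MoveNext is called inside the try and the yield happens outside of it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum NoteKind { OnNext, OnError, OnCompleted }

// Hypothetical, simplified stand-in for Notification<T>.
public sealed class Note<T>
{
    public NoteKind Kind;
    public T Value;
    public Exception Error;

    public override string ToString() =>
        Kind == NoteKind.OnNext ? $"OnNext({Value})"
        : Kind == NoteKind.OnError ? $"OnError({Error.GetType().FullName})"
        : "OnCompleted()";
}

public static class NotificationSketch
{
    // Code to data: every outcome of MoveNext becomes an object.
    public static IEnumerable<Note<T>> MaterializeSketch<T>(this IEnumerable<T> source)
    {
        using (var e = source.GetEnumerator())
        {
            while (true)
            {
                Note<T> note;
                try
                {
                    note = e.MoveNext()
                        ? new Note<T> { Kind = NoteKind.OnNext, Value = e.Current }
                        : new Note<T> { Kind = NoteKind.OnCompleted };
                }
                catch (Exception ex)
                {
                    note = new Note<T> { Kind = NoteKind.OnError, Error = ex };
                }

                yield return note; // outside the try/catch, as the compiler requires
                if (note.Kind != NoteKind.OnNext)
                    yield break;   // terminal notification: nothing can follow
            }
        }
    }

    // Data to code: replay the notifications, stopping at the first terminal one.
    public static IEnumerable<T> DematerializeSketch<T>(this IEnumerable<Note<T>> source)
    {
        foreach (var note in source)
        {
            switch (note.Kind)
            {
                case NoteKind.OnNext: yield return note.Value; break;
                case NoteKind.OnError: throw note.Error;
                default: yield break;
            }
        }
    }

    // Helper for the demo: a sequence that throws on the first MoveNext call.
    public static IEnumerable<T> Fail<T>(Exception ex)
    {
        if (ex != null) throw ex;
        yield break;
    }

    public static void Main()
    {
        var xs = new[] { 1, 2 }.Concat(Fail<int>(new InvalidOperationException()));

        // Prints OnNext(1), OnNext(2), OnError(System.InvalidOperationException)
        foreach (var n in xs.MaterializeSketch())
            Console.WriteLine(n);

        // Round trip over an error-free sequence: prints 1, 2, 3
        foreach (var x in new[] { 1, 2, 3 }.MaterializeSketch().DematerializeSketch())
            Console.WriteLine(x);
    }
}
```

Note that throw note.Error resets the stack trace of the original exception; a production-quality version would likely want to preserve it.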

An example of materialization is shown below, where we take a simple range generator to materialize. We expect to see OnNext notifications for all the numeric values emitted, terminated by a single OnCompleted call:

Enumerable.Range(1, 10)
    .Materialize()
    .Run(Console.WriteLine);

This prints:

OnNext(1)
OnNext(2)
OnNext(3)
OnNext(4)
OnNext(5)
OnNext(6)
OnNext(7)
OnNext(8)
OnNext(9)
OnNext(10)
OnCompleted()

A sample where an exception is triggered by the enumerator is shown below. Notice the code won’t blow up when enumerating over the materialized sequence: the exception is materialized as a passive exception object instance in an error notification.

Enumerable.Range(1, 10).Concat(EnumerableEx.Throw<int>(new Exception()))
    .Materialize()
    .Run(Console.WriteLine);

The result is as follows:

OnNext(1)
OnNext(2)
OnNext(3)
OnNext(4)
OnNext(5)
OnNext(6)
OnNext(7)
OnNext(8)
OnNext(9)
OnNext(10)
OnError(System.Exception)

Starting from a plain IEnumerable<T> the grammar of notifications to be expected is as follows:

OnNext* ( OnCompleted | OnError )?

In the other direction, starting from the world of IEnumerable<Notification<T>>, one can write a richer set of sequences, defined by the following grammar:

( OnNext | OnCompleted | OnError )*

For example:

var ns = new Notification<int>[] {
    new Notification<int>.OnNext(1),
    new Notification<int>.OnNext(2),
    new Notification<int>.OnCompleted(),
    new Notification<int>.OnNext(3),
    new Notification<int>.OnNext(4),
    new Notification<int>.OnError(new Exception()),
    new Notification<int>.OnNext(5),
};

Dematerializing this sequence of notifications will produce an enumerable sequence that will run no further than the first OnCompleted or OnError:

ns
    .Dematerialize()
    .Run(Console.WriteLine);

This prints 1 and 2 and then terminates. The reason this can still be useful is to create a stream of notifications that will be pre-filtered before doing any dematerialization operation on it. For example, a series of batches could be represented in the following grammar:

( OnNext* OnCompleted )*

If the user requests to run n batches, the first n – 1 OnCompleted notifications can be filtered out using some LINQ query expression, before doing dematerialization.
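One way this batch filtering could look is sketched below. All names (Note, NoteKind, DropFirstCompletions, RunBatches, Sample) are hypothetical, and I’ve used a small iterator instead of a query expression so that the counter stays local to each enumeration. The sketch also assumes an error-free stream, which lets a TakeWhile plus Select stand in for real dematerialization.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum NoteKind { OnNext, OnCompleted }

// Hypothetical, int-only notification without an error case.
public sealed class Note
{
    public NoteKind Kind;
    public int Value;

    public static Note Next(int value) => new Note { Kind = NoteKind.OnNext, Value = value };
    public static Note Completed() => new Note { Kind = NoteKind.OnCompleted };
}

public static class BatchSketch
{
    // Removes the first n - 1 OnCompleted notifications, so the n-th one
    // becomes the first terminal notification in the stream.
    public static IEnumerable<Note> DropFirstCompletions(this IEnumerable<Note> notes, int n)
    {
        int completed = 0;
        foreach (var note in notes)
        {
            if (note.Kind == NoteKind.OnCompleted && ++completed < n)
                continue;
            yield return note;
        }
    }

    // Stand-in for Dematerialize on an error-free stream:
    // emit values until the first remaining OnCompleted.
    public static IEnumerable<int> RunBatches(IEnumerable<Note> notes, int n) =>
        notes.DropFirstCompletions(n)
             .TakeWhile(note => note.Kind == NoteKind.OnNext)
             .Select(note => note.Value);

    // Three batches: { 1, 2 }, { 3 } and { 4, 5 }.
    public static Note[] Sample() => new[]
    {
        Note.Next(1), Note.Next(2), Note.Completed(),
        Note.Next(3), Note.Completed(),
        Note.Next(4), Note.Next(5), Note.Completed(),
    };

    public static void Main()
    {
        // Running two batches prints 1, 2, 3
        foreach (var x in RunBatches(Sample(), 2))
            Console.WriteLine(x);
    }
}
```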

Finally, a sample of some error-filtering code going back and forth between IEnumerable<T> and IEnumerable<Notification<T>> showing practical use for those operators when doing sophisticated error handling:

var xs1 = new[] { 1, 2 }.Concat(EnumerableEx.Throw<int>(new InvalidOperationException()));
var xs2 = new[] { 3, 4 }.Concat(EnumerableEx.Throw<int>(new ArgumentException()));
var xs3 = new[] { 5, 6 }.Concat(EnumerableEx.Throw<int>(new OutOfMemoryException()));
var xs4 = new[] { 7, 8 }.Concat(EnumerableEx.Throw<int>(new ArgumentException()));

var xss = new[] { xs1, xs2, xs3, xs4 };
var xns = xss.Select(xs => xs.Materialize()).Concat();

var res = from xn in xns
          let isError = xn.Kind == NotificationKind.OnError
          let exception = isError ? ((Notification<int>.OnError)xn).Exception : null
          where !isError || exception is OutOfMemoryException
          select xn;

res.Dematerialize().Run(Console.WriteLine);

Given some input sequences, we materialize and concatenate all of them into the sequence xns. Next, we write a LINQ query over the notifications to filter out exceptions, unless the exception is a critical OOM one (you could add others to this list). As a result, 1 through 6 are printed to the screen; the OutOfMemoryException notification that follows survives the filter, so dematerialization rethrows it at that point and 7 and 8 are never reached. (Question: What’s the relationship to OnErrorResumeNext that we saw in the previous post? What’s similar, what’s different?)

 

Exercises

As an exercise, try to implement the following operators in a notification-oriented manner:

  1. Catch
    (tip: use SelectMany and lots of conditional expressions)
  2. Finally
    (tip: use SelectMany and Defer)
  3. OnErrorResumeNext – overload taking two IEnumerable<TSource> sequences
    (tip: use TakeWhile)
  4. Retry – overload with a retry count
    (tip: recursion, ignore stack overflow conditions)

The skeleton code for those operators is shown below:

return
    source
    .Materialize()
    // Your stuff here
    .Dematerialize();

All-inclusive unit test:

new[] { 1, 2 }
    .Finally(() => Console.WriteLine("Finally inner"))
    .Concat(EnumerableEx.Throw<int>(new InvalidOperationException()))
    .Catch((InvalidOperationException _) => new[] { 3, 4 }.Concat(EnumerableEx.Throw<int>(new Exception())))
    .Finally(() => Console.WriteLine("Finally outer"))
    .OnErrorResumeNext(new[] { 5, 6 })
    .Concat(EnumerableEx.Throw<int>(new ArgumentException()))
    .Retry(2)
    .Run(Console.WriteLine);

This should produce the same results with the built-in operators and with your implementation of those operators. More specifically, the result has to be:

1
2
Finally inner
3
4
Finally outer
5
6
1
2
Finally inner
3
4
Finally outer
5
6

with no exception leaking to the surface in the call site (behavior of Retry after the retry count has been exceeded).

 

Next on More LINQ

Various combinators to combine or transform existing observable sources into others.


Comments

# re: More LINQ with System.Interactive – Exploiting the code = data relationship

Tuesday, December 29, 2009 9:55 AM by Anthony

Bart,

Keep up the good work, since you single handedly have made me a better developer!!

Have you ever thought of writing books?

Anthony

# re: More LINQ with System.Interactive – Exploiting the code = data relationship

Tuesday, December 29, 2009 11:22 AM by Omer Mor

Hi - I love this series.

BTW - your rss/atom feeds are not showing the images correctly: they're referencing the pictures on bartdesmet.net instead on bartdesmet.info where the web page is referencing them. Try reading your blog with a feed reader, and see for yourself.

# re: More LINQ with System.Interactive – Exploiting the code = data relationship

Thursday, December 31, 2009 4:52 AM by Jason

Great series. Is your final .Concat() new too, or just something I missed?

Appears to have signature:

public static IEnumerable<T> Concat<T>(this IEnumerable<IEnumerable<T>> sequences);

# re: More LINQ with System.Interactive – Exploiting the code = data relationship

Thursday, December 31, 2009 11:23 AM by bart

Hi Jason,

This Concat overload is new too. See my subsequent post on the combinators for an explanation. In particular, Concat, Merge and Amb all have the same three overloads (IE<IE<T>>, IE<T>[] with params and one with IE<T> twice).

Thanks,

-Bart
