scala

April 15, 2013
in geek, scala
7 min read

List comprehension over non-enumerables in C

As I was trying to finish up my C# implementation of Try[A] for Scando, I ran into some LINQ issues. Or at least I thought I did. Using the LINQ syntax kept failing to call my extension methods and instead fell back to the IEnumerable<T> ones. I thought to myself how unfortunate it was that the LINQ syntax was hard wired to IEnumerable<T>, otherwise it would be perfect to emulate Scala's for comprehensions.

Fortunately I had just watched Runar Bjarnason's excellent "Lambda: The Ultimate Dependency Injection Framework" and recalled usage of a container type similar to how Option<T> and Try<T> are used. I also recalled that projecting such a type with a function that also returned a new instance of that container type via map (roughly equivalent to Select) created nested containers, yet I had a test for Option<T> that looked like this:

[Test]
public void Can_chain_somes() {
 var r = Api.DoSomething()
    .Select(Api.Passthrough)
    .Select(Api.Passthrough)
    .GetOrElse("bar");
 Assert.AreEqual("foo", r);
}

This should have returned an Option<Option<Option<string>>> but instead just returned Option<string>. Obviously my implementation of Select was wrong, which would explain why LINQ syntax didn't use it. And that probably meant that my assumptions about IEnumerable were wrong as well. Runar solved the nested container problem by implementing flatMap which also enabled for comprehensions in Scala. For me that meant that I was also missing SelectMany to make LINQ work properly.

Scala For Comprehension

Before I get into LINQ, I should cover why wrapping a value in a single or no value collection is a useful thing in the first place. Let's look at an example of functions returning Option[A]:

for {
  user <- UserRepository.findById(1) if user.isActive
  profile <- ProfileRepository.getProfileByUser(user)
  url <- profile.url
} yield url

This will get a single Option[User], check that the user is active, fetch the user's profile and then return the profile url. What makes this code special is that we don't have to check that the previous call returned a value. If UserRepository returned None, then the if guard never gets executed (since it gets executed for all values in the "Collection") and likewise ProfileRepository is never called since no user was selected.

With for comprehensions and the Option container we can chain calls and have it handle the short circuiting along the way so we will always receive an Option of the yielded type without checking return values along the way.

LINQ and IEnumerable

Ok, but that's subverting a list processing construct for its behavior on empty collections and Scala's excellent type inference deals with types changing throughout the comprehensions. Can LINQ pull that off?

Turns out, yep, it can. The LINQ syntactic sugar doesn't have any ties to operating on IEnumerable or any type at all. To mimic the Scala for comprehension above, we just need to implement 3 methods.

Select: Transforming C to C

For illustration purposes, I've create a container Type C<T> with an accessor Value. It does not implement IEnumerable<T> as Option<T> originally did (which I've since removed since the only reason had been that I thought I needed it for LINQ).

Let's start with the simplest comprehension style LINQ query:

from x in new C<int>(1) select x

At this point the compiler complains with Could not find an implementation of the query pattern for source type 'Sandbox.C'. 'Select' not found. Simple enough, let's add Select to C<T>:

public C<TResult> Select<TResult>(Func<T, TResult> selector) {
  return new C<TResult>(selector(_value));
}

Note: LINQ does not care whether its methods are provided as methods on the object or extensions. For this example I'm just attaching it straight on C<T> instead of using an Extension Method, so I don't need the this C<T> source argument.

Now that query works and we are getting back a C<T>, not an IEnumerable<T>. That means that we can call .Value directly on the result of the query.

SelectMany: Flatting C>

Next up is passing the result from the first as input to the next call:

from x in new C<int>(1)
from y in new C<int>(x)
select y

Again, it won't compile. Now it complains about SelectMany. Since there are multiple overloads of SelectMany, let's start with the simplest, i.e. a selector to transform T into C<TResult>:

public C<TResult> SelectMany<TResult>(Func<T, C<TResult>> selector ) {
   return selector(Value);
}

Except that just leads to a new compiler error: No overload for method 'SelectMany' takes 2 arguments. To get a closer look at what the compiler wants to do, we can just rewrite the query using int[] instead of C<T>:

from x in new[] {1}
from y in new[] {x}
select x;

Then, using IlSpy and turning Query decompilation off, we can see what .NET actually translates that LINQ to:

new int[] {1}.SelectMany(x => new int[] {x}, (x, y) => y);

That's interesting. Instead of

SelectMany<TSource, TResult>(
  this IEnumerable<TSource> source,
  Func<TSource,IEnumerable<TResult>> selector
)

It's using

SelectMany<TSource, TCollection, TResult>(
  this IEnumerable<TSource> source,
  Func<TSource,IEnumerable<TCollection>> collectionSelector,
  Func<TSource,TCollection,TResult> resultSelector
)

but passes in a resultCollector that merely ignores the first argument and always returns the second, making it functionally equivalent of the first signature. Here's what that looks like as a Method on C<T>:

public C<TResult> SelectMany<TCollection, TResult>(
    Func<T, C<TCollection>> collectionSelector, 
    Func<T, TCollection, TResult> resultCollector
) {
  var c = collectionSelector(Value);
  return new C<TResult>(resultCollector(Value, c.Value));
}

And now the LINQ query with multiple from clauses works.

Where: What does a short circuit of C look like?

Finally, we want to add the guard clause, which in LINQ is simply Where:

from x in new C<T>(1) where x == 1
select x

with Where having the following signature

C<T> Where(Func<T, bool> predicate) { ... }

Implementing the guard poses a problem, though. Where will always return a C<T>, but what does a C<T> that failed the predicate check look like? Which also raises the question, what value does the from retrieve when encountering such a C<T>? If the point was that comprehension lets us short circuit the execution chain, then we need a way to indicate that C<T> doesn't have a value.

For this implementation, I've simply added an IsEmpty flag and use the no argument constructor to construct an empty version. While that takes care of implementing Where, it does point to an oversight in Select and SelectMany. Both need to be aware of IsEmpty and return an empty C<T> rather than executing the selectors in order to provide short-circuiting behavior. The full implementation of C<T> now looks like this:

public class C<T> {

  public C() { IsEmpty = true; }
  public C(T value) { Value = value; }

  public T Value { get; private set; }
  public bool IsEmpty { get; private set; }

  public C<T> Where(Func<T, bool> predicate) {
    if(IsEmpty) {
      return this;
    }
    return predicate(Value) ? this : new C<T>();
  }

  public C<TResult> Select<TResult>(Func<T, TResult> selector) {
    return IsEmpty ? new C<TResult>() : new C<TResult>(selector(Value));
  }

  public C<TResult> SelectMany<TCollection, TResult>(
    Func<T, C<TCollection>> collectionSelector,
    Func<T, TCollection, TResult> resultCollector
  ) {
    if(IsEmpty) {
      return new C<TResult>();
    }
    var c = collectionSelector(Value);
    return c.IsEmpty ? new C<TResult>() : new C<TResult>(resultCollector(Value, c.Value));
  }
}

That still does not answer how from x in a works since that clearly pulls a value from the container. You would expect an enumerator to be required. The key to understanding how this construct works is that the transformation of the call becomes a Select or SelectMany, which has no concept of enumeration. Both have a selector that takes T and converts it into the containerized T, C<T>. Since traditionally the container is IEnumerable, the natural association for x in a is an enumeration, but it really is just an instance of the container with no assumption about what it contains.

Applying LINQ comprehensions

Now that we know how to create a container type, we can create return values for our methods that follow those principles and rewrite the Scala example in C#:

from user in UserRepository.FindById(1) where user.isActive
from profile in ProfileRepository.GetProfileByUser(user)
from url in profile.url
select url

Essentially, C<T> is just a naive re-implementation of Option<T>. The reason I created it instead of just walking through the existing type was to illustrate this container type pattern. It's the pattern Option<T> uses, as does Try<T> and it is used it to implement the Reader Monad for functional dependency injection. The essence of this pattern is that by wrapping T in a container we can attach an environment to it to carry along additional data. In the case of Option, it was the ability to determine the presence or lack of a value for T. In case of Try it is presence of an error occurring while trying to produce T. and for functional Dependency Injection it was the ability to carry along dependencies that a call at the end of a long chain needed without the intermediate calls having to be aware of them. By using the container, functions that care about the environment can take it as input or return it, while functions that don't can be lifted to fit the required shape and take part in the composition without ever needing to be aware of the environment being passed along.

Having used for comprehensions in Scala and seeing how it can remove a lot of imperative boiler plate that does nothing to aid in the comprehension of the workflow, it is nice to see that it requires just three methods to allow LINQ to be used for a similarly expressive syntax for composing method chains.

March 30, 2013
in geek, scala
3 min read

Scala in the key of C#: Option

Update: Since I first released Option, I've realized that it implementing IEnumerable was unneccesary to use LINQ syntax and actually made it less useful. I've updated the post to reflect these changes to Option.

Currently reading Daniel Westheide's excellent "Neophyte's guide Scala" series and it's inspired me to port some concepts from Scala to C#. This is less about trying to replace Scala -- it has too many compelling aspects that cannot be mimicked -- and more about exploring concepts I come across in Scala in the realm i'm best versed in, and, hopefully, along the way, create some useful C# utilities. First up, the Option<T> type.

Get thee behind me, null

Reading The Option Type the simplicity and elegance struck me. My Option<T> is inspired by Scala's Option[A], which in turn has its origin in Haskell's Option. On the surface it's a pretty simple container for a value that may or may not exist.

var some = Option<string>.Some("foo");

// manually checking for defined
Console.WriteLine(some.IsDefined ? some.Value : "not defined");
// => "foo"

// or use the built in helper
Console.WriteLeing(some.GetOrElse("not defined"));
// => "foo"

// None is a singleton
var none = Option<string>.None;

Console.WriteLine(none.GetOrElse("not defined"));
// => "not defined"

Nobody likes NullReferenceException, so Option makes reference types behave more like nullable value types. But with the null coalescing operator ??, trying to check for null and substituting an alternative value is already very simple in .NET, so why bother with an Option port? After all, the most compelling usage of Option in scala, pattern matching, just cannot be reproduced in .NET, so there goes a large part of the motivation. But once I learned that Option[A] behaves like an Iterable and therefore could use all the common collection methods, I was intrigued.

The Power of LINQ compels you

You see, an Option is can be considered a collection of zero or one values. By implementing the Select, SelectMany and Where Extension Methods for Option we can use LINQ to chain calls together. This makes Option much more composable than manual null checks.

var title = (from url in ParseUrl(urlStr)
             from response in GetWebRequest(url)
             from doc in GetXmlBody(response)
             from title in GetTitleTag(doc)
             select title);
if(title.IsDefined) {
  Console.WriteLine(title.Value);
}

Each method returns an Option, for which the compiler insert the appropriate extension method for the specific LINQ syntax. An undefined Option behaves just like and empty IEnumerable, i.e. the selector callback is skipped and the query is short circuited. Using the from x in a syntax uses SelectMany (if there is more than one from clause) and could just as easily have been written by manually chaining the calls:

var title = ParseUrl(urlStr)
            .SelectMany(GetWebRequest)
            .SelectMany(GetXmlBody)
            .SelectMany(GetTitleTag);
Console.WriteLine(title.GetOrElse("no title found"));

What Else can Option do for Me?

Option implements equality comparison based on the contained type, so two Options of the same type containing the same value will be equal, according to that type's rules. Along with that Option<T>.None equals null and all None regardless of type have the same hashcode, i.e. 0.

Finally, in addition to GetOrElse, there is also OrElse, which provides a way to substitute a different Option, allowing the chaining of many calls to get the first one that returns a value.

Scando

Option is available on github under the MIT license in the Scando repository. This repository is where I plan to put all code inspired by Scala. There's already an implementation of the Try type in there as well, which I will be writing about shortly.