As usual, I've been blogging over on the MindTouch Developer blog, and since the topics I post about over there have a pretty strong overlap with what I'd post here, I figured I might as well start cross-posting about it here.
Aside from various technical posts, Steve Bjork and I have started recording a podcast about concurrent programming. It's currently 2 episodes strong, with a third one coming soon. Information on past and future episodes can always be found here.
Today's post on the MindTouch dev blog is about the producer/consumer pattern and how I moved from using dedicated workers with a blocking queue to using Dream's new ElasticThreadPool to dispatch work.
This week, I was digging back into the Coco/R-built parser for DekiScript, tracking down a bug, which turned out to not be in the parsing bits at all. It did, however, get me familiarized with Coco/R again. So I thought I'd give myself an hour to implement the parser for my tag-related boolean algebra with Coco/R. If I could pull it off, I'd forget about the regex/state-machine approach I was considering.
It took me about 15 minutes to set up the classes to represent the intermediate AST and another 30 minutes for the grammar in Coco/R's ATG format. After that I wrote a couple of unit tests to check that the parsing was right, only to realize that while AND and OR are left-to-right associative, NOT is right-to-left associative. Figuring out how to adjust the grammar for that took me another 10-15 minutes, but overall I hit the one-hour goal.
Before tackling the grammar to parse, I needed to define data structures to represent the parsed syntax tree, which I'd later convert to executable code. The syntax is fairly simple:
(foo+bar)|(^foo+baz)
This can be represented by just 4 tree node types (with the common parent TagExpression):
AndExpression
OrExpression
NotExpression
Tag
The parentheses are just tree structure artifacts and are not represented in the AST.
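The node classes themselves aren't shown in the post; based on the constructors used in the grammar actions below (AndExpression(left, right), NotExpression(inner), Tag(t.val), TagExpression.Empty), a minimal sketch might look like this. Only the type names and constructor shapes come from the post; the member names are assumptions:

```csharp
// Minimal sketch of the AST node types; only the type names and
// constructor shapes come from the grammar actions, the rest is assumed.
public abstract class TagExpression
{
    public static readonly TagExpression Empty = new EmptyExpression();
    private class EmptyExpression : TagExpression { }
}

public class AndExpression : TagExpression
{
    public readonly TagExpression Left, Right;
    public AndExpression(TagExpression left, TagExpression right) { Left = left; Right = right; }
}

public class OrExpression : TagExpression
{
    public readonly TagExpression Left, Right;
    public OrExpression(TagExpression left, TagExpression right) { Left = left; Right = right; }
}

public class NotExpression : TagExpression
{
    public readonly TagExpression Inner;
    public NotExpression(TagExpression inner) { Inner = inner; }
}

public class Tag : TagExpression
{
    public readonly string Name;
    public Tag(string name) { Name = name; }
}
```

With these types, (foo+bar)|(^foo+baz) would parse into an OrExpression whose children are two AndExpressions, the second wrapping its foo Tag in a NotExpression.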
I've broken the grammar up to discuss the various parts, but the below sections represent a single ATG file, the Coco/R grammar format.
COMPILER TagAlgebra
public TagExpression Result = TagExpression.Empty;
The COMPILER keyword defines the entry point PRODUCTION for the parser. The following lines, up to the next grammar section, are inserted into the generated Parser and can be used to inject class fields, extra methods, etc. into the Parser source. The only thing I inserted was a field to hold the root of the AST, initialized to empty.
IGNORECASE
This tells Coco/R that our grammar is case insensitive.
CHARACTERS defines characters that the parser should recognize for matches.
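The CHARACTERS block itself didn't survive the formatting; given the token definition below and the IGNORE line, it presumably looked something along these lines (the exact ranges are an assumption on my part):

```
CHARACTERS
  letter = 'a'..'z' + 'A'..'Z' .
  digit  = '0'..'9' .
  eol    = '\n' .
  cr     = '\r' .
  tab    = '\t' .
  nbsp   = '\u00a0' .
  shy    = '\u00ad' .
```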
TOKENS
tag =
letter { letter | digit | "_" | ":" }
.
The only token in the grammar is tag, which is composed from the characters defined above plus some extra quoted characters.
IGNORE eol + cr + tab + nbsp + shy
IGNORE tells Coco/R what characters have no meaning in parsing the input.
Next come the PRODUCTIONS, i.e. the meat of the grammar. These are the rules for matching input and converting it into code. Coco/R is an LL(1) parser generator, i.e. the grammar must be parsable from Left to right with Leftmost derivations and 1 look-ahead symbol. We also cannot have a loop in the grammar, i.e. all possible branches have to lead to a terminal via a unique set of production matches.
The first production is the entry point, which, again, sets the result to an empty AST, since the same parser instance can parse multiple expressions. It then specifies 0 or 1 BinaryExpr productions.
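The entry production itself is missing from the extracted text; given the COMPILER declaration and the Result field above, it would look roughly like this sketch:

```
PRODUCTIONS

TagAlgebra                      (. Result = TagExpression.Empty; .)
  = [ BinaryExpr<out Result> ]
  .
```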
BinaryExpr<out TagExpression expr> (. expr = null; TagExpression right = null; .)
=
NotExpr<out expr> { (. bool and = false; .)
(
"|"
| "+" (. and = true; .)
)
NotExpr<out right> (. expr = and ? (TagExpression)new AndExpression(expr,right) : new OrExpression(expr,right); .)
}
.
BinaryExpr is used for both AND and OR expressions, since both take two sub-expressions and combine them with a single operator. The production specifies a left side of a NotExpr, followed by an optional operator and another NotExpr. I.e. should our first match happen not to be a BinaryExpr, the parser can fall through to NotExpr and return its result instead, without matching the optional production body.
NotExpr<out TagExpression expr> (. expr = null; int notCount = 0; .)
=
{
"^" (. notCount++; .)
}
(
"(" BinaryExpr<out expr> ")"
| tag (. expr = new Tag(t.val); .)
) (. for(var i=0;i<notCount;i++) expr = new NotExpression(expr); .)
.
END TagAlgebra.
NotExpr, just like BinaryExpr, matches its operator optionally, requiring only that at the end it matches either a BinaryExpr enclosed in parentheses (i.e. not a circular match back into BinaryExpr, since that path requires the additional surrounding matches), or a tag, the ultimate terminal in the grammar.
There is one tricky bit in this production: the NOT operator can match multiple times, which means we need to accumulate the number of operator matches and then wrap the current expression in a chain of NOT expressions once we know how many, if any, matched.
The nice thing with Coco/R is that it adds no runtime dependency at all, building a fully self-contained Scanner and Parser. With these built, it is now possible to take Tag Algebra expressions and turn them into an executable tree of Func calls, as described in "Boolean Algebra for Tag queries".
The grammar could have been written to accumulate the unique tags and construct the Func tree right away, but the two benefits of going to an AST first are that a) the AST can easily be rendered back into text form (even placing parentheses properly for expressions that previously had no parentheses), and b) the AST can easily be programmatically composed with other expressions, or decomposed into sub-expressions, which can be used for caching and other efficiency optimizations.
I'll probably play with Irony next, but the "no runtime dependency" and existing familiarity made Coco/R the winner this time.
I've been using NHibernate a lot recently and just love having POCO data objects. This means that I can just hand out objects, manipulate them, and then save them back to the DB without ever exposing my persistence model. For this purpose, I'd been using Data Access Objects to give me access to the objects. But like stored procedures, after a while your DAOs sport a mess of Find methods that either have many overloads or, even worse, piles of nullable parameters. Never mind the acrobatics you jump through once you want to provide methods that query entities based on some other entity that's abstracted by its own DAO.
The alternative was to expose the NHibernate query API (talking ICriteria here; I never touched HQL, because the reason I went with NHibernate in the first place was that I didn't want queries in strings), but that didn't taste right, so I added more methods to my DAO instead, hiding the criteria queries.
Once I started playing with the experimental Linq support, I didn't change my abstraction, just started using Linq instead of criteria inside the DAO. But last week, I finally broke down and just exposed session.Linq<T>() as an IQueryable<T> on my DAO. For example, my Account DAO, AccountService, now simply looks like this:
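The DAO code itself got lost in the formatting; based on the description, it would be something along these lines. The member names are assumptions, and session.Linq<T>() is the entry point from the old NHibernate.Linq contrib project mentioned above:

```csharp
// Hypothetical sketch of the slimmed-down DAO described above.
public class AccountService
{
    private readonly ISession session;

    public AccountService(ISession session) { this.session = session; }

    // The by-Id accessor stays, since it's the predominant use case
    public Account GetAccount(int id) { return session.Get<Account>(id); }

    // Everything else: hand out an IQueryable and let callers compose Linq
    public IQueryable<Account> Accounts
    {
        get { return session.Linq<Account>(); }
    }
}
```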
NHibernate is completely invisible, I have full query capabilities, and even mocking has become easier, just returning a List<T>.AsQueryable() from my mock DAO. I was able to remove all my Find methods. I did leave the by-Id accessor GetAccount in there, just because it's the predominant use case and I didn't want to litter the code with identical Linq queries.
For technology geeks, one of the greatest enemies of productivity is teh Shiny, i.e. some algorithm, framework, or language that is a possible solution to the problem at hand, but mostly attracts because of its dazzling awesomeness. When something seems like the cool way to do it, chances are you are about to step into a big pile of YAGNI.
Often, the simplest possible solution or, even more importantly, using what you know, rather than what appears to be the "best", is the way to go.
A good example is the boolean algebra parser for my tag queries. Once I realized I had a mini-DSL on my hands, my google fingers were refreshing my knowledge of Coco/R and evaluating Irony in order to build a parser to produce my AST. It took stepping back to realize that to get this to the prototype stage, some regex and a simple state machine could handle the parsing much more quickly (mostly because they avoided the tooling learning curve). It may not be the final solution, but it works well enough to serve as a placeholder until the rest of the tech proves out.
Don't get me wrong, this does not mean hack up the simplest bit of code to get the job done. A straight path to the solution is likely to result in an unmaintainable mess. Certain standards of code quality need to be met, so that you can still test the result and refactor it without starting from scratch. But if you meet those criteria, getting it done fast first buys you the time to evaluate whether a) the problem you're solving is the right problem, b) the solution provided is sufficient for the intended use, and c) the shiny tech really would improve the implementation.
Determining whether a path is just shiny or appropriate isn't always easy. One bit of tech that's been consuming a bit more time than I'd like is basing my persistence story on NHibernate instead of rolling my own. The temptation to use what I know is strong, but fortunately (or unfortunately?), I have enough experience from rolling my own ORM and hand-coding SQL to know that down that road lies madness. So, be sure not to mistake "learning curve" for YAGNI.
I'm currently working on a tag query engine and couldn't find anything all that useful in the published approaches. I want to run arbitrary boolean algebra against a set of tags in a database, which seems to be out of scope for SQL approaches. All the various tagging schemas out there reduce to either just AND or just OR queries, not complex logic. However, I want to be able to do something like:
(foo+bar)|(foo+baz)|(bar+^baz)
If there is a way to do this in SQL, I'd love to know. But the way I look at it, I really have to fetch all tags for each item and then apply that formula to the item's list of tags.
But let's break the matching problem down into something I can execute. Let's assume I've got a simple parser that can turn the above into an AST. Really, I can decompose any variation into three operations: AND(a,b), OR(a,b), and NOT(a). And I can represent those with some simple Func<> definitions:
Func<bool, bool, bool> AND = (a, b) => a && b;
Func<bool, bool, bool> OR = (a, b) => a || b;
Func<bool, bool> NOT = (a) => !a;
Assuming that I have boolean tokens for foo, bar, and baz, the expression becomes:
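The tokenized form didn't survive the formatting; spelled out with the three tags as bool locals, it comes out as follows (a self-contained sketch, re-declaring the Func's from above):

```csharp
Func<bool, bool, bool> AND = (a, b) => a && b;
Func<bool, bool, bool> OR = (a, b) => a || b;
Func<bool, bool> NOT = a => !a;

// (foo+bar)|(foo+baz)|(bar+^baz) with foo and bar present, baz absent
bool foo = true, bar = true, baz = false;
bool match = OR(AND(foo, bar), OR(AND(foo, baz), AND(bar, NOT(baz))));
// match is true: the (foo+bar) branch succeeds
```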
Now, the above expression can be expressed as a function that takes three booleans describing the presence of the mentioned tags (ignoring any other tags the item has) and returns a boolean indicating a successful match. In C#, that expression looks like this:
Func<bool[], bool> f = x => OR(AND(x[0], x[1]), OR(AND(x[0], x[2]), AND(x[1],NOT(x[2]))));
Next we need to generate this boolean map from the list of tags on the item. Assuming the tag list is a list of strings, we can define an extension method on IEnumerable<string> to generate the boolean map like this:
public static bool[] GetBinaryMap(this IEnumerable<string> tags, string[] mask) {
var map = new bool[mask.Length];
foreach(var x in tags) {
for(var i = 0; i < mask.Length; i++) {
if(x == mask[i]) {
map[i] = true;
}
}
}
return map;
}
And with this we can define a Linq query that returns all matching items:
var mask = new[] { "foo", "bar", "baz"};
Func<bool[], bool> f = x => OR(AND(x[0], x[1]), OR(AND(x[0], x[2]), AND(x[1],NOT(x[2]))));
var match = from item in items
where f(item.Tags.GetBinaryMap(mask))
select item;
Clearly this isn't the fastest-executing query, since we first have to hydrate all the items, each with its collection of tags. But there are a lot of optimizations left on the table here, such as using the tag mask to pre-qualify items, breaking down the AST into sub-matches that could be run against a cache to find items, etc.
But at least we have a fairly simple way to take complex boolean algebra on tags and convert it into something that we can evaluate generically.
This is definitely an edge-case testing scenario, so I don't know how useful this utility class is in general, but I thought it was kinda fun deferred-execution stuff, so why not post it?
Here's my scenario: I've built some Dream REST services that I've set up to create an inner Autofac container per request, producing per-request instances of objects -- things like the NHibernate ISession and other disposable resources that only make sense in a per-request context.
Now I'm writing my unit tests around these services and need to provide mocks for these inner-container-created objects. I should also mention that to test the services, I am firing up a DreamHost, since the services can only be tested in the context of the full pipeline. Yeah, I know, smells a bit functional, but that's what I have to work with right now. And I need these objects to be ContainerScoped (i.e. per-inner-container singletons), so that multiple Resolve's return the same instance, but different instances across requests. Ok, ok, I know the tests are doing too much... Like I said, this is an edge case. It's not strictly a unit test, but I still want coverage on this code. Getting around this would require refactoring code that's not part of my project, so there you go.
What I want to do is set up the mock for the inner container instance on creation, which doesn't happen until I've handed execution control over to the Act part of the test. This led me to create a factory that provides a hook for setting up the mock as it's created:
public class DelegatedMoqFactory<T> where T : class
{
private Action<Mock<T>, IContext> setupCallback;
private Mock<T> mock;
public Mock<T> CurrentMock { get { return mock; } }
public T CreateInstance(IContext container)
{
mock = new Mock<T>();
if (setupCallback != null)
{
setupCallback(mock, container);
}
return mock.Object;
}
public void OnResolve(Action<Mock<T>, IContext> setupCallback)
{
this.setupCallback = setupCallback;
}
}
The open generic of DelegatedMoqFactory is registered with default scope, since I want it to exist outside the inner scope, so that I can resolve it to wire up my expectations for the mock. Then, on the first resolve of IAuthService inside the inner scope, the DelegatedMoqFactory creates the mock and calls my OnResolve callback to set it up.
The reason there is also a CurrentMock accessor is so that I can verify the mock after the inner container has gone out of scope, like this:
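The verification snippet got lost in formatting; with a hypothetical IAuthService (the service name and its Authenticate method are my assumptions), it would look something like this using Moq's Verify:

```csharp
// Resolved from the outer container, so the factory survives the inner scope
var authFactory = container.Resolve<DelegatedMoqFactory<IAuthService>>();
authFactory.OnResolve((mock, c) =>
    mock.Setup(x => x.Authenticate("bob", "pw")).Returns(true));

// ... Act: fire the request at the DreamHost, which spins up the
// inner container and pulls IAuthService from the factory ...

// After the inner container is gone, the factory still holds the last mock
authFactory.CurrentMock.Verify(x => x.Authenticate("bob", "pw"), Times.Once());
```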
This class should be useful whenever you are testing code that internally creates an inner container, and scoping the objects (normally created under ContainerScope) at default scope doesn't work (likely because there are multiple inner containers). We still get per-inner-container instances, but get to wire them up with deferred setups that don't come into play until the mocks are actually pulled from the inner container.
Ok, so I'm not proud of it, but I've been a hold-out on mocking frameworks for a while. With the interface auto-generation that ReSharper gives me, I'd just gotten pretty fast at rolling my own mocks. Once or twice a year, I'd dip my toe into a mocking framework, find its syntax frustrating, and rather than getting properly used to it, I'd soon find myself replacing NotImplementedException in a stub for yet another custom mock object.
My reasoning was that if I can roll my own mocks just as fast as wiring up a mock, then why bother with the dependency? And I thought I wasn't really causing myself too much extra work.
In the meantime, I even wrote a simple Arrange/Act/Assert mocking harness for Dream, so I could better test REST services. So it's not like I didn't believe in the benefits of mocking harnesses.
Well, for the last couple of weeks I've been using Moq, and it's pretty much killed off all my desire to roll my own. I'm generally a huge fan of lambdas and have gotten used to thinking in expressions. Even so, I wasn't able to get comfortable with the latest Rhino.Mocks. Probably just me. But from the very first attempt, Moq worked like I think, and I was up and running.
var mock = new Mock<IFoo>();  // assuming IFoo: an interface with a string Bar property
mock.Setup(x=> x.Bar).Returns("bar").AtMostOnce();
var foo = mock.Object;
Assert.AreEqual("bar",foo.Bar);
When I pick up a new framework or library, there's usually a learning curve where I get familiar with its API, find what works, what doesn't, etc. One habit I've gotten into is to create a TestFixture for everything I think I should be able to do and build a test for each assumption. The purpose of these tests is both to make sure the code does what I expect it to and to serve as a record of what I've already learned. If I later wonder how some call behaves, I first check my test signatures to see if I've already tested that behavior. If there is an appropriate test, I immediately know what the behavior will be, plus I now have working sample code showing how to do it.
For example, I was playing around with setting up Moq's through Autofac and wanted to come up with a registration that would give me a container scoped Moq object that i could set up before executing a particular test. The resulting test looked like this:
public interface IMockWithAccessor
{
IMockAccessorValue Accessor { get; }
}
public interface IMockAccessorValue
{
string Foo { get; }
}
[Test]
public void Create_nested_mock_so_it_can_be_altered_in_container_scope()
{
var builder = new ContainerBuilder();
builder.Register(c => new Mock<IMockAccessorValue>())
.As<Mock<IMockAccessorValue>>().ContainerScoped();
builder.Register(c => c.Resolve<Mock<IMockAccessorValue>>().Object)
.As<IMockAccessorValue>().ContainerScoped();
builder.Register(c =>
{
var mockBuilder = new Mock<IMockWithAccessor>();
mockBuilder.Setup(x => x.Accessor)
.Returns(c.Resolve<IMockAccessorValue>());
return mockBuilder.Object;
}).As<IMockWithAccessor>().ContainerScoped();
using (var container = builder.Build().CreateInnerContainer())
{
var mockAccessorBuilder = container
.Resolve<Mock<IMockAccessorValue>>();
mockAccessorBuilder.Setup(x => x.Foo).Returns("bar");
Assert.AreEqual("bar", container
.Resolve<IMockWithAccessor>().Accessor.Foo);
}
}
Sometimes, of course, my expectations are not met and the code does not let me do what I set out to do. These tests are even more valuable for future reference, as long as I make sure to rename the test to reflect the failed expectation and alter the asserts to reflect the actual behavior.
I was trying to figure out parametrized component registrations in Autofac. The example showed it being used with FactoryScope. I wondered whether, in default (Singleton) scope, Autofac would use the parameters to create a singleton per parameter set. My original test was named Parametrized_resolve_creates_different_singleton_per_parameter_Value. Well, it turned out that, no, Autofac does not vary singletons, and parametrized registrations only make sense in FactoryScope. The final test looks like this:
public class ParametrizedSingleton { }
[Test]
public void Parametrized_resolve_without_factory_scope_is_always_a_singleton()
{
var builder = new ContainerBuilder();
builder.Register((c, p) => new ParametrizedSingleton());
using (var container = builder.Build())
{
var foo1 = container.Resolve<ParametrizedSingleton>(
new NamedParameter("type", "foo"));
var foo2 = container.Resolve<ParametrizedSingleton>(
new NamedParameter("type", "foo"));
var bar1 = container.Resolve<ParametrizedSingleton>(
new NamedParameter("type", "bar"));
Assert.AreSame(foo1, foo2);
Assert.AreSame(foo1, bar1);
}
}
I usually keep these test fixtures in a separate test project in my solution as a permanent reference as I continue to develop the main code. It's proven useful a number of times when coming back to some old code and having to reacquaint myself with a third-party bit of code.
In my last post I proposed using delegates instead of interfaces to declare dependencies for injection. While delegates are limited to a single function call, this is often sufficient for service dependencies. In addition, this is not a wholesale replacement for traditional IoC: at the end of the day, you have to have some class instances that provide the methods bound by the delegates, and you want to resolve instances that require those delegates, so our container will still need to resolve class instances.
The main benefit of using delegates instead of interfaces is that delegates do not impose any requirements on the providing class, and thereby dependencies can be defined by what the client needs rather than what a service provides.
To illustrate this mapping, let's bring back the classes defined in the last post:
public class MessageQueue
{
public void Enqueue(string recipient, string message) { ... }
public string TryDequeue(string recipient) { ... }
}
public class Producer : IProducer
{
public delegate void EnqueueDelegate(string recipient, string message);
public Producer(EnqueueDelegate dispatcher) { ... }
}
public class Consumer : IConsumer
{
public delegate string TryDequeueDelegate(string recipient);
public Consumer(TryDequeueDelegate inbox) { ... }
}
What we need is a way to map a delegate to a method:
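The illustration that belongs here is missing; conceptually it was something like the following (deliberately not legal C#, as the next paragraph points out):

```
// not legal C# -- map the delegate type to an instance method
Producer.EnqueueDelegate  =>  MessageQueue.Enqueue
```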
Aside from the fact that the above is not legal C# syntax, it carries the implicit assumption that we can resolve a canonical instance of MessageQueue, since MessageQueue.Enqueue really refers to a method on an instance of MessageQueue. I.e. our container must function like a regular IoC container, so that we can resolve that instance.
The above solves the scenario of mapping one delegate to one implementation. In addition, we'd probably want the flexibility to map a particular implementation to a particular client class, such that:
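This illustration is also missing; the per-client variation would conceptually be (still not legal C#):

```
// not legal C# -- map the delegate only in the context of one client class
Producer { Producer.EnqueueDelegate   => MessageQueue.Enqueue }
Consumer { Consumer.TryDequeueDelegate => MessageQueue.TryDequeue }
```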
The usage scenarios described are simple enough to understand. If our container were initialized using strings or some external configuration file (Xml, custom DSL, etc.), the actual reflection required for injecting the mappings isn't too complex either. However, I abhor defining typed mappings without type-safety. This isn't so much about spelling mistakes and the like that the compiler can't catch. It's mostly about being able to navigate the mappings and the dependencies they describe, and the ability to refactor while keeping the mappings in sync.
It would be great if we could just say:
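The wished-for registration is elided here; judging from the discussion that follows, it would have been something like this (not compilable, since a method can't be a generic type argument):

```
// wishful thinking -- there is no generic syntax for naming a method
builder.Define<Producer.EnqueueDelegate>()
       .As<MessageQueue.Enqueue>();
```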
Note: I'm purposely using a syntax that mimics Autofac, since the implementation later on will be done as a hack on top of Autofac
That looks fine and could even work in the face of polymorphism, since knowing the signature of Producer.EnqueueDelegate, we could reflect the proper overload of MessageQueue.Enqueue. However, C# has no syntax for getting at the MethodInfo of a method via generics (a method is not a type). There isn't even an equivalent to typeof(T) for members, the reason for which was well explained by Eric Lippert in "In Foof We Trust: A Dialogue". The only way to get a MethodInfo relies on string-based reflection.
Fortunately, C# 3.0 introduced a syntax that lets us capture method calls as expression trees that we can decompose programmatically. This lets us express our method call like this:
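The expression snippet is missing from the extracted text; with a messageQueue instance in scope, it would be along these lines (legal C#, since a lambda can convert to an expression tree of a custom delegate type):

```csharp
MessageQueue messageQueue = new MessageQueue();

// Captures the call as data we can pick apart, rather than invoking it
Expression<Producer.EnqueueDelegate> expr =
    (a, b) => messageQueue.Enqueue(a, b);
```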
This expression conveniently infers the types of a and b. As a side note, Producer.EnqueueDelegate does not mean that EnqueueDelegate is a member of Producer. It's just a syntax artifact of nested declarations in C#, which in this case conveniently makes the delegate look attached to the class.
Unfortunately, we can't just include MessageQueue in the parameter list of the lambda. If we were to include it in the argument list, it could not be inferred, and if we defined MessageQueue explicitly as a lambda parameter, we'd be forced to declare all the arguments. We want to express the above while only explicitly defining MessageQueue. To accomplish this, we need to create a composite expression whose outer lambda supplies the MessageQueue:
Expression<Func<MessageQueue, Expression<Producer.EnqueueDelegate>>> expr
= x => (a,b)=> messageQueue.Enqueue(a,b);
Now we have enough syntactic sugar to describe our two registration scenarios in terms of the container builder. First, the global registration of the delegate against an implementation:
builder.Define<Consumer.TryDequeueDelegate>()
.As<MessageQueue>(x => a => x.TryDequeue(a));
builder.Define<Producer.EnqueueDelegate>()
.As<MessageQueue>(x => (a, b) => x.Enqueue(a, b));
And alternately, the registration of delegates and their implementation in the context of a particular class:
builder.Register<Consumer>().As<IConsumer>()
.With<Consumer.TryDequeueDelegate>()
.As<MessageQueue>(x => a => x.TryDequeue(a));
builder.Register<Producer>().As<IProducer>()
.With<Producer.EnqueueDelegate>()
.As<MessageQueue>(x => (a, b) => x.Enqueue(a, b));
Next time, I'll go over the implementation of the above to get it working as an injection framework.
I've been on a bit of a tear about declaring dependency contracts and injecting only what is required. While examining the use of interfaces in IoC and their shortcomings, I decided that, taken to the extreme, dependencies come down to call dependencies, which can be modeled with delegates rather than interfaces. Instead of writing a novel, as I've been prone to, I thought I'd do a shorter post on my approach to this solution and expand on the implementation in later posts.
To recap, in the SOLID principles, the Interface Segregation Principle states: "Clients should not be forced to depend upon interfaces that they do not use." This means that interfaces should be fine-grained enough to expose no more than one responsibility. Taken to the extreme, this means each interface has only a single method. There are valid SRP scenarios where a responsibility is modeled by more than one call, but let's start with the simplest scenario first and see how well it applies to more complex responsibilities later.
In C# we have delegates, which describe a single method call. A delegate instance is a reference to a method bound to a specific instance of a class, without exposing the underlying class (unless the delegate refers to a static method). A delegate can even be used to expose internal, protected, and private methods.
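As a quick illustration (mine, not from the original post), a delegate can hand out even a private method without the caller ever seeing the class behind it:

```csharp
var greet = new Greeter().GetGreetDelegate();
Console.WriteLine(greet("bob"));  // prints "hello bob"

public class Greeter
{
    // Private: unreachable through any interface
    private string Greet(string name) { return "hello " + name; }

    // Expose just the one call as a delegate, not the type
    public Func<string, string> GetGreetDelegate() { return Greet; }
}
```

The caller only ever sees a Func<string, string>; Greeter itself stays hidden.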
Instead of declaring a list of interfaces for the IoC container to inject, classes would define their dependencies as delegates. Taking the example from my duck typing post, we get the following dependency declarations.
First, we have the same service provider, MessageQueue, which still doesn't need to implement an interface:
public class MessageQueue
{
public void Enqueue(string recipient, string message) { ... }
public string TryDequeue(string recipient) { ... }
}
Next, we have the new Producer, now declaring its dependency as a delegate:
public class Producer : IProducer
{
public delegate void EnqueueDelegate(string recipient, string message);
public Producer(EnqueueDelegate dispatcher) { ... }
}
And finally, we have the new Consumer, also declaring a delegate for construction time injection:
public class Consumer : IConsumer
{
public delegate string TryDequeueDelegate(string recipient);
public Consumer(TryDequeueDelegate inbox) { ... }
}
Think of the delegate as your method interface. You could define your dependencies as Func's and Action's, but in most scenarios that would obfuscate your dependencies beyond recognition. By using an explicit delegate, you get to attach the dependency to the class that has the dependency, in addition to having a descriptive signature.
Now, if we were to wire this up manually we'd get something like this:
var queue = new MessageQueue();
IProducer producer = new Producer(queue.Enqueue);
IConsumer consumer = new Consumer(queue.TryDequeue);
That's simple enough, but not very scalable once you have a lot of dependencies to wire up. What we really need is an IoC container that lets us register delegates against classes, instead of needing instances at dependency declaration time. Delegates can't be cast from one to another and are not, strictly speaking, types, which poses some challenges for creating a type-safe registration interface. There are a number of ways to accomplish this syntax, which I will elaborate on in my next post.