Iloggable

Implementing "exports" in C#

The other day I was musing about improving readability by liberating verbs from the objects that own them. Similar things are accomplished with mix-ins or traits in other languages, but I wanted to go a step further and allow for very context-specific mapping, rather than just mixing in ambiguously named methods from other objects. Since the exports construct looked a lot like a map of expressions, I decided to see what replicating the behavior with existing C# would look like.

To review, this is what I want (in C#-like pseudo code + exports):

class PageWorkflow {
  UserService _userService exports { FindById => FindUserById };
  PageService _pageService exports {
    FindById        => FindPageById,
    UpdatePage(p,c) => Update(this p,c)
  };
  AuthService _authService exports {
    AuthorizeUserForPage(u,p,Permissions.Write) => UserCanUpdatePage(u,p)
  };

  UpdatePage(userid, pageid, content) {
    var user = FindUserById(userid);
    var page = FindPageById(pageid);
    if(UserCanUpdatePage(user,page)) {
      page.Update(content);
    } else {
      throw;
    }
  }
}

And this is the C# implementation of the above:

public class PageWorkflow {

    private readonly Func<int, User> FindUserById;
    private readonly Func<int, Page> FindPageById;
    private readonly Action<Page, string> Update;
    private readonly Func<User, Page, bool> UserCanUpdatePage;

    public PageWorkflow(IUserService userService, IPageService pageService, IAuthService authService) {
        FindUserById = (id) => userService.FindById(id);
        FindPageById = (id) => pageService.FindById(id);
        Update = (page, content) => pageService.UpdatePage(page, content);
        UserCanUpdatePage = (user, page) => authService.AuthorizeUserForPage(user, page, Permissions.Write);
    }

    public void UpdatePage(int userid, int pageid, string content) {
        var user = FindUserById(userid);
        var page = FindPageById(pageid);
        if(UserCanUpdatePage(user, page)) {
            Update(page, content);
        } else {
            throw new Exception();
        }
    }
}

As I mentioned, it's all possible, short of the context sensitive extension method on Page. Lacking extension methods, I was going to name the imported method UpdatePage, but since it would be a field, it conflicts with the UpdatePage workflow method despite functionally having different signatures.

All in all, the public workflow UpdatePage is pretty close to what I had wanted, but the explicit type declaration for each export makes it boilerplate that is likely not worth the trouble of writing, and, no, I won't even consider code generation.

This exercise, along with every other language feature I've dreamt up for Promise, does illustrate one thing quite clearly to me: My ideal language should provide programmatic access to its parser so that the syntax of the language can be extended by the libraries. Internal DSLs are a nice start with most languages, but often they fall short of being able to reduce default boilerplate and just create new boilerplate. Sure, designing the language to be more succinct is desirable, but if anything, that only covers what the language creators could imagine. Being able to tweak a language into a DSL for the task at hand, a la MPS, seems a lot more flexible.

Is this added flexibility worth the loss of a common set of constructs shared by all programmers who know language X? It certainly could be abused to the point of incomprehensibility. But I would suggest that even for someone who knows language X, joining any team working on a project of sufficient complexity comes with its own wealth of implicit patterns and constructs. Worse, unlike with a language whose compiler is extended, these constructs are communicated via comments and documentation that are not part of language X, i.e. the compiler and IDE lack the ability to aid someone learning them.

Considering this, I really need to do a survey of languages that already offer this capability, as well as take a closer look at MPS and Nemerle, to see if the language I want is just a few parser rules away from an existing metaprogramming language.

Of Workflows, Data and Services

This is yet another in my series of posts musing about what my ideal language would look like. This one is about readability.

Most code I write these days seems to utilize three types of classes: Data, Services and Workflow.

Data Classes

These are generally POCO object hierarchies with fields/accessors but minimal logic for manipulating that data. They should not have any dependencies. If an operation on a data object has a dependency, it's really a service for that data. Data objects don't get mocked/stubbed/faked, since we can just create and populate them.

Service Classes

These are really containers for verbs. The verbs could have been methods on the calling object, but by pulling them into these containers we enable a number of desirable capabilities:

  • re-use -- different workflows can use the same logic without creating their own copy
  • testing -- by faking the service we get greater control over testing different responses from the service
  • dependency abstraction -- there might be a number of other bits of logic that have to be invoked in order to provide the work the verb does but isn't a concern of the workflow
  • organization -- related verbs are grouped together in one place

Workflow Classes

These end up being classes, primarily because in most OO languages everything's a class, but really workflow classes are organizational constructs used to collect related workflows as procedural execution environments. They can be set up with pre-requisites and promote code re-use via shared private members for common sub-tasks of the workflow. They are also responsible for the conditional and branching logic that does the actual work.

Actions (requests from users, triggered tasks, etc.) start at some entry point method on a workflow object, such as a REST endpoint, manipulate data via Data objects using services, the results of which trigger paths defined by the workflow.

Same construct, radically different purposes

Let's look at how this works out for a fictional content management scenario. I'm using a C#-like pseudo syntax to avoid unnecessary noise (ironic, since this post is all about readability):

class PageWorkflow {
  ...
  UpdatePage(userid, pageid, content) {
    var user = _userService.FindById(userid);
    var page = _pageService.FindById(pageid);
    if(_authService.AuthorizeUserForPage(user,page,Permissions.Write)) {
      _pageService.UpdatePage(page,content);
    } else {
      throw;
    }
  }
}

UpdatePage is part of PageWorkflow, i.e. a workflow in our workflow class. It is configured with _userService, _pageService and _authService as our service classes. Finally, user and page are instances of our data classes. Nice for maintainability and separation of concerns, but awkward from a readability perspective. It would be much more readable with syntax like this:

class PageWorkflow {
  ...
  UpdatePage(userid, pageid, content) {
    var user = FindUserById(userid);
    var page = FindPageById(pageid);
    if(UserCanUpdatePage(user,page)) {
      page.Update(content);
    } else {
      throw;
    }
  }
}

Much more like we think of the flow. Of course this could easily be done by creating those methods on PageWorkflow, but that's the first step toward building a god object, and don't even get me started on putting Update on the Page data object.

Importing verbs

So let's assume that this separation of purposes is desirable -- I'm sure there'll be plenty of people who will disagree with that premise, but the premise isn't the topic here. What we really want to do here is alias or import the functionality into our execution context. Something like this:

class PageWorkflow {
  UserService _userService exports { FindById => FindUserById };
  PageService _pageService exports {
    FindById        => FindPageById,
    UpdatePage(p,c) => Update(this p,c)
  };
  AuthService _authService exports {
    AuthorizeUserForPage(u,p,Permissions.Write) => UserCanUpdatePage(u,p)
  };
  ...
}

Do not confuse this with VB's or javascript's with keywords. Both import the entirety of the referenced object into the current scope. The much maligned javascript version does this by pushing the object onto the scope chain, which, given the dynamic nature of those objects, makes variable use completely ambiguous. While VB kept scope ambiguity in check by forcing a . (dot) preceding the imported object's members, it is a shorthand that is only questionably more readable.

The above construct is closer to the @EXPORT syntax of the perl Exporter module. Except instead of exporting functions, exports exports methods on an instance as methods on the current context. It also extends the export concept in three ways:

Aliasing

Instead of just blindly importing a method from a service class, the exports syntax allows for aliasing. This is useful because an imported method likely deferred some of its context to the name of the owning class and could collide with other imported methods, e.g. FindById on PageService and UserService.

Argument rewriting

As the method name is rewritten, the argument order may no longer be appropriate, or we may want to change the argument modifiers, such as turning a method into an extension method.

UpdatePage(p,c) => Update(this p,c)

The above syntax captures arguments into p and c and then aliases the method into the current class' context and turns it into a method attached to p, i.e. the page, so that we can call page.Update(content)

Currying

But why stop at just changing the argument order and modifiers? We're basically defining expressions that translate the calls from one to the other, so why shouldn't we be able to make every argument an expression itself?

AuthorizeUserForPage(u,p,Permissions.Write) => UserCanUpdatePage(u,p)

This syntax curries the Permissions.Write argument so that we can define our aliased entry point without the last argument and instead name it to convey the write permission implicitly.

Writing workflows more like we think

Great, some new syntactic sugar. Why bother? Well, most language constructs are some level of syntactic sugar over the raw capabilities of the machine to let us express our intent more clearly. Generally syntactic sugar ought to meet two tests: make code easier to read and more compact to write.

The whole of the import mechanism could easily be accomplished (except maybe for the extension method rewrite) by creating those methods on PageWorkflow and calling the appropriate service members from there. The downside to this approach is that the methods are not differentiated from other methods in the body of PageWorkflow and are therefore not easily recognizable as aliasing constructs. In addition, the setup as wrapper methods is syntactically a lot heavier.
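To make the weight of the wrapper approach concrete, here is a rough sketch of what it might look like in plain C#. The data classes, service interfaces and the FakeServices stand-in are assumptions made up for illustration; only the method names come from the examples above.

```csharp
using System;

public enum Permissions { Read, Write }
public class User { public int Id; }
public class Page { public int Id; public string Content; }

public interface IUserService { User FindById(int id); }
public interface IPageService {
    Page FindById(int id);
    void UpdatePage(Page page, string content);
}
public interface IAuthService {
    bool AuthorizeUserForPage(User user, Page page, Permissions permission);
}

public class PageWorkflow {
    private readonly IUserService _userService;
    private readonly IPageService _pageService;
    private readonly IAuthService _authService;

    public PageWorkflow(IUserService userService, IPageService pageService, IAuthService authService) {
        _userService = userService;
        _pageService = pageService;
        _authService = authService;
    }

    // Aliasing wrappers: nothing marks these as mere aliases, so they blend
    // in with every other private method on the class.
    private User FindUserById(int id) { return _userService.FindById(id); }
    private Page FindPageById(int id) { return _pageService.FindById(id); }
    private void Update(Page page, string content) { _pageService.UpdatePage(page, content); }
    private bool UserCanUpdatePage(User user, Page page) {
        // Permissions.Write is fixed here, which is the "currying" part
        return _authService.AuthorizeUserForPage(user, page, Permissions.Write);
    }

    public void UpdatePage(int userid, int pageid, string content) {
        var user = FindUserById(userid);
        var page = FindPageById(pageid);
        if(UserCanUpdatePage(user, page)) {
            Update(page, content);
        } else {
            throw new UnauthorizedAccessException();
        }
    }
}

// Hypothetical in-memory stand-in for all three services, just for trying the workflow out.
// The two FindById methods collide, so they need explicit interface implementations.
public class FakeServices : IUserService, IPageService, IAuthService {
    public readonly Page ThePage = new Page { Id = 1, Content = "old" };
    public bool Allow = true;
    User IUserService.FindById(int id) { return new User { Id = id }; }
    Page IPageService.FindById(int id) { return ThePage; }
    public void UpdatePage(Page page, string content) { page.Content = content; }
    public bool AuthorizeUserForPage(User user, Page page, Permissions permission) {
        return Allow && permission == Permissions.Write;
    }
}
```

Four one-line aliases cost four full method declarations, and a reader has to open each one to learn that it only forwards a call.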

The exports mechanism allows for code to be crafted more closely to how we would talk about accomplishing the task without compromising on the design of the individual pieces or tying their naming and syntax to one particular workflow. It is localized to the definition of the service classes and provides a more concise syntax. In this way it aids the readability as well as the authoring of a common task.

When lazy evaluation attacks

I just had a lovely object lesson in lazy evaluation of iterators. I wanted to have a method that would return an enumerator over an encapsulated set after doing some sanity checking:

public IEnumerable<Subscription> Filter(Func<Subscription, bool> filter) {
    if(filter == null) {
        throw new ArgumentNullException("filter","cannot execute with a null filter");
    }
    foreach(var subInfo in _subscriptions.ToArray()) {
        Subscription sub;
        try {
            var subDoc = XDocFactory.LoadFrom(subInfo.Path, MimeType.TEXT_XML);
            sub = new Subscription(subDoc );
            if(filter(sub)) {
              continue;
            }
        } catch(Exception e) {
            _log.Warn(string.Format("unable to retrieve subscription for path '{0}'", subInfo.Path), e);
            continue;
        }
        yield return sub;
    }
}

I was testing registering a subscription in the repository with this code:

IEnumerable<Subscription> query;
try {
  query = _repository.Filter(handler);
} catch(ArgumentException e) {
  return;
}
foreach(var sub in query) {
   ...
}

And the test would throw an ArgumentNullException because handler was null. What? But, but I clearly had a try/catch around it! Well, here's where clever bit me. By using yield, the method had turned into an enumerator instead of a method call that returned an enumerable. That means that the method body would get squirreled away into an enumerator closure that would not get executed until the first MoveNext(). And that in turn meant that my sanity check on handler didn't happen at Filter() but at the first iteration of the foreach.

Instead of doing "return an iterator for subscriptions", I needed to do "check the arguments" and then "return an iterator for subscriptions" as a separate action. This can be accomplished by factoring the yield into a method called by Filter() instead of being in Filter() itself:

public IEnumerable<Subscription> Filter(Func<Subscription, bool> filter) {
    if(filter == null) {
        throw new ArgumentNullException("filter", "cannot execute with a null filter");
    }
    return BuildSubscriptionEnumerator(filter);
}

public IEnumerable<Subscription> BuildSubscriptionEnumerator(Func<Subscription, bool> filter) {
    foreach(var subInfo in _subscriptions.ToArray()) {
        Subscription sub;
        try {
            var subDoc = XDocFactory.LoadFrom(subInfo.Path, MimeType.TEXT_XML);
            sub = new Subscription(subDoc );
            if(filter(sub)) {
              continue;
            }
        } catch(Exception e) {
            _log.Warn(string.Format("unable to retrieve subscription for path '{0}'", subInfo.Path), e);
            continue;
        }
        yield return sub;
    }
}

Now the sanity check happens at Filter() call time, while the enumeration of subscriptions still only occurs as it's being iterated over, allowing for additional filtering and Skip/Take additions without having to traverse the entire possible set.
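The difference is easy to reproduce in isolation. The following is a stripped-down sketch with made-up names (LazyValidationDemo, DeferredCheck, EagerCheck) and integers standing in for subscriptions; it only illustrates when the null check fires.

```csharp
using System;
using System.Collections.Generic;

public static class LazyValidationDemo {

    // Iterator method: the null check becomes part of the generated state
    // machine and does not run until the first MoveNext().
    public static IEnumerable<int> DeferredCheck(Func<int, bool> filter) {
        if(filter == null) {
            throw new ArgumentNullException("filter");
        }
        foreach(var i in new[] { 1, 2, 3, 4 }) {
            if(filter(i)) {
                yield return i;
            }
        }
    }

    // Regular method: the null check runs at call time, then the work is
    // handed off to a separate iterator method.
    public static IEnumerable<int> EagerCheck(Func<int, bool> filter) {
        if(filter == null) {
            throw new ArgumentNullException("filter");
        }
        return Iterate(filter);
    }

    private static IEnumerable<int> Iterate(Func<int, bool> filter) {
        foreach(var i in new[] { 1, 2, 3, 4 }) {
            if(filter(i)) {
                yield return i;
            }
        }
    }

    // Returns true if merely calling the method (without iterating) throws.
    public static bool ThrowsAtCallTime(Func<Func<int, bool>, IEnumerable<int>> method) {
        try {
            method(null);
            return false;
        } catch(ArgumentNullException) {
            return true;
        }
    }
}
```

ThrowsAtCallTime(LazyValidationDemo.DeferredCheck) comes back false, because nothing has been enumerated yet, while the same call with EagerCheck comes back true.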

Reflections on #jsconf and #nodeconf by a language geek

This isn't a review of the conferences as much as my impression of the different forces acting upon javascript, the language. Before I start, I should get my bias out of the way, as it likely colors my observations: Like many, I came to javascript out of necessity and, seeing a C-like syntax, tried to make it fit into a mold it was ill-suited for, and much frustration ensued. I've since taken the language at face value and, being a fan of expressions and lambdas, have found it to be fun and flexible. That said, it does have some well documented warts, and in many ways these warts are what is behind the different forces pulling at the language.

jsconf and nodeconf had significantly different vibes, but where I had expected the difference to be due to server vs. client people, it seemed that the difference was more closely aligned to the relationship the attendees had to javascript. My impression is that jsconf is a community brought together by the common goal of creating amazing experiences in the browser. Some embrace the language as is, others rely on frameworks (or this year's hotness, micro-frameworks) to make them productive, while yet others try to bend the language to their will by using javascript as a compilation target.

Of those using javascript as a compilation target, coffeescript was the clear star, with enough talks using it as their de facto language that one got the impression that it was a natively supported language. The next to last #jsconf talk featuring @jashkenas even nullified the B Track entirely and was joined by @brendaneich to talk about JS.Next. The talk covered proposed and accepted changes to javascript, and coffeescript was held up as a testbed for fast prototyping and experimentation with possible syntax changes.

The final jsconf talk was clearly meant to come off as a Jobsian lead-in to a big reveal. This reveal was traceur, google's transpiler for trying out what google wants JS.Next to look like. I don't know whether it was the relatively stilted presentation style or the fact that it re-hashed a lot of Brendan's presentation, but the crowd lacked enthusiasm for both the presentation and the reveal. I personally liked what they were proposing, but I can't say I disagree with one attendee later describing it as having a condescending tone, something like "we're here to rescue you from javascript". Brendan seemed to have read the talk this way as well.

All in all, jsconf clearly seemed to be celebrating the possibilities ahead and the power of the language to be mutated into virtually any form. More than once I overheard someone say that they were sold on coffeescript and would try it for their next project.

The following night was the nodeconf pre-party. I had the pleasure of talking extensively with @izs (of npm fame) and @mikeal about various topics javascript and node. Being the language geek that I am, I brought up traceur and coffeescript and was quick to realize that this was a different crowd than jsconf: Nodeconf is a community that chose javascript as their primary language, finding it preferable to whatever language they had worked with before. Clearly the node community does not need language changes to enable their productivity.

This impression of a community happy with the state of their chosen tool was reinforced throughout the next day at nodeconf. One talk on Track A was "Mozilla Person, Secret Talk". When I suggested that it would likely be about Mozilla's efforts to create node on top of spidermonkey, one of the guys at our table said that if that was the case, he would have to go and check out Track B. As the Mozilla person turned out to be Brendan, our tablemate did leave. The talk itself was briefly about V8Monkey and SpiderNode, the two abstraction layers Mozilla is building to create a node clone, and largely a re-hash of Mozilla's JS.Next talk. The post-talk questions seemed generally uninterested in JS.Next and were mostly different forms of "what do we have to gain from SpiderNode?"

Clearly the node community is not beholden to any browser vendor. They've created this new development model out of nothing and are incredibly productive in that environment. The velocity of node and the growth of the npm ecosystem are simply unmatched. What node has already proven is that they don't need rescuing from javascript as it stands. Javascript is working just fine for them, thank you.

I do believe that Javascript is at a crossroads, and being the only choice available for client-side web development, it is being pulled in a lot of directions at once by everyone wanting to influence it with bits from their favorite language. It is clear that JS.Next is actually going to happen and bring some of the most significant changes the language has seen in an age. I can't say I'm not excited about the proposals in harmonizr and traceur, but I certainly can understand why this looming change is seen as a distraction by those who have mastered the current language. Being more of a server-side guy, I clearly favored nodeconf of the two conferences, and while I had started the week in Portland with the intention of writing my future node projects in coffeescript, I've now decided to stick with plain old javascript. I fear not doing so would only lead me back into my original trap of trying to make the language something it wasn't, which in the end would only hurt my own productivity.

HTTP-CQRS: REST+RPC

I started this year with a surprise blogging momentum and it was going really great until I started this post at the beginning of March. I made the mistake of writing a novel on the subject, which just ended up in a meandering draft that completely killed all other writing. Lesson learned: If it takes more than two sessions to write a post, scrap it. So here's a single session redux:

The problem with symmetric data models

REST is wonderful as a query pattern. It easily handles resources and collections of resources and lets you represent hierarchical data models. But for anything other than a pure data store, that same pattern is horrible for writes. Posting/putting whole documents at a location comes with complications like what's readonly vs. writable, how are business rules applied, how do you handle partial updates, etc. Yes, it's all possible, but it just imposes lots of ad-hoc and opaque rules on the API.

Before you tell me that you've solved all that, let's just get this clear: Most REST APIs out there either are HTTP-RPC or at least use some RPC in them, but call themselves REST_ful_, cause it's, like, cool. I'm no REST purist, but I'm willing to bet that your solution to these problems almost always involves a couple of RPC style calls in your REST_ful_ APIs, which only proves my point.

Consider a public user API and how to deal with the user's password:

-- POST:/users --
<user>
  <name>bob</name>
  <email>[email protected]</email>
  <password>foo</password>
</user>

-- GET:/users/{id} --
<user id="123">
  <name>bob</name>
  <email>[email protected]</email>
  <password-hash>a2e2f5</password-hash>
</user>

-- PUT:/users/{id} --
???

On the POST, we really want the password twice, otherwise we're just a data store pushing the responsibility for business logic off on the client. On the GET we certainly don't want to return the password. And finally, how do we even update the password? We'd want it in the document twice, plus the old password. So much for a symmetric resource model.

This same problem occurs with Entity models in ORMs: The query and write data models are treated as symmetric, when in reality what we query for and what we manipulate seldom follow the same model. Query models usually end up getting simplified (flattened, normalized) to favor updates, and update models contain mutable data inappropriate for a specific action.

Separating queries and commands

On the data manipulation side, the CQRS (Command Query Responsibility Segregation) pattern has been gaining favor. In it, data is retrieved via queries that match view models, while commands take only the data affected by the command and the command explicitly reflects the user story it advertises.

Commands are procedural, taking as input only the data they require. That certainly matches HTTP-RPC: It's not a modified resource being stored, although the contract may imply the manipulation of a resource. This pattern gives far greater freedom to manipulate subsets and supersets of resources than a REST PUT can offer and is a more natural match for how data is manipulated in user stories.

On the query side, we've freed REST from representing models that need to be modifiable via PUT, allowing more complex and denormalized data. Yes, this breaks the REST mantra of canonical location of a resource, but that mantra is largely a reflection of having to have a canonical location for manipulating the data. Once models are query only, denormalization isn't a problem anymore, since the command responsible for modification takes on the responsibility of making sure the denormalized changes are appropriately propagated.

Combining HTTP-RPC for writes with REST for queries, we get HTTP-CQRS. Applying this pattern to that public user API from before, we might deal with the password like this:

-- POST:/commands/users/create --
<user>
  <name>bob</name>
  <email>[email protected]</email>
  <password1>foo</password1>
  <password2>foo</password2>
</user>

-- GET:/query/users/{id} --
<user id="123">
  <name>bob</name>
  <email>[email protected]</email>
  <password-hash>a2e2f5</password-hash>
</user>

-- POST:/commands/users/{id}/changepassword --
<command>
  <old-password>foo</old-password>
  <new-password1>bar</new-password1>
  <new-password2>bar</new-password2>
</command>

While you could go all SOAP-like and just have a /commands endpoint and require the action in the body, using descriptive URIs greatly simplifies API comprehension, imho. By separating query and command responsibility for web services, the API actually becomes more descriptive and opens up a lot of operational patterns that aren't feasible or at least not sensible with pure REST_ful_ APIs.
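On the server side, the asymmetry shows up as separate types for what goes in and what comes out. The following is only a sketch under assumptions: UserApi, UserView and the two command classes are hypothetical names, storage is an in-memory dictionary, HTTP routing is left out entirely, and the SHA-256 hash is a placeholder rather than a recommendation for password storage.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Query model, returned by GET:/query/users/{id}. Never accepted as input.
public class UserView {
    public int Id;
    public string Name;
    public string Email;
    public string PasswordHash;
}

// Command models carry only the data their command needs.
public class CreateUserCommand {
    public string Name, Email, Password1, Password2;
}

public class ChangePasswordCommand {
    public string OldPassword, NewPassword1, NewPassword2;
}

public class UserApi {
    private readonly Dictionary<int, UserView> _users = new Dictionary<int, UserView>();
    private int _nextId = 1;

    // POST:/commands/users/create
    public int Create(CreateUserCommand command) {
        if(command.Password1 != command.Password2) {
            throw new ArgumentException("passwords do not match");
        }
        var id = _nextId++;
        _users[id] = new UserView {
            Id = id,
            Name = command.Name,
            Email = command.Email,
            PasswordHash = Hash(command.Password1)
        };
        return id;
    }

    // GET:/query/users/{id}
    public UserView Get(int id) {
        return _users[id];
    }

    // POST:/commands/users/{id}/changepassword
    public void ChangePassword(int id, ChangePasswordCommand command) {
        var user = _users[id];
        if(user.PasswordHash != Hash(command.OldPassword)) {
            throw new UnauthorizedAccessException("old password does not match");
        }
        if(command.NewPassword1 != command.NewPassword2) {
            throw new ArgumentException("new passwords do not match");
        }
        user.PasswordHash = Hash(command.NewPassword1);
    }

    // Placeholder hash for illustration only.
    private static string Hash(string value) {
        using(var sha = SHA256.Create()) {
            return BitConverter.ToString(sha.ComputeHash(Encoding.UTF8.GetBytes(value)));
        }
    }
}
```

The business rules (matching passwords, checking the old password) live with the command that needs them, and the query model never has to account for fields that only exist during a write.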

Avoiding Events, or how to wrap an Event with a continuation handle

If there is one language feature of .NET that I've become increasingly apprehensive of it is events. On the surface they seem incredibly useful, letting you observe behavior without the observed object having to know anything about the observer. But the way they are implemented has a number of problems that makes me avoid them whenever possible.

Memory Leaks

The biggest pitfall with events is that they are a common source of "memory leaks". Yes, a managed language can leak memory -- it happens anytime you create an object that is still referenced by an active object and cannot be garbage collected. The nasty bit that usually goes unmentioned is that an event subscription means the observed instance holds a reference to the subscribing object. Not only does this go unmentioned, but Microsoft spent years showing off code samples and doing drag and drop demos of subscribing to events without stressing that you also need to unsubscribe from them again.

Every "memory leak" I've ever dealt with in .NET traced back to some subscription that wasn't released. And tracking this down in a large project is nasty work -- taking and comparing memory snapshots to see what objects are sticking around, who subscribes to them and whether they should really still be subscribed. All because the observed affects the ability of the observer to go out of scope, which seems like a violation of the Observer pattern.

Alternatives to Events

Weak Event Pattern

A pattern I've implemented from scratch several times (the side-effect of implementing core features in proprietary code) is the Weak Event pattern, i.e. an event that uses a weak reference as the subscription, so that the subscriber isn't pinned in memory by the observed class.

In .NET 4, Microsoft has even formalized this with the WeakEventManager to implement the Weak Event Pattern, although I prefer just overriding the add and remove on an event and using weak references under the hood. While this changes the expected behavior of events and is unexpected in public facing APIs, I consider it the way events should have been implemented in the first place, and use it as the default in my non-public facing code.
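A minimal sketch of the add/remove approach, not the implementation I use in production: Connection, Connected and Listener are made-up names, each handler is assumed to be a single instance-method delegate, and the handler is invoked via reflection (so exceptions arrive wrapped in TargetInvocationException).

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Connection {

    // Holds the subscriber weakly, plus the method to call on it.
    private class WeakHandler {
        private readonly WeakReference _target;
        private readonly MethodInfo _method;

        public WeakHandler(EventHandler handler) {
            // Assumption: instance-method handlers only. A static handler has
            // a null Target and would be treated as already collected.
            _target = new WeakReference(handler.Target);
            _method = handler.Method;
        }

        public bool Matches(EventHandler handler) {
            return ReferenceEquals(_target.Target, handler.Target) && _method.Equals(handler.Method);
        }

        // Returns false once the subscriber has been garbage collected.
        public bool TryInvoke(object sender, EventArgs args) {
            var target = _target.Target;
            if(target == null) {
                return false;
            }
            _method.Invoke(target, new object[] { sender, args });
            return true;
        }
    }

    private readonly List<WeakHandler> _handlers = new List<WeakHandler>();

    // Custom accessors: the event keeps no strong reference to subscribers.
    public event EventHandler Connected {
        add { lock(_handlers) { _handlers.Add(new WeakHandler(value)); } }
        remove {
            lock(_handlers) {
                var index = _handlers.FindLastIndex(h => h.Matches(value));
                if(index >= 0) {
                    _handlers.RemoveAt(index);
                }
            }
        }
    }

    public void RaiseConnected() {
        WeakHandler[] snapshot;
        lock(_handlers) { snapshot = _handlers.ToArray(); }
        foreach(var handler in snapshot) {
            if(!handler.TryInvoke(this, EventArgs.Empty)) {
                // subscriber is gone, drop the dead entry
                lock(_handlers) { _handlers.Remove(handler); }
            }
        }
    }
}

// Hypothetical subscriber used to try the event out.
public class Listener {
    public int Count;
    public void OnConnected(object sender, EventArgs args) { Count++; }
}
```

One caveat with this approach: a lambda that captures local variables lives on a compiler-generated closure object that nothing else references, so with weak subscriptions such a handler can silently vanish at the next collection.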

IObservable

A better way of implementing the Observer pattern is IObservable from the Reactive Framework (Rx). Getting a stream of events pushed at you is a lot more natural for observation and allows for following a number of different behaviors in one observer. It also provides a mechanism for terminating the subscription from the observed end, as well as a way to deal with exceptions occurring in event generation. For new APIs this is definitely my preferred method of pushing state changes at listeners.
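For illustration, here is a hand-rolled sketch using only the IObservable<T> and IObserver<T> interfaces that ship in the System namespace, without the Rx operators. ConnectionStateSource and RecordingObserver are hypothetical names, and the sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public class ConnectionStateSource : IObservable<string> {
    private readonly List<IObserver<string>> _observers = new List<IObserver<string>>();

    // The returned IDisposable is the observer's handle for unsubscribing.
    public IDisposable Subscribe(IObserver<string> observer) {
        _observers.Add(observer);
        return new Subscription(_observers, observer);
    }

    public void Publish(string state) {
        foreach(var observer in _observers.ToArray()) {
            observer.OnNext(state);
        }
    }

    // Exceptions in event generation are pushed to observers, ending the stream.
    public void Fail(Exception error) {
        var observers = _observers.ToArray();
        _observers.Clear();
        foreach(var observer in observers) {
            observer.OnError(error);
        }
    }

    // The observed end terminates all subscriptions.
    public void Complete() {
        var observers = _observers.ToArray();
        _observers.Clear();
        foreach(var observer in observers) {
            observer.OnCompleted();
        }
    }

    private class Subscription : IDisposable {
        private readonly List<IObserver<string>> _observers;
        private readonly IObserver<string> _observer;

        public Subscription(List<IObserver<string>> observers, IObserver<string> observer) {
            _observers = observers;
            _observer = observer;
        }

        public void Dispose() {
            _observers.Remove(_observer);
        }
    }
}

// Hypothetical observer that just records what it was told.
public class RecordingObserver : IObserver<string> {
    public readonly List<string> Log = new List<string>();
    public void OnNext(string value) { Log.Add("next:" + value); }
    public void OnError(Exception error) { Log.Add("error:" + error.Message); }
    public void OnCompleted() { Log.Add("completed"); }
}
```

Unlike an event, the subscription here is an object in its own right: the observer ends it with Dispose(), and the source ends it with Complete() or Fail().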

Using a continuation handle to subscribe to a single event invocation

A pattern I encounter frequently is the one-time event that simply signals a change in state, such as a connection being established or closed. What I really want for these is a callback. I've added methods in the vein of AddConnectedCallback(Action callback), but always feel like they're unintuitive constructs born out of my dislike of events, so generally I just end up creating events for these after all.

I could just use a lambda to subscribe to an event and capture the current scope, much like the .WhenDone handler of Result, but the lambda is anonymous, making it impossible to unsubscribe:

xmpp.OnLogin += (sender,args) => {
  xmpp.Send("Hello");
  // but how do I unsubscribe now?
};

The mere fact that lambdas are being shown as convenient ways to subscribe to events without any mention of the reference leaks this introduces just further illustrates how broken both events and their guidance are. What I want instead is a closure that simplifies attaching behavior at invocation time and makes sure that the unsubscribe is handled cleanly.

Doing a lot of asynchronous programming work with MindTouch DReAM's Result continuation handle (think TPL's Task, but available since .NET 2.0), I decided that being able to subscribe to an event with a result would be ideal. Inspired by Rx's Observable.FromEvent, I created EventClosure, which can be used like this:

EventClosure.Subscribe(h => xmpp.OnLogin += h, h => xmpp.OnLogin -= h)
  .WhenDone(r => xmpp.Send("Hello"));

Unfortunately, like Observable.FromEvent, you have to set up the subscribe and unsubscribe using an Action provided handler, since there isn't a way to pass xmpp.OnLogin as an argument and do it programmatically. But at least now the subscribe and unsubscribe are handled in one place and I can concentrate on the logic I want executed at event invocation.

I could have implemented this same pattern using Task, but until async/await ships, Result still has the advantage: aside from continuation via .WhenDone or blocking via .Block or .Wait, Result also gives me the ability to use a coroutine:

public IEnumerator<IYield> ConnectAndWelcome(Result<Xmpp> result) {
    var xmpp = CreateClient();
    var loginContinuation = EventClosure.Subscribe(h => xmpp.OnLogin += h, h => xmpp.OnLogin -= h);
    xmpp.Connect();
    yield return loginContinuation;
    xmpp.Send("hello");
    result.Return(xmpp);
}

This creates the client, starts the connection and suspends itself until connected, so it can then send a welcome message and return the connected client to its invoker. All this happens asynchronously! The implementation of EventClosure looks like this (and could easily be adapted to use Task instead of Result):

public static class EventClosure {
    public static Result Subscribe(
        Action<EventHandler> subscribe,
        Action<EventHandler> unsubscribe
    ) {
        return Subscribe(subscribe, unsubscribe, new Result());
    }

    public static Result Subscribe(
        Action<EventHandler> subscribe,
        Action<EventHandler> unsubscribe,
        Result result
    ) {
        var closure = new Closure(unsubscribe, result);
        subscribe(closure.Handler);
        return result;
    }

    public static Result<TEventArgs> Subscribe<TEventArgs>(
        Action<EventHandler<TEventArgs>> subscribe,
        Action<EventHandler<TEventArgs>> unsubscribe
    ) where TEventArgs : EventArgs {
        return Subscribe(subscribe, unsubscribe, new Result<TEventArgs>());
    }

    public static Result<TEventArgs> Subscribe<TEventArgs>(
        Action<EventHandler<TEventArgs>> subscribe,
        Action<EventHandler<TEventArgs>> unsubscribe,
        Result<TEventArgs> result
    ) where TEventArgs : EventArgs {
        var closure = new Closure<TEventArgs>(unsubscribe, result);
        subscribe(closure.Handler);
        return result;
    }

    private class Closure {
        private readonly Action<EventHandler> _unsubscribe;
        private readonly Result _result;

        public Closure(Action<EventHandler> unsubscribe, Result result) {
            _unsubscribe = unsubscribe;
            _result = result;
        }

        public void Handler(object sender, EventArgs eventArgs) {
            _unsubscribe(Handler);
            _result.Return();
        }
    }
    private class Closure<TEventArgs> where TEventArgs : EventArgs {
        private readonly Action<EventHandler<TEventArgs>> _unsubscribe;
        private readonly Result<TEventArgs> _result;

        public Closure(Action<EventHandler<TEventArgs>> unsubscribe, Result<TEventArgs> result) {
            _unsubscribe = unsubscribe;
            _result = result;
        }

        public void Handler(object sender, TEventArgs eventArgs) {
            _unsubscribe(Handler);
            _result.Return(eventArgs);
        }
    }
}

While this pattern is limited to single-fire events, since a Result can only be triggered once, that is a common enough pattern of event usage, and this is one of the cleanest ways to receive that notification asynchronously.
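
The Task adaptation mentioned above might look roughly like the following. This is only a sketch: TaskEventClosure and DemoClient are made-up names used for illustration, and TaskCompletionSource stands in for Result. The shape is otherwise the same: subscribe, detach on first fire, complete.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical Task-based counterpart to EventClosure (not part of the original code).
public static class TaskEventClosure {

    public static Task<TEventArgs> Subscribe<TEventArgs>(
        Action<EventHandler<TEventArgs>> subscribe,
        Action<EventHandler<TEventArgs>> unsubscribe
    ) where TEventArgs : EventArgs {
        var completion = new TaskCompletionSource<TEventArgs>();
        EventHandler<TEventArgs> handler = null;
        handler = (sender, eventArgs) => {

            // detach first, so the handler can only ever fire once
            unsubscribe(handler);
            completion.TrySetResult(eventArgs);
        };
        subscribe(handler);
        return completion.Task;
    }
}

// Hypothetical event source, only here to exercise the sketch.
public class DemoClient {
    public event EventHandler<EventArgs> Connected;

    public int HandlerCount {
        get { return Connected == null ? 0 : Connected.GetInvocationList().Length; }
    }

    public void RaiseConnected() {
        var handler = Connected;
        if(handler != null) {
            handler(this, EventArgs.Empty);
        }
    }
}
```

A caller would write something like TaskEventClosure.Subscribe<EventArgs>(h => client.Connected += h, h => client.Connected -= h) and await the returned task. The single-fire limitation stays the same, since a Task can only complete once.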

Easily add Pre and Post build tasks to a Visual Studio Solution

One dirty little secret about Visual Studio 2008 and even Visual Studio 2010 is that while MSBuild governs the solution build process, the .sln file is not an MSBuild file. The .*proj files are, but the solution isn't. So trying to customize the build at the solution level seemed really annoying.

As I dug around trying to find the Solution level equivalent of the Build Events dialog from Visual Studio, Sayed Ibrahim pointed out that in Visual Studio 2010 there is now a hook to let you inject some before and after tasks, but unfortunately the problem I was trying to solve was the build process for MindTouch DReAM, which is still in Visual Studio 2008.

Approach 1: Generating the solution msbuild .proj

Digging around further, I found out that you can get at the MSBuild file that the solution is turned into. If you set the environment variable MSBuildEmitSolution=1 and run MSBuild, it will write out the generated .proj file.

While this enables you to edit it and add new tasks, it means that your build script will drift out of sync with the solution as it is modified. I initially went down this path, since the build I wanted was very specialized to the distribution build. That let me eliminate 90% of the .proj file and I felt confident that the smaller the .proj, the simpler it would be to keep it in sync with the solution.

Approach 2: Calling the solution from a different build script

But wait, all the solution .proj did was call MSBuild on each of its projects. So if one MSBuild script can call another, why do I even need to use a generated version of the solution? Turns out you don't. You can write a very simple MSBuild script that in turn calls the .sln, letting MSBuild perform the conversion magic, and you still get your pre and post build steps.

<Project DefaultTargets="Build" ToolsVersion="3.5" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <Target Name="Build">
        <CallTarget Targets="PreBuild"/>
        <CallTarget Targets="Dream"/>
        <CallTarget Targets="PostBuild"/>
    </Target>
    <Target Name="PreBuild">
        <Message Text="Pre Build" />
        ...
    </Target>
    <Target Name="PostBuild">
        <Message Text="Post Build" />
        ...
    </Target>
    <Target Name="Dream" Outputs="@(DreamBuildOutput)">
        <Message Text="Building DReAM" />
        <MSBuild Targets="Rebuild"
                 Projects="src\MindTouchDream.sln"
                 Properties="Configuration=Signed Release; Platform=Any CPU; BuildingSolutionFile=true;">
            <Output TaskParameter="TargetOutputs" ItemName="DreamBuildOutput" />
        </MSBuild>
        <Message Text="Done building DReAM" />
    </Target>
</Project>

Now that I've implemented this, I am surprised that it didn't come up on Google when I was looking for a solution, and I hope that this post helps the next person that runs into this issue. The only drawback (which it shares with the first approach) is that this script is only for manual execution. Building from within Visual Studio can't take advantage of it.

Namespaces: Obfuscating Xml for fun and profit

One reason Xml is hated by many is namespaces. While the concept is incredibly useful and powerful, the implementation, imho, is a prime example of over-engineered flexibility: It's so flexible that you can express the same document in a number of radically different ways that are difficult to distinguish with the naked eye. This flexibility then becomes the downfall of many users, as well as simplistic parsers, trying to write XPath rather than walking the tree looking at localnames.

Making namespaces confusing

Conceptually, it seems very useful to be able to specify a namespace for an element so that documents from different authors can be merged without collision and ambiguity. And if this declaration was a simple unique map from prefix to Uri, it would be a useful system. You see a prefix, you know it has a namespace that was defined somewhere earlier in the document. Ok, it could also be defined in the same node -- that's confusing already.

But that's not how namespaces work. In order to maximize flexibility, there are a number of aspects to namespacing that can make them ambiguous to the eye. Here are what I consider the biggest culprits in muddying the waters of understanding:

Prefix names are NOT significant

Let's start with a common misconception that sets the stage for most comprehension failures that follow, i.e. that the prefix of an element has some unique meaning. The below snippets are identical in meaning:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <b>foo</b>
  </xsl:template>
</xsl:stylesheet>

<a:stylesheet version="1.0" xmlns:a="http://www.w3.org/1999/XSL/Transform">
  <a:template match="/">
    <b>foo</b>
  </a:template>
</a:stylesheet>

The prefix is just a short alias for the namespace uri. I chose xsl because there are certain prefixes like xsl, xhtml, dc, etc., that are used so consistently with their namespace uris that a lot of people assume that the name is significant. But it isn't. Someone may give you a document with their favorite prefix and on first look, you'd think the xml is invalid.
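
One way to convince yourself is to let a parser expand the names. The following is a minimal sketch using System.Xml.Linq; PrefixDemo and its methods are made-up names for illustration. It reduces each document to its {namespace}localname sequence and compares the results.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class PrefixDemo {

    // Returns every element name in document order, expanded to {namespace}localname.
    public static string[] ExpandedNames(string xml) {
        return XDocument.Parse(xml)
            .Descendants()
            .Select(e => e.Name.ToString())
            .ToArray();
    }

    // True if both documents contain the same sequence of expanded element names.
    public static bool SameNames(string xml1, string xml2) {
        return ExpandedNames(xml1).SequenceEqual(ExpandedNames(xml2));
    }
}
```

Fed the xsl: and a: stylesheets from above, SameNames returns true: both expand to {http://www.w3.org/1999/XSL/Transform}stylesheet, {http://www.w3.org/1999/XSL/Transform}template and plain b. The prefix never makes it into the name.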

Default Namespaces

Paradoxically, default namespaces likely came about to make namespacing easier and encourage their use. If you want your document to not conflict with anything else, it's best to declare a namespace:

<my:a xmlns:my="ns1">
  <my:b>blah</my:b>
</my:a>

But that's just tedious. I just want to say "assume that everything in my document is in my namespace":

<a xmlns="ns1">
  <b>blah</b>
</a>

Beautiful. I love default namespaces!

Ah, but wait, there's more! A default namespace can be declared on any element and governs all its children. Yep, you can override previous defaults and elements at the same hierarchy level could have different namespaces without looking different:

<a xmlns="ns1">
  <b xmlns="ns2">
    <c>blah</c>
  </b>
  <b xmlns="ns3">
    <c>blah</c>
  </b>
</a>

Here it looks like we have an element a with two child elements b, each with an element c. Except not only is the first b really {ns2}b and the second b {ns3}b, but even worse, the c elements, which have no namespace declaration, are also different, i.e. {ns2}c and {ns3}c. This smells of someone being clever. It looks like a feature serving readability when it does exactly the opposite. Use this in larger documents with some more nesting and the only way you can determine whether an element belongs to a namespace, and which one, is to use a parser. And that defeats the human readability property of Xml.
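
Since a parser is the only reliable judge here, this is roughly what asking one looks like. It's a small sketch with System.Xml.Linq; DefaultNamespaceDemo and NamespacesOf are hypothetical names, not an existing API.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DefaultNamespaceDemo {

    // Lists the namespace of every element with the given local name, in document order.
    public static string[] NamespacesOf(string xml, string localName) {
        return XDocument.Parse(xml)
            .Descendants()
            .Where(e => e.Name.LocalName == localName)
            .Select(e => e.Name.NamespaceName)
            .ToArray();
    }
}
```

For the document above, NamespacesOf(xml, "c") comes back with ns2 and ns3, even though neither c element carries a declaration of its own.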

Attributes do not inherit the default namespace

As if default namespaces didn't provide enough obfuscation power, there is a special exception to them and that's attributes:

<a xmlns="ns1">
  <b c="who am i">blah</b>
</a>

So you'd think this is equivalent to:

<x:a xmlns:x="ns1">
  <x:b x:c="who am i">blah</x:b>
</x:a>

But you'd be wrong. @c isn't @x:c, it's just @c. It's without namespace. The logic goes like this: Namespaces exist to uniquely identify nodes. Since an attribute is already inside a uniquely identifiable container, the element, it doesn't need a namespace. The only way to get a namespace on an attribute is to use an explicit prefix. Which means that if you wanted @c to be in the namespace {ns1}, but not force every element to declare the prefix as well, you'd have to write it like this:

<a xmlns="ns1">
  <b x:c="who am i" xmlns:x="ns1">blah</b>
</a>

Oh yeah, much more readable. Thanks for that exception to the rule.
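
The exception is easy to verify with a parser. Below is a minimal sketch using System.Xml.Linq. The class and method names are invented for this example, and it assumes the attribute of interest sits on the first child of the root, as in the snippets above.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AttributeNamespaceDemo {

    // Returns the namespace of the first real attribute (xmlns declarations excluded)
    // on the first child of the root; an empty string means "no namespace".
    public static string FirstChildAttributeNamespace(string xml) {
        return XDocument.Parse(xml)
            .Root
            .Elements()
            .First()
            .Attributes()
            .First(a => !a.IsNamespaceDeclaration)
            .Name
            .NamespaceName;
    }
}
```

With the default-namespace version, where b has a plain c attribute, this returns an empty string. With the explicit x:c version it returns ns1.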

Namespace prefixes are not unique

That last example is a perfect segue into the last, oh, my god, seriously?, obfuscation of namespacing: You can declare the same namespace multiple times with different prefixes and, even more confusingly, you can define the same prefix with different namespaces.

<x:a xmlns:x="ns1">
  <x:b xmlns:x="ns2">
    <x:c xmlns:x="ns1">you don't say</x:c>
  </x:b>
  <y:b xmlns:y="ns1">
    why would you do this?
  </y:b>
</x:a>

Yes, that is legal AND completely incomprehensible. And yes, people aren't likely to do this on purpose, unless they really are sadists. But I've come across equivalent scenarios where multiple documents were merged together without paying attention to existing namespaces. In fairness, trying to understand existing namespaces on merge is a pain, so it might have been purely done in self-defense. This is the equivalent of spaghetti code and it's enabled by needless flexibility in the namespace system.

XPath needs unambiguous names

So far I've only addressed the ambiguity in authoring and in visually parsing namespaced Xml, which has plenty of pain points just in itself. But now let's try to find something in one of these documents.

<x:a xmlns:x="ns1">
  <x:b xmlns:x="ns2">
    <x:c xmlns:x="ns1">you don't say</x:c>
  </x:b>
  <y:b xmlns:y="ns1">
    why would you do this?
  </y:b>
</x:a>

Let's get the c element with this xpath:

/x:a/x:b/x:c

But that doesn't return any results. Why not? The main thing to remember with XPath is that, again, prefixes are NOT significant. That means, just because you see a prefix used in the document doesn't actually mean that XPath can find it by that name. Again, why not? Indeed. After all, the x prefix is defined, so why can't XPath just use that mapping? Well, remember that in this example x means something different depending on where you are in the document. XPath doesn't work contextually, it needs unique names to match. Internally, XPath needs to be able to convert the element names into fully qualified names before ever looking at the document. That means what it really wants is a query like this:

/{ns1}a/{ns2}b/{ns1}c

Since namespaces can be used in all sorts of screwy ways to make the same prefix mean different things contextually, the prefixes seen in the text representation of the document are useless to XPath. Instead, you need to define manual, unique mappings from prefix to namespace, i.e. you need to provide a unique lookup from prefix to uri. Gee, unique prefix.. Why couldn't the Xml document spec for namespaces have respected that requirement as well?
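
In .NET terms, that manual lookup is an XmlNamespaceManager handed to the XPath call. Here is a minimal sketch, assuming System.Xml.Linq and its XPath extensions. XPathDemo is a made-up helper, and the prefixes p1 and p2 are arbitrary: they only have to agree with the query, not with the document.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathDemo {

    // Evaluates xpath against xml, resolving prefixes in the query ONLY through
    // the supplied table; prefixes declared inside the document are never consulted.
    public static XElement Select(string xml, string xpath, IDictionary<string, string> prefixToUri) {
        var resolver = new XmlNamespaceManager(new NameTable());
        foreach(var pair in prefixToUri) {
            resolver.AddNamespace(pair.Key, pair.Value);
        }
        return XDocument.Parse(xml).XPathSelectElement(xpath, resolver);
    }
}
```

Against the document above, /p1:a/p2:b/p1:c with p1 mapped to ns1 and p2 mapped to ns2 finds the c element. Mapping only x to ns1 and querying /x:a/x:b/x:c finds nothing, because the only {ns1}b in the document is the one written as y:b, and it has no c child.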

Namespace peace of mind: Be explicit and unique

The best you can do to keep namespacing nightmares at bay is to follow 2 simple rules for formatting and ingesting Xml:

  1. Only use default namespacing on the root node
  2. Keep your prefixes unique (preferably across all documents you touch)

There, done, ambiguity is gone. Now make sure you normalize every Xml document that passes through your hands by these rules and bathe in the light of transparency. It's easier to read, and you can initialize XPath with that global nametable of yours so that your XPath representation will match your rendered Xml representation.
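
What such a normalization pass could look like with System.Xml.Linq is sketched below. NamespaceNormalizer is a hypothetical helper, and the prefix table is whatever global mapping you have settled on. This version gives every namespace an explicit prefix on the root rather than using a default namespace, and it assumes every namespace in the document appears in the table; anything missing is left for the serializer to declare on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceNormalizer {

    // prefixToUri is the caller's global, unique prefix table (an assumption of this sketch).
    public static XDocument Normalize(XDocument source, IDictionary<string, string> prefixToUri) {
        var doc = new XDocument(source);

        // drop every xmlns declaration, wherever it occurs; element and attribute
        // names keep their namespaces, since XLinq stores names fully qualified
        var declarations = doc.Descendants()
            .Attributes()
            .Where(a => a.IsNamespaceDeclaration)
            .ToList();
        foreach(var declaration in declarations) {
            declaration.Remove();
        }

        // re-declare each namespace exactly once, on the root, under its agreed prefix
        foreach(var pair in prefixToUri) {
            doc.Root.SetAttributeValue(XNamespace.Xmlns + pair.Key, pair.Value);
        }
        return doc;
    }
}
```

The same table can then be loaded into the XmlNamespaceManager used for XPath, so the prefixes in a query are the prefixes you see in the serialized document.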

Platform specific Pre|PostBuildEvent in .csproj files

Xml configuration files have certainly been vilified, but they do have some lovely qualities, such as easily allowing you to stuff additional data into them without screwing things up. To be on the safe side this should be done with namespaces to avoid DTD validation issues, but often even that isn't necessary. Xml is simply, err--- extensible.

Of course, this makes a big presumption that the consuming end a) doesn't have some inflexible parser that pukes on valid but unexpected xml, and b) doesn't just import the xml into its own internal representation, only to write out just its known representation on save. If that's how you want to treat your xml data source, do us all a favor and stop using Xml already -- you're only inconveniencing people with angle brackets without letting them reap the benefits they could provide.

Anyway, this seems like a non sequitur intro but I promise to explain its significance in a little bit. Now, on to the point of this post, that you can write pre- and post-build events in Visual Studio projects to target multiple platforms. This behavior is most welcome when you want to xbuild your code under mono on linux.

When you create a PostBuildEvent in Visual Studio to copy some files like this:

copy $(TargetPath) $(TargetDir)MyExecutable.exe

Visual Studio actually emits this block into the .csproj Xml:

<PostBuildEvent>copy $(TargetPath) $(TargetDir)MyExecutable.exe</PostBuildEvent>

Sure, I could set up an alias from cp to copy on linux, but that's a hack sidestepping the real issue: I am likely to want different pre- and post-build behavior between windows and linux. I have to apologize for not recalling who pointed this out -- could have been on the mono-devel list or in the mono-devel irc chat -- but someone told me that I can put a condition on <PreBuildEvent> and <PostBuildEvent> to control when it is to be executed:

<PostBuildEvent Condition=" '$(OS)' == 'Windows_NT' ">
  copy $(TargetPath) $(TargetDir)MyExecutable.exe
</PostBuildEvent>
<PostBuildEvent Condition=" '$(OS)' != 'Windows_NT' ">
  cp $(TargetPath) $(TargetDir)MyExecutable.exe
</PostBuildEvent>

This does mean I'm manually editing the .csproj, not some of the prettiest Xml around, but it establishes separate post-build steps for windows and not windows. I know it's a simplistic example, but works for the 99% use case of .NET vs. mono build environments.

Now, to resume my diatribe about Xml configuration and applications that use it: Well, the first thing that worried me about this solution was whether Visual Studio would puke once I made those changes and, if it didn't puke, whether it would clobber them. And I have to report, not a problem, on both counts. Visual Studio is a good xml configuration file citizen: it only uses the parts it knows and uses the file as its data model, modifying it rather than overwriting it. Yay!

Ok, so there are still delegates you have to define

Yesterday I needed to create a bit of code that should retry an http request on failure and I wanted this to happen in a non-blocking fashion, so a plain loop was out. Using DReAM's Plug, I can easily set up a .WhenDone handler, but I need something to call back recursively. I could have created a new method that calls itself recursively, but inspired by my recent Scala reading, I wanted the function to exist only in the scope of the current function. The resulting code was this:

public void Restart(string wikiId) {
    Action restarter = null;
    var retry = 0;
    restarter = () => {
        retry++;

        // call the stop endpoint
        _self.At("host", "stop", wikiId)
            .Post(new Result<DreamMessage>()).WhenDone(
                m => {

                    // if the endpoint was unavailable and we've got retries left ...
                    if(m.Status == DreamStatus.ServiceUnavailable && retry <= 3) {

                        // ... sleep for 2 seconds ...
                        Async.Sleep(2.Seconds()).WhenDone(r => {

                            // ... then try again
                            restarter();
                        });
                    }
                },
                e => {
                    // plugs don't throw exceptions, they communicate failure in the resulting message
                }
            );
    };

    // kick the restart attempt off
    restarter();
}

Completely non-blocking, yay. Of course, like most continuation based async code, it has a tendency to walk off the right side of the screen, but that's a different topic.

Discussing this pattern, Steve suggested that instead of explicitly calling myself to execute the next retry, the Action should take another action as its argument. By passing the continuation in, any kind of continuation could be called, even a no-op that terminates the chain.

Ok, cool, let's just change the signature of restarter to Action<Action>... no wait, it needs to be an action that takes an action that takes an action, that.... hrm, you see the problem...

Action<Action<Action<Action<_ad infinitum_>>>>

I needed a recursive delegate definition and that is simply not possible with Func<> and Action<>. I needed something like:

delegate void Recursor(Recursor continuation);
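
To make that concrete, here is a small synchronous sketch of how such a delegate might be used. RecursorDemo and its simulated failure count are invented for illustration; a real version would make the call from inside a .WhenDone handler instead of directly.

```csharp
public delegate void Recursor(Recursor continuation);

public static class RecursorDemo {

    // Simulates an operation that fails failuresBeforeSuccess times
    // and returns the number of attempts that were made.
    public static int Run(int failuresBeforeSuccess, bool retry) {
        var attempts = 0;

        // a no-op continuation terminates the chain
        Recursor stop = continuation => { };

        Recursor attempt = continuation => {
            attempts++;
            var failed = attempts <= failuresBeforeSuccess;
            if(failed) {

                // hand off to whatever continuation we were given, passing it
                // along to itself so it can decide how to continue in turn
                continuation(continuation);
            }
        };

        // passing attempt to itself keeps retrying until success;
        // passing stop gives up after the first try
        attempt(retry ? attempt : stop);
        return attempts;
    }
}
```

Run(2, true) makes three attempts (two simulated failures, then success), while Run(2, false) makes one, because the no-op swallows the retry. Note that attempt never refers to itself by name, which is what the self-referencing delegate type buys you.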

So, unlike my previous statement, there are still scenarios where you have to write a delegate :)