As usual, I've been blogging over on the MindTouch Developer blog, and since the topics I post about over there have a pretty strong overlap with what I'd post here, I figured I might as well start cross-posting about it here.
Aside from various technical posts, Steve Bjork and I have started recording a podcast about concurrent programming. It's currently 2 episodes strong, with a third one coming soon. Information on past and future episodes can always be found here.
Today's post on the MindTouch dev blog is about the producer/consumer pattern and how I moved from using dedicated workers with a blocking queue to using Dream's new ElasticThreadPool to dispatch work.
This week, I was digging back into the Coco/R-built parser for DekiScript, tracking down a bug, which turned out to not be in the parsing bits at all. It did, however, get me familiarized with Coco/R again. So I thought I'd give myself an hour to implement the parser for my tag-related boolean algebra with Coco/R. If I could pull it off, I'd forget about the regex/state-machine approach I was considering.
It took me about 15 minutes to set up the classes to represent the intermediate AST and another 30 minutes for the grammar in Coco/R's ATG format. After that I wrote a couple of unit tests to check that the parsing was right, only to realize that while AND and OR are left-to-right associative, NOT is right-to-left associative. Figuring out how to adjust the grammar for that took me another 10-15 minutes, but overall I hit the one-hour goal.
Before tackling the grammar to parse, I needed to define data structures to represent the parsed syntax tree, which I'd later convert to executable code. The syntax is fairly simple:
(foo+bar)|(^foo+baz)
This can be represented by just 4 tree node types (with the common parent TagExpression):
AndExpression
OrExpression
NotExpression
Tag
The parentheses are just tree structure artifacts and are not represented in the AST.
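The node classes themselves aren't shown in the post; based on the constructors used in the grammar actions below (AndExpression(left, right), NotExpression(inner), Tag(t.val), TagExpression.Empty), a minimal sketch might look like this. Only the type names and constructor shapes come from the post; the member names are assumptions:

```csharp
// Minimal sketch of the AST node types; only the type names and
// constructor shapes come from the grammar actions, the rest is assumed.
public abstract class TagExpression
{
    public static readonly TagExpression Empty = new EmptyExpression();
    private class EmptyExpression : TagExpression { }
}

public class AndExpression : TagExpression
{
    public readonly TagExpression Left, Right;
    public AndExpression(TagExpression left, TagExpression right) { Left = left; Right = right; }
}

public class OrExpression : TagExpression
{
    public readonly TagExpression Left, Right;
    public OrExpression(TagExpression left, TagExpression right) { Left = left; Right = right; }
}

public class NotExpression : TagExpression
{
    public readonly TagExpression Inner;
    public NotExpression(TagExpression inner) { Inner = inner; }
}

public class Tag : TagExpression
{
    public readonly string Name;
    public Tag(string name) { Name = name; }
}
```

With these types, (foo+bar)|(^foo+baz) would parse into an OrExpression whose children are two AndExpressions, the second wrapping its foo Tag in a NotExpression.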
I've broken the grammar up to discuss the various parts, but the below sections represent a single ATG file, the Coco/R grammar format.
COMPILER TagAlgebra
public TagExpression Result = TagExpression.Empty;
The COMPILER keyword defines the entry point PRODUCTION for the parser. The following lines, up to the next grammar section, are inserted into the generated Parser and can be used to inject class fields, extra methods, etc. into the Parser source. The only thing I inserted was a field to hold the root of the AST, initialized to empty.
IGNORECASE
This tells Coco/R that our grammar is case insensitive.
CHARACTERS defines characters that the parser should recognize for matches.
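The CHARACTERS block itself didn't survive the formatting; given the token definition below and the IGNORE line, it presumably looked something along these lines (the exact ranges are an assumption on my part):

```
CHARACTERS
  letter = 'a'..'z' + 'A'..'Z' .
  digit  = '0'..'9' .
  eol    = '\n' .
  cr     = '\r' .
  tab    = '\t' .
  nbsp   = '\u00a0' .
  shy    = '\u00ad' .
```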
TOKENS
tag =
letter { letter | digit | "_" | ":" }
.
The only token in the grammar is tag, which is composed from the characters defined above plus some extra quoted characters.
IGNORE eol + cr + tab + nbsp + shy
IGNORE tells Coco/R what characters have no meaning in parsing the input.
Next come the PRODUCTIONS, i.e. the meat of the grammar. These are the rules for matching input and converting it into code. Coco/R is an LL(1) parser generator, i.e. the grammar must be parsable from Left to right with Leftmost derivations and 1 look-ahead symbol. We also cannot have a loop in the grammar, i.e. all possible branches have to lead to a terminal via a unique set of production matches.
The first production is the entry point, which, again, sets the result to an empty AST, since the same parser instance can parse multiple expressions. It then specifies 0 or 1 BinaryExpr productions.
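The entry production itself is missing from the extracted text; given the COMPILER declaration and the Result field above, it would look roughly like this sketch:

```
PRODUCTIONS

TagAlgebra                      (. Result = TagExpression.Empty; .)
  = [ BinaryExpr<out Result> ]
  .
```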
BinaryExpr<out TagExpression expr> (. expr = null; TagExpression right = null; .)
=
NotExpr<out expr> { (. bool and = false; .)
(
"|"
| "+" (. and = true; .)
)
NotExpr<out right> (. expr = and ? (TagExpression)new AndExpression(expr,right) : new OrExpression(expr,right); .)
}
.
BinaryExpr is used for both AND and OR expressions, since both take two sub-expressions and combine them with a single operator. The production specifies a left side of a NotExpr, followed by an optional operator and another NotExpr. I.e. should our first match happen not to be a BinaryExpr, the parser can fall through to NotExpr and return its result instead, without matching the optional production body.
NotExpr<out TagExpression expr> (. expr = null; int notCount = 0; .)
=
{
"^" (. notCount++; .)
}
(
"(" BinaryExpr<out expr> ")"
| tag (. expr = new Tag(t.val); .)
) (. for(var i=0;i<notCount;i++) expr = new NotExpression(expr); .)
.
END TagAlgebra.
NotExpr, just like BinaryExpr, matches its operator optionally, requiring only that at the end it matches either a BinaryExpr enclosed in parentheses (i.e. not a circular match back into BinaryExpr, since that path requires the additional surrounding matches), or a tag, the ultimate terminal in the grammar.
There is one tricky bit in this production: the NOT operator can match multiple times, which means we need to accumulate the number of operator matches and then wrap the current expression in a chain of NOT expressions once we know how many, if any, matched.
The nice thing with Coco/R is that it adds no runtime dependency at all, building a fully self-contained Scanner and Parser. With these built, it is now possible to take Tag Algebra expressions and turn them into an executable tree of Func calls, as described in "Boolean Algebra for Tag queries".
The grammar could have been written to accumulate the unique tags and construct the Func tree right away, but the two benefits of going to an AST first are that a) the AST can easily be rendered back into text form (even placing parentheses properly for expressions that previously had no parentheses), and b) the AST can easily be programmatically composed with other expressions, or decomposed into sub-expressions, which can be used for caching and other efficiency optimizations.
I'll probably play with Irony next, but the "no runtime dependency" and existing familiarity made Coco/R the winner this time.
I've been using NHibernate a lot recently and just love having POCO data objects. This means that I can just hand out objects, manipulate them, and then save them back to the DB without ever exposing my persistence model. For this purpose, I'd been using Data Access Objects to give me access to the objects. But like stored procedures, after a while your DAOs sport a mess of Find methods that either have many overloads or, even worse, piles of nullable parameters. Never mind the acrobatics you jump through once you want to provide methods that query entities based on some other entity that's abstracted by its own DAO.
The alternative was to expose the NHibernate query API (talking ICriteria here; I never touched HQL, because the reason I went with NHibernate in the first place was that I didn't want queries in strings), but that didn't taste right, so I added more methods to my DAO instead, hiding the criteria queries.
Once I started playing with the experimental Linq support, I didn't change my abstraction, just started using Linq instead of criteria inside the DAO. But last week, I finally broke down and just exposed session.Linq<T>() as an IQueryable<T> on my DAO. For example, my Account DAO, AccountService, now simply looks like this:
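The DAO code itself got lost in the formatting; based on the description, it would be something along these lines. The member names are assumptions, and session.Linq<T>() is the entry point from the old NHibernate.Linq contrib project mentioned above:

```csharp
// Hypothetical sketch of the slimmed-down DAO described above.
public class AccountService
{
    private readonly ISession session;

    public AccountService(ISession session) { this.session = session; }

    // The by-Id accessor stays, since it's the predominant use case
    public Account GetAccount(int id) { return session.Get<Account>(id); }

    // Everything else: hand out an IQueryable and let callers compose Linq
    public IQueryable<Account> Accounts
    {
        get { return session.Linq<Account>(); }
    }
}
```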
NHibernate is completely invisible, I have full query capabilities, and even mocking has become easier, just returning a List<T>.AsQueryable() from my mock DAO. I was able to remove all my Find methods. I did leave the by-Id accessor GetAccount in there, just because it's the predominant use case and I didn't want to litter the code with identical Linq queries.
For technology geeks, one of the greatest enemies of productivity is teh Shiny, i.e. some algorithm, framework, or language that is a possible solution to the problem at hand, but mostly attracts because of its dazzling awesomeness. When something seems like the cool way to do it, chances are you are about to step into a big pile of YAGNI.
Often, the simplest possible solution or, even more importantly, using what you know, rather than what appears to be the "best", is the way to go.
A good example is the boolean algebra parser for my tag queries. Once I realized I had a mini-DSL on my hands, my google fingers were refreshing my knowledge of Coco/R and evaluating Irony in order to build a parser to produce my AST. It took stepping back to realize that to get this to the prototype stage, some regex and a simple state machine could handle the parsing much more quickly (mostly because they avoided the tooling learning curve). It may not be the final solution, but it works well enough to serve as a placeholder until the rest of the tech proves out.
Don't get me wrong, this does not mean hack up the simplest bit of code to get the job done. A straight path to the solution is likely to result in an unmaintainable mess. Certain standards of code quality need to be met, so that you can still test the result and refactor it without starting from scratch. But if you meet those criteria, getting it done fast first buys you the time to evaluate whether a) the problem you're solving is the right problem, b) the solution provided is sufficient for the intended use, and c) the shiny tech really would improve the implementation.
Determining whether a path is just shiny or appropriate isn't always easy. One bit of tech that's been consuming a bit more time than I'd like is basing my persistence story on NHibernate instead of rolling my own. The temptation to use what I know is strong, but fortunately (or unfortunately?), I have enough experience from rolling my own ORM and hand-coding SQL to know that down that road lies madness. So, be sure not to mistake "learning curve" for YAGNI.
I'm currently working on a tag query engine and couldn't find anything all that useful in the published approaches. I want to run arbitrary boolean algebra against a set of tags in a database, which seems to be out of scope for SQL approaches. All the various tagging schemas out there reduce to either just AND or just OR queries, not complex logic. However, I want to be able to do something like:
(foo+bar)|(foo+baz)|(bar+^baz)
If there is a way to do this in SQL, I'd love to know. But the way I look at it, I really have to fetch all tags for each item and then apply that formula to the item's list of tags.
But let's break the matching problem down into something I can execute. Let's assume I've got a simple parser that can turn the above into an AST. Really, I can decompose any variation into three operations: AND(a,b), OR(a,b), and NOT(a). And I can represent those with some simple Func<> definitions:
Func<bool, bool, bool> AND = (a, b) => a && b;
Func<bool, bool, bool> OR = (a, b) => a || b;
Func<bool, bool> NOT = (a) => !a;
Assuming that I have boolean tokens for foo, bar, and baz, the expression becomes:
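The tokenized form didn't survive the formatting; spelled out with the three tags as bool locals, it comes out as follows (a self-contained sketch, re-declaring the Func's from above):

```csharp
Func<bool, bool, bool> AND = (a, b) => a && b;
Func<bool, bool, bool> OR = (a, b) => a || b;
Func<bool, bool> NOT = a => !a;

// (foo+bar)|(foo+baz)|(bar+^baz) with foo and bar present, baz absent
bool foo = true, bar = true, baz = false;
bool match = OR(AND(foo, bar), OR(AND(foo, baz), AND(bar, NOT(baz))));
// match is true: the (foo+bar) branch succeeds
```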
Now, the above expression can be expressed as a function that takes three booleans describing the presence of the mentioned tags (ignoring any other tags the item has) and returns a boolean indicating a successful match. In C#, that expression looks like this:
Func<bool[], bool> f = x => OR(AND(x[0], x[1]), OR(AND(x[0], x[2]), AND(x[1],NOT(x[2]))));
Next we need to generate this boolean map from the list of tags on the item. Assuming the tag list is a list of strings, we can define an extension method on IEnumerable<string> to generate the boolean map like this:
public static bool[] GetBinaryMap(this IEnumerable<string> tags, string[] mask) {
var map = new bool[mask.Length];
foreach(var x in tags) {
for(var i = 0; i < mask.Length; i++) {
if(x == mask[i]) {
map[i] = true;
}
}
}
return map;
}
And with this we can define a Linq query that returns all matching items:
var mask = new[] { "foo", "bar", "baz"};
Func<bool[], bool> f = x => OR(AND(x[0], x[1]), OR(AND(x[0], x[2]), AND(x[1],NOT(x[2]))));
var match = from item in items
where f(item.Tags.GetBinaryMap(mask))
select item;
Clearly this isn't the fastest-executing query, since we first have to hydrate all the items, each with its collection of tags. But there are a lot of optimizations left on the table here, such as using the tag mask to pre-qualify items, breaking down the AST into sub-matches that could be run against a cache to find items, etc.
But at least we have a fairly simple way to take complex boolean algebra on tags and convert it into something that we can evaluate generically.
This is definitely an edge-case testing scenario, so I don't know how useful this utility class is in general, but I thought it was kinda fun deferred-execution stuff, so why not post it?
Here's my scenario: I've built some Dream REST services that I've set up to create an inner Autofac container per request, producing per-request instances of objects -- things like the NHibernate ISession and other disposable resources that only make sense in a per-request context.
Now I'm writing my unit tests around these services and need to provide mocks for these inner-container-created objects. I should also mention that to test the services, I am firing up a DreamHost, since the services can only be tested in the context of the full pipeline. Yeah, I know, smells a bit functional, but that's what I have to work with right now. And I need these objects to be ContainerScoped (i.e. per-inner-container singletons), so that multiple Resolve's return the same instance, but different instances across requests. Ok, ok, I know the tests are doing too much... Like I said, this is an edge case. It's not strictly a unit test, but I still want coverage on this code. Getting around this would require refactoring code that's not part of my project, so there you go.
What I want to do is set up the mock for the inner container instance on creation, which doesn't happen until I've handed execution control over to the Act part of the test. This led me to create a factory that provides a hook for setting up the mock as it's created:
public class DelegatedMoqFactory<T> where T : class
{
private Action<Mock<T>, IContext> setupCallback;
private Mock<T> mock;
public Mock<T> CurrentMock { get { return mock; } }
public T CreateInstance(IContext container)
{
mock = new Mock<T>();
if (setupCallback != null)
{
setupCallback(mock, container);
}
return mock.Object;
}
public void OnResolve(Action<Mock<T>, IContext> setupCallback)
{
this.setupCallback = setupCallback;
}
}
The open generic of DelegatedMoqFactory is registered with default scope, since I want it to exist outside the inner scope, so that I can resolve it to wire up my expectations for the mock. Then, on the first resolve of IAuthService inside the inner scope, the DelegatedMoqFactory creates the mock and calls my OnResolve callback to set it up.
The reason there is also a CurrentMock accessor is so that I can verify the mock after the inner container has gone out of scope, like this:
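The verification snippet got lost in formatting; with a hypothetical IAuthService (the service name and its Authenticate method are my assumptions), it would look something like this using Moq's Verify:

```csharp
// Resolved from the outer container, so the factory survives the inner scope
var authFactory = container.Resolve<DelegatedMoqFactory<IAuthService>>();
authFactory.OnResolve((mock, c) =>
    mock.Setup(x => x.Authenticate("bob", "pw")).Returns(true));

// ... Act: fire the request at the DreamHost, which spins up the
// inner container and pulls IAuthService from the factory ...

// After the inner container is gone, the factory still holds the last mock
authFactory.CurrentMock.Verify(x => x.Authenticate("bob", "pw"), Times.Once());
```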
This class should be useful whenever you are testing code that internally creates an inner container, and scoping the objects (normally created under ContainerScope) at default scope doesn't work (likely because there are multiple inner containers). We still get per-inner-container instances, but get to wire them up with deferred setups that don't come into play until the mocks are actually pulled from the inner container.
Ok, so I'm not proud of it, but I've been a hold-out on mocking frameworks for a while. With the interface auto-generation that ReSharper gives me, I'd just gotten pretty fast at rolling my own mocks. Once or twice a year, I'd dip my toe into a mocking framework, find its syntax frustrating, and rather than getting properly used to it, I'd soon find myself replacing NotImplementedException in a stub for yet another custom mock object.
My reasoning was that if I can roll my own mocks just as fast as wiring up a mock, then why bother with the dependency? And I thought I wasn't really causing myself too much extra work.
In the meantime, I even wrote a simple Arrange/Act/Assert mocking harness for Dream, so I could better test REST services. So it's not like I didn't believe in the benefits of mocking harnesses.
Well, for the last couple of weeks I've been using Moq, and it's pretty much killed off all my desire to roll my own. I'm generally a huge fan of lambdas and have gotten used to thinking in expressions. Even so, I wasn't able to get comfortable with the latest Rhino.Mocks. Probably just me. But from the very first attempt, Moq worked like I think, and I was up and running.
var mock = new Mock<IFoo>();  // assuming IFoo: an interface with a string Bar property
mock.Setup(x=> x.Bar).Returns("bar").AtMostOnce();
var foo = mock.Object;
Assert.AreEqual("bar",foo.Bar);
When I pick up a new framework or library, there's usually a learning curve where I get familiar with its API, find what works, what doesn't, etc. One habit I've gotten into is to create a TestFixture for everything I think I should be able to do and build a test for each assumption. The purpose of these tests is both to make sure the code does what I expect it to and to serve as a record of what I've already learned. If I later wonder how some call behaves, I first check my test signatures to see if I've already tested that behavior. If there is an appropriate test, I immediately know what the behavior will be, plus I now have working sample code showing how to do it.
For example, I was playing around with setting up Moq's through Autofac and wanted to come up with a registration that would give me a container scoped Moq object that i could set up before executing a particular test. The resulting test looked like this:
public interface IMockWithAccessor
{
IMockAccessorValue Accessor { get; }
}
public interface IMockAccessorValue
{
string Foo { get; }
}
[Test]
public void Create_nested_mock_so_it_can_be_altered_in_container_scope()
{
var builder = new ContainerBuilder();
builder.Register(c => new Mock<IMockAccessorValue>())
.As<Mock<IMockAccessorValue>>().ContainerScoped();
builder.Register(c => c.Resolve<Mock<IMockAccessorValue>>().Object)
.As<IMockAccessorValue>().ContainerScoped();
builder.Register(c =>
{
var mockBuilder = new Mock<IMockWithAccessor>();
mockBuilder.Setup(x => x.Accessor)
.Returns(c.Resolve<IMockAccessorValue>());
return mockBuilder.Object;
}).As<IMockWithAccessor>().ContainerScoped();
using (var container = builder.Build().CreateInnerContainer())
{
var mockAccessorBuilder = container
.Resolve<Mock<IMockAccessorValue>>();
mockAccessorBuilder.Setup(x => x.Foo).Returns("bar");
Assert.AreEqual("bar", container
.Resolve<IMockWithAccessor>().Accessor.Foo);
}
}
Sometimes, of course, my expectations are not met and the code does not let me do what I set out to do. These tests are even more valuable for future reference, as long as I make sure to rename the test to reflect the failed expectation and alter the asserts to reflect the actual behavior.
I was trying to figure out parametrized component registrations in Autofac. The example showed it being used with FactoryScope. I wondered whether, in default (Singleton) scope, Autofac would use the parameters to create a singleton per parameter set. My original test was named Parametrized_resolve_creates_different_singleton_per_parameter_Value. Well, it turned out that, no, Autofac does not vary singletons, and parametrized registrations only make sense in FactoryScope. The final test looks like this:
public class ParametrizedSingleton { }
[Test]
public void Parametrized_resolve_without_factory_scope_is_always_a_singleton()
{
var builder = new ContainerBuilder();
builder.Register((c, p) => new ParametrizedSingleton());
using (var container = builder.Build())
{
var foo1 = container.Resolve<ParametrizedSingleton>(
new NamedParameter("type", "foo"));
var foo2 = container.Resolve<ParametrizedSingleton>(
new NamedParameter("type", "foo"));
var bar1 = container.Resolve<ParametrizedSingleton>(
new NamedParameter("type", "bar"));
Assert.AreSame(foo1, foo2);
Assert.AreSame(foo1, bar1);
}
}
I usually keep these test fixtures in a separate test project in my solution as a permanent reference as I continue to develop the main code. It's proven useful a number of times when coming back to some old code and having to reacquaint myself with a third-party bit of code.
In my last post I proposed using delegates instead of interfaces to declare dependencies for injection. While delegates are limited to a single function call, this is often sufficient for service dependencies. In addition, this is not a wholesale replacement for traditional IoC: at the end of the day, you have to have some class instances that provide the methods bound by the delegates, and you want to resolve instances that require those delegates, so our container will still need to resolve class instances.
The main benefit of using delegates instead of interfaces is that delegates do not impose any requirements on the providing class, and thereby dependencies can be defined by what the client needs rather than what a service provides.
To illustrate this mapping, let's bring back the classes defined in the last post:
public class MessageQueue
{
public void Enqueue(string recipient, string message) { ... }
public string TryDequeue(string recipient) { ... }
}
public class Producer : IProducer
{
public delegate void EnqueueDelegate(string recipient, string message);
public Producer(EnqueueDelegate dispatcher) { ... }
}
public class Consumer : IConsumer
{
public delegate string TryDequeueDelegate(string recipient);
public Consumer(TryDequeueDelegate inbox) { ... }
}
What we need is a way to map a delegate to a method:
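The illustration that belongs here is missing; conceptually it was something like the following (deliberately not legal C#, as the next paragraph points out):

```
// not legal C# -- map the delegate type to an instance method
Producer.EnqueueDelegate  =>  MessageQueue.Enqueue
```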
Aside from the fact that the above is not legal C# syntax, it carries the implicit assumption that we can resolve a canonical instance of MessageQueue, since MessageQueue.Enqueue really refers to a method on an instance of MessageQueue. I.e. our container must function like a regular IoC container, so that we can resolve that instance.
The above solves the scenario of mapping one delegate to one implementation. In addition, we'd probably want the flexibility to map a particular implementation to a particular client class, such that:
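This illustration is also missing; the per-client variation would conceptually be (still not legal C#):

```
// not legal C# -- map the delegate only in the context of one client class
Producer { Producer.EnqueueDelegate   => MessageQueue.Enqueue }
Consumer { Consumer.TryDequeueDelegate => MessageQueue.TryDequeue }
```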
The usage scenarios described are simple enough to understand. If our container were initialized using strings or some external configuration file (Xml, custom DSL, etc.), the actual reflection required for injecting the mappings isn't too complex either. However, I abhor defining typed mappings without type-safety. This isn't so much about spelling mistakes and the like that the compiler can't catch. It's mostly about being able to navigate the mappings and the dependencies they describe, and the ability to refactor while keeping the mappings in sync.
It would be great if we could just say:
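The wished-for registration is elided here; judging from the discussion that follows, it would have been something like this (not compilable, since a method can't be a generic type argument):

```
// wishful thinking -- there is no generic syntax for naming a method
builder.Define<Producer.EnqueueDelegate>()
       .As<MessageQueue.Enqueue>();
```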
Note: I'm purposely using a syntax that mimics Autofac, since the implementation later on will be done as a hack on top of Autofac
That looks fine and could even work in the face of polymorphism, since knowing the signature of Producer.EnqueueDelegate, we could reflect the proper overload of MessageQueue.Enqueue. However, C# has no syntax for getting at the MethodInfo of a method via generics (a method is not a type). There isn't even an equivalent to typeof(T) for members, the reason for which was well explained by Eric Lippert in "In Foof We Trust: A Dialogue". The only way to get a MethodInfo relies on string-based reflection.
Fortunately, C# 3.0 introduced a syntax that lets us capture method calls as expression trees that we can decompose programmatically. This lets us express our method call like this:
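The expression snippet is missing from the extracted text; with a messageQueue instance in scope, it would be along these lines (legal C#, since a lambda can convert to an expression tree of a custom delegate type):

```csharp
MessageQueue messageQueue = new MessageQueue();

// Captures the call as data we can pick apart, rather than invoking it
Expression<Producer.EnqueueDelegate> expr =
    (a, b) => messageQueue.Enqueue(a, b);
```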
This expression conveniently infers the types of a and b. As a side note, Producer.EnqueueDelegate does not mean that EnqueueDelegate is a member of Producer. It's just a syntax artifact of nested declarations in C#, which in this case conveniently makes the delegate look attached to the class.
Unfortunately, we can't just include MessageQueue in the parameter list of the lambda. If we were to include it in the argument list, it could not be inferred, and if we defined MessageQueue explicitly as a lambda parameter, we'd be forced to declare all the arguments. We want to express the above while only explicitly defining MessageQueue. To accomplish this, we need to create a composite expression whose outer lambda supplies the MessageQueue:
Expression<Func<MessageQueue, Expression<Producer.EnqueueDelegate>>> expr
= x => (a,b)=> messageQueue.Enqueue(a,b);
Now we have enough syntactic sugar to describe our two registration scenarios in terms of the container builder. First, the global registration of the delegate against an implementation:
builder.Define<Consumer.TryDequeueDelegate>()
.As<MessageQueue>(x => a => x.TryDequeue(a));
builder.Define<Producer.EnqueueDelegate>()
.As<MessageQueue>(x => (a, b) => x.Enqueue(a, b));
And alternately, the registration of delegates and their implementation in the context of a particular class:
builder.Register<Consumer>().As<IConsumer>()
.With<Consumer.TryDequeueDelegate>()
.As<MessageQueue>(x => a => x.TryDequeue(a));
builder.Register<Producer>().As<IProducer>()
.With<Producer.EnqueueDelegate>()
.As<MessageQueue>(x => (a, b) => x.Enqueue(a, b));
Next time, I'll go over the implementation of the above to get it working as an injection framework.
I've been on a bit of a tear about declaring dependency contracts and injecting only what is required. While examining the use of interfaces in IoC and their shortcomings, I decided that, taken to the extreme, dependencies come down to call dependencies, which can be modeled with delegates rather than interfaces. Instead of writing a novel, as I've been prone to, I thought I'd do a shorter post on my approach to this solution and expand on the implementation in later posts.
To recap, in the SOLID principles, the Interface Segregation Principle states: "Clients should not be forced to depend upon interfaces that they do not use." This means that interfaces should be fine-grained enough to expose no more than one responsibility. Taken to the extreme, this means each interface has only a single method. There are valid SRP scenarios where a responsibility is modeled by more than one call, but let's start with the simplest scenario first and see how well it applies to more complex responsibilities later.
In C# we have delegates, which describe a single method call. A delegate instance is a reference to a method bound to a specific instance of a class, without exposing the underlying class (unless the delegate refers to a static method). A delegate can even be used to expose internal, protected, and private methods.
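As a quick illustration (mine, not from the original post), a delegate can hand out even a private method without the caller ever seeing the class behind it:

```csharp
var greet = new Greeter().GetGreetDelegate();
Console.WriteLine(greet("bob"));  // prints "hello bob"

public class Greeter
{
    // Private: unreachable through any interface
    private string Greet(string name) { return "hello " + name; }

    // Expose just the one call as a delegate, not the type
    public Func<string, string> GetGreetDelegate() { return Greet; }
}
```

The caller only ever sees a Func<string, string>; Greeter itself stays hidden.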
Instead of declaring a list of interfaces for the IoC container to inject, classes would define their dependencies as delegates. Taking the example from my duck typing post, we get the following dependency declarations.
First, we have the same service provider, MessageQueue, which still doesn't need to implement an interface:
public class MessageQueue
{
public void Enqueue(string recipient, string message) { ... }
public string TryDequeue(string recipient) { ... }
}
Next, we have the new Producer, now declaring its dependency as a delegate:
public class Producer : IProducer
{
public delegate void EnqueueDelegate(string recipient, string message);
public Producer(EnqueueDelegate dispatcher) { ... }
}
And finally, we have the new Consumer, also declaring a delegate for construction time injection:
public class Consumer : IConsumer
{
public delegate string TryDequeueDelegate(string recipient);
public Consumer(TryDequeueDelegate inbox) { ... }
}
Think of the delegate as your method interface. You could define your dependencies as Func's and Action's, but in most scenarios that would obfuscate your dependencies beyond recognition. By using an explicit delegate, you get to attach the dependency to the class that has the dependency, in addition to having a descriptive signature.
Now, if we were to wire this up manually we'd get something like this:
var queue = new MessageQueue();
IProducer producer = new Producer(queue.Enqueue);
IConsumer consumer = new Consumer(queue.TryDequeue);
That's simple enough, but not very scalable once you have a lot of dependencies to wire up. What we really need is an IoC container that lets us register delegates against classes, instead of needing instances at dependency declaration time. Delegates can't be cast from one to another and are not, strictly speaking, types, which poses some challenges for creating a type-safe registration interface. There are a number of ways to accomplish this syntax, which I will elaborate on in my next post.