Searching a Tree of Objects with Linq, Revisited
A while back, I wrote about searching through a tree using Linq to Objects. That post was mostly snippets of code about delegates, lambdas, yield and how they apply to Linq, so it was more a technical exploration than an example. So I thought I'd follow it up with concrete extension methods to make virtually any tree searchable by Linq.
Linq, IEnumerable, yield
All that is required to search a tree with Linq is creating a list of all nodes in the tree. Linq to Objects can operate on any IEnumerable<T>.
To turn a tree into a list of nodes we need to walk the tree and collect all the children of every node. That is a simple task for a recursive method that carries along a list object to stuff every found node into. But there is a better way: using yield to return each item as it is encountered, so we don't have to carry along a collection. Iterators using yield implement a pattern in which a method can return more than once. For this reason, a method using yield in C# must return an IEnumerable (or an IEnumerator, generic or non-generic), so that the caller gets a handle to an object with which it can traverse the multiple return values.
IEnumerable is basically an unbounded sequence. This is also the reason why, unlike collections, it does not have a Count property. It is entirely possible for an enumerator to return an infinite series of items.
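To make the "infinite series" point concrete, here is a minimal sketch of an iterator that never ends on its own. The names YieldDemo and NaturalNumbers are made up for this illustration; only yield, Take() and ToArray() are standard C# and Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YieldDemo
{
    // Hypothetical helper: an endless sequence 1, 2, 3, ...
    // The body does not run until a caller starts enumerating,
    // and it only advances one item per request.
    public static IEnumerable<int> NaturalNumbers()
    {
        var n = 1;
        while (true)
        {
            yield return n++;
        }
    }

    public static void Main()
    {
        // Take(5) stops asking after five items, so the endless loop is never a problem.
        var firstFive = NaturalNumbers().Take(5).ToArray();
        Console.WriteLine(string.Join(", ", firstFive)); // prints "1, 2, 3, 4, 5"
    }
}
```

Because the caller decides how far to pull, the same mechanism lets a tree walk stop early, which matters once we start searching.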
Together IEnumerable
Two types of Tree Traversal
Depth First
In depth-first traversal, the algorithm will continue to dig down a node's children until it reaches a leaf node (a node without children), before considering the next child of the current parent node.
Breadth First
In breadth-first traversal, the algorithm will return all nodes at a particular depth before considering the children at the next level, i.e. first return all the nodes from level 1, then all the nodes from level 2, etc.
Tree to IEnumerable Extension methods
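using System;
using System.Collections.Generic;

public static class TreeToEnumerableEx
{
    // Yields the head first, then each child's entire subtree in turn.
    public static IEnumerable<T> AsDepthFirstEnumerable<T>(this T head, Func<T, IEnumerable<T>> childrenFunc)
    {
        yield return head;
        foreach (var node in childrenFunc(head))
        {
            foreach (var child in AsDepthFirstEnumerable(node, childrenFunc))
            {
                yield return child;
            }
        }
    }

    // Yields the head, then lazily re-enumerates its own output and
    // yields the children of each node that has already been produced.
    public static IEnumerable<T> AsBreadthFirstEnumerable<T>(this T head, Func<T, IEnumerable<T>> childrenFunc)
    {
        yield return head;
        var last = head;
        foreach (var node in AsBreadthFirstEnumerable(head, childrenFunc))
        {
            foreach (var child in childrenFunc(node))
            {
                yield return child;
                last = child;
            }
            // The node just expanded is the last one yielded: nothing is left to visit.
            if (last.Equals(node)) yield break;
        }
    }
}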
This static class provides two extension methods that can be used on any object, as long as it's possible to express a function that returns all children of that object, i.e. the object is a node in some type of tree and has a method or property for accessing a list of its children.
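As an aside, breadth-first traversal is more commonly written with an explicit queue. The following is a sketch of that alternative, not the method used in this post; AsBreadthFirstEnumerableQueued, QueuedTreeEx and the little adjacency map are all hypothetical names invented for the illustration. It also shows that the "node" can be any type at all, here a plain int looked up in a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueuedTreeEx
{
    // Hypothetical alternative: breadth-first traversal driven by a FIFO queue.
    public static IEnumerable<T> AsBreadthFirstEnumerableQueued<T>(
        this T head, Func<T, IEnumerable<T>> childrenFunc)
    {
        var pending = new Queue<T>();
        pending.Enqueue(head);
        while (pending.Count > 0)
        {
            var node = pending.Dequeue();
            yield return node;
            // Children wait behind everything already queued at the current level.
            foreach (var child in childrenFunc(node))
            {
                pending.Enqueue(child);
            }
        }
    }
}

public static class QueuedTreeDemo
{
    public static int[] Run()
    {
        // Assumed sample tree: 1 -> (2, 3), 2 -> (4, 5), 3 -> (6, 7)
        var children = new Dictionary<int, int[]>
        {
            { 1, new[] { 2, 3 } },
            { 2, new[] { 4, 5 } },
            { 3, new[] { 6, 7 } },
        };
        Func<int, IEnumerable<int>> childrenOf =
            id => children.ContainsKey(id) ? children[id] : new int[0];

        return 1.AsBreadthFirstEnumerableQueued(childrenOf).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "1, 2, 3, 4, 5, 6, 7"
    }
}
```

The queue version trades the self-referencing enumerator for a little explicit state. It calls childrenFunc once per node and does not rely on Equals() to detect the end of the tree.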
An Example
Let's use a hypothetical Tree model defined by this Node class:
public class Node
{
    private readonly List<Node> children = new List<Node>();

    public Node(int id)
    {
        Id = id;
    }

    public IEnumerable<Node> Children { get { return children; } }

    public Node AddChild(int id)
    {
        var child = new Node(id);
        children.Add(child);
        return child;
    }

    public int Id { get; private set; }
}
Each node simply contains a list of children and has an Id, so that we know what node we're looking at. The AddChild() method is a convenience method so we don't expose the child collection and no node can ever be added as a child twice.
The calling convention for a depth-first collection is to call AsDepthFirstEnumerable() on the root node and pass it a function that returns a node's children. The lambda expression n => n.Children is the function that will return the children of a node. It simply states: given n, return the value of the Children property of n. A simple test to verify that our extension works, and to show the extension being used in Linq, looks like this:
[Test]
public void DepthFirst()
{
    // build the tree in depth-first order
    int id = 1;
    var depthFirst = new Node(id);
    var df2 = depthFirst.AddChild(++id);
    var df3 = df2.AddChild(++id);
    var df4 = df2.AddChild(++id);
    var df5 = depthFirst.AddChild(++id);
    var df6 = df5.AddChild(++id);
    var df7 = df5.AddChild(++id);

    // find all nodes in depth-first order and select just the Id of each node
    var IDs = from node in depthFirst.AsDepthFirstEnumerable(x => x.Children)
              select node.Id;

    // confirm that this list of IDs is in depth-first order
    Assert.AreEqual(new int[] { 1, 2, 3, 4, 5, 6, 7 }, IDs.ToArray());
}
For breadth-first collections, the calling convention is the same, using AsBreadthFirstEnumerable() instead.
Again, we can test that the extension works like this:
[Test]
public void BreadthFirst()
{
    // build the tree in breadth-first order
    var id = 1;
    var breadthFirst = new Node(id);
    var bf2 = breadthFirst.AddChild(++id);
    var bf3 = breadthFirst.AddChild(++id);
    var bf4 = bf2.AddChild(++id);
    var bf5 = bf2.AddChild(++id);
    var bf6 = bf3.AddChild(++id);
    var bf7 = bf3.AddChild(++id);

    // find all nodes in breadth-first order and select just the Id of each node
    var IDs = from node in breadthFirst.AsBreadthFirstEnumerable(x => x.Children)
              select node.Id;

    // confirm that this list of IDs is in breadth-first order
    Assert.AreEqual(new int[] { 1, 2, 3, 4, 5, 6, 7 }, IDs.ToArray());
}
Searching Trees
The tree used in the example is of course extremely simple, i.e. it doesn't even have any worthwhile data attached to a node to query. But these extension methods could be used on a node of any kind of tree, allowing the full power of Linq (grouping, aggregation, sorting, projection, etc.) to be used on the tree.
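To give a flavor of what that looks like with richer data, here is a small sketch that groups and counts the nodes of an org chart. The Employee type, its fields and the sample data are all invented for this example; the depth-first walk is the one from this post, repeated so the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TreeEx
{
    // Same depth-first walk as in the post.
    public static IEnumerable<T> AsDepthFirstEnumerable<T>(this T head, Func<T, IEnumerable<T>> childrenFunc)
    {
        yield return head;
        foreach (var node in childrenFunc(head))
            foreach (var child in AsDepthFirstEnumerable(node, childrenFunc))
                yield return child;
    }
}

// Hypothetical node type that carries some data worth querying.
public class Employee
{
    public string Name;
    public string Department;
    public List<Employee> Reports = new List<Employee>();
}

public static class OrgChartDemo
{
    public static Employee BuildSample()
    {
        return new Employee { Name = "Ann", Department = "Exec", Reports = {
            new Employee { Name = "Bob", Department = "Eng", Reports = {
                new Employee { Name = "Cy", Department = "Eng" },
                new Employee { Name = "Di", Department = "Eng" } } },
            new Employee { Name = "Eve", Department = "Sales", Reports = {
                new Employee { Name = "Flo", Department = "Sales" } } } } };
    }

    // Flatten the tree, then group, aggregate, sort and project with ordinary Linq.
    public static List<string> HeadcountByDepartment(Employee root)
    {
        var query = from e in root.AsDepthFirstEnumerable(x => x.Reports)
                    group e by e.Department into g
                    orderby g.Count() descending, g.Key
                    select g.Key + ": " + g.Count();
        return query.ToList();
    }

    public static void Main()
    {
        foreach (var line in HeadcountByDepartment(BuildSample()))
            Console.WriteLine(line); // prints "Eng: 3", "Sales: 2", "Exec: 1"
    }
}
```

Once the tree is exposed as a sequence, nothing in the query knows or cares that the data came from a hierarchy.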
As a final note, you may wonder, why bother with depth-first vs. breadth-first? After all, in the end we do examine every node! There is however one particular case where the choice of algorithm can be very important: you are looking for one match or a particular number of matches. Since we are using yield, we can terminate the traversal at any time. Using the FirstOrDefault() extension on our Linq expression, the traversal would stop as soon as one match is found. And if you have any knowledge of where that node might be in the tree, the choice of search algorithm can be a significant performance factor.
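The effect is easy to measure. The sketch below counts how many nodes each traversal hands out before FirstOrDefault() finds its match. The two walks are copied from this post; EarlyExitDemo, ChildrenOf, VisitedUntilFound and the adjacency map are hypothetical helpers made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TreeToEnumerableEx
{
    // Both walks are repeated from the post so that this sample compiles on its own.
    public static IEnumerable<T> AsDepthFirstEnumerable<T>(this T head, Func<T, IEnumerable<T>> childrenFunc)
    {
        yield return head;
        foreach (var node in childrenFunc(head))
            foreach (var child in AsDepthFirstEnumerable(node, childrenFunc))
                yield return child;
    }

    public static IEnumerable<T> AsBreadthFirstEnumerable<T>(this T head, Func<T, IEnumerable<T>> childrenFunc)
    {
        yield return head;
        var last = head;
        foreach (var node in AsBreadthFirstEnumerable(head, childrenFunc))
        {
            foreach (var child in childrenFunc(node))
            {
                yield return child;
                last = child;
            }
            if (last.Equals(node)) yield break;
        }
    }
}

public static class EarlyExitDemo
{
    // Assumed sample tree: 1 -> (2, 3), 2 -> (4, 5), 3 -> (6, 7)
    private static readonly Dictionary<int, int[]> Tree = new Dictionary<int, int[]>
    {
        { 1, new[] { 2, 3 } }, { 2, new[] { 4, 5 } }, { 3, new[] { 6, 7 } },
    };

    public static IEnumerable<int> ChildrenOf(int id)
    {
        return Tree.ContainsKey(id) ? Tree[id] : new int[0];
    }

    // Counts the nodes the traversal yields before FirstOrDefault() stops pulling.
    public static int VisitedUntilFound(IEnumerable<int> traversal, int target)
    {
        var visited = 0;
        traversal.Select(n => { visited++; return n; }).FirstOrDefault(n => n == target);
        return visited;
    }

    public static void Main()
    {
        // Node 3 sits directly under the root, but after the whole subtree of node 2.
        Console.WriteLine(VisitedUntilFound(1.AsDepthFirstEnumerable(x => ChildrenOf(x)), 3));   // prints 5
        Console.WriteLine(VisitedUntilFound(1.AsBreadthFirstEnumerable(x => ChildrenOf(x)), 3)); // prints 3
    }
}
```

For a target near the root, breadth-first gets there sooner; for a target deep in the first branch the advantage flips. On a tree of seven nodes the difference is trivial, but the same ratio applies to much larger trees.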