Iloggable

LINQ: Immutability vs. Deferred execution

The last couple of nights I've been playing with some Linq to Sql and a whole lot of Linq to Objects, and I have to say: where coming up with complex Regular Expressions used to be one of my favorite puzzles, coming up with complex projections and transformations through Linq is quickly taking its place. Simple Linq is well documented, but when it comes to aggregation, the documentation gets a lot sparser. I expect to write more of that up once I feel more comfortable with the syntax.

In the meantime, I wanted to write up a non-obvious observation about deferred execution with Linq. Considering the gotchas with lambdas, it's easy to extend the lessons learned to Linq, since it is, after all, deferred execution. But what's different with Linq is that, while execution is deferred, the expression tree built via a query is also immutable. I came across this trying to do some simple query re-use.

Let's start with a simple DTO:

public class Order
{
  public Order(int id, int val, bool buyOrder)
  {
    Id = id;
    Value = val;
    IsBuyOrder = buyOrder;
  }
  public int Id { get; set; }
  public int Value { get; set; }
  public bool IsBuyOrder { get; set; }
}

And a set of this data:

Order[] orders = new Order[]
{
  new Order(1,2,true),
  new Order(2,2,false),
  new Order(3,4,true),
  new Order(4,4,false),
  new Order(5,6,true),
  new Order(6,6,false),
};

Let's split those into buy and sell orders:

var buyOrders = from order in orders
                where order.IsBuyOrder
                select order;

var sellOrders = from order in orders
                 where !order.IsBuyOrder
                 select order;

If we want to find the buy and the sell order with a value of 2, you'd think we could write one query and re-use it for both cases. Since both queries result in an IEnumerable<Order>, how about we define a query source variable and assign it the value of either of the above queries:

IEnumerable<Order> orders2 = null;

var orderAtTwo = from order in orders2
                 where order.Value == 2
                 select order;

orders2 = buyOrders;
int buyOrderId = orderAtTwo.First().Id;

orders2 = sellOrders;
int sellOrderId = orderAtTwo.First().Id;

Console.WriteLine("buy Id: {0}, sell Id: {1}", buyOrderId, sellOrderId);

Since the query is deferred until we call .First() on it, that seems like reasonable syntax. Except this will result in a System.ArgumentNullException, because our query grabbed a reference to orders2 at query definition, even though the query won't be executed until later. Giving orders2 a new value does not change the original reference in the immutable expression tree.

A way around this is to replace the actual contents of orders2. However, for us to do that, we have to turn the query source into a collection first.

// orders2 is now declared as List<Order> orders2 = new List<Order>();
// (with orderAtTwo defined against it), so we can swap its contents in place
orders2.Clear();
orders2.AddRange(buyOrders);
int buyOrderId = orderAtTwo.First().Id;

orders2.Clear();
orders2.AddRange(sellOrders);
int sellOrderId = orderAtTwo.First().Id;

Console.WriteLine("buy Id: {0}, sell Id: {1}", buyOrderId, sellOrderId);

This gives us the expected

buy Id: 1, sell Id: 2

Let's put aside the awkwardness of clearing out a list and stuffing data back in; this code has another unfortunate side effect. .AddRange() actually executes the query passed to it, so we execute our buy and sell queries to populate orders2 and then execute orderAtTwo twice against those collections. The beauty of Linq is that if you create a query from a query, you're not running multiple queries, but building a more complex query to be executed. So, what we really want is query "re-use" that results in a single expression tree at execution time.

To achieve this, we need to move the shared query into a separate method such as:

private IEnumerable<Order> GetTwo(IEnumerable<Order> source)
{
  return from order in source
         where order.Value == 2
         select order;
}

and the code becomes:

int buyOrderId = GetTwo(buyOrders).First().Id;
int sellOrderId = GetTwo(sellOrders).First().Id;

Console.WriteLine("buy Id: {0}, sell Id: {1}", buyOrderId, sellOrderId);

This gives the same output as above, and we're only running two queries, each against the original collection. The method call means that we don't re-use a single expression tree; instead, each call builds a new one, combining the expression tree passed to it with the one it builds itself.

Structs via Automatic Properties can be tricky

Here's a bit of code I just got through debugging...

public Point Point { get; private set; }

public void Offset(Point origin)
{
  Point.Offset(-origin.X, -origin.Y);
}

Can you tell what's wrong here? Let's just say that the Offset won't take.

Structs are value types, which means any time you pass one around you get a new copy. So far so good. And that means when you expose a struct value via a property, the accessing party is always looking at a copy. Again, fine: when you do a set you change the stored value, and if you need to manipulate it, you just manipulate the actual struct.

Enter automatic properties, and you might forget about this last detail and not realize that you never get access to the underlying value, even from within the class. I.e. when I call Point.Offset, I'm calling it on the copy that was passed to me, and the resulting value is immediately thrown away. So I just went back to using the property to facade a private Point, which I can now manipulate inside of Offset. Duh.
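
For reference, here's a minimal sketch of that fix, assuming the same Point/Offset shape as above:

// back the property with an explicit field so Offset can mutate the stored
// struct instead of a throwaway copy
private Point point;

public Point Point
{
  get { return point; }
  private set { point = value; }
}

public void Offset(Point origin)
{
  // operates on the backing field, so the change actually sticks
  point.Offset(-origin.X, -origin.Y);
}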

More Deferred Execution Fun: foreach and delegation scope

This is closely related to my last post on deferred execution gotchas, and it's basically more of "if you inline delegated code, you may easily overlook scope side-effects". This time it's about dealing with foreach and using the loop variable for deferred execution.

public void SpawnActions()
{
  foreach (ActionContext context in contexts)
  {
    int id = context.Id;
    Action<int> callback =  
      (workerNumber) =>
      {
        Console.WriteLine("{0} Id: {1}/{2}", workerNumber, id, context.Id);
      };
    ThreadPool.QueueUserWorkItem(new WaitCallback(FutureExecute), callback);
  }
}

public void FutureExecute(object state)
{
  int id = worker++;
  Action<int> callback = state as Action<int>;
  Thread.Sleep(500);
  callback(id);
}

The output looks like this:

0 Id: 0/9
1 Id: 1/9
2 Id: 2/9
3 Id: 3/9
4 Id: 4/9
5 Id: 5/9
6 Id: 6/9
7 Id: 7/9
8 Id: 8/9
9 Id: 9/9

So while the foreach scope variable context is kept alive for the deferred execution, it turns out that foreach re-uses the variable on each pass through the loop and therefore when the Action is later executed, each one has a reference to the last context. So what we need to do is create a true local variable for context so that the lambda's scope can hold on to the reference we want.

foreach (ActionContext context in contexts)
{
  int id = context.Id;
  // locally scoped variable
  ActionContext c2 = context;
  Action<int> callback =  
    (workerNumber) =>
    {
      Console.WriteLine("{0} Id: {1}/{2}", workerNumber, id, c2.Id);
    };
  ThreadPool.QueueUserWorkItem(new WaitCallback(FutureExecute), callback);
}

And now our results are a bit more like what we expected:

0 Id: 0/0
1 Id: 1/1
2 Id: 2/2
3 Id: 3/3
4 Id: 4/4
5 Id: 5/5
6 Id: 6/6
7 Id: 7/7
8 Id: 8/8
9 Id: 9/9

Copying a Silverlight User Control from one project to another

Here's something I tracked down with no help from error messages:

When you copy a user control in Silverlight 1.1 from one project to another, the Xaml that the control loads will have its Build Action set to SilverlightPage. When you then run your project and try to create an instance of that control, you'll get the ever so informative AG_E_INVALID_ARGUMENT. All you need to do to fix it is set the Build Action back to Embedded Resource. Tada!

I love declarative definition of UI with behavior wired to static code. But man, in its current state, the debugging support for it just isn't there. It's bad enough that the strings on the declarative side that link to actions won't be updated by normal refactoring, nor will they show up as references, but at this point Xaml brings the worst part of scripting languages to compile-time checked coding:

Vague runtime errors without a stacktrace

Bah.

XP on Bootcamp & VMWare Fusion, take 2

Time to re-install XP

My setup until yesterday was a Bootcamp partition that was running as a VM using VMWare Fusion. The Bootcamp partition was set up as FAT32, because NTFS came up as read-only when mounted under Mac OS.

Then I started up another VM (Fedora Core 7, although I have since noticed it's not about what OS you run, but just about having a second VM). Maybe it's not enough memory, maybe it's a VM running from Bootcamp plus a VM running from a disk image, but while it had worked previously, this time it locked up my Mac hard. I finally had to hard boot the Mac. When I got back into the VM, I noticed things were broken. Now, I've had to hard boot XP many times and I've never seen this. An indeterminate number of files were corrupted. I noticed one XML file that halfway through turned into binary garbage, so I assume that the other failing systems were suffering from similar corruption. Basically it was hosed, because there was no way to determine what had been corrupted. Time to re-install.

Bootcamp & VMware install

Looking at my post from last time, it was clearly written with the frazzled recollection of a day of trying to make things work, since I once again ran into problems. This time I'm making sure I write the resolution down.

  1. Imaged the old partition, because you always realize that you forgot to back up some vital file.

  2. Removed the old partition and created a new one (using the Bootcamp tool), this time large enough for more than just fallback use (since it's used daily under VMWare). This creates a FAT32 partition.

  3. Inserted my install disk and Bootcamp restarted to fire up the Windows install. Important note here: this has to be a real XP install disk. I first tried to use my MSDN DVD with the chooser for picking which OS to install. However, the Mac keyboard doesn't seem to work when you get to that menu. So I used an XP Pro w/ SP2 CD and rebooted. Now the installer ran just fine.

  4. Formatted the partition using NTFS. Even if you use FAT32, re-format; don't use the Bootcamp-formatted partition. At least for me, using that prepared partition didn't work and created an unbootable image. I know NTFS cannot be written to by Mac OS, but it doesn't really matter, since once I boot it as a VM, I can always transfer files via loopback file sharing. I'm going NTFS to get a journaling file system. Theoretically that should prevent the corruption I got last time.

  5. After the install completes, pop in the Mac OS disk and let it install the Bootcamp utilities. This gives you full support for the Macbook Pro hardware.

  6. Activate Windows and reboot into Mac OS X.

  7. Fire up VMWare Fusion. Bootcamp partition should be listed as a VM. Since I previously had a bootcamp partition, I had to go into Library::VMWare Fusion::Application Support::Virtual Machines and remove the old Bootcamp partition folder. Fusion will then do its magic and prep the Bootcamp partition to run as a VMWare Image.

  8. Activate Windows again. That should be the last time you have to do it.

Let's hope this doesn't turn into a bi-monthly process :)

The dangers of deferred execution

I recently wrote about Action & Func, which along with lambda expressions let you do easy inline callbacks like this:

Utility.ActionDownloader.Download(
  Configuration.GetAssetUri(dto.Url),
  (Downloader d) =>
  {
    FloatContainer c = (FloatContainer)XamlReader.Load(d.ResponseText);
    c.Initialize(dto);
  });

I.e. I can call a downloader and inline pass it a bit of code to execute once the download completes. But the catch, of course, is that looking at the code and following the usual visual tracing of flow hides the fact that c.Initialize(dto) doesn't get called until some asynchronous time in the future. Now, that's always been a side-effect of delegates, but until they became anonymous and inline, the visual deception of code that looks like it's in the current flow scope but isn't didn't exist.

What happened was that I needed my main routine to execute some code after FloatContainer was initialized, and by habit I created an Initialized event on FloatContainer. Of course this was superfluous, since my lambda expression called the synchronous Initialize, i.e. my action could be placed inline after that call to c.Initialize(dto) and be guaranteed to be called after initialization had completed.
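
In other words, something like this sketch would have done the job (DoPostInitializationWork is just a hypothetical stand-in for the code I wanted to run):

Utility.ActionDownloader.Download(
  Configuration.GetAssetUri(dto.Url),
  (Downloader d) =>
  {
    FloatContainer c = (FloatContainer)XamlReader.Load(d.ResponseText);
    c.Initialize(dto);
    // Initialize is synchronous, so anything placed here is guaranteed
    // to run after initialization has completed
    DoPostInitializationWork(c);
  });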

This scenario just meant I created some superfluous code. However, I'm sure that as I use lambda expressions more, there will be more pitfalls of writing code that doesn't consider that its execution time is unknown, as is the state of the objects tied to the scope of the expression.

This last bit about objects tied to the expression scope is especially tricky, and I think we will see some help in terms of immutability concepts weaving their way into C# 3.x or 4.0, as the whole functional aspect of lambda expressions really works best when dealing with objects that cannot change state. Eric Lippert's been laying the groundwork in a number of posts on the subject, and while he constantly disclaims that his ponderings are not a roadmap for C#, I am still going to assume that his interest in and recognition of the subject of immutables will have some impact on a future revision of the language. Well, I at least hope it does.

Action & Func: Never write another delegate

With lambda expressions in C#, the Func generic delegate and its variations have been getting a lot of attention. So naturally, you might think that the lambda syntax is just a shortcut for creating anonymous delegates, whether they return values or not.

First let's look at the evolution of delegates from 1.1 to now. Delegates, simply put, are the method equivalent of function pointers. They let you pass a method call as an argument for later execution. The cool thing (and a garbage collection pitfall) is that a delegate creates a lexical closure, i.e. the delegate carries with it the object that the method gets called on. For garbage collection this means that a delegate prevents an object from being collected. That's why it's important to unsubscribe from those events you subscribed to.
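
As a quick aside, here's a minimal sketch of that unsubscribe point (Publisher and Subscriber are made-up names, not anything from this post):

public class Publisher
{
  public event EventHandler SomethingHappened;
}

public class Subscriber
{
  // subscribing stores a delegate in the publisher; that delegate references
  // this subscriber, so the publisher keeps it alive until we unsubscribe
  public void Attach(Publisher p) { p.SomethingHappened += OnSomethingHappened; }
  public void Detach(Publisher p) { p.SomethingHappened -= OnSomethingHappened; }

  private void OnSomethingHappened(object sender, EventArgs e)
  {
    Console.WriteLine("something happened");
  }
}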

But I digress. Let's define a delegate that returns an Integer and a method that matches that delegate:

delegate int IntProducerDelegate();

public int x = 0;
public int IntProducer()
{
  return x++;
}

With the original .NET 1.0 syntax we'd create the delegate like this:

IntProducerDelegate p1 = new IntProducerDelegate(IntProducer);

Now we can call p1() and get an integer back, and since it's a closure, each time we call p1() the originating object's x increases, as does our return value.

Then, in .Net 2.0 we got anonymous delegates.

IntProducerDelegate p2 = delegate { return IntProducer(); };

// or with IntProducer's action inlined...
IntProducerDelegate p3 = delegate { return x++; };

This got rid of the need to create a method just to pass along a closure that manipulated our object at a later time. The other thing that anonymous delegates reinforce is that delegates just care about signature. IntProducerDelegate can get assigned any delegate that takes no arguments and returns an int. That sounds like a perfect scenario for generics, and in .NET 3.5 we got just that: a set of generic delegates called Func. Using Func, we quickly get from the original delegate syntax to our lambda expression like this:

// create a Func delegate that fills the same role as IntProducerDelegate
Func<int> p3 = new Func<int>(IntProducer);

// which means that we don't need IntProducerDelegate at all anymore
Func<int> p4 = delegate { return x++; };

// and the anonymous delegate can also be shorthanded with a lambda expression
Func<int> p5 = () => { return x++; };
// which says, given that we take no argument "()", execute and return the following "return x++;"

However, before there ever was Func, .Net 2.0 introduced the generic delegate Action, which is a natural counterpart to Func, encapsulating a method that does not return anything. Following through the example of the producer, we'll create a consumer like:

delegate void IntConsumerDelegate(int i);

public void IntConsumer(int i)
{
  Console.WriteLine("The number is {0}", i);
}

Now following the same evolution of syntax we get this:

IntConsumerDelegate c1 = new IntConsumerDelegate(IntConsumer);

Action<int> c2 = new Action<int>(IntConsumer);

Action<int> c3 = delegate(int i) { Console.WriteLine("The number is {0}", i); };

Action<int> c4 = (i) => { Console.WriteLine("The number is {0}", i); };

So lambda syntax can be used to create either a Func or an Action. And that also means that we never need to explicitly create another delegate type, since a variation of these two generic delegates gives us the whole arsenal for storing lambda expressions of all kinds.
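
And since Func and Action come in several generic arities, covering the common signatures is just a matter of picking the right variation (these lines are purely illustrative, not from anything above):

Func<int, int, int> add = (a, b) => a + b;
Action<string, int> log = (message, count) => Console.WriteLine("{0} x{1}", message, count);

int sum = add(3, 4); // 7
log("retry", sum);   // prints "retry x7"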

Software Activation vs. Virtualization, Part 3

Part of an ongoing saga.

Rebooted back into VMWare Fusion and yeah, Illustrator activation was indeed screwed there as well. Office 2007 too, but at least it just lets me reactivate (no doubt noting me as a repeat offender somewhere). So I called Adobe and was told that "it's a sensitive piece of software". No it's not. Illustrator can take any beating you give it. It's the "anti-piracy" crap that's sensitive. I got an "emergency activation code" to get it going again and was advised to deactivate before I switch VM setups and then re-activate after the reboot. OMFG. Seriously, just give me a USB dongle if you are so sensitive about it. That would be infinitely more convenient.

Dug around the net a bit, and it seems that if I fake my MAC address to be the same between the Bootcamp boot and the VM boot, it won't invalidate my activation. Might try that next. Of course, the same board I found that on also noted that if I just got a crack for my legally purchased product, all troubles would be gone as well. Yes, once again, anti-piracy crap is not stopping pirates but legitimate customers. You'd figure someone might have spotted the pattern here, but maybe those DRM-colored glasses filter reality a bit too well.

Software Activation vs. Virtualization (and multiple PC ownership)

Just as virtualization is finally becoming a useful technology, everybody and their uncle has decided that software activation is the new hot way to stop theft. Of course, like all anti-piracy tools, the paying customers get screwed, because the pirates have already patched their copies to not require activation. Bravo! You know, I'd prefer friggin' USB dongles to this big brother activation business.

I've talked about these problems before, but I've had more fun with the VM vs. Bootcamp image activation troubles. I just got Adobe CS3, and for a programmer with occasional Photoshop/Illustrator needs, that's a pretty serious expense. I mean it costs me more than MSDN and gets used a fraction of the time. But I need it. And forget that I have three different computers I use at different times and really ought to be able to install my purchased software on all of these machines, since I, the owner of the license, will never be using two computers at once. But that's a whole other story.

Back to the re-activation on hardware change business... I've been running Windows under VMware for the last couple of weeks, but for the Illustrator work I need to do right now, it was a bit sluggish. No problem, reboot into Bootcamp! Mind you, this isn't a different install of Windows. This is the same physical disk partition, but booted natively vs. via VMware. What happens? Illustrator bitches about activation, as does Office, because it saw the hardware change. Let me guess, when I reboot in the virtual machine it'll bitch yet again. Sooner or later it'll just shut me down as a serial offender. Thanks! Way to reward my purchase.

A case for XML

XML gets maligned a lot. It's enterprisey, bloated, overly complex, etc. And the abuses visited upon it, like trying to express flow control or whole DSLs in it or being proposed as some sort of panacea for all interop problems only compound this perception. But as long as you treat it as what it is, data storage, I generally can find little justification to use something else. Not because it's the best, but because it's everywhere.

If you are your own consumer and you want more efficient data storage, just go binary already. If you're not, then I bet your data consumers are just tickled that they have to add another parser to their repository of data ingestors. James Clark probably put it best when he said:

"For the payload format, XML has to be the mainstay, not because it's technically wonderful, but because of the extraordinary breadth of adoption that it has succeeded in achieving. This is where the JSON (or YAML) folks are really missing the point by proudly pointing to the technical advantages of their format: any damn fool could produce a better data format than XML."

Ok, I won't get religious on the subject, but mostly wanted to give a couple of examples where the abilities and the adoption of XML have been a godsend for me. All this does assume you have a mature XML infrastructure. If you're dealing with XML via SAX, or are even doing the parsing and writing by hand, then you are in a world of hurt, I admit. But unless you have a memory constraint, there really is no reason to do that. Virtually every language has an XML DOM lib at this point.

I love namespaces

One feature a lot of people usually point to when they decry XML to me is namespaces. They can be tricky, I admit, and a lot of consumers of XML don't handle them right, causing problems. Like Blend puking on namespaces that apparently weren't hardcoded into its parser. But very simply, namespaces let you annotate an existing data format without messing with it.

<somedata droog:meta="some info about somedata">
  <droog:metablock>And a whole block of extra data</droog:metablock>
</somedata>

Here's the scenario. I get data in XML and need to reference metadata for processing further down the pipeline. I could have ingested the XML and then written out my own data format. But that would mean I'd have to also do the reverse if I wanted to pass the data along or return it after some modifications and I have to define yet another data format. By creating my own namespace, I am able to annotate the existing data without affecting the source schema and I can simply strip out my namespace when passing the processed data along to someone else. Every data format should be so versatile.
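
For illustration, the stripping step boils down to something like this rough XmlDocument sketch (the namespace URI and file names are placeholders, not this project's actual values; it uses LINQ's Cast/ToList to snapshot the node lists before removal):

XmlDocument doc = new XmlDocument();
doc.Load("annotated.xml");

string droogNs = "urn:droog-metadata"; // placeholder namespace URI

// remove the annotation attributes
foreach (XmlAttribute attribute in
         doc.SelectNodes("//@*[namespace-uri()='" + droogNs + "']").Cast<XmlAttribute>().ToList())
{
  attribute.OwnerElement.RemoveAttributeNode(attribute);
}

// remove the annotation elements
foreach (XmlElement element in
         doc.SelectNodes("//*[namespace-uri()='" + droogNs + "']").Cast<XmlElement>().ToList())
{
  element.ParentNode.RemoveChild(element);
}

doc.Save("clean.xml");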

Transformation, Part 1: Templating

When writing webapps, there are literally dozens of templating engines and new ones are constantly emerging. I chose to learn XSLT some years back because I liked how Cocoon and AxKit handled web pages: just create your data in XML and then transform it using XSLT according to the delivery needs. So far, nothing especially unique compared to other templating engines. Except unlike most engines, it doesn't rely on some program creating the data and then invoking the templating code. XSLT works with dynamic apps as easily as with static XML or third-party XML, without a program having to drive the transformation.

Since those web site roots, I've had need for email templating and data transformation in .NET projects and was able to leverage the same XSLT knowledge. That means I don't have to pick up yet another tool to do a familiar task just a little differently.
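
As a small illustration of that reuse, this is roughly what the .NET side looks like with XslCompiledTransform (file names here are placeholders):

// load a stylesheet once, then apply it to whatever data needs rendering
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load("email-template.xslt");
transform.Transform("order-data.xml", "email-body.html");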

What's the file format?

When I first started playing with Xaml, I was taking Live For Speed geometry data and wanted to render it in WPF and Silverlight. Sure, I had to learn the syntax of the geometry constructs, but I didn't have to worry about figuring out the data format. I just used the more than familiar XmlDocument and was able to concentrate on geometry, not file formats.

Transformation, Part 2: Rewriting

Currently I'm working with Xaml again for a Silverlight project. My problem was that I had data visualization in Xaml format (coming out of Illustrator), as well as associated metadata (a database of context data), and I needed to attach the metadata to the geometry, along with behavior. Since the first two are output from other tools, I needed a process that could be automated. One way would be to walk the visual tree once loaded, create a parallel hierarchy of objects containing the metadata and behavior, and attach their behavior to the visual tree. But I'd rather have the data do this for itself.

<Canvas x:Name="rolloverContainer_1" Width="100" Height="100">
  <!-- Some geometry data -->
</Canvas>

<!-- becomes -->

<droog:RolloverContainer x:Name="rolloverContainer_1" Width="100" Height="100">
  <!-- Some geometry data -->
</droog:RolloverContainer>

So I created custom controls that subclassed the geometry content containers. I then created a post-processing script that simply loaded the Xaml into the DOM and rewrote the geometry containers as the appropriate custom controls, using object naming as an identifying convention. Now the wiring happens automatically at load, courtesy of Silverlight. Again, no special parser required, just the same XmlDocument class I've used for years.
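
The rewrite itself looks something like the following sketch (the naming convention, namespace mapping and class names are stand-ins, not the project's actual ones):

XmlDocument doc = new XmlDocument();
doc.Load("visualization.xaml");

const string xamlNs = "http://schemas.microsoft.com/winfx/2006/xaml";

foreach (XmlElement canvas in doc.GetElementsByTagName("Canvas").Cast<XmlElement>().ToList())
{
  // only rewrite containers that follow the naming convention
  if (!canvas.GetAttribute("Name", xamlNs).StartsWith("rolloverContainer_"))
    continue;

  // create the custom control element in its own namespace (placeholder mapping)
  XmlElement container = doc.CreateElement("droog", "RolloverContainer", "clr-namespace:Droog.Controls");

  // move attributes and children over, then swap the elements
  foreach (XmlAttribute attribute in canvas.Attributes.Cast<XmlAttribute>().ToList())
  {
    canvas.RemoveAttributeNode(attribute);
    container.Attributes.Append(attribute);
  }
  while (canvas.HasChildNodes)
    container.AppendChild(canvas.FirstChild);

  canvas.ParentNode.ReplaceChild(container, canvas);
}

doc.Save("visualization.xaml");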

And finally, Serialization

I use XML serialization for over the wire transfers as well as data and configuration storage. In all cases, it lets me simply define my DTOs and use them as part of my object hierarchy without ever having to worry about persistence. I just save my object graph by serializing it to XML and rebuild the graph by deserializing the stream again.
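
In .NET that pattern looks roughly like this (the DTO and file name are made up for illustration; XmlSerializer wants a public parameterless constructor, which the plain property style gives us):

public class ServerConfig
{
  [XmlAttribute]
  public string Host { get; set; }

  [XmlElement]
  public int Port { get; set; }
}

// save the object graph
XmlSerializer serializer = new XmlSerializer(typeof(ServerConfig));
using (FileStream stream = File.Create("config.xml"))
{
  serializer.Serialize(stream, new ServerConfig { Host = "localhost", Port = 8080 });
}

// rebuild it later
ServerConfig config;
using (FileStream stream = File.OpenRead("config.xml"))
{
  config = (ServerConfig)serializer.Deserialize(stream);
}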

I admit that this last bit does depend on some language-dependent plumbing that's not all that standard. In .NET, it's built in and lets me mark up my objects with attributes. In Java, I use Simple for the same effect. Without this attribute-driven markup, I'd have to walk the DOM and build my objects by hand, which would be painful.

Sure, for data, binary serialization would be cheaper and more compact, but that misses the other benefits I get for free. The data can be ingested and produced by a wide variety of other platforms, I can manually edit it, or easily build tools for editing and generation, without any specialized coding.

For my Silverlight project, I'm currently using JSON as my serialization layer between client and server, since there currently is no XmlSerializer or even XmlDocument in Silverlight 1.1. It, too, was painless to generate and ingest and, admittedly, much more compact. But then I added this bit to my DTO:

List<IContentContainer> Containers = new List<IContentContainer>();

It serialized just fine, but then on the other end it complained about there not being a no-argument constructor for IContentContainer. Ho hum. Easily enough worked around for now, but I will be switching back to XML for this once Silverlight 2.0 fleshes out the framework. Worst case, I'll have to build an XmlSerializerLite, or something like that, myself.

All in all, XML has allowed me to do a lot of data-related work without having to constantly worry about yet another file format or parser. It's really not about being the best format, but about it being virtually everywhere and being supported with a mature toolchain across the vast majority of programming environments, and that pays a lot of dividends, imho.