
Iloggable

XP on Bootcamp & VMWare Fusion, take 2

Time to re-install XP

My setup until yesterday was a Bootcamp partition running as a VM under VMWare Fusion. The Bootcamp partition was set up as FAT32, because NTFS came up as read-only when mounted under Mac OS.

Then I started up another VM (Fedora Core 7, although I have since noticed it's not about what OS you run, but just about having a second VM). Maybe it's not enough memory, maybe it's a VM running from Bootcamp plus a VM running from a disk image, but while it had worked previously, this time it locked up my Mac hard. I finally had to hard boot the Mac. When I got back into the VM, I noticed things were broken. Now, I've had to hard boot XP many times and I've never seen this: an indeterminate number of files were corrupted. I noticed one XML file that halfway through turned into binary garbage, so I assume the other failing systems were suffering from similar corruption. Basically it was hosed, because there was no way to determine what had been corrupted. Time to re-install.

Bootcamp & VMware install

Looking at my post from last time, it was clearly written with the frazzled recollection of a day of trying to make things work, since I once again ran into problems. This time I'm making sure I write down the resolution.

  1. Imaged the old partition, because you always realize that you forgot to back up some vital file.

  2. Removed the old partition and created a new one (using the Bootcamp tool), this time large enough for more than just fallback use (since it's used daily under VMWare). This creates a FAT32 partition.

  3. Inserted my install disk and Bootcamp restarted to fire up the Windows install. Important note here: this has to be a real XP install disk. I first tried to use my MSDN DVD with the chooser for picking which OS to install. However, the Mac keyboard doesn't seem to work when you get to that menu. So I used an XP Pro w/ SP2 CD and rebooted. Now the installer ran just fine.

  4. Formatted the partition using NTFS. Even if you use FAT32, re-format; don't use the Bootcamp-formatted partition. At least for me, using that prepared partition didn't work and created an unbootable image. I know NTFS cannot be written to by Mac OS, but it doesn't really matter, since once I boot it as a VM, I can always transfer files via loopback file sharing. I'm going NTFS to get a journaling file system. Theoretically that should prevent the corruption I got last time.

  5. After the install completes, pop in the Mac OS disk and let it install the Bootcamp utilities. This gives you full support for the Macbook Pro hardware.

  6. Activate Windows and reboot into Mac OS X.

  7. Fire up VMWare Fusion. Bootcamp partition should be listed as a VM. Since I previously had a bootcamp partition, I had to go into Library::VMWare Fusion::Application Support::Virtual Machines and remove the old Bootcamp partition folder. Fusion will then do its magic and prep the Bootcamp partition to run as a VMWare Image.

  8. Activate Windows again. That should be the last time you have to do it.

Let's hope this doesn't turn into a bi-monthly process :)

The dangers of deferred execution

I recently wrote about Action & Func, which along with lambda expressions let you do easy inline callbacks like this:

Utility.ActionDownloader.Download(
  Configuration.GetAssetUri(dto.Url),
  (Downloader d) =>
  {
    FloatContainer c = (FloatContainer)XamlReader.Load(d.ResponseText);
    c.Initialize(dto);
  });

i.e. I can call a downloader and inline pass it a bit of code to execute once the download completes. But the catch, of course, is that looking at the code and following the usual visual tracing of flow hides the fact that c.Initialize(dto) doesn't get called until some asynchronous time in the future. Now, that's always been a side-effect of delegates, but until they became anonymous and inline, there wasn't this visual deception of code that looks like it's in the current flow scope but isn't.

What happened was that I needed my main routine to execute some code after FloatContainer was initialized, and by habit I created an Initialized event on FloatContainer. Of course this was superfluous, since my lambda expression called the synchronous Initialize, i.e. my action could be placed inline after that call to c.Initialize(dto) and be guaranteed to run after initialization had completed.
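To make the difference concrete, here's a minimal sketch of the two versions, reusing the downloader example from above (DoPostInitWork is a hypothetical stand-in for whatever the main routine needed to run, and the Initialized event signature is assumed):

// Superfluous: raising an Initialized event that fires from the synchronous Initialize()
Utility.ActionDownloader.Download(
  Configuration.GetAssetUri(dto.Url),
  (Downloader d) =>
  {
    FloatContainer c = (FloatContainer)XamlReader.Load(d.ResponseText);
    c.Initialized += (sender, args) => DoPostInitWork(c); // extra event plumbing
    c.Initialize(dto);
  });

// Sufficient: Initialize() is synchronous, so the follow-up code can simply go after it
Utility.ActionDownloader.Download(
  Configuration.GetAssetUri(dto.Url),
  (Downloader d) =>
  {
    FloatContainer c = (FloatContainer)XamlReader.Load(d.ResponseText);
    c.Initialize(dto);
    DoPostInitWork(c); // guaranteed to run after initialization has completed
  });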

This scenario just meant I created some superfluous code. However, I'm sure that as I use lambda expressions more, there will be more pitfalls of writing code that doesn't consider that its execution time is unknown, as is the state of the objects tied to the scope of the expression.

This last bit about objects tied to the expression scope is especially tricky, and I think we will see some help in terms of immutability concepts weaving their way into C# 3.x or 4.0, as the whole functional aspect of lambda expressions really works best when dealing with objects that cannot change state. Eric Lippert's been laying the groundwork in a number of posts on the subject, and while he constantly disclaims that his ponderings are not a roadmap for C#, I am still going to assume that his interest in and recognition of the subject of immutables will have some impact on a future revision of the language. Well, I at least hope it does.

Action & Func: Never write another delegate

With lambda expressions in C#, the Func generic delegate and its variations have been getting a lot of attention. So naturally, you might think that the lambda syntax is just a shortcut for creating anonymous delegates, whether they return values or not.

First let's look at the evolution of delegates from 1.1 to now. Delegates are simply the method equivalent of function pointers. They let you pass a method call as an argument for later execution. The cool thing (and a garbage collection pitfall) is that a delegate creates a lexical closure, i.e. the delegate carries with it the object that the method gets called on. For garbage collection this means that a delegate prevents that object from being collected. That's why it's important to unsubscribe from those events you subscribed to.

But I digress. Let's define a delegate that returns an Integer and a method that matches that delegate:

delegate int IntProducerDelegate();

public int x = 0;
public int IntProducer()
{
  return x++;
}

With the original .NET 1.0 syntax we'd create the delegate like this:

IntProducerDelegate p1 = new IntProducerDelegate(IntProducer);

Now we can call p1() and get an integer back, and since it's a closure, each time we call p1() the originating object's x increases, as does our return value.
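For illustration, assuming the field and method above live in a class called Producer (a made-up name), the closure behavior looks like this:

Producer producer = new Producer();
IntProducerDelegate p1 = new IntProducerDelegate(producer.IntProducer);

Console.WriteLine(p1());        // 0
Console.WriteLine(p1());        // 1
Console.WriteLine(producer.x);  // 2 -- the delegate kept the instance alive and mutated it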

Then, in .Net 2.0 we got anonymous delegates.

IntProducerDelegate p2 = delegate { return IntProducer(); };

// or with IntProducer's action inlined...
IntProducerDelegate p3 = delegate { return x++; };

This got rid of the need to create a method just to pass along a closure that manipulated our object at a later time. The other thing that anonymous delegates reinforce is that delegates just care about signature. IntProducerDelegate can be assigned any delegate that takes no arguments and returns an int. That sounds like a perfect scenario for generics, and in .NET 3.5 we got just that: a set of generic delegates called Func. Using Func, we quickly get from the original delegate syntax to our lambda expression like this:

// create a new Func delegate just like the IntProducerDelegate
Func<int> p3 = new Func<int>(IntProducer);

// which means that we don't need IntProducerDelegate at all anymore
Func<int> p4 = delegate { return x++; };

// and the anonymous delegate can also be shorthanded with a lambda expression
Func<int> p5 = () => { return x++; };
// which says, given that we take no argument "()", execute and return the following "return x++;"

However, before there ever was Func, .NET 2.0 introduced the generic delegate Action, which is a natural counterpart to Func, encapsulating a method that does not return anything. Following through the example of the producer, we'll create a consumer like this:

delegate void IntConsumerDelegate(int i);

public void IntConsumer(int i)
{
  Console.WriteLine("The number is {0}", i);
}

Now following the same evolution of syntax we get this:

IntConsumerDelegate c1 = new IntConsumerDelegate(IntConsumer);

Action<int> c2 = new Action<int>(IntConsumer);

Action<int> c3 = delegate(int i) { Console.WriteLine("The number is {0}", i); };

Action<int> c4 = (i) => { Console.WriteLine("The number is {0}", i); };

So lambda syntax can be used to create either a Func or an Action. And that also means we never need to explicitly create another delegate type, being able to use variations of these two generic delegates as our arsenal for storing lambda expressions of all kinds.
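As a quick illustration (the variable names are made up), the producer and consumer from above compose naturally:

// assuming x is the field from above, starting at 0
Func<int> produce = () => x++;
Action<int> consume = i => Console.WriteLine("The number is {0}", i);

consume(produce());   // prints "The number is 0"
consume(produce());   // prints "The number is 1"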

Software Activation vs. Virtualization, Part 3

Part of an ongoing saga.

Rebooted back into VMWare Fusion and yeah, Illustrator activation was indeed screwed there as well. Office 2007 too, but at least it just lets me reactivate (no doubt noting me as a repeat offender somewhere). So I called Adobe and was told that "it's a sensitive piece of software". No it's not. Illustrator can take any beating you give it. It's the "anti-piracy" crap that's sensitive. I got an "emergency activation code" to get it going again and was advised to deactivate before I switch VM setups and then re-activate after the reboot. OMFG. Seriously, just give me a USB dongle if you are so sensitive about it. That would be infinitely more convenient.

Dug around the net a bit and it seems that if I fake my MAC address to be the same between the Bootcamp and VM boots, it won't invalidate my activation. Might try that next. Of course, the same board I found that on also noted that if I just got a crack for my legally purchased product, all troubles would be gone as well. Yes, once again, anti-piracy crap is not stopping pirates but legitimate customers. You'd figure someone might have spotted the pattern here, but maybe those DRM-colored glasses filter reality a bit too well.

Software Activation vs. Virtualization (and multiple PC ownership)

Just as virtualization is finally becoming a useful technology, everybody and their uncle has decided that software activation is the new hot way to stop theft. Of course, like all anti-piracy tools, the paying customers get screwed, because the pirates have already patched their copies to not require activation. Bravo! You know, I'd prefer friggin USB dongles to this big brother activation business.

I've talked about these problems before, but I've got more fun with the VM vs. bootcamp image activation troubles. I just got Adobe CS3 and for a programmer with occasional Photoshop/Illustrator needs, that's a pretty serious expense. I mean it costs me more than MSDN and gets used a fraction of the time. But I need it. And forget that I have three different computers I use at different times and I really ought to be able to install my purchased software on all of these machines, since I, the owner of the license, will never be using two computers at once. But that's a whole other story.

Back to the re-activation-on-hardware-change business... I've been running Windows under VMware for the last couple of weeks, but for the Illustrator work I need to do right now, it was a bit sluggish. No problem, reboot into Bootcamp! Mind you, this isn't a different install of Windows. This is the same physical disk partition, just booted natively vs. via VMware. What happens? Illustrator bitches about activation, as does Office, because it saw the hardware change. Let me guess, when I reboot in the virtual machine it'll bitch yet again. Sooner or later it'll just shut me down as a serial offender. Thanks! Way to reward my purchase.

A case for XML

XML gets maligned a lot. It's enterprisey, bloated, overly complex, etc. And the abuses visited upon it, like trying to express flow control or whole DSLs in it or being proposed as some sort of panacea for all interop problems only compound this perception. But as long as you treat it as what it is, data storage, I generally can find little justification to use something else. Not because it's the best, but because it's everywhere.

If you are your own consumer and you want more efficient data storage, just go binary already. If you're not, then I bet your data consumers are just tickled that they have to add another parser to their repository of data ingestors. James Clark probably put it best when he said:

"For the payload format, XML has to be the mainstay, not because it's technically wonderful, but because of the extraordinary breadth of adoption that it has succeeded in achieving. This is where the JSON (or YAML) folks are really missing the point by proudly pointing to the technical advantages of their format: any damn fool could produce a better data format than XML."

Ok, I won't get religious on the subject, but mostly wanted to give a couple of examples where the abilities and the adoption of XML have been a godsend for me. All this does assume you have a mature XML infrastructure. If you're dealing with XML via SAX or are doing the parsing and writing by hand, then you are in a world of hurt, I admit. But unless you're under memory constraints there really is no reason to do that. Virtually every language has an XML DOM lib at this point.

I love namespaces

One feature a lot of people usually point to when they decry XML to me is namespaces. They can be tricky, I admit, and a lot of consumers of XML don't handle them right, causing problems. Like Blend puking on namespaces that apparently weren't hardcoded into its parser. But very simply, namespaces let you annotate an existing data format without messing with it.

<!-- the droog prefix needs a namespace declaration; the URI here is just a placeholder -->
<somedata xmlns:droog="urn:droog-annotations" droog:meta="some info about somedata">
  <droog:metablock>And a whole block of extra data</droog:metablock>
</somedata>

Here's the scenario: I get data in XML and need to reference metadata for processing further down the pipeline. I could have ingested the XML and then written out my own data format. But that would mean I'd also have to do the reverse if I wanted to pass the data along or return it after some modifications, and I'd have to define yet another data format. By creating my own namespace, I am able to annotate the existing data without affecting the source schema, and I can simply strip out my namespace when passing the processed data along to someone else. Every data format should be so versatile.
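For what it's worth, the strip-out step is only a few lines of XmlDocument code. A rough sketch, assuming all the annotations live under the placeholder namespace URI from the snippet above and using placeholder file names:

using System.Linq;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("annotated.xml");  // placeholder file name

XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("droog", "urn:droog-annotations");  // placeholder namespace URI

// drop the annotation elements
foreach (XmlNode node in doc.SelectNodes("//droog:*", ns).Cast<XmlNode>().ToList())
{
  node.ParentNode.RemoveChild(node);
}

// drop the annotation attributes
foreach (XmlAttribute attr in doc.SelectNodes("//@droog:*", ns).Cast<XmlAttribute>().ToList())
{
  attr.OwnerElement.RemoveAttributeNode(attr);
}

doc.Save("clean.xml");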

Transformation, Part 1: Templating

When writing webapps, there are literally dozens of templating engines, and new ones are constantly emerging. I chose to learn XSLT some years back because I liked how Cocoon and AxKit handled web pages. Just create your data in XML and then transform it using XSLT according to the delivery needs. So far, nothing especially unique compared to other templating engines. Except unlike most engines, it doesn't rely on some program creating the data and then invoking the templating code: XSLT works with dynamic apps as easily as with static XML or third-party XML.

Since those web site roots, I've had need for email templating and data transformation in .NET projects and was able to leverage the same XSLT knowledge. That means I don't have to pick up yet another tool to do a familiar task just a little differently.
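In .NET that amounts to very little code; a minimal sketch using XslCompiledTransform (the file names are placeholders):

using System.Xml.Xsl;

// compile the stylesheet once, then apply it to whatever XML needs rendering
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load("email-template.xslt");
xslt.Transform("order-data.xml", "email-body.html");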

What's the file format?

When I first started playing with Xaml, I was taking Live For Speed geometry data and wanted to render it in WPF and Silverlight. Sure, I had to learn the syntax of the geometry constructs, but I didn't have to worry about figuring out the data format. I just used the more than familiar XmlDocument and was able to concentrate on geometry, not file formats.

Transformation, Part 2: Rewriting

Currently I'm working with Xaml again for a Silverlight project. My problem was that I had data visualization in Xaml format (coming out of Illustrator), as well as associated metadata (a database of context data), and I needed to attach the metadata to the geometry, along with behavior. Since the first two are output from other tools, I needed a process that could be automated. One way would be to walk the visual tree once loaded, create a parallel hierarchy of objects containing the metadata and behavior, and attach their behavior to the visual tree. But I'd rather have the data do this for itself.

<Canvas x:Name="rolloverContainer_1" Width="100" Height="100">
  <!-- Some geometry data -->
</Canvas>

<!-- becomes -->

<droog:RolloverContainer x:Name="rolloverContainer_1" Width="100" Height="100">
  <!-- Some geometry data -->
</droog:RolloverContainer>

So I created custom controls that subclassed the geometry content containers. I then created a post-processing script that simply loaded the Xaml into the DOM and rewrote the geometry containers as the appropriate custom controls, using object naming as an identifying convention. Now the wiring happens automatically at load, courtesy of Silverlight. Again, no special parser required, just the same XmlDocument class I've used for years.
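A rough sketch of what that post-processing pass could look like with XmlDocument (the file names, the namespace mapping for the droog controls, and the exact naming convention here are illustrative assumptions, not the real script):

using System.Linq;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("artwork.xaml");  // placeholder file name

const string xamlNs = "http://schemas.microsoft.com/winfx/2006/xaml";
const string droogNs = "urn:droog-controls";  // placeholder clr-namespace mapping

var canvases = doc.GetElementsByTagName("Canvas").Cast<XmlElement>().ToList();
foreach (XmlElement canvas in canvases)
{
  string name = canvas.GetAttribute("Name", xamlNs);
  if (!name.StartsWith("rolloverContainer_"))
  {
    continue;  // only rewrite containers that follow the naming convention
  }

  // elements can't be renamed in place, so create the replacement
  // and move attributes and children across
  XmlElement replacement = doc.CreateElement("droog", "RolloverContainer", droogNs);
  while (canvas.Attributes.Count > 0)
  {
    replacement.Attributes.Append(canvas.Attributes.RemoveAt(0));
  }
  while (canvas.HasChildNodes)
  {
    replacement.AppendChild(canvas.FirstChild);
  }
  canvas.ParentNode.ReplaceChild(replacement, canvas);
}

doc.Save("artwork.processed.xaml");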

And finally, Serialization

I use XML serialization for over the wire transfers as well as data and configuration storage. In all cases, it lets me simply define my DTOs and use them as part of my object hierarchy without ever having to worry about persistence. I just save my object graph by serializing it to XML and rebuild the graph by deserializing the stream again.

I admit that this last bit does depend on some language-dependent plumbing that's not all that standard. In .NET, it's built in and lets me mark up my objects with attributes. In Java, I use Simple for the same effect. Without this attribute-driven markup, I'd have to walk the DOM and build my objects by hand, which would be painful.
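In .NET the markup looks roughly like this; the DTO below is made up purely for illustration:

using System.IO;
using System.Xml.Serialization;

// a made-up DTO: the attributes tell XmlSerializer how members map to XML
[XmlRoot("asset")]
public class AssetDto
{
  [XmlAttribute("id")]
  public int Id { get; set; }

  [XmlElement("url")]
  public string Url { get; set; }
}

public static class AssetStore
{
  public static void Save(AssetDto asset, string path)
  {
    XmlSerializer serializer = new XmlSerializer(typeof(AssetDto));
    using (StreamWriter writer = new StreamWriter(path))
    {
      serializer.Serialize(writer, asset);
    }
  }

  public static AssetDto Load(string path)
  {
    XmlSerializer serializer = new XmlSerializer(typeof(AssetDto));
    using (StreamReader reader = new StreamReader(path))
    {
      return (AssetDto)serializer.Deserialize(reader);
    }
  }
}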

Sure, for data, binary serialization would be cheaper and more compact, but that misses the other benefits I get for free: the data can be ingested and produced by a wide variety of other platforms, and I can manually edit it or easily build tools for editing and generation, without any specialized coding.

For my Silverlight project, I'm currently using JSON as my serialization layer between client and server, since there currently is no XmlSerializer or even XmlDocument in Silverlight 1.1. It, too, was painless to generate and ingest and, admittedly, much more compact. But then I added this bit to my DTO:

List<IContentContainer> Containers = new List<IContentContainer>();

It serialized just fine, but then on the other end it complained about there not being a no-argument constructor for IContentContainer. Ho hum. Easily enough worked around for now, but I will be switching back to XML for this once Silverlight 2.0 fleshes out the framework. Worst case, I'll have to build XmlSerializerLite, or something like that, myself.

All in all, XML has allowed me to do a lot of data-related work without having to constantly worry about yet another file format or parser. It's really not about being the best format, but about it being virtually everywhere and being supported with a mature toolchain across the vast majority of programming environments, and that pays a lot of dividends, imho.

Searching a Tree of Objects with Linq

UPDATE: Posted a follow-up here.

I've finally had legitimate use for LINQ to Objects, not just to make the syntax cleaner, but also to simplify the underlying code and provide me a lot of flexibility without significant work.

The scenario

I have a tree of objects that have both a type and a name. The name is unique, the Type is not. The interface is this:

public interface INode
{
  int Id { get; }
  string Name { get; set; }
  string Type { get; set; }
  List<INode> Children { get; }
}

I want to be able to find a single named Node in the tree and I want to be able to retrieve a collection of all nodes for a particular type. The searchable interface could be expressed as this:

public interface ISearchableNode : INode
{
  INode FindByName(string name);
  IEnumerable<INode> FindByType(string type);
}

Both require me to walk the tree and examine each node, so clearly I just want to have one walk routine and generically evaluate the node in question. In C# 2.0 parlance, that means I could pass an anonymous delegate into my generic find routine and have it recursively iterate through all the children. I also pass along a resultset to be populated.

The signature for the evaluation delegate looks like this:

delegate bool FindDelegate(INode node);

but since I'm using C# 3.0 (i.e. .NET 3.5) I can use lambda expressions to avoid creating a delegate and simplify my syntax. Instead of FindDelegate, I can simply use Func<INode,bool>:

// Instead of this:
private void Find(INode node, List<INode> resultSet, FindDelegate findDelegate);
// called like this for a Name search:
Find(this, resultSet, delegate(INode node) { return node.Name == name; });

// I can use this:
private void Find(INode node, List<INode> resultSet, Func<INode, bool> f)
// called like this:
Find(this, resultSet, node => node.Name == name);

Thus giving me the following implementation for ISearchableNode:

public INode FindByName(string name)
{
  List<INode> resultSet = new List<INode>();
  Find(this, resultSet, x => x.Name == name);
  return resultSet.FirstOrDefault();
}

public IEnumerable<INode> FindByType(string type)
{
  List<INode> resultSet = new List<INode>();
  Find(this, resultSet, x => x.Type == type);
  return (IEnumerable<INode>)resultSet;
}

private void Find(INode node, List<INode> resultSet, Func<INode, bool> f)
{
  if (f(node))
  {
    resultSet.Add(node);
  }
  foreach (INode child in node.Children)
  {
    Find(child, resultSet, f);
  }
}

Problem solved, move on... Well, except there is significant room for improvement. Here are the two main issues that ought to be resolved:

  1. Syntax is limited to two types of searches and exposing the generic find makes for an ugly syntax. It would be much nicer if queries to the tree could be expressed in LINQ syntax.
  2. It's also inefficient for the Name search, since I'm walking the entire tree, even if the first node matched the criteria.

LINQ to Hierarchical Data

In order to use LINQ to objects, I need to either create a custom query provider or implement IEnumerable. The latter is significantly simpler and could be expressed using the following interface:

public interface IQueryableNode : IEnumerable<INode> { }

Ok, ok, I don't even need an interface, I could just implement IEnumerable... But what does that actually mean? In the simplest sense, I'm iterating over the node's children; however, I also want to descend into the children's children and so on. So a simple foreach won't do. I could just do the same tree walking with a result set as I did above and return the enumerator of the resulting list to implement the interface, but C# 2.0 introduced a much more useful way to implement non-linear iterators: the yield keyword. Instead of building a list to be iterated over, yield lets the iterating code return values as they are found, which means it can be used for recursive iteration. Thus GetEnumerator is implemented simply as follows:

#region IEnumerable<Node> Members
public IEnumerator<INode> GetEnumerator()
{
  yield return this;
  foreach (Node child in Children)
  {
    foreach (Node subchild in child)
    {
      yield return subchild;
    }
  }
}
#endregion

#region IEnumerable Members
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
  return this.GetEnumerator();
}
#endregion

Nice and simple and ready for LINQ.

Searching for all Nodes of a type becomes

var allBar = from n in x
                where n.Type == "bar"
                select n;
foreach (Node n in allBar)
{
  // do something with that node
}

and the search for a specifically named node becomes

INode node = (from n in x
              where n.Name == "i"
              select n).FirstOrDefault();

But the real benefit of this approach is that I don't have hard-coded search methods, but can express much more complex queries in a very natural syntax without any extra code on the Node.
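For example, a combined filter and ordering comes for free (the "foo" type and the name prefix here are made up):

// all "foo" nodes whose name starts with "rollover", sorted by name --
// no new Find overload required
var matches = from n in x
              where n.Type == "foo" && n.Name.StartsWith("rollover")
              orderby n.Name
              select n;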

Deferred execution

As it turns out, using yield for the recursive iteration also solved the second issue. Since yield returns values as it encounters them during iteration, the search doesn't happen until the query is executed. And one of the side effects of LINQ syntax is that creating a query does not execute it until the result set is iterated over. Therefore, FirstOrDefault() actually short-circuits the query as soon as the first match (and in the case of Name, it's going to be the only match) is hit.

Threading: Mail.app vs. Thunderbird

I generally prefer message forums over mailing lists for community discussions because of the better separation of topics and implicit threading. Not that forums are ideal for threads, since they generally cannot easily spawn sub-threads.

So, when reading discussions on mailing lists, I try to use the threading mode of the mail client to bring some clarity to the discussion and filter out discussions I don't care about. I am a big IMAP proponent and read mail on the PC with Thunderbird and on the Mac with Mail.app. Overall I like Mail.app better, but sometimes it does exhibit the Mac app tendency of "if you don't like the way we do things, well, sucks for you", vs. Thunderbird's more liberal configurability.

I'll leave aside the whole subject of how well clients group messages into threads... Ok, just one stab at that subject. A subject line of "Hey" is not too uncommon. So all messages titled "Hey", which are almost guaranteed to be unrelated, get lumped into a thread together. Happens in both readers. I know that threading in mail is ad-hoc, so I am not faulting the clients. It's just a silly artifact.

But here's some behavior that I find not only unintuitive but downright tedious:

On both Mail.app and Thunderbird, I can collapse and expand threads using the left and right arrows. So when I decide that a discussion is not of interest to me, I just collapse the thread and hit delete. On Mail.app, the thread is deleted, as I'd expect. On Thunderbird, the current message in the thread is deleted and the next message becomes the head of the thread. So to delete a thread I have to open the thread, select all messages manually and then delete. Bah!

I've dug around the config and even advanced config, but can't find a way to change this behavior.

Stupid ExtensionMethod tricks

I have yet to decide whether Extension Methods in C# are a boon or a bane. Several times already, I've been frustrated by Intellisense not showing me a method that was legal somewhere else, until I could figure out what using statement brought that extension method into scope. On one hand Extension Methods can be used to simplify code; on the other, I see them as the source of much confusion as they become popular.

Worse yet, they have potential for more abuse than the blink tag, imho.

The one place I see extension methods being instrumental is in defining fluent interfaces, yet another practice I have yet to decide whether I am in favor of or not. Partially, because I don't see them as intrinsically easier to read. Partially because they allow for much syntactic abuse.

So today, I created a fluent interface for an operation that I wish were just supported in the language in the first place -- the between operator. It exists in some SQL dialects and is a natural part of so many math equations. I wish I could just write:

if( 0.5 < x < 1.0 )
{
  // do something
}

Instead, I'll settle for this:

if( x.Between(0.5).And(1.0) )
{
  // do something
}

The first part is easy: it's just an Extension Method on double. And if I just had it take the lower and upper bounds, we would have been done. But this is where the fluent interface bug bites me and I want to say And. This means that Between can't return a boolean. It needs to return the result of the lower bound test and the value being tested. That means that Between returns a helper struct, which has one method, And, which finally returns the boolean value.

public static class DoubleExtensions
{

  public static BetweenHelper Between(this double v, double lower)
  {
    return new BetweenHelper(v > lower, v);
  }

  public struct BetweenHelper
  {
    public bool passedLower;
    public double v;

    internal BetweenHelper(bool passedLower, double v)
    {
      this.passedLower = passedLower;
      this.v = v;
    }

    public bool And(double upper)
    {
      if (passedLower && v < upper)
      {
        return true;
      }
      else
      {
        return false;
      }
    }
  }
}

That's a lot of code for a simple operation, and it's still questionable whether it really improves readability. But it is a common enough operation, if you have a lot of bounds checking, that it might be worth throwing into a common code dll. I've yet to make up my mind; I mostly did it because I wanted to play with the syntax.

One week of Windows on Macbook

Spent this past week doing development in XP under VMWare Fusion. I hooked up a Windows keyboard and mouse for when I'm stationary, and when in the Windows Space, there was no way to tell I wasn't on a native machine. I actually had a harder time on the Mac side, since I had to remember to hit the Windows key to get the normal Apple-Command behavior.

The latest thing I tried, which I seriously didn't expect to work, was hooking up my HTC Apache. As much trouble as I've had with ActiveSync on my desktop machine, I figured that either the USB connection wouldn't even be seen by Windows or that ActiveSync would just bomb. But instead, ActiveSync found the phone and synced everything. Now I even have Standard Time with me on the laptop instead of my old setup of desktop and phone for time tracking.