Namespaces: Obfuscating Xml for fun and profit
One reason Xml is hated by many is namespaces. While the concept is incredibly useful and powerful, the implementation, imho, is a prime example of over-engineered flexibility: It's so flexible that you can express the same document in a number of radically different ways that are difficult to distinguish with the naked eye. This flexibility then becomes the downfall of many users, as well as simplistic parsers, trying to write XPath rather than walking the tree looking at localnames.
Making namespaces confusing
Conceptually, it seems very useful to be able to specify a namespace for an element so that documents from different authors can be merged without collision and ambiguity. And if this declaration were a simple unique map from prefix to Uri, it would be a useful system. You see a prefix, you know it has a namespace that was defined somewhere earlier in the document. Ok, it could also be defined in the same node -- that's confusing already.
But that's not how namespaces work. In order to maximize flexibility, there are a number of aspects to namespacing that can make them ambiguous to the eye. Here are what I consider the biggest culprits in muddying the waters of understanding:
Prefix names are NOT significant
Let's start with a common misconception that sets the stage for most comprehension failures that follow, i.e. that the prefix of an element has some unique meaning. The snippets below are identical in meaning:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <b>foo</b>
  </xsl:template>
</xsl:stylesheet>

<a:stylesheet version="1.0" xmlns:a="http://www.w3.org/1999/XSL/Transform">
  <a:template match="/">
    <b>foo</b>
  </a:template>
</a:stylesheet>
The prefix is just a short alias for the namespace uri. I chose xsl because certain prefixes like xsl, xhtml, dc, etc. are used so consistently with their namespace uris that a lot of people assume the name is significant. But it isn't. Someone may give you a document that uses their favorite prefix instead, and at first look you'd think the xml is invalid.
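A quick way to convince yourself is to ask a namespace-aware parser what it actually sees. The sketch below uses Python's xml.etree.ElementTree, picked purely for illustration, and a small helper named qualified_names that is not part of any library. It parses both stylesheets from above and compares the resolved element names:

```python
import xml.etree.ElementTree as ET

XSL_PREFIX = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <b>foo</b>
  </xsl:template>
</xsl:stylesheet>"""

A_PREFIX = """<a:stylesheet version="1.0" xmlns:a="http://www.w3.org/1999/XSL/Transform">
  <a:template match="/">
    <b>foo</b>
  </a:template>
</a:stylesheet>"""


def qualified_names(xml_text):
    """Hypothetical helper: list every element as {namespace-uri}localname."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


names_xsl = qualified_names(XSL_PREFIX)
names_a = qualified_names(A_PREFIX)

# The prefix never shows up in the parsed names, only the uri does.
print(names_xsl)
print(names_xsl == names_a)
```

The parser reports the same names for both documents, because it throws the prefix away as soon as it has looked up the uri.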
Default Namespaces
Paradoxically, default namespaces likely came about to make namespacing easier and encourage their use. If you want your document to not conflict with anything else, it's best to declare a namespace and prefix every element with it.
But that's just tedious. I just want to say "assume that everything in my document is in my namespace":
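As a sketch of the difference, here are both spellings side by side. The element names a, b, c and the uri ns1 are placeholders chosen for illustration, and Python's xml.etree.ElementTree is only there to confirm that the prefixed and the default-namespace version resolve to the same names:

```python
import xml.etree.ElementTree as ET

# Placeholder document: a, b, c and the uri "ns1" are illustrative only.
PREFIXED = """<x:a xmlns:x="ns1">
  <x:b>
    <x:c>foo</x:c>
  </x:b>
</x:a>"""

# Same document, but with ns1 declared once as the default namespace.
DEFAULTED = """<a xmlns="ns1">
  <b>
    <c>foo</c>
  </b>
</a>"""


def qualified_names(xml_text):
    """Hypothetical helper: list every element as {namespace-uri}localname."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


prefixed_names = qualified_names(PREFIXED)
defaulted_names = qualified_names(DEFAULTED)

print(defaulted_names)
print(prefixed_names == defaulted_names)
```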
Beautiful. I love default namespaces!
Ah, but wait, there's more! A default namespace can be declared on any element and governs all its children. Yep, you can override previous defaults and elements at the same hierarchy level could have different namespaces without looking different:
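A minimal sketch of such a document, assuming the root a lives in a namespace ns1 (the text only pins down ns2 and ns3 for the children), with Python's xml.etree.ElementTree printing what each element really is:

```python
import xml.etree.ElementTree as ET

# Assumption: the root's default namespace is "ns1"; ns2 and ns3 come from the text.
NESTED_DEFAULTS = """<a xmlns="ns1">
  <b xmlns="ns2">
    <c>foo</c>
  </b>
  <b xmlns="ns3">
    <c>bar</c>
  </b>
</a>"""

root = ET.fromstring(NESTED_DEFAULTS)

# Walk the tree in document order and collect the resolved names.
nested_names = [el.tag for el in root.iter()]
for name in nested_names:
    print(name)
```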
Here it looks like we have a with two child elements b, each with an element c. Except not only is the first b really {ns2}b and the second b {ns3}b, but even worse, the c elements, which have no namespace declaration of their own, are also different, i.e. {ns2}c and {ns3}c. This smells of someone being clever. It looks like a feature serving readability when it does exactly the opposite. Use this in larger documents with some more nesting and the only way you can determine whether an element belongs to a namespace, and which one, is to use a parser. And that defeats the human readability property of Xml.
Attributes do not inherit the default namespace
As if default namespaces didn't provide enough obfuscation power, there is a special exception to them and that's attributes:
So you'd think this is equivalent to:
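Here is a sketch of the two documents in question, with placeholder names (a, b, the attribute c, and the uri ns1 are illustrative) and Python's xml.etree.ElementTree doing the comparison:

```python
import xml.etree.ElementTree as ET

# Default namespace on the root, plain attribute c on the child.
DEFAULT_NS = """<a xmlns="ns1">
  <b c="foo"/>
</a>"""

# The "obvious" prefixed equivalent, with the attribute prefixed too.
PREFIXED = """<x:a xmlns:x="ns1">
  <x:b x:c="foo"/>
</x:a>"""

b_default = ET.fromstring(DEFAULT_NS)[0]
b_prefixed = ET.fromstring(PREFIXED)[0]

# The elements resolve to the same qualified name ...
print(b_default.tag, b_prefixed.tag)

# ... but the attribute names do not.
print(b_default.attrib)
print(b_prefixed.attrib)
```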
But you'd be wrong. @c isn't @x:c, it's just @c. It's without namespace. The logic goes like this: Namespaces exist to uniquely identify nodes. Since an attribute is already inside a uniquely identifiable container, the element, it doesn't need a namespace. The only way to get a namespace on an attribute is to use an explicit prefix. Which means that if you wanted @c to be in the namespace {ns1}, but not force every element to declare the prefix as well, you'd have to write it like this:
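A sketch of that workaround, again with placeholder names: ns1 is declared twice on the root, once as the default for the elements and once under a prefix x that exists only so the attribute can use it. Python's xml.etree.ElementTree shows the result:

```python
import xml.etree.ElementTree as ET

# The same uri bound twice: as the default namespace and as prefix x.
MIXED = """<a xmlns="ns1" xmlns:x="ns1">
  <b x:c="foo"/>
</a>"""

b_mixed = ET.fromstring(MIXED)[0]

# The element picks up ns1 from the default, the attribute from the prefix.
print(b_mixed.tag)
print(b_mixed.attrib)
```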
Oh yeah, much more readable. Thanks for that exception to the rule.
Namespace prefixes are not unique
That last example is a perfect segue into the last "oh my god, seriously?" obfuscation of namespacing: you can declare the same namespace multiple times with different prefixes and, even more confusingly, you can define the same prefix with different namespaces.
<x:a xmlns:x="ns1">
  <x:b xmlns:x="ns2">
    <x:c xmlns:x="ns1">you don't say</x:c>
  </x:b>
  <y:b xmlns:y="ns1">
    why would you do this?
  </y:b>
</x:a>
Yes, that is legal AND completely incomprehensible. And yes, people aren't likely to do this on purpose, unless they really are sadists. But I've come across equivalent scenarios where multiple documents were merged together without paying attention to existing namespaces. In fairness, trying to understand existing namespaces on merge is a pain, so it might have been purely done in self-defense. This is the equivalent of spaghetti code and it's enabled by needless flexibility in the namespace system.
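To make the mess visible, the sketch below collects every prefix declaration in that document and groups the bindings in both directions. It relies on the start-ns event of Python's xml.etree.ElementTree.iterparse; the two dictionary names are made up for the example:

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

MESSY = """<x:a xmlns:x="ns1">
  <x:b xmlns:x="ns2">
    <x:c xmlns:x="ns1">you don't say</x:c>
  </x:b>
  <y:b xmlns:y="ns1">
    why would you do this?
  </y:b>
</x:a>"""

prefix_to_uris = defaultdict(set)   # which uris does each prefix stand for?
uri_to_prefixes = defaultdict(set)  # which prefixes point at each uri?

# "start-ns" fires once per xmlns declaration and yields a (prefix, uri) pair.
source = io.BytesIO(MESSY.encode("utf-8"))
for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
    prefix_to_uris[prefix].add(uri)
    uri_to_prefixes[uri].add(prefix)

for prefix, uris in sorted(prefix_to_uris.items()):
    print(prefix, "->", sorted(uris))
for uri, prefixes in sorted(uri_to_prefixes.items()):
    print(uri, "<-", sorted(prefixes))
```

Any prefix that maps to more than one uri, or any uri reachable through more than one prefix, is a spot where reading the text alone will mislead you.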
XPath needs unambiguous names
So far I've only addressed the ambiguity in authoring and in visually parsing namespaced Xml, which has plenty of pain points just in itself. But now let's try to find something in one of these documents.
<x:a xmlns:x="ns1">
  <x:b xmlns:x="ns2">
    <x:c xmlns:x="ns1">you don't say</x:c>
  </x:b>
  <y:b xmlns:y="ns1">
    why would you do this?
  </y:b>
</x:a>
Let's try to get the c element with an xpath that uses the x prefix exactly as it appears in the document.
But that doesn't return any results. Why not? The main thing to remember with XPath is that, again, prefixes are NOT significant. That means, just because you see a prefix used in the document doesn't actually mean that XPath can find it by that name. Again, why not? Indeed. After all, the x prefix is defined, so why can't XPath just use that mapping? Well, remember that in this example x means something different depending on where you are in the document. XPath doesn't work contextually, it needs unique names to match. Internally, XPath needs to be able to convert the element names into fully qualified names before ever looking at the document. That means what it really wants is a query in fully qualified names, one that asks for {ns1}c rather than x:c.
Since namespaces can be used in all sorts of screwy ways to make the same prefix mean different things contextually, the prefixes seen in the text representation of the document are useless to XPath. Instead, you need to define manual, unique mappings from prefix to namespace, i.e. you need to provide a unique lookup from prefix to uri. Gee, unique prefixes... Why couldn't the Xml namespace spec have respected that requirement as well?
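As a sketch of what such a manual mapping looks like in practice, here is the same document queried with Python's xml.etree.ElementTree, which supports only a subset of XPath but takes exactly this kind of prefix-to-uri dictionary. The prefixes one and two are invented for the query and deliberately differ from anything in the document:

```python
import xml.etree.ElementTree as ET

MESSY = """<x:a xmlns:x="ns1">
  <x:b xmlns:x="ns2">
    <x:c xmlns:x="ns1">you don't say</x:c>
  </x:b>
  <y:b xmlns:y="ns1">
    why would you do this?
  </y:b>
</x:a>"""

root = ET.fromstring(MESSY)

# Our own unique lookup; the document's x and y prefixes play no role here.
NS = {"one": "ns1", "two": "ns2"}

c_hits = root.findall(".//one:c", NS)
b_in_ns1 = root.findall(".//one:b", NS)  # the element written as y:b
b_in_ns2 = root.findall(".//two:b", NS)  # the element written as x:b

# ElementTree also accepts the fully qualified form directly.
c_qualified = root.find(".//{ns1}c")

print(c_hits[0].text)
print(b_in_ns1[0].text.strip())
print(b_in_ns2[0].tag)
print(c_qualified is c_hits[0])
```

Note that the query for b in ns1 lands on the element the document spells y:b, while the one spelled x:b only turns up under ns2.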
Namespace peace of mind: Be explicit and unique
The best you can do to keep namespacing nightmares at bay is to follow 2 simple rules for formatting and ingesting Xml:
- Only use default namespacing on the root node
- Keep your prefixes unique (preferably across all documents you touch)
There, done, ambiguity is gone. Now make sure you normalize every Xml document that passes through your hands by these rules and bathe in the light of transparency. It's easier to read, and you can initialize XPath with that global nametable of yours so that your XPath representation will match your rendered Xml representation.
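One possible way to do that normalization, sketched with Python's xml.etree.ElementTree: register your global prefix table once and re-serialize whatever comes in. The table GLOBAL_PREFIXES and its prefixes one and two are assumptions for the example. Note that register_namespace is process-wide, and that ElementTree happens to declare every prefix on the root element when it writes a document out, which sidesteps nested default namespaces entirely:

```python
import xml.etree.ElementTree as ET

MESSY = """<x:a xmlns:x="ns1">
  <x:b xmlns:x="ns2">
    <x:c xmlns:x="ns1">you don't say</x:c>
  </x:b>
  <y:b xmlns:y="ns1">
    why would you do this?
  </y:b>
</x:a>"""

# Hypothetical global nametable: one unique prefix per namespace uri.
GLOBAL_PREFIXES = {"one": "ns1", "two": "ns2"}
for prefix, uri in GLOBAL_PREFIXES.items():
    ET.register_namespace(prefix, uri)  # affects all later serialization

root = ET.fromstring(MESSY)
normalized = ET.tostring(root, encoding="unicode")
print(normalized)

# The qualified names must survive the round trip unchanged.
names_before = [el.tag for el in root.iter()]
names_after = [el.tag for el in ET.fromstring(normalized).iter()]
print(names_before == names_after)
```

The same GLOBAL_PREFIXES dictionary can then be handed to find and findall as the namespace map, so the prefixes in your queries match the prefixes you see in the normalized text.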