Iloggable

January 3, 2011
in geek
2 min read

The Three Laws of XP

I was just readying David Tchepak's inspiring new year's resolution and also re-read his linked article "There is no U in Collective Ownership" and I thought that it sounded very much like an adaptation of Isaac Asimov's First Law of Robotics. So, just for fun, I wondered if all three laws couldn't be adapted as laws for practicing coders.

A Coder may not harm the code base or, through inaction, allow the code base to come to harm.

A Coder must follow the guidance set up for the code base except where such practices would conflict with the First Law.

A Coder must protect their livelihood as long as such protection does not conflict with the First or Second Law.

While this is mostly in jest, I do think this works as a pretty good set of guidelines. The overarching principle here is that code quality must be the highest goal, followed by consistent application of team standards and finally, that not wanting to rock the boat is no excuse for letting quality slip.

Of course the hard and subjective term here is code quality. It's easy to slip into academic excercises of aethestics and call them quality, just as it is easy to label horrible spaghetti code as pragmatism. My personal test for quality is best expressed by the following questions:

Does it solve the problem at hand?
It it easy to test?
Is it the simplest possible thing that could work?
Does it avoid code duplication?

It just happens that following the Uncle Bob's SOLID patterns is a good way at achieving all these.

January 2, 2011
in geek, javascript
5 min read

Happy New Year, part I

It's the beginning of a new year and you know what that means: Public disclosure of measurable goals for the coming year. I've not made a New Year's Resolution post before, which of course means that not living up to them was slightly less embarrassing. Well, this year, I'm going to go on the record with some professional development goals. I consider the purpose of these goals to be exercises of taking myself out of my comfort zone. Simply resolving to do something that is a logical extension of my existing professional development road map is just padding. But before I get into enumerating my resolutions for 2011, let's see how I fared on those undisclosed ones from last year.

How did i do in 2010?

Make git my defacto version control - success

By the end of 2009, I had only used git to the minimum extend required to get code from github. I knew i didn't like svn, because I was a big branch advocate and it just sucked at all related tasks, like managing, merging, re-merging, re-branching branches. I had been using perforce for years and considered it the pinnacle of revision control because of its amazing branch handling and excellent UI tooling. I also got poisoned against git early when I watched Linus assigned the sins of svn to all non-distributed version control system in his googletalk on git. I knew this was irrational and eventually i would need to give git a chance to stand on its merits. But the only way that was going to happen was by going git cold turkey and forcing myself to use it until i was comfortable with it. That happened on January 1st, 2010. I imported all my perforce repos into git and made the switch. I also started using git on top of all projects that were in svn that i couldn't change, keeping my own local branches and syncing/merging back into svn periodically. This latter workflow has been amazingly productive and gives me far greater revision granularity, since i constantly commit WIP to my local branches that wouldn't be fit for a shared SVN trunk.

One other aspect about DVCS that had kept me from it was that I consider version control both my work history and my offsite backup. So, I probably still push a lot more than most git folks. Sure, i've only lost work once due to disk failure vs. several times because of ill-considered disk operations or lack of appropriate rollback points, but I also work on a number of machines and religious pushing and pulling lets me move between machines more easily. Basically, I never leave my desk without committing and pushing because I've been randomized by meetings or other occasions that led me home before making sure i had everything pushed for work from home.

After a year, I can safely say, i'm not looking back. Git has spoiled me and I even use it for keeping track of CM changes for this and other blogs.

Get serious about javascript -- partial success, at best

The last couple of years I've been doing toy projects in Ruby as an alternative to my daily C# work. But unlike seemingly everyone else, i never found it to be more fun than C#. Maybe it's because i used to be dynamic language guy doing perl and I became a static typing guy by choice. As dynamic languages go, Ruby doesn't really buy me anything over perl, which I'd worked with off and on for the last 15 years. And while the energy of the Ruby community is amazing, too much of that energy seems to be devoted to re-discovering patterns and considering them revolutionary inventions.

Javascript, on the other hand, offered something over C# other than just being a dynamic language. It was a language that could be used efficiently on both client and server. That was compelling (and was the same reason why I liked Silverlight as a developer, although i never considered it viable for use on the web). Up until last year, I used javascript like so many server-side programmers: only in anger. I knew enough to write crappy little validation and interactivity snippets for web pages, but tried to keep all real logic on the server where i was most comfortable. When i did venture into javascript, I'd try to treat it like C# and hated it even more because I perceived it to be a crappy object-oriented language. But even then I understood that what I hated more than anything was the DOM and its inconsistencies and that blaming javascript for those failures was misguided.

So in 2010 I was going to get serious about javascript and but initially went down the frustrating path of trying treat javascript like other OO languages I knew. It wasn't until I watched Douglas Crockford's InfoQ talk "The State and Future of Javascript", that it clicked. What I called object-oriented was really a sub-species called class oriented. If I was to grok and love javascript, i needed to meet it on its own turf.

In the end, 2010 never went beyond lots of reading, research and little toy projects. I should have committed to solving a real problem without flinching. While my understanding of javascript is pretty good now on an academic level, i certainly didn't get serious.

Lessons learned from my resolutions

It wasn't as much a new lesson learned as a re-affirmation of an old axiom: Learning something to the extend that you can truly judge its merits and become more than just proficient requires immersion. Casual use just doesn't build up the muscle memory and understanding required to reason in the context of the subject. If you don't immerse yourself, your use of that subject will always be one of translation from your comfort zone into the foreign concept, and like all translations, things are only likely to get lost in the process.

December 26, 2010
in geek, mono
3 min read

Installing mono 2.8.1 on Amazon Linux AMI 1.0

I generally stay away from anything that's not in rpm form, because playing the configuration/build game is just such a waste of time especially when trying to come up with a repeatable recipe. Unfortunately, there's no rpm's for 2.8.x, 2.6.7 being the latest on the mono-project site.

I don't have a clue about building rpm's from source and have not found a build recipe either, but if one exists, I'd be more than happy to start building and distributing current rpm's for mono, since i don't feel like building from source on every new machine and having that machine's config fall out of the management purview of yum/rpm.

Goal: ASP.NET MVC2 environment

The purpose of build 2.8.1 is running ASP.NET MVC2. In theory this should work under 2.6.7, but there's a couple of bugs that mean it's not quite usable for production purposes. While I develop in Visual Studio, i don't want to commit binaries to my git repo and I want the ability to make fixes with emacs on the development server. To make this work, I also include xbuild as part of the recipe, since it lets me build the Visual Studio solution.

Anyway, here's the chronicle of my install. This is less of an article than a log of what I did to get mono 2.8.1 installed from source on a fresh Amazon Linux AMI 1.0, so it may include things you don't need that were dead-ends i went down. I started by looking at Nathan Bridgewater's Ubuntu & Fedora recipe.

Pre-requisites

yum install gcc gcc-c++ httpd httpd-devel bison gettext autoconf automake libtool libtiff-devel
  glib2 glib2-devel freetype fontconfig fontconfig-devel libpng-devel libexif-devel
  libX11-devel giflib-devel libjpeg-devel

Setup

cd /tmp
mkdir  mono-2.8.1
cd mono 2.8.1

wget http://ftp.novell.com/pub/mono/sources/xsp/xsp-2.8.1.tar.bz2
wget http://ftp.novell.com/pub/mono/sources/mod_mono/mod_mono-2.8.tar.bz2
wget http://ftp.novell.com/pub/mono/sources/mono/mono-2.8.1.tar.bz2
wget http://ftp.novell.com/pub/mono/sources/libgdiplus/libgdiplus-2.8.1.tar.bz2

tar jxf libgdiplus-2.8.1.tar.bz2
tar jxf mod_mono-2.8.tar.bz2
tar jxf mono-2.8.1.tar.bz2
tar jxf xsp-2.8.1.tar.bz2

Building mono 2.8.1

cd libgdiplus-2.8.1
./configure --prefix=/opt/mono-2.8.1 (30s)
make
make install
cd ..

cd mono-2.8.1
./configure --prefix=/opt/mono-2.8.1 (2m)
make
make install
cd ..

In order to build xsp, we need to configure the environment to find mono:

export PATH=$PATH:/opt/mono-2.8.1/bin
export PKG_CONFIG_PATH=/opt/mono-2.8.1/lib/pkgconfig/:$PKG_CONFIG_PATH

The above export is just affects the current session. But also needs to be set up as part of the profile:

cat >>/etc/profile.d/mono_path.sh
export PATH=$PATH:/opt/mono-2.8.1/bin
export PKG_CONFIG_PATH=/opt/mono-2.8.1/lib/pkgconfig/:$PKG_CONFIG_PATH

Now we can build xsp and finally mod_mono:

cd xsp-2.8.1
./configure --prefix=/opt/mono-2.8.1
make
make install
cd ..

cd mod_mono-2.8
./configure --prefix=/opt/mono-2.8.1
make
make install

That took a while

I did this build on an Amazon EC2 micro instance, and it took 3 hours. So might be worth attaching the EBS to a EC2 High CPU instance building it there and then re-attaching it to a micro. But you can see why I prefer rpm's. Sure i can zip up the build directories and run make install on another identical machine, but the make install on the micro instance for mono took 20 minutes. Compare that to the 2.6.7 rpm install, which took me 42 seconds including downloading via yum from novell.

December 25, 2010
in geek
3 min read

Backing up wordpress with git

To back up my wordpress install, i used to do a mysqldump of the DB, bzip it, separately tar/bzip the install directory and rsync the timestamped files to my backup server. This has worked well and was a simple enough setup. A couple of days ago, however, i decided to update wordpress and all extensions and noticed that some of my manual tweaks were gone.

Now, i have to say that the install/upgrade system of wordpress is one of the simplest and troublefree i've used in both commercial and free software -- this is not a complaint about that system. Clearly I manually tweaked files, which means i inherited the responsibility of migrating those tweaks forward. But it did get me thinking about a different upgrade strategy. But even though I have the backups to determine the changes i had to re-apply, the process was annoying. Untar the backup and manually run diffs between live and backup. The task reminded me too much of how I deal with code changes, so why wasn't I treating this like code?

A blog is (mostly) an append-only system

Sure, there's some editing, but most of it is already revisioned, which makes it append only. That means storage of all wordpress assets, including the DB dump should make it an ideal candidate for revision control. Most of the time, the only changes are additions, and if files are actually changed it does represent an update of an existing system and should be tracked with change history. If the live install was under revision control you could just run the upgrade, do a diff to see the local changes and tweak/revert them one at a time, the commit the finished state.

Setting up git and importing old revisions

My hierarchy looks like this:

 .../iloggable/
              /wordpress
              /backup

Inside backup, I kept the tar'ed up copies of the wordpress directory with date suffix, as well as dated, bzipped mysqldumps of the DB. I generally deleted older backups after rsync, since on the live server i only cared about having the most recent backup.

First thing i did was create a mirror of this hierarchy. I used an old wordpress tar for the wordpress directory and bunzipped the db archive from the same date as iloggable.sql as the only file in backup, since for the git repo i no longer needed the other backups, only the most current database dump. I then ran git init inside of iloggable. I also created .gitignore with wordpress/wp-content/cache in it to avoid capturing the cache data.

I added all files and committed this state. I then unarchived the subsequent archives, copied them on top of the mirror hierarchy and added/committed those files in succession. At this point i had created a single git repo of the backup history I had. Now i could review previous states via git log and git diff.

Turning the live copy into the git repo

Finally, i copied the .git directory into the live hierarchy. Immediately, doing git status, it showed me the changes on production since the last backup. I deleted all the old and again added/committed the resulting "changes". That gave me a git repo of the current state, that I pushed to my back-up server.

Now, my backup script just overwrites the current mysqldump, does a git commit -a -m'$timestamp' and git push. From now on, as i do tweaks, or upgrade wordpress, i can do a git commit before and after and I have an exact change history for the change.

October 4, 2010
in geek, promise
1 min read

But what about your Promise?

See what i did there? Err... yeah, i know it's horrible. I apologize.

I did want to post an update about Promise since I've gone radio-silent since I finished up my series about features and syntax. I've started a deep dive into the DLR, but mostly got sidetracked with learning antlr since I don't want to build the AST by hand for testing. However, coming up with a grammar for Promise is the real stumbling block. So that's where i'm currently at, spending a couple of hours here and there playing with antlr and the grammar.

In the meantime, I've been doing more ruby coding over the last couple of weeks and even dove back into perl for some stuff and the one thing I am more sure is that I find dynamically code incredibly tedious to work with. The lack of static analysis and even simple contracts turns dynamic programming into a task of memorization, terse syntax and text search/replace. I'm not saying that the static type systems of yore are not equally painful with the hoops you have to jump through to explain contracts with any flexibility to the compiler, but given the choice I tend towads that end. This experience just solidifies my belief that a type system like Promise, i.e. types describe contracts, but classes aren't typed other than by their ability to quack appropriately, would be so much more pleasant.

September 22, 2010
in net, mindtouch
2 min read

I need a pattern for extending a class by an unknown other

I'm currently splitting up some MindTouch Dream functionality into client and server assemblies and running into a consistent pattern where certain classes have augmented behavior when running in the server context instead of the client context. Since the client assembly is used by the server assembly, and these classes are part of the client assembly, they do not even know about the concept of the server context. Which leads me to the dilemma, how do I inject different behavior into these classes when they run under the server context?

Ok, the short answer to this is probably "you're doing it wrong", but let's just go with it and accept that this is what's happening. Let's also assume that the server context can be discovered by static means (i.e. a static singleton accessor) which also lives in the server assembly.

Here's what I've come up with: Extract the desired functionality into an interface and chain implementations together. Each class that needs this facility would have to create it's own interface and default implementation and looks something like this:

public interface IFooHandler {
  ushort Priority { get; }
  bool TryFoo(string input, out string output);
}

TryFoo gives the handler implementation a chance to look at the inputs and decide whether to handle it, or whether to pass on it. The usage of collection of handlers takes the following form:

public string Foo(string input) {
  string output = null;
  _handlers.Where(x => x.TryFoo(input, out output)).First();
  return output;
}

This assumes that _handlers is sorted by priority. It return the result of the first handler report true on invocation. Building up the _handlers happens in the static constructor:

static Bar() {
  _handlers = typeof(IUriTranslator)
    .DiscoverImplementors()
    .Instantiate()
    .Cast<IUriTranslator>()
    .OrderBy(x => x.Priority)
    .ToArray(); 
}

where DiscoverImplementors and Instantiate are extension methods i'll leave as an exercise to the reader.

Now the server assembly simply creates its implementation of IFooHandler, gives it a higher priority and on invocation checks its static accessor to see if it's running in the server context and if not, lets the chain fall through to the next (likely the default) implementation.

This works just fine. I don't really like the static discovery of handlers, and if it weren't for backwards compatibility, I'd move the injection of handlers into a factory class and leave all that wire-up to IoC. Since that's not an option, I think this is the most elegant and performant solution.

It still feels clunky, tho. Anyone have a better solutions for changing the behavior of a method in an existing class that doesn't required the change to be in a subclass (since the instance will be of the base type)? Is there a pattern i'm overlooking?

September 18, 2010
in geek, rant
4 min read

Oh, you can keep the disc, but the bits ain't yours

After writing "Maybe it's time to stop pretending we buy software?", I talked to a couple of gamer friends about their purchasing habits. The concept of DLC that has a strong "withheld content" smell came up and whether this was a "first buyer bonus" or "selling crippled products" had no straight forward and agreed answer. But what did emerge is that pricing of games is a key factor in why used sale and purchase are considered a right. The sentiment that at ~$60 there is an expectation that the game has a residual value you can recoup should the game not justify itself as a keeper. Which, of course, itself is an indicator that our usage patterns of games are much closer aligned with a rental than purchase relationship. In particular, one friend uses Amazon's Trade-In store as a form of game rental. Depending on how many games you play, it is a better deal than Gamefly.

Now it turns out that arguing about used games and whether they are crippled or not may not even be an issue in the future. Ars Technica did a great summary called "No, you don't own it: Court upholds EULAs, threatens digital resale" of the US Court of Appeals for the Ninth Circuit ruling re: Vernor v. Autodesk. The gist is that EULAs are enforceable and that you may really only own a non-transferable license. In the light of keeping your upgrade while selling the old version, that makes sense to me. Of course fairness should dictate that you can sell your license. Then again fairness should also dictate that you don't make copies of the software for all your friends. So, given the unenforceability of fairness, software companies use draconian licensing EULAs and consumers have chosen to ignore them out of hand. This legal decision, however has the possibilities of escalating this conflict and if companies go after used game stores, used DVD stores, etc. I predict that piracy will really run rampant, as consumers will take the only option available to them in fighting rules that violate their sense of fairness.

Physical products engender First Sale Doctrine expectations

I personally have not bought a new Xbox game, relying on my amazon wishlist for those. Of all the games I've played on the Xbox, only GTA4 has felt justified of its full price. The ones I have bought were used and the ones that had no replay value I sold. After all, I had a box with a disc sitting there, so of course I can sell that box.

I have, however, bought plenty of games on Steam. It's a digital sale -- I can install it on any computer when i want to play it but I can't ever sell it or even let someone borrow it. Yet I am happy about those purchases. Ok, what is wrong with me? The difference to me, if were to try to put a finger on it, lies in a combination of pricing and purchasing experience.

New PC games are ususally at least $10 cheaper. Whether you claim that this price difference is historical or because console's are recouping hardware costs, it makes a new game easier to digest. Add to that that Steam has mastered the art of the long tail, reducing prices for older games, frequently doing brief yet radical sales and even adding games not originally released on Steam along with patches and support for newer features such as cloud save game storage. Finally, with Steam (even if this is more Valve itself than anyone else), you usually get a longer life out of the game, with updates and free multiplayer.

The purchasing itself further severs the physical ownership bond you have with boxed games. Aside from a less painful price point, it's simple, immediate gratification, being able to buy a game at any time of day or night. You also generally don't run an installer, you just buy and wait until Steam tells you that the game is ready to play. In all respects the experience feels like a service not a product, which reduces the feeling that you own something that you can resell.

As a game dev, what seems like a better way to deal with the fact that only some percentage will buy your game at full price? Only make money at full sale and try to encourage new purchase by devaluing used games with exclusive DLC, etc? Or sell electronically and cut out used games entirely, but attract those not willing to pay full price by offering sales later? After all, 9 months after release $19.99 (like Left 4 Dead right now) still beats not seeing a dime.

While the Vernor v. Autodesk decision may embolden publishers to crack down on used sales, I sure hope more will follow Valve's model. After all, gamers generally don't talk fondly of publishers, but Valve is almost uniformly a hero to the community and that's while preventing gamers from selling their games. Sounds like they've got a good model going.

September 11, 2010
in geek, rant
6 min read

People have no business driving on the highway

I'm going to go rather deeply off-topic and venture into tl;dr territory: Every time I drive through LA or am on the long 4-lane interstate corridors of Barstow-Las Vegas or the central valley, my mind spends a lot of time contemplating how highway driving is such an inefficient process. It's a perfect scenario of continuous lanes with defined entry/exit/merge points. You get on at one point and off at some other point. The whole having to drive the vehicle between the two points is not only a misapplication of resources but human nature seems to ensure that it'll always be slower than it has to be.

Why autonomous highway vehicles won't happen (anytime soon)

Before I make the case why and how autonomous highway travel could happen, let's just get the naysaying out of the way. Won't may be strong, but since my objections are based on people, not on technology, I can't forsee this change to happen in any near term. Long after the technological hurdles are crossed, the psychological ones, fear and self-determination, are likely to linger.

Fear of yielding control to machines is as old as machines. We're deeply suspicious of anything that wants to take over a task we believe requires our own skillset. Even if repeatedly proven wrong, we believe that a human being can do things better than any machine. Disregarding the thousands of people that die in car accidents due to their own failings (exceeding their or their cars reaction capabilities, driving impaired, etc.), we are willing to accept those deaths as the cost of driving. But should a single person die because of a computer malfunction, all computer controlled cars should immediately be suspended. We only have to look at the recent, false, accusation that Prius' we're running amok because of a faulty on-board computer and the public outcry as proof.

And even if we trusted cars to be better drivers, we still would not yield control because we want to be the ones that decide what to do next. This is more true in car cultures like the US, but the need for self-determination means that we want to be able to drive where and how we want at all times (ignoring that we already have agreed to a meriad of rules of the road). Maybe we want to cross three lanes to get off at that exit. Or we want to weave through traffic. After all, our superior cognitive skills will gets us there faster than flowing with the computer controlled pack, right?

How could it work?

There's just too much variability and unpredictability involved in driving for computers to take over. Well, not so fast. On surface streets that's certainly true. There are so many unexpected factors that require making decisions based not on hard rules, such as bikes, pedestrians, ambigious signage, bad directions, etc. that will keep daily driving out of the reach of autonmous vehicles reach for a while. But highways are different. 99% of all unexpected decision making on highways is due to humans driving in the first place. If you didn't have to react to the unpredictable cars around you, it's a simple set of rules: There's lanes, there's entrance and exit points, there's lane merges and splits and with communication at lightspeed, reacting to conditions created by another car would be far more reliable than the visual detection and reaction of a driver.

So let's say highways are a controlled environment that can be managed by today's technology, how would something like this come to pass, especially since we can't just set up a new, separate highway system and can't turn it on over night.

Autonomous vehicles

One fear and realistic obstacle in computer controlled cars is the central control of all traffic, that even with redundancy is seen as a single point of failure. Also extending trust in computers to trusting some massive government controlled computer is a special leap that's spawned a hundred dystopian sci-fi stories. For this system to have a chance, each car needs to be in control of itself. People will trust their own cars before they trust an outside entity.

You would pull onto the entrance ramp, determine where you want to get off and the car would take over, merge into the traffic flow and on exit at your destination, the car would hand control back over or stop if it sensed that you weren't acknowledging transfer of control. I'll cover how this is possible next, but the important concept is that it's really just an auto-pilot for your car.

Recognition of the static environment

In order for your car to work on auto-pilot, it needs to have a way to recognize entrances, exits, lanes, etc. This could be done with a combination of GPS markers and RFID. GPS for the layout of major features, such as interchanges, entrances and exits and RFID to determine boundaries, etc. This static environment can be built out and expanded one highway at a time and the combination of GPS and RFID means that there is a general expectation with a local verification of that expectation, i.e. a physical safe-guard to override outdated data.

Recognition of the dynamic environment

Just as important as recognizing the lanes is recognizing cars and other obstacles. By using RFID, radar and/or visual recognition and WIFI networking, cars would be able to detect surrounding cars as well as communicate speed changes and negotiate merges. This communication would also allow the forwarding of conditions far ahead without requiring a central traffic computer. It's basically peer-to-peer traffic control. Since the computers would lack the ego of drivers, merges would not require sudden stops that ripple for miles behind and cars could drive faster and closer while still being safer.

The awareness of all other autonomous vehicles and the propagation of information also allows the detection and avoidance of out-of-system obstacles, such as physical objects, cars with malfunctioning systems or rogue drivers who are controlling their cars manually. Once one of these conditions is detected, it might trigger manual control for everyone, which would just return us to the crappy situation we already have, but it still wouldn't be sudden since traffic ahead would warn our cars long before we'd encounter it.

Oh, right, then there's reality

All the technology to bring this about exists today. Mapping our highways for GPS is already done. Implanting RFID markers is no more complicated than existing highway maintenance. Converting the fleet will take a while, but we could easily start with HOV lanes as autonomous lanes and add more lanes until the entire highway and fleet is converted. Sorry, classic cars, you will be relegated to surface streets or require transport. But considering your polluting nature, that's a good thing.

But let's say the government did decide to undertake this, the implementation reality would be lobbying by large government contractors to create their proprietary systems, attach patents to the tech and create inferior technology (just look at voting machines). They'd create unreliable crap that would erode any trust in autonomous vehicles that people could muster. Maybe the government would require some standard but the development of a standard would be a pissing match between car conglomerates that ends up with something as useless as Cablecard and still lock out any innovative application. Finally, the hunger for data would mean that all this peer-to-peer communication and travel data would be an irresistible analytics goldmine for upselling car, travel, etc. products and services, turning the autonomous system into some kind of giant big brother of movement. Of course, considering present consumer behavior, the big brother scenario would probably not act as an obstacle.

I guess I'm going to continue to be stuck behind the guy in the left lane whose speed is the righteous amount over the limit and who only accelerates when his ego is threatened by me passing him on the right. And i'll continue to have to hit the brakes or react to someone else having to hit their brakes because someone decided that their lane change was of higher priority than the flow of the remaining traffic. All of which is completely uneccessary and counter-productive to everyone on the road and highway travel could be as simple as treating your car as your personal travel compartment in a massive compartment routing system. Well, a geek can dream.

August 30, 2010
in net, geek, mono
7 min read

libBeanstalk.NET, a Beanstalkd client for .NET/mono

A couple of years back I wrote a store-and-forward message queue called simpleMQ for vmix. A year later, Vmix was kind enough to let me open source it and I put it up on sourceforge (cause that was the place back in the day). But it never got any documentation or promotion love from me, being much too busy building notify.me and using simpleMQ as its messaging backbone. Over those last couple of years, simpleMQ was served us incredibly well at notify.me, passing what must be billions of messages by now. But it does have warts and I have a backlog of fixes/features i've been meaning to implement.

Beanstalkd: simple & fast workqueue

Rather than invest more time on simpleMQ, i've been watching other message/work queues to see whether there is another product i could use instead. I've yet to find a product that i truly like, but Beanstalkd is my favorite of the bunch. Very simple, fast and with optional persistence, it addresses most of my needs.

Beanstalkd's protocol is inspired by memcached. It uses a persistent TCP connection, but relies on connection state only to determine which "tube" (read: workqueue) you are using and to act as a work timeout safeguard. The protocol is ASCII verbs with binary payloads and uses yaml for structured responses.

Tubes are created on demand and destroyed once empty. By default beanstalkd is in-memory only, but can use a binary log to store items and recover the in-memory state by log playback. I had briefly looked at zeromq, but after finding out that its speed relies on no persistence, I decided to give it a skip. zeromq might be web scale, but i prefer a queue that doesn't degrade to behaving like /dev/null :) Maybe my transactional RDBMS roots are showing, but I have a soft spot for at least journaling data to disk.

One concept of beanstalkd that i'm still conflicted about is that work is given a processing time-out (time-to-run) by the producer of the work, rather than having the consumer of the work declare its intended responsiveness. Since the producer doesn't know how soon the work gets picked up, i.e. time-to-run is not a measure of work item staleness, I don't see a great benefit for having the producer dictate the work terms.

The other aspect of work distribution beanstalkd lacks for my taste is the idea of being able to produce work in one instance and have it forwarded to another instance when that instance is available, i.e. store-and-forward queues. I like to keep my queues on the current host so i can produce work without having to rely on the uptime of the consumer or some central facility. However, store-and-forward is an implementation detail I can easily fake with a daemon on each machine that acts as a local consumer and distributor of work items, so it's not something i hold against beanstalkd.

libBeanstalk.NET

Notify.me being a mix of perl and C#, i needed a .NET client. A protocol complete one not existing and given the simplicity of the Beanstalkd protocol, I opted to write my own and have released it under Apache 2.0 on github.

I've not put DLLs up for downlad, since the API is still somewhat in flux as I continue to add features, but the current release supports the entire 1.3 protocol. By default it considers all payloads as binary streams, but I've included extension methods to handle simple string payloads:

  // connect to beanstalkd
  using(var client = new BeanstalkClient(host, port)) {

    // put some data
    var put = client.Put("foo");

    // reserve data from queue
    var reserve = client.Reserve();
    Console.Writeline("data: {0}", reserve.Data);

    // delete reserved data from queue
    client.Delete(reserve.JobId);
  }

The binary surface is just as simple:

  // connect to beanstalkd
  using(var client = new BeanstalkClient(host, port)) {

    // put some data
    var put = client.Put(100, 0, 120, data, data.Length);

    // reserve data from queue
    var reserve = client.Reserve();

    // delete reserved data from queue
    client.Delete(reserve.JobId);
  }

I've tried to keep the interface IBeanstalkClient to be a close as possible to the protocol verb signatures and rely on extension methods to create simpler versions on top of that interface. To facilitate extensions that provide smart defaults, the client also has an instance of a Defaults member that can be used to initialize those values.

The main deviation from the protocol is how I handle producer and consumer tubes. Rather than have a separate getter and setter for the tube that put will enter work into, I simply have a settable property CurrentTube. And rather than surfacing watch, ignore and listing of consumer tubes, the client includes a special collection, WatchedTubes, with the following interface:

interface IWatchedTubeCollection : IEnumerable<string> {
    int Count { get; }
    void Add(string tube);
    bool Remove(string tube);
    bool Contains(string tube);
    void CopyTo(string[] array, int arrayIndex);
    void Refresh();
}

I was originally going to use ICollection<string>, but Clear() did not make sense and I wanted to have a manual method to reload the list from the server, which is exposed via Refresh(). Under the hood, watched tubes is a hashset, so adding the same tube multiple times has no effect, neither is order of tubes in the collection guaranteed.

Future work

The client is functional and can do everything that Beanstalkd offers, but it's really just a wire protocol, akin to dealing with files as stream. To make this a useful API, it really needs to take the 90% use cases and remove any friction and repetition they would encounter.

Connection pooling

BeanstalkClient isn't, nor is it meant to be, thread-safe. It assumes you create a client when you need it and govern access to it, rather than sharing a single instance. This was motivated by Beanstalkd's behavior of storing tube state as part of the connection. Given that I encourage clients to be created on the fly to enqueue work, it makes sense that under the hood clients should use a connection pool both to re-use existing connections rather than constantly open and close sockets and to limit the maximum sockets a single process tries to open to Beanstalkd. Pooling wouldn't mean sharing of a connection by clients, but handing off connections to new clients and putting them in a pool to be closed on idle timeout once the client is disposed.

Most of this work is complete and on a branch, but i want to put it through some more testing before merging it back to master, especially since it will introduce client API changes.

Distributed servers

The Beanstalkd FAQ has this to say about distribution:

Does beanstalk inherently support distributed servers?

Yes, although this is handled by the clients, just as with memcached. The beanstalkd server doesn't know anything about other beanstalkd instances that are running.

I need to take a look at the clients that do implement this and determine what that means for me. I.e. do they use some kind of consistent hashing to determine which node to use for a particular tube, etc. But I do want to have parity with other clients on this.

POCO Producers and Consumers

For me, the 90% use case for a work queue is produce work on some threads/processes/machines and consume that work on a number of workers. Generally that item will have some structured fields describing the work to be done and producers and consumers will use designated tubes for specific types of work. These use cases imply that producers and consumers are separate user stories, that they are tied to specific tubes and deal with structured data. My current plan is to address these user stories with two new interfaces that will look similar to these:

public interface IBeanstalkProducer<T> {
  BeanstalkProducerDefaults Defaults { get; }
  PutResponse Put(T);
}

public interface IBeanstalkConsumer<T> {
  BeanstalkConsumerDefaults Defaults { get; }
  Job<T> Reserve();
  bool Delete(Job<T> job);
  Release(Job<T> job);
}

The idea with each is that it's tied to a tube (or tubes for the consumer) at construction time and that the implementation will have a simple way of associating a serializer for the entity T (will provide protobuf and MetSys.Little support out of the box).

Rx support via `IObservable<Job<T>>`

Once there is the concept of a Job<T>, it makes sense that reservation of jobs should be exposed as a stream of work that can be processed via link. Although since items should only be reserved when the subscriber accepts the work, it should probably be encapsulated in something like this:

public interface Event<T> {
  Job<T> Take();
  Job<T> Job { get; }
}

This way, multiple subscribers can try to reserve an item and items not reserved by anyone are released automatically.

As I work on the future work items, I will also use the library in production so i can get better educated about the real world behavior of Beanstalkd and what uncovered scenarios the client runs into. There is ok test coverage over the provided behavior but I certainly want to increase that signficantly as i keep working on it.

For the time being, I hope the library proves useful to other .NET developers and would love to get feedback, contributions and issues you may encounter.

August 28, 2010
in geek, rant
4 min read

Maybe it's time to stop pretending we buy software?

There's been a lot of noise about comments made by THQ's Cory Ledesma about used games. Namely,

"I don't think we really care whether used game buyers are upset because new game buyers get everything. So if used game buyers are upset they don't get the online feature set I don't really have much sympathy for them." -cvg

Well, this has gotten a lot of gamers upset, and my immediate reaction was something like "dude, you are just pissing off your customers." And while Cory may have been the one to say it out loud, actions by EA and others in providing free DLC only to the original buyer and similar original buyer incentives show that the industry in general agrees with his sentiments.

Holding steadfast to my first-sale doctrine rights, I, like most gamers, software and media purchasers, strongly believe that we can sell those bits we bought. Of course, EULAs have said nu-uh to that belief for just as long. We purchasers of bits only own a license to those bits, we don't own a product. But just as nobody reads an EULA, everybody believes those EULAs to be unenforcable. I own those bits, man!

So I continue to believe that when I purchase a product, let's say some bits on a DVD, i can sell it again or buy such a product from someone else. It wasn't until I read Penny Arcade earlier this week, that I had to admit that, first-sale doctrine notwithstanding, I am not their customer.

Penny Arcade - Words And Their Meanings

But, I thought, just like buying CDs used, I am actually contributing to a secondary market that promotes the brand of the artist. Buying that old CD used makes it more likely that I will buy the next one new, or that I will go to their show when they come to town, etc. Put aside whether this secondary market really has the magical future revenue effects i ascribe to it, for games there is no such secondary market. As Tycho said in his post accompanying the strip:

"If I am purchasing games in order to reward their creators, and to ensure that more of these ingenious contraptions are produced, I honestly can't figure out how buying a used game was any better than piracy. From the the perspective of a developer, they are almost certainly synonymous." - tycho, penny arcade

Ok, maybe you think the secondary market is sequels that you will buy new because you bought the original used. Never mind that most sequels are farmed out to another development house by the publisher, buying used games, at best, actively encourages the endless milking of sequels rather than new IP. But it's even worse for games, because virtually all games now include some multi-player component and keeping that running costs real money. You paying for Xbox Live doesn't mean the publisher isn't still paying more cash to Microsoft to run those servers. So every used Modern Warfare player costs the publisher money while only Gamestop made any cash on the sale. So, sure, you own that disk, but you're insane if you think that the developer/publisher owes you anything.

Now, let's extend this to the rest of the software market. Here you can argue a bit more for a secondary market, since software regularily comes out with new versions, encouraging you to pugrade. If you look at that boxed software revenue cycle it becomes clear that the added features and version revving just exist to extend a product into a recurring revenue stream. And if that's the motivation, it also means we're encouraging developers to spend less on quality and bug fixes (because nobody wants to pay for those), and more on bells and whistles, cause those justify the version rev and with it the upgrade price. In reality, if you use Photoshop professionally you've long ago stopped being a purchaser of boxed software and are instead a subscriber to the upgrade path.

This fickle revenue stream also has an effect on pricing. You may only use Powerpoint once in a while, but you paid to use it 24/7. Or maybe because you don't use it enough you've rationalized pirating it, which only serves to justify a high price tag, since the paying customers are subsidizing the pirates. Either way, the developer inflates the price to smooth out the revenue stream.

The sooner we stop pretending that we buy software and just admit that really we just want to rent it, the better. Being addicted to high retail prices, some publishers certainly will try to keep the same pricing as they move to the cloud, but the smart ones will adjust their pricing to attract those buyers who would never have bought the boxed version. Buying metered or by subscription has the potential for concentrating on excellence rather than bloat and the responsiveness and frequent updates of existing services seem to bear that promise out already. It's really in our favor to let go of idea of wanting a boxed product with a resale value.