Once again, this is a progression of building out my Amazon Linux AMI, so the pre-requisites might be off, since I've previously installed a number of other things. And once again, this is simply a log of tasks for my own future reference, rather than a build recipe. Maybe this will be useful to someone else as well, so I've gone back and tagged all AMI articles with aws-linux-ami, so you can at least see the history of pre-requisites.
Next, build the passenger apache2 module. I actually killed the install the first time around because libcurl-devel and openssl-devel were missing. The installer assured me that it would guide me through getting those dependencies resolved, but I wanted to make sure they came in through yum rather than have this installer download and build them from source. Anyway the command was:
This installed flawlessly and ended with instructions to put the following in my apache config:
Finally, the apache vhost config was exceedingly simple:
<VirtualHost *:80>
    ServerName www.yourhost.com
    # be sure to point to 'public'!
    DocumentRoot /somewhere/public
    <Directory /somewhere/public>
        # relax Apache security settings
        AllowOverride all
        # MultiViews must be turned off
        Options -MultiViews
    </Directory>
</VirtualHost>
The important thing is that the DocumentRoot needs to point to the rails public directory not the root of the rails application.
The last task was running
rake db:create:all
to set up the expected db locally. After that, and an apache restart, the app came up without a hitch.
Of course, while setting all this up, I finally figured out why mod_mono was leaking semaphores, making all of this likely moot. But I'm glad to have this alternative while I determine whether the mod_mono behavior is really fixed.
This isn't yet another .NET developer defecting to Ruby. I have very little interest in making Ruby my primary language. I've done a couple of RoR projects over the years, nothing serious I admit, but I just don't seem to enjoy it in the way that so many of my peers do. That said, RoR does hit a sweet spot for websites. The site I'm porting has very little in terms of business logic -- it's primarily HTML templating with navigation -- so this was an exercise to circumvent my mod_mono issues.
I'm a huge C# fanboy, but having worked with ASP.NET MVC for a while I have to admit that the amount of cruft one has to assemble to stay DRY in ASP.NET templating is just not worthwhile. While views can be strongly typed, it's an exercise in frustration trying to write templates generically. Maybe this becomes easier with dynamic usage in MVC3, but I haven't checked it out. What certainly doesn't help is that the MVC team decided to make TemplateHelper internal, turning the addition of helpers in the vein of .DisplayFor or .EditorFor into a major task that still ends up being a pile of hacks. Now I'm not an ASP.NET MVC expert and there are probably a lot of extension points I just don't know about. But the articles on extending it that I have found are usually pages of code. I shouldn't have to become a framework internals expert just to add some generic templating extensibility.
Ok, enough ranting. ASP.NET MVC is still a huge improvement over webforms, but right now I'm watching Manos de Mono and OWIN to see what develops in .NET land for websites there. The ASP.NET stack, in my opinion, is just too heavy for something that should be simple.
So, why RoR instead of node.js, since I claimed that I was going to get serious about javascript this year? Mostly because this port has a deadline, so use what you know applies, and it's a production site, so use known stable tech applies. Another benefit was that RoR uses the same <% %> syntax as webforms views and MVC was clearly heavily inspired by RoR.
I ported the site over 3 nights, maybe 10 hours of cumulative seat time which feels like time well spent. Strategic search and replace got me 80% there, faking Html. for my custom extension in RoR got me another 10%, leaving only 10% for actual new business logic written in ruby. Once I get to more complex business logic for the site I may stick to Ruby, although I know I'll be sorely tempted to write it as REST services in C# on top of Dream.
For me, static typing serves two purposes: First, compile time checking that arguments to calls are the appropriate type, and second, static discovery of code usage, dependency and wire-up. The former is about correctness and safety, and constitutes the "free unit testing" the compiler provides (which static typing's detractors deride and its proponents celebrate). The latter is about productivity. It allows me to quickly navigate through a complex codebase, find dead code, track down unintended coupling, etc. In many ways, the discovery aspect is the greater reason why I like static typing. And that's why I consider Convention over Configuration a paradigm that should be approached with caution.
If you are in a dynamically typed language, discovery relies on reading code and documentation. While your code may be expressed as symbols not strings, navigation of those symbols is often impossible until runtime. You have nothing that's guarding you from your own typos, so all code is equally suspect and you are likely to manually or automatically test basic syntax a lot more. With that premise, automatic configuration of instances and wireup doesn't complicate the existing testing burden significantly but greatly cuts down on typing and code size, i.e. it's a good thing.
Now in statically-typed land things are different: Compiler-checked syntax can lull you into complacency, especially if you enjoy refactorings that operate on the symbolic rather than the search/replace level. Renaming something to fit the context can quickly break mappings or wire-ups relying on convention. Don't get me wrong, config files for mapping or wire-up are no better -- I am advocating mapping and wire-up in code as much as possible. The minutes spent on initial coding are worth hours in maintainability.
I'm not going to discount Convention over Configuration in its totality -- I think it is immensely useful at code boundaries, such as mapping schemas to entities, or URLs to controller actions. These touch points already are on the code edge where representation is leaving the homogeneous domain, so using convention only provides naming guidance and reduces tedious scaffolding.
So, my advice is, don't use Convention over Configuration to wire up your code internals unless the relationships are obvious and/or well covered by tests. But do favor it for the plumbing required to connect your code to outside input and output sources.
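To make the trade-off concrete, here is a minimal C# sketch contrasting the two styles of wire-up. All names in it (IHandler, OrderHandler, WireUpSketch) are hypothetical and exist only for illustration. The convention-based lookup finds a type by a name assembled from a string, so a rename refactoring of OrderHandler compiles cleanly and only fails at runtime; the explicit registration references the type as a symbol, so the compiler and refactoring tools can follow it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical handler contract and implementation, for illustration only.
public interface IHandler {
    string Handle(string input);
}

public class OrderHandler : IHandler {
    public string Handle(string input) { return "order:" + input; }
}

public static class WireUpSketch {

    // Convention: resolve "<name>Handler" by its type name.
    // Renaming OrderHandler silently breaks this; null is returned at runtime.
    public static IHandler ResolveByConvention(string name) {
        var type = typeof(WireUpSketch).Assembly.GetTypes()
            .FirstOrDefault(t => t.Name == name + "Handler" && typeof(IHandler).IsAssignableFrom(t));
        return type == null ? null : (IHandler)Activator.CreateInstance(type);
    }

    // Wire-up in code: the type is referenced symbolically,
    // so a rename refactoring updates this mapping as well.
    private static readonly Dictionary<string, Func<IHandler>> Handlers =
        new Dictionary<string, Func<IHandler>> {
            { "Order", () => new OrderHandler() }
        };

    public static IHandler ResolveExplicitly(string name) {
        Func<IHandler> factory;
        return Handlers.TryGetValue(name, out factory) ? factory() : null;
    }
}
```

Both lookups return a working handler for "Order" today. The difference only shows up after a rename, which is exactly the kind of failure a test would have to catch in the convention-based version.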
I've been using Apache with mod_mono for some ASP.NET MVC2 projects and kept having problems with semaphore arrays being leaked. Under 2.6.7 this even broke xbuild after a while. I then went to 2.8 and 2.8.1, but it didn't stop the leaks. I posted on the mono-devel list and after lack of response simply asked if anyone was actually running ASP.NET under mod_mono, which also elicited no replies. Finally, I posted the problem on stackoverflow, again without any resolution.
The problem manifests itself as a build up of semaphore arrays by the apache process, which is visible via ipcs. When the site is first started the output looks like this:
At some point all ASP.NET pages will return blank. No errors, no nothing, .NET logging reports normal behavior, but no content is sent. And you can restart the mono processes and apache all you want, it won't come back. Sorry.
What does work is to remove all semaphore arrays via ipcrm and restart apache. For the time being I've had a script in cron that did this:
/usr/bin/ipcrm sem $(/usr/bin/ipcs -s | grep apache | awk '{print$2}');
Unfortunately, the leaking semaphores are somehow related to traffic, so eventually I'd either have to increase the frequency of the restart script or make it more intelligent. I opted for neither and decided to try out nginx+fastcgi+mono.
Like my mono 2.8.1 install, I'm doing this on an Amazon Linux AMI 1.0. And like that article, this isn't so much a recipe as a log of my actions. Note that this was done after the 2.8.1 install from source, so there might be dependencies I'm not mentioning since they'd already been addressed.
Voila, ASP.NET MVC2 under nginx. This may have other issues, but I have not yet observed them, so this seems to be a way to get around the mod_mono issues.
Of course that's a bit cumbersome. What we really need is an init script so we can start and stop the fastcgi server like other services:
#!/bin/sh
# chkconfig: - 85 15
# description: Fast CGI mono server
# processname: fastcgi-mono-server2.exe

# MONOSERVER, WEBAPPS and LISTENER must be defined here (their values are not shown).
MONOSERVER_PID=$(ps auxf | grep "${LISTENER}" | grep -v grep | awk '{print $2}')

case "$1" in
  start)
    if [ -z "${MONOSERVER_PID}" ]; then
      echo "starting mono server"
      ${MONOSERVER} /applications=${WEBAPPS} /socket=${LISTENER} &
      echo "mono server started"
    else
      echo ${WEBAPPS}
      echo "mono server is running"
    fi
    ;;
  stop)
    if [ -n "${MONOSERVER_PID}" ]; then
      kill ${MONOSERVER_PID}
      echo "mono server stopped"
    else
      echo "mono server is not running"
    fi
    ;;
esac
exit 0
While this takes care of my ASP.NET troubles, it now means that I'd have to migrate the various php packages over as well. Wordpress is no problem, but OpenCart would be a bit of hacking, which is really the last thing I want to do when it comes to ecom. I thought about running both nginx and apache and using one to proxy the sites on the other (since EC2 won't let me attach multiple IPs to a single host), but decided against that as well, since it would just be a hack of a different color. There's also the option of running fastcgi against apache, but I've not found any docs on how to set up ASP.NET MVC that way. All the existing examples map ASP.NET file extensions to fastcgi, which isn't an option.
Apache is still the most supported solution, so when integrating a number of sites on a single host, it ends up being the best option. It's just that mod_mono doesn't seem to be playing along for me :( So, I hatched a scheme to rid myself of ASP.NET for this site, since it really only has trivial business logic and I have a holiday coming up. More on that later.
Like my last post, "Materializing an Enumerable", this may be a bit academic, but as a linq geek, whether I should use .ToList() or .ToArray() is something that piques my curiosity. Most of the time when I return IEnumerable<T> I want it to be in a threadsafe manner, i.e. I don't want the list to change underneath the iterator, so I return a unique copy. For this I have always used .ToArray(), since an array can't grow or shrink and I figured it was leaner.
Finally having challenged this assumption, it turns out that .ToList() is theoretically faster for sources that are not ICollection<T>. When the final count is known, as is the case with ICollection<T>, both .ToList() and .ToArray() create an array under the hood for storage that is sufficiently large and copy the source into the destination. When the count isn't known however, both allocate an array and write to it, copying the contents to a larger array anytime the size is exceeded. So far, both are nearly identical in execution. However, once the end of the source is reached, .ToList() is done, while .ToArray() does one more copy to return a properly sized array. Of course, the overhead of iterating on that source, which is more than likely hitting some I/O or Computation barrier, means that in terms of measurable performance difference, again, both are identical.
It is still true that a List<T> object uses more memory than a T[], but even that difference is almost always going to be irrelevant, as the collection's size is insignificant compared to the items it contains. That means that using .ToList() or .ToArray() to create an IEnumerable<T> is really a matter of personal preference.
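The growth strategy described above can be sketched in a few lines. This is an illustration of the idea only, not the BCL's actual code; GrowthSketch, FillBuffer and ToExactArray are hypothetical names, and the initial capacity of 4 is an assumption picked for the example. FillBuffer is roughly where a list-style materialization is finished (it keeps the oversized buffer plus a count), while ToExactArray shows the one extra trimming copy an array-style materialization pays.

```csharp
using System;
using System.Collections.Generic;

public static class GrowthSketch {

    // Fill a buffer from a source of unknown length, doubling it whenever it is full.
    // A list can stop here: it keeps the (possibly oversized) buffer and the count.
    public static T[] FillBuffer<T>(IEnumerable<T> source, out int count) {
        var buffer = new T[4]; // assumed starting capacity
        count = 0;
        foreach (var item in source) {
            if (count == buffer.Length) {
                // allocates a larger array and copies the existing items over
                Array.Resize(ref buffer, buffer.Length * 2);
            }
            buffer[count++] = item;
        }
        return buffer;
    }

    // An array has to be exactly as long as its contents,
    // so it needs one more copy unless the buffer happens to be full.
    public static T[] ToExactArray<T>(IEnumerable<T> source) {
        int count;
        var buffer = FillBuffer(source, out count);
        if (count == buffer.Length) {
            return buffer;
        }
        var result = new T[count];
        Array.Copy(buffer, result, count);
        return result;
    }
}
```

For a source of five items, FillBuffer ends up with a buffer of length 8 and a count of 5, and ToExactArray copies those five items into a new array of length 5.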
Yesterday I posted the question "Is there a way to Memorize or Materialize an IEnumerable?" on stackoverflow, hoping that there was already a built-in way in the BCL. The answers and comments showed that there wasn't, but they also challenged my existing assumptions and illustrated that materializing and/or memorizing could be interpreted in a number of ways. I figured that amount of ambiguity required a deeper dive into the subject.
I try to use IEnumerable<T> as the return value for any method that is supposed to return a sequence meant purely for consumption. I choose IEnumerable<T> over an array or list because T[] exposes an unneeded implementation detail, while returning IList<T> or ICollection<T> allows modification of the sequence, which is almost always undesirable behavior. And that doesn't even address that the enumerable might be a stream of items coming from an external source like a database cursor, a file stream or from executing a linq AST.
The drawback of this is that making multiple calls on an IEnumerable<T> that enumerate it under the hood may either incur a large cost, in the case of executing a linq AST repeatedly, or fail, in the case of stream or cursor. In order to be able to do something like the below, you really want to be certain that you have a finite sequence to query:
.Any() has to get an enumerator and call .MoveNext() once to see if it returns true and foreach, of course, gets the enumerator and iterates over it until the end. In order to safely write the above code, you really want the IEnumerable<T> converted into a computed collection.
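The double enumeration is easy to observe with an iterator that counts how often it is started. This is a hedged sketch: EnumerationSketch, Numbers and SumIfAny are hypothetical names, and the static counter exists only to make the effect visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationSketch {

    // Counts how many times the sequence below has been started.
    public static int Starts;

    // A lazy sequence; the body runs again for every new enumerator.
    public static IEnumerable<int> Numbers() {
        Starts++;
        yield return 1;
        yield return 2;
    }

    // The pattern from the text: .Any() followed by a foreach over the same enumerable.
    public static int SumIfAny(IEnumerable<int> items) {
        var sum = 0;
        if (items.Any()) {          // first enumerator, one MoveNext()
            foreach (var item in items) { // second enumerator, runs to the end
                sum += item;
            }
        }
        return sum;
    }
}
```

Calling EnumerationSketch.SumIfAny(EnumerationSketch.Numbers()) starts the sequence twice, while passing EnumerationSketch.Numbers().ToArray() starts it only once, because both .Any() and the foreach then run against the array. With a cursor or stream as the source, that second start is where things would fail.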
The usual solution is to just call either .ToList() or .ToArray() and be done with it. But both have undesirable side-effects. Both will always create a new copy of the collection, which may have a significant cost. And both change the type from IEnumerable<T>. Sure, you can cast it back, but because neither is idempotent, casting to IEnumerable<T> hides the only clue that you don't want to call .ToList()/.ToArray() again. In addition, .ToList() also produces a mutable collection.
Most of the time, none of these side-effects are significant detractors, but should you return the memorized version from a method, you probably would want to cast it back to IEnumerable<T> and then the cost of this behavior can start to add up. Having a method that lets you memorize or materialize in an idempotent fashion would be useful.
What is the expected behavior of .Memorize()? It should capture the current state of sequence at the time of call and return an immutable sequence and it should force that sequence into memory so that multiple enumerations are relatively cheap. This one is fairly simple to implement:
Arrays have a fixed length and, once handed out as IEnumerable<T>, cannot be modified without a downcast, so we can use them reliably as our memorized collection. And if the source already is an array, we can safely return it unmodified. Now we can pass the resultant enumerable around without concern that someone else calling .Memorize() again needlessly copies it.
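A minimal sketch of what such a method could look like, based only on the behavior described here: copy into an array unless the source already is one. The class name MemorizeSketch is hypothetical, and this should be read as one possible shape rather than the definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MemorizeSketch {

    // Idempotent: an array is returned as-is, anything else is copied into a new array once.
    public static IEnumerable<T> Memorize<T>(this IEnumerable<T> enumerable) {
        return enumerable as T[] ?? enumerable.ToArray();
    }
}
```

Calling .Memorize() on a List<int> produces a new int[]; calling .Memorize() on that result again hands back the same array instance rather than another copy.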
Unlike .Memorize(), .Materialize() does not imply that the enumerable becomes a private, immutable copy. It only wants to make certain that the type can be safely enumerated. This lesser requirement actually complicates the idempotency scenario, requiring an intermediate collection class to be created:
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class LinqEx {
    public static IEnumerable<T> Materialize<T>(this IEnumerable<T> enumerable) {
        // already materialized, or an array: nothing to do
        if(enumerable is MaterializedEnumerable<T> || enumerable.GetType().IsArray) {
            return enumerable;
        }
        return new MaterializedEnumerable<T>(enumerable);
    }

    private class MaterializedEnumerable<T> : IEnumerable<T> {
        private readonly ICollection<T> _collection;
        public MaterializedEnumerable(IEnumerable<T> enumerable) {
            // wrap an existing collection, otherwise force the sequence into an array
            _collection = enumerable as ICollection<T> ?? enumerable.ToArray();
        }
        public IEnumerator<T> GetEnumerator() {
            return _collection.GetEnumerator();
        }
        IEnumerator IEnumerable.GetEnumerator() {
            return GetEnumerator();
        }
    }
}
The purpose of MaterializedEnumerable<T> is to act as a marker for a previous materialization that can wrap or coerce a collection, so that no unnecessary copying is done.
A word on the use of .ToArray() instead of .ToList(): I've always leaned towards .ToArray(), both because it creates a fixed-size collection and because I thought arrays to be more lightweight than lists. After cracking them both open in Reflector, it became apparent that they should be about the same, and some simple tests confirmed that there is no significant difference.
While memorize and materialize have subtly different meaning, both intending to optimize access to an enumerable idempotently, in day to day use simply using .ToArray() will usually be just fine.
A while back I wrote that you really never have to write another delegate again, since any delegate can easily be expressed as an Action or Func. After all what's preferable? This:
var work = worker.ProcessTaskWithUser(delegate(Task t, User u) {
    // define the work callback
});
or this:
var work = worker.ProcessTaskWithUser((t, u) => {
    // define the work callback
});
I know I prefer lambdas over delegates. But this is just on the consuming end. The signature for the above could be either:
Either one can be used with the same lambda, so using the delegate doesn't inconvenience us in consumption. But writing the Func version is certainly more concise, so it seems like the winner once again. In terms of consumption of that API, however, we've lost the signature of the method, which would explain what each parameter is used for. Sure, .Where(Func<T,bool> filter) is pretty self-explanatory, but .WhenDone(Func<T,V,string,T> callback) really doesn't tell us much of anything.
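As a rough illustration of the two options, the sketch below declares one method taking a named delegate and one taking a Func, and both accept the identical lambda. The Task and User classes, the Worker class and the method names ProcessWithDelegate and ProcessWithFunc are stand-ins invented for this example; only the TaskUserDelegate name and parameter names are taken from this post.

```csharp
using System;

// Stand-in types, for illustration only.
public class Task {
    public string Name;
}

public class User {
    public string Name;
}

// The named delegate carries parameter names as part of its declaration.
public delegate Task TaskUserDelegate(Task inputTask, User activeUser);

public class Worker {
    private readonly Task _task = new Task { Name = "build" };
    private readonly User _user = new User { Name = "arne" };

    // Option 1: the callback's shape is defined by a named delegate.
    public Task ProcessWithDelegate(TaskUserDelegate callback) {
        return callback(_task, _user);
    }

    // Option 2: the callback's shape is spelled out inline, without parameter names.
    public Task ProcessWithFunc(Func<Task, User, Task> callback) {
        return callback(_task, _user);
    }
}
```

From the caller's side, worker.ProcessWithDelegate((t, u) => new Task { Name = u.Name + ":" + t.Name }) and the same lambda passed to ProcessWithFunc read exactly alike; the difference is only in what the method declaration tells you.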
So there seems to be a straightforward usability rule of thumb: Use a delegate if the parameter's meaning isn't obvious from the usage of the lambda. But if the goal here is to make it easier for the consumer of the API, unfortunately it's not that simple, since the primary tool for communicating the API's documentation, intellisense, actually makes things worse.
For maximum usability, let's document the API so its meaning is discoverable:
/// <summary>
/// The task user delegate is meant to transform a given task into a new one in the context of a user.
/// </summary>
/// <param name="inputTask">The task to transform.</param>
/// <param name="activeUser">The user context to use for the transform.</param>
/// <returns>A new task in the user's context.</returns>
delegate Task TaskUserDelegate(Task inputTask, User activeUser);
/// <summary>
/// Transform all tasks for a set of users.
/// </summary>
/// <param name="processCallback">Callback for transforming each task for a specific user</param>
/// <returns>Sequence of transformed tasks</returns>
IEnumerable<Task> ProcessTaskWithUser(TaskUserDelegate processCallback) {
    // ...
}
And this is what it looks like on code completion:
While TaskUserDelegate is well documented, this does not get exposed via intellisense. Worse, this signature tells us nothing about the arguments for our lambda. So, yeah, we created a better documented API, but made its discovery worse.
/// <summary>
/// Transform all tasks for a set of users.
/// </summary>
/// <param name="processCallback">Callback for transforming each task for a specific user</param>
/// <returns>Sequence of transformed tasks</returns>
IEnumerable<Task> ProcessTaskWithUserx(Func<Task, User, Task> processCallback) {
    // ...
}
which gives us this completion:
Now we at least know the exact signature of the lambda we're creating, even if we don't know what the purpose of the arguments is.
In both cases, the best discoverability ends up being plain old textual documentation of the parameter. Even though the delegate provides extra documentation possibilities, access to them is inconvenient enough that, for expediency, I'd still have to vote for the Func signature.
The one exception to the rule would be a lambda that is meant as a dependency. I.e. a class or method that has a callback that you attach for later use, rather than immediate dispatch. In that case the lambda really functions as a single method interface and should be treated like any other dependency and be as explicit as possible.
Since last year ended up being an academic exercise in learning javascript, that task is still outstanding. But it is no less important. I consider javascript to be the one language I cannot afford not being good at. It is truly the assembly language of the web and is rapidly gaining ground on the server as well. I may or may not warm up to going back to a dynamically typed language but that's really independent of the need represented by this gap in expertise.
In many ways, javascript provides a lot of the features I am aiming for with promise. It follows the same pattern of methods being lambdas attached to slots on objects. Although syntax-wise, CoffeeScript is an even better match -- getting rid of the overly verbose function prefix for lambdas among many other cool changes.
I still doubt that javascript will become my new favorite language, simply because of my strongly-typed tendencies. But being a C# programmer I'm in kind of a weird space. I don't much care for windows as client and abhor it as a server. So almost everything I do in C# runs under mono. I admire Miguel de Icaza's relentless drive for creating the best environment possible regardless of detractors and am continually amazed at the quality and completeness of mono. That said, being a C# advocate in the linux world is asking for additional pain. Mono will always be trailing MS's implementation by a bit and for your troubles you do end up being a pariah in the linux world. Finding a language that is more at home and accepted on my favorite platform would be beneficial. For a long time, java and C# were just close enough that in the worst case scenario, I could always go java again. But now that I'm used to C#'s lambda syntax and linq, java just feels ancient and dead to me.
Of all the languages I've looked at, Scala hits my personal feature bingo the best. I love the actor pattern and built it using Xmpp in C# for notify.me. From my sideline review Akka seems to be the actor implementation to beat, so picking a project and implementing it start to finish in Scala/Akka seems like the way to go. After that I should have enough of a feel for the language to see whether it's a contender for my C# affections.
To stick with the common thread, I think that going forward in the mobile space, javascript once again is going to be the most important tool in the development toolkit. But at the same time, I am a sucker for native clients and am happy that the current crop of smartphones have revived writing client side software.
However, the last time I did mobile client programming was WM5, so I have some serious catching up to do. The goal here is to pick a useful app and write it for all three of the above platforms and release it. I'm going to stick somewhat to my comfort zone here by using C#, the default on Windows Phone 7 and enabled by MonoTouch and MonoDroid on the other two. The departure from my comfort zone is venturing back to the client after spending almost all my time on the server, and figuring out what the re-use vs. platform specific stories are and what deployment looks like. I've not settled on an app, but most likely it will be a native notify.me client.
Those resolutions should keep me busy enough, especially since they are spare time activities for when I'm not busy working on MindTouch and Dream, extending notify.me or maintaining curdsandwine.com.
I was just reading David Tchepak's inspiring new year's resolution and also re-read his linked article "There is no U in Collective Ownership" and I thought that it sounded very much like an adaptation of Isaac Asimov's First Law of Robotics. So, just for fun, I wondered if all three laws couldn't be adapted as laws for practicing coders.
A Coder may not harm the code base or, through inaction, allow the code base to come to harm.
A Coder must follow the guidance set up for the code base except where such practices would conflict with the First Law.
A Coder must protect their livelihood as long as such protection does not conflict with the First or Second Law.
While this is mostly in jest, I do think this works as a pretty good set of guidelines. The overarching principle here is that code quality must be the highest goal, followed by consistent application of team standards and finally, that not wanting to rock the boat is no excuse for letting quality slip.
Of course the hard and subjective term here is code quality. It's easy to slip into academic exercises in aesthetics and call them quality, just as it is easy to label horrible spaghetti code as pragmatism. My personal test for quality is best expressed by the following questions:
It's the beginning of a new year and you know what that means: Public disclosure of measurable goals for the coming year. I've not made a New Year's Resolution post before, which of course means that not living up to them was slightly less embarrassing. Well, this year, I'm going to go on the record with some professional development goals. I consider the purpose of these goals to be exercises of taking myself out of my comfort zone. Simply resolving to do something that is a logical extension of my existing professional development road map is just padding. But before I get into enumerating my resolutions for 2011, let's see how I fared on those undisclosed ones from last year.
By the end of 2009, I had only used git to the minimum extent required to get code from github. I knew I didn't like svn, because I was a big branch advocate and it just sucked at all related tasks, like managing, merging, re-merging and re-branching branches. I had been using perforce for years and considered it the pinnacle of revision control because of its amazing branch handling and excellent UI tooling. I also got poisoned against git early when I watched Linus assign the sins of svn to all non-distributed version control systems in his googletalk on git. I knew this was irrational and that eventually I would need to give git a chance to stand on its merits. But the only way that was going to happen was by going git cold turkey and forcing myself to use it until I was comfortable with it. That happened on January 1st, 2010. I imported all my perforce repos into git and made the switch. I also started using git on top of all projects that were in svn that I couldn't change, keeping my own local branches and syncing/merging back into svn periodically. This latter workflow has been amazingly productive and gives me far greater revision granularity, since I constantly commit WIP to my local branches that wouldn't be fit for a shared SVN trunk.
One other aspect about DVCS that had kept me from it was that I consider version control both my work history and my offsite backup. So, I probably still push a lot more than most git folks. Sure, I've only lost work once due to disk failure vs. several times because of ill-considered disk operations or lack of appropriate rollback points, but I also work on a number of machines and religious pushing and pulling lets me move between machines more easily. Basically, I never leave my desk without committing and pushing because I've been randomized by meetings or other occasions that led me home before making sure I had everything pushed for work from home.
The last couple of years I've been doing toy projects in Ruby as an alternative to my daily C# work. But unlike seemingly everyone else, I never found it to be more fun than C#. Maybe it's because I used to be a dynamic language guy doing perl and I became a static typing guy by choice. As dynamic languages go, Ruby doesn't really buy me anything over perl, which I'd worked with off and on for the last 15 years. And while the energy of the Ruby community is amazing, too much of that energy seems to be devoted to re-discovering patterns and considering them revolutionary inventions.
Javascript, on the other hand, offered something over C# other than just being a dynamic language. It was a language that could be used efficiently on both client and server. That was compelling (and was the same reason why I liked Silverlight as a developer, although I never considered it viable for use on the web). Up until last year, I used javascript like so many server-side programmers: only in anger. I knew enough to write crappy little validation and interactivity snippets for web pages, but tried to keep all real logic on the server where I was most comfortable. When I did venture into javascript, I'd try to treat it like C# and hated it even more because I perceived it to be a crappy object-oriented language. But even then I understood that what I hated more than anything was the DOM and its inconsistencies and that blaming javascript for those failures was misguided.
So in 2010 I was going to get serious about javascript, but initially went down the frustrating path of trying to treat javascript like other OO languages I knew. It wasn't until I watched Douglas Crockford's InfoQ talk "The State and Future of Javascript" that it clicked. What I called object-oriented was really a sub-species called class-oriented. If I was to grok and love javascript, I needed to meet it on its own turf.
In the end, 2010 never went beyond lots of reading, research and little toy projects. I should have committed to solving a real problem without flinching. While my understanding of javascript is pretty good now on an academic level, I certainly didn't get serious.
It wasn't as much a new lesson learned as a re-affirmation of an old axiom: Learning something to the extent that you can truly judge its merits and become more than just proficient requires immersion. Casual use just doesn't build up the muscle memory and understanding required to reason in the context of the subject. If you don't immerse yourself, your use of that subject will always be one of translation from your comfort zone into the foreign concept, and like all translations, things are likely to get lost in the process.