Fixing leaking semaphores with mod_mono
After porting my mod_mono ASP.NET MVC application to Ruby on Rails and setting up Phusion Passenger to run the application under mono, I finally figured out how to fix the leaking semaphore issue. The real title of this post should probably be "PEBKAC or Don't assume errors are unrelated, you idiot".
Recap of the problem
The problem manifests itself as a buildup of semaphore arrays owned by the apache process, which is visible via ipcs. When the site is first started, the output looks like this:
[root@host ~]# ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x01014009 1671168 root 600 52828 48
0x0101400a 1703937 root 600 52828 25
0x0101400c 1736706 root 600 52828 35
------ Semaphore Arrays --------
key semid owner perms nsems
0x00000000 10616832 apache 600 1
0x00000000 10649601 apache 600 1
0x00000000 10682370 apache 600 1
0x00000000 10715139 apache 600 1
------ Message Queues --------
key msqid owner perms used-bytes messages
Eventually it'll look like this:
[root@host ~]# ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x01014009 1671168 root 600 52828 48
0x0101400a 1703937 root 600 52828 25
0x0101400c 1736706 root 600 52828 35
------ Semaphore Arrays --------
key semid owner perms nsems
0x00000000 10616832 apache 600 1
0x00000000 10649601 apache 600 1
0x00000000 10682370 apache 600 1
... lots more ...
0x00000000 11141158 apache 600 1
0x00000000 11173927 apache 600 1
0x00000000 11206696 apache 600 1
------ Message Queues --------
key msqid owner perms used-bytes messages
At some point all ASP.NET pages come back blank. No errors, no nothing: .NET logging reports normal behavior, but no content is sent. And you can restart the mono processes and apache all you want; it won't come back. Sorry.
What's really wrong
Since day one I'd been receiving warnings at apache startup, but since I didn't understand what they meant and things seemed to be working, I had been ignoring them. Of course, that was a lie on its face. Things were clearly not working, what with the leaking semaphores, but I conveniently filed the two issues as unrelated in my head and ignored the warnings at my peril. The warning was this:
[Mon Jan 24 00:12:50 2011] [crit] The unix daemon module not initialized yet.
Please make sure that your mod_mono module is loaded after the User/Group
directives have been parsed. Not initializing the dashboard.
This warning was repeated once for each ASP.NET vhost I had defined. I looked at my vhost configurations, saw nothing about users and groups, decided it was some weird mono issue and left it at that. But the actual problem was not in the vhost configuration but in httpd.conf. The problem was this default section:
#
# Load config files from the config directory "/etc/httpd/conf.d".
#
Include conf.d/*.conf
#
# If you wish httpd to run as a different user or group, you must run
# httpd as root initially and it will switch.
#
# User/Group: The name (or #number) of the user/group to run httpd as.
# . On SCO (ODT 3) use "User nouser" and "Group nogroup".
# . On HPUX you may not be able to use shared memory as nobody, and the
# suggested workaround is to create a user www and use that user.
# NOTE that some kernels refuse to setgid(Group) or semctl(IPC_SET)
# when the value of (unsigned)Group is above 60000;
# don't use Group #-1 on these systems!
#
User apache
Group apache
Obviously, User and Group are set after all the vhost configs are loaded, which is pretty much exactly what the warning was saying (doh). I simply moved the Include below User/Group. Since then I have not seen more than 9 semaphores, even though I've restarted mono, rebuilt the application and hit the app with ApacheBench, the combination of which used to drive semaphores up.
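For anyone who wants to check their own setup for the same ordering problem, here is a minimal sketch. The function name user_before_include is made up for this example, and it assumes the first User directive and the first Include of conf.d are the ones that matter, as in the default file above.

```shell
# Hypothetical helper (not part of Apache or mod_mono): succeeds when the
# first User directive comes before the first "Include conf.d/" line.
user_before_include() {
    conf="$1"
    # grep -n prints "LINE:content"; keep only the first match's line number.
    # Commented-out lines start with "#", so they do not match.
    user_line=$(grep -n -E '^[[:space:]]*User[[:space:]]' "$conf" | head -n 1 | cut -d: -f1)
    include_line=$(grep -n -E '^[[:space:]]*Include[[:space:]]+conf\.d/' "$conf" | head -n 1 | cut -d: -f1)
    # If either directive is missing, there is nothing to compare.
    [ -n "$user_line" ] && [ -n "$include_line" ] || return 2
    [ "$user_line" -lt "$include_line" ]
}
```

Something like `user_before_include /etc/httpd/conf/httpd.conf && echo "order OK"` would then confirm the fix. The path is an assumption about where your distribution keeps httpd.conf.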
Since things are working now and ASP.NET MVC under mod_mono is significantly faster than the Rails port, I'm sticking with ASP.NET MVC for production right now, monitoring semaphores to make sure this really did fix the problem.
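The monitoring itself can be as simple as counting the apache-owned rows in the ipcs output. A rough sketch, assuming the column layout shown in the listings above (key first, owner third); count_semaphores is a name invented for this example:

```shell
# Count semaphore arrays owned by a given user (default: apache).
# Reads ipcs-style output on stdin, so it also works on saved output.
count_semaphores() {
    owner="${1:-apache}"
    # Data rows start with a hex key and carry the owner in column 3;
    # header and separator lines match neither condition.
    awk -v owner="$owner" '$1 ~ /^0x/ && $3 == owner { n++ } END { print n + 0 }'
}
```

Run it as `ipcs -s | count_semaphores apache`. If that number keeps climbing across restarts and load tests, the leak is back.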