I’ve been hosting my flagship SaaS app on Heroku since 2008. Overall it’s been a stable, if somewhat overpriced, platform. Over the past year, however, I’ve been experiencing mysterious performance problems. The app would run fine for several weeks. Then suddenly I’d begin receiving exception reports about certain methods not being found on certain objects. Restarting my dynos would fix the problem for a few days or a few weeks, but eventually the errors would return. It definitely felt like some sort of memory issue.
After profiling the app and discovering nothing, I installed the Librato dashboard which offers a basic line graph of memory usage across dynos. I began noticing a correspondence between this line getting above 200 MB and my app throwing errors.
Each dyno on Heroku theoretically has 512 MB of memory. I was running my app on Unicorn with 2 processes per dyno, so I wouldn’t expect problems unless each process exceeded 256 MB. I was confused as to why I was seeing problems at just 200 MB of usage. True, the line would continue creeping up if I didn’t restart my dynos, and would eventually exceed the limit, which would trigger an auto-restart of the dynos. But this took a long time to happen, and in the meantime my visitors were experiencing a slower app and/or outright errors.
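For context, my setup looked roughly like the following. This is a minimal sketch of a typical Unicorn config rather than my exact file; the `WEB_CONCURRENCY` variable name and the timeout value are illustrative, but the `worker_processes 2` line is what gives the 2 processes per dyno mentioned above.

```ruby
# config/unicorn.rb -- illustrative sketch, not my exact config.
# Two worker processes per dyno; with a 512 MB dyno, that budgets
# roughly 256 MB per process before trouble starts.
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 2)
timeout 15
preload_app true

before_fork do |server, worker|
  # The master preloads the app; each worker needs its own DB connection.
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |server, worker|
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end
```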
I spent several days attempting to identify where the app was leaking memory. Why did the memory usage line continue climbing? I tried various techniques to identify the problem but was unable to reproduce the leak on my local system. Eventually I decided a different tactic was necessary. Heroku has been recommending Puma as an alternative to Unicorn for a while now, so my first thought was to switch to Puma, which uses threads for concurrency instead of processes. However, my app runs under MRI, not JRuby, and MRI’s global VM lock prevents threads from running Ruby code in parallel, so I wouldn’t necessarily be able to take advantage of those performance gains. Instead I opted for Passenger, which now runs on Heroku.
The results have been beyond what I expected. My memory usage line is now perfectly straight. No increase over time. No eventual errors and dyno restarts due to overconsumption. What Passenger is doing under the covers is spinning up new processes during high traffic periods and killing them during low traffic periods. My app has been running for 3 months now and I haven’t had to restart any of my dynos, nor have I encountered any performance issues with the app. Success!
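The switch itself was small. The sketch below is roughly what Heroku’s Passenger instructions suggest; the specific pool-size values here are illustrative choices, not my exact settings. The `--max-pool-size` flag caps how many processes Passenger will spin up under load, and `--min-instances` sets how many it keeps alive during quiet periods.

```shell
# Gemfile: replace the unicorn gem with passenger
#   gem "passenger"

# Procfile: run Passenger Standalone on the dyno's assigned port.
# Pool sizes are illustrative; tune them to your per-process memory use.
web: bundle exec passenger start -p $PORT --max-pool-size 3 --min-instances 1
```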
I can think of two explanations for why Passenger fixed these problems. First, perhaps Unicorn itself was causing my app to leak memory in a strange way. Second, and more likely, Passenger’s built-in ability to spin up processes on demand keeps memory leakage to a minimum, because processes are regularly being refreshed. Regardless of which explanation is correct, I’m happy the app is no longer throwing errors at inconvenient times. Most importantly, my users are having a far more consistent experience. If they’re happy, I’m happy.