July 29, 2008
Dr. Brendan Reilly instituted a simple test to determine whether a patient was suffering from a heart attack. It combined just 4 questions with the results of an ECG. Compared with the barrage of questions hospital staff had previously asked, this simple test was 70% better at identifying patients who weren’t having a heart attack and nearly 20% better at identifying patients who were.
More information on the patient’s symptoms often led to an incorrect diagnosis. It distracted doctors from the real issues.
I’ve seen this many times with developers trying to debug performance issues in Rails applications. They look at outliers instead of the obvious culprits. It’s part of the reason I’ve never felt a need for a deep, detailed Rails monitoring application (i.e., benchmarks from the controller to the database level on every request).
The majority of the time, our performance problems have nothing to do with the Rails framework (and we’ve worked through a lot of issues since we started building Rails apps in 2005). Why benchmark the entire request cycle when the vast majority of issues are isolated at the database layer? After I’ve ruled out the database, I can see benchmarking a single request (there’s a great free tool below), but I simply don’t want the other, often irrelevant information clouding my mind.
Contrary to what you may have heard on the Interweb, it’s probably not Rails itself that’s making your app slow. We conducted an internal survey of the Highgroove Studios team to see where we’ve encountered performance issues and what their root causes were:
The database layer has a huge edge on all other issues. In fact, almost all of the performance problems could have
#each), and memory leaks occur in many languages.
Having performance issues isn’t a bad thing (it means your web app is growing), but it’s a problem if they aren’t fixed quickly.
First, we want to be aware of slow web requests ASAP. We use Scout’s Slow Rails Requests plugin for real-time notification of slow requests because:
Once this plugin is installed, we’ll quickly be alerted of slow requests. Now,
Most *nix servers measure a form of server health called the server load. Usually, the server load is given as load averages over 3 different time periods.
Your server's load is essentially a rough idea of the number of queued processes waiting for a resource to become available. This resource is generally CPU time, but it could also be memory, swap space, disk, etc. A lower number is a good indicator of your overall system health and responsiveness.
The 3 averages are for the last minute, the last 5 minutes, and the last 15 minutes. Using these averages, we can see how busy your server really is.
Take a look at the "top" program's output on this server:
We can see this server is not busy at all! In fact, this server is currently at 0.00 load on all three load averages. This is ideal, and indicates an idle server, waiting for a process to handle.
It’s common to see that when the load reaches a certain threshold (perhaps 3.0), processes can slow to a crawl and your Rails app may stop responding. We typically generate an alert through Scout’s Load Average plugin if the load exceeds 3.00.
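The threshold check described above can be sketched in a few lines of Ruby. This is only an illustration, not how Scout's plugin is implemented: the function names are made up for this example, and it assumes the common `uptime`-style "load average: 0.42, 1.10, 3.25" layout, which varies slightly between systems.

```ruby
# Alert threshold taken from the text above (3.00).
LOAD_THRESHOLD = 3.0

# Pull the 1, 5 and 15 minute load averages out of an `uptime`-style line.
# Assumption: the line contains "load average: a, b, c" (some systems print
# "load averages: a b c", which this pattern also accepts).
def parse_load_averages(uptime_line)
  match = uptime_line.match(/load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)/)
  return nil unless match

  { one: match[1].to_f, five: match[2].to_f, fifteen: match[3].to_f }
end

# True if any of the three averages is above the threshold.
def load_alert?(loads, threshold = LOAD_THRESHOLD)
  loads.values.any? { |value| value > threshold }
end

# Hypothetical sample line; on a real server you could use `uptime` instead.
sample = " 14:02:11 up 12 days,  3:04,  2 users,  load average: 0.42, 1.10, 3.25"
loads = parse_load_averages(sample)
puts "1m=#{loads[:one]} 5m=#{loads[:five]} 15m=#{loads[:fifteen]} alert=#{load_alert?(loads)}"
```

Checking all three averages is a deliberately noisy choice. Alerting only on the 5 or 15 minute average would ignore short spikes.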
A slow web request could cause a spike in the load, or the request could be slow because a background job is using a lot of CPU, a large number of requests are coming through, etc. Tracking the load helps us figure out these issues.
On the memory side, there are 2 things we typically monitor on our Rails setups:
It is important to note that as processes use resident memory, they will also increase their use of virtual memory, in step. Processes will actually appear to consume more of this “virtual memory” than the amount of actual physical memory of the system. This is perfectly
Think about it this way. If you worked in a restaurant and I gave you a big load of dishes (your processes) and 5 really fast dish-washing machines (resident / physical memory), and 5 really slow dish-washers (hard
Many Rails applications – either the apps themselves or third party libraries – suffer from memory leaks. As your processes use more and more memory, both their resident memory and their virtual memory grow. They begin to use the hard drive as swap space for virtual memory, which is far slower than physical memory. This can dramatically slow the performance of the entire system, and thus, all requests. We generate an alert through the Process Usage Plugin if our Mongrel processes exceed a given threshold (usually around 100 MB), and through the Memory Profiler Plugin if the percentage of swap space used exceeds a given threshold (usually around 60%).
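As a rough illustration of those two checks, here is a minimal Ruby sketch. It is not the plugins' code: the helper names are hypothetical, and it assumes Linux-style input, namely `ps -eo rss,args` output with RSS in kilobytes and the `SwapTotal` / `SwapFree` lines from `/proc/meminfo`.

```ruby
# Thresholds taken from the text above.
MONGREL_RSS_LIMIT_MB = 100
SWAP_USED_LIMIT_PCT  = 60

# ps_output: lines of "<rss in KB> <command>" (assumed `ps -eo rss,args` layout).
# Returns [command, rss_mb] pairs for matching processes over the limit.
def oversized_processes(ps_output, name_pattern = /mongrel/i, limit_mb = MONGREL_RSS_LIMIT_MB)
  ps_output.each_line.map do |line|
    rss_kb, command = line.strip.split(/\s+/, 2)
    next unless command && command =~ name_pattern && rss_kb =~ /\A\d+\z/

    rss_mb = rss_kb.to_i / 1024.0
    [command, rss_mb.round(1)] if rss_mb > limit_mb
  end.compact
end

# meminfo: text in the shape of /proc/meminfo. Returns percent of swap in use.
def swap_used_percent(meminfo)
  total = meminfo[/^SwapTotal:\s+(\d+)/, 1].to_i
  free  = meminfo[/^SwapFree:\s+(\d+)/, 1].to_i
  return 0.0 if total.zero?

  ((total - free) * 100.0 / total).round(1)
end

# Hypothetical sample data, standing in for real command output.
ps_sample = <<~PS
  RSS COMMAND
  151552 mongrel_rails start -p 8000
  61440 mongrel_rails start -p 8001
  20480 /usr/sbin/mysqld
PS

meminfo_sample = <<~MEMINFO
  MemTotal:        4000000 kB
  SwapTotal:       2000000 kB
  SwapFree:         500000 kB
MEMINFO

oversized_processes(ps_sample).each do |command, rss_mb|
  puts "over #{MONGREL_RSS_LIMIT_MB} MB: #{command} (#{rss_mb} MB)"
end

swap_pct = swap_used_percent(meminfo_sample)
puts "swap used: #{swap_pct}% (alert: #{swap_pct > SWAP_USED_LIMIT_PCT})"
```

On a real server the sample strings would be replaced with the live command output and file contents. The parsing is kept separate from data collection so the thresholds are easy to test.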
This is often an easy problem to fix: if finding the leak is hard (and it usually is), you can do a scheduled restart. If you are constantly using a lot of swap space, you probably need more memory (that’s cheap compared to development hours).
So, Scout sends you an alert regarding a slow web request – now what?
As stated earlier, most of our performance issues are related to the database, and the Query Reviewer Plugin does a tremendous job of finding issues with MySQL and benchmarking the entire request cycle. The key feature of this plugin is that the query information is embedded directly in the view.
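Before digging deeper, a quick way to confirm that a slow request is database-bound is to compare the DB time against the total time in the request log. The Ruby sketch below is an illustration only and is unrelated to how Query Reviewer works. It assumes a Rails 2.x-style "Completed in ... | Rendering: ... | DB: ..." log line, a format that differs between Rails versions, and the helper names and cutoffs are hypothetical.

```ruby
# Assumed log line shape (Rails 2.x era; adjust for your Rails version):
#   Completed in 2.40000 (0 reqs/sec) | Rendering: 0.30000 (12%) | DB: 1.92000 (80%) | 200 OK [url]
COMPLETED_LINE = /Completed in ([\d.]+) .*?\| Rendering: ([\d.]+) .*?\| DB: ([\d.]+)/

# Extract total, rendering and DB times (seconds) plus the DB share of the total.
def request_breakdown(line)
  match = line.match(COMPLETED_LINE)
  return nil unless match

  total, rendering, db = match.captures.map(&:to_f)
  db_share = total.zero? ? 0.0 : (db / total).round(2)
  { total: total, rendering: rendering, db: db, db_share: db_share }
end

# Hypothetical cutoffs: "slow" means over 1 second, "DB-bound" means over half the time in the DB.
def slow_and_db_bound?(breakdown, slow_after: 1.0, db_share_over: 0.5)
  breakdown[:total] > slow_after && breakdown[:db_share] > db_share_over
end

line = "Completed in 2.40000 (0 reqs/sec) | Rendering: 0.30000 (12%) | DB: 1.92000 (80%) | 200 OK [http://example.com/orders]"
breakdown = request_breakdown(line)
puts "total=#{breakdown[:total]}s db=#{breakdown[:db]}s db_bound=#{slow_and_db_bound?(breakdown)}"
```

If the DB share is low for a slow request, that is the point at which benchmarking the rest of the request cycle starts to make sense.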
We use the following process when Scout identifies a slow web request:
We’ve seen lots of people waste time tracing the Rails stack for performance issues when the cause is usually much simpler. Look at the obvious places first before digging through the Rails stack.