June 30, 2009
UPDATED 6/30 – The fix for the old scout client (run via cron) is now available in version 2.0.7 (sudo gem install scout).
In rare (and difficult to reproduce) cases we’ve seen the Scout Agent not observe a Timeout during a checkin error with the Scout server. Scout uses Ruby’s RestClient gem to connect to the Scout Server and it uses the standard Net::HTTP library to manage the connection. Some versions of the Net::HTTP library can run into a bug in IO.select() on some platforms. This causes the request to hang forever in some rare cases.
Our fix? We added a redundant Timeout for the request, in addition to Net::HTTP’s own Timeout. You have to be careful how you nest those calls though, since they will throw the same Exception by default. We followed Eric Hodel’s advice to get our implementation right.
If you’re using Net::HTTP and notice the same issue, try adding a redundant Timeout with a custom halting Exception (our committed fix for this is on github).
This fix is included in version 3.2.6 of the Scout Agent. We’re planning on backporting the fix to the old client late next week (available now). Follow our Twitter feed to stay updated with the latest releases.