Be careful how you count in Rails
Sep 12th, 2008 by nahum
I’ve just encountered this problem yet again affecting code I’m working through, so I thought I’d quickly blog about it.
Put basically, if you call count on a relationship it will always make a DB call to see how many there are, even if the relationship has been populated. If you use size or any? they will too, unless the relationship is populated, in which case they treat it like an array and leave the DB the heck alone.
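To make that rule concrete, here's a rough sketch in plain Ruby. It is not ActiveRecord: ToyAssociation, load_target and the queries counter are names I've made up for illustration, and the class only mimics the behaviour described above while counting pretend DB calls.

```ruby
# Toy stand-in for an association; NOT ActiveRecord, just the rules described above.
class ToyAssociation
  attr_reader :queries

  def initialize(rows)
    @rows = rows     # pretend these rows live in the database
    @target = nil    # stays nil until the association is populated
    @queries = 0     # simulated DB round trips
  end

  # Populate the association, much as eager loading or iterating over it would.
  def load_target
    @queries += 1
    @target = @rows.dup
    self
  end

  # count always goes to the "database", populated or not.
  def count
    @queries += 1
    @rows.size
  end

  # size reads the in-memory array when populated, otherwise falls back to count.
  def size
    @target ? @target.size : count
  end

  # any? follows the same rule as size.
  def any?
    size > 0
  end
end

books = ToyAssociation.new(%w[dune emma ulysses])

books.size
books.any?
books.count
queries_while_unpopulated = books.queries   # each of the three calls cost a query

books.load_target                           # one more query to populate
books.size
books.any?                                  # both answered from memory
queries_after_size_and_any = books.queries

books.count                                 # still hits the "database"
queries_after_count = books.queries
```

The counter only moves for size and any? while the toy association is unpopulated; count bumps it every single time.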
For example, when displaying a bunch of users you want nice HTML when showing whether they own any books, so you’ll probably end up checking user.books.count multiple times per user shown. This will ask the DB for a book count every time you call count, and it will do it for all users shown. Not fun.
Unfortunately, simply using size or any? instead will make no difference unless the relationship is already populated or, coincidentally, 0 is returned, as for some odd reason that particular result is cached by Rails.
Additionally, you should eager load if you’re accessing data within the relationship when rendering the user’s line - you were doing this already, ay? Even if only a ‘count’ of books is being used rather than book-related data, still use size and any?, as this may change later down the line and it’s no extra effort.
Now to throw a spanner in the works: named_scope is different. In fact, size and any? behave in exactly the opposite way on it. If you call them on a named scope they grab the whole result set (i.e. all the models), tell you what you want to know and then discard the result set - EVERY TIME.
Named scopes are not like relationships as they don’t cache their results, so successive calls to a named scope will hit up the DB each time, ignoring any previous calls made to it. Ironically, calling count on a named scope makes a ‘count(*)’ call to the DB so it’s much lighter weight than size or any?.
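Here's the same idea as a plain Ruby toy. Again, this is not the real named_scope code: ToyScope, fetch_all and the two counters are hypothetical names, and the class only models the behaviour described above so the cost difference is visible.

```ruby
# Toy stand-in for a named scope as described above; NOT the real named_scope code.
class ToyScope
  attr_reader :count_queries, :full_loads

  def initialize(rows)
    @rows = rows
    @count_queries = 0   # cheap count(*)-style calls
    @full_loads = 0      # expensive "fetch and instantiate every row" calls
  end

  # count asks the "database" for a number only.
  def count
    @count_queries += 1
    @rows.size
  end

  # size and any? fetch the whole result set and throw it away afterwards.
  def size
    fetch_all.size
  end

  def any?
    !fetch_all.empty?
  end

  private

  # Nothing is memoised here, so every call is a fresh full load.
  def fetch_all
    @full_loads += 1
    @rows.dup
  end
end

recent = ToyScope.new(%w[dune emma ulysses])
2.times { recent.size }   # two full loads
recent.any?               # a third full load
recent.count              # one lightweight count
```

In this sketch three calls to size and any? mean three full loads, while count never loads a single row.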
I’m almost thinking that this behavior is inconsistent and could be changed so that calling count, size and any? on an unpopulated relationship would cache the result for that request, as the value could actually change during execution, potentially causing errors or showing something weird.
This problem can also occur with caching, as cached fragments are usually checked for in the action and then again in the view. The fragment could be expired by a different request in between these two checks, causing the view to barf, as the action won’t have set up any data, thinking the view won’t need it. I’ve had this unlikely situation happen multiple times and ended up building an in-request cache to combat it.
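A minimal sketch of what such an in-request cache could look like, in plain Ruby. This is not my actual implementation and not a Rails API: RequestFragmentCache, read and the Hash standing in for the fragment store are all assumptions for illustration. The idea is simply that the first answer given during a request is the answer everyone gets for the rest of that request.

```ruby
# Hypothetical per-request memo; a plain Hash stands in for the shared fragment store.
class RequestFragmentCache
  def initialize(store)
    @store = store   # shared fragment store, visible to other requests
    @seen = {}       # answers already given during this request
  end

  # First lookup reads the shared store; later lookups repeat the first answer,
  # including a miss (nil), so action and view can never disagree.
  def read(key)
    @seen[key] = @store[key] unless @seen.key?(key)
    @seen[key]
  end
end

store = { "users/list" => "<ul>...</ul>" }
request_cache = RequestFragmentCache.new(store)

seen_by_action = request_cache.read("users/list")   # action sees the fragment, skips loading data
store.delete("users/list")                          # another request expires the fragment
seen_by_view = request_cache.read("users/list")     # view still gets the same fragment
```

You'd create one of these per request and throw it away at the end, so nothing stale survives into the next request.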
I feel a plugin, or even attempted patch to Rails welling up from within…
UPDATE: I forgot all about using length, which muddies the waters further! Length populates the relationship and then returns its size. That means subsequent calls don’t talk to the DB, and if the relationship was already populated there isn’t any DB activity either - just like size and any?. The big thing to note here, though, is that it attempts to populate the relationship first, which you may not want happening. Remember eager loading?
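As a rough plain Ruby illustration of that side effect (a toy model only; ToyLengthAssociation, loaded? and the queries counter are made-up names, not ActiveRecord):

```ruby
# Toy model of the length behaviour described above; NOT ActiveRecord itself.
class ToyLengthAssociation
  attr_reader :queries

  def initialize(rows)
    @rows = rows     # pretend these rows live in the database
    @target = nil    # stays nil until the association is populated
    @queries = 0     # simulated DB round trips
  end

  def loaded?
    !@target.nil?
  end

  # length populates the association if needed, then measures the array.
  def length
    unless loaded?
      @queries += 1          # one full load of every row
      @target = @rows.dup
    end
    @target.size
  end
end

shelf = ToyLengthAssociation.new(%w[dune emma ulysses])
loaded_before = shelf.loaded?        # nothing populated yet
first_length = shelf.length          # triggers the full load
loaded_after = shelf.loaded?         # populated as a side effect
shelf.length                         # answered from memory
queries_after_two_calls = shelf.queries
```

The second call is free, but only because the first one dragged every row into memory.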
On named scopes, length behaves the same way size and any? do there: since named scopes behave differently to relationships, there is no caching or populating happening, so each call involves the database returning all the records and Rails turning them into models.
I’m still going to be using size and any?, as they’ll do ‘count(*)’ calls to the DB if the relationship isn’t populated; otherwise they’ll just return its array size - or a boolean in the case of any?. That means I get a bit of future-proofing in the code - i.e. fewer ‘why is that suddenly real slow?’ type situations.

Have just been bitten by this combined with named scopes. Seems named scopes break associations a little.
In what way? Got an example?