Performance optimisation through named_scope
Sep 11th, 2008 by nahum
I recently gave a presentation to WellRailed about using named_scope, and while it was still on my mind I thought I’d cover named_scope in conjunction with some performance optimisation work I’ve been doing.
I used to use relationship extensions a lot, and it always irked me that I was putting business logic for a model, say Book, on a related model, Owner, like this:
class Owner < ActiveRecord::Base
  has_many :books, :as => :owner do
    def old
      all(:conditions => ['published_at < ?', 20.years.ago])
    end
  end
end
Especially if multiple related models wanted the same logic:
class Company < ActiveRecord::Base
  has_many :books, :as => :owner do
    def old
      all(:conditions => ['published_at < ?', 20.years.ago])
    end
  end
end
So to eliminate any code duplication problems down the line I would put the business logic in a module within book.rb:
module BookExtention
  def old
    all(:conditions => ['published_at < ?', 20.years.ago])
  end
end
and extend the relationships using that instead:
class Owner < ActiveRecord::Base
  has_many :books, :as => :owner, :extend => BookExtention
end

class Company < ActiveRecord::Base
  has_many :books, :as => :owner, :extend => BookExtention
end
That still didn’t completely resolve my discomfort, as the business logic was still outside the model itself. With the code duplication gone, though, it was enough for me to move on to other problems.
Why use extensions at all, you might ask? They mean you don’t have to keep repeating the condition in your actions, and working through relationships means Rails scopes every query to the owning record for you, so there is less room for security slip-ups. Walk the relationships for tighter, more secure code, people.
Another way of getting a list of the user’s old books is this:
current_user.books.select {|b| b.published_at < 20.years.ago }
Yep, this works, and it’s really quick on your dev machine with a limited dataset, say your fixtures. Here is what it’s doing, though. It fetches ALL the user’s books from the DB, Rails then turns ALL that data into model objects (slow and memory hungry), and then the select iterates over ALL of them to find just the old ones. If you’re planning on people actually using your website, use the DB for what it’s good at and filter the data as much as you can BEFORE it is handed over to your app.
Worse still would be this:
Book.all.select {|b| b.published_at < 20.years.ago }
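To make the cost concrete, here is a toy simulation in plain Ruby, with no Rails or database involved. ROWS stands in for a table, FakeBook for a model, and the two helper methods are names made up for this illustration. It simply counts how many objects get built when you filter after loading versus before.

```ruby
# Toy simulation only: FakeBook, ROWS and both helpers are hypothetical
# stand-ins, not Rails APIs.
class FakeBook
  @instantiated = 0
  class << self
    attr_accessor :instantiated
  end

  attr_reader :title, :published_year

  def initialize(title, published_year)
    @title = title
    @published_year = published_year
    self.class.instantiated += 1 # count every object we pay for
  end
end

ROWS = [
  ["Emma", 1815],
  ["Dracula", 1897],
  ["Neuromancer", 1984],
  ["Recent Novel A", 2007],
  ["Recent Novel B", 2008]
].freeze

CUTOFF_YEAR = 1988 # roughly "20 years ago" at the time of writing

# Style 1: build an object for every row, then filter in Ruby.
def filter_in_ruby(rows, cutoff_year)
  FakeBook.instantiated = 0
  books = rows.map { |title, year| FakeBook.new(title, year) }
  old = books.select { |book| book.published_year < cutoff_year }
  [old.map(&:title), FakeBook.instantiated]
end

# Style 2: filter the raw rows first (the job a WHERE clause does),
# then build objects only for the rows that survive.
def filter_in_db(rows, cutoff_year)
  FakeBook.instantiated = 0
  matching = rows.select { |_title, year| year < cutoff_year }
  old = matching.map { |title, year| FakeBook.new(title, year) }
  [old.map(&:title), FakeBook.instantiated]
end

titles_a, built_a = filter_in_ruby(ROWS, CUTOFF_YEAR)
titles_b, built_b = filter_in_db(ROWS, CUTOFF_YEAR)
puts "filter in Ruby: #{titles_a.size} old books, #{built_a} objects built"
puts "filter in DB:   #{titles_b.size} old books, #{built_b} objects built"
```

Both styles find the same three books, but the first builds five objects to get them and the second builds three. With five rows nobody cares; with a few hundred thousand the gap is the whole story.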
Additionally, don’t use your fixtures for development data. Use a recent backup from your production server, and when that gets too big, use a subset of it. Don’t have user generated data? Make some up. I jam chunks of Wikipedia into my database when I don’t have any real data to play with. Unless you do this you won’t know how your code performs in real life.
I’ve seen this style of coding quite a bit, as Rails with its ORM wrapper, ActiveRecord, encourages developers to discard the shackles of SQL and do it all in Ruby with highly readable code. I’m all for that, but you have to write SQL at some point, and this is where named scopes come in. They allow you to define (name) bunches of business logic (scope) and then string them together, which you can’t do with relationship extensions. For example:
module BookExtention
  def old
    all(:conditions => ['published_at < ?', 20.years.ago])
  end

  def romantic
    all(:conditions => "genre = 'romantic'")
  end
end
What if I want old romantic books? I have to use one of the methods and then manually select based on the other’s condition. Ugh!
This is where named_scope steps up to the plate and impresses the crowd. Instead of the relationship extension, do this in your model:
class Book < ActiveRecord::Base
  belongs_to :owner, :polymorphic => true
  named_scope :old, :conditions => ['published_at < ?', 20.years.ago]
  named_scope :romantic, :conditions => "genre = 'romantic'"
end
The beautiful thing here is that all your existing code which calls the old method off the books relationship does not need to change, as this still works:
current_user.books.old
Basically, named_scope is jamming the :conditions into the current finder scope, applying that along with the relationship’s scope, and defaulting to calling ‘all’. This means you can also do these:
current_user.books.old.all(:limit => 20)
Book.romantic.count
Instead of ‘all’ you can call whatever you normally could off the relationship or the model’s class, and it generates the appropriate SQL given the scopes in play at the time. Nice. As the named_scope is just pushing itself onto the :find scope, you can string them together like this:
current_user.books.old.romantic.all(:limit => 20)
This means that the DB will only return the data that you want, so no time or memory is wasted on data that would immediately be discarded by running a select over the user’s old books. Hurray.
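If it helps to picture why chaining works, here is a rough sketch in plain Ruby. ToyScope and its methods are invented for this illustration and are not how ActiveRecord is implemented, and the hard-coded date and owner_id are made up. The idea it shows is that each scope returns another scope carrying one more condition, rather than an array of results, and a single query is only built at the end.

```ruby
# Hypothetical stand-in for a chainable scope; not ActiveRecord's internals.
class ToyScope
  attr_reader :conditions

  def initialize(table, conditions = [])
    @table = table
    @conditions = conditions
  end

  # Return a NEW scope with one more condition. Because the result is
  # still a scope (not an Array), another scope can be called on it.
  def where(condition)
    ToyScope.new(@table, @conditions + [condition])
  end

  def old
    where("published_at < '1988-09-11'") # fixed date, for illustration only
  end

  def romantic
    where("genre = 'romantic'")
  end

  # Build one SQL statement from everything collected so far.
  def to_sql(limit = nil)
    sql = "SELECT * FROM #{@table}"
    sql += " WHERE " + @conditions.map { |c| "(#{c})" }.join(" AND ") unless @conditions.empty?
    sql += " LIMIT #{limit}" if limit
    sql
  end
end

# The leading where plays the part of the relationship's own scope.
puts ToyScope.new("books").where("owner_id = 1").old.romantic.to_sql(20)
```

The extension methods earlier can’t do this because each one calls ‘all’ straight away and hands back an array, so there is nothing left to add a second condition to.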
As named_scopes are defined at class level, the 20.years.ago is evaluated once, when the class definition is read into memory as the Rails app first spins up. So if the app had been running for a year, the cutoff would effectively be 21.years.ago. To handle this you can do the following:
named_scope :old, lambda { {:conditions => ['published_at < ?', 20.years.ago]} }
Each time ‘old’ is used the lambda is executed, returning a hash of finder options, in this case just a condition. Your eyes are probably lighting up at this stage thinking of the possibilities with dynamic named scopes, and yes, you can pass parameters to them. That’s for my next blog entry though, sorry.
In summary, this gives two types of optimisation. Firstly, higher quality through better readability: cleaner and tighter code, no code duplication, and business logic living where it should be. Secondly, not only can you serve more requests from the same hardware due to faster execution, but because it’s more memory efficient it’s likely you can run more application instances, e.g. Mongrel processes, on that same hardware.
So that covers the basics of why I like named_scope. In my next entry I’ll take this a couple of big steps further and possibly get into some nitty gritty log examinations.
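The load-time versus call-time difference is easy to see without Rails. In this sketch, toy_scope is a made-up stand-in for named_scope that accepts either a ready-made hash or a lambda returning one, and FakeClock is a pretend clock so a year can pass without waiting for it. None of these names exist in Rails.

```ruby
# A fake clock so the passage of time can be simulated (hypothetical helper).
class FakeClock
  class << self
    attr_accessor :year
  end
  self.year = 2008
end

# ToyBook, toy_scope and options_for are illustrative names, not Rails APIs.
class ToyBook
  @scopes = {}

  def self.toy_scope(name, options)
    @scopes[name] = options
  end

  # Call the stored value if it is a lambda, otherwise return it as-is.
  def self.options_for(name)
    options = @scopes.fetch(name)
    options.respond_to?(:call) ? options.call : options
  end

  # Evaluated once, while the class body is being read.
  toy_scope :old_eager, { :cutoff_year => FakeClock.year - 20 }

  # Wrapped in a lambda, so evaluated again on every use.
  toy_scope :old_lazy, lambda { { :cutoff_year => FakeClock.year - 20 } }
end

FakeClock.year = 2009 # the app has now been "running" for a year

puts ToyBook.options_for(:old_eager)[:cutoff_year] # still 1988
puts ToyBook.options_for(:old_lazy)[:cutoff_year]  # now 1989
```

The eager version is stuck with whatever the clock said when the class was loaded, which is exactly the stale 20.years.ago problem, while the lambda version recalculates each time.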
