Wordpress with Nginx on Slicehost
I've just moved this blog to self-hosting, and other people made the work easy for me.

Over the weekend I moved this blog from my Dreamhost account to one of my 256MB Slicehost servers.

I’ve done this mainly to stop paying for hosting twice, but also because if I’m going to be talking about performance I can’t possibly have a slow blog, and my options were limited on the Dreamhost account.

My immediate problem was that I wanted it served by Nginx rather than Apache straight away, which is not something Wordpress is set up for. After a bit of googling I found that someone had already solved all the problems I was going to have! This post has blow-by-blow instructions for setting up a Wordpress blog on Slicehost and Ubuntu; I couldn’t have found anything better.

It worked exactly as advertised, but me being me I couldn’t help tweaking it, and I updated the Nginx site config to add expires headers for static assets, like so:

server
  {
 
  listen   80;
  server_name www.motionstandingstill.com;
 
  root   <path_to_site>;
 
  access_log <custom_log_path>/access.log;
  error_log <custom_log_path>/error.log;
 
  location ~* ^.+\.(css|js|jpg|jpeg|gif|png)$
    {
    expires      7d;
    }
 
  location /
    {
    index  index.php index.html;
 
    # Basic version of Wordpress parameters, supporting nice permalinks.
    # include /etc/nginx/conf/wordpress_params.regular;
    # Advanced version of Wordpress parameters supporting nice permalinks and WP Super Cache plugin
    include /etc/nginx/conf/wordpress_params.super_cache;
    }
 
  # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
  location ~ \.php$
    {
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
    include /etc/nginx/conf/fastcgi_params;
    fastcgi_param SCRIPT_FILENAME <site_path>/public/$fastcgi_script_name;
    }
  } # server
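Nginx's `~*` makes a location a case-insensitive regex. Assuming the static-asset pattern is meant to be `^.+\.(css|js|jpg|jpeg|gif|png)$`, here's a quick Ruby sketch for checking which request paths would pick up the 7-day expires header. Ruby's `/i` stands in for `~*`, and `\A`/`\z` for `^`/`$`. Nginx matches locations against the URI path without the query string, so the sketch only takes paths. It tests the regex alone; which location finally wins also depends on Nginx's matching order.

```ruby
# Assumed static-asset pattern from the site config, as a Ruby regex.
# /i mirrors Nginx's ~* (case-insensitive match).
STATIC_ASSET = /\A.+\.(?:css|js|jpg|jpeg|gif|png)\z/i

# Returns the expires value this location would add for a request path,
# or nil when the request falls through to the other locations.
def expires_for(path)
  STATIC_ASSET.match?(path) ? "7d" : nil
end

%w[/wp-content/themes/misty/style.css /images/TREE.JPG /index.php /feed/].each do |path|
  puts "#{path} -> #{expires_for(path) || 'no expires header'}"
end
```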

Later I found someone had posted a different conf with easy support for multiple blogs, but I’ve not switched to it because what I’ve got works. It also adds expires support like I have done above.

When moving this blog to Slicehost I initially set it up on a temp subdomain to make sure it was working sweet first and because I wanted to try out some different themes and play with a couple of other things.

I quickly discovered that because it was running on a different domain than the one it was configured for, Wordpress redirected my browser to the configured domain, meaning I couldn’t get into the new site. Ugh. So I had to whip out my SQL skills and manually update the appropriate settings.

UPDATE wp_options SET option_value = 'http://temp.motionstandingstill.com' WHERE option_value = 'http://www.motionstandingstill.com';

I actually used a different domain name and explicitly updated two rows rather than using the generic ‘where’ clause above, but you get the idea. That let me in, which is what I wanted.
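If you'd rather target rows by name than by value: in a standard Wordpress install the two URL settings live in the `siteurl` and `home` rows of `wp_options`. This little Ruby sketch assumes that and the default `wp_` table prefix, and just builds the statements to paste into a MySQL session.

```ruby
# Builds UPDATE statements aimed at the two wp_options rows that hold the
# site's URLs, instead of matching on option_value.
# Assumes the default "wp_" table prefix.
URL_OPTIONS = %w[siteurl home].freeze

# Standard SQL quoting by doubling single quotes; fine for plain URLs,
# but not a general-purpose escaper.
def sql_quote(value)
  "'" + value.gsub("'", "''") + "'"
end

def url_updates(new_url, table: "wp_options")
  URL_OPTIONS.map do |name|
    "UPDATE #{table} SET option_value = #{sql_quote(new_url)} " \
      "WHERE option_name = #{sql_quote(name)};"
  end
end

puts url_updates("http://temp.motionstandingstill.com")
```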

Then I took the opportunity to tidy up what plugins I’m using as I’d gotten a bit ‘plugin install happy’ when first setting the blog up.  As well as getting rid of unwanted plugins I installed two new plugins I’d recently found, the first a redirection plugin with regex support and the second a security checking plugin.

Once I’d got everything how I wanted it, just sticking with the same theme (the misty tree photo is mine), I grabbed a final database backup from Dreamhost, restored it on Slicehost, then changed my DNS A record.

All in all, much easier than I thought it was going to be, thanks to some helpful results from Google. :-)

Update: Slicehost has been acquired by Rackspace.

Posted in Miscellaneous

Performance optimisation through named_scope

I recently gave a presentation about using named_scopes to WellRailed and while it was still on my mind I thought that I’d cover named_scope in conjunction with some performance optimisation work I’ve been doing.

I used to use relationship extensions a lot, and it always irked me that I was putting business logic for one model, say Book, on a related model, Owner, like this:

class Owner < ActiveRecord::Base
 
  has_many :books, :as => 'owner' do
    def old
      all(:conditions => ['published_at < ?', 20.years.ago])
    end
  end
 
end

Especially if multiple related models wanted the same logic:

class Company < ActiveRecord::Base
 
  has_many :books, :as => 'owner' do
    def old
      all(:conditions => ['published_at < ?', 20.years.ago])
    end
  end
 
end

So to eliminate any code duplication problems down the line I would put the business logic in a module within book.rb:

module BookExtention
  def old
    all(:conditions => ['published_at < ?', 20.years.ago])
  end
end

and extend the relationships using that instead:

class Owner < ActiveRecord::Base
  has_many :books, :as => 'owner', :extend => BookExtention
end
 
class Company < ActiveRecord::Base
  has_many :books, :as => 'owner', :extend => BookExtention
end

That still didn’t completely resolve my discomfort, as the business logic was still outside the model itself. But with the code duplication gone, it was enough for me to move on to other problems.

Why use extensions at all, you might ask? It means your actions don’t have to keep repeating the condition, and working through relationships means you have no security worries because Rails takes care of scoping for you. Walk the relationships for tighter, more secure code, people!

Another way of getting the user’s old books is this:

current_user.books.select {|b| b.published_at < 20.years.ago }

Yep this works and is real quick on your dev machine with a limited dataset, say your fixtures for example. Here is what it’s doing. It’s getting ALL the user’s books from the DB, rails is then turning ALL that data into models, slow and memory hungry, and then the select is iterating over ALL the user’s books to find just the old ones. If you’re planning on people actually using your website, then use the DB for what it’s good at and filter the data as much as you can BEFORE it hands it over to your app.

Worse still would be this:

Book.all.select {|b| b.published_at < 20.years.ago }

Additionally, don’t use your fixtures as development data; use a recent backup from your production server, and when that gets too big, use a subset of it. Don’t have user-generated data? Make some up. I jam chunks of Wikipedia into my database when I don’t have any real data to play with. Unless you do this, you won’t know how your code performs in real life.

I’ve seen this style of coding quite a bit, as Rails with its ORM wrapper, ActiveRecord, encourages developers to discard the shackles of SQL and do everything in Ruby with highly readable code. I’m all for that, but you have to write SQL at some point, and this is where named scopes come in. They let you define (name) bunches of business logic (scope) and then string them together, which you can’t do with relationship extensions. For example:

module BookExtention
 
  def old
    all(:conditions => ['published_at < ?', 20.years.ago])
  end
 
  def romantic
    all(:conditions => "genre = 'romantic'")
  end
 
end

What if I want old romantic books? I have to use one of the methods and then manually select based on the other’s condition. Ugh!
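Here's a plain-Ruby sketch of why the chain breaks. It's a stand-in, not Rails: `Shelf` and its `all` method are hypothetical, and the cutoff year is hardcoded. In Rails the extension methods call `all(:conditions => ...)`, which returns a plain Array, and an Array knows nothing about `romantic`; the stand-in's `all` returns an Array too.

```ruby
# Sample books; published_year keeps the example free of Rails' date helpers.
Book = Struct.new(:title, :genre, :published_year)

module BookExtention
  # Hardcoded cutoff standing in for 20.years.ago.
  def old
    all.select { |b| b.published_year < 1989 }
  end

  def romantic
    all.select { |b| b.genre == "romantic" }
  end
end

# Hypothetical stand-in for an association extended with BookExtention.
class Shelf
  def initialize(books)
    @books = books
    extend(BookExtention)  # mirrors :extend => BookExtention
  end

  def all
    @books.dup  # a plain Array, like a Rails finder result
  end
end

shelf = Shelf.new([
  Book.new("Rebecca", "romantic", 1938),
  Book.new("Dune", "sci-fi", 1965),
  Book.new("Bridget Jones's Diary", "romantic", 1996)
])

begin
  shelf.old.romantic  # Array has no #romantic, so this raises
rescue NoMethodError => e
  puts "can't chain: #{e.class}"
end

# The fallback: repeat the romantic condition by hand on the Array.
old_romantic = shelf.old.select { |b| b.genre == "romantic" }
p old_romantic.map(&:title)  # ["Rebecca"]
```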

This is where named scope steps up to the plate and impresses the crowd. Instead of the relationship extension, do this in your model:

class Book < ActiveRecord::Base
 
  belongs_to :owner, :polymorphic => true
 
  named_scope :old, :conditions => ['published_at < ?', 20.years.ago]
  named_scope :romantic, :conditions => "genre = 'romantic'"
 
end

The beautiful thing here is that all your existing code that calls old on the books relationship doesn’t need to change, as this still works:

current_user.books.old

Basically, named_scope pushes its :conditions into the current finder scope, applies them along with the relationship’s scope, and defaults to calling ‘all’. This means you can also do these:

current_user.books.old.all(:limit => 20)
Book.romantic.count

Instead of ‘all’ you can call whatever you normally could on the relationship or the model’s class, and it generates the appropriate SQL given the scopes in play at the time. Nice. As a named_scope just pushes itself onto the :find scope, you can string them together like this:

current_user.books.old.romantic.all(:limit => 20)

This means the DB only returns the data you want, so no time or memory is wasted on records that a select over the user’s old books would immediately discard. Hurray.
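To make the chaining idea concrete, here's a toy model in plain Ruby. It is NOT ActiveRecord's implementation: `ToyScope`, its `where` method, the owner id and the hardcoded date are all hypothetical. It shows the shape of the idea: each scope returns a new object carrying one more condition, and nothing becomes SQL until the whole stack is rendered as a single query.

```ruby
# Toy model of chained scopes (not ActiveRecord's code).
class ToyScope
  attr_reader :conditions

  def initialize(table, conditions = [])
    @table = table
    @conditions = conditions
  end

  # Adds a condition and returns a new scope; the receiver is not modified.
  def where(sql)
    ToyScope.new(@table, @conditions + [sql])
  end

  # Hypothetical stand-ins for the two named scopes in the post.
  def old
    where("published_at < '1989-01-01'")
  end

  def romantic
    where("genre = 'romantic'")
  end

  # Renders every condition in play as one query.
  def to_sql(limit: nil)
    sql = "SELECT * FROM #{@table}"
    unless @conditions.empty?
      sql += " WHERE " + @conditions.map { |c| "(#{c})" }.join(" AND ")
    end
    sql += " LIMIT #{Integer(limit)}" if limit
    sql
  end
end

# The relationship contributes its own condition (hypothetical owner).
user_books = ToyScope.new("books").where("owner_id = 42 AND owner_type = 'User'")
puts user_books.old.romantic.to_sql(limit: 20)
```

Because each call returns a new scope, `user_books.old` can be reused on its own or extended further without the conditions leaking between queries.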

As named_scopes are defined at class level, 20.years.ago is actually evaluated once, when the class definition is loaded as the Rails app first spins up. So if the app had been running for a year, the cutoff would effectively be 21.years.ago. To handle this you can do the following:

named_scope :old, lambda { {:conditions => ['published_at < ?', 20.years.ago]} }

Each time ‘old’ is used the lambda is executed, returning a hash of finder options, in this case just a condition. Your eyes are probably lighting up at this stage thinking of the possibilities with dynamic named scopes, and yes, you can pass parameters to them. That’s for my next blog entry though, sorry.
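The difference between evaluating once and evaluating per use can be shown without Rails. In this sketch a mutable fake clock stands in for `Time.now` (an assumption, so a year of uptime can be simulated instantly) and the hypothetical `years_ago` lambda plays the role of ActiveSupport's `20.years.ago`.

```ruby
# Fake clock standing in for Time.now; the app "boots" on 2008-09-01.
clock = { now: Time.utc(2008, 9, 1) }
years_ago = ->(n) { t = clock[:now]; Time.utc(t.year - n, t.month, t.day) }

# Like passing a plain hash to named_scope: evaluated once, at class load.
static_options = { conditions: ["published_at < ?", years_ago.call(20)] }

# Like named_scope :old, lambda { ... }: evaluated every time it is used.
dynamic_options = -> { { conditions: ["published_at < ?", years_ago.call(20)] } }

clock[:now] = Time.utc(2009, 9, 1)  # the app has now been running for a year

puts static_options[:conditions].last        # 1988-09-01: effectively 21 years ago
puts dynamic_options.call[:conditions].last  # 1989-09-01: still 20 years ago
```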

In summary, this gives two types of optimisation. Firstly, higher quality through better readability: cleaner and tighter code, no code duplication, and business logic living where it should be. Secondly, not only can you serve more requests from the same hardware due to faster execution, but because it’s more memory efficient you can likely run more application instances, i.e. Mongrels, on that same hardware.

So that covers the basics of why I like named_scope, in my next entry I’ll take this a couple of big steps further and possibly get into some nitty gritty log examinations.

Posted in Miscellaneous

Using Nginx to send files with x-accel-redirect

So far I’ve configured Nginx to handle file uploads by caching the file to disk and telling Rails where it is, rather than passing it through in the request, which is not fun with large files. Now to do the reverse: instead of Rails sending files to users through Nginx, Rails can tell Nginx which file to send.

I’d initially assumed that when a Rails application (let’s say Mongrel) was sending a file with the ‘send_file’ method, it couldn’t handle other requests as they came in. Seeing as that was an assumption rather than fact, I set up a download action on my Ubuntu dev server to show this happening: basically a website supported by a single Mongrel. If it’s busy sending a large file, I figured additional requests would get queued up, eventually giving 503 errors as they timed out.

I set the action going to download a 40MB test file from the server, opened another tab, loaded another test page, and it appeared straight away. So my assumption was wrong, but it turns out not completely wrong: Mongrel is still sending the file, because if I kill it the download terminates. This is because Mongrel has its own internal request queue and only passes one request at a time to your Rails code. Hence the ability to handle additional requests while someone is downloading a file, and hence websites running clusters of Mongrels. Getting Mongrel to send the file isn’t the best use of resources though.

For Nginx to send files on Mongrel’s behalf, two changes are needed. Firstly, you need to tell Nginx that it should be doing this and from where. The Nginx sendfile page is quite helpful in this regard; you’ll end up with something like this:

location /files/ {
  internal;
  root <rails_root>/;  # note the trailing slash
}

Note, sendfile is enabled by default in all nginx.conf files I’ve seen.

Secondly, in your Rails download action do something like this:

file = "#{RAILS_ROOT}/files/#{filename}"
if use_nginx  # hypothetical flag: true when Nginx is fronting this Mongrel
  head(:x_accel_redirect => "/files/#{filename}", :content_type => File.mime_type?(file), :content_disposition => "attachment; filename=#{filename}")
else
  send_file(file, :type => File.mime_type?(file))
end

It is important that you specify the correct mime_type to stop the receiving browser from guessing and potentially changing the file extension. If one of the standard Rails attachment gems is being used, then you’ll likely have that information already. But if you don’t, like in the example above, then mimetype_fu is a very handy plugin as it extends the File class by adding the mime_type? method.

If you store files with GUID-like names, you can control the file name the user receives by changing the :content_disposition value.

Finally, the reason I’ve specified the source tree root is so that files from multiple top level folders can be accessed from the one location entry in the Nginx config.
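Here's a plain-Ruby sketch of what the action hands over and how Nginx resolves it, assuming the location above with `root` set to the Rails root. `accel_headers` and `nginx_disk_path` are hypothetical helpers (not Rails APIs), and `/var/www/myapp/` is a made-up app root. The idea: Nginx intercepts `X-Accel-Redirect` and serves that internal URI itself, and with `root` (unlike `alias`) the whole URI is appended to the root path.

```ruby
# Hypothetical helper: headers a download action would set for Nginx.
def accel_headers(stored_name, download_name, mime_type)
  {
    "X-Accel-Redirect"    => "/files/#{stored_name}",
    "Content-Type"        => mime_type,
    # The user sees download_name, not the GUID-like name on disk.
    "Content-Disposition" => %(attachment; filename="#{download_name}")
  }
end

# With `root`, Nginx appends the full request URI to the root path.
def nginx_disk_path(root, uri)
  File.join(root, uri)  # File.join collapses the doubled slash
end

headers = accel_headers("9b2f61c0e4", "quarterly-report.pdf", "application/pdf")
headers.each { |name, value| puts "#{name}: #{value}" }
puts nginx_disk_path("/var/www/myapp/", headers["X-Accel-Redirect"])
# /var/www/myapp/files/9b2f61c0e4
```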

There is a plugin somewhere that does the head changes for you, but it’s been abandoned and even has a message suggesting that it shouldn’t be used for that very reason.  I’ve not used it for that reason and because it assumes all file downloads are to be handled by Nginx.  That’s a nasty assumption that would bite someone one day, so not very fun.  So I’m in the process of rolling my own.

Apache can do the same with mod_xsendfile.

Posted in Nginx File Transfers

Amazon Elastic Block Store (EBS)

Amazon’s persistent storage beta program for Elastic Compute Cloud (EC2) has been unleashed on the general public. I’ve been hanging out to play with this for a while, and I will duplicate any tests I perform on Slicehost servers on comparable EC2 servers with EBS. Since I’m also essentially comparing two different providers, I’ll look at the costs.

The page I’ve linked to above is worth reading, as it covers performance and durability quite succinctly. It’s pretty awesome to learn that you can just treat each block (EBS) as you would traditional physical storage, striping multiple mounts to achieve greater performance. Hurrah! Given the only difference in cost between having three 30GB blocks striped and a single 30GB block would be three times the access costs, I wonder if this will be the more common style of configuration? I also wonder how backups and snapshots work in this situation, as synchronization would be quite important, one would think.
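For anyone who hasn't striped disks before, here's a small Ruby sketch of the RAID 0 layout that OS-level striping (for example Linux software RAID via mdadm) would put across several EBS volumes. It assumes a stripe unit of one chunk, so logical chunk n lands on volume n mod k at offset n div k. A large sequential read then rotates across every volume, which is where the extra throughput comes from.

```ruby
# RAID 0 mapping with k volumes and a stripe unit of one chunk:
# returns [volume, offset] for a logical chunk number.
def stripe_location(chunk, volume_count)
  [chunk % volume_count, chunk / volume_count]
end

# Six sequential chunks over three volumes touch each volume twice,
# so the three devices can serve the read in parallel.
layout = (0...6).map { |n| stripe_location(n, 3) }
layout.each_with_index do |(volume, offset), n|
  puts "chunk #{n} -> volume #{volume}, offset #{offset}"
end
```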

I find that numbers quoted by a merchant often aren’t attained in the real world, as they only reflect usage under ideal conditions. Internet speeds in New Zealand, pictures of fast food and cosmetic benefits are good examples. So I wonder if the same will apply to this new service? Time will tell, and I’m sure it’ll be big news if Amazon’s numbers are considerably off the mark.

Come to think of it I don’t know how Amazon’s existing services stack up against what is advertised.  I’m gonna find out though as I intend to see how much I can squeeze out of their service too.  Apart from the geek factor, I’m doing this as clients are increasingly asking about Amazon EC2 after I normally recommend either Slicehost or a dedicated box somewhere.  Basically at the moment I don’t know, when I need to know.

So I have four big questions around using Amazon EC2.  Firstly a big cause of concern for me is their service having already experienced a number of notable down times including what appears to be reported data loss.  With their latest significant event Amazon has been very open about the problem and what they are doing as a result – nice and positive.  Although, as nice as that is, they should really stop having them.

Secondly is performance, obviously. I’m gonna put everything that can’t be replicated on EBS, including all logs that interest me. Of note, I’ll have a webserver, a load balancer and many, many application instances continuously logging away from multiple servers and having a good time about it. Oh, and also the database. Performance better not blink. Previously a big turn-off for me has been the ‘what logs?’ situation: if an EC2 server suddenly stops, you have no way of knowing what just happened. Yeah, I know about services like RightScale that attempt to minimize this, but it’s not good enough by my standards. Plus their sign-up fee is a big turn-off for me; presumably they are trying to protect their IP from free access. Anyway, who does big sign-up fees anymore??? It’s so 2005.

Thirdly is unexpected restarts. What event-based actions can I automate? I don’t know, and this is possibly just because I’ve not dug into it deep enough yet. When they restart, EC2 servers are blank again. Can I set them to be automatically loaded with a script which reverts their state back to what it was pre-reboot? I doubt this would be as quick as I’d want, so I could just symlink most of the OS to EBS, depending on its performance. It implies that either Amazon provides a harness that sits around all your servers for kicking off the scripts as needed, or you use a third party, or you roll your own. I’ve been looking out for an excuse to set up a big kickass high availability setup; then it wouldn’t matter! Hmmmm.

UPDATE: You can mount EBS as the file system on EC2 as noted here.  That basically implies it’s up there in the performance stakes and obviously no need to reinstall everything after a reboot.  You can also create a new EBS from a snapshot which is extremely handy.

Fourthly, permanent IP addresses: I know they have them, but again that’s about the limit of my knowledge. The lack of an EBS-like service has stopped me seriously investigating AWS until now. Can I have more than one IP address per server? Can I have a floating one if I want to roll my own high availability configuration?

Backing up user content and other important data is a really big deal for a website, especially as it keeps accumulating. I have found Amazon S3 is my best mate for this, especially for regular rsync-like backups. If a website is entirely hosted with EC2 and EBS, this becomes a whole lot easier, to the point of being stupidly easy. That in itself is a really big deal.

At the moment all server oriented virtualized services that I can recall using and reading about are essentially just replications of physical devices – normally with marginally lower performance characteristics while being more reliable due to their redundant nature.  I do wonder though when someone will come up with something new that’s not available in the physical world and what that will be.

Aside from the concerns I’ve expressed above, I’ll just add what an awesome learning tool AWS and similar services are becoming. Who cares if you screw up? Learn, wipe and start again.

I’ve got even more geeking out ahead of me now, sweet.

Posted in Performance Project

  • About Nahum Wild

    I'm a High Performance Website Consultant specialising in Ruby on Rails deployments. In this blog I cover common problems I've seen and provide insight on optimisation techniques.
