Segregated page cache storage

— January 30, 2008 at 23:41 PST


Page caching is one of the highest-leverage features in Rails. It doesn't take much to set up, and the payoff is huge. When building Teldra I knew from the start that page caching would be part of my production deployment, as it should be for any site whose pages change infrequently relative to the number of views.

The only thing I find annoying about using the page caching feature is how the cached pages are stored in the RAILS_ROOT/public directory, right alongside all the app's other static pages. I greatly prefer having the cached pages stored in a separate directory. This makes it a lot easier to distinguish between static pages and cached dynamic pages, and if something goes wonky with your cache you can blow it away easily with a single command.

Here's the setup I have Teldra running on. I'm using nginx for the webserver and capistrano for deployment (using slight tweaks to the EngineYard standard configs and recipes). I created a shared/cache directory, told Rails to store cached pages there, and added rules to nginx to find cached pages there. Here's how things look in detail...

In config/environments/production.rb, tell Rails to put cached pages in the public/cache directory.

config.action_controller.page_cache_directory = File.join(RAILS_ROOT, 'public', 'cache')
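
To see where a given URL ends up on disk with that setting, here is a minimal sketch of the path mapping. It only approximates how Rails page caching names files (the root URL becomes index, and the default extension is added when the last path segment has none). The helper name cache_file_for and the example directory are hypothetical, made up for illustration, and are not Rails API.

```ruby
# Hypothetical helper, not part of Rails: approximates how a request path
# is turned into a file name under page_cache_directory.
def cache_file_for(cache_dir, request_path, extension = ".html")
  # The root URL is stored as "index"; otherwise drop any trailing slash.
  name = (request_path.empty? || request_path == "/") ? "/index" : request_path.chomp("/")
  # Add the default extension only when the last segment has none.
  name += extension unless File.basename(name).include?(".")
  File.join(cache_dir, name)
end

cache_dir = "/var/www/app/public/cache" # assumed location, for illustration only
puts cache_file_for(cache_dir, "/")            # /var/www/app/public/cache/index.html
puts cache_file_for(cache_dir, "/articles/42") # /var/www/app/public/cache/articles/42.html
puts cache_file_for(cache_dir, "/feed.atom")   # /var/www/app/public/cache/feed.atom
```

Knowing the file names Rails writes is what makes it possible to write matching lookup rules in the web server.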

In nginx.conf, set up the precedence for locating static files. First look in public for regular static files. Next look in the cache directory for an exact match for the URL. Lastly, look in the cache directory for the URL with .html appended. That will let you cache pages for regular URLs with no .html extension as well as ones with extensions like .xml, .atom, .json, etc.

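  # 1. Regular static file in public/
  if (-f $request_filename) {
    break;
  }

  # 2. Cached page whose name matches the URL exactly (.xml, .atom, .json, ...)
  #    $request_filename already includes the document root, so the cache
  #    path is built from $document_root and $uri instead.
  if (-f $document_root/cache$uri) {
    rewrite (.*) /cache$1 break;
    break;
  }

  # 3. Cached page for an extensionless URL, stored with .html appended
  if (-f $document_root/cache$uri.html) {
    rewrite (.*) /cache$1.html break;
    break;
  }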
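
The lookup order those rules describe can be sketched in Ruby to make the precedence easy to test outside nginx. This is only a simulation under the assumption that the cache lives in public/cache. The method name resolve_static and the sample files are hypothetical, and this is not what nginx runs internally.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical simulation of the lookup order; not nginx or Rails code.
# Returns the file that would be served, or nil when the request should
# fall through to the Rails backend.
def resolve_static(public_dir, uri)
  candidates = [
    File.join(public_dir, uri),                    # 1. regular static file
    File.join(public_dir, "cache", uri),           # 2. cached page, exact match
    File.join(public_dir, "cache", "#{uri}.html"), # 3. cached page with .html appended
  ]
  candidates.find { |path| File.file?(path) }
end

Dir.mktmpdir do |public_dir|
  FileUtils.mkdir_p(File.join(public_dir, "cache", "articles"))
  File.write(File.join(public_dir, "robots.txt"), "static")
  File.write(File.join(public_dir, "cache", "articles", "42.html"), "cached")

  p resolve_static(public_dir, "/robots.txt")  # the file directly under public/
  p resolve_static(public_dir, "/articles/42") # the .html file under public/cache/
  p resolve_static(public_dir, "/articles/99") # nil, so Rails renders the page
end
```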

The capistrano recipes have to do a couple things. You need to create the shared/cache directory when setting up the deployment.

after "deploy:setup", "create_page_cache"
task :create_page_cache, :roles => :app do
  run "umask 02 && mkdir -p #{shared_path}/cache"
end

When deploying a new release, create a symlink from public/cache to the shared/cache directory.

after "deploy:update_code", "symlink_shared_dirs"
task :symlink_shared_dirs, :roles => :app, :except => {:no_release => true, :no_symlink => true} do
  run <<-CMD
    cd #{release_path} &&
    ln -nfs #{shared_path}/cache #{release_path}/public/cache
  CMD
end
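
To show why the symlink matters, here is a small local sketch of the same idea in plain Ruby: two release directories link to one shared cache, so a page cached under the old release is still there after the new one goes live. The helper link_shared_cache and the directory layout in a temp dir are hypothetical stand-ins for a real deployment.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper mirroring `ln -nfs shared/cache release/public/cache`.
def link_shared_cache(shared_path, release_path)
  target = File.join(shared_path, "cache")
  link   = File.join(release_path, "public", "cache")
  FileUtils.mkdir_p(target)
  FileUtils.mkdir_p(File.dirname(link))
  File.delete(link) if File.symlink?(link) # replace a stale link, like -f
  File.symlink(target, link)
  link
end

Dir.mktmpdir do |deploy_to|
  shared      = File.join(deploy_to, "shared")
  old_release = File.join(deploy_to, "releases", "1")
  new_release = File.join(deploy_to, "releases", "2")

  link_shared_cache(shared, old_release)
  File.write(File.join(old_release, "public", "cache", "about.html"), "cached")

  link_shared_cache(shared, new_release)
  # The page cached through the old release is visible through the new one.
  puts File.exist?(File.join(new_release, "public", "cache", "about.html"))
end
```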

With these recipes, the default behavior on a deploy is to flush the cache, just to be on the safe side. If you want to retain cached pages, as when making a change you know won't affect rendering, tell capistrano not to flush.

# default behavior is to flush page cache on deploy
set :flush_cache, true

# page cache management
task :keep_page_cache do
  set :flush_cache, false
end

after "deploy:cleanup", "flush_page_cache"
task :flush_page_cache, :roles => :app do
  if flush_cache
    run <<-CMD
      rm -rf #{shared_path}/cache/*
    CMD
  end
end

With the above setup, you can deploy and retain the cache with the following capistrano command:

$ cap keep_page_cache deploy
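
The flag-guarded flush is easy to try locally. Below is a minimal sketch of the same logic in plain Ruby, assuming the cache sits in a single directory. The helper flush_page_cache and its flush: keyword are hypothetical names that stand in for the capistrano task and the :flush_cache variable.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: a Ruby equivalent of `rm -rf shared/cache/*`,
# guarded by a flag the way :flush_cache guards the capistrano task.
def flush_page_cache(cache_dir, flush: true)
  return [] unless flush
  entries = Dir.glob(File.join(cache_dir, "*")) # like the shell glob, skips dotfiles
  FileUtils.rm_rf(entries)
  entries
end

Dir.mktmpdir do |cache_dir|
  FileUtils.mkdir_p(File.join(cache_dir, "articles"))
  File.write(File.join(cache_dir, "articles", "42.html"), "cached")

  flush_page_cache(cache_dir, flush: false)
  puts Dir.children(cache_dir).inspect # cache retained

  flush_page_cache(cache_dir)
  puts Dir.children(cache_dir).inspect # cache emptied, directory itself kept
end
```

Deleting the directory's contents rather than the directory itself keeps the public/cache symlink valid across deploys.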

That's about it. I know I'm not the first fellow to think of this setup, but I'm surprised it's not more common. Anyone else doing something similar?

12 comments
Tags: caching, deployment, rails

Comments
  1. rick, 2008-01-31 00:21:01

    Mephisto does, fool! I can't rely on the /public directory for multi-site setups, so cached files go into public/cache/hostname/*. It makes the rewrite rules tougher to deal with though, which is why I try to keep the stock install simple.

    Also, be wary of this approach if you're using a shared file system like gfs. It slows down on large write/delete operations. I had issues on lighthouse before moving to memcache, but I was also dealing with lots of smaller fragment cache issues. Realistically you're probably fine for now. But if you notice the cleanup taking a while, that's why.

  2. Tim Connor, 2008-01-31 00:32:10

    And be careful to avoid the infamous Typo-style caching of the blank 500 page. That gets really old on TypoDH, and part of why I am now skeptical about page caching for all but high volume blogs (like yours). Also, just memcaching stuff saves so much load at work that I am not sure that isn't often enough.

    Of course for blerb we're playing with DataMapper and merb, so it will be interesting to see how the need for page-caching plays out there.

    Oh, and congrats on finally sucking it up and deploying a non-Typo, non-Dreamhost blog (a lethal combo, to be sure), Josh.

  3. Chris McGrath, 2008-01-31 03:59:49

    We use a similar scheme to Mephisto to cache different language versions of our search pages. Our setup wouldn't work with normal rails page caching as the different language domains point to the same rails root, so different language output would be cached in the same place. It works very well for us and the sweeper is a simple:

    */5 * * * * cd /data/www/current/file_cache && find . -name '*.html' -mmin +60 -type f -print0 | xargs -0 rm 2>/dev/null

  4. Mike Owens, 2008-01-31 05:19:23

    Crazy, I did pretty much this exact setup yesterday. A few differences:

    config.action_controller.page_cache_directory = "#{RAILS_ROOT}/tmp/cache/page"
    

    This allows the cache to be flushed with the normal rake tmp:cache:clear. Rails (on edge, at least) will create all path components needed to save the file in production.

    You have to configure nginx a little weirdly if you have the cache directory outside of your document_root. Something like:

    if (-f $document_root/../tmp/cache/page$uri.html) {
      rewrite (.*) /../tmp/cache/page$1.html break;
      break;
    }
    

    The only thing that worries me is the number of stat()s nginx is doing by testing files on every request. Doesn't seem like nginx uses inotify or similar, and it doesn't keep a cache of existing files. By the time nginx has exhausted all cache possibilities and handed the request to mongrel, you've already got this:

    stat64("/www/whatever/public/../tmp/cache/page/photos/i_dont_exist", 0xbfa8ba3c) = -1 ENOENT
    stat64("/www/whatever/public/../tmp/cache/page/photos/i_dont_exist.html", 0xbfa8ba3c) = -1 ENOENT
    stat64("/www/whatever/public/../tmp/cache/page/photos/i_dont_exist/index.html", 0xbfa8ba3c) = -1 ENOENT
    

    For this reason, I suggest at least using a location-match on ^/(images|stylesheets|javascripts) to bypass all the cache-checking on those.

  5. Mislav, 2008-01-31 05:57:10

    Excellent writeup. But why the 2 breaks in rewrite ... break; break? I thought the first one was enough -- I may be wrong, though.

    P.S. luv the comment system here (how you force the preview)

  6. Josh Susser, 2008-01-31 08:28:20

    Wow, lots of great comments. Let me throw some responses out...

    @rick: I guess that style of caching was added to Mephisto more recently than the version I was using, or my simple single-site install didn't activate that feature. Good point about deletes (EngineYard uses gfs), but since that only happens infrequently on a deployment, I don't think the impact is anything to worry about now, and maybe never.

    @Tim: good point about the cached 500 pages. I didn't do anything specific to avoid that, but I'm not doing anything crazy with caching so maybe Rails prevents that on its own. I'll have to write some tests and see what's going on.

    @Mike: That's exactly how the nginx.conf is set up, with the location-match for asset files. EngineYard folks helped me set it up, and they know what they are doing there (even if I don't grok it all yet). I probably should have shown more of the config.

    @Mislav: see Mike's comment about the double breaks. I don't fully grok it myself. (The comment preview is something I wanted for ages but was too hard to add to the liquid view in my old blog. I want to improve it to be a live preview, but that's going to take a little more effort to get working right. But yay for Showdown and previews without XHR calls.)

    It's cool that so many people have good experience with this. Crazy I haven't seen it written down anywhere yet. When is Ezra's book due...?

  7. rick, 2008-01-31 09:54:43

    @josh: It's not so much that my deployments took 10 or 15 seconds longer, but that it seemed to hang the whole slice when it happened. I guess it has to do with gfs locking the whole filesystem up? I'm not too sure... Again, not worth worrying about for a blog I guess.

    @tim: Rails shouldn't page cache unless the response is a 2xx. Also, I don't care how fast merb is, it won't hold a candle to your web server. Page caching is awesome...

  8. Jesse Andrews, 2008-01-31 15:01:49

    I do something very similar for caching the feeds on userscripts.org.

    I have multiple feeds for every user and script (numbering in the tens of thousands). I'm only using Rails' built-in page cache to cache the feeds. I updated environments/production.rb to be:

     config.action_controller.page_cache_directory = "/home/deploy/app/shared/cache"
     config.action_controller.page_cache_extension = '.xml'
    

    I route all feeds through /feeds so checking if the feed is cached in nginx is simpler. Since the file is served from nginx I have to specify the charset and default_type explicitly:

    location /feeds/ {
        root /home/deploy/app/shared/cache;
        if (-f $request_filename.xml) {
          rewrite (.*) $1.xml break;
        }
        charset          UTF-8;
        default_type     application/xhtml+xml;
        error_page       404 = /fallback;
    }
    

    If the file is not found (404) it sends it to /fallback - which is a proxy to the rails process - which will create the feed and cache it on disk.

    To invalidate the cache, I have a cron job that finds all xml files in the cache directory older than 10 minutes and deletes them.

    cd ~/app/shared/cache && find . -name '*.xml' -cmin +10 -exec rm {} \;
    

    It works great!

  9. blj, 2008-01-31 22:12:43

    FYI, Radiant CMS does something very close.

  10. Marcus, 2008-03-07 10:51:25

    Is that Teldra as in Steven Brust's Teldra?

  11. Josh Susser, 2008-03-13 23:26:47

    @Marcus: Yes, indeed.

  12. stephen, 2008-04-28 01:24:00

    not sure why, but i didn't have luck with your setup.. needed to do the following:

    
          if (-f $document_root/cache$request_uri.html) {
            rewrite (.*) /cache$1.html break;
          }
          if (-f $document_root/cache$request_uri/index.html) {
            rewrite (.*) /cache$1/index.html break;
          }
    
