Using your connections

— March 1, 2007 at 06:42 PST


ActiveRecord does a great job of insulating you from a lot of the hard stuff about using SQL, and it packs the results of queries up nicely as model objects. What could be better than that? Well, sometimes you don't want all that. Sometimes you just want what you want.

Let's pretend you're building a blog and want to show a little calendar in the sidebar that shows all the days on which a post was made. You don't care about the title, the time of posting, the tags or category of the posts - just the date. You could do that the obvious way using the model class and some Ruby mojo:

posts = Post.find(:all)
days = posts.collect { |post| post.published_on }.uniq

What's wrong with that? It's short and easy to understand, a nearly perfect snippet of Ruby goodness. However, it is a real dog to run. It is grabbing all the data for every post including the body text and allocating an ActiveRecord model object for each one, just to get the publication date. You really should be nicer to your database, and it's a shame throwing away all those barely used objects. What would your mother think of you? Luckily there's an easier way.

You probably know about the find_by_sql method that lets you run arbitrary SQL queries and returns ActiveRecord models. That's spiffy, but it's still overkill for what we're doing. ActiveRecord::Base also provides low-level access to its database connector that can be used to fetch primitive values using raw SQL. This is sometimes just what we want to do.

days = Post.connection.select_values("SELECT DISTINCT published_on FROM posts")

That's much less data to be shipping around, and less work for Ruby to extract what you're interested in. I'll be the first to admit the code isn't as pretty. You actually have to mess up your lovely Ruby code with SQL. But you shouldn't be afraid of SQL. Just because ActiveRecord lets you avoid getting your hands dirty with SQL most of the time doesn't mean you should try to avoid SQL all the time.

By the way, in reality we'd probably be storing a datetime for when the post was published, so we'd do DATE(published_at) instead of published_on in the query, but I was trying to keep the example simple.

You can see the full set of connector database statements in the Rails API docs. I'd stay away from using most of them, but the selection methods can come in handy when you don't want to break your back doing things the ORM way.

13 commentsrails

Comments
  1. JD2007-03-01 07:30:57

    I definitely agree with you - sometimes it is better to use the connection's methods, particularly when trying to get a single column. I do find myself writing code like this every once in a while:

    Something.find(:all, :select => 'some_column').collect(&:some_column)
    

    Using select_values is a good replacement for this to avoid AR instantiation, but be aware that you (probably?) won't get any typecasting.

  2. Jack Nutting2007-03-01 07:39:46

    Good point. It's nice to know that these foundation levels of code in ActiveRecord can be used from up above.

    Just to make it all a bit cleaner, though... wouldn't it be nicer to put that in a class method in the Post class? That way you're insulated from changes in the data model (in case you decide to rename published_on to published_at, for example) since you'd only have to change it in one place. Then you could just call "Post.published_dates" wherever you need it.

  3. Josh Susser2007-03-01 08:29:07

    Jack: That's a good idea to hide the low-level access behind a class method. Good thinking there.

    JD: You're right about the typecasting, or lack thereof. The connection selection methods just return raw strings parsed out of the result set.

  4. Jeff D2007-03-01 08:37:20

    Thanks for the great tip!

    Since select_values is expecting a string, if you need to include conditions you can interpolate sanitize_sql: connection.select_values("SELECT DISTINCT foo FROM bars WHERE #{sanitize_sql :name => 'Josh'} ORDER BY coolness")

  5. Tom Ward2007-03-01 09:45:42

    There are other ways to get this result through ActiveRecord (ensuring you still get type-casting, additional conditions, etc) which are much more efficient. How about:

    Post.find(:all, :select => :published_on, :group => :published_on).collect(&:published_on)

    I'd suggest hiding that behind a class method on Post (Post.all_publication_dates) and only switch to the direct connection when you absolutely have to.

    Another suggestion (if this is something people frequently do) might be a plugin to allow selection of values from a model without the overhead of instantiating each instance, only to collect a single attribute from each. Maybe a syntax like Post.select_values(:all, :select => :published_on, :group => :published_on).

    In general I'd warn against accessing the database connection directly unless you really know what you're doing.

  6. Tom Ward2007-03-01 09:49:31

    And how about a guide to the markup available in comments ;)

  7. Sandro2007-03-01 10:48:22

    But, if I use this method with a very very complex SQL QUERY ( joins and string manipulation functions inside the query ) on which class will be mapped the result ?

  8. Philip Hallstrom2007-03-01 15:51:37

    The other method that comes in handy now and then is update(). Say you've got a column that tracks number of views. If you get a lot of traffic, you can't fetch the record with AR, update it (by adding one in ruby or using increment()), and save it since during that time the value of that field might have increased significantly. So this is an area that using update with SQL similar to "update mytable set col = col + 1" comes in handy as it's atomic.

  9. Andrew Turner2007-03-01 16:06:57

    ActiveRecord Extensions is also worth looking at for doing smarter database queries rather than parsing the responses.

  10. Josh Susser2007-03-01 17:16:35

    Tom Ward: Nice. I like the class methods on the model class to get to the values using the connection but hiding it behind the model. Someone on IRC suggested something similar last night. If someone dosn't beat me to it, I might have to take a shot at adding that to ActiveRecord.

  11. Rodrigo Kochenburger2007-03-01 18:23:53

    I usually do something like this in the model:

    def published_dates self.find(:all, :select => "DISTINCT publishedon").map(&:publishedon) end

    And I actually prefer this, it will work as i usually expect, having type castings, being easy to add prepared conditions, etc.

    But i do agree with the "Less data to be shipping" argument :)

  12. Tom2007-03-01 18:26:08

    @Philip: Back door updates work great except your observers don't get called. If you rely on observers to keep track of things, then you'll need to write extra code for these cases. I ran into this problem just last week!

  13. Tom Ward2007-03-01 20:35:47

    Whilst waiting for my dinner this evening, I've come up with. Untested, scrappy code for sure, but I'm in a rush; I can smell my omelette's almost ready ;) I'm not actually convinced it's worth the effort, but in the spirit of sharing:

    # Allows selection of values from active record tables, without the overhead
    # of instantiating whole records
    
    module Tomafro::ActiveRecord::SelectValues       
      def self.included(base)
        base.extend(ClassMethods)
      end
    
      module ClassMethods
        def select_values(*args)
          options = extract_options_from_args!(args)
          validate_find_options(options)
          raise "select_values does not support the :include option" if options[:include]
    
          case args.first
            when :first then select_values_without_instantiating_records(options.update(:limit => 1)).first
            when :all   then select_values_without_instantiating_records(options)
            else             raise "You must specify :first or :all when using select_values"
          end
        end
    
        def select_values_without_instantiating_records(options)
          select_values_by_sql(construct_finder_sql(options))
        end
    
        def select_values_by_sql(sql)
          result = connection.select_all(sanitize_sql(sql), "#{name} Select Values")
          result.collect do |record|
            record.inject({}) do |memo, kv|
              column = self.columns_hash[kv[0]]       
              memo[kv[0]] = column ? column.type_cast(kv[1]) : kv[1]
              memo
            end
          end
        end
      end  
    end
    
    
    ActiveRecord::Base.send :include, Tomafro::ActiveRecord::SelectValues
    

Sorry, comments for this article are closed.