has_many :through gets :uniq

— May 6, 2006 at 18:40 PDT


If you don't follow the Rails Trac commit log, you may find this interesting. Jeremy Kemper just checked in a change to enable the :uniq option for has_many :through associations.

So what does the :uniq option do? The docs for the has_many options say:

:uniq - if set to true, duplicates will be omitted from the collection. Useful in conjunction with :through.

This is very handy when there are multiple connections between two model objects via a join model but you only care about whether there are any connections at all. Let's go back to my favorite example, where contributors contribute to books in various roles, such as author, editor, illustrator, etc. I'll omit the migrations for Book and Contributor as they aren't interesting for what we're doing here.

create_table "contributions" do |t|
  t.column "book_id",         :integer
  t.column "contributor_id",  :integer
  t.column "role",            :string
end

class Contribution < ActiveRecord::Base
  belongs_to :book
  belongs_to :contributor
end
class Book < ActiveRecord::Base
  has_many :contributions, :dependent => :destroy
  has_many :contributors, :through => :contributions
end
class Contributor < ActiveRecord::Base
  has_many :contributions, :dependent => :destroy
  has_many :books, :through => :contributions
end

That's all well and good, but the associations in the Book and Contributor models are a bit weak. If Sam contributed to a book as both an author and an illustrator, then book.contributors will include Sam twice. Since the contributors collection doesn't include any information about the roles, Sam showing up twice is mere redundancy. Let's try to improve the model so things are more useful.

class Book < ActiveRecord::Base
  has_many :contributions, :dependent => :destroy
  has_many :contributors, :through => :contributions, :uniq => true
end

Adding the :uniq => true option tells the association to eliminate redundant results. Now it doesn't matter how many different ways Sam contributes to a book; he is listed as a contributor only once.
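To make the effect concrete, here is a minimal plain-Ruby sketch with no ActiveRecord involved. The Struct classes and the sample data are hypothetical stand-ins for the real models; the point is only to show the duplicates that a join through contributions produces and what dropping them looks like.

```ruby
# Plain-Ruby stand-ins for the ActiveRecord models (hypothetical, for illustration only).
Contributor  = Struct.new(:id, :name)
Contribution = Struct.new(:contributor, :role)

sam = Contributor.new(1, "Sam")
pat = Contributor.new(2, "Pat")

# Sam contributed to the same book twice, in two different roles.
contributions = [
  Contribution.new(sam, "author"),
  Contribution.new(sam, "illustrator"),
  Contribution.new(pat, "editor")
]

# Without :uniq, one contributor comes back per contribution row.
with_duplicates = contributions.map { |c| c.contributor }

# With :uniq => true, duplicates are dropped in Ruby after the rows are loaded.
without_duplicates = with_duplicates.uniq

puts with_duplicates.map { |c| c.name }.inspect     # ["Sam", "Sam", "Pat"]
puts without_duplicates.map { |c| c.name }.inspect  # ["Sam", "Pat"]
```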

That's nice, but we can do better.

class Contribution < ActiveRecord::Base
  belongs_to :book
  belongs_to :contributor
  belongs_to :author,      :class_name => "Contributor"
  belongs_to :editor,      :class_name => "Contributor"
  belongs_to :illustrator, :class_name => "Contributor"
  belongs_to :proofreader, :class_name => "Contributor"
end
class Book < ActiveRecord::Base
  has_many :contributions, :dependent => :destroy
  has_many :contributors, :through => :contributions, :uniq => true
  has_many :authors,      :through => :contributions, :source => :author, :conditions => "contributions.role = 'author'"
  has_many :editors,      :through => :contributions, :source => :editor, :conditions => "contributions.role = 'editor'"
  has_many :illustrators, :through => :contributions, :source => :illustrator, :conditions => "contributions.role = 'illustrator'"
  has_many :proofreaders, :through => :contributions, :source => :proofreader, :conditions => "contributions.role = 'proofreader'"
end

Here I've added some special associations to make it easier to access contributors by role. I'm assuming a contributor will have at most one contribution record per role for a given book, so no :uniq option is needed on the role-specific associations. With the above associations we can say book.contributors and book.authors, and both will return collections of contributors with no duplicates.
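As a rough mental model of what these role-specific associations return, here is a plain-Ruby sketch. The hash layout and the contributors_with_role helper are hypothetical stand-ins, not Rails API; the real associations do the filtering in SQL via the :conditions option.

```ruby
# Stand-in rows for the contributions table of one book (hypothetical data).
contributions = [
  { :contributor => "Sam", :role => "author" },
  { :contributor => "Sam", :role => "illustrator" },
  { :contributor => "Pat", :role => "editor" }
]

# Hypothetical helper: roughly what a role-filtered association such as
# book.authors returns, done here in Ruby instead of SQL.
def contributors_with_role(contributions, role)
  contributions.select { |c| c[:role] == role }.map { |c| c[:contributor] }
end

authors      = contributors_with_role(contributions, "author")
illustrators = contributors_with_role(contributions, "illustrator")
proofreaders = contributors_with_role(contributions, "proofreader")

# Roughly what book.contributors returns with :uniq => true.
everyone = contributions.map { |c| c[:contributor] }.uniq

puts authors.inspect       # ["Sam"]
puts illustrators.inspect  # ["Sam"]
puts proofreaders.inspect  # []
puts everyone.inspect      # ["Sam", "Pat"]
```

Sam shows up once in each role-specific list and once in the overall list, which is the behavior the associations above are meant to give.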

A word about performance...

The :uniq option removes duplicates in Ruby code, not in the database query. If you have a large number of duplicates, it might be better to use the :select option to tell the database to remove duplicates using the DISTINCT keyword. Like so:

class Book < ActiveRecord::Base
  has_many :contributions, :dependent => :destroy
  has_many :contributors, :through => :contributions, :select => "DISTINCT contributors.*"
end

I find using this approach a tad messy, as you have to explicitly include the name of the table in the select option, which isn't very DRY. I'd love to see a :distinct option that could be used like the one in counters. (I've looked into what it would take to implement that, but the association code is some of the nastiest code in ActiveRecord, and I'm not brave enough to try a change like that yet.)

And remember, as always, the best way to decide between using :uniq => true and :select => "DISTINCT..." is to run performance measurements on your application. Only the data can tell you which way is best for you, or if there's even enough of a difference to matter.
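If you want a starting point for that kind of measurement, Ruby's standard Benchmark module is enough. The sketch below is only a harness under stated assumptions: average_seconds is a hypothetical helper, and the array of integers is stand-in data rather than real association results. In a real application the block would load the association under each option, ideally against realistic data.

```ruby
require "benchmark"

# Hypothetical helper: run the block several times and return the
# average wall-clock seconds per run.
def average_seconds(runs)
  total = Benchmark.realtime { runs.times { yield } }
  total / runs
end

# Stand-in data: 50,000 rows that refer to only 100 distinct contributor ids.
rows = Array.new(50_000) { |i| i % 100 }

# Time the Ruby-side de-duplication, which is the kind of work :uniq => true does.
ruby_dedupe = average_seconds(20) { rows.uniq }

puts "Array#uniq over #{rows.size} rows: #{'%.6f' % ruby_dedupe}s per run"
```

Timing a second block that loads the association with the DISTINCT select and comparing the two averages gives you the data to decide with.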

11 comments | associations, rails

Comments
  1. Ted, 2006-05-06 19:09:29

    As usual, great stuff. Thanks for the heads-up.

  2. Fredrik, 2006-05-08 15:39:25

    Why both :uniq and :distinct? They should behave the same, why not just change :uniq to use SELECT DISTINCT?

  3. Josh Susser, 2006-05-08 15:50:53

    @Fredrik: It's harder to change the generated SQL to use SELECT DISTINCT than to remove duplicates from the result set in Ruby, so it might take a while (if ever) before someone implements that. And as I said, it may not always be the best way to go. I like having the choice.

  4. dylan, 2006-05-08 17:43:26

    once again, thanks for digging into this, josh. you should have a paypal link at the end of each of these posts ;)

  5. Josh Susser, 2006-05-09 10:56:20

    @dylan: thanks. I'm not much for the tip-jar thing, but if I manage to get a book published this year you can buy a copy, ok? :-)

  6. demerzel@gmail.com, 2006-05-30 12:39:48

    Hi Josh, Sorry about the late comment, but I couldn't get something to work.

    You use the :conditions on the belongs_to association and the :source on the has_many association to retrieve contributors by role. Is there anything more than what you show here needed to get something like that working?

    I have a similar hmt relationship between a user and group model, through a membership join model that has an is_admin field. I tried adding an association like you did to retrieve admin users of a group, but i keep getting just all the users (as evidenced by the SQL query in the log too). What could be wrong?

    (I know this is a long shot, but this is driving me nuts. The model code is at http://rafb.net/paste/results/O7TVAg78.html )

  7. Josh Susser, 2006-05-30 12:56:26

    @demerzel: Oh darn, I thought I'd fixed that. It turns out that conditions on the source don't get used for filtering the query results. You need to put the conditions in the through association itself, as you can see in the updated example now.

    It would be nice if source conditions were used in the through association queries, but that code is freakishly complex and no one has been brave enough to get it to work yet.

  8. demerzel, 2006-05-30 13:15:08

    Wow thanks. I have been desperately refreshing this page for some answer. I can finally sleep now:-)

    It works for me now, and yeah, it'd have been neat if it worked the way you wrote it first. Are the conditions on the source even necessary now (since they don't have anything to do with the filtering, as it appears)?

    Thanks again!

  9. Josh Susser, 2006-05-30 13:32:20

    @demerzel: I guess those source conditions can just go away, since they aren't much help on a belongs_to. I guess I was thinking they'd be useful for working directly with the join model, but that would mainly be of use if it was a has_many association in the join model. I'll remove them from the example for clarity. Good catch.

  10. Matt, 2007-01-17 20:40:14

    Thanks for this. I tried doing this with the :finder_sql option, but since find_in_collection is not added to the result it didn't work very well for me. Strange that :uniq isn't listed as an option for has_many in the Rails framework documentation. It is for has_and_belongs_to_many.

  11. Josh Susser, 2007-01-18 16:44:54

    Matt: Why would you want a :uniq option for has_many? There is no way to get duplicates in a 1-N mapping like has_many. Each related record has only one foreign key to the base record, so there's no way one could appear in the has_many collection more than once.

Sorry, comments for this article are closed.