I'm working on an app now that has to import a big SQL file directly into MySQL to populate the tables with data exported from an older codebase. I don't entirely trust this data, so I wanted to run it all through my model validations to make sure nothing is broken. Corey Donohoe gave me a snippet for validating fixtures that got me started, which I spruced up to deal with STI and to "paginate" large tables.
# file: validate_models.rake
# task: rake db:validate_models
namespace :db do
  desc "Run model validations on all model records in database"
  task :validate_models => :environment do
    puts "-- records - model --"
    Dir.glob(RAILS_ROOT + '/app/models/**/*.rb').each { |file| require file }
    Object.subclasses_of(ActiveRecord::Base).select { |c|
      c.base_class == c }.sort_by(&:name).each do |klass|
      total = klass.count
      printf "%10d - %s\n", total, klass.name
      chunk_size = 1000
      (total / chunk_size + 1).times do |i|
        chunk = klass.find(:all, :offset => (i * chunk_size), :limit => chunk_size)
        chunk.reject(&:valid?).each do |record|
          puts "#{record.class}: id=#{record.id}"
          p record.errors.full_messages
          puts
        end rescue nil
      end
    end
  end
end
Now I can check to see if any of that imported data is invalid. Sweet.
After I got done writing the task I realized it is more generally useful than just for validating imported data. If I add new validations to my models, I can use this task to check the records already in the database. Wow, I suddenly feel so embarrassed that I never thought to do that before.
This is great, although I had some trouble when it tried to load my model observer files. You can change line 7 to work around this and ignore them.
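The commenter's actual snippet isn't shown, but here is one hedged sketch of the kind of tweak they likely meant, assuming observer files follow the usual *_observer.rb naming convention (the model_files helper name is my own invention):

```ruby
# Hypothetical replacement for the Dir.glob line (line 7 of the task):
# reject *_observer.rb files so observers are never required as models.
def model_files(root)
  Dir.glob(root + '/app/models/**/*.rb').reject { |f| f =~ /_observer\.rb\z/ }
end

# In the task, line 7 would then become:
#   model_files(RAILS_ROOT).each { |file| require file }
```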
Slick. Just out of curiosity, how large is this legacy database and how many hours does it take this tool to run? :-)
This is really nice. I'm in the process of moving a site from PHP into Rails. The database was, fortunately, designed with conventions very close to the Rails convention, so some minor tweaks and using Dr. Nic's Magic Models has made development really fast.
On top of that, being able to walk the db with the new validations is great. It has already caught a bunch of small errors that would have bitten me later on.
I ran this tool against 216MB of sql data and it took about 5 minutes on my MacBook Pro.
My company does mostly Java (but I don't) and I'm trying to figure out whether this is even possible in Java. Is it?
Lisp, of course, can do it, as might some other languages, but I'm sure C++ can't and pretty sure Java can't. I think you could generate some Java code by iterating over the directory structure, if you had a convention for model classes and their tables, but doing it in one step like this seems impossible. What do you all think?
Thanks Josh. I'm working on a project that has several large legacy DBs that I need to import into the new MySQL database I'm building; you've saved me a lot of time with your work. Thanks again!
@johnb: I don't see why this wouldn't be possible in Java, assuming you had a system that allowed models to self-validate. You don't need to generate any code on the fly, just walk some data structures and send methods.
@carlos: Good luck with the import. I'm learning a lot about how to make that work well, since I'm not a guru about SQL and db performance. For instance, my import was going at about one record per second. I added some indexes with unique constraints and sped things up by a factor of nearly 1000.
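To illustrate the kind of index that produced that speedup (the table and column names below are made up for illustration, not from the actual import):

```ruby
# Illustrative only -- a unique index lets MySQL verify "no duplicate
# legacy_id" with a quick index lookup instead of scanning the whole
# table on every insert, which is where the ~1000x speedup comes from.
create_index_sql = "CREATE UNIQUE INDEX index_people_on_legacy_id " \
                   "ON people (legacy_id)"
```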
I made the following modification to deal with inheritance hierarchies.
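The comment's code didn't survive here; as a stand-in, this is a sketch of one way such a modification could look. The validatable_classes helper and its exact filtering are my assumptions, not the commenter's code: the idea is to scan each inheritance hierarchy exactly once, via its base class.

```ruby
# Hypothetical helper: given ActiveRecord classes, keep only concrete
# base classes, dropping abstract classes and STI subclasses, so each
# underlying table is validated exactly once.
def validatable_classes(classes)
  classes.reject { |c| c.abstract_class? || c.base_class != c }.sort_by { |c| c.name }
end

# In the task, the select block could then become:
#   validatable_classes(Object.subclasses_of(ActiveRecord::Base)).each do |klass|
```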
This is pretty awesome.
I did have one addition as well, in the |klass| iteration. When I ran this originally, I did not have sessions handled by the database, so it broke on me.
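The commenter's exact line isn't shown; a plausible guard, sketched here with a hypothetical skip_validation? helper of my own naming, is to skip any model whose backing table doesn't exist (such as the session model when sessions aren't stored in the database):

```ruby
# Hypothetical guard: a model class is skippable when its table is
# missing, e.g. the session model with non-database session storage.
def skip_validation?(klass)
  !klass.connection.table_exists?(klass.table_name)
end

# In the task, as the first line of the |klass| block:
#   next if skip_validation?(klass)
```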
Wow, just what I was looking for. I too made a script to import a big legacy database, but instead of making a big SQL script I used Ruby DBI to build fixtures for my own models from an old Paradox database, making all the necessary mappings.
Then I just load the data using the db:fixtures:load rake task.