Raw SQL in Rails
Apr 13, 2015
Object relational mapping (ORM) is nice. Of the two ORMs that I’ve used, Laravel’s Eloquent and Rails’ ActiveRecord, I’ve never failed to be amazed at how simple they make CRUD. They sanitize inputs, quietly cache queries, and ‘magically’ link records with associations/relationships.
However, I’ve found that ORMs are limited to precisely what their name describes: they map a database record to an object, or vice versa. If you start using them for things other than CRUD, the code quickly starts looking like spaghetti.
Analytics and migrations are two areas where I’ve found raw SQL to be beneficial. Analytic queries frequently involve multiple sub-queries across multiple tables, and they return results that don’t map to any specific object. Using raw SQL in migrations that update many records spares you from instantiating an object for every row.
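For example, a weekly signups report joining the two tables used later in this post has no natural home in either model. A hypothetical sketch (PostgreSQL syntax, using only columns from the schema shown below):

```sql
-- Hypothetical analytics query: weekly signup counts, plus how many of
-- those users filled in a website. No single model maps to this result set.
SELECT date_trunc('week', u.created_at) AS week,
       count(*)                         AS signups,
       count(p.user_website)            AS with_website
FROM users u
LEFT JOIN user_profiles p ON p.user_id = u.id
GROUP BY 1
ORDER BY 1;
```

You could still run this from Rails via `ActiveRecord::Base.connection.exec_query`, but the rows that come back are plain result rows, not model instances.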
I’m going to use the example of migrations to demonstrate the performance of raw SQL vs. ActiveRecord. The scenario is moving the user_website column from user_profiles to users. When using ActiveRecord, we are tempted to do something like this.
class User < ActiveRecord::Base
  has_one :user_profile
end

class UserProfile < ActiveRecord::Base
  belongs_to :user
end
class AddUserWebsiteToUsers < ActiveRecord::Migration
  def up
    add_column :users, :user_website, :string

    User.find_each do |user|
      user.user_website = user.user_profile.user_website
      user.save!
    end
  end

  def down
    remove_column :users, :user_website
  end
end
However, there’s a problem with this migration that is not obvious at first. For N users it runs on the order of 2N queries: find_each loads the users in batches, but every user still triggers one query to load its user_profile and one UPDATE from save!. (Adding User.includes(:user_profile) would eliminate the association queries, but each save! would still issue its own UPDATE.) It also hogs memory by instantiating a User and a UserProfile object for every row. Wrapping the block in a transaction could speed it up, but it will still be orders of magnitude slower than the raw query shown below.
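The round-trip gap can be sketched without a database at all. Here is a toy model (the FakeDB class and its methods are hypothetical, not Rails internals) that does nothing but count queries:

```ruby
# Toy model of database round trips -- purely illustrative, not Rails.
class FakeDB
  attr_reader :query_count

  def initialize(profiles_by_user_id)
    @profiles = profiles_by_user_id
    @query_count = 0
  end

  # One round trip per lookup, like user.user_profile inside find_each.
  def profile_for(user_id)
    @query_count += 1
    @profiles[user_id]
  end

  # One round trip for the whole table, like a single UPDATE ... FROM.
  def all_profiles
    @query_count += 1
    @profiles
  end
end

profiles = (1..1000).to_h { |id| [id, "site#{id}.example"] }

per_record = FakeDB.new(profiles)
(1..1000).each { |id| per_record.profile_for(id) } # 1000 round trips

batched = FakeDB.new(profiles)
batched.all_profiles                               # 1 round trip
```

Each round trip carries network and parsing overhead on top of the query itself, which is why the per-record loop falls so far behind.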
class AddUserWebsiteToUsers < ActiveRecord::Migration
  def up
    add_column :users, :user_website, :string

    execute <<-SQL
      UPDATE users as u
      SET user_website = p.user_website
      FROM user_profiles as p
      WHERE u.id = p.user_id;
    SQL
  end

  def down
    remove_column :users, :user_website
  end
end
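One caveat: the UPDATE ... FROM join syntax above is PostgreSQL-specific (the schema below enables the plpgsql extension, so this post assumes Postgres). On MySQL the rough equivalent, as an untested sketch, puts the join in the UPDATE itself:

```sql
-- MySQL-style multi-table UPDATE (sketch; not valid in PostgreSQL)
UPDATE users u
JOIN user_profiles p ON u.id = p.user_id
SET u.user_website = p.user_website;
```

This is one of the trade-offs of dropping down to raw SQL: you give up the database portability the ORM provides.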
For benchmarking, I set up my database with the resulting schema and 10,000 records in each table.
ActiveRecord::Schema.define(version: 20150416010143) do
  enable_extension "plpgsql"

  create_table "user_profiles", force: :cascade do |t|
    t.integer "user_id"
    t.string  "user_website"
  end

  create_table "users", force: :cascade do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
    t.string   "user_website"
  end
end
Instead of actually running the two migrations, I used the following rake tasks.
namespace :benchmark do
  task :rawsql => :environment do
    puts Benchmark.measure {
      ActiveRecord::Base.connection.execute <<-SQL
        UPDATE users as u
        SET user_website = p.user_website
        FROM user_profiles as p
        WHERE u.id = p.user_id;
      SQL
    }
  end

  task :activerecord => :environment do
    puts Benchmark.measure {
      User.find_each do |user|
        user.user_website = user.user_profile.user_website
        user.save!
      end
    }
  end
end
And here are the results:
| | user | system | total | real |
|---|---|---|---|---|
| ActiveRecord | 17.700000 | 1.180000 | 18.880000 | (42.011080) |
| Raw SQL | 0.070000 | 0.010000 | 0.080000 | (0.124420) |
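The four columns come straight from Ruby’s Benchmark module: user and system CPU time, their total, and wall-clock time in parentheses. A minimal standalone example of reading those fields:

```ruby
require 'benchmark'

# Benchmark.measure returns a Benchmark::Tms; its to_s prints
# user, system, total, and (real) -- the same columns as the table above.
tms = Benchmark.measure do
  100_000.times { |i| Math.sqrt(i) }
end

puts tms   # prints the four-column line

tms.utime  # user CPU time in seconds
tms.stime  # system CPU time in seconds
tms.total  # utime + stime (plus any child process times)
tms.real   # wall-clock seconds
```

The big gap between total and real in the ActiveRecord row (18.88s vs. 42.01s) is time spent waiting on the database rather than burning CPU in Ruby.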
The results shouldn’t be surprising. Databases were designed to manipulate data in bulk like this. One analogy I have for comparing ORM and raw SQL is a delivery man. Using N queries with the ORM is like handing him one package at a time along with its address: he has to finish each delivery before coming back to start the next. Give him everything at once, and he can plan out a quicker route instead of wasting time going back and forth.