Raw SQL in Rails

Apr 13, 2015

Object-relational mapping (ORM) is nice. With the two ORMs I’ve used, Laravel’s Eloquent and Rails’ ActiveRecord, I’ve never failed to be amazed at how simple they make CRUD. They sanitize inputs, quietly cache queries, and ‘magically’ link records through associations/relationships.

However, I’ve found that ORMs are limited precisely to their definition: they are object-relational mappings. They map a database record to an object and vice versa. If you start using them for work other than CRUD, the code quickly starts looking like spaghetti.

Analytics and migrations are two areas where I’ve found raw SQL to be beneficial. Analytics queries frequently involve multiple sub-queries across multiple tables, and they return results that don’t map to any specific object. Using raw SQL in migrations that update many records spares you from instantiating any objects at all.
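For example, an analytics-style query can be run through the connection directly and come back as plain rows instead of model objects. This is just a sketch against the same users and user_profiles tables used later in this post; the report itself is made up.

# Hypothetical report: sign-ups per month, and how many of those users
# filled in a website. select_all returns rows as hashes, not models.
rows = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT date_trunc('month', u.created_at) AS month,
         COUNT(*)                          AS signups,
         COUNT(p.user_website)             AS with_website
  FROM users u
  LEFT JOIN user_profiles p ON p.user_id = u.id
  GROUP BY 1
  ORDER BY 1;
SQL

rows.each do |row|
  puts "#{row['month']}: #{row['signups']} signups, #{row['with_website']} with a website"
end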

I’m gonna use the migration example to demonstrate the performance of raw SQL vs. ActiveRecord. The scenario is moving the user_website column from user_profiles to users. When using ActiveRecord, we are tempted to do something like this:

class User < ActiveRecord::Base
  has_one :user_profile
end

class UserProfile < ActiveRecord::Base
  belongs_to :user
end

class AddUserWebsiteToUsers < ActiveRecord::Migration
  def up
    add_column :users, :user_website, :string

    User.find_each do |user|
      user.user_website = user.user_profile.user_website
      user.save!
    end
  end

  def down
    remove_column :users, :user_website
  end
end

However, there’s a problem with this migration that isn’t obvious at first. It runs roughly 2*N queries for the N users in the database: one per user to load the user_profile association and one per user to save the change (plus the batched SELECTs find_each issues to load the users themselves). It also churns through memory instantiating a User and a UserProfile object for every row. Wrapping the block in a transaction could speed it up, but it will still be orders of magnitude slower than the raw query shown below.
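By wrapping it in a transaction I mean something like this sketch; it saves the per-row commit overhead but keeps the same per-user queries and object churn.

# Same loop as in the migration, just committed as a single transaction.
User.transaction do
  User.find_each do |user|
    user.user_website = user.user_profile.user_website
    user.save!
  end
end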

class AddUserWebsiteToUsers < ActiveRecord::Migration
  def up
    add_column :users, :user_website, :string

    execute <<-SQL
      UPDATE users as u
      SET user_website = p.user_website
      FROM user_profiles as p
      WHERE u.id = p.user_id;
    SQL
  end

  def down
    remove_column :users, :user_website
  end
end
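One caveat: the UPDATE ... FROM join syntax used here is PostgreSQL-specific (the schema below is a Postgres database), so the same idea has to be spelled a bit differently on MySQL or SQLite.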

For benchmarking, I set up my database with the resulting schema below and 10,000 records in each table.

ActiveRecord::Schema.define(version: 20150416010143) do
  enable_extension "plpgsql"

  create_table "user_profiles", force: :cascade do |t|
    t.integer "user_id"
    t.string  "user_website"
  end

  create_table "users", force: :cascade do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
    t.string   "user_website"
  end
end
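Seeding 10,000 rows into each table can be done with a throwaway loop like this; the name and website values here are just filler.

# Roughly how such a dataset can be seeded -- 10,000 users, each with a profile.
# users.user_website is left nil so the benchmark has something to fill in.
ActiveRecord::Base.transaction do
  10_000.times do |i|
    user = User.create!(name: "user-#{i}")
    UserProfile.create!(user_id: user.id, user_website: "http://example.com/#{i}")
  end
end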

Instead of actually running the two migrations, I used the following rake tasks.

namespace :benchmark do
  task :rawsql => :environment do
    puts Benchmark.measure {
      ActiveRecord::Base.connection.execute <<-SQL
        UPDATE users as u
        SET user_website = p.user_website
        FROM user_profiles as p
        WHERE u.id = p.user_id;
      SQL
    }
  end

  task :activerecord => :environment do
    puts Benchmark.measure {
      User.find_each do |user|
        user.user_website = user.user_profile.user_website
        user.save!
      end
    }
  end
end
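Each task is then run like any other rake task:

rake benchmark:activerecord
rake benchmark:rawsql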

And here are the results:

                   user     system      total         real
ActiveRecord  17.700000   1.180000  18.880000  (42.011080)
Raw SQL        0.070000   0.010000   0.080000   (0.124420)

The results shouldn’t be surprising. Databases were designed to manipulate data like this. One analogy I have for comparing ORM and raw SQL is a delivery man. If you use N queries with the ORM, it’s like giving the delivery man one package at a time and telling him the address. He has to finish one delivery before coming back and starting the next. But if you give him everything at once, he can plan out a quicker route and not waste time going back and forth.