SkillAgentSearch skills...

Upsert

Upsert on MySQL, PostgreSQL, and SQLite3. Transparently creates functions (UDF) for MySQL and PostgreSQL; on SQLite3, uses INSERT OR IGNORE.

Install / Use

/learn @seamusabshere/Upsert
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Upsert

Build Status

Make it easy to upsert on traditional RDBMS like MySQL, PostgreSQL, and SQLite3—hey look NoSQL!. Transparently creates (and re-uses) stored procedures/functions when necessary.

You pass it a bare-metal connection to the database like Mysql2::Client (from mysql2 gem on MRI) or Java::OrgPostgresqlJdbc4::Jdbc4Connection (from jdbc-postgres on Jruby).

As databases start to natively support SQL MERGE (which is basically upsert), this library will take advantage (but you won't have to change your code).

Does not depend on ActiveRecord.

Does not use INSERT ON DUPLICATE KEY UPDATE on MySQL as this only works if you are very careful about creating unique indexes.

70–90%+ faster than emulating upsert with ActiveRecord.

Supports MRI and JRuby.

Usage

You pass a selector that uniquely identifies a row, whether it exists or not. You also pass a setter, attributes that should be set on that row.

Syntax inspired by mongo-ruby-driver's update method.

Basic

connection = Mysql2::Client.new([...])
table_name = :pets
upsert = Upsert.new connection, table_name
# N times...
upsert.row({:name => 'Jerry'}, :breed => 'beagle', :created_at => Time.now)

The created_at and created_on columns are used for inserts, but ignored on updates.

So just to reiterate you've got a selector and a setter:

selector = { :name => 'Jerry' }
setter = { :breed => 'beagle' }
upsert.row(selector, setter)

Batch mode

By organizing your upserts into a batch, we can do work behind the scenes to make them faster.

connection = Mysql2::Client.new([...])
Upsert.batch(connection, :pets) do |upsert|
  # N times...
  upsert.row({:name => 'Jerry'}, :breed => 'beagle')
  upsert.row({:name => 'Pierre'}, :breed => 'tabby')
end

Batch mode is tested to be about 80% faster on PostgreSQL, MySQL, and SQLite3 than other ways to emulate upsert (see the tests, which fail if they are not faster).

Native Postgres upsert

INSERT ... ON CONFLICT DO UPDATE is used when Postgres 9.5+ is detected and unique constraint are in place.

**Note: ** You must have a unique constraint on the column(s) you're using as a selector. A unique index won't work. See https://github.com/seamusabshere/upsert/issues/98#issuecomment-295341405 for more information and some ways to check.

If you don't have unique constraints, it will fall back to the classic Upsert gem user-defined function, which does not require a constraint.

ActiveRecord helper method

require 'upsert/active_record_upsert'
# N times...
Pet.upsert({:name => 'Jerry'}, :breed => 'beagle')

Wishlist

Pull requests for any of these would be greatly appreciated:

  1. Cache JDBC PreparedStatement objects.
  2. Sanity check my three benchmarks (four if you include activerecord-import on MySQL). Do they accurately represent optimized alternatives?
  3. Provide require 'upsert/debug' that will make sure you are selecting on columns that have unique indexes
  4. Test that Upsert instances accept arbitrary columns, even within a batch, which is what people probably expect.
  5. @antage's idea for "true" upserting: (from https://github.com/seamusabshere/upsert/issues/17)
selector = { id: 15 }
update_setter = { count: Upsert.sql('count + 1') }
insert_setter = { count: 1 }
upsert.row_with_two_setter(update_setter, insert_setter, selector)

Real-world usage

<p><a href="http://angel.co/faraday"><img src="https://s3.amazonaws.com/photos.angel.co/startups/i/175701-a63ebd1b56a401e905963c64958204d4-medium_jpg.jpg" alt="Faraday logo"/></a></p>

We use upsert for big data at Faraday. Originally written to speed up the data_miner data mining library.

Supported databases/drivers

<table> <tr> <th>*</th> <th>MySQL</th> <th>PostgreSQL</th> <th>SQLite3</th> </tr> <tr> <th>MRI</th> <td><a href="https://rubygems.org/gems/mysql2">mysql2</a></td> <td><a href="https://rubygems.org/gems/pg">pg</a></td> <td><a href="https://rubygems.org/gems/sqlite3">sqlite3</a></td> </tr> <tr> <th>JRuby</th> <td><a href="https://rubygems.org/gems/jdbc-mysql">jdbc-mysql</a></td> <td><a href="https://rubygems.org/gems/jdbc-postgres">jdbc-postgres</a></td> <td><a href="https://rubygems.org/gems/jdbc-sqlite3">jdbc-sqlite3</a></td> </tr> </table>

See below for details about what SQL MERGE trick (emulation of upsert) is used, performance, code examples, etc.

Rails / ActiveRecord

(Assuming that one of the other three supported drivers is being used under the covers).

  • add "upsert" to your Gemfile and
  • run bundle install
Upsert.new Pet.connection, Pet.table_name

Speed

Depends on the driver being used!

SQL MERGE trick

Depends on the driver being used!

MySQL

On MRI, use the mysql2 driver.

require 'mysql2'
connection = Mysql2::Connection.new(:username => 'root', :password => 'password', :database => 'upsert_test')
table_name = :pets
upsert = Upsert.new(connection, table_name)

On JRuby, use the jdbc-mysql driver.

require 'jdbc/mysql'
java.sql.DriverManager.register_driver com.mysql.jdbc.Driver.new
connection = java.sql.DriverManager.get_connection "jdbc:mysql://127.0.0.1/mydatabase?user=root&password=password"

Speed

From the tests (updated 11/7/12):

Upsert was 82% faster than find + new/set/save
Upsert was 85% faster than find_or_create + update_attributes
Upsert was 90% faster than create + rescue/find/update
Upsert was 46% faster than faking upserts with activerecord-import (note: in question as of 3/13/15, need some expert advice)

SQL MERGE trick

Thanks to Dennis Hennen's StackOverflow response!!

CREATE PROCEDURE upsert_pets_SEL_name_A_tag_number_SET_name_A_tag_number(`name_sel` varchar(255), `tag_number_sel` int(11), `name_set` varchar(255), `tag_number_set` int(11))
BEGIN
  DECLARE done BOOLEAN;
  REPEAT
    BEGIN
      -- If there is a unique key constraint error then
      -- someone made a concurrent insert. Reset the sentinel
      -- and try again.
      DECLARE ER_DUP_UNIQUE CONDITION FOR 23000;
      DECLARE ER_INTEG CONDITION FOR 1062;
      DECLARE CONTINUE HANDLER FOR ER_DUP_UNIQUE BEGIN
        SET done = FALSE;
      END;

      DECLARE CONTINUE HANDLER FOR ER_INTEG BEGIN
        SET done = TRUE;
      END;

      SET done = TRUE;
      SELECT COUNT(*) INTO @count FROM `pets` WHERE `name` = `name_sel` AND `tag_number` = `tag_number_sel`;
      -- Race condition here. If a concurrent INSERT is made after
      -- the SELECT but before the INSERT below we'll get a duplicate
      -- key error. But the handler above will take care of that.
      IF @count > 0 THEN
        -- UPDATE table_name SET b = b_SET WHERE a = a_SEL;
        UPDATE `pets` SET `name` = `name_set`, `tag_number` = `tag_number_set` WHERE `name` = `name_sel` AND `tag_number` = `tag_number_sel`;
      ELSE
        -- INSERT INTO table_name (a, b) VALUES (k, data);
        INSERT INTO `pets` (`name`, `tag_number`) VALUES (`name_set`, `tag_number_set`);
      END IF;
    END;
  UNTIL done END REPEAT;
END

PostgreSQL

On MRI, use the pg driver.

require 'pg'
connection = PG.connect(:dbname => 'upsert_test')
table_name = :pets
upsert = Upsert.new(connection, table_name)

On JRuby, use the jdbc-postgres driver.

require 'jdbc/postgres'
java.sql.DriverManager.register_driver org.postgresql.Driver.new
connection = java.sql.DriverManager.get_connection "jdbc:postgresql://127.0.0.1/mydatabase?user=root&password=password"

If you want to use HStore, make the pg-hstore gem available and pass a Hash in setters:

gem 'pg-hstore'
require 'pg_hstore'
upsert.row({:name => 'Bill'}, :mydata => {:a => 1, :b => 2})

PostgreSQL notes

  • Upsert doesn't do any type casting, so if you attempt to do something like the following: upsert.row({ :name => 'A Name' }, :tag_number => 'bob') you'll get an error which reads something like: invalid input syntax for integer: "bob"

Speed

From the tests (updated 9/21/12):

Upsert was 72% faster than find + new/set/save
Upsert was 79% faster than find_or_create + update_attributes
Upsert was 83% faster than create + rescue/find/update
# (can't compare to activerecord-import because you can't fake it on pg)

SQL MERGE trick

Adapted from the canonical PostgreSQL upsert example:

CREATE OR REPLACE FUNCTION upsert_pets_SEL_name_A_tag_number_SET_name_A_tag_number("name_sel" character varying(255), "tag_number_sel" integer, "name_set" character varying(255), "tag_number_set" integer) RETURNS VOID AS
$$
DECLARE
  first_try INTEGER := 1;
BEGIN
  LOOP
    -- first try to update the key
    UPDATE "pets" SET "name" = "name_set", "tag_number" = "tag_number_set"
      WHERE "name" = "name_sel" AND "tag_number" = "tag_number_sel";
    IF found THEN
      RETURN;
    END IF;
    -- not there, so try to insert the key
    -- if someone else inserts the same key concurrently,
    -- we could get a unique-key failure
    BEGIN
      INSERT INTO "pets"("name", "tag_number") VALUES ("name_set", "tag_number_set");
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- seamusabshere 9/20/12 only retry

Related Skills

View on GitHub
GitHub Stars647
CategoryData
Updated3mo ago
Forks77

Languages

Ruby

Security Score

92/100

Audited on Dec 29, 2025

No findings