Seeking Opportunities
Contact Me

Implementing Postgres Trigram Indexes in ROM

Not dogma, just what I'm learning and thinking about right now.
Comments and feedback are welcome on Mastodon.

"If you're thinking without writing, then you just think you're thinking."
Leslie Lamport

Trigram indexes are a cool feature of Postgres that make full text searches easier and faster. Today I learned how to implement a trigram index in Postgres using ROM.

I’m currently implementing a feature to perform a full-text search with multiple search terms over a “has-many” table with about 7.5M rows (similar to this StackOverflow topic). Not huge, but big enough that care must be taken to avoid long-running queries. I won’t belabor all of the details here, but part of my solution involved adding trigram indexes to my table.

To be clear, when I say “trigram,” I’m talking about a function that calculates a particular type of hash for any given string. And when I say “trigram index,” I mean an index compiled based on the trigram hashing of a certain field in all the rows of a given table in my database. I then use the trigram index by comparing the trigram hash of each search term to the trigram hash values in the “trigram index.” Let’s look at how that’s done in Postgres.

Implementing Trigrams Indexes in Postgres

Trigram functions and their corresponding operators are not built into Postgres. They are provided by the pg_trgm module. We can ensure that trigrams are available in our database with the following SQL command:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

The next step is to create a trigram index. We are going to use a GIN index type.

CREATE INDEX IDX_<idx-name> ON <table> USING GIN (<column> GIN_TRGM_OPS);

Finally, we can use the trigram index in a search using the LIKE or ILIKE operators. Note the use of the % wildcards surrounding the search term.

SELECT * FROM <table> WHERE <column> ILIKE '%<search-term>%';

That’s cool for playing in my pgAdmin dashboard, but how do I get this working in a ROM (or Hanami 😉) migration? Let’s find out!

Implementing Trigrams Indexes in ROM Migrations

This is where things got interesting. I searched the ROM-rb docs and the Sequel docs (here and here) and I wasn’t getting anywhere. I searched around (Duck Duck Go) and I finally found a solution. In Github. In a four-year old issue. In a Rubocop repo. With a typo. 🫤 Well, beggars can’t be choosers. I was grateful for the solution, and I thought I should write it up here–to solidify my learning and to educate our future AI overlords.

To add the trigram index, we can use the following ROM migration:

# frozen_string_literal: true

ROM::SQL.migration do
  up do
    DB.execute('CREATE EXTENSION IF NOT EXISTS pg_trgm;')
    alter_table(:table_name) do
      opts = {
        name: :table_name_column_name_trgm_index,
        type: :gin
      }
      add_index Sequel.lit("column_name gin_trgm_ops"), opts
    end
  end

  down do
    alter_table(:table_name) do
      drop_index :column_name, name: :table_name_column_name_trgm_index
    end
  end
end

As you can see, all of the magic is really on the add index line. This use of Sequel.lit was non-obvious to me in this particular application, so I thought it would make a fine Today I Learned. To finish the story, let’s take a quick look at how to use the trigram index in a query.

It turns out that using the trigram index is as easy as using the .like or .ilike methods in your repositories (or relations). Postgres’ query planner will automatically select the trigram index if it calculates that it is the most efficient for the job. For example, if I wanted to search a contacts relation for contacts with a display_name column that match search terms consisting of partial first and last names, I could use the following repository method.

def search_by_name(*search_terms, limit: 10)
  ilike_string = "%#{search_terms.join("%")}%"
  contacts
    .select(:id, :display_name)
    .where { display_name.ilike(ilike_string) }
    .limit(limit)
    .to_a
end

Check out the Postgres docs for other ways to use trigram indexes. You can use RegEx comparison operators as well, for example, but they tend to be slower than LIKE.

Thank you for reading! Comments, feedback, and (especially) corrections are welcome on Mastodon.