[Solved] Index not used with many WHERE LOWER(column) IN (...) conditions, works fine with just a few
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Index not used with many WHERE LOWER(column) IN (...) conditions, works fine with just a few

Database type:

I have the following table:

CREATE TABLE public.devices
(
  id integer NOT NULL DEFAULT nextval('devices_id_seq'::regclass),
  token text NOT NULL,
  ...some other columns...,
  CONSTRAINT devices_pkey PRIMARY KEY (id, token)
)
CREATE INDEX IF NOT EXISTS idx_devices_lower_token ON devices (lower(token));

If I run the following query, everything is fine, the query runs instantly, and the index is used:

SELECT * 
FROM "devices"  
WHERE (id > '0') 
AND ((lower(token) IN ('92134f4bb6a2263fd044a5a04440bc1a','cbf7d4a094bd266fd3f1b7ef4196172a', 'aa0f7650cf28010b5c01b77ae206f7fb'))) 
ORDER BY id LIMIT 1000;
"Limit  (cost=58116.96..58119.46 rows=1000 width=345)"
"  ->  Sort  (cost=58116.96..58189.47 rows=29005 width=345)"
"        Sort Key: id"
"        ->  Bitmap Heap Scan on devices  (cost=1061.91..56526.65 rows=29005 width=345)"
"              Recheck Cond: (lower(token) = ANY ('{92134f4bb6a2263fd044a5a04440bc1a,cbf7d4a094bd266fd3f1b7ef4196172a}'::text[]))"
"              Filter: (id > 0)"
"              ->  Bitmap Index Scan on idx_devices_lower_token  (cost=0.00..1054.65 rows=29005 width=0)"
"                    Index Cond: (lower(token) = ANY ('{92134f4bb6a2263fd044a5a04440bc1a,cbf7d4a094bd266fd3f1b7ef4196172a}'::text[]))"

However, if I add a lot of conditions (e.g. 1k) for the (lower(token) IN (...)), the index will no longer be used anymore and the query will be VERY slow:

SELECT * 
FROM "devices"  
WHERE (id > '0') 
AND ((lower(token) IN ('92134f4bb6a2263fd044a5a04440bc1a','cbf7d4a094bd266fd3f1b7ef4196172a', 'aa0f7650cf28010b5c01b77ae206f7fb', ... up to 1k tokens ...))) 
ORDER BY id 
LIMIT 1000;
"Limit  (cost=3900016.07..3900018.57 rows=1000 width=345)"
"  ->  Sort  (cost=3900016.07..3907219.10 rows=2881214 width=345)"
"        Sort Key: id"
"        ->  Seq Scan on devices  (cost=0.00..3742042.21 rows=2881214 width=345)"
"              Filter: ((id > 0) AND (lower(token) = ANY ('{92134f4bb6a2263fd044a5a04440bc1a,cbf7d4a094bd266fd3f1b7ef4196172a,6116af468e0017ffe5c49309869be773,4a5ebc1cec6c452e74fbf1a4e22fe804,0a9981eaf530d22ad9fe12a73a0b486c,61eb008e86a5afeaa9edd3e56ba01e0b,a31775a866571d9c2ecadc6b2a8127a3,dfcb4bc34771883bb2a54ee44299b058,7c93581a24053021bd41b7cc4654bf9e,43182b0010cea255fef1d10849858d7c,c3f6503eafeaa1d25a6f5672310ef4e5,672641b1fcd23f67deab34deaf28f1d8,cb52e085618ecbdf6eef1669b7b38008,335626a32455e61e1b1de16550e43af0, (...)"

Ok so, it must not like many conditions, right?

Well, I tried replicating the same scenario but without using lower(token), and it works correctly:

create index IF NOT EXISTS idx_devices_token on devices (token);

SELECT * 
FROM "devices"  
WHERE (id > '0') 
AND ((token IN ('92134f4bb6a2263fd044a5a04440bc1a','cbf7d4a094bd266fd3f1b7ef4196172a', 'aa0f7650cf28010b5c01b77ae206f7fb', ... 1k tokens ...))) 
ORDER BY id 
LIMIT 1000;
"Limit  (cost=25.70..25.71 rows=3 width=345)"
"  ->  Sort  (cost=25.70..25.71 rows=3 width=345)"
"        Sort Key: id"
"        ->  Bitmap Heap Scan on devices  (cost=13.69..25.68 rows=3 width=345)"
"              Recheck Cond: (token = ANY ('{92134f4bb6a2263fd044a5a04440bc1a,cbf7d4a094bd266fd3f1b7ef4196172a,aa0f7650cf28010b5c01b77ae206f7fb}'::text[]))"
"              Filter: (id > 0)"
"              ->  Bitmap Index Scan on idx_devices_token  (cost=0.00..13.69 rows=3 width=0)"
"                    Index Cond: (token = ANY ('{92134f4bb6a2263fd044a5a04440bc1a,cbf7d4a094bd266fd3f1b7ef4196172a,aa0f7650cf28010b5c01b77ae206f7fb}'::text[]))"

So this seems to only happen if I use an index with lower and many WHERE IN conditions.

I have no idea what is going on here, any ideas?

I am using PostgreSQL 9.5

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Calling Functions With Indexed Columns (query line: 11): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `token` is indexed, the index won’t be used as it’s wrapped with the function `lower`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  2. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  3. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  4. Use Numeric Column Types For Numeric Values (query line: 7): Referencing a numeric value (e.g. 0) as a string in a WHERE clause might result in poor performance. Possible impacts of storing numbers as varchars: more space will be used, you won't be able to perform arithmetic operations, the data won't be self-validated, aggregation functions like SUM won't work, the output may sort incorrectly and more. If the column is numeric, remove the quotes from the constant value, to make sure a numeric comparison is done.
Optimal indexes for this query:
CREATE INDEX devices_idx_id ON "devices" ("id");
The optimized query:
SELECT
        * 
    FROM
        "devices" 
    WHERE
        (
            "devices".id > '0'
        ) 
        AND (
            (
                lower("devices".token) IN (
                    '92134f4bb6a2263fd044a5a04440bc1a', 'cbf7d4a094bd266fd3f1b7ef4196172a', 'aa0f7650cf28010b5c01b77ae206f7fb'
                )
            )
        ) 
    ORDER BY
        "devices".id LIMIT 1000

Related Articles



* original question posted on StackOverflow here.