[Solved] Tuning Oracle SQL Query

EverSQL Database Performance Knowledge Base

I've got this large transaction database that I'm trying to pull data out of. Basically one of the fields in the header table has a '0' in it, and each time I execute this stored procedure, I want to get n number of rows from the header that have a 0, but only those rows that meet some criteria, one of which is that a joined table (indexed, of course) has a matching value to a constant I designate.

Here is the query:

select /*+ FIRST_ROWS(1000)*/ T.ID        
    from HEADER T 
    JOIN (
          SELECT /*+ parallel(TC,4) FULL(TC) */ 
          tc.ID, tc.SOURCE_ID, tc.CUSTOMER_IDENTIFIER FROM
          SUBTABLE tc, (SELECT ID, MAX(IDENT_SEQUENCE_ID) as IDENT_SEQUENCE_ID FROM SUBTABLE
          WHERE SOURCE_ID = 9002 AND upper(trim(CUSTOMER_IDENTIFIER)) <> 'UNKNOWN'  GROUP BY ID) maxtc
          WHERE maxtc.ID = tc.ID AND
          maxtc.IDENT_SEQUENCE_ID = tc.IDENT_SEQUENCE_ID
        ) cust
        ON t.ID = cust.ID
        where T.batch=0 and T.status=6  and rownum <= SOME_NUMBER_HERE;

I was hoping the "FIRST_ROWS" or the rownum limit would make this basically just look for the first "SOME_NUMBER_HERE" number of records and return, but instead it seems Oracle is scanning the whole table?

Is there any way to make it run faster, by stopping the select statement inside the join after it has found a certain number of matching rows?

I've got indexes on CUSTOMER_IDENTIFIER & SOURCE_ID on the SUBTABLE, and BATCH/STATUS on the HEADER.

This runs sub-second when I've got a hundred thousand rows, but takes several minutes when running against multiple millions of rows... Thanks in advance for any assistance...

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.

The optimization process and recommendations:

  1. Avoid Calling Functions With Indexed Columns (query line: 20): When a function is applied directly to an indexed column, the optimizer generally cannot use a regular index on that column. Here, a plain index on `CUSTOMER_IDENTIFIER` is bypassed because the column is wrapped in `upper(trim(...))`. If the condition cannot be rewritten without the function call, possible solutions are to store the normalized value in a new indexed column or, in Oracle, to create a function-based index on the expression.
  2. Avoid Correlated Subqueries (query line: 7): A correlated subquery references a column of the outer query, so it may have to be re-evaluated for each outer row. The subquery here reads `SUBTABLE`, which also appears in the enclosing query, and is joined back to it on `ID`. Queries of this shape can usually be rewritten with a join clause, and the optimizer often has more freedom to pick an efficient execution plan for a join than for a correlated subquery.
  3. Avoid Subqueries (query line: 13): An aggregating subquery such as this `GROUP BY` may not be optimized well, because the whole aggregate typically has to be computed before it can be joined. One alternative is to store its result in a temporary table that is indexed on the join column, and to join to that table instead.
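
Recommendation 1 could be sketched in Oracle as a function-based index. This is only an illustration under assumptions: the index name `SUBTABLE_SRC_CUST_FBI` is made up, and the column order assumes the `SOURCE_ID = 9002` equality filter is always present. Because a `<>` predicate is rarely selective, the leading `SOURCE_ID` column is likely to do most of the filtering; the expression column mainly lets Oracle evaluate `upper(trim(...))` from the index instead of the table.

```sql
-- Hypothetical index name; the expression must match the one used in the query.
CREATE INDEX SUBTABLE_SRC_CUST_FBI
    ON SUBTABLE (SOURCE_ID, UPPER(TRIM(CUSTOMER_IDENTIFIER)));

-- Refresh optimizer statistics so the new index can be costed.
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SUBTABLE');
END;
/
```

Whether the optimizer actually picks this index depends on the data distribution and on any hints in the query, so it is worth comparing execution plans before and after creating it.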

The optimized query:

SELECT
        T.ID 
    FROM
        HEADER T 
    JOIN
        (
            SELECT
                tc.ID,
                tc.SOURCE_ID,
                tc.CUSTOMER_IDENTIFIER 
            FROM
                SUBTABLE tc,
                (SELECT
                    SUBTABLE.ID,
                    MAX(IDENT_SEQUENCE_ID) AS IDENT_SEQUENCE_ID 
                FROM
                    SUBTABLE 
                WHERE
                    SUBTABLE.SOURCE_ID = 9002 
                    AND upper(trim(SUBTABLE.CUSTOMER_IDENTIFIER)) <> 'UNKNOWN' 
                GROUP BY
                    SUBTABLE.ID) maxtc 
            WHERE
                maxtc.ID = tc.ID 
                AND maxtc.IDENT_SEQUENCE_ID = tc.IDENT_SEQUENCE_ID
            ) cust 
                ON t.ID = cust.ID 
        WHERE
            T.batch = 0 
            AND T.status = 6 
            AND rownum <= SOME_NUMBER_HERE
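
The query above still aggregates every qualifying `SUBTABLE` row before the `rownum` limit can take effect, which may explain the full scan described in the question. One possible alternative, not part of the generated output, is an `EXISTS` rewrite. It is a sketch under these assumptions: only `T.ID` is needed from the join; it is enough that at least one `SUBTABLE` row with `SOURCE_ID = 9002` and a non-'UNKNOWN' identifier exists for the header row; and an index on `SUBTABLE` leading with `ID` is available. The bind variable `:row_limit` is a hypothetical stand-in for `SOME_NUMBER_HERE`. Unlike the original join, this form returns each header row at most once.

```sql
SELECT t.ID
FROM   HEADER t
WHERE  t.batch  = 0
AND    t.status = 6
AND    EXISTS (
           -- Probe SUBTABLE per header row; the probe can stop at the first match.
           SELECT 1
           FROM   SUBTABLE tc
           WHERE  tc.ID = t.ID
           AND    tc.SOURCE_ID = 9002
           AND    UPPER(TRIM(tc.CUSTOMER_IDENTIFIER)) <> 'UNKNOWN'
       )
AND    ROWNUM <= :row_limit;
```

To check whether Oracle really stops early, the plan can be inspected with `EXPLAIN PLAN` and `DBMS_XPLAN.DISPLAY`. A `COUNT STOPKEY` step above a semi-join or nested-loops step, rather than a full scan feeding a `HASH GROUP BY`, would suggest the limit is being applied as rows are found.

```sql
EXPLAIN PLAN FOR
SELECT t.ID
FROM   HEADER t
WHERE  t.batch = 0
AND    t.status = 6
AND    EXISTS (SELECT 1
               FROM   SUBTABLE tc
               WHERE  tc.ID = t.ID
               AND    tc.SOURCE_ID = 9002
               AND    UPPER(TRIM(tc.CUSTOMER_IDENTIFIER)) <> 'UNKNOWN')
AND    ROWNUM <= :row_limit;

-- Show the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```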

* Original question posted on StackOverflow.