[Solved] Optimize data retrieval from different servers with more than 10M records

EverSQL Database Performance Knowledge Base

SELECT party_code, MAX(date) AS date
FROM server1.table1 WITH (NOLOCK)
GROUP BY party_code

UNION
SELECT party_code, MAX(date) AS date
FROM server2.table1 WITH (NOLOCK)
GROUP BY party_code

UNION
SELECT party_code, MAX(date) AS date
FROM server3.table1 WITH (NOLOCK)
GROUP BY party_code

As shown above, I have 17 such tables on different servers, and I UNION them to get the records. The total comes to more than 36 crore (360 million) rows, which hurts the query's execution time and the ability to retrieve the records. Can someone help me optimize this, or suggest another approach?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Use UNION ALL instead of UNION (query line: 17): Always use UNION ALL unless you need to eliminate duplicate records. By using UNION ALL, you'll avoid the expensive distinct operation the database applies when using a UNION clause.
Optimal indexes for this query:
CREATE INDEX table1_idx_party_code ON table1 (party_code);
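Because each branch of the query computes MAX(date) per party_code, an index that also contains the date column may let each server answer from the index alone, without touching the base table. The following is a minimal sketch under that assumption, written in SQL Server syntax to match the WITH (NOLOCK) hints in the query. The index name is hypothetical, and the statement would need to be run on each server that holds a copy of table1.

```sql
-- Sketch only: the index name is illustrative, not taken from the original answer.
-- Keying on (party_code, date) lets the GROUP BY party_code / MAX(date)
-- aggregation be satisfied from the index without reading the full rows.
CREATE NONCLUSTERED INDEX table1_idx_party_code_date
    ON table1 (party_code, [date]);
```

Whether this helps depends on the table's existing clustered index and on write volume, so it is worth comparing execution plans before and after creating it.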
The optimized query:
SELECT
    server1.table1.party_code,
    MAX(date) AS date
FROM
    server1.table1 WITH (NOLOCK)
GROUP BY
    server1.table1.party_code
UNION
SELECT
    server2.table1.party_code,
    MAX(date) AS date
FROM
    server2.table1 WITH (NOLOCK)
GROUP BY
    server2.table1.party_code
UNION
SELECT
    server3.table1.party_code,
    MAX(date) AS date
FROM
    server3.table1 WITH (NOLOCK)
GROUP BY
    server3.table1.party_code
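
The UNION ALL recommendation can be sketched as follows. This is one possible shape, not a definitive rewrite: it assumes the goal is a single latest date per party_code across all servers, which the question does not state explicitly. Under that assumption, each server still aggregates its own rows, the per-server results are combined with UNION ALL (no duplicate-elimination step), and an outer GROUP BY collapses them to one row per party_code. The names server1, server2, server3 and table1 are the placeholders used in the question; the derived-table alias per_server and the column alias max_date are introduced here for illustration.

```sql
-- Sketch only: assumes one row per party_code is wanted across all servers.
SELECT
    per_server.party_code,
    MAX(per_server.max_date) AS date
FROM (
    -- Each branch returns at most one row per party_code for its server.
    SELECT party_code, MAX(date) AS max_date
    FROM server1.table1 WITH (NOLOCK)
    GROUP BY party_code

    UNION ALL

    SELECT party_code, MAX(date) AS max_date
    FROM server2.table1 WITH (NOLOCK)
    GROUP BY party_code

    UNION ALL

    SELECT party_code, MAX(date) AS max_date
    FROM server3.table1 WITH (NOLOCK)
    GROUP BY party_code
) AS per_server
GROUP BY per_server.party_code;
```

Note that this returns a different result from the plain UNION form when the same party_code exists on more than one server: UNION keeps one row per distinct (party_code, date) pair, while the outer aggregation keeps only the latest date. If the per-server rows are what is needed, the outer SELECT can be dropped and the branches combined with UNION ALL alone, provided exact duplicate rows across servers are acceptable.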

* original question posted on StackOverflow here.