[Solved] How do I increase the speed of a large series of UPDATEs in mySQL vs SQL Server?

EverSQL Database Performance Knowledge Base

How do I increase the speed of a large series of UPDATEs in mySQL vs SQL Server?

Database type:

I have an application which I'm writing in Java with simple SQL, so no custom MySQL or SQL Server here - it might have to run on either. One data persist operation has to grab the data out of the DB, compare it with what has been submitted and then insert, update or delete accordingly.

I've improved the performance of the operation considerably by batching the JDBC calls.

So my INSERTs - I just call the Statement.addBatch() method for the whole data set to be inserted, and the JDBC driver creates

INSERT INTO data (parentId, seriesDate, valueDate, value) 
    VALUES (a,b,c,d),(a,b,e,f),(a,b,g,h)... etc

The DELETEs - I just delete the whole lot with

DELETE FROM data WHERE parentId = a AND seriesDate = b;

and I can re-insert them. (It may be better to take another approach by composing a big long

DELETE FROM data WHERE (parentId = 1 AND seriesDate = b) 
    OR (parentId = 2 AND seriesDate = c) 
    OR (parentId = 3 AND seriesDate = d)  ...

but that's not the issue here, my main problem is that the UPDATEs are really slow - twice as slow as the INSERTs

I get 1000 separate statements:

UPDATE data SET value = 4 
    WHERE parentId = 1 AND seriesDate = '' AND valueDate = '';

In SQL Server, the UPDATEs are just as quick as the INSERTs, but in MySQL I am seeing it run 10 x slower.

I am hoping I've forgotten some mutually compatible approach, or missed out on some JDBC connection configuration I need to adjust, maybe in conjunction with the number of items I'm putting in each batch.

[UPDATE 2018-05-17] Here's the requested DDL - and unfortunately I can't change this (yet) so any suggestions that involve schema changes won't help, at least not this year :(

CREATE TABLE data (
  parentId INT UNSIGNED NOT NULL,
  seriesDate DATE NOT NULL,
  valueDate DATE NOT NULL,
  value FLOAT NOT NULL,
  versionstamp INT UNSIGNED NOT NULL DEFAULT 1,
  createdDate DATETIME DEFAULT CURRENT_TIMESTAMP,
  last_modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  CONSTRAINT pk_data PRIMARY KEY (parentId, seriesDate, valueDate),
  CONSTRAINT fk_data_forecastid FOREIGN KEY (parentId)
    REFERENCES forecast (id)
) MAX_ROWS 222111000;

CREATE TRIGGER trg_data_update BEFORE UPDATE ON data
  FOR EACH ROW SET NEW.versionstamp = OLD.versionstamp + 1;

CREATE INDEX ix_data_seriesdate ON `data` (seriesDate);

The INSERT:

INSERT INTO `data` (`parentId`, `valueDate`, `value`, `seriesDate`)
    VALUES (52031,'2010-04-20',1.12344,'2013-01-10')

EXPLAIN PLAN:
id: 1
select_type: INSERT
table: data
partitions:
type: ALL
possible_keys: PRIMARY,ix_data_seriesdate

and the UPDATE:

UPDATE `data` SET `value` = -2367.0
    WHERE `parentId` = 52005 AND `seriesDate` = '2018-04-20' AND `valueDate` = '2000-02-11'

EXPLAIN PLAN:
id: 1
select_type: UPDATE
table: data
partitions: 
type: range
possible_keys: PRIMARY,ix_data_seriesdate
key: PRIMARY
key_len: 10
ref: const,const,const
rows: 1
filtered: 100
Extra: Using where

and the DELETE:

DELETE FROM `data` WHERE `parentId` = 52030 AND `seriesDate` = '2018-04-20'

EXPLAIN PLAN:
id: 1
select_type: DELETE
table: data
partitions: 
type: range
possible_keys: PRIMARY,ix_data_seriesdate
key: PRIMARY
key_len: 7
ref: const,const
rows: 1
filtered: 100
Extra: Using where

FYI 2 fields are updated automatically - last_modified by the ON UPDATE clause and versionstamp by the trigger (and again, I can't ditch that functionality).

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `data` ADD INDEX `data_idx_parenti_a_seriesd_b` (`parentId`,`a`,`seriesDate`,`b`);
The optimized query:
SELECT
        1 
    FROM
        data 
    WHERE
        data.parentId = data.a 
        AND data.seriesDate = data.b

Related Articles



* original question posted on StackOverflow here.