I'm writing an application in Java using plain, vendor-neutral SQL (no MySQL- or SQL Server-specific features), because it may have to run on either database. One data-persist operation has to read the existing data out of the DB, compare it with what has been submitted, and then insert, update or delete accordingly.
I've improved the performance of the operation considerably by batching the JDBC calls.
For the INSERTs, I just call Statement.addBatch() for the whole data set to be inserted, and the JDBC driver turns that into
INSERT INTO data (parentId, seriesDate, valueDate, value)
VALUES (a,b,c,d),(a,b,e,f),(a,b,g,h)... etc
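In Java the batching looks roughly like this sketch (Row and rows are stand-ins for my actual data holder):

// Sketch: batched INSERT; Row/rows stand in for my real data holder.
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO data (parentId, seriesDate, valueDate, value) VALUES (?, ?, ?, ?)")) {
    for (Row row : rows) {
        ps.setInt(1, row.parentId);
        ps.setDate(2, row.seriesDate);   // java.sql.Date
        ps.setDate(3, row.valueDate);
        ps.setFloat(4, row.value);
        ps.addBatch();
    }
    // With rewriteBatchedStatements=true, the MySQL driver sends this
    // as one multi-row INSERT rather than 1000 separate statements.
    ps.executeBatch();
}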
For the DELETEs, I just delete the whole lot with
DELETE FROM data WHERE parentId = a AND seriesDate = b;
and then re-insert them. (It may be better to take another approach and compose one big
DELETE FROM data WHERE (parentId = 1 AND seriesDate = b)
OR (parentId = 2 AND seriesDate = c)
OR (parentId = 3 AND seriesDate = d) ...
as in the sketch below, but that's not the issue here.)
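A rough sketch of how I'd build that composite DELETE, for completeness (KeyPair and pairs are hypothetical stand-ins for my submitted keys):

// Sketch: one DELETE with an OR group per (parentId, seriesDate) pair.
// Assumes pairs is non-empty; KeyPair stands in for my actual key holder.
StringJoiner groups = new StringJoiner(" OR ");
for (int i = 0; i < pairs.size(); i++) {
    groups.add("(parentId = ? AND seriesDate = ?)");
}
try (PreparedStatement ps = conn.prepareStatement("DELETE FROM data WHERE " + groups)) {
    int idx = 1;
    for (KeyPair p : pairs) {
        ps.setInt(idx++, p.parentId);
        ps.setDate(idx++, p.seriesDate);  // java.sql.Date
    }
    ps.executeUpdate();
}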
My main problem is that the UPDATEs are really slow: twice as slow as the INSERTs.
I get 1000 separate statements:
UPDATE data SET value = 4
WHERE parentId = 1 AND seriesDate = '' AND valueDate = '';
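On the Java side this is the same addBatch() pattern as the INSERTs (again a sketch, using the same hypothetical Row holder):

try (PreparedStatement ps = conn.prepareStatement(
        "UPDATE data SET value = ? WHERE parentId = ? AND seriesDate = ? AND valueDate = ?")) {
    for (Row row : rows) {
        ps.setFloat(1, row.value);
        ps.setInt(2, row.parentId);
        ps.setDate(3, row.seriesDate);
        ps.setDate(4, row.valueDate);
        ps.addBatch();
    }
    ps.executeBatch();  // MySQL still executes these as 1000 individual UPDATEs
}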
In SQL Server, the UPDATEs are just as quick as the INSERTs, but in MySQL I'm seeing them run ten times slower.
I'm hoping I've forgotten some approach that would work on both databases, or missed some JDBC connection configuration I need to adjust, maybe in conjunction with the number of items I put in each batch.
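For example, the kind of connection configuration I mean is the Connector/J URL flags, something like this (host and schema name are placeholders; rewriteBatchedStatements is what enables the multi-row INSERT rewrite, and I'm unsure whether the prepared-statement flags matter here):

jdbc:mysql://dbhost:3306/mydb?rewriteBatchedStatements=true&useServerPrepStmts=false&cachePrepStmts=true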
[UPDATE 2018-05-17] Here's the requested DDL - and unfortunately I can't change this (yet) so any suggestions that involve schema changes won't help, at least not this year :(
CREATE TABLE data (
parentId INT UNSIGNED NOT NULL,
seriesDate DATE NOT NULL,
valueDate DATE NOT NULL,
value FLOAT NOT NULL,
versionstamp INT UNSIGNED NOT NULL DEFAULT 1,
createdDate DATETIME DEFAULT CURRENT_TIMESTAMP,
last_modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
CONSTRAINT pk_data PRIMARY KEY (parentId, seriesDate, valueDate),
CONSTRAINT fk_data_forecastid FOREIGN KEY (parentId)
REFERENCES forecast (id)
) MAX_ROWS 222111000;
CREATE TRIGGER trg_data_update BEFORE UPDATE ON data
FOR EACH ROW SET NEW.versionstamp = OLD.versionstamp + 1;
CREATE INDEX ix_data_seriesdate ON `data` (seriesDate);
The INSERT:
INSERT INTO `data` (`parentId`, `valueDate`, `value`, `seriesDate`)
VALUES (52031,'2010-04-20',1.12344,'2013-01-10')
EXPLAIN PLAN:
id: 1
select_type: INSERT
table: data
partitions:
type: ALL
possible_keys: PRIMARY,ix_data_seriesdate
and the UPDATE:
UPDATE `data` SET `value` = -2367.0
WHERE `parentId` = 52005 AND `seriesDate` = '2018-04-20' AND `valueDate` = '2000-02-11'
EXPLAIN PLAN:
id: 1
select_type: UPDATE
table: data
partitions:
type: range
possible_keys: PRIMARY,ix_data_seriesdate
key: PRIMARY
key_len: 10
ref: const,const,const
rows: 1
filtered: 100
Extra: Using where
and the DELETE:
DELETE FROM `data` WHERE `parentId` = 52030 AND `seriesDate` = '2018-04-20'
EXPLAIN PLAN:
id: 1
select_type: DELETE
table: data
partitions:
type: range
possible_keys: PRIMARY,ix_data_seriesdate
key: PRIMARY
key_len: 7
ref: const,const
rows: 1
filtered: 100
Extra: Using where
FYI, two fields are updated automatically: last_modified by the ON UPDATE clause and versionstamp by the trigger (and again, I can't ditch that functionality).