I am attempting to use the following CQL3 statement to update a column family 50k times:
update column_family
set value_1 = ?,
value_2 = ?,
value_3 = ?,
value_4 = ?
where partition_key = ?
and column_key = ?;
The important detail is that partition_key is the same for all 50k records.
I either send Cassandra this query 50k times or batch 5000 statements at a time using BEGIN BATCH ... APPLY BATCH. Either way, it takes roughly 10 minutes, with no network latency to speak of. I know that the internal structure is one wide row. Is this why it is slow?
Also, do I have the internal structure correct? If the CF creation CQL looks like this:
create table column_family (
partition_key varchar,
column_key uuid,
value_1 int,
value_2 timestamp,
value_3 double,
value_4 double,
PRIMARY KEY(partition_key , column_key)
);
Then my internal CF would have partition_key as the partition key, and the column names would be column_key(0)#value_1, column_key(0)#value_2, column_key(0)#value_3, column_key(0)#value_4, column_key(1)#value_1, and so on.
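To make the layout I have in mind concrete, here is a small Python sketch that only models my assumption; it does not talk to Cassandra. The function name `assumed_cell_names` and the `#` separator are my own notation, not anything from Cassandra itself.

```python
import uuid

# Non-key columns from the table definition above.
VALUE_COLUMNS = ["value_1", "value_2", "value_3", "value_4"]


def assumed_cell_names(column_keys):
    """Return the internal column names I assume a single partition holds.

    This is only a model of my mental picture (column_key#value_n),
    not a description of Cassandra's real storage format.
    """
    return [f"{ck}#{col}" for ck in column_keys for col in VALUE_COLUMNS]


# Two hypothetical column_key values under the same partition_key.
keys = [uuid.uuid4() for _ in range(2)]
for name in assumed_cell_names(keys):
    print(name)
```

If this picture is right, 50k distinct column_key values would put 4 x 50,000 = 200,000 internal columns under the one partition key.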