[Solved] Cassandra CQL3 update slow performance on a single wide row

I am attempting to use the following CQL3 statement to update a column family 50k times:

 update column_family
 set    value_1    = ?,   
        value_2    = ?,   
        value_3    = ?,   
        value_4    = ?    
 where  partition_key = ?                
 and    column_key    = ?;     

The important piece to state here is that the partition_key is the same for all 50k records.

I either send Cassandra this query 50k times, or batch up 5,000 at a time using BEGIN BATCH ... APPLY BATCH. Either way, it takes roughly 10 minutes with no network latency to speak of. I know that the internal structure is one wide row. Is this why it is slow?
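For illustration, here is a minimal Python sketch of the batching arithmetic described above. It only builds parameter chunks and batch text and does not connect to Cassandra. The helper names `chunked` and `render_batch` and the sample values are hypothetical, and the statement text simply mirrors the update shown earlier.

```python
import uuid
from datetime import datetime, timezone
from typing import Iterator, List, Sequence

# Statement text mirroring the update above; each ? is a bind marker.
UPDATE_CQL = (
    "UPDATE column_family "
    "SET value_1 = ?, value_2 = ?, value_3 = ?, value_4 = ? "
    "WHERE partition_key = ? AND column_key = ?"
)


def chunked(items: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def render_batch(statement_count: int) -> str:
    """Build the CQL text of one batch wrapping `statement_count` updates."""
    body = "\n".join(f"  {UPDATE_CQL};" for _ in range(statement_count))
    return f"BEGIN BATCH\n{body}\nAPPLY BATCH;"


# Placeholder data: 50k updates that all share one partition key (assumed values).
now = datetime.now(timezone.utc)
updates = [
    (i, now, 0.0, 0.0, "same_partition", uuid.uuid4())
    for i in range(50_000)
]

# 50,000 updates in groups of 5,000 give 10 batches.
batches = list(chunked(updates, 5_000))
```

Each element of `batches` holds the bound values for one batch, and `render_batch(len(batch))` gives the matching statement text. Every one of these statements still targets the same partition.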

Also do I have the internal structure correct? If the CF creation CQL looks like this:

create table column_family (
    partition_key varchar,
    column_key uuid,
    value_1 int,
    value_2 timestamp,
    value_3 double,
    value_4 double,
    PRIMARY KEY(partition_key, column_key)
);

Then my internal CF would have partition_key as the partition key, and the column keys would be column_key(0)#value_1, column_key(0)#value_2, column_key(0)#value_3, column_key(0)#value_4, column_key(1)#value_1, and so on.
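The layout described above can be sketched in Python as follows. This only models the mental picture in the question (one wide row per partition key, with internal column names written as column_key#value_name). It is not a verified description of Cassandra's storage engine, and the function name `wide_row_layout` and the sample keys are hypothetical.

```python
from typing import Dict, List

# Non-key columns from the table definition above.
VALUE_COLUMNS = ["value_1", "value_2", "value_3", "value_4"]


def wide_row_layout(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group CQL rows by partition key and list the internal column names,
    using the column_key#value_name notation from the question."""
    layout: Dict[str, List[str]] = {}
    for row in rows:
        cells = layout.setdefault(row["partition_key"], [])
        cells.extend(f"{row['column_key']}#{name}" for name in VALUE_COLUMNS)
    return layout


# Two CQL rows that share a partition key (placeholder keys, not real uuids).
rows = [
    {"partition_key": "p", "column_key": "ck0"},
    {"partition_key": "p", "column_key": "ck1"},
]
layout = wide_row_layout(rows)
```

With these two rows, `layout` has a single entry for "p" holding eight column names, four per column_key. The sketch keeps input order, whereas Cassandra orders the entries by the clustering column.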

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `column_family` ADD INDEX `column_family_idx_partition_key_column_key` (`partition_key`,`column_key`);
The optimized query:
        column_family.partition_key = ? 
        AND column_family.column_key = ?

* original question posted on StackOverflow here.