[Solved] Limit result rows for minimal time intervals for PostgreSQL

EverSQL Database Performance Knowledge Base


Background: I am running TeslaMate/Grafana to monitor my car's status; one of the gauges plots the battery level fetched from the database. My server is located remotely and runs in Docker on an old NAS, so both query performance and network overhead matter.

I found that the kiosk page frequently hangs, and by investigation it might be caused by this query: two of the plots return 10k~100k rows from the database. I want to limit the number of rows returned by the SQL queries, as the plots certainly don't need that much precision to draw such detailed intervals.

I tried to follow this answer and use row_number() to keep only every 100th row of the result, but a more complicated issue turned up: the time intervals between rows are not consistent.

The car has four statuses: driving / online / asleep / offline.

  1. If the car is in the driving status, the time interval can be less than 200 ms, as the car pushes its status whenever it has new data.
  2. If the car is in the online status, the time interval can be several minutes, as the system actively fetches the status from the car.
  3. Even worse, if the system thinks the car is about to sleep and stops fetching the status (to avoid preventing the car from sleeping), the interval can be up to 40 minutes, depending on settings.
  4. If the car is in the asleep/offline status, no data is recorded at all.

This obviously makes skipping every n-th row a bad idea: for cases 2-4 above, many data points could be missing, so Grafana cannot plot a correct graph representing the battery level at satisfactory precision.

I wonder if it is possible to skip rows by the time interval from a datetime field, rather than by row_number(), without much query overhead? I.e., fetch only rows that are at least 1000 ms after the previously returned row.

E.g., with the following data in the table, I want rows 1, 4 and 5 returned.

row     date
[1] 1610000001000
[2] 1610000001100
[3] 1610000001200
[4] 1610000002000
[5] 1610000005000
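One cheap way to get close to this result is to bucket rows into fixed 1-second windows and keep the first row of each window. This is a sketch, assuming `date` is stored as epoch milliseconds as in the sample above (in TeslaMate the column is actually a timestamp, in which case `date_trunc('second', date)` plays the same role as the integer division); it is not literally "1000 ms from the previous row", but a close approximation:

```sql
-- Integer division by 1000 assigns each row to a 1-second bucket;
-- DISTINCT ON keeps the earliest row in each bucket.
SELECT DISTINCT ON (date / 1000) date, battery_level
FROM positions
WHERE car_id = 1              -- placeholder for $car_id
ORDER BY date / 1000, date;
```

Applied to the sample data, rows 1-3 all fall into bucket 1610000001, so only row 1 survives; rows 4 and 5 each occupy their own bucket and are kept, giving exactly rows 1, 4 and 5.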

The current (problematic) method I am using is as follows:

SELECT $__time(t.date), t.battery_level AS "SOC [%]" 
FROM (
  SELECT date, battery_level, row_number() OVER(ORDER BY date ASC) AS row
  FROM (
    SELECT battery_level, date
    FROM positions
    WHERE car_id = $car_id AND $__timeFilter(date)
    UNION ALL
    SELECT battery_level, date
    FROM charges c 
    JOIN charging_processes p ON p.id = c.charging_process_id
    WHERE $__timeFilter(date) AND p.car_id = $car_id) AS data
  ORDER BY date ASC) as t
  WHERE t.row % 100 = 0;

This method clearly has the problem that it only returns every n-th row instead of what I actually want (e.g., if the last line reads t.row % 2 = 0, it simply returns alternate rows).
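A tempting window-function variant is to compute each row's gap to its predecessor with LAG() and drop rows closer than 1000 ms. This sketch assumes `date` holds epoch milliseconds as in the sample data; note the caveat below:

```sql
-- LAG() exposes the previous row's timestamp, so rows closer than
-- 1000 ms to their *immediate* predecessor can be filtered out.
SELECT date, battery_level
FROM (
  SELECT date, battery_level,
         date - LAG(date) OVER (ORDER BY date) AS gap_ms
  FROM positions
  WHERE car_id = 1            -- placeholder for $car_id
) AS g
WHERE gap_ms IS NULL OR gap_ms >= 1000;
```

The caveat: this measures the gap to the immediately preceding stored row, not to the last *kept* row. On the sample data it returns rows 1 and 5 only, because row 4 is just 800 ms after row 3 even though it is a full second after row 1; and during a long burst of sub-second driving samples, everything after the burst's first row would be dropped. Enforcing "at least 1000 ms since the last kept row" exactly requires a recursive query, which is why the bucketing approach is usually the practical choice.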

PS: please ignore the table structures and the UNION in the sample code; I haven't dug deep enough into the tables, and there could be other tweaks there, but they are irrelevant to this question anyway.

Thanks in advance!

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Calling Functions With Indexed Columns (query line: 19): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `date` is indexed, the index won’t be used as it’s wrapped with the function `$__timeFilter`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  2. Avoid Calling Functions With Indexed Columns (query line: 30): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `date` is indexed, the index won’t be used as it’s wrapped with the function `$__timeFilter`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  3. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX charges_idx_charging_id ON "charges" ("charging_process_id");
CREATE INDEX charging_processes_idx_car_id ON "charging_processes" ("car_id");
CREATE INDEX positions_idx_car_id ON "positions" ("car_id");
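Since both branches of the query combine an equality filter on `car_id` with a range filter on `date`, composite indexes may serve each branch with a single index scan. These are an assumption about the workload, not part of the generated recommendations above (equality column first, range column second, per the usual multicolumn-index rule):

```sql
-- Hypothetical composite indexes covering filter + time range together.
CREATE INDEX positions_idx_car_id_date ON "positions" ("car_id", "date");
CREATE INDEX charges_idx_process_id_date ON "charges" ("charging_process_id", "date");
```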
The optimized query:
SELECT
    $__time(t.date),
    t.battery_level AS "SOC [%]"
FROM (
    SELECT
        data.date,
        data.battery_level,
        row_number() OVER (ORDER BY data.date ASC) AS row
    FROM (
        SELECT
            positions.battery_level,
            positions.date
        FROM positions
        WHERE positions.car_id = $car_id
          AND $__timeFilter(positions.date)
        UNION ALL
        SELECT
            battery_level,
            date
        FROM charges c
        JOIN charging_processes p ON p.id = c.charging_process_id
        WHERE $__timeFilter(date)
          AND p.car_id = $car_id
    ) AS data
    ORDER BY data.date ASC
) AS t
WHERE t.row % 100 = 0
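For completeness, the time-interval idea from the question can be combined with the rewritten query above. This is a sketch, assuming `date` is a timestamp column (which the `$__timeFilter` macro suggests); `date_trunc('second', date)` buckets rows into 1-second windows and `DISTINCT ON` keeps the first row per window, replacing the `row_number() % 100` filter:

```sql
SELECT $__time(t.date), t.battery_level AS "SOC [%]"
FROM (
    -- Keep at most one row per 1-second bucket across both sources.
    SELECT DISTINCT ON (date_trunc('second', date)) date, battery_level
    FROM (
        SELECT battery_level, date
        FROM positions
        WHERE car_id = $car_id AND $__timeFilter(date)
        UNION ALL
        SELECT c.battery_level, c.date
        FROM charges c
        JOIN charging_processes p ON p.id = c.charging_process_id
        WHERE $__timeFilter(c.date) AND p.car_id = $car_id
    ) AS data
    ORDER BY date_trunc('second', date), date
) AS t
ORDER BY t.date;
```

Widening the bucket (e.g. `date_trunc('minute', date)`) trades precision for even fewer rows, which may suit the plot's resolution better than a fixed row-skip.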

* The original question was posted on StackOverflow.