[Solved] Dynamically add columns to query results via CakePHP 3 ORM queries
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Dynamically add columns to query results via CakePHP 3 ORM queries

Database type:

I'm trying to write a query using CakePHP 3.7 ORM where it needs to add a column to the result set. I know in MySQL this sort of thing is possible: MySQL: Dynamically add columns to query results

So far I've implemented 2 custom finders. The first is as follows:

// src/Model/Table/SubstancesTable.php
public function findDistinctSubstancesByOrganisation(Query $query, array $options)
{
    $o_id = $options['o_id'];

    $query = $this
        ->find()
        ->select('id')
        ->distinct('id')
        ->contain('TblOrganisationSubstances')
        ->where([
            'TblOrganisationSubstances.o_id' => $o_id,
            'TblOrganisationSubstances.app_id IS NOT' => null
        ])
        ->orderAsc('Substances.app_id')
        ->enableHydration(false);

    return $query;
}

The second custom finder:

// src/Model/Table/RevisionSubstancesTable.php
public function findProductNotifications(Query $query, array $options)
{
    $date_start = $options['date_start'];
    $date_end = $options['date_end'];

    $query = $this
        ->find()
        ->where([
            'RevisionSubstances.date >= ' => $date_start,
            'RevisionSubstances.date <= ' => $date_end
        ])
        ->contain('Substances')
        ->enableHydration(false);

    return $query;
}

I'm using the finders inside a Controller to test it out:

$Substances = TableRegistry::getTableLocator()->get('Substances');
$RevisionSubstances = TableRegistry::getTableLocator()->get('RevisionSubstances');

$dates = // method to get an array which has keys 'date_start' and 'date_end' used later.

$org_substances = $Substances->find('distinctSubstancesByOrganisation', ['o_id' => 123);

if (!$org_substances->isEmpty()) {

    $data = $RevisionSubstances
        ->find('productNotifications', [
            'date_start' => $dates['date_start'],
            'date_end' => $dates['date_end']
        ])
        ->where([
            'RevisionSubstances.substance_id IN' => $org_substances
        ])
        ->orderDesc('RevisionSubstances.date');

    debug($data->toArray());
}

The logic behind this is that I'm using the first custom finder to produce a Query Object which contains unique (DISTINCT in SQL) id fields from the substances table, based on a particular company (denoted by the o_id field). These are then fed into the second custom finder by implementing where(['RevisionSubstances.substance_id IN' ....

This works and gives me all the correct data. An example of the output from the debug() statement is as follows:

(int) 0 => [
    'id' => (int) 281369,
    'substance_id' => (int) 1,
    'date' => object(Cake\I18n\FrozenDate) {

        'time' => '2019-09-02T00:00:00+00:00',
        'timezone' => 'UTC',
        'fixedNowTime' => false

    },
    'comment' => 'foo',
    'substance' => [
        'id' => (int) 1,
        'app_id' => 'ID000001',
        'name' => 'bar',
        'date' => object(Cake\I18n\FrozenDate) {

            'time' => '2019-07-19T00:00:00+00:00',
            'timezone' => 'UTC',
            'fixedNowTime' => false

        }
    ]
],

The problem I'm having is as follows: Each of the results returned contains a app_id field (['substance']['app_id'] in the array above). What I need to do is perform a count (COUNT() in MySQL) on another table based on this, and then add that to the result set.

I'm unsure how to do this for a couple of reasons. Firstly, my understanding is that custom finders return Query Objects, but the query is not executed at this point. Because I haven't executed the query - until calling $data->toArray() - I'm unsure how I would refer to the app_id in a way where it could be referenced per row?

The equivalent SQL that would give me the required results is this:

SELECT COUNT (myalias.app_id) FROM (
    SELECT
        DISTINCT (tbl_item.i_id),
        tbl_item.i_name,
        tbl_item.i_code,
        tbl_organisation_substances.o_id,
        tbl_organisation_substances.o_sub_id,
        tbl_organisation_substances.app_id,
        tbl_organisation_substances.os_name
    FROM
        tbl_organisation_substances
    JOIN tbl_item_substances
        ON tbl_organisation_substances.o_sub_id = tbl_item_substances.o_sub_id 
    JOIN tbl_item
        ON tbl_item.i_id = tbl_item_substances.i_id
    WHERE
        tbl_item.o_id = 1
        AND
        tbl_item.date_valid_to IS NULL
        AND
        tbl_organisation_substances.app_id IS NOT NULL
    ORDER BY
        tbl_organisation_substances.app_id ASC
) AS myalias
WHERE myalias.app_id = 'ID000001'

This does a COUNT() where the app_id is ID000001.

So in the array I've given previously I need to add something to the array to hold this, e.g.

 'substance' => [
     // ...
 ],
 'count_app_ids' => 5

(Assuming there were 5 rows returned by the query above).

I have Table classes for all of the tables referred to in the above query.

So my question is, how do you write this using the ORM, and add the result back to the result set before the query is executed?

Is this even possible? The only other solution I can think of is to write the data (from the query I have that works) to a temporary table and then perform successive queries which UPDATE with the count figure based on the app_id. But I'm really not keen on that solution because there are potentially huge performance problems of doing this. Furthermore I'd like to be able to paginate my query so ideally need everything confined to 1 SQL statement, even if it's done across multiple finders.

I've tagged this with MySQL as well as CakePHP because I'm not even sure if this is achievable from a MySQL perspective although it does look on the linked SO post like it can be done? This has the added complexity of having to write the equivalent query using Cake's ORM.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Subqueries (query line: 4): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  3. Push Filtering Conditions Into Subqueries (modified query below): Parts of the WHERE clause can pushed from the outer query to a subquery / union clause. Applying those conditions as early as possible will allow the database to scan less data and run the query more efficiently.
Optimal indexes for this query:
ALTER TABLE `tbl_item` ADD INDEX `tbl_item_idx_o_id_date_to_i_id` (`o_id`,`date_valid_to`,`i_id`);
ALTER TABLE `tbl_item_substances` ADD INDEX `tbl_substances_idx_o_id` (`o_sub_id`);
ALTER TABLE `tbl_organisation_substances` ADD INDEX `tbl_substances_idx_app_id` (`app_id`);
The optimized query:
SELECT
        COUNT(myalias.app_id) 
    FROM
        (SELECT
            DISTINCT (tbl_item.i_id),
            tbl_item.i_name,
            tbl_item.i_code,
            tbl_organisation_substances.o_id,
            tbl_organisation_substances.o_sub_id,
            tbl_organisation_substances.app_id,
            tbl_organisation_substances.os_name 
        FROM
            tbl_organisation_substances 
        JOIN
            tbl_item_substances 
                ON tbl_organisation_substances.o_sub_id = tbl_item_substances.o_sub_id 
        JOIN
            tbl_item 
                ON tbl_item.i_id = tbl_item_substances.i_id 
        WHERE
            (
                tbl_item.o_id = 1 
                AND tbl_item.date_valid_to IS NULL 
                AND tbl_organisation_substances.app_id IS NOT NULL
            ) 
            AND (
                tbl_organisation_substances.app_id = 'ID000001'
            ) 
        ORDER BY
            tbl_organisation_substances.app_id ASC) AS myalias 
    WHERE
        1 = 1

Related Articles



* original question posted on StackOverflow here.