[Solved] How can I form MySQL query to return required rows based on these 4 related tables?

EverSQL Database Performance Knowledge Base

How can I form MySQL query to return required rows based on these 4 related tables?

Database type:

I have a site where I am administering guitar courses. As part of a course, there are various content types, such as assignment, quiz, discussion topic.

A course itself, as well as content such as assignment, topic, and quiz are stored in same table. Here is example:

-- schema
CREATE TABLE posts (
    id INT NOT NULL,
    post_type VARCHAR(16) NOT NULL,
    post_title VARCHAR(64) NOT NULL,
    post_parent INT NOT NULL,
    PRIMARY KEY(id)
)
;

-- data
INSERT INTO posts
    (id, post_type, post_title, post_parent)
VALUES
    (1, 'course', 'Course1', 0),
    (2, 'course-topic', 'An Assignment for Course1', 0),
    (3, 'quiz', 'A Quiz for Course1', 0),
    (4, 'quiz', 'Another Quiz for Course1', 0),
    (5, 'forum', 'Required Discussions', 0),
    (6, 'topic', 'Topic for Course1 Forum', 5),
    (7, 'course-topic', 'Another Assignment for Course1', 0),
    (8, 'topic', 'Another Topic for Course1 Forum', 5),
    (9, 'course', 'Course2', 0),
    (10, 'quiz', 'A Quiz for Course2', 0),
    (11, 'quiz', 'Another Quiz for Course2', 0),
    (12, 'course-topic', 'An Assignment for Course2', 0),
    (13, 'forum', 'Required Discussions', 0),
    (14, 'topic', 'Topic for Course2 Forum', 13),
    (15, 'course-topic', 'Another Assignment for Course2', 0),
    (16, 'topic', 'Another Topic for Course2 Forum', 13),
    (17, 'other', 'Some Other Post', 0),
    (18, 'forum', 'Some Other Forum', 0),
    (19, 'topic', 'Topic for Other Forum', 18)
;

A quiz is its own content type. So in Posts Table it has post_type = quiz.

A required discussion has post_type = topic, post_parent = forum_id, where title of forum is 'Required Discussions' and it is a post item with post_type = forum

An assignment on the other hand is a post of type post_type=course-topic, and with taxonomy 'Assignment'.

So here are additional tables that set up an assignment taxonomy:

-- schema
CREATE TABLE taxonomy (
    taxonomy_id INT NOT NULL,
    description VARCHAR(32) NOT NULL,
    PRIMARY KEY(taxonomy_id)
)
;

-- data
INSERT INTO taxonomy
    (taxonomy_id, description)
VALUES
    (1, 'Some Category'),
    (2, 'Assignments'),
    (3, 'Some Other Category')
;

-- schema
CREATE TABLE taxonomy_relations (
    post_id INT NOT NULL,
    taxonomy_id VARCHAR(16) NOT NULL,
    PRIMARY KEY(post_id)
)
;

-- data
INSERT INTO taxonomy_relations
    (post_id, taxonomy_id)
VALUES
    (2, 2),
    (7, 2),
    (12, 2),
    (15, 2)
;

Now, to tie everything to a given course, there is a meta table with a meta_key related_course that is set for all content types.

-- schema
CREATE TABLE meta (
    meta_id INT NOT NULL AUTO_INCREMENT,
    post_id INT NOT NULL,
    meta_key VARCHAR(16) NOT NULL,
    meta_value VARCHAR(256) NOT NULL,
    PRIMARY KEY(meta_id)
)
;

-- data
INSERT INTO meta
    (meta_id, post_id, meta_key, meta_value)
VALUES
    (1, 1, 'some_key', 'some val'),
    (2, 1, 'some_otherkey', 'some other val'),
    (3, 2, 'another_key', 'and a value'),
    (4, 2, 'related_course', '1'),
    (5, 3, 'related_course', '1'),
    (6, 4, 'related_course', '1'),
    (7, 5, 'related_course', '1'),
    (8, 6, 'related_course', '1'),
    (9, 7, 'related_course', '1'),
    (10, 8, 'related_course', '1'),
    (11, 10, 'related_course', '9'),
    (12, 11, 'related_course', '9'),
    (13, 12, 'related_course', '9'),
    (14, 13, 'related_course', '9'),
    (15, 14, 'related_course', '9'),
    (16, 15, 'related_course', '9'),
    (17, 16, 'related_course', '9'),
    (18, 19, 'related_course', '1'),
;

Notice in above, that discussion topic with id 19 from posts table has a related_course meta value, but that it is not one of my defined required tasks. It is just some topic reply in some other forum for the course. The point being, there can be a lot of interrelated content here, but required tasks are as I defined above: 1) quizzes, 2) posts of type course-topic who have Assignment taxonomy, and 3) posts of type topic whose parent forum is titled 'Required Discussions'.

Lastly, when a user completes some task (assignment, quiz, or topic reply), I store info in a taskstatus table:

-- schema
CREATE TABLE taskstatus (
    id INT NOT NULL AUTO_INCREMENT,
    task_id INT NOT NULL,
    user_id INT NOT NULL,
    passed INT NOT NULL,
    PRIMARY KEY(id)
)
;

-- data
INSERT INTO taskstatus
    (id, task_id, user_id, passed)
VALUES
    (1, 2, 53, 1),
    (2, 4, 53, 0),
    (3, 6, 53, 1)
;

Then to track completion of required tasks I have a taskstatus table.

-- schema
CREATE TABLE taskstatus (
    id INT NOT NULL AUTO_INCREMENT,
    task_id INT NOT NULL,
    user_id INT NOT NULL,
    passed INT NOT NULL,
    PRIMARY KEY(id)
)
;

-- data
INSERT INTO taskstatus
    (id, task_id, user_id, passed)
VALUES
    (1, 6, 53, 1),
    (2, 4, 53, 0),
    (3, 2, 53, 1),
    (4, 11, 53, 1),
    (5, 2, 24, 1),
    (6, 4, 24, 1),
    (7, 6, 24, 0)
;

Here is sqlfiddle:

http://sqlfiddle.com/#!9/6430ff/4

This taskstatus table shows that user 53 completed the Assignment, one Quiz (though didn't pass), and Discussion Topic from the course with id 1. It also shows user 53 completed a Quiz from another course (with id 9 in posts table). This table gets populated only on completion of a task.

The table also shows status for user 24.

Now, finally, what I am after: I am trying to figure out if there is a single query that can get me task status for a given user and given course across all required task types - Assignments, Quizzes, Discussions.

In current example if targeting user_id 53 and course_id 1, the result should be

desired results:
--------------------------------------------
task_id   passed
2         1
3         NULL
4         0         
6         1
7         NULL
8         NULL

I suspect it will involve a bit of INNER JOINS, maybe a LEFT or RIGHT OUTER JOIN at some point to ensure I get that NULL row for a task that has no status in the taskstatus table.

But this is way beyond anything I have attempted in MySQL and just not sure how to tie all this together.

To get started, here are required quizzes and forum topics:

-- get quizzes and forum topics
SELECT tasks.id, tasks.post_type  FROM posts as tasks
INNER JOIN meta as pm ON pm.post_id = tasks.ID
LEFT JOIN posts as forums ON forums.id = tasks.post_parent
WHERE (pm.meta_key = 'related_course') 
  AND (pm.meta_value=1) 
  AND (tasks.post_type='quiz' OR (tasks.post_type='topic') AND (forums.post_title='Required Discussions'));
  
-- how to add assignments?

Now I have to figure out

  1. How to add assignments to the result rows
  2. Finally, how to show the task status as shown above in desired results.

Edit

This seems to work:

SELECT tasks.id, tasks.post_type, tstatus.passed FROM posts as tasks
LEFT JOIN meta as pm ON pm.post_id = tasks.ID
LEFT JOIN posts as forums ON forums.id = tasks.post_parent
LEFT JOIN taxonomy_relations AS taxrel ON taxrel.post_id = tasks.id
LEFT JOIN taxonomy AS tax ON taxrel.taxonomy_id = tax.taxonomy_id
LEFT JOIN taskstatus as tstatus ON tstatus.task_id = tasks.id AND tstatus.user_id=53
WHERE (pm.meta_key = 'related_course') 
  AND (pm.meta_value=1) 
  AND ((tasks.post_type='quiz') OR ((tasks.post_type='course-topic') AND (tax.description='Assignments')) OR ((tasks.post_type='topic') AND (forums.post_title='Required Discussions')) )

This produces

id  post_type   passed
2   course-topic    1
4   quiz            0
6   topic           1
3   quiz         (null)
7   course-topic (null)
8   topic        (null)

But I got this just by experimenting. I really have no idea how to really do this, and I noticed I get same results even by varying the JOIN types, which has me worried that some small detail can break this query.

How can I fine tune this?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid OR Conditions By Using UNION (modified query below): In mosts cases, filtering using the OR operator cannot be applied using indexes. A more optimized alternative will be to split the query to two parts combined with a UNION clause, while each query holds one part of the original OR condition.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  3. Prefer Inner Join Over Left Join (modified query below): We identified that one or more left joined entities (e.g. `meta`) are used in the 'where' clause, in a way that allows to replace it with an optimized inner join. Inner joins can be fully optimized by the database, while Left joins apply limitations on the database's optimizer.
  4. Prefer Inner Join Over Left Join (modified query below): We identified that one or more left joined entities (e.g. `posts`) are used in the 'where' clause, in a way that allows to replace it with an optimized inner join. Inner joins can be fully optimized by the database, while Left joins apply limitations on the database's optimizer.
  5. Prefer Inner Join Over Left Join (modified query below): We identified that one or more left joined entities (e.g. `taxonomy`) are used in the 'where' clause, in a way that allows to replace it with an optimized inner join. Inner joins can be fully optimized by the database, while Left joins apply limitations on the database's optimizer.
  6. Remove Redundant Left Joins (modified query below): Redundant LEFT JOINs (e.g. `taxonomy`) were detected in the query. Removing them will result in a performance improvement. In some cases, JOINs become redundant after an optimization is applied, such as when converting OR conditions to a UNION clause.
  7. Remove Redundant Left Joins (modified query below): Redundant LEFT JOINs (e.g. `taxonomy_relations`) were detected in the query. Removing them will result in a performance improvement. In some cases, JOINs become redundant after an optimization is applied, such as when converting OR conditions to a UNION clause.
  8. Remove Redundant Left Joins (modified query below): Redundant LEFT JOINs (e.g. `posts`) were detected in the query. Removing them will result in a performance improvement. In some cases, JOINs become redundant after an optimization is applied, such as when converting OR conditions to a UNION clause.
  9. Use UNION ALL instead of UNION (query line: 65): Always use UNION ALL unless you need to eliminate duplicate records. By using UNION ALL, you'll avoid the expensive distinct operation the database applies when using a UNION clause.
Optimal indexes for this query:
ALTER TABLE `meta` ADD INDEX `meta_idx_meta_key_meta_value` (`meta_key`,`meta_value`);
ALTER TABLE `posts` ADD INDEX `posts_idx_post_type_post_title_id` (`post_type`,`post_title`,`ID`);
ALTER TABLE `posts` ADD INDEX `posts_idx_post_type_id` (`post_type`,`ID`);
ALTER TABLE `taskstatus` ADD INDEX `taskstatus_idx_user_id_task_id` (`user_id`,`task_id`);
ALTER TABLE `taxonomy` ADD INDEX `taxonomy_idx_description_taxonomy_id` (`description`,`taxonomy_id`);
ALTER TABLE `taxonomy_relations` ADD INDEX `taxonomy_relations_idx_post_id` (`post_id`);
The optimized query:
SELECT
        tasks_id,
        tasks_post_type,
        tstatus_passed 
    FROM
        ((SELECT
            tasks.id AS tasks_id,
            tasks.post_type AS tasks_post_type,
            tstatus.passed AS tstatus_passed 
        FROM
            posts AS tasks 
        INNER JOIN
            meta AS pm 
                ON pm.post_id = tasks.ID 
        INNER JOIN
            posts AS forums 
                ON forums.id = tasks.post_parent 
        LEFT JOIN
            taskstatus AS tstatus 
                ON tstatus.task_id = tasks.id 
                AND tstatus.user_id = 53 
        WHERE
            (
                pm.meta_key = 'related_course'
            ) 
            AND (
                pm.meta_value = 1
            ) 
            AND (
                (
                    (
                        tasks.post_type = 'topic'
                    ) 
                    AND (
                        forums.post_title = 'Required Discussions'
                    )
                )
            )) 
    UNION
    DISTINCT (SELECT
        tasks.id AS tasks_id,
        tasks.post_type AS tasks_post_type,
        tstatus.passed AS tstatus_passed 
    FROM
        posts AS tasks 
    INNER JOIN
        meta AS pm 
            ON pm.post_id = tasks.ID 
    LEFT JOIN
        taxonomy_relations AS taxrel 
            ON taxrel.post_id = tasks.id 
    INNER JOIN
        taxonomy AS tax 
            ON taxrel.taxonomy_id = tax.taxonomy_id 
    LEFT JOIN
        taskstatus AS tstatus 
            ON tstatus.task_id = tasks.id 
            AND tstatus.user_id = 53 
    WHERE
        (pm.meta_key = 'related_course') 
        AND (pm.meta_value = 1) 
        AND (((tasks.post_type = 'course-topic') 
        AND (tax.description = 'Assignments')))) 
UNION
DISTINCT (SELECT
    tasks.id AS tasks_id,
    tasks.post_type AS tasks_post_type,
    tstatus.passed AS tstatus_passed 
FROM
    posts AS tasks 
INNER JOIN
    meta AS pm 
        ON pm.post_id = tasks.ID 
LEFT JOIN
    taskstatus AS tstatus 
        ON tstatus.task_id = tasks.id 
        AND tstatus.user_id = 53 
WHERE
    (pm.meta_key = 'related_course') 
    AND (pm.meta_value = 1) 
    AND ((tasks.post_type = 'quiz')))
) AS union1

Related Articles



* original question posted on StackOverflow here.