[Solved] MySQL: Data structure for transitive relations

EverSQL Database Performance Knowledge Base

MySQL: Data structure for transitive relations

Database type:

I tried to design a data structure for easy and fast querying (delete, insert an update speed does not really matter for me).

The problem: transitive relations, one entry could have relations through other entries whose relations I don't want to save separately for every possibility.

Means--> I know that Entry-A is related to Entry-B and also know that Entry-B is related to Entry-C, even though I don't know explicitly that Entry-A is related to Entry-C, I want to query it.

What I think the solution is:

Eliminating the transitive part when inserting, deleting or updating.

Entry:
    id
    representative_id

I would store them as sets, like group of entries (not mysql set type, the Math set, sorry if my English is wrong). Every set would have a representative entry, all of the set elements would be related to the representative element.

A new insert would insert the Entry and set the representative as itself.

enter image description here

If the newly inserted entry should be connected to another, I simply set the representative id of the newly inserted entry to the referred entry's rep.id.

Attach B to A

enter image description here

It doesn't matter, If I need to connect it to something that is not a representative entry, It would be the same, because every entry in the set would have the same rep.id.

Attach C to B

enter image description here

Detach B-C: The detached item would have become a representative entry, meaning it would relate to itself.

enter image description here

Detach B-C and attach C to X

enter image description here

Deletion: If I delete a non-representative entry, it is self explanatory. But deleting a rep.entry is harder a bit. I need to chose a new rep.entry for the set and set every set member's rep.id to the new rep.entry's rep.id.

So, delete A in this:

enter image description here

Would result this:

enter image description here

What do you think about this? Is it a correct approach? Am I missing something? What should I improve?

Edit: Querying: So, If I want to query every entry that is related to an certain entry, whose id i know:

SELECT * FROM entries a LEFT JOIN entries b ON (a.rep_id = b.rep_id) WHERE a.id = :id

SELECT * FROM AlkReferencia
WHERE rep_id=(SELECT rep_id FROM AlkReferencia
    WHERE id=:id);

About the application that requires this:

Basically, I am storing vehicle part numbers (references), one manufacturer can make multiple parts that can replace another and another manufacturer can make parts that are replacing other manufacturer's parts.

Reference: One manufacturer's OEM number to a certain product.

Cross-reference: A manufacturer can make products that objective is to replace another product from another manufacturer.

I must connect these references in a way, when a customer search for a number (doesn't matter what kind of number he has) I can list an exact result and the alternative products.

To use the example above (last picture): B, D and E are different products we may have in store. Each one has a manufacturer and a string name/reference (i called it number before, but it can be almost any character chain). If I search for B's reference number, I should return B as an exact result and D,E as alternatives.

So far so good. BUT I need to upload these reference numbers. I can't just migrate them from an ALL-IN-ONE database. Most of the time, when I upload references I got from a manufacturer (somehow, most of the time from manually, but I can use catalogs too), I only get a list where the manufacturer tells which other reference numbers point to his numbers.

Example.:

Asas filter manufacturer, "AS 1" filter has these cross references (means, replaces these):

GOLDEN SUPER  -->  1
ALFA ROMEO  -->  101000603000
ALFA ROMEO  -->  105000603007
ALFA ROMEO  -->  1050006040
RENAULT TRUCKS (RVI)  -->  122577600
RENAULT TRUCKS (RVI)  -->  1225961
ALFA ROMEO  -->  131559401
FRAD  -->  19.36.03/10
LANDINI  -->  1896000
MASSEY FERGUSON  -->  1851815M1
...

It would took ages to write all of the AS 1 references down, but there is many (~1500 ?). And it is ONE filter. There is more than 4000 filter and I need to store there references (and these are only the filters). I think you can see, I can't connect everything, but I must know that Alfa Romeo 101000603000 and 105000603007 are the same, even when I only know (AS 1 --> alfa romeo 101000603000) and (as 1 --> alfa romeo 105000603007). That is why I want to organize them as sets. Each set member would only connect to one other set member, with a rep_id, that would be the representative member. And when someone would want to (like, admin, when uploading these references) attach a new reference to a set member, I simply INSERT INTO References (rep_id,attached_to_originally_id,refnumber) VALUES([rep_id of the entry what I am trying to attach to],[id of the entry what I am trying to attach to], "16548752324551..");

Another thing: I don't need to worry about insert, delete, update speed that much, because it is an admin task in our system and will be done rarely.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Avoid Subqueries (query line: 7): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  3. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `AlkReferencia` ADD INDEX `alkreferencia_idx_id` (`id`);
The optimized query:
SELECT
        * 
    FROM
        AlkReferencia 
    WHERE
        AlkReferencia.rep_id = (
            SELECT
                AlkReferencia.rep_id 
            FROM
                AlkReferencia 
            WHERE
                AlkReferencia.id = :id
        )

Related Articles



* original question posted on StackOverflow here.