[Solved] Graph problems: connect by NOCYCLE prior replacement in SQL server?

EverSQL Database Performance Knowledge Base

Graph problems: connect by NOCYCLE prior replacement in SQL server?

Database type:

question:

I have the following (directed) graph: Graph

And this table:

CREATE TABLE [dbo].[T_Hops](
    [UID] [uniqueidentifier] NULL,
    [From] [nvarchar](1000) NULL,
    [To] [nvarchar](1000) NULL,
    [Distance] [decimal](18, 5) NULL
) ON [PRIMARY]

GO

And this content:

      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'A'              ,'E'              ,10.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'E'              ,'D'              ,20.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'A'              ,'B'              ,5.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'B'              ,'C'              ,10.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'C'              ,'D'              ,5.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'A'              ,'F'              ,2.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'F'              ,'G'              ,6.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'G'              ,'H'              ,3.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'H'              ,'D'              ,1.00000              );   

Now I can query the best connection from point x to point y like this:

WITH AllRoutes 
(
     [UID]
    ,[FROM]
    ,[To]
    ,[Distance]
    ,[Path]
    ,[Hops]
)
AS
(
    SELECT 
         [UID]
        ,[FROM]
        ,[To]
        ,[Distance]
        ,CAST(([dbo].[T_Hops].[FROM] + [dbo].[T_Hops].[To]) AS varchar(MAX)) AS [Path]
        ,1 AS [Hops]
      FROM [dbo].[T_Hops]
      WHERE [FROM] = 'A'

    UNION ALL


    SELECT 
         [dbo].[T_Hops].[UID]
        --,[dbo].[T_Hops].[FROM]
        ,Parent.[FROM]
        ,[dbo].[T_Hops].[To]
        ,CAST((Parent.[Distance] + [dbo].[T_Hops].[Distance]) AS [decimal](18, 5)) AS distance
        ,CAST((Parent.[Path] + '/' + [dbo].[T_Hops].[FROM] + [dbo].[T_Hops].[To]) AS varchar(MAX)) AS [Path]
        ,(Parent.[Hops] + 1) AS [Hops]
     FROM [dbo].[T_Hops]

    INNER JOIN AllRoutes AS Parent 
            ON Parent.[To] = [dbo].[T_Hops].[FROM] 

)

SELECT TOP 100 PERCENT * FROM AllRoutes


/*
WHERE [FROM] = 'A' 
AND [To] = 'D'
AND CHARINDEX('F', [Path]) != 0 -- via F
ORDER BY Hops, Distance ASC
*/

GO

Now I want to create an undirected graph, for that I can, for example, also get the path from D to A

I start with a most simple change, and just ad the reverse direction for HD.

INSERT INTO [dbo].[T_Hops]
           ([UID]
           ,[From]
           ,[To]
           ,[Distance])
     VALUES
           (newid() --<UID, uniqueidentifier,>
           ,'D' --<From, nvarchar(1000),>
           ,'H' --<To, nvarchar(1000),>
           ,1 --<Distance, decimal(18,5),>
           )
GO

Now, as expected, my query throws an exception:

Infinite recursion / max recursion level (100) exceeded

Because the number of possible connections is now infinite.

Now in Oracle you do the same thing with "connect by prior" instead of a tree. And if a cycle problem (infinite recursion) is possible, you just add NOCYCLE to CONNECT BY PRIOR, making it "CONNECT BY NOCYCLE PRIOR"

Now in MS-SQL, I fixed that behaviour by adding:

AND Parent.[Path] NOT LIKE '%' + [dbo].[T_Hops].[FROM] + '/%'

to the inner join clause, essentially emulating NOCYCLE.

However, as LIKE is basically strstr (or worse strcasestr), and thus maximally slower than checking an array of parent elements, I am extremely worried about performance.

After all, this is just an example, and I intend to basically add data for an entire country. So the end result would potentially be extremely slow.

Anybody else has a better (=faster) method of how to replace NOCYCLE in MS SQL ?

Or is this the point where I simply have no other option but switching to Oracle (for doing this in an acceptable speed) ?

Note: Any temp tables (large amount of data) solution will be slower, because the temp tables will be swapped to the HardDisk when there is not enough RAM (absolute certainty).

Same goes for any solution using functions and table-valued functions.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 26): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX t_hops_idx_from ON dbo.T_Hops (FROM);
The optimized query:
WITH AllRoutes ([UID], [FROM], [To], [Distance], [Path], [Hops]) AS (SELECT
        [dbo].[T_Hops].[UID],
        [dbo].[T_Hops].[FROM],
        [dbo].[T_Hops].[To],
        [dbo].[T_Hops].[Distance],
        CAST(([dbo].[T_Hops].[FROM] + [dbo].[T_Hops].[To]) AS varchar (MAX)) AS [Path],
        1 AS [Hops] 
    FROM
        [dbo].[T_Hops] 
    WHERE
        [dbo].[T_Hops].[FROM] = 'A' 
    UNION
    ALL SELECT
        [dbo].[T_Hops].[UID],
        Parent.[FROM],
        [dbo].[T_Hops].[To],
        CAST((Parent.[Distance] + [dbo].[T_Hops].[Distance]) AS [decimal] (18,
        5)) AS distance,
        CAST((Parent.[Path] + '/' + [dbo].[T_Hops].[FROM] + [dbo].[T_Hops].[To]) AS varchar (MAX)) AS [Path],
        (Parent.[Hops] + 1) AS [Hops] 
    FROM
        [dbo].[T_Hops] 
    INNER JOIN
        AllRoutes AS Parent 
            ON Parent.[To] = [dbo].[T_Hops].[FROM]) SELECT
            TOP 100 PERCENT * 
    FROM
        AllRoutes GO

Related Articles



* original question posted on StackOverflow here.