[Solved] Find all the leaf nodes below a subtree in a Tree structure in sql server

EverSQL Database Performance Knowledge Base

Find all the leaf nodes below a subtree in a Tree structure in sql server

Database type:

I've a tree structure, and its subsequent assignment table for customer categories in an sql server database.

CustomerCategory (CategoryID, ParentId)
CustomerInCategory(CustomerID, CategoryID)

If a CustomerCategory has any customer assigned to it, we can't add another subcategory to it. So, Customer can only be added to the lowest level in every sub tree. In other sense, the result of this query

SELECT * FROM `CustomerCategory` WHERE `CategoryId` NOT IN 
(SELECT DISTINCT `parentid` FROM `CustomerCategory` WHERE `parentid` IS NOT NULL)

would yield leaf nodes. The Other thing is that, this tree might have subtrees of different levels, and we also, don't want to bound the number of levels in anyway, however, our users won't need more than 10 levels. Consider this as an illustration

CategoryID------ParentID---------------Name
1               NULL                   All Customers
2               1                      Domestic
3               1                      International
4               2                      Independent Retailers
5               2                      Chain Retailers
6               2                      Whole Sellers
7               5                      A-Mart
8               5                      B-Mart
9               4                      Grocery Stores
10              4                      Restaurants
11              4                      Cafes

CustomerID---------CustomerName----------Category
1                  Int.Customer#1               3
2                  Int.Customer#2               3
3                  A-Mart.Branch#1              7
4                  A-Mart.Branch#2              7
5                  B-Mart.Branch#1              8
6                  B-Mart.Branch#2              8
7                  Grocery#1                    9
8                  Grocery#2                    9
9                  Grocery#3                    9
10                 Restaurant#1                 10
11                 Restaurant#2                 10
12                 Cafe#1                       11
13                 Wholeseller#1                6
14                 Wholeseller#2                6

My requirement is something like this, "Given a node in Categories, Return All the Customers attached to any node below it".

How can I do it with sql?

Obviously this can be done with a recursive call in the code, but how can we do it in t-sql (without calling a stored procedure several times or using text-based search)?

Can any body, Use a CTE to solve this problem?

I have a result set of something like this in mind

CustomerID--------Customer Name----------------CategoryId----------CAtegoryName

12                Cafe#1                       11                  Cafes
12                Cafe#1                       4                   IndependentRetailers
12                Cafe#1                       2                   Demoestic
12                Cafe#1                       1                   AllCustomers
.
.
.
4                 A-Mart.Branch#2              7                  A-Mart
4                 A-Mart.Branch#2              5                  Chain Retailers
4                 A-Mart.Branch#2              2                  Domestic
4                 A-Mart.Branch#2              1                  All Customers
.
.
.
14                 Wholeseller#2               6                  WholeSellers
14                 Wholeseller#2               2                  Domestic
14                 Wholeseller#2               1                  All Customers

This is not necessarily a good Idea to layout a result like this, This would consume too much space, something that might not be required, yet, a search in such result set would be very fast. If I want to find all the customers below say categoryId = 2 , I would simply query

SELECT * FROM resultset where category ID = 2

Any suggestions to improve the data model is super welcomed! If It helps solving this problem.

Once again, I'm not fixated on this result set. Any other Suggestion that solves the problem, "Given a node in Categories, Return All the Customers attached to any node below it", is well accepted.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  3. Replace In Subquery With Correlated Exists (modified query below): In many cases, an EXISTS subquery with a correlated condition will perform better than a non correlated IN subquery.
Optimal indexes for this query:
CREATE INDEX customercategory_idx_categoryid_parentid ON CustomerCategory (CategoryId,parentid);
The optimized query:
SELECT
        * 
    FROM
        `CustomerCategory` 
    WHERE
        NOT EXISTS (
            SELECT
                DISTINCT 1 
            FROM
                `CustomerCategory` AS CustomerCategory1 
            WHERE
                (
                    CustomerCategory1.`parentid` IS NOT NULL
                ) 
                AND (
                    `CustomerCategory`.`CategoryId` = CustomerCategory1.`parentid`
                )
        )

Related Articles



* original question posted on StackOverflow here.