[Solved] Get records with Same \"Text\" but different \"Code\"

EverSQL Database Performance Knowledge Base

Get records with Same \"Text\" but different \"Code\"

Database type:

I have a table like below (using SQL server 2008 R2 Enterprise):

create table Message
(
ID int Not Null Primary Key,
Text nvarchar(100) not null,
Code nvarchar(50) null
)

Insert Into Message Values (1,'Hello Everybody','T6/45')
Insert Into Message Values (2,'Hello Everybody',Null)
Insert Into Message Values (3,'Hello Everybody','T6/45')
Insert Into Message Values (4,'Hello Everybody','T6/45')
Insert Into Message Values (5,'Hello Everybody','T6/70')
Insert Into Message Values (6,'Hello Everybody','T6/70')
Insert Into Message Values (7,'Hello','T6/70')
Insert Into Message Values (8,'Hello','T6/45')
Insert Into Message Values (9,'Hello Everybody',Null)
Insert Into Message Values (10,'Hello Everybody','T6/70')
Insert Into Message Values (11,'Hello',Null)

I need to get all the records which are having same Text but different Code. And Code will not be Null .

So expected output will be:

Insert Into Message Values (1,'Hello Everybody','T6/45')
Insert Into Message Values (5,'Hello Everybody','T6/70')
Insert Into Message Values (7,'Hello','T6/70')
Insert Into Message Values (8,'Hello','T6/45')

I tried below query, but it is returning several rows:

select m1.* 
from Message M1 
With (nolock)
JOIN Message M2 
With (Nolock) 
On m1.Text=m2.Text
where (m1.ID<>m2.ID 
  and m1.Code<>m2.Code 
  and m1.Code is not null)

How can I get expected result?

Also, this is a sample DB. I need to run the query on a table having arount 50 Million records. So any optimized query will be of great help.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX message_idx_text_code ON Message (Text,Code);
The optimized query:
SELECT
        m1.* 
    FROM
        Message M1 WITH (NOLOCK) 
    JOIN
        Message M2 WITH (NOLOCK) 
            ON m1.Text = m2.Text 
    WHERE
        (
            m1.ID <> m2.ID 
            AND m1.Code <> m2.Code 
            AND m1.Code IS NOT NULL
        )

Related Articles



* original question posted on StackOverflow here.