[Solved] How to compare two comma separate fields and get the count in MySQL

EverSQL Database Performance Knowledge Base

How to compare two comma separate fields and get the count in MySQL

Database type:

Hi all I have a MySQL table that has a field of comma separated values

id    res
=============================
1     hh_2,hh_5,hh_6
------------------------------
2     hh_3,hh_5,hh_4
------------------------------
3     hh_6,hh_8,hh_7
------------------------------
4     hh_2,hh_7,hh_4
------------------------------

Please see the above example ,Actually i need to compare each row 'res' with other row's 'res' values and need to display count if they match with others. Please help me to get the count.

For example, IN first row 'hh_2' also exist in fourth row so we need count as 2, likewise we need to compare all in all rows

I Have run the function its working for me. but the table so big. It have million of records so my performance take time. While check one record with 50000 record take 25 sec. Suppose my input is 60 rows it take one hour. Please help me how to optimize.

   CREATE FUNCTION `combine_two_field`(s1 CHAR(96), s3 TEXT) RETURNS int(11)
    BEGIN
      DECLARE ndx INT DEFAULT 0;
      DECLARE icount INT DEFAULT 0;  
      DECLARE head1 char(10);
      DECLARE head2 char(10); 
      DECLARE head3 char(10);    
      WHILE ndx <= LENGTH(s1) DO
            SET head1 = SUBSTRING_INDEX(s3, ',', 1);    
            SET s3 = SUBSTRING(s3, LENGTH(head1) + 1 + @iSeparLen); 

            SET head2 = SUBSTRING_INDEX(s1, ',', 1);    
            SET s1 = SUBSTRING(s1, LENGTH(head2) + 1 + @iSeparLen);

            IF (head1 = head2) THEN 
                SET icount = icount + 1;
            END IF;         
       SET ndx = ndx + 1;   
      END WHILE;
      RETURN icount; 
    END 

And the table size is too big and i want to reduce fetching time also ...

UPDATE QUERY:

DROP PROCEDURE IF EXISTS `pcompare7` $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `pcompare7`(IN in_analysis_id INT(11))
BEGIN

drop  table  if exists `tmp_in_results`;
CREATE TEMPORARY TABLE `tmp_in_results` (
  `t_id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `r_id` bigint(11) NOT NULL,
  `r_res` char(11) NOT NULL,
  PRIMARY KEY (`t_id`),
  KEY r_res (r_res)

)
ENGINE = InnoDB;

SELECT splite_snp(r_snp,id,ruid) FROM results  WHERE technical_status = 1 and critical_status = 1 and autosomal_status = 1 and gender_status != "NO CALL" and analys_id = in_analysis_id;


-- SELECT * FROM tmp_in_results;
-- COmpare Functionality 
SELECT a.t_id, b.id, SUM(IF(FIND_IN_SET(a.r_res, b.r_snp), 1, 0)) FROM tmp_in_results a CROSS JOIN results b GROUP BY a.t_id, b.id;
END $$

Function FOR CREATE TEMP TABLE:

  DROP FUNCTION IF EXISTS `splite_snp` $$
    CREATE DEFINER=`root`@`localhost` FUNCTION `splite_snp`(s1 TEXT, in_id bigint(96), ruid char(11)) RETURNS tinyint(1)
    BEGIN
      DECLARE ndx INT DEFAULT 0;
      DECLARE icount INT DEFAULT 0;
      DECLARE head1 TEXT;
      DECLARE head2 TEXT;
      DECLARE intpos1 char(10);
      DECLARE intpos2 char(10);
      DECLARE Separ char(3) DEFAULT ',';
      DECLARE iSeparLen INT;

      SET @iSeparLen = LENGTH( Separ );

      WHILE s1 != '' DO

            SET intpos1 = SUBSTRING_INDEX(s1, ',', 1);
            SET s1 = SUBSTRING(s1, LENGTH(intpos1) + 1 + @iSeparLen);

        INSERT INTO tmp_in_results(r_id,r_res) VALUES(in_id,intpos1);

      END WHILE;
      RETURN TRUE;
    END $$

New table structure

pc_input

id    in_res     in_id
=============================
1     hh_2     1000
------------------------------
2     hh_3     1000
------------------------------
3     hh_6     1001
------------------------------
4     hh_2     1001
------------------------------

res_snp

id    r_res     r_id
=============================
1     hh_2     999
------------------------------
2     hh_3     999
------------------------------
3     hh_9     999
------------------------------
4     hh_2     998
------------------------------
5     hh_6     998
------------------------------
6     hh_9     998
------------------------------

Result:

in_id    r_id     matches_count
=============================
1000     999     2 (hh_2,hh_3)
------------------------------
1000     998     1 (hh_2)
------------------------------
1001     999     1 (hh_2)
------------------------------
1001     998     2 (hh_2,hh_6)
------------------------------

I have add the separate index both table in_res,in_id and r_res and r_id

QUERY:

SELECT b.r_id,count(*) FROM  pc_input AS a INNER JOIN results_snps AS b ON (b.r_snp = a.in_snp) group by a.in_id,b.r_id;

But mysql server was freeze. Cloud you please suggest any other way or optimize my query.

EXPLAIN TABLE: res_snp

Field   Type    Null    Key     Default     Extra
id  bigint(11)  NO  PRI     NULL    auto_increment
r_snp   varchar(50) NO  MUL     NULL    
r_id    bigint(11)  NO  MUL     NULL    

EXPLAIN TABLE: pc_input

  Field     Type    Null    Key     Default     Extra
id  bigint(11)  NO  PRI     NULL    auto_increment
in_snp  varchar(55) NO  MUL     NULL    
in_id   bigint(11)  NO  MUL     NULL

Explain Query:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  a   ALL     in_snp  NULL    NULL    NULL    192     Using temporary; Using filesort
1   SIMPLE  b   ref     r_snp   r_snp   52  rutgers22042014.a.in_snp    2861    Using where0

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Explicitly ORDER BY After GROUP BY (modified query below): By default, the database sorts all 'GROUP BY col1, col2, ...' queries as if you specified 'ORDER BY col1, col2, ...' in the query as well. If a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'.
Optimal indexes for this query:
ALTER TABLE `results_snps` ADD INDEX `results_snps_idx_r_snp` (`r_snp`);
The optimized query:
SELECT
        b.r_id,
        count(*) 
    FROM
        pc_input AS a 
    INNER JOIN
        results_snps AS b 
            ON (
                b.r_snp = a.in_snp
            ) 
    GROUP BY
        a.in_id,
        b.r_id 
    ORDER BY
        NULL

Related Articles



* original question posted on StackOverflow here.