NickName:Patrick Ask DateTime:2016-12-01T09:17:12

MySQL: find a distinct pair of values per group in a time interval

I have the following table in MySQL:

CREATE TABLE `events` (
  `pv_name` varchar(60) COLLATE utf8mb4_bin NOT NULL,
  `time_stamp` bigint(20) unsigned NOT NULL,
  `event_type` varchar(40) COLLATE utf8mb4_bin NOT NULL,
  `has_data` tinyint(1) NOT NULL,
  `data` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED;

ALTER TABLE `events`
 ADD PRIMARY KEY (`pv_name`,`time_stamp`), ADD KEY `has_data` (`has_data`,`pv_name`,`time_stamp`);

I have been struggling to construct an efficient query that finds each pv_name with at least one change in value (that is, at least two distinct values of JSON_EXTRACT(data, '$.value')) within a given time interval.
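
For reference, "at least one change in value" can be pinned down in plain Python. This is a minimal sketch, assuming the rows are held in memory as hypothetical (pv_name, time_stamp, value) tuples rather than read from MySQL. It shows that some pair of consecutive readings in the window differs exactly when the window holds at least two distinct values, which is the condition the queries below test.

```python
from itertools import groupby


def pv_names_with_change(rows, t0, t1):
    """Return the pv_names whose value changes at least once in (t0, t1).

    rows is a hypothetical in-memory stand-in for the events table:
    an iterable of (pv_name, time_stamp, value) tuples.
    """
    # Keep the open interval t0 < time_stamp < t1, ordered like the primary key.
    in_window = sorted(
        (r for r in rows if t0 < r[1] < t1), key=lambda r: (r[0], r[1])
    )
    changed = []
    for pv_name, group in groupby(in_window, key=lambda r: r[0]):
        values = [value for _, _, value in group]
        # Some consecutive pair differs exactly when there are two or more
        # distinct values, which is what COUNT(DISTINCT ...) > 1 checks.
        if any(a != b for a, b in zip(values, values[1:])):
            changed.append(pv_name)
    return changed


rows = [("a", 1, 1), ("a", 2, 1), ("a", 3, 2),
        ("b", 1, 5), ("b", 2, 5),
        ("c", 50, 8), ("c", 5, 7)]
print(pv_names_with_change(rows, 0, 10))   # ['a']
print(pv_names_with_change(rows, 0, 100))  # ['a', 'c']
```
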

I believe the query I currently have is inefficient because, for each pv_name, it finds all of the distinct values in the given time interval instead of stopping as soon as it has found two:

SELECT events.pv_name
FROM events
WHERE events.time_stamp > 0 AND events.time_stamp < 9999999999999999999
GROUP BY events.pv_name
HAVING COUNT(DISTINCT JSON_EXTRACT(events.data, '$.value')) > 1;

To avoid this, I am considering splitting the COUNT and DISTINCT parts into separate steps, because the MySQL documentation says:

When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.
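
The "separate steps" idea can be sketched as one small probe per pv_name that asks for at most two distinct values; it has the same shape as the loop body of the procedure further down. This sketch assumes Python's built-in sqlite3 module with an in-memory SQLite table standing in for the MySQL events table (SQLite's json_extract plays the role of JSON_EXTRACT), and PROBE and has_change are hypothetical names. The early stop after the second distinct value is the MySQL behaviour quoted above; the sketch does not assume SQLite stops early too.

```python
import json
import sqlite3

# At most two distinct values for one pv_name inside the open interval (t0, t1).
# NULLs are skipped so they cannot occupy one of the two LIMIT slots.
PROBE = """
SELECT DISTINCT json_extract(data, '$.value')
FROM events
WHERE pv_name = ? AND has_data = 1
  AND time_stamp > ? AND time_stamp < ?
  AND json_extract(data, '$.value') IS NOT NULL
LIMIT 2
"""


def has_change(conn, pv_name, t0, t1):
    """True when pv_name has at least two distinct values in (t0, t1)."""
    return len(conn.execute(PROBE, (pv_name, t0, t1)).fetchall()) == 2


# In-memory stand-in for the events table (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (pv_name TEXT NOT NULL, time_stamp INTEGER NOT NULL,"
    " has_data INTEGER NOT NULL, data TEXT, PRIMARY KEY (pv_name, time_stamp))"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, 1, ?)",
    [(pv, ts, json.dumps({"value": v}))
     for pv, ts, v in [("a", 1, 1), ("a", 2, 1), ("a", 3, 2),
                       ("b", 1, 5), ("b", 2, 5)]],
)

pv_names = [pv for (pv,) in conn.execute(
    "SELECT DISTINCT pv_name FROM events ORDER BY pv_name")]
print([pv for pv in pv_names if has_change(conn, pv, 0, 10)])  # ['a']
```
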

Is there an efficient query that finds a pair of distinct values for each pv_name in a given time interval without having to find all of the distinct values for each pv_name in that interval?
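
One partial alternative to COUNT(DISTINCT ...) > 1 in the GROUP BY query is to compare MIN and MAX: a group has at least two distinct non-NULL values exactly when its smallest and largest values differ, so no per-group set of distinct values has to be built. Every row in the interval is still read, so this does not provide the early stop asked about. This is a swapped-in technique, not the asker's query, shown as a sketch on an in-memory SQLite table via Python's sqlite3 as a hypothetical stand-in; in MySQL it assumes the extracted JSON values compare consistently under MIN() and MAX() (for example, all numeric scalars).

```python
import json
import sqlite3

# MIN/MAX form of the GROUP BY query: two or more distinct non-NULL values
# exist exactly when the smallest and largest values differ.
MIN_MAX_QUERY = """
SELECT pv_name
FROM events
WHERE time_stamp > ? AND time_stamp < ?
GROUP BY pv_name
HAVING MIN(json_extract(data, '$.value')) <> MAX(json_extract(data, '$.value'))
ORDER BY pv_name
"""

# The original COUNT(DISTINCT ...) form, kept for comparison.
COUNT_DISTINCT_QUERY = """
SELECT pv_name
FROM events
WHERE time_stamp > ? AND time_stamp < ?
GROUP BY pv_name
HAVING COUNT(DISTINCT json_extract(data, '$.value')) > 1
ORDER BY pv_name
"""

# In-memory stand-in for the events table (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (pv_name TEXT NOT NULL, time_stamp INTEGER NOT NULL,"
    " has_data INTEGER NOT NULL, data TEXT, PRIMARY KEY (pv_name, time_stamp))"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, 1, ?)",
    [(pv, ts, json.dumps({"value": v}))
     for pv, ts, v in [("a", 1, 1), ("a", 2, 1), ("a", 3, 2),
                       ("b", 1, 5), ("b", 2, 5),
                       ("c", 1, None), ("c", 2, 7)]],
)

for query in (MIN_MAX_QUERY, COUNT_DISTINCT_QUERY):
    print([pv for (pv,) in conn.execute(query, (0, 10))])  # ['a'] both times
```

Both forms ignore NULL values, so pv_name "c" (NULL then 7) is not reported by either.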

EDIT (in reply to @Rick James):

I am essentially trying to find a faster, non-cursor-based solution to this:

SET @old_sql_mode=@@sql_mode, sql_mode='STRICT_ALL_TABLES';

DELIMITER //

-- End with the current delimiter (//) so DROP runs as its own statement.
DROP PROCEDURE IF EXISTS check_for_change //
CREATE PROCEDURE check_for_change(IN t0_in bigint(20) unsigned, IN t1_in bigint(20) unsigned)
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE current_pv_name VARCHAR(60);
    DECLARE cur CURSOR FOR SELECT DISTINCT pv_name FROM events;
    DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = TRUE;

    SET @t0_in := t0_in;
    SET @t1_in := t1_in;


    IF @t0_in > @t1_in THEN
        SET @temp := @t0_in;
        SET @t0_in := @t1_in;
        SET @t1_in := @temp;
    END IF;


    DROP TEMPORARY TABLE IF EXISTS has_change;
    CREATE TEMPORARY TABLE has_change (
    pv_name varchar(60) NOT NULL,
    PRIMARY KEY (pv_name)
    ) ENGINE=Memory DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;


    OPEN cur;

    label1: LOOP
        FETCH cur INTO current_pv_name;

        IF done THEN
            LEAVE label1;
        END IF;

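        -- Record current_pv_name only if the interval holds at least two
        -- distinct values; LIMIT 2 lets DISTINCT stop after the second one.
        INSERT INTO has_change
        SELECT current_pv_name
        FROM (
        SELECT DISTINCT JSON_EXTRACT(events.data, '$.value') AS distinct_value
        FROM events
        WHERE events.pv_name = current_pv_name
        AND events.has_data = 1
        AND events.time_stamp > @t0_in AND events.time_stamp < @t1_in
        -- Ignore NULLs so that a NULL returned by LIMIT 2 cannot mask a change
        -- (COUNT() skips NULLs, as does the GROUP BY query above).
        AND JSON_EXTRACT(events.data, '$.value') IS NOT NULL
        LIMIT 2 ) AS t
        HAVING COUNT(t.distinct_value) = 2;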
    END LOOP;

    CLOSE cur;
END //

DELIMITER ;

SET sql_mode=@old_sql_mode;

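The optimization here is the LIMIT 2 on the number of distinct values fetched for each pv_name: per the documentation quoted above, MySQL can stop once it has found two distinct values instead of finding all of them.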
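
A set-based way to get a similar early stop without a cursor is an EXISTS semi-join: for each pv_name, take the first row in the interval that has a value, then ask whether any later row in the interval has a different value. EXISTS only needs one matching row, so the engine can stop at the first difference instead of collecting distinct values. This is a sketch under stated assumptions, not a tested MySQL answer: it runs on an in-memory SQLite table through Python's sqlite3 as a stand-in (the :t0/:t1 placeholders are SQLite's named-parameter syntax), CHANGE_QUERY is a hypothetical name, and whether it beats the cursor on real data depends on the optimizer, so it would need checking with EXPLAIN in MySQL.

```python
import json
import sqlite3

# For each pv_name: find the first timestamp in (t0, t1) that carries a
# non-NULL value, then check whether any later row in the interval differs.
CHANGE_QUERY = """
SELECT f.pv_name
FROM (
    SELECT pv_name, MIN(time_stamp) AS first_ts
    FROM events
    WHERE has_data = 1 AND time_stamp > :t0 AND time_stamp < :t1
      AND json_extract(data, '$.value') IS NOT NULL
    GROUP BY pv_name
) AS f
JOIN events AS e0
  ON e0.pv_name = f.pv_name AND e0.time_stamp = f.first_ts
WHERE EXISTS (
    SELECT 1
    FROM events AS e
    WHERE e.pv_name = f.pv_name
      AND e.has_data = 1
      AND e.time_stamp > f.first_ts AND e.time_stamp < :t1
      -- NULL <> x is NULL, so rows without a value never count as a change.
      AND json_extract(e.data, '$.value') <> json_extract(e0.data, '$.value')
)
ORDER BY f.pv_name
"""

# In-memory stand-in for the events table (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (pv_name TEXT NOT NULL, time_stamp INTEGER NOT NULL,"
    " has_data INTEGER NOT NULL, data TEXT, PRIMARY KEY (pv_name, time_stamp))"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, 1, ?)",
    [(pv, ts, json.dumps({"value": v}))
     for pv, ts, v in [("a", 1, 1), ("a", 2, 1), ("a", 3, 2),
                       ("b", 1, 5), ("b", 2, 5),
                       ("c", 1, None), ("c", 2, 7), ("c", 3, 8),
                       ("d", 1, 3)]],
)

print([pv for (pv,) in conn.execute(CHANGE_QUERY, {"t0": 0, "t1": 10})])  # ['a', 'c']
```

Because the first row is chosen among rows that have a value, a leading NULL (as for "c") does not hide a later change, which keeps the result consistent with the LIMIT 2 probe in the procedure.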

Copyright Notice: Content Author: Patrick. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/40901076/mysql-find-distinct-pair-of-values-per-group-in-time-interval