I have the following table in MySQL:
CREATE TABLE `events` (
    `pv_name` varchar(60) COLLATE utf8mb4_bin NOT NULL,
    `time_stamp` bigint(20) unsigned NOT NULL,
    `event_type` varchar(40) COLLATE utf8mb4_bin NOT NULL,
    `has_data` tinyint(1) NOT NULL,
    `data` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED;
ALTER TABLE `events`
    ADD PRIMARY KEY (`pv_name`,`time_stamp`),
    ADD KEY `has_data` (`has_data`,`pv_name`,`time_stamp`);
I have been struggling to construct an efficient query to find each pv_name
that has at least one change in value in a given time interval.
I believe that my current query is inefficient because it finds all of the distinct values in the given time interval for each pv_name, instead of stopping as soon as it finds more than one:
SELECT events.pv_name
FROM events
WHERE events.time_stamp > 0 AND events.time_stamp < 9999999999999999999
GROUP BY events.pv_name
HAVING COUNT(DISTINCT JSON_EXTRACT(events.data, '$.value')) > 1;
To avoid this, I am considering splitting the count and distinct parts into separate steps, since the MySQL documentation says:
"When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows."
Is there an efficient query that finds a pair of distinct values for each pv_name in a given time interval, without having to find all of the distinct values for each pv_name in that interval?
EDIT @Rick James
I am essentially trying to find a faster, non-cursor-based solution for this:
SET @old_sql_mode=@@sql_mode, sql_mode='STRICT_ALL_TABLES';
DELIMITER //
DROP PROCEDURE IF EXISTS check_for_change //
CREATE PROCEDURE check_for_change(IN t0_in bigint(20) unsigned, IN t1_in bigint(20) unsigned)
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE current_pv_name VARCHAR(60);
    DECLARE cur CURSOR FOR SELECT DISTINCT pv_name FROM events;
    -- SQLSTATE '02000' (no data) is raised when the cursor is exhausted.
    DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = TRUE;

    -- Swap the bounds if they were passed in the wrong order.
    SET @t0_in := t0_in;
    SET @t1_in := t1_in;
    IF @t0_in > @t1_in THEN
        SET @temp := @t0_in;
        SET @t0_in := @t1_in;
        SET @t1_in := @temp;
    END IF;

    -- Result table: one row per pv_name that changed in the interval.
    DROP TEMPORARY TABLE IF EXISTS has_change;
    CREATE TEMPORARY TABLE has_change (
        pv_name varchar(60) NOT NULL,
        PRIMARY KEY (pv_name)
    ) ENGINE=Memory DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

    OPEN cur;
    label1: LOOP
        FETCH cur INTO current_pv_name;
        IF done THEN
            LEAVE label1;
        END IF;
        -- Look for at most two distinct values for this pv_name;
        -- insert the pv_name only if both were found.
        INSERT INTO has_change
        SELECT current_pv_name
        FROM (
            SELECT DISTINCT JSON_EXTRACT(events.data, '$.value') AS distinct_value
            FROM events
            WHERE events.pv_name = current_pv_name
              AND events.has_data = 1
              AND events.time_stamp > @t0_in AND events.time_stamp < @t1_in
            LIMIT 2 ) AS t
        HAVING COUNT(t.distinct_value) = 2;
    END LOOP;
    CLOSE cur;
END //
DELIMITER ;
SET sql_mode=@old_sql_mode;
The optimization here is in applying a limit to the number of distinct values found for each pv_name.
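A set-based query that keeps the same early-exit idea might use EXISTS, which can stop at the first matching row, so each pv_name is only scanned until one value differs from its first non-NULL value in the interval. The following is a minimal sketch, not a tested solution. The aliases p, e, and f are introduced here for illustration, the time bounds are the literals from the query above, and it assumes that the JSON values extracted from '$.value' compare meaningfully with <>. Whether it actually beats the cursor depends on the optimizer and the data, so it is worth checking with EXPLAIN.

```sql
-- Sketch only: aliases p, e, f are hypothetical names for illustration.
SELECT p.pv_name
FROM (SELECT DISTINCT pv_name FROM events) AS p
WHERE EXISTS (
    -- EXISTS can stop at the first row whose value differs
    -- from the reference value chosen below.
    SELECT 1
    FROM events AS e
    WHERE e.pv_name = p.pv_name
      AND e.has_data = 1
      AND e.time_stamp > 0 AND e.time_stamp < 9999999999999999999
      AND JSON_EXTRACT(e.data, '$.value') <> (
          -- Reference value: the earliest non-NULL value
          -- for this pv_name in the interval.
          SELECT JSON_EXTRACT(f.data, '$.value')
          FROM events AS f
          WHERE f.pv_name = p.pv_name
            AND f.has_data = 1
            AND f.time_stamp > 0 AND f.time_stamp < 9999999999999999999
            AND JSON_EXTRACT(f.data, '$.value') IS NOT NULL
          ORDER BY f.time_stamp
          LIMIT 1
      )
);
```

Rows where '$.value' is missing are skipped, because comparing NULL with <> never evaluates to true; this mirrors how COUNT(DISTINCT ...) ignores NULLs. The primary key on (pv_name, time_stamp) should let both correlated subqueries seek directly to the rows for one pv_name, but that is an expectation to confirm against the actual execution plan.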
Copyright notice: Content author: Patrick. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/40901076/mysql-find-distinct-pair-of-values-per-group-in-time-interval