Here's the playground:
CREATE TABLE `feed`(
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`tm` INT UNSIGNED NOT NULL COMMENT 'timestamp',
`user_id` INT UNSIGNED NOT NULL COMMENT 'author id',
`image` VARCHAR(255) NOT NULL COMMENT 'posted image filename',
`group` INT UNSIGNED NULL DEFAULT NULL COMMENT 'post group',
PRIMARY KEY(`id`),
INDEX(`user_id`),
INDEX(`tm`,`group`)
);
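We want to group together posts by the same author that were published close to each other in time.
First, declare the desired granularity, i.e. the time-proximity threshold in seconds (tm is a Unix timestamp, so 60*60 is one hour):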
SET @granularity:=60*60;
Each row forms a group, with the group ID equal to the row ID (the timestamp would work too):
SELECT `g`.`id` AS `group`
FROM `feed` `g`;
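Each group contains the rows from the same user that were posted no more than @granularity seconds before the row that forms the group (the founding row included):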
SELECT `g`.`id` AS `group`, `f`.*
FROM `feed` `g`
CROSS JOIN `feed` `f`
ON (`f`.`user_id` = `g`.`user_id`
AND `f`.`tm` BETWEEN `g`.`tm`-@granularity AND `g`.`tm`
);
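To see which groups each row falls into, here is a small sketch that runs this join on a few made-up posts. It assumes Python's built-in sqlite3 module as a stand-in for MySQL: the sample rows are hypothetical, the image column is left out, @granularity becomes the named parameter :granularity (SQLite has no user variables), and CROSS JOIN ... ON is written as JOIN, which MySQL treats the same way.

```python
import sqlite3

GRANULARITY = 60 * 60  # @granularity: one hour, in seconds

# Hypothetical sample posts as (id, tm, user_id); the image column is omitted.
ROWS = [(1, 0, 1), (2, 600, 1), (3, 1200, 2), (4, 1800, 1), (5, 10000, 1)]


def make_feed(rows):
    """Build an in-memory SQLite stand-in for the MySQL feed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feed (id INTEGER PRIMARY KEY, tm INTEGER NOT NULL, "
        "user_id INTEGER NOT NULL, `group` INTEGER)"
    )
    conn.executemany("INSERT INTO feed (id, tm, user_id) VALUES (?, ?, ?)", rows)
    return conn


def candidate_groups(conn, granularity):
    """Return every (group, row id) pair produced by the self-join."""
    return conn.execute(
        """
        SELECT g.id AS `group`, f.id
        FROM feed g
        JOIN feed f  -- MySQL's CROSS JOIN ... ON is an inner join like this one
          ON f.user_id = g.user_id
         AND f.tm BETWEEN g.tm - :granularity AND g.tm
        ORDER BY f.id, g.id
        """,
        {"granularity": granularity},
    ).fetchall()


pairs = candidate_groups(make_feed(ROWS), GRANULARITY)
print(pairs)
# [(1, 1), (2, 1), (4, 1), (2, 2), (4, 2), (3, 3), (4, 4), (5, 5)]
```

Each pair reads (group, row): row 1, for instance, falls into groups 1, 2 and 4.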
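Each row may belong to several groups. For each row, we pick the "broadest" group, the one with the largest id (the latest group the row falls into):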
SELECT MAX(`g`.`id`) AS `group`, `f`.*
FROM `feed` `g`
CROSS JOIN `feed` `f`
ON (`f`.`user_id` = `g`.`user_id`
AND `f`.`tm` BETWEEN `g`.`tm`-@granularity AND `g`.`tm`
)
GROUP BY `f`.`id`;
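The same kind of sketch for the MAX query, again assuming SQLite via Python's sqlite3 as a stand-in for MySQL, with hypothetical rows and @granularity passed as a parameter. Rows 1, 2 and 4 are posts by user 1 within half an hour, so they all end up in group 4, the latest of them; row 5 comes hours later and row 3 belongs to another user.

```python
import sqlite3

GRANULARITY = 60 * 60  # @granularity: one hour, in seconds

# Hypothetical sample posts as (id, tm, user_id); the image column is omitted.
ROWS = [(1, 0, 1), (2, 600, 1), (3, 1200, 2), (4, 1800, 1), (5, 10000, 1)]


def make_feed(rows):
    """Build an in-memory SQLite stand-in for the MySQL feed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feed (id INTEGER PRIMARY KEY, tm INTEGER NOT NULL, "
        "user_id INTEGER NOT NULL, `group` INTEGER)"
    )
    conn.executemany("INSERT INTO feed (id, tm, user_id) VALUES (?, ?, ?)", rows)
    return conn


def latest_groups(conn, granularity):
    """Map row id -> MAX(g.id), i.e. the latest group the row falls into."""
    rows = conn.execute(
        """
        SELECT MAX(g.id) AS `group`, f.id
        FROM feed g
        JOIN feed f
          ON f.user_id = g.user_id
         AND f.tm BETWEEN g.tm - :granularity AND g.tm
        GROUP BY f.id
        """,
        {"granularity": granularity},
    ).fetchall()
    return {row_id: group for group, row_id in rows}


groups = latest_groups(make_feed(ROWS), GRANULARITY)
print(dict(sorted(groups.items())))  # {1: 4, 2: 4, 3: 3, 4: 4, 5: 5}
```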
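The most recently updated groups always jump to the top (if you order by group DESC).
However, if you want groups to be persistent (for example, so that items never move from one group to another), use MIN instead of MAX and flip the time window, so that each group collects the posts made within @granularity seconds after the row that forms it: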
SELECT MIN(`g`.`id`) AS `group`, `f`.*
FROM `feed` `g`
CROSS JOIN `feed` `f`
ON (`f`.`user_id` = `g`.`user_id`
AND `f`.`tm` BETWEEN `g`.`tm` AND `g`.`tm`+@granularity
)
GROUP BY `f`.`id`;
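A quick way to check the persistence claim is to compute the groups, add one later post, and compute them again. The sketch below does that for both variants on hypothetical data, using SQLite through Python's sqlite3 as a stand-in for MySQL; NEW_POST and the helper assign are made up for illustration.

```python
import sqlite3

GRANULARITY = 60 * 60  # @granularity: one hour, in seconds

# Hypothetical sample posts as (id, tm, user_id), plus one later post by user 1.
ROWS = [(1, 0, 1), (2, 600, 1), (3, 1200, 2), (4, 1800, 1), (5, 10000, 1)]
NEW_POST = (6, 10600, 1)  # ten minutes after row 5

# Backward window + MAX (the earlier query) vs. forward window + MIN (this one).
MAX_SQL = """
    SELECT MAX(g.id), f.id FROM feed g JOIN feed f
      ON f.user_id = g.user_id AND f.tm BETWEEN g.tm - :gran AND g.tm
    GROUP BY f.id
"""
MIN_SQL = """
    SELECT MIN(g.id), f.id FROM feed g JOIN feed f
      ON f.user_id = g.user_id AND f.tm BETWEEN g.tm AND g.tm + :gran
    GROUP BY f.id
"""


def assign(rows, sql, granularity=GRANULARITY):
    """Load rows into an in-memory SQLite feed table; map row id -> group id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feed (id INTEGER PRIMARY KEY, tm INTEGER, user_id INTEGER)")
    conn.executemany("INSERT INTO feed VALUES (?, ?, ?)", rows)
    result = {row_id: group for group, row_id in conn.execute(sql, {"gran": granularity})}
    conn.close()
    return result


for name, sql in (("MAX", MAX_SQL), ("MIN", MIN_SQL)):
    before = assign(ROWS, sql)
    after = assign(ROWS + [NEW_POST], sql)
    moved = sorted(r for r in before if before[r] != after[r])
    print(name, "rows that changed group:", moved)
# MAX rows that changed group: [5]
# MIN rows that changed group: []
```

With MAX, row 5 is pulled into the new post's group 6; with MIN, the existing rows keep their groups and the new post simply joins group 5.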
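Now let's update the table's group column.
First, MySQL won't let a statement update a table that it also reads from in a subquery, so we need a temporary table.
Second, we only update rows whose group column is NULL, or rows posted at or after UNIX_TIMESTAMP()-2*@granularity, since only recent rows can still be pulled into a newer group: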
CREATE TEMPORARY TABLE `_feedg`
SELECT MAX(`g`.`id`) AS `group`, `f`.`id`
FROM `feed` `g`
CROSS JOIN `feed` `f`
ON (`f`.`user_id` = `g`.`user_id`
AND `f`.`tm` BETWEEN `g`.`tm`-@granularity AND `g`.`tm`
)
WHERE `f`.`group` IS NULL
OR `f`.`tm` >= (UNIX_TIMESTAMP()-2*@granularity)
GROUP BY `f`.`id`;
And update the group column:
UPDATE `feed` `f` CROSS JOIN `_feedg` `g` USING(`id`)
SET `f`.`group` = `g`.`group`;
DROP TEMPORARY TABLE `_feedg`;
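Here is the whole incremental refresh as a runnable sketch, assuming SQLite (via Python's sqlite3) in place of MySQL. Two things had to change: UNIX_TIMESTAMP() is replaced by an explicit now argument so the run is repeatable, and since SQLite has no multi-table UPDATE ... JOIN, the update uses a correlated subquery instead. The helper names refresh_groups and current_groups and the sample rows are hypothetical.

```python
import sqlite3

GRANULARITY = 60 * 60  # @granularity: one hour, in seconds

# Hypothetical sample posts as (id, tm, user_id); `group` starts out NULL.
ROWS = [(1, 0, 1), (2, 600, 1), (3, 1200, 2), (4, 1800, 1), (5, 10000, 1)]


def refresh_groups(conn, now, granularity=GRANULARITY):
    """Recompute `group` for NULL rows and rows newer than now - 2*granularity."""
    # `now` stands in for MySQL's UNIX_TIMESTAMP() so the demo is deterministic.
    conn.execute("DROP TABLE IF EXISTS _feedg")
    conn.execute("CREATE TEMP TABLE _feedg (`group` INTEGER, id INTEGER PRIMARY KEY)")
    conn.execute(
        """
        INSERT INTO _feedg
        SELECT MAX(g.id), f.id
        FROM feed g JOIN feed f
          ON f.user_id = g.user_id AND f.tm BETWEEN g.tm - :gran AND g.tm
        WHERE f.`group` IS NULL OR f.tm >= :now - 2 * :gran
        GROUP BY f.id
        """,
        {"gran": granularity, "now": now},
    )
    # SQLite lacks MySQL's UPDATE ... JOIN, so copy groups via a subquery.
    conn.execute(
        """
        UPDATE feed
        SET `group` = (SELECT g.`group` FROM _feedg g WHERE g.id = feed.id)
        WHERE id IN (SELECT id FROM _feedg)
        """
    )


def current_groups(conn):
    """Map row id -> stored group, ordered by id."""
    return dict(conn.execute("SELECT id, `group` FROM feed ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed (id INTEGER PRIMARY KEY, tm INTEGER NOT NULL, "
    "user_id INTEGER NOT NULL, `group` INTEGER)"
)
conn.executemany("INSERT INTO feed (id, tm, user_id) VALUES (?, ?, ?)", ROWS)

refresh_groups(conn, now=10000)
print(current_groups(conn))  # {1: 4, 2: 4, 3: 3, 4: 4, 5: 5}

# A new post ten minutes after row 5 pulls row 5 into its group.
conn.execute("INSERT INTO feed (id, tm, user_id) VALUES (6, 10600, 1)")
refresh_groups(conn, now=10600)
print(current_groups(conn))  # {1: 4, 2: 4, 3: 3, 4: 4, 5: 6, 6: 6}
```

The second refresh only recomputes rows 5 and 6 (row 6 has a NULL group, row 5 is newer than now - 2*@granularity), and row 5 moves into the new post's group just as a full recomputation would put it.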
Here's the SQLFiddle: http://sqlfiddle.com/#!2/be9ce/15