【Question title】: Counting records with related records which appear first in a given date
【Posted】: 2016-05-27 10:52:06
【Question】:

I have two tables, players and games, created as follows:

CREATE TABLE IF NOT EXISTS `players` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `created_at` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

CREATE TABLE IF NOT EXISTS `games` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `player` int(11) NOT NULL,
  `played_at` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

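For each day, I want to extract 3 values:

  1. The number of players created on that day
  2. The number of players who played on that day
  3. The number of players who played their first game on that day

So, suppose the players table looks like this: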

+----+--------+---------------------+
| id | name   | created_at          |
+----+--------+---------------------+
|  1 | Alan   | 2016-02-01 00:00:00 |
|  2 | Benny  | 2016-02-01 06:00:00 |
|  3 | Calvin | 2016-02-02 00:00:00 |
|  4 | Dan    | 2016-02-03 00:00:00 |
+----+--------+---------------------+

and the games table looks like this:

+----+--------+---------------------+
| id | player | played_at           |
+----+--------+---------------------+
|  1 |      1 | 2016-02-01 01:00:00 |
|  2 |      3 | 2016-02-02 02:00:00 |
|  3 |      2 | 2016-02-03 14:00:00 |
|  4 |      3 | 2016-02-03 17:00:00 |
|  5 |      3 | 2016-02-03 18:00:00 |
+----+--------+---------------------+

Then the query should return something like:

+------------+-----+--------+-------+
| day        | new | played | first |
+------------+-----+--------+-------+
| 2016-02-01 | 2   | 1      | 1     |
| 2016-02-02 | 1   | 1      | 1     |
| 2016-02-03 | 1   | 2      | 1     |
+------------+-----+--------+-------+

I have a solution for 1 (new):

SELECT Date(created_at) AS day,
       Count(*)         AS new
FROM   players
GROUP  BY day;  

That one is easy. I think I also have a solution for 2 (played), thanks to MySQL COUNT DISTINCT:

select Date(played_at) AS day,
       Count(Distinct player) AS played
FROM   games
GROUP  BY day;

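But I have no idea how to get the result required for 3 (first). I also don't know how to put everything into a single query to save execution time (the games table may contain millions of records).

If you need it, here is a query that inserts the sample data: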

INSERT INTO `players` (`id`, `name`, `created_at`) VALUES
(1, 'Alan', '2016-02-01 00:00:00'),
(2, 'Benny', '2016-02-01 06:00:00'),
(3, 'Calvin', '2016-02-02 00:00:00'),
(4, 'Dan', '2016-02-03 00:00:00');

INSERT INTO `games` (`id`, `player`, `played_at`) VALUES
(1, 1, '2016-02-01 01:00:00'),
(2, 3, '2016-02-02 02:00:00'),
(3, 2, '2016-02-03 14:00:00'),
(4, 3, '2016-02-03 17:00:00'),
(5, 3, '2016-02-03 18:00:00');

【Question discussion】:

    Tags: mysql datetime join


    【Solution 1】:

    One version is to put all the relevant data into a union and analyze it from there:

    SELECT date AS day,
           SUM(type='P') new, 
           COUNT(DISTINCT CASE WHEN type='G' THEN pid END) played, 
           SUM(type='F') first 
    FROM (
      SELECT id pid, DATE(created_at) date, 'P' type FROM players 
      UNION ALL 
      SELECT player, DATE(played_at) date,  'G' FROM games 
      UNION ALL 
      SELECT player, MIN(DATE(played_at)),  'F' FROM games GROUP BY player
    ) z 
    GROUP BY date;
    

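    In the union:

    Records of type P are player creation statistics.
    Records of type G are player-related game statistics.
    Records of type F are statistics on when each player first played a game.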
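
    To see how the three record types add up, it may help to run the derived table on its own against the sample data from the question. This is only an inspection query; it reuses the same aliases (pid, date, type, z) as the query above and adds nothing but an ORDER BY.

    ```sql
    -- Inspect the rows that feed the outer GROUP BY.
    SELECT pid, date, type
    FROM (
      SELECT id pid, DATE(created_at) date, 'P' type FROM players
      UNION ALL
      SELECT player, DATE(played_at), 'G' FROM games
      UNION ALL
      SELECT player, MIN(DATE(played_at)), 'F' FROM games GROUP BY player
    ) z
    ORDER BY date, type, pid;
    ```

    For 2016-02-03 the sample data yields one P row (player 4), three G rows (players 2, 3 and 3) and one F row (player 2). The outer query therefore reports new = 1, played = 2 (the DISTINCT collapses player 3's two games) and first = 1, which matches the expected output in the question.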

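    【Discussion】:

    • This is pretty awesome. Code-wise, it is probably as simple as it gets. How does it perform?
    • @Bach With basic indexes it seems to do fairly well, but for large amounts of data you may want to consider, for example, storing the date alongside the datetime to avoid all the DATE() conversions, and grouping inside the subqueries as much as you can to minimize the number of rows generated.
    • Joachim, thanks for the answer and the explanation!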
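
    The suggestion in the comment above (store the date next to the datetime, and group early in the subqueries) could look roughly like the sketch below. The column name played_on and the index name idx_games_player_played_on are made up for illustration; they are not part of the original schema, and whether this actually runs faster on a given data set would need to be checked with EXPLAIN.

    ```sql
    -- Assumed names: played_on and idx_games_player_played_on are illustrative only.
    ALTER TABLE games ADD COLUMN played_on DATE NULL;

    -- Backfill from the existing datetime; new inserts would have to set played_on too.
    UPDATE games SET played_on = DATE(played_at);

    ALTER TABLE games
      MODIFY played_on DATE NOT NULL,
      ADD INDEX idx_games_player_played_on (player, played_on);

    -- Same union as above, but the games branches no longer call DATE(),
    -- and the 'G' branch is reduced to one row per player per day.
    SELECT date AS day,
           SUM(type='P') new,
           COUNT(DISTINCT CASE WHEN type='G' THEN pid END) played,
           SUM(type='F') first
    FROM (
      SELECT id pid, DATE(created_at) date, 'P' type FROM players
      UNION ALL
      SELECT player, played_on, 'G' FROM games GROUP BY player, played_on
      UNION ALL
      SELECT player, MIN(played_on), 'F' FROM games GROUP BY player
    ) z
    GROUP BY date;
    ```

    The players branch still converts created_at with DATE(); the same extra-column idea could be applied there if that table also grows large.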
    【Solution 2】:

    You can compute the result from a derived table based on min(played_at) and filter it with HAVING:

    select count(player) from 
       (  select player, min(played_at)  
          from games 
          group by player 
          having min(played_at) = YOUR_GIVEN_DATE ) as t;
    

    【Discussion】:

    • First, you probably want Day(Min(played_at)). Second, this solution would probably require one query per possible date, which is not feasible...
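
    One way to address the second point in the comment above is to group the per-player first dates by day instead of filtering on a single date, so that every day is covered by one query. This is a sketch rather than the answerer's own code; the aliases first_day and t are assumptions introduced here.

    ```sql
    -- Count, for every day at once, the players whose earliest game falls on that day.
    SELECT first_day AS day,
           COUNT(*)  AS first
    FROM (
      -- One row per player: the calendar date of that player's earliest game.
      SELECT player, DATE(MIN(played_at)) AS first_day
      FROM games
      GROUP BY player
    ) AS t
    GROUP BY first_day;
    ```

    On the sample data, players 1, 3 and 2 have their first games on 2016-02-01, 2016-02-02 and 2016-02-03 respectively, so each of those days gets a count of 1. Days on which nobody played a first game do not appear in this result.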
    【Solution 3】:

    This query will give you the result:

    select day,( select count(distinct(id)) from players where Date(created_at) = temp.day ) as no_created_at ,
    ( select count(distinct(player)) from games where Date(played_at) = temp.day) as no_played_at,
    ( select count(distinct(player)) from games  where Date(played_at) = 
    (select min(Date(played_at)) from games internal_games 
    where internal_games.player =games.player and Date(games.played_at) = temp.day )) as no_first_played_at
     from (
    SELECT Date(created_at) AS day     
    FROM   players
    GROUP  BY day 
    union 
    select Date(played_at) AS day
    FROM   games
    GROUP  BY day) temp 
    

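    And the output:

    【Discussion】:

    • Almost exact, since the number of players who played on 2016-02-03 is 2 (at least for my sample data).
    • @Bach I updated it; you can use distinct with the player's id to count players instead of counting all rows.
    【Solution 4】:

    Here is a solution with a bunch of subqueries, which accounts for the fact that players may be created on days with no games, and vice versa: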

    select
        all_dates.date as day,
        ifnull(new.num, 0) as new,
        ifnull(players.num, 0) as players,
        ifnull(first.num, 0) as first
    from (
        select date(created_at) as date from players
        union
        select date(played_at) from games
    ) as all_dates
    left join (
        select date(created_at) as created_at_date, count(*) as num
        from players
        group by created_at_date
    ) as new on all_dates.date = new.created_at_date
    left join (
        select date(played_at) as played_at_date, count(distinct player) as num
        from games
        group by played_at_date
    ) as players on all_dates.date = players.played_at_date
    left join (
        select min_date, count(*) num
        from (
            select player, date(min(played_at)) as min_date
            from games
            group by player
        ) as players_first
        group by min_date
    ) as first on all_dates.date = first.min_date
    order by day;
    

    【Discussion】:

    • Nice. It returns NULL instead of 0 on days with no games (unlike Joachim's approach).