【Question title】: BigQuery Count only last item by timestamp
【Posted】: 2020-08-25 22:26:44
【Question description】:

I'm trying to join two tables and count how many "checklists" have been completed.

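You'll notice that id: 1 was:

  • 01-09: marked as completed
  • 01-10: marked as not completed
  • 01-11: marked as completed again

As a result, my count is off by one (5 instead of 4 for worksite_1). I want to use only the last action recorded for each id. The actual response should be: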

| Worksite   | Count |
| ---------- | ----- |
| worksite_1 | 4     |
| worksite_2 | 2     |

Edit

I figured out how to do this if I want the counts broken out by date, but I haven't figured out how to get a TOTAL. This seems to work by date:

SELECT 
    DATE(ChecklistCompletions.ts) AS `DATE`,
    Checklists.worksite_id AS `Worksite`,
    COUNT(DISTINCT (CASE WHEN ChecklistCompletions.completed = 1 THEN 1 END)) AS `Count`
FROM Checklists
LEFT JOIN ChecklistCompletions
on Checklists.id = ChecklistCompletions.id
GROUP BY `Worksite`, `DATE`
ORDER BY `DATE` DESC

Is this possible?

Note: I'm only using MySQL as a playground. I'm looking for a solution in BigQuery Standard SQL.


Schema (MySQL v5.7)

CREATE TABLE Checklists
    (`id` varchar(55), `uid` varchar(55), `worksite_id`  varchar(55), `ts` datetime)
;

CREATE TABLE ChecklistCompletions
    (`id` varchar(55), `uid` varchar(55), `completed` tinyint(1), `ts` datetime)
;

INSERT INTO ChecklistCompletions
    (`id`, `uid`, `completed`, `ts`)
    
VALUES
  ("1",     "u12345",   1, '2019-01-09 00:00:00'),
  ("1",     "u12345",   0, '2019-01-10 00:00:00'),
  ("1",     "u12345",   1, '2019-01-11 00:00:00'),
  ("2",     "u12345",   0, '2019-01-13 00:00:00'),
  ("3",     "u12345",   1, '2019-01-12 00:00:00'),
  ("4",     "u12345",   1, '2019-01-13 00:00:00'),
  ("5",     "u12345",   1, '2019-01-12 00:00:00'),
  ("6",     "u12345",   0, '2019-01-17 00:00:00'),
  ("7",     "u1",       1, '2019-01-10 00:00:00'),
  ("8",     "u1",       0, '2019-01-12 00:00:00'),
  ("9",     "u1",       1, '2019-01-15 00:05:00'),
  ("10",    "u1",       0, '2019-01-15 00:00:00')

;

INSERT INTO Checklists
    (`id`, `uid`, `worksite_id`, `ts`)
    
VALUES
  ("1",     "u12345",   "worksite_1", '2019-01-09 00:00:00'),
  ("2",     "u12345",   "worksite_2", '2019-01-13 00:00:00'),
  ("3",     "u12345",   "worksite_2", '2019-01-12 00:00:00'),
  ("4",     "u12345",   "worksite_1", '2019-01-13 00:00:00'),
  ("5",     "u12345",   "worksite_2", '2019-01-12 00:00:00'),
  ("6",     "u12345",   "worksite_1", '2019-01-17 00:00:00'),
  ("7",     "u1",       "worksite_1", '2019-01-10 00:00:00'),
  ("8",     "u1",       "worksite_1", '2019-01-12 00:00:00'),
  ("9",     "u1",       "worksite_1", '2019-01-15 00:05:00'),
  ("10",    "u1",       "worksite_2", '2019-01-15 00:00:00')
;

Query #1

SELECT 
    Checklists.worksite_id AS `Worksite`,
    COUNT(CASE WHEN ChecklistCompletions.completed = 1 THEN 1 END) AS `Count`
FROM Checklists
LEFT JOIN ChecklistCompletions
on Checklists.id = ChecklistCompletions.id
GROUP BY `Worksite`;

| Worksite   | Count |
| ---------- | ----- |
| worksite_1 | 5     |
| worksite_2 | 2     |


【Question discussion】:

    Tags: google-bigquery


    【Solution 1】:

    Below is for BigQuery Standard SQL

    #standardSQL
    SELECT Worksite, COUNTIF(completed = 1) completed
    FROM (
      SELECT 
          Checklists.worksite_id AS `Worksite`,
          ARRAY_AGG(completed ORDER BY ChecklistCompletions.ts DESC LIMIT 1)[OFFSET(0)] completed
      FROM `project.dataset.Checklists` Checklists
      LEFT JOIN `project.dataset.ChecklistCompletions` ChecklistCompletions
      ON Checklists.id = ChecklistCompletions.id
      GROUP BY Checklists.id, Worksite
    ) GROUP BY worksite
    

    If applied to the sample data from your question, the result is (as expected)

    Row Worksite    completed    
    1   worksite_1  4    
    2   worksite_2  2     
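
To see why the ordering inside ARRAY_AGG matters, here is a minimal sketch of the "latest value per group" idiom on a few inline rows copied from the question's sample data. The `last_completed` alias is only illustrative. Ordering by `ts DESC` and keeping one element picks each id's most recent status rather than its largest value.

```sql
#standardSQL
-- Sketch only: inline rows taken from the question's ChecklistCompletions data.
SELECT
  id,
  -- Sort each id's rows newest first, keep one element, and unwrap it.
  ARRAY_AGG(completed ORDER BY ts DESC LIMIT 1)[OFFSET(0)] AS last_completed
FROM UNNEST([
  STRUCT('1' AS id, 1 AS completed, TIMESTAMP '2019-01-09 00:00:00' AS ts),
  STRUCT('1', 0, TIMESTAMP '2019-01-10 00:00:00'),
  STRUCT('1', 1, TIMESTAMP '2019-01-11 00:00:00'),
  STRUCT('2', 0, TIMESTAMP '2019-01-13 00:00:00')
])
GROUP BY id
ORDER BY id
```

For id 1 the most recent row (2019-01-11) has completed = 1, and for id 2 the only row has completed = 0, so those are the values an outer COUNTIF would see.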
    

    You can test and experiment with the full query using the sample data from your question, as in the example below

    #standardSQL
    WITH `project.dataset.ChecklistCompletions` AS (
      SELECT "1" id,     "u12345" uid,   1 completed, TIMESTAMP '2019-01-09 00:00:00' ts UNION ALL
      SELECT "1",     "u12345",   0, '2019-01-10 00:00:00' UNION ALL
      SELECT "1",     "u12345",   1, '2019-01-11 00:00:00' UNION ALL
      SELECT "2",     "u12345",   0, '2019-01-13 00:00:00' UNION ALL
      SELECT "3",     "u12345",   1, '2019-01-12 00:00:00' UNION ALL
      SELECT "4",     "u12345",   1, '2019-01-13 00:00:00' UNION ALL
      SELECT "5",     "u12345",   1, '2019-01-12 00:00:00' UNION ALL
      SELECT "6",     "u12345",   0, '2019-01-17 00:00:00' UNION ALL
      SELECT "7",     "u1",       1, '2019-01-10 00:00:00' UNION ALL
      SELECT "8",     "u1",       0, '2019-01-12 00:00:00' UNION ALL
      SELECT "9",     "u1",       1, '2019-01-15 00:05:00' UNION ALL
      SELECT "10",    "u1",       0, '2019-01-15 00:00:00' 
    ), `project.dataset.Checklists` AS (
      SELECT "1" id,     "u12345" uid,   "worksite_1" worksite_id, TIMESTAMP '2019-01-09 00:00:00' ts UNION ALL
      SELECT "2",     "u12345",   "worksite_2", '2019-01-13 00:00:00' UNION ALL
      SELECT "3",     "u12345",   "worksite_2", '2019-01-12 00:00:00' UNION ALL
      SELECT "4",     "u12345",   "worksite_1", '2019-01-13 00:00:00' UNION ALL
      SELECT "5",     "u12345",   "worksite_2", '2019-01-12 00:00:00' UNION ALL
      SELECT "6",     "u12345",   "worksite_1", '2019-01-17 00:00:00' UNION ALL
      SELECT "7",     "u1",       "worksite_1", '2019-01-10 00:00:00' UNION ALL
      SELECT "8",     "u1",       "worksite_1", '2019-01-12 00:00:00' UNION ALL
      SELECT "9",     "u1",       "worksite_1", '2019-01-15 00:05:00' UNION ALL
      SELECT "10",    "u1",       "worksite_2", '2019-01-15 00:00:00' 
    )
    SELECT Worksite, COUNTIF(completed = 1) completed
    FROM (
      SELECT 
          Checklists.worksite_id AS `Worksite`,
          ARRAY_AGG(completed ORDER BY ChecklistCompletions.ts DESC LIMIT 1)[OFFSET(0)] completed
      FROM `project.dataset.Checklists` Checklists
      LEFT JOIN `project.dataset.ChecklistCompletions` ChecklistCompletions
      ON Checklists.id = ChecklistCompletions.id
      GROUP BY Checklists.id, Worksite
    ) GROUP BY worksite
    ORDER BY worksite
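
As an alternative sketch (not part of the original answer), the same "last action per id" logic can be written with ROW_NUMBER(): rank each id's rows by ts descending, keep rank 1, then join and count. The `latest` CTE and the `rn` column are illustrative names, and the table paths are the same placeholders used in the answer.

```sql
#standardSQL
-- Sketch only: `latest` and `rn` are hypothetical names introduced here.
WITH latest AS (
  SELECT id, completed
  FROM (
    SELECT
      id,
      completed,
      -- rn = 1 marks the most recent row for each checklist id.
      ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM `project.dataset.ChecklistCompletions`
  )
  WHERE rn = 1
)
SELECT
  Checklists.worksite_id AS Worksite,
  -- Checklists without any completion row yield NULL here and are not counted.
  COUNTIF(latest.completed = 1) AS completed
FROM `project.dataset.Checklists` AS Checklists
LEFT JOIN latest
ON Checklists.id = latest.id
GROUP BY Worksite
ORDER BY Worksite
```

On the question's sample data this should give the same totals (worksite_1 = 4, worksite_2 = 2), assuming timestamps are unique within each id so that the "last" row is unambiguous.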
    

    【Discussion】:

    • Thanks again, as always! Do you think my edit above would work for counts by date when the worksite isn't involved?