Get the top 3 tags per month from a timestamp column in BigQuery

【Question title】: Get top 3 number of tags per month from timestamp data type in BigQuery
【Posted】: 2020-09-16 16:05:58
【Question description】:

I have the following dataset from BigQuery: the Stack Overflow public dataset, table posts_questions.

Schema of table

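I want to get the top 3 tags for each month of each year. The dataset runs from August 2008 to May 2020, and each row carries a full timestamp (date and time of day).

My initial approach was to count titles and group them by tags, year, and month, so that I would know which tags had the most questions. However, this gives me a very long query result containing every tag and its count for every month (even tags with only one question).

I read online about something called ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) AS rank. I tried to apply it, but I have never used it before. I am still quite new to SQL.

This is what I have so far: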

SELECT
  tags,
  COUNT(title) AS number_of_times_used,
  EXTRACT(MONTH FROM creation_date) AS month,
  EXTRACT(YEAR FROM creation_date) AS year,
FROM
  `bigquery-public-data.stackoverflow.posts_questions`
GROUP BY year, month, tags

Any suggestions on how to get the desired result? (Something like this):

Year: 2008   Month: 1   Tags: Android   number_of_times_used: 500
Year: 2008   Month: 1   Tags: Apple     number_of_times_used: 460
Year: 2008   Month: 1   Tags: SQL       number_of_times_used: 400
Year: 2008   Month: 2   Tags: Apple     number_of_times_used: 760
Year: 2008   Month: 2   Tags: SQL       number_of_times_used: 300
Year: 2008   Month: 2   Tags: Python    number_of_times_used: 230

Thanks for your help!

【Question discussion】:

Tags: google-bigquery timestamp

【Solution 1】:

The following is for BigQuery Standard SQL:

#standardSQL
SELECT top_tags.*
FROM (
  SELECT 
    ARRAY_AGG(t ORDER BY number_of_times_used DESC LIMIT 3) top_tags
  FROM (
    SELECT
      EXTRACT(YEAR FROM creation_date) AS year,
      EXTRACT(MONTH FROM creation_date) AS month,
      tags,
      COUNT(title) AS number_of_times_used
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    GROUP BY year, month, tags
  ) t
  GROUP BY year, month
) t, t.top_tags
-- ORDER BY year DESC, month DESC, number_of_times_used DESC
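
Since the question mentions ROW_NUMBER(), the same top-3 filter can also be sketched with a window function instead of ARRAY_AGG. This is an illustrative variant, not part of the original answer. The alias tag_rank is made up for the example, and ROW_NUMBER() breaks ties within a month arbitrarily, so a tag tied for third place may or may not appear.

```sql
#standardSQL
-- Step 1 (innermost query): count questions per (year, month, tags).
-- Step 2 (middle query): number the rows within each (year, month),
--         highest count first.
-- Step 3 (outer query): keep only the first three rows of each month.
SELECT year, month, tags, number_of_times_used
FROM (
  SELECT
    year,
    month,
    tags,
    number_of_times_used,
    ROW_NUMBER() OVER (
      PARTITION BY year, month
      ORDER BY number_of_times_used DESC
    ) AS tag_rank  -- illustrative alias, not from the original post
  FROM (
    SELECT
      EXTRACT(YEAR FROM creation_date) AS year,
      EXTRACT(MONTH FROM creation_date) AS month,
      tags,
      COUNT(title) AS number_of_times_used
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    GROUP BY year, month, tags
  )
)
WHERE tag_rank <= 3
ORDER BY year, month, number_of_times_used DESC
```

The window function cannot be filtered in the same SELECT that defines it, which is why the ranking sits in a subquery and the WHERE clause is applied one level up.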

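One caveat, stated as an assumption about the public dataset rather than something the answer covers: the tags column in posts_questions appears to hold all of a question's tags in a single pipe-delimited string (for example python|pandas), so grouping by tags counts tag combinations rather than individual tags. If per-tag counts are what is wanted, a sketch along these lines splits the string first. The aliases tag, per_month, and top_tag are illustrative.

```sql
#standardSQL
-- Assumption: tags is a '|'-delimited STRING such as 'python|pandas'.
-- SPLIT turns it into an array and UNNEST yields one row per tag,
-- so each question is counted once for every tag it carries.
SELECT top_tag.*
FROM (
  SELECT
    -- Keep the three highest-count rows of each (year, month) group.
    ARRAY_AGG(t ORDER BY number_of_times_used DESC LIMIT 3) AS top_tags
  FROM (
    SELECT
      EXTRACT(YEAR FROM creation_date) AS year,
      EXTRACT(MONTH FROM creation_date) AS month,
      tag,
      COUNT(title) AS number_of_times_used
    FROM `bigquery-public-data.stackoverflow.posts_questions`,
      UNNEST(SPLIT(tags, '|')) AS tag
    GROUP BY year, month, tag
  ) t
  GROUP BY year, month
) per_month
-- Flatten the per-month arrays back into one row per tag.
CROSS JOIN UNNEST(per_month.top_tags) AS top_tag
ORDER BY year, month, number_of_times_used DESC
```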

【Discussion】: