Referencing other columns in a SQL SELECT: answers

【Question title】: Referencing other columns in a SQL SELECT
【Posted】: 2021-03-08 12:25:02
【Question description】:

I have a SQL query in BigQuery:

SELECT
  creator.country,
  (SUM(length) / 60) AS total_minutes,
  COUNT(DISTINCT creator.id) AS total_users,
  (SUM(length) / 60 / COUNT(DISTINCT creator.id)) AS minutes_per_user
FROM
  ...

As you may have noticed, the last column is equivalent to total_minutes / total_users.

I tried the following, but it doesn't work:

SELECT
  creator.country,
  (SUM(length) / 60) AS total_minutes,
  COUNT(DISTINCT creator.id) AS total_users,
  (total_minutes / total_users) AS minutes_per_user
FROM
  ...

Is there any way to make this simpler?

【Question discussion】:

Tags: sql google-bigquery

【Solution 1】:

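Not really. You cannot reuse a column alias in another expression within the same SELECT list. If you really want to avoid repeating the expressions, you can compute the aggregates in a subquery or a CTE and then reference their aliases from the outer query: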

SELECT c.*,
       total_minutes / total_users AS minutes_per_user
FROM (SELECT creator.country,
             (SUM(length) / 60) AS total_minutes,
             COUNT(DISTINCT creator.id) AS total_users
      FROM ...
      GROUP BY creator.country
     ) c;
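The answer mentions a CTE but only shows the subquery form. Below is a minimal, self-contained sketch of the CTE variant. It assumes a hypothetical `videos` table (columns `country`, `creator_id`, `length`, with the `creator` struct flattened) built from made-up sample rows in place of the elided source. The first CTE computes each aggregate once under an alias, and the final SELECT reuses those aliases as ordinary columns. Dividing by `60.0` keeps the arithmetic in floating point on engines where integer division truncates; in BigQuery, `/` already returns FLOAT64.

```sql
-- Hypothetical sample data standing in for the elided FROM source.
WITH videos AS (
  SELECT 'US' AS country, 1 AS creator_id, 120 AS length
  UNION ALL SELECT 'US', 1, 240
  UNION ALL SELECT 'US', 2, 360
  UNION ALL SELECT 'DE', 3, 600
),
-- Step 1: compute each aggregate once and give it an alias.
per_country AS (
  SELECT
    country,
    SUM(length) / 60.0 AS total_minutes,
    COUNT(DISTINCT creator_id) AS total_users
  FROM videos
  GROUP BY country
)
-- Step 2: the aliases are now plain columns and can be combined freely.
SELECT
  country,
  total_minutes,
  total_users,
  total_minutes / total_users AS minutes_per_user
FROM per_country
ORDER BY country;
```

On this sample data the query returns two rows: DE with 10.0 minutes, 1 user, 10.0 minutes per user, and US with 12.0 minutes, 2 users, 6.0 minutes per user. Functionally this is the same as the subquery; a CTE is mostly a readability choice, since the aggregation step gets a name and sits above the query that uses it.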

【Discussion】:

【Solution 2】:

Another option is to move all the business logic for the metric calculations into a UDF (temporary or permanent, depending on your needs)...

create temp function custom_stats(arr any type) as ((
  select as struct    
    sum(length) / 60 as total_minutes,
    count(distinct id) as total_users,
    sum(length) / 60 / count(distinct id) as minutes_per_user
  from unnest(arr)
));

... which keeps the query itself simple and as concise as possible, as in the example below:

select creator.country,
  custom_stats(array_agg(struct(length, creator.id))).*
from `project.dataset.table`
group by country

【Discussion】: