Singular condition on an aggregate column

【Question title】: Singular condition on an aggregate column
【Posted】: 2017-06-14 09:17:47
【Question description】:

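It's a bit hard to define what I want to achieve, but I'll give it a try here. I'm working on Redshift and writing a query on top of the following sample Table A: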

User ID || Active_in_Month || Max_Months_On_Platform
1       || 1               || 6
1       || 2               || 6
1       || 5               || 6
2       || 1               || 3
2       || 3               || 3

After grouping by “Active_in_Month”, I want to get the following output in Table B:

Active_in_Month || Active_Distinct_Users || User_Cohorts
1               || 2                     || 2
2               || 1                     || 2
3               || 1                     || 2
5               || 1                     || 1

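“Active_Distinct_Users” is a simple COUNT(*). Calculating “User_Cohorts”, however, is where I'm stuck. This column should represent how many users stayed on the platform long enough to reach the month in the “Active_in_Month” column. For example, in row 1 of Table B (month 1) there are two users whose “Max_Months_On_Platform” > 1 (the Active_in_Month). The row of Table B for month 5 has only 1 “User_Cohort”, because only 1 user has a “Max_Months_On_Platform” > 5 (the Active_in_Month).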

I hope this explains what I'm trying to achieve.
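The rule can be checked by hand against the sample data. The following is only a sketch in plain Python (not Redshift SQL). It assumes the comparison is Max_Months_On_Platform >= Active_in_Month, because a strict > would give 1 rather than 2 for month 3 in Table B. The names table_a, max_months, active_users and table_b are made up for this illustration.

```python
from collections import defaultdict

# Sample Table A rows: (user_id, active_in_month, max_months_on_platform)
table_a = [
    (1, 1, 6),
    (1, 2, 6),
    (1, 5, 6),
    (2, 1, 3),
    (2, 3, 3),
]

# Each user's Max_Months_On_Platform (constant per user in the sample)
max_months = {user: max_m for user, _, max_m in table_a}

# Distinct users that were active in each month
active_users = defaultdict(set)
for user, month, _ in table_a:
    active_users[month].add(user)

# Assumed rule: a user counts toward a month's cohort when
# Max_Months_On_Platform >= Active_in_Month
table_b = [
    (month, len(users), sum(1 for m in max_months.values() if m >= month))
    for month, users in sorted(active_users.items())
]

for row in table_b:
    print(row)  # (Active_in_Month, Active_Distinct_Users, User_Cohorts)
```

Under that assumption the four tuples match the rows of Table B.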

【Question discussion】:

Tags: mysql amazon-redshift

【Solution 1】:

I hope I have understood the rule for calculating the User_Cohorts value correctly. Please try this:

SELECT
    a.Active_in_Month
    , COUNT(*) AS Active_Distinct_Users
    , ( SELECT COUNT(DISTINCT user_id) +1
        FROM tablea a2
        WHERE a.Active_in_Month < a2.Max_Months_On_Platform
        AND a.user_id <> a2.user_id
    ) AS User_Cohorts
FROM tablea a
GROUP BY a.Active_in_Month
ORDER BY a.Active_in_Month;

Sample

MariaDB [test]> SELECT
    ->     a.Active_in_Month
    ->     , COUNT(*) AS Active_Distinct_Users
    ->     , ( SELECT COUNT(DISTINCT user_id) +1
    ->         FROM tablea a2
    ->         WHERE a.Active_in_Month < a2.Max_Months_On_Platform
    ->         AND a.user_id <> a2.user_id
    ->     ) AS User_Cohorts
    -> FROM tablea a
    -> GROUP BY a.Active_in_Month
    -> ORDER BY a.Active_in_Month;
+-----------------+-----------------------+--------------+
| Active_in_Month | Active_Distinct_Users | User_Cohorts |
+-----------------+-----------------------+--------------+
|               1 |                     2 |            2 |
|               2 |                     1 |            2 |
|               3 |                     1 |            2 |
|               5 |                     1 |            1 |
+-----------------+-----------------------+--------------+
4 rows in set (0.00 sec)

MariaDB [test]>

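【Discussion】:

Thanks Bernd, but I get an error message saying that I also need to group by a.user_id.
You could try SET sql_mode=''; first.
That's probably because I'm on Redshift and not SQL. I was finally able to solve it, although I don't think it's the most elegant solution. I'm adding the solution to the post. Thanks for your help!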
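The error is consistent with the subquery in Solution 1 referencing a.user_id, a column of the outer query that is neither grouped nor aggregated. The following is a minimal sketch of one way to avoid that reference: join the per-month counts to a one-row-per-user list. It assumes the rule Max_Months_On_Platform >= Active_in_Month and that Max_Months_On_Platform is constant per user. It was only worked through with Python's sqlite3 module on the sample data, not on Redshift, and the aliases m and u are chosen here for illustration.

```python
import sqlite3

# Build the sample Table A in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablea ("
    "user_id INTEGER, Active_in_Month INTEGER, Max_Months_On_Platform INTEGER)"
)
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?, ?)",
    [(1, 1, 6), (1, 2, 6), (1, 5, 6), (2, 1, 3), (2, 3, 3)],
)

query = """
SELECT m.Active_in_Month,
       m.Active_Distinct_Users,
       COUNT(DISTINCT u.user_id) AS User_Cohorts
FROM (  -- one row per month with its number of distinct active users
        SELECT Active_in_Month,
               COUNT(DISTINCT user_id) AS Active_Distinct_Users
        FROM tablea
        GROUP BY Active_in_Month
     ) AS m
JOIN (  -- one row per user with that user's Max_Months_On_Platform
        SELECT DISTINCT user_id, Max_Months_On_Platform
        FROM tablea
     ) AS u
  ON u.Max_Months_On_Platform >= m.Active_in_Month  -- assumed cohort rule
GROUP BY m.Active_in_Month, m.Active_Distinct_Users
ORDER BY m.Active_in_Month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every selected column is either grouped or aggregated, so the query does not depend on a relaxed sql_mode.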

【Solution 2】:

Solution

I solved it using the following approach. I'm not sure it's the best way, but it gets the job done:

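SELECT
    Active_in_Month,
    COUNT(DISTINCT user_id),
    ( SELECT SUM(number_of_customers)
      FROM ( -- number of distinct users per Max_Months_On_Platform value
             SELECT
                 tbl_a2.Max_Months_On_Platform AS total,
                 COUNT(DISTINCT tbl_a2.user_id) AS number_of_customers
             FROM tbl_a AS tbl_a2
             GROUP BY tbl_a2.Max_Months_On_Platform
           ) AS cohort_sizes  -- alias added for engines that require one on a derived table
      WHERE total + 1 >= tbl_a.Active_in_Month
    ) AS total_customers
FROM tbl_a
GROUP BY Active_in_Month  -- added: the outer query aggregates per month
ORDER BY Active_in_Month;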
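As a quick check, the query can be run against the sample Table A. This sketch uses Python's sqlite3 module rather than Redshift, so it says nothing about Redshift-specific behaviour. It runs the query with a GROUP BY on Active_in_Month and an alias on the derived table (cohort_sizes, a name chosen here), both of which are assumptions about the intended query.

```python
import sqlite3

# Build the sample Table A in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_a ("
    "user_id INTEGER, Active_in_Month INTEGER, Max_Months_On_Platform INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_a VALUES (?, ?, ?)",
    [(1, 1, 6), (1, 2, 6), (1, 5, 6), (2, 1, 3), (2, 3, 3)],
)

query = """
SELECT
    Active_in_Month,
    COUNT(DISTINCT user_id),
    ( SELECT SUM(number_of_customers)
      FROM ( -- distinct users per Max_Months_On_Platform value
             SELECT tbl_a2.Max_Months_On_Platform AS total,
                    COUNT(DISTINCT tbl_a2.user_id) AS number_of_customers
             FROM tbl_a AS tbl_a2
             GROUP BY tbl_a2.Max_Months_On_Platform
           ) AS cohort_sizes  -- assumed alias
      WHERE total + 1 >= tbl_a.Active_in_Month
    ) AS total_customers
FROM tbl_a
GROUP BY Active_in_Month  -- assumed grouping
ORDER BY Active_in_Month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this reproduces Table B. Note that the condition total + 1 >= Active_in_Month also counts a user whose Max_Months_On_Platform is one less than the month. The sample has no such row, so whether that is intended cannot be told from the post.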

【Discussion】: