Standard deviation in Presto

【Question title】: standard deviation presto
【Posted】: 2022-01-11 19:17:47
【Question】:

I want to calculate the standard deviation of avg_total_orders_last_30_days using the avg_total_orders_last_12_months.

Sample table

customer_id | avg_total_orders_last_30_days | avg_total_orders_last_12_months
939         | 103                           | 94
441         | 107                           | 118
082         | 313                           | 293

This is what I have tried so far:

select 
    customer_id
    avg_total_orders_last_30_days,
    avg_total_orders_last_12_months,
    approx_distinct(SUM(avg_total_orders_last_12_months)) OVER (partition by customer_id ) as stdev_rep
from table
group by 1

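【Comments】:

I don't fully understand your data structure yet. Can the same customer_id appear in several rows with different values of avg_total_orders_last_30_days?
No, this table is at the customer level: each customer has exactly one value of avg_total_orders_last_30_days and one of avg_total_orders_last_12_months.
Then why are you applying any grouping?
Also, could you explain what you mean by the standard deviation of avg_total_orders_last_30_days using the avg_total_orders_last_12_months? That is, please give some example inputs and outputs, and the formula for obtaining the output if it is not obvious.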

Tags: sql presto

【Solution 1】:

I think this is what you are trying to do, but your [avg_total_orders_last_12_months] field contains numbers that are too large to serve as the "e" argument of approx_distinct.

Approx Distinct Link

approx_distinct(x, e) → bigint

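Returns the approximate number of distinct input values. This function provides an approximation of count(DISTINCT x). Zero is returned if all input values are null. This function should produce a standard error of no more than e, which is the standard deviation of the (approximately normal) error distribution over all possible sets. It does not guarantee an upper bound on the error for any specific input set. The current implementation of this function requires that e be in the range of [0.0040625, 0.26000].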
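To make the role of "e" concrete, here is a minimal sketch of a call that stays inside the documented range. The derived table name customers and the inline VALUES rows are illustrative stand-ins for the sample table in the question, not the asker's real table; 0.01 is an arbitrary value picked from within [0.0040625, 0.26000].

```sql
-- Hypothetical inline copy of the sample table from the question.
SELECT
    -- Second argument is the maximum standard error, not a data column.
    approx_distinct(customer_id, 0.01) AS approx_customer_count
FROM (
    VALUES
        ('939', 103, 94),
        ('441', 107, 118),
        ('082', 313, 293)
) AS customers (customer_id, avg_total_orders_last_30_days, avg_total_orders_last_12_months)
```

Note that this counts distinct values only; it does not compute a standard deviation of the data.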

If you want the true sample standard deviation of that field, use STDDEV(x), as described here:

Standard Deviation Link
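A minimal sketch of the STDDEV suggestion, assuming the goal is the sample standard deviation of avg_total_orders_last_12_months across all customers (the comments above show that the intended calculation was never fully clarified). The derived table name customers, the inline VALUES rows, and the output column name stddev_12_months are illustrative, not taken from the asker's schema.

```sql
-- Hypothetical inline copy of the sample table from the question.
SELECT
    customer_id,
    avg_total_orders_last_30_days,
    avg_total_orders_last_12_months,
    -- Empty OVER () makes the window span every row, so each customer row
    -- carries the same table-wide sample standard deviation.
    stddev(avg_total_orders_last_12_months) OVER () AS stddev_12_months
FROM (
    VALUES
        ('939', 103, 94),
        ('441', 107, 118),
        ('082', 313, 293)
) AS customers (customer_id, avg_total_orders_last_30_days, avg_total_orders_last_12_months)
```

No GROUP BY is needed because the table already has one row per customer. Partitioning the window by customer_id, as in the original attempt, would leave a single value in each partition, and a sample standard deviation is not defined for a single value.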

【Comments】: