Using n_distinct for Postgres tables

【Question title】: Using n_distinct for Postgres tables
【Posted】: 2017-11-16 21:43:40
【Question】:

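I am trying to use dplyr and dbplyr to work with a Postgres table without collecting (pulling) the data into R.
Given a table shaped like x below, how can I compute the number of distinct values within groups that I define?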

For example, this mimics what I want to do:

library(dplyr)

# Would actually be x = tbl(src = "postgres_conn", "x")

x = data.frame(
    a = c(1, 1, 2, 2, 3, 3),
    b = c(1, 1, 1, 2, 2, 2),
    c = c(1, 2, 3, 1, 2, 3)
)

> x
  a b c
1 1 1 1
2 1 1 2
3 2 1 3
4 2 2 1
5 3 2 2
6 3 2 3

x %>% group_by(a, b) %>% mutate(Count = n_distinct(c))

# Results
# A tibble: 6 x 4
# Groups:   a, b [4]
      a     b     c Count
  <dbl> <dbl> <dbl> <int>
1     1     1     1     2
2     1     1     2     2
3     2     1     3     1
4     2     2     1     1
5     3     2     2     2
6     3     2     3     2

If I use n_distinct(c) on the Postgres tbl, I get the following error: DISTINCT is not implemented for window functions.
I also tried length(unique(c)), which returned a syntax error.

Trying

sql('COUNT(DISTINCT(c))')

gives me this error:

column "c" does not exist. HINT:  Perhaps you meant to reference the column "aresphukou.c."

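However, aresphukou is an arbitrary temporary table alias that changes every time the query is run.

Finally, I tried replyr_uniqueValues, but it seems to ignore the grouping and returns 1 for every Count value.

Can anyone suggest how to solve this?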

【Question comments】:

Tags: r postgresql dplyr dbplyr

【Answer 1】:

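I stumbled upon this question almost four years later. With dbplyr/dplyr you may not get the result you want directly, but in PostgreSQL you can get it by computing the per-group distinct count in a subquery or CTE (common table expression) and joining that result back to table x: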

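-- y holds one row per (a, b) group with the number of distinct c values
with y as (
select a, b, count(distinct c) as n
from x
group by a, b
)
-- join the per-group count back onto every row of x
select a, b, c, n
from x
left join y using (a, b);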
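
The same aggregate-then-join idea can also be sketched in dplyr. This is a minimal sketch, not a tested dbplyr recipe: it runs on a local data frame standing in for the remote table, and it assumes that on a Postgres tbl dbplyr translates n_distinct() inside summarise() into an ordinary aggregate rather than a window function, so the error from the question should not apply. The names counts and result are made up for illustration.

```r
library(dplyr)

# Local stand-in for the remote table. With a real connection this would be
# something like tbl(con, "x"), where `con` is a hypothetical DBI connection.
x <- data.frame(
  a = c(1, 1, 2, 2, 3, 3),
  b = c(1, 1, 1, 2, 2, 2),
  c = c(1, 2, 3, 1, 2, 3)
)

# Step 1: one row per (a, b) group holding its number of distinct c values.
counts <- x %>%
  group_by(a, b) %>%
  summarise(Count = n_distinct(c)) %>%
  ungroup()

# Step 2: attach the per-group count back onto every original row.
result <- x %>%
  left_join(counts, by = c("a", "b"))

print(result)
```

On a remote tbl, appending show_query() to the pipeline instead of printing it is one way to check which SQL is actually generated before relying on this.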

【Comments】: