Combine duplicate rows in column as comma separated values - Google Query: Answers

【Question title】: Combine duplicate rows in column as comma separated values - Google Query
【Posted】: 2020-12-06 14:36:02
【Question description】:

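If I have two columns, ID and Name, where the ID column contains duplicates, and I want to group by ID to get unique IDs while the Name column becomes a comma-separated list, can this be done in Google Query?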

| ID   | Name |
===============
| 1001 | abc  |
---------------
| 1001 | def  |
---------------
| 1002 | kjg  |
---------------
| 1003 | aof  |
---------------
| 1003 | lmi  |
---------------
| 1004 | xyz  |
---------------

into

| ID   | Name      |
====================
| 1001 | abc, def  |
--------------------
| 1002 | kjg       |
--------------------
| 1003 | aof, lmi  |
--------------------
| 1004 | xyz       |
--------------------

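【Question discussion】:

Actually, I don't think so. But it can be done with standard formulas. Let me know if you are interested in that kind of solution.
Hmm... I was hoping Google Query could somehow be combined with functions such as TEXTJOIN.
I'm pretty sure it can be done with query/textjoin/split/etc. Let me give it a try...
TEXTJOIN and SPLIT have a 50,000-character limit.
I think that limit could be overcome by somehow using the undocumented FLATTEN function.

Tags: google-sheets google-query-language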

【Solution 1】:

Try:

=ARRAYFORMULA({QUERY(QUERY({A2:B, B2:B}, 
 "select Col1,max(Col2) 
  where Col1 is not null 
  group by Col1 
  pivot Col3"), 
 "select Col1 
  offset 1", 0), REGEXREPLACE(TRIM(
 TRANSPOSE(QUERY(TRANSPOSE(QUERY(QUERY({A2:B&",", B2:B}, 
 "select max(Col2) 
  where Col1 is not null 
    and Col2 <> ',' 
  group by Col1 
  pivot Col3"), 
 "offset 1", 0)),,999^9))), ",$", )})

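However, because of the limits of TRIM (needed to remove the extra spaces) and REGEXREPLACE (needed to remove the trailing comma), this may not work on massive datasets. Without those two functions, the formula can handle anything: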

=ARRAYFORMULA({QUERY(QUERY({A2:B, B2:B}, 
 "select Col1,max(Col2) 
  where Col1 is not null 
  group by Col1 
  pivot Col3"), 
 "select Col1 
  offset 1", 0), 
 TRANSPOSE(QUERY(TRANSPOSE(QUERY(QUERY({A2:B&",", B2:B}, 
 "select max(Col2) 
  where Col1 is not null 
    and Col2 <> ',' 
  group by Col1 
  pivot Col3"), 
 "offset 1", 0)),,999^9))})

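To make the role of TRIM and REGEXREPLACE in the first formula concrete, here is a minimal Python sketch of that cleanup for a single ID. It assumes the QUERY(..., , 999^9) header trick joins the cells of one pivoted row with single spaces, blanks included; the helper names are made up for illustration and are not Sheets functions.

```python
import re


def join_like_query_header(cells):
    # Assumption: the QUERY(..., , 999^9) trick merges cells with one space each,
    # so blank pivot cells leave runs of extra spaces behind.
    return " ".join(cells)


def sheets_trim(text):
    # Mimics TRIM: collapse repeated spaces, drop leading and trailing ones.
    return re.sub(r" +", " ", text).strip(" ")


def drop_trailing_comma(text):
    # Mimics REGEXREPLACE(text, ",$", "").
    return re.sub(r",$", "", text)


# Pivoted row for ID 1001 from the sample data: one cell per distinct name,
# each value already suffixed with "," and blank where the name does not apply.
row_1001 = ["abc,", "", "def,", "", "", ""]

raw = join_like_query_header(row_1001)
cleaned = drop_trailing_comma(sheets_trim(raw))
print(repr(raw))
print(cleaned)
```

The second formula skips both cleanup steps, so under the same assumption its output keeps the extra spaces and the trailing comma.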
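【Discussion】:

A very clever answer! This is what I call pure brilliance. Thank you very much. I was sure that a Query without a select statement and with a huge number of headers would do the trick. I'm mostly confused by offset 1. Could you explain how the whole formula works?
@sifar The query pivots on column 3, so it outputs the column 3 values as header row 1, which is not needed at all. We therefore use "offset 1" to shift the whole query output down by one row, which removes the header row (a pivot leftover).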
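The pivot-then-offset step described in this reply can be sketched in Python on the sample data from the question. This is only an illustration under assumptions: pivot_max and offset are hypothetical helpers, the pivot headers are assumed to come out sorted, and the first header cell is modeled as an empty string.

```python
rows = [
    ("1001", "abc"), ("1001", "def"), ("1002", "kjg"),
    ("1003", "aof"), ("1003", "lmi"), ("1004", "xyz"),
]


def pivot_max(rows):
    # One output row per ID (group by Col1), one column per distinct name (pivot Col3).
    ids = sorted({rid for rid, _ in rows})
    names = sorted({name for _, name in rows})
    header = [""] + names  # the pivot values become a header row
    body = []
    for rid in ids:
        present = {name for r, name in rows if r == rid}
        # max(Col2) over a single (ID, name) pair is just the name itself.
        body.append([rid] + [name if name in present else "" for name in names])
    return [header] + body


def offset(table, n):
    # Mimics "offset n": skip the first n rows of the result.
    return table[n:]


table = pivot_max(rows)
without_header = offset(table, 1)          # drops the pivot header row
unique_ids = [row[0] for row in without_header]  # "select Col1"
print(unique_ids)
```

Reading the inner result with headers set to 0 is what lets the outer query treat the pivot header as an ordinary row that offset 1 can skip.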

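【Solution 2】:

I looked through the query specification and could not find a solution. So I put together some formulas to do the job (because I found the task interesting).

D2 contains =unique(a2:a)

E2 contains =join(", ",transpose(filter($B$2:$B,$A$2:$A=D2))), which is copied down.

I had to copy the formula down (it is far from a pretty formula). I hope it helps.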
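For readers who want to check the logic outside a sheet, the two-step idea (unique IDs first, then filter and join per ID) can be sketched in Python. The function name combine is hypothetical, and the sketch assumes the ID and Name columns have equal length with no blank rows.

```python
ids = ["1001", "1001", "1002", "1003", "1003", "1004"]
names = ["abc", "def", "kjg", "aof", "lmi", "xyz"]


def combine(ids, names):
    result = []
    # dict.fromkeys keeps first-appearance order, like UNIQUE on column A.
    for uid in dict.fromkeys(ids):
        # Like FILTER(B2:B, A2:A = D2): keep the names whose ID matches.
        matches = [name for rid, name in zip(ids, names) if rid == uid]
        # Like JOIN(", ", ...): no trailing comma to clean up afterwards.
        result.append((uid, ", ".join(matches)))
    return result


for uid, joined in combine(ids, names):
    print(uid, joined)
```

Because the join only ever sees the matching names for one ID, each joined string stays short, which is the main practical difference from the single-string approaches.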


【Discussion】:

【Solution 3】:

Here is an answer that uses QUERY.

=ARRAYFORMULA(REGEXREPLACE(TRIM(SPLIT(TRANSPOSE(SPLIT(
 CONCATENATE(TRANSPOSE(QUERY({"♦"&A2:A&"♠", B2:B&", "}, 
 "select max(Col2) where Col2 is not null group by Col2 pivot Col1", 0))), 
 "♦")), "♠")), ",$", ))

This comes directly from this question. player0's answer is an amazing formula that can restructure data in many ways.
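The marker idea behind this formula (tag each ID with ♦ and ♠, glue everything into one string, then split it apart again) can be sketched in Python. This is an approximation under assumptions: combine_with_markers is a hypothetical name, and sorting the rows stands in for the ordering the pivot is assumed to produce.

```python
import re

rows = [
    ("1001", "abc"), ("1001", "def"), ("1002", "kjg"),
    ("1003", "aof"), ("1003", "lmi"), ("1004", "xyz"),
]


def combine_with_markers(rows):
    grouped = {}
    for rid, name in sorted(rows):
        grouped.setdefault(rid, []).append(name + ", ")
    # Like CONCATENATE: one long string, each ID wrapped in the two markers.
    blob = "".join("♦" + rid + "♠" + "".join(parts) for rid, parts in grouped.items())
    result = []
    for chunk in blob.split("♦"):          # outer SPLIT: one chunk per ID
        if not chunk:
            continue                        # SPLIT ignores the empty leading piece
        rid, joined = chunk.split("♠")      # inner SPLIT: ID versus names
        joined = re.sub(r",$", "", joined.strip())  # TRIM, then drop trailing comma
        result.append((rid, joined))
    return blob, result


blob, result = combine_with_markers(rows)
print(blob)
print(result)
```

The intermediate blob holds every ID and every name at once, so its length grows with the whole dataset rather than with a single group.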

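【Discussion】:

I get: Text result of CONCATENATE is longer than the limit of 50000 characters.
Is that with the sample data or with your "production" sheet? How many rows of data are in columns A and B? Are the values in A or B large blocks of text? That would explain the error. The formula could be reworked to avoid this problem.
My production sheet contains 6K rows.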
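A rough way to see why about 6K rows can hit the limit is to estimate the length of the concatenated string. The sketch below assumes the string is built as in Solution 3 (two marker characters per distinct ID, plus ", " after every name); concatenated_length is a hypothetical helper, and the 6,000-row dataset is invented for illustration, not taken from the asker's sheet.

```python
LIMIT = 50_000  # the limit quoted in the CONCATENATE error message


def concatenated_length(rows):
    pairs = set(rows)  # duplicate (ID, name) pairs collapse into one pivot cell
    ids = {rid for rid, _ in pairs}
    marker_chars = sum(len(rid) + 2 for rid in ids)      # "♦" + ID + "♠"
    value_chars = sum(len(name) + 2 for _, name in pairs)  # name + ", "
    return marker_chars + value_chars


sample = [
    ("1001", "abc"), ("1001", "def"), ("1002", "kjg"),
    ("1003", "aof"), ("1003", "lmi"), ("1004", "xyz"),
]
print(concatenated_length(sample))

# Invented data: 6,000 rows, three names per 4-character ID, 8-character names.
big = [(str(1000 + i // 3), f"name{i:04d}") for i in range(6000)]
print(concatenated_length(big), concatenated_length(big) > LIMIT)
```

Under these assumptions, even short values exceed 50,000 characters well before 6,000 rows, which fits the error reported above.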

【Solution 4】:

If you can accept a trailing comma in the output, you can try:

=ARRAYFORMULA({QUERY(QUERY({A2:B, B2:B}, 
 "select Col1,max(Col3) 
  where Col1 is not null 
    and Col3 <> ',' 
  group by Col1 
  pivot Col2"),
 "select Col1 offset 1", 0), 
 TRANSPOSE(QUERY(TRANSPOSE(IFERROR(VLOOKUP(QUERY(QUERY({A2:B, B2:B}, 
 "select Col1,max(Col3) 
  where Col1 is not null 
    and Col3 <> ',' 
  group by Col1 
  pivot Col2"),
 "select Col1 offset 1", 0), 
 QUERY(QUERY({A2:B, B2:B&","}, 
 "select Col1,max(Col3) 
  where Col1 is not null 
    and Col3 <> ',' 
  group by Col1 
  pivot Col2"),
 "offset 1", 0), 
 SPLIT(TRANSPOSE(QUERY(TRANSPOSE(IF(QUERY(QUERY({A2:B, B2:B&","}, 
 "select max(Col3) 
  where Col1 is not null 
    and Col3 <> ',' 
  group by Col1    
  pivot Col2"),
 "offset 1", 0)="",,COLUMN(B2:XXX)&",")),,999^99)), ","), 0))),,999^99))})

(This has never been tested on a very large dataset, but in theory it should also be able to handle anything.)

【Discussion】:

Wow! Outstanding. I will check it with a larger dataset in the morning.
Could you help me with this?