SQL Hadoop Hive vs Toad for DB2 - Multiple Distinct Counts

【Question title】: SQL Hadoop Hive vs Toad for DB2 - Multiple Distinct Counts
【Posted】: 2020-12-17 04:16:28
【Question】:

I am trying to build a query for Toad for DB2, but the following does not work:

select count(distinct t.column1, t.column2)
from schema.table t
;

The same query runs fine in Hadoop Hive. Any suggestions for rewriting it so that it also works in Toad?

【Comments】:

Tags: sql hive db2 toad

【Solution 1】:

Emulating the behavior is a bit tricky. The safest method is probably:

select sum(case when seqnum = 1 and column1 is not null and column2 is not null then 1 else 0 end)
from (select t.*,
             row_number() over (partition by column1, column2 order by column1) as seqnum
      from t
     ) t

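(The order by column does not matter. Many databases require one, so I usually include it.)

This version should work in any database that supports window functions, not just DB2.

The issue is that Hive does not count a row if any of the values is NULL, and the query above takes that into account.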
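The NULL handling can be checked on a small table. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for DB2, and the table t and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for DB2
# (assumption: SQLite >= 3.25, which added window functions).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (column1 text, column2 text)")

# Hypothetical sample rows: one duplicated pair and two rows containing NULL.
rows = [
    ("a", "x"),
    ("a", "x"),    # duplicate of the first pair
    ("a", "y"),
    ("b", None),   # not counted: column2 is NULL
    (None, None),  # not counted: both columns are NULL
]
conn.executemany("insert into t values (?, ?)", rows)

# Same query as in the answer: count the first row of each
# (column1, column2) partition, but only when neither value is NULL.
query = """
select sum(case when seqnum = 1
                 and column1 is not null
                 and column2 is not null
                then 1 else 0 end)
from (select t.*,
             row_number() over (partition by column1, column2
                                order by column1) as seqnum
      from t
     ) t
"""
distinct_pairs = conn.execute(query).fetchone()[0]
print(distinct_pairs)  # 2: ('a', 'x') and ('a', 'y')
```

The two pairs that contain a NULL each form their own partition, but the case expression leaves them out of the sum, which matches the Hive behavior described above.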

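Using select distinct in a subquery is close, but it counts NULL values, and that change might not work with other columns in the query.

Concatenating the columns together is closer. However, you run into problems when the values overlap, such as "12"/"3" and "1"/"23".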
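The overlap problem is easy to reproduce. The sketch below again uses Python's sqlite3 module as a stand-in database, with a made-up table t; SQLite's `||` operator plays the role of concat here. The `'|'` separator is an assumption for illustration and only helps if that character never appears in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (column1 text, column2 text)")

# Two different pairs whose plain concatenation is identical ('123').
conn.executemany("insert into t values (?, ?)", [("12", "3"), ("1", "23")])

# Plain concatenation collapses the two pairs into one value.
plain = conn.execute(
    "select count(distinct column1 || column2) from t"
).fetchone()[0]

# A separator (assumed never to occur in the data) keeps them apart:
# '12|3' versus '1|23'.
separated = conn.execute(
    "select count(distinct column1 || '|' || column2) from t"
).fetchone()[0]

print(plain, separated)  # 1 2
```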

【Comments】:

【Solution 2】:

Try concatenating them:

select count(distinct concat(t.column1, t.column2))
from schema.table t

【Comments】:

【Solution 3】:

You can use a subquery:

select count(1)
from (select distinct t.column1, t.column2 from schema.table t) as t1
;
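As Solution 1 points out, a select distinct subquery also counts pairs that contain NULL, which Hive's multi-column count(distinct ...) does not. One possible adjustment, sketched here with Python's sqlite3 module as a stand-in database and a made-up table t, is to filter out those rows inside the subquery. Whether that filter is acceptable depends on what else the real query selects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (column1 text, column2 text)")

# Hypothetical sample rows: one duplicated pair and two rows containing NULL.
conn.executemany(
    "insert into t values (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", None), (None, None)],
)

# Subquery as in the answer: pairs containing NULL are counted too.
unfiltered = conn.execute(
    "select count(1) from (select distinct column1, column2 from t) as t1"
).fetchone()[0]

# Variant with a NULL filter, so only fully non-NULL pairs are counted.
filtered = conn.execute(
    """
    select count(1)
    from (select distinct column1, column2
          from t
          where column1 is not null
            and column2 is not null) as t1
    """
).fetchone()[0]

print(unfiltered, filtered)  # 4 2
```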

【Comments】: