Hive - Issue with the hive sub query (answers)

【Question title】: Hive - Issue with the hive sub query
【Posted】: 2019-07-08 08:06:38
【Question description】:

My problem statement is:

"Find the top 2 districts per state with the highest population."

The data looks like this:

Input

My expected output is:

output

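I tried many queries and sub queries, but each one fails with a SQL error in the sub query.

Can anyone help me get this result?

Thanks in advance.

Queries I have tried: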

select state_name, (select concat_ws(',', collect_set(dist_name as string)) from population where state_name = state_name group by state order by population desc 2)
from population group by state_name

select
state_name, concat_ws(',', collect_set(cast(dist_name as string)))
from population where population.dist_name in (select dist_name from ( select dist_name , max(b.population) as total from population b where state_name = b.state_name group by b.dist_name , b.dist_name order by total desc limit 2) as dist_name ) group by state_name

【Question comments】:

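It is better to post the data here as text rather than as images. You also need to show us what you have done so far, including the queries you tried.
Can you post the queries you have tried so far?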

Tags: sql database hive hiveql hue

【Solution 1】:

Here is the query:

select A.state, collect_set(A.dist)[0], collect_set(A.dist)[1]
from (
  select state, dist,
         row_number() over (partition by state order by population desc) as rnk
  from <tableName>
) A
where A.rnk <= 2
group by A.state;

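Below is the result on sample data (here a parent/child table named hier), showing how indexing into collect_set behaves: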

hive> select * from hier;
OK
C1      C11
C11     C12
C12     123
P1      C1
P2      C2

hive> select parent, collect_set(child)[0], collect_set(child)[1] from hier group by parent;
OK
C1      C11     NULL
C11     C12     NULL
C12     123     NULL
P1      C1      NULL
P2      C2      NULL
Time taken: 19.212 seconds, Fetched: 5 row(s)
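
One caveat with the query in this answer: as far as I know, Hive does not guarantee the order of elements returned by collect_set, so which district ends up at index [0] and which at [1] may vary. A minimal sketch of an alternative that picks each column by its rank instead is shown below. It assumes the table is called population and has the columns state_name, dist_name and population, which are the names used in the question's attempted queries; the aliases ranked, top_dist_1 and top_dist_2 are made up for illustration.

```sql
-- Sketch only: table and column names are taken from the question's
-- attempted queries; the aliases are hypothetical.
SELECT ranked.state_name,
       -- Exactly one row per state has rnk = 1, so MAX just selects it.
       MAX(CASE WHEN ranked.rnk = 1 THEN ranked.dist_name END) AS top_dist_1,
       MAX(CASE WHEN ranked.rnk = 2 THEN ranked.dist_name END) AS top_dist_2
FROM (
    SELECT state_name,
           dist_name,
           -- Number the districts within each state, most populous first.
           ROW_NUMBER() OVER (PARTITION BY state_name
                              ORDER BY population DESC) AS rnk
    FROM population
) ranked
WHERE ranked.rnk <= 2
GROUP BY ranked.state_name;
```

A state with only one district gets NULL in top_dist_2. ROW_NUMBER breaks population ties arbitrarily; if tied districts should all be considered, RANK or DENSE_RANK may be a better fit, at the cost of possibly more than two rows per state.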

【Comments】:

I am getting this error: "Error while compiling statement: FAILED: ParseException line 4:0 cannot recognize input near 'where' 'rnk' '
Please use the updated query; I had initially missed the alias.