【Question title】: NULLS LAST function for Hive
【Posted】: 2022-01-07 06:12:12
【Question】:

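I have the following algorithm for selecting records. Applied to the example below, it should select the records shown in the result.

  1. If "issuedate" is empty, take the "publid" that has more "inn" values.

  2. If the "issuedate" values are not all equal, take the records whose "issuedate" is the latest date.

  3. If the "issuedate" values are all equal, take the records whose "operdate" is the latest date.

  4. If both "issuedate" and "operdate" are equal, take the "publid" that has more "inn" values.

I wrote the code in Oracle and want to run it in Hive, but it fails with an error. I think the cause is NULLS LAST. Please tell me how to replace NULLS LAST in my code with something that works in Hive.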

Example

| inn | publid | clusterid | issuedate | operdate |
|-----|--------|-----------|-----------|----------|
| 333 |   1    |    12     |  01-01-21 | 05-01-21 |
| 222 |   1    |    12     |  01-01-21 | 05-01-21 |
| 333 |   2    |    12     |  01-01-21 | 05-01-21 | 
| 222 |   2    |    12     |  01-01-21 | 05-01-21 |
| 111 |   2    |    12     |  01-01-21 | 05-01-21 |
|-----|--------|-----------|-----------|----------|
| 123 |   1    |     1     |  01-01-21 |          |
| 456 |   1    |     1     |  01-01-21 |          |
| 123 |   2    |     1     |  03-01-21 |          |
| 456 |   2    |     1     |  03-01-21 |          | 
| 789 |   2    |     1     |  03-01-21 |          |
| 123 |   3    |     1     |  02-01-21 |          |
| 456 |   3    |     1     |  02-01-21 |          |
|-----|--------|-----------|-----------|----------|
| 123 |   1    |     1     |           | 01-01-21 |
| 456 |   1    |     1     |           | 01-01-21 |
| 123 |   2    |     1     |           | 03-01-21 |
| 456 |   2    |     1     |           | 03-01-21 | 
| 789 |   2    |     1     |           | 03-01-21 |
| 123 |   3    |     1     |           | 02-01-21 |
| 456 |   3    |     1     |           | 02-01-21 |

Result

| inn | publid | clusterid | issuedate | operdate |
|-----|--------|-----------|-----------|----------|
| 333 |   2    |    12     |  01-01-21 | 05-01-21 |
| 222 |   2    |    12     |  01-01-21 | 05-01-21 |
| 111 |   2    |    12     |  01-01-21 | 05-01-21 |
|-----|--------|-----------|-----------|----------|
| 123 |   2    |     1     |  03-01-21 |          |
| 456 |   2    |     1     |  03-01-21 |          |
| 789 |   2    |     1     |  03-01-21 |          |
|-----|--------|-----------|-----------|----------|
| 123 |   2    |     1     |           | 03-01-21 |
| 456 |   2    |     1     |           | 03-01-21 |
| 789 |   2    |     1     |           | 03-01-21 |

    SELECT inn,
           publid,
           clusterid,
           issuedate,
           operdate
    FROM   (
      SELECT inn,
             publid,
             clusterid,
             issuedate,
             operdate,
             DENSE_RANK() OVER (
               PARTITION BY clusterid
               ORDER     BY COALESCE( issuedate, operdate ) DESC NULLS LAST,
                            cnt DESC
             ) AS rnk
      FROM   (
        SELECT t.*,
               COUNT(inn) OVER (PARTITION BY publid) cnt
        FROM   table_name t
        WHERE  clusterid is not null
      )
    )
    WHERE  rnk = 1;

【Comments】:

    Tags: sql hive null


    【Solution 1】:

    Just add one more expression to the ORDER BY.

    Replace this:

    ORDER BY COALESCE( issuedate, operdate ) DESC NULLS LAST
    

    With this:

    ORDER BY CASE WHEN COALESCE(issuedate, operdate) is NOT NULL THEN 1 ELSE 2 END, --acts as NULLS LAST
             COALESCE( issuedate, operdate ) DESC
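
    Put back into the question's query, the replacement might look like the sketch below. It has not been run against a Hive cluster. The subquery aliases `counted` and `ranked` are names made up here, added on the assumption that the Hive version in use requires an alias on every subquery in FROM (Oracle does not), which may be a second source of the error. `cnt DESC` is kept from the original query as the final tie-breaker.

    ```sql
    -- Sketch only: the question's query with NULLS LAST emulated by a CASE expression.
    SELECT inn,
           publid,
           clusterid,
           issuedate,
           operdate
    FROM (
      SELECT inn,
             publid,
             clusterid,
             issuedate,
             operdate,
             DENSE_RANK() OVER (
               PARTITION BY clusterid
               ORDER BY CASE WHEN COALESCE(issuedate, operdate) IS NOT NULL
                             THEN 1 ELSE 2 END,           -- rows with a date sort first
                        COALESCE(issuedate, operdate) DESC, -- latest date first
                        cnt DESC                            -- then the publid with more inn values
             ) AS rnk
      FROM (
        SELECT t.*,
               COUNT(inn) OVER (PARTITION BY publid) AS cnt
        FROM   table_name t
        WHERE  clusterid IS NOT NULL
      ) counted   -- hypothetical alias (assumed to be required by Hive)
    ) ranked      -- hypothetical alias (assumed to be required by Hive)
    WHERE rnk = 1;
    ```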
    

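    Also, according to the Jira issue HIVE-12994, NULLS FIRST is currently the default for ASC ordering and NULLS LAST is the default for DESC ordering, so you can probably just drop NULLS LAST and rely on the DESC default. This needs to be double-checked.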
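
    One way to do that double-check is a throwaway query such as the sketch below, which sorts two dates and a NULL in descending order. The inline rows and the alias `s` are invented for illustration, and it assumes a Hive version that allows SELECT without a FROM clause.

    ```sql
    -- Sketch only: observe where NULL lands under ORDER BY ... DESC on your Hive version.
    SELECT d
    FROM (
      SELECT CAST('2021-01-03' AS DATE) AS d
      UNION ALL
      SELECT CAST('2021-01-01' AS DATE) AS d
      UNION ALL
      SELECT CAST(NULL AS DATE) AS d
    ) s            -- hypothetical alias
    ORDER BY d DESC;
    ```

    If the NULL row comes back last, DESC is already behaving as NULLS LAST on that version and the explicit clause can simply be removed from the window's ORDER BY.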
