【Question title】: How to PARTITION BY to show the same value for all rows?
【Posted】: 2021-06-03 12:24:27
【Question】:

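I have a list of all consumer purchases, and some consumers made multiple purchases over a period of time. I want to populate a column with the location of each consumer's first purchase, but I get this error: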

Error in SQL statement: ParseException: 
mismatched input '(' expecting <EOF>(line 2, pos 25)

== SQL ==
SELECT consumer_id
       ,location OVER(partition BY table.consumer_id) AS first_purchase_site
---------------------^^^
FROM table

For clarity, here is my query:

SELECT consumer_id
       ,location OVER(partition BY table.consumer_id) AS first_purchase_site
FROM table
WHERE consumer_purchase_order_sequence = 1

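【Comments】:

  • You should use a window function: spark.apache.org/docs/latest/…
  • PARTITION BY is used together with an aggregate (or other window) function; you can't write location OVER() on its own. Use this instead: SELECT consumer_id, location FROM (SELECT a.*, ROW_NUMBER() OVER (PARTITION BY a.consumer_id ORDER BY a.consumer_purchase_order_sequence) AS rn FROM table a) rs WHERE rs.rn = 1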
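A minimal sketch of the ROW_NUMBER() approach from the comment above, run on SQLite through Python's standard sqlite3 module so it can be checked locally (assuming the linked SQLite is 3.25 or newer, the first version with window functions). The table name purchases and the sample rows are hypothetical stand-ins for the question's table. The sketch orders each partition by consumer_purchase_order_sequence: Spark rejects ROW_NUMBER() when the window has no ORDER BY, and the ordering is what decides which row counts as the first purchase.

```python
import sqlite3

# Hypothetical sample purchases: (consumer_id, location, consumer_purchase_order_sequence).
# Rows are deliberately inserted out of order.
ROWS = [
    (1, "Paris", 2),
    (1, "Berlin", 1),
    (1, "Rome", 3),
    (2, "Lima", 2),
    (2, "Oslo", 1),
    (3, "Kyiv", 1),
]


def first_purchase_sites(rows):
    """Return {consumer_id: location of first purchase} using ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (consumer_id INTEGER, location TEXT, "
        "consumer_purchase_order_sequence INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    # Number each consumer's purchases 1, 2, 3, ... in purchase order,
    # then keep only the row numbered 1.
    query = """
        SELECT consumer_id, location
        FROM (
            SELECT a.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY a.consumer_id
                       ORDER BY a.consumer_purchase_order_sequence
                   ) AS rn
            FROM purchases a
        ) rs
        WHERE rs.rn = 1
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


print(first_purchase_sites(ROWS))
```

This returns one row per consumer; it does not repeat the value on every purchase row.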

Tags: sql apache-spark hive alias partition-by


【Answer 1】:

I want to populate a column with the location of each consumer's first purchase

Are you looking for first_value()?

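SELECT consumer_id,
       -- ORDER BY makes "first" well defined; without it the value picked per partition is arbitrary
       FIRST_VALUE(location) OVER (PARTITION BY consumer_id
                                   ORDER BY consumer_purchase_order_sequence) AS first_purchase_site
FROM table;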

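Your window expression is wrong because it is missing the function: OVER (...) has to follow a function such as FIRST_VALUE(location), not a bare column.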
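To check that FIRST_VALUE() fills the same value into every row of a consumer, here is a minimal sketch using Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The purchases table and its rows are hypothetical. It keeps an ORDER BY inside OVER() so the first value is well defined, and it leaves out the question's WHERE consumer_purchase_order_sequence = 1 filter: WHERE is applied before window functions, so that filter would discard every later purchase before the window ever sees it.

```python
import sqlite3

# Hypothetical sample purchases: (consumer_id, location, consumer_purchase_order_sequence).
ROWS = [
    (1, "Paris", 2),
    (1, "Berlin", 1),
    (1, "Rome", 3),
    (2, "Lima", 2),
    (2, "Oslo", 1),
    (3, "Kyiv", 1),
]


def with_first_purchase_site(rows):
    """Return every purchase row plus the consumer's first purchase location."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (consumer_id INTEGER, location TEXT, "
        "consumer_purchase_order_sequence INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    # FIRST_VALUE looks at the earliest row of each consumer's ordered partition,
    # so every row of that consumer gets the same first_purchase_site.
    query = """
        SELECT consumer_id,
               location,
               FIRST_VALUE(location) OVER (
                   PARTITION BY consumer_id
                   ORDER BY consumer_purchase_order_sequence
               ) AS first_purchase_site
        FROM purchases
        ORDER BY consumer_id, consumer_purchase_order_sequence
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in with_first_purchase_site(ROWS):
    print(row)
```

Each purchase keeps its own location, while first_purchase_site repeats the location of that consumer's sequence-1 purchase.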

【Comments】:

    【Answer 2】:

    You need the window function FIRST_VALUE():

    SELECT DISTINCT consumer_id,
           FIRST_VALUE(location) OVER(PARTITION BY consumer_id ORDER BY consumer_purchase_order_sequence) AS first_purchase_site
    FROM table
    

    Change consumer_purchase_order_sequence to whichever column orders the purchases.
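A minimal sketch of this SELECT DISTINCT variant, again on SQLite through Python's sqlite3 module (assuming SQLite 3.25+) with a hypothetical purchases table and sample rows. The window function still produces one value per purchase row; DISTINCT then collapses the identical (consumer_id, first_purchase_site) pairs into one row per consumer. The distinct flag is a hypothetical parameter added only to compare both forms.

```python
import sqlite3

# Hypothetical sample purchases: (consumer_id, location, consumer_purchase_order_sequence).
ROWS = [
    (1, "Paris", 2),
    (1, "Berlin", 1),
    (1, "Rome", 3),
    (2, "Lima", 2),
    (2, "Oslo", 1),
    (3, "Kyiv", 1),
]


def first_sites(rows, distinct=True):
    """Run the FIRST_VALUE() query, with or without SELECT DISTINCT."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (consumer_id INTEGER, location TEXT, "
        "consumer_purchase_order_sequence INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    keyword = "DISTINCT " if distinct else ""
    # The window runs first (one value per purchase row); DISTINCT, if present,
    # removes the duplicate pairs afterwards.
    query = f"""
        SELECT {keyword}consumer_id,
               FIRST_VALUE(location) OVER (
                   PARTITION BY consumer_id
                   ORDER BY consumer_purchase_order_sequence
               ) AS first_purchase_site
        FROM purchases
        ORDER BY consumer_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(first_sites(ROWS))               # one row per consumer
print(len(first_sites(ROWS, False)))   # one row per purchase
```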

    【Comments】:

      【Answer 3】:

      Instead of a window calculation, you can also do this with a join:

      SELECT
        table.consumer_id,
        table.location,
        a.first_purchase_site
      FROM table
      LEFT JOIN (
        SELECT consumer_id, location AS first_purchase_site
        FROM table
        WHERE consumer_purchase_order_sequence = 1
      ) a ON a.consumer_id = table.consumer_id
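A minimal sketch of the join approach on SQLite through Python's sqlite3 module (assuming SQLite 3.25+ is not even needed here, since no window function is used). The purchases table, the alias p (used because SQLite does not accept the unquoted name table), and the sample rows are hypothetical; consumer 4 deliberately has no purchase with sequence 1.

```python
import sqlite3

# Hypothetical sample purchases: (consumer_id, location, consumer_purchase_order_sequence).
# Consumer 4 has no row with sequence 1, to show how the LEFT JOIN behaves then.
ROWS = [
    (1, "Paris", 2),
    (1, "Berlin", 1),
    (1, "Rome", 3),
    (2, "Lima", 2),
    (2, "Oslo", 1),
    (3, "Kyiv", 1),
    (4, "Cairo", 2),
]


def join_first_purchase_site(rows):
    """Attach each consumer's sequence-1 location to all of their rows via LEFT JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (consumer_id INTEGER, location TEXT, "
        "consumer_purchase_order_sequence INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    query = """
        SELECT p.consumer_id, p.location, a.first_purchase_site
        FROM purchases p
        LEFT JOIN (
            SELECT consumer_id, location AS first_purchase_site
            FROM purchases
            WHERE consumer_purchase_order_sequence = 1
        ) a ON a.consumer_id = p.consumer_id
        ORDER BY p.consumer_id, p.consumer_purchase_order_sequence
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in join_first_purchase_site(ROWS):
    print(row)
```

For consumers 1 to 3 the join matches the FIRST_VALUE() result. Consumer 4 gets NULL (None in Python), because the join relies on a row with consumer_purchase_order_sequence = 1 existing, whereas FIRST_VALUE() with ORDER BY returns the earliest purchase it finds. If a consumer had two rows with sequence 1, the join would also duplicate that consumer's rows.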
      

      【Comments】:
