【Posted】: 2018-05-17 07:58:58
【Question】:
I have a Hive table "Orders" with four columns (id string, name string, order string, ts string). Sample data from the table is shown below.
-------------------------------------------
id  name  order      ts
-------------------------------------------
1   abc   completed  2018-04-12 08:15:26
2   def   received   2018-04-15 06:20:17
3   ghi   processed  2018-04-16 11:36:56
4   jkl   received   2018-04-05 12:23:34
3   ghi   received   2018-03-23 16:43:46
1   abc   processed  2018-03-17 18:39:22
1   abc   received   2018-02-25 20:07:56
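The order column takes three states in sequence: received -> processed -> completed. A name can have many orders, and each order goes through these three stages. I need the latest order value for a given "id" and "name". This may look like a beginner question, but I am stuck on it.
I tried the queries below, but they do not work. I cannot use the max function directly on the "ts" column because it is stored as a string. Please suggest the best approach. Thanks in advance.
Queries I tried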
SELECT
ORDER
FROM Orders
WHERE id = '1'
AND name = 'ghi'
AND ts = (
SELECT max(unix_timestamp(ts, 'yyyy-MM-dd HH:mm:SS'))
FROM Orders
)
Error while compiling statement: FAILED: ParseException line 2:0 cannot identify input near 'select' 'max' '(' in expression specification
SELECT
ORDER
FROM Orders
WHERE id = '1'
AND name = 'ghi'
AND max(unix_timestamp(ts, 'yyyy-MM-dd HH:mm:SS'))
Error while compiling statement: FAILED: SemanticException [Error 10128]: Line 1:93 Not yet supported place for UDAF 'max'
select o.order from Orders o
inner join (
select id, name, order, max(ts) as ts
from Orders
group by id, name, order
) ord on o.id = ord.id and o.name = ord.name and o.ts = ord.ts where o.id = '1' and o.name = 'abc'
This query runs, but instead of a single latest order stage it returns every order stage, each with its own latest timestamp.
Please help.
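For reference, here is a minimal sketch of one possible approach, not a confirmed fix. It assumes every ts value is a zero-padded 'yyyy-MM-dd HH:mm:ss' string, as in the sample data; in that format string order matches chronological order, so ts can be sorted or passed to max() without conversion. It also assumes backtick quoting is enabled for the reserved word `order`. The last query above returns several rows because `order` is part of its GROUP BY, which produces one maximum per stage. The first attempt would not match anything even if it parsed, because it compares the string ts with the number returned by unix_timestamp. Separately, in Java date patterns lowercase 'ss' means seconds while 'SS' means fractions of a second.

```sql
-- Sketch only: relies on ts being a zero-padded 'yyyy-MM-dd HH:mm:ss' string,
-- so that sorting the strings sorts the timestamps.

-- Latest stage for one (id, name) pair: sort newest first and keep one row.
SELECT `order`
FROM Orders
WHERE id = '1'
  AND name = 'abc'
ORDER BY ts DESC
LIMIT 1;

-- Latest stage for every (id, name) pair, using a window function.
-- rn = 1 marks the newest row within each (id, name) partition.
SELECT id, name, `order`, ts
FROM (
  SELECT id, name, `order`, ts,
         row_number() OVER (PARTITION BY id, name ORDER BY ts DESC) AS rn
  FROM Orders
) ranked
WHERE rn = 1;
```

Against the sample data, the first query should return "completed" for id '1' and name 'abc', since 2018-04-12 08:15:26 is the newest of that pair's three rows.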
【Discussion】: