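【Posted】: 2021-10-11 06:12:40
【Question】:
Using the same sample data as below, how would I modify the Hive SQL query so that both the Mary and Peter accounts cover the same date range? For example, set the date range to '2021-05-24' through '2021-06-03' and populate a balance for every day in that period. Taking Mary as an example, I would expect her available balance of '53028.1' to be forward-filled through '2021-06-03', and, if Mary had no value on '2021-05-24', the balance to be filled with '50000'.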
with mytable as ( --Demo dataset, use your table instead of this
  select stack(10, --number of tuples
    'Peter',float(50000),'2021-05-24',
    'Peter',float(50035),'2021-05-25',
    'Peter',float(50035),'2021-05-26',
    'Peter',float(50610),'2021-05-28',
    'Peter',float(51710),'2021-06-01',
    'Peter',float(53028.1),'2021-06-02',
    'Peter',float(53916.1),'2021-06-03',
    'Mary',float(50000),'2021-05-24',
    'Mary',float(50035),'2021-05-25',
    'Mary',float(53028.1),'2021-05-30'
  ) as (account_name,available_balance,Date_of_balance)
) --use your table instead of this CTE
select account_name, available_balance, date_add(Date_of_balance,e.i) as Date_of_balance
from
( --Get next_date to generate date range
  select account_name,available_balance,Date_of_balance,
         lead(Date_of_balance,1, Date_of_balance) over (partition by account_name order by Date_of_balance) next_date
  from mytable d --use your table
) s lateral view outer posexplode(split(space(datediff(next_date,Date_of_balance)-1),'')) e as i,x --generate rows
order by account_name desc, Date_of_balance --this is to have order of rows like in your Converted Table
Result:
account_name available_balance date_of_balance
Peter 50000 2021-05-24
Peter 50035 2021-05-25
Peter 50035 2021-05-26
Peter 50035 2021-05-27
Peter 50610 2021-05-28
Peter 50610 2021-05-29
Peter 50610 2021-05-30
Peter 50610 2021-05-31
Peter 51710 2021-06-01
Peter 53028.1 2021-06-02
Peter 53916.1 2021-06-03
Mary 50000 2021-05-24
Mary 50035 2021-05-25
Mary 50035 2021-05-26
Mary 50035 2021-05-27
Mary 50035 2021-05-28
Mary 50035 2021-05-29
Mary 53028.1 2021-05-30
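One possible way to pin both accounts to the same fixed range is sketched here; it is untested against a real cluster and rests on a few assumptions. It builds a calendar of the requested days with the same `posexplode(split(space(...)))` trick used above, crosses it with the distinct accounts, left-joins the known balances, and carries the last known value forward with `last_value(..., true)`, which skips NULLs in Hive. The CTE names `calendar`, `accounts`, `grid`, and `filled` are illustrative, and the default of 50000 for days before an account's first record is taken from the example in the question. A balance recorded before '2021-05-24' would not be carried into the range by this sketch.

```sql
with mytable as ( -- demo dataset copied from the question
  select stack(10,
    'Peter',float(50000),'2021-05-24',
    'Peter',float(50035),'2021-05-25',
    'Peter',float(50035),'2021-05-26',
    'Peter',float(50610),'2021-05-28',
    'Peter',float(51710),'2021-06-01',
    'Peter',float(53028.1),'2021-06-02',
    'Peter',float(53916.1),'2021-06-03',
    'Mary',float(50000),'2021-05-24',
    'Mary',float(50035),'2021-05-25',
    'Mary',float(53028.1),'2021-05-30'
  ) as (account_name,available_balance,Date_of_balance)
),
calendar as ( -- one row per day from 2021-05-24 to 2021-06-03 inclusive
  select cast(date_add('2021-05-24', e.i) as string) as dt
  from (select 1 as dummy) t
  lateral view posexplode(split(space(datediff('2021-06-03','2021-05-24')),' ')) e as i, x
),
accounts as ( -- every account that should appear in the output
  select distinct account_name from mytable
),
grid as ( -- every (account, day) pair; may need cartesian products enabled in strict mode
  select a.account_name, c.dt
  from accounts a
  cross join calendar c
),
filled as ( -- attach known balances, then carry the last non-NULL value forward
  select g.account_name, g.dt,
         last_value(m.available_balance, true) over (
           partition by g.account_name
           order by g.dt
           rows between unbounded preceding and current row
         ) as last_known
  from grid g
  left join mytable m
    on m.account_name = g.account_name
   and m.Date_of_balance = g.dt
)
select account_name,
       coalesce(last_known, float(50000)) as available_balance, -- 50000 is the assumed default
       dt as Date_of_balance
from filled
order by account_name desc, Date_of_balance
```

With the demo data this should give one row per account per day in the range, with Mary's 53028.1 repeated from 2021-05-30 through 2021-06-03.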
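Note: @leftjoin helped me get this far in the linked question.
@leftjoin
I have a very large table, and I need the balance for each of the past 90 days. There are more than 1 million accounts, the balance table is huge, and a balance record is written only when an account's balance changes. Some accounts may have had no balance record for more than a year, so the code proposed by @leftjoin will not work correctly for them.
I have two tables: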
**Accounts lookup table:**
account_name, observation_date
'Peter','2021-05-24'
'Luis','2021-03-21'
**Balance table:**
account_name,account_balance,balance_date
'Peter',50000,'2020-03-20'
'Peter',50035,'2021-04-27'
'Peter',43821,'2021-05-21'
'Peter',50610,'2021-05-22'
'Mary',51710,'2019-03-20'
'Mary',53028.1,'2021-04-27'
'Mary',53916.1,'2021-05-21'
'Mary',54632.76,'2021-05-22'
'Roger',55147.76,'2021-03-03'
'Roger',55293.96,'2021-02-03'
'Roger',57142.15,'2021-03-04'
'Roger',67834.15,'2021-04-01'
The Hive SQL query I am looking for would join these two tables and produce a result similar to the one below:
account_name,account_balance,balance_date
Peter,50000,2020-03-20
Peter,50000,2021-02-24
Peter,50000,2021-02-25
Peter,…,…
Peter,50035,2021-04-27
Peter,50035,2021-04-28
Peter,50035,2021-04-29
Peter,…,…
Peter,43821,2021-05-21
Peter,50610,2021-05-22
Peter,43821,2021-05-23
Peter,43821,2021-05-24
Roger,55147.76,05/01/2021
Roger,55147.76,06/01/2021
Roger,55147.76,07/01/2021
Roger,…,…
Roger,55293.96,2021-02-03
Roger,57142.15,2021-02-04
Roger,57142.15,2021-02-05
Roger,…,…
Roger,67834.15,2021-04-01
Roger,67834.15,2021-04-02
Roger,67834.15,2021-04-03
Roger,67834.15,2021-04-04
Roger,67834.15,2021-04-05
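I know we could fetch all balances from the very beginning and then apply the lead function, but in a large-scale environment, with millions of accounts queried every day, that will not work well.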
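As a starting point, here is a rough, untested sketch of one way to limit the generated rows to the 90-day window ending at each account's `observation_date`. It treats each balance record as valid from its `balance_date` until the day before the next record, keeps only the intervals that overlap the window, clamps them to the window, and explodes just the clamped days. The CTE names `ranges` and `clamped` are illustrative, the demo data contains only Peter's rows from the tables above, and dates are assumed to be 'yyyy-MM-dd' strings with at most one record per account per day. It still runs `lead` over every balance row of the joined accounts, so whether it is acceptable at the scale described would need to be measured.

```sql
with accounts as ( -- demo lookup table from the question
  select stack(2,
    'Peter','2021-05-24',
    'Luis','2021-03-21'
  ) as (account_name, observation_date)
),
balances as ( -- Peter's rows from the balance table in the question
  select stack(4,
    'Peter',50000,'2020-03-20',
    'Peter',50035,'2021-04-27',
    'Peter',43821,'2021-05-21',
    'Peter',50610,'2021-05-22'
  ) as (account_name, account_balance, balance_date)
),
ranges as ( -- each record is valid until the next record for the same account
  select b.account_name, b.account_balance, b.balance_date,
         a.observation_date,
         cast(date_sub(a.observation_date, 89) as string) as window_start, -- 90 days inclusive
         lead(b.balance_date) over (partition by b.account_name order by b.balance_date) as next_date
  from balances b
  join accounts a
    on a.account_name = b.account_name
),
clamped as ( -- keep only intervals that overlap the window and trim them to it
  select account_name, account_balance,
         greatest(balance_date, window_start) as range_start,
         least(coalesce(cast(date_sub(next_date, 1) as string), observation_date),
               observation_date) as range_end
  from ranges
  where balance_date <= observation_date
    and (next_date is null or next_date > window_start)
)
select account_name, account_balance,
       cast(date_add(range_start, e.i) as string) as balance_date
from clamped
lateral view posexplode(split(space(datediff(range_end, range_start)),' ')) e as i, x -- one row per day
order by account_name, balance_date
```

For Peter, the record from 2020-03-20 is older than the window, so its balance is carried in starting at the window start of 2021-02-24 rather than being dropped. Accounts with no balance record at all, such as Luis in the demo data, produce no rows.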
【Discussion】:
Tags: sql date hive hiveql date-range