[Posted]: 2019-05-28 22:19:35
[Question]:
I have a user log table, user_action_log, partitioned by action_date. It holds billions of rows and has the columns
user_id, action_name, action_date
Sample data:
+---------+-------------+-------------+
| user_id | action_name | action_date |
+---------+-------------+-------------+
| 123 | login | 2018-01-30 |
| 123 | logout | 2018-01-31 |
| 123 | click | 2018-02-28 |
| 123 | comment | 2018-02-15 |
| 123 | post | 2018-03-15 |
+---------+-------------+-------------+
I want to write ETL/SQL that transforms this data into something like the following (table name: user_action_record).
user_id (primary key), first_action_date, last_action_date, previous_last_action_date
Sample output data:
+---------+-------------------+------------------+---------------------------+
| user_id | first_action_date | last_action_date | previous_last_action_date |
+---------+-------------------+------------------+---------------------------+
| 123 | 2018-01-30 | 2018-03-15 | 2018-02-28 |
+---------+-------------------+------------------+---------------------------+
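I tried splitting the problem into two steps:
- Insert new users that do not yet exist in user_action_record.
- Update existing users: copy the current value of last_action_date into previous_last_action_date, then set last_action_date from the user_action_log table.
The catch is that user_action_log is partitioned on action_date, so I query the table one day at a time (action_date = CURRENT_DATE).
Given that, can anyone help me with the SQL to populate my target table?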
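The two steps above could be folded into one daily statement. The following is only a sketch for MySQL 5.6, assuming user_id is the primary key of user_action_record (as stated in the question) and that the job runs once per day against the current partition. It relies on MySQL evaluating the ON DUPLICATE KEY UPDATE assignments from left to right, so previous_last_action_date has to be assigned before last_action_date is overwritten.

```sql
-- Daily load: reads only the partition for CURRENT_DATE.
-- New users are inserted with all three dates set to today.
-- Existing users keep first_action_date; the old last_action_date
-- is shifted into previous_last_action_date before being replaced.
INSERT INTO user_action_record
    (user_id, first_action_date, last_action_date, previous_last_action_date)
SELECT DISTINCT
    l.user_id,
    l.action_date,
    l.action_date,
    l.action_date
FROM user_action_log AS l
WHERE l.action_date = CURRENT_DATE
ON DUPLICATE KEY UPDATE
    -- Must come first: last_action_date here is still the stored value.
    previous_last_action_date =
        IF(VALUES(last_action_date) > last_action_date,
           last_action_date,
           previous_last_action_date),
    -- GREATEST keeps the statement safe to re-run for the same day.
    last_action_date = GREATEST(last_action_date, VALUES(last_action_date));
```

Because the shift only happens when the incoming date is strictly later than the stored one, running the statement twice on the same day should leave the row unchanged.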
-- Edit: additional information below
- Source and expected target table for day "2018-01-30"
+---------+-------------+-------------+
| user_id | action_name | action_date |
+---------+-------------+-------------+
| 123 | login | 2018-01-30 |
| 123 | logout | 2018-01-30 |
| 123 | click | 2018-01-30 |
+---------+-------------+-------------+
+---------+-------------------+------------------+---------------------------+
| user_id | first_action_date | last_action_date | previous_last_action_date |
+---------+-------------------+------------------+---------------------------+
| 123 | 2018-01-30 | 2018-01-30 | 2018-01-30 |
+---------+-------------------+------------------+---------------------------+
- Source and expected target table for day "2018-01-31"
+---------+-------------+-------------+
| user_id | action_name | action_date |
+---------+-------------+-------------+
| 123 | login | 2018-01-30 |
| 123 | logout | 2018-01-30 |
| 123 | click | 2018-01-30 |
| 123 | login | 2018-01-31 |
| 123 | logout | 2018-01-31 |
+---------+-------------+-------------+
+---------+-------------------+------------------+---------------------------+
| user_id | first_action_date | last_action_date | previous_last_action_date |
+---------+-------------------+------------------+---------------------------+
| 123 | 2018-01-30 | 2018-01-31 | 2018-01-30 |
+---------+-------------------+------------------+---------------------------+
- Source and expected target table for day "2018-02-15"
+---------+-------------+-------------+
| user_id | action_name | action_date |
+---------+-------------+-------------+
| 123 | login | 2018-01-30 |
| 123 | logout | 2018-01-30 |
| 123 | click | 2018-01-30 |
| 123 | login | 2018-01-31 |
| 123 | logout | 2018-01-31 |
| 123 | logout | 2018-02-15 |
| 123 | logout | 2018-02-15 |
+---------+-------------+-------------+
+---------+-------------------+------------------+---------------------------+
| user_id | first_action_date | last_action_date | previous_last_action_date |
+---------+-------------------+------------------+---------------------------+
| 123 | 2018-01-30 | 2018-02-15 | 2018-01-31 |
+---------+-------------------+------------------+---------------------------+
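For comparison, the expected target rows shown above can also be derived in one pass over the whole log, without window functions. This is a sketch for MySQL 5.6 that scans every partition, so on billions of rows it is probably only reasonable as a one-off backfill, not as the daily job. The COALESCE reproduces the "2018-01-30" case, where a user with a single action date gets that same date as previous_last_action_date.

```sql
-- Full rebuild: one row per user with first, last, and second-to-last action date.
SELECT
    a.user_id,
    a.first_action_date,
    a.last_action_date,
    -- Latest date strictly before the last one; falls back to the last date
    -- when the user has been active on only one day.
    COALESCE(MAX(l.action_date), a.last_action_date) AS previous_last_action_date
FROM (
    SELECT user_id,
           MIN(action_date) AS first_action_date,
           MAX(action_date) AS last_action_date
    FROM user_action_log
    GROUP BY user_id
) AS a
LEFT JOIN user_action_log AS l
       ON l.user_id = a.user_id
      AND l.action_date < a.last_action_date
GROUP BY a.user_id, a.first_action_date, a.last_action_date;
```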
[Discussion]:
-
Which version of MySQL are you using?
-
@GordonLinoff 5.6
-
This would be much easier in MySQL 8 with DENSE_RANK()
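To illustrate the comment above, here is one way the DENSE_RANK() approach might look. It is a sketch that needs MySQL 8.0 or later, so it does not apply to the 5.6 server mentioned in the thread, and it reads the whole log instead of a single day.

```sql
-- Rank each user's distinct action dates from newest (1) to oldest.
-- Rank 2 is the date just before the last action date.
SELECT
    user_id,
    MIN(action_date) AS first_action_date,
    MAX(action_date) AS last_action_date,
    -- Fall back to the last date when the user has only one distinct date.
    COALESCE(MAX(CASE WHEN rnk = 2 THEN action_date END),
             MAX(action_date)) AS previous_last_action_date
FROM (
    SELECT user_id,
           action_date,
           DENSE_RANK() OVER (PARTITION BY user_id
                              ORDER BY action_date DESC) AS rnk
    FROM user_action_log
) AS ranked
GROUP BY user_id;
```

DENSE_RANK() fits better than ROW_NUMBER() here because several log rows can share the same action_date, and they should all receive the same rank.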