【Posted】: 2022-11-26 22:10:45
【Question】:
I am reading a Parquet file with pandas:
import pandas as pd
df = pd.read_parquet('myfile.parquet', engine='pyarrow')
The file has the following structure:
| | company_id | user_id | attribute_name | attribute_value | timestamp |
|---|---|---|---|---|---|
| 1 | 116664 | 111f07000612 | first_name | Tom | 2022-03-23 17:11:58 |
| 2 | 116664 | 111f07000612 | last_name | Cruise | 2022-03-23 17:11:58 |
| 3 | 116664 | 111f07000612 | city | New York | 2022-03-23 17:11:58 |
| 4 | 116664 | abcf0700d009d122 | first_name | Matt | 2022-02-23 10:11:59 |
| 5 | 116664 | abcf0700d009d122 | last_name | Damon | 2022-02-23 10:11:59 |
I want to group by user_id and produce a list of objects in the following format (to be stored as JSON):
[
  {
    "user_id": "111f07000612",
    "first_name": "Tom",
    "last_name": "Cruise",
    "city": "New York"
  },
  {
    "user_id": "abcf0700d009d122",
    "first_name": "Matt",
    "last_name": "Damon"
  }
]
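For reference, here is a minimal sketch of one way this reshaping could be done. It is not a confirmed solution: the DataFrame below is a hypothetical stand-in built from the sample rows above instead of reading 'myfile.parquet', and the names `records` and `result_json` are made up for illustration. It assumes that a later row should overwrite an earlier one if a user has the same attribute_name more than once.

```python
import json

import pandas as pd

# Hypothetical stand-in for the Parquet data, built from the sample rows above.
df = pd.DataFrame({
    "company_id": [116664] * 5,
    "user_id": ["111f07000612"] * 3 + ["abcf0700d009d122"] * 2,
    "attribute_name": ["first_name", "last_name", "city", "first_name", "last_name"],
    "attribute_value": ["Tom", "Cruise", "New York", "Matt", "Damon"],
    "timestamp": pd.to_datetime(
        ["2022-03-23 17:11:58"] * 3 + ["2022-02-23 10:11:59"] * 2
    ),
})

# Build one dict per user: the user_id plus that user's name/value pairs.
# sort=False keeps users in order of first appearance in the data.
# Assumption: if an attribute_name repeats for a user, the later row wins.
records = [
    {"user_id": user_id, **dict(zip(group["attribute_name"], group["attribute_value"]))}
    for user_id, group in df.groupby("user_id", sort=False)
]

# Serialize the list of objects as JSON.
result_json = json.dumps(records, indent=2)
print(result_json)
```

Building the dicts per group means users without a given attribute (for example, no city) simply lack that key, which matches the desired output; a pivot-based approach would instead need the resulting NaN values dropped.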
【Comments】: