【Question title】: Convert parquet to list of objects in Python
【Posted】: 2022-11-26 22:10:45
【Question】:

I am reading a parquet file with pandas:

import pandas as pd
df = pd.read_parquet('myfile.parquet', engine='pyarrow')

The file has the following structure:

   company_id  user_id           attribute_name  attribute_value  timestamp
1  116664      111f07000612      first_name      Tom              2022-03-23 17:11:58
2  116664      111f07000612      last_name       Cruise           2022-03-23 17:11:58
3  116664      111f07000612      city            New York         2022-03-23 17:11:58
4  116664      abcf0700d009d122  first_name      Matt             2022-02-23 10:11:59
5  116664      abcf0700d009d122  last_name       Damon            2022-02-23 10:11:59

I want to group by user_id and produce a list of objects (to be stored as JSON) in the following format:

[
 {
   "user_id": "111f07000612",
   "first_name": "Tom",
   "last_name": "Cruise",
   "city": "New York"
 },
 {
   "user_id": "abcf0700d009d122",
   "first_name": "Matt",
   "last_name": "Damon"
 }
]

【Comments】:

    Tags: python pandas parquet


    【Solution 1】:

    Hi! Hope you are doing well!

    You can achieve this with something like the following:

    
    from pprint import pprint
    
    import pandas as pd
    
    
    # because I don't have the exact parquet file, I will just mock it
    # df = pd.read_parquet("myfile.parquet", engine="pyarrow")
    df = pd.DataFrame(
        {
            "company_id": [116664, 116664, 116664, 116664, 116664],
            "user_id": ["111f07000612", "111f07000612", "111f07000612", "abcf0700d009d122", "abcf0700d009d122"],
            "attribute_name": ["first_name", "last_name", "city", "first_name", "last_name"],
            "attribute_value": ["Tom", "Cruise", "New York", "Matt", "Damon"],
            "timestamp": ["2022-03-23 17:11:58", "2022-03-23 17:11:58", "2022-03-23 17:11:58", "2022-02-23 10:11:59", "2022-02-23 10:11:59"]
        }
    )
    
    records = []
    
    for user_id, group in df.groupby("user_id"):
        transformed_group = (
            group[["attribute_name", "attribute_value"]]
            .set_index("attribute_name")
            .transpose()
            .assign(user_id=user_id)
        )
        # the transposed frame has exactly one row, so unpack the single record
        record, *_ = transformed_group.to_dict("records")
        records.append(record)
    
    pprint(records)
    # [{'city': 'New York',
    #   'first_name': 'Tom',
    #   'last_name': 'Cruise',
    #   'user_id': '111f07000612'},
    #  {'first_name': 'Matt', 'last_name': 'Damon', 'user_id': 'abcf0700d009d122'}]
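
As a possible alternative to the per-group loop above, the same reshaping can be sketched with `DataFrame.pivot`, which builds one row per user and one column per attribute in a single step. This is only a sketch under assumptions: the mock data mirrors the sample rows from the question rather than the real parquet file, the names `wide` and `json_text` are illustrative, and each (`user_id`, `attribute_name`) pair is assumed to appear at most once (`pivot` raises a `ValueError` on duplicates, so duplicated rows would need to be dropped first).

```python
import json

import pandas as pd

# Mock data mirroring the question's sample rows (assumption: the real file
# would be loaded with pd.read_parquet instead).
df = pd.DataFrame(
    {
        "user_id": ["111f07000612"] * 3 + ["abcf0700d009d122"] * 2,
        "attribute_name": ["first_name", "last_name", "city", "first_name", "last_name"],
        "attribute_value": ["Tom", "Cruise", "New York", "Matt", "Damon"],
    }
)

# One row per user, one column per attribute; attributes a user lacks become NaN.
wide = df.pivot(index="user_id", columns="attribute_name", values="attribute_value")

# Drop the NaN entries per user so absent attributes are omitted from the object.
records = [
    {"user_id": user_id, **row.dropna().to_dict()}
    for user_id, row in wide.iterrows()
]

# Serialize the list of objects to a JSON string.
json_text = json.dumps(records, indent=2)
print(json_text)
```

Dropping the missing values per row keeps `city` out of the second object instead of emitting it as `NaN`, which is not valid JSON. To store the result, `json_text` can be written to a file with the standard `open(...).write(...)` pattern.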
    

    【Comments】:
