【Question Title】: Group CSV data into an array of dictionaries - Python
【Posted】: 2021-07-29 15:27:23
【Question】:

I have a CSV file like this, with columns (userId, movieId, score), sorted by userId:

user1,movie1,0.1
user1,movie2,0.2
user2,movie2,0.4
user2,movie1,0.2

I want to group the rows into an array of dictionaries like this:

[
   {
      "userId":"user1",
      "scores":[
         {
            "movieId":"movie1",
            "score":0.1
         },
         {
            "movieId":"movie2",
            "score":0.2
         }
      ]
   },
   {
      "userId":"user2",
      "scores":[
         {
            "movieId":"movie2",
            "score":0.4
         },
         {
            "movieId":"movie1",
            "score":0.2
         }
      ]
   }
]

Here is my attempt in Python, but it doesn't work:

def get_body(batch):
    
    result = []
    record = {}
    scores = []
   
    for row in batch:
        if 'userId' in record and record['userId'] != row[0]:
            result.append({'userId': record['userId'], 'scores': scores})
            record = {}
            scores = []
        
        if 'userId' not in record:
            record['userId'] = row[0]

        scores.append({'movieId': row[1], 'score': float(row[2])})
        
    return result

Also, I'm not using pandas as an alternative. Any help would be much appreciated.
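
A note on the attempt above: the loop only appends a group to `result` when it sees the next userId, so the group still being built when the input ends is never appended (and a file with a single user yields an empty list). The following is a minimal sketch of one possible fix that keeps the same loop and flushes the pending group after it. The name `get_body_fixed` and the in-memory `rows` list are hypothetical, used only for illustration; the rows are assumed to be sorted by userId, as stated in the question.

```python
def get_body_fixed(batch):
    """Group rows sorted by userId; same idea as get_body plus a final flush."""
    result = []
    user_id = None
    scores = []

    for row in batch:
        # A change of userId closes the previous user's group.
        if user_id is not None and user_id != row[0]:
            result.append({"userId": user_id, "scores": scores})
            scores = []
        user_id = row[0]
        scores.append({"movieId": row[1], "score": float(row[2])})

    # The loop never appends the group it is still building, so flush it here.
    if user_id is not None:
        result.append({"userId": user_id, "scores": scores})

    return result


# Sample rows from the question, as csv.reader would yield them (all strings).
rows = [
    ["user1", "movie1", "0.1"],
    ["user1", "movie2", "0.2"],
    ["user2", "movie2", "0.4"],
    ["user2", "movie1", "0.2"],
]
fixed = get_body_fixed(rows)
```

With the sample rows, `fixed` contains one entry per user, including the last one.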

【Question Comments】:

    Tags: python arrays dictionary


    【Solution 1】:

    Using only the built-in csv module:

    import csv
    import json
    
    out = {}
    with open("your_file.csv", "r", newline="") as f_in:
        reader = csv.reader(f_in)
        for row in reader:
            out.setdefault(row[0], []).append(
                {"movieId": row[1], "score": float(row[2])}
            )
    
    out = [{"userId": k, "scores": v} for k, v in out.items()]
    # pretty print:
    print(json.dumps(out, indent=4))
    

    Prints:

    [
        {
            "userId": "user1",
            "scores": [
                {
                    "movieId": "movie1",
                    "score": 0.1
                },
                {
                    "movieId": "movie2",
                    "score": 0.2
                }
            ]
        },
        {
            "userId": "user2",
            "scores": [
                {
                    "movieId": "movie2",
                    "score": 0.4
                },
                {
                    "movieId": "movie1",
                    "score": 0.2
                }
            ]
        }
    ]
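
Since the question states that the file is already sorted by userId, `itertools.groupby` from the standard library may be another option. This is a sketch under that assumption, not part of the original answer: `groupby` only merges adjacent rows with the same key, so unsorted input would produce several entries for the same user. The helper name `group_scores` and the in-memory `sample` string are hypothetical.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def group_scores(rows):
    """Group (userId, movieId, score) rows that are already sorted by userId."""
    return [
        {
            "userId": user_id,
            "scores": [
                {"movieId": movie_id, "score": float(score)}
                for _, movie_id, score in user_rows
            ],
        }
        # groupby yields one (key, iterator) pair per run of adjacent equal keys.
        for user_id, user_rows in groupby(rows, key=itemgetter(0))
    ]


# Sample data from the question, read from memory instead of a file.
sample = "user1,movie1,0.1\nuser1,movie2,0.2\nuser2,movie2,0.4\nuser2,movie1,0.2\n"
grouped = group_scores(csv.reader(io.StringIO(sample)))
```

To read from a real file, pass `csv.reader(f_in)` for an open file object instead of the `io.StringIO` wrapper. Unlike the dictionary-based approach, this processes one user at a time and does not need a lookup table of all users.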
    

    【Comments】:
