【Question title】: Flatten an array-of-arrays JSON object column in a pandas DataFrame
【Posted】: 2020-04-30 20:40:58
【Question】:
0    [{'review_id': 4873356, 'rating': '5.0'}, {'review_id': 4973356, 'rating': '4.0'}]
1    [{'review_id': 4635892, 'rating': '5.0'}, {'review_id': 4645839, 'rating': '3.0'}] 

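I have a column of JSON like this that I want to flatten, similar to the question "Converting array of arrays into flattened dataframe".

However, I want each list element to become its own set of new columns, so that the output is: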

review_id_1  rating_1  review_id_2  rating_2
4873356       5.0      4973356      4.0 
4635892       5.0      4645839      3.0

Any help is much appreciated.

【Comments】:

    Tags: python arrays json pandas


    【Solution 1】:

    Try:

    # `s` is the Series holding the lists of dicts; requires `import pandas as pd`.
    print(pd.DataFrame(s.apply(lambda recs: {f"{k}{n}": v for n, rec in enumerate(recs, 1) for k, v in rec.items()}).tolist()))
    

    Output:

      rating1 rating2  review_id1  review_id2
    0     5.0     4.0     4873356     4973356
    1     5.0     3.0     4635892     4645839
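
The one-liner above can be hard to follow, and it produces names such as `review_id1` rather than the `review_id_1` form requested in the question. A minimal sketch of the same idea, split into a helper function: the names `s`, `flatten_records`, and `flat` are assumptions made for illustration, and the column order shown relies on a pandas version that preserves dict insertion order.

```python
import pandas as pd

# Sample data mirroring the question; `s` is an assumed name for the Series.
s = pd.Series([
    [{'review_id': 4873356, 'rating': '5.0'}, {'review_id': 4973356, 'rating': '4.0'}],
    [{'review_id': 4635892, 'rating': '5.0'}, {'review_id': 4645839, 'rating': '3.0'}],
])


def flatten_records(records):
    """Merge a list of dicts into one dict, suffixing each key with its 1-based position."""
    return {
        f"{key}_{pos}": value
        for pos, record in enumerate(records, start=1)
        for key, value in record.items()
    }


# One flattened dict per original row; the original index is kept.
flat = pd.DataFrame([flatten_records(records) for records in s], index=s.index)
print(flat)
```

Note that the ratings remain strings here, exactly as in the input; if numeric values are wanted, something like `flat['rating_1'].astype(float)` could be applied afterwards.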
    

    【Comments】:

      【Solution 2】:

      This kind of data reshaping tends to be manual.

      import pandas as pd

      # Sample data.
      df = pd.DataFrame({
          'json_data': [
              [{'review_id': 4873356, 'rating': '5.0'}, {'review_id': 4973356, 'rating': '4.0'}],
              [{'review_id': 4635892, 'rating': '5.0'}, {'review_id': 4645839, 'rating': '3.0'}],
          ]
      })

      # Data transformation:
      # Step 1: Temporary dataframe with one column per list position (each cell holds one dict).
      df2 = pd.DataFrame(df['json_data'].tolist())
      # Step 2: Expand each column of dicts into its own dataframe and concatenate them side by side, so that the df now has 4 columns.
      df2 = pd.concat([pd.DataFrame.from_records(df2[col].tolist()) for col in df2], axis=1)
      # Step 3: Rename final columns
      df2.columns = ['review_id_1', 'rating_1', 'review_id_2', 'rating_2']
      >>> df2
         review_id_1 rating_1  review_id_2 rating_2
      0      4873356      5.0      4973356      4.0
      1      4635892      5.0      4645839      3.0
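
Solution 2 hard-codes the final column names. As a hedged sketch, the names can instead be derived from the dict keys and the list position with `add_suffix`. The names `expanded`, `parts`, and `wide` are illustrative, and the sketch assumes every row holds the same number of dicts.

```python
import pandas as pd

df = pd.DataFrame({
    'json_data': [
        [{'review_id': 4873356, 'rating': '5.0'}, {'review_id': 4973356, 'rating': '4.0'}],
        [{'review_id': 4635892, 'rating': '5.0'}, {'review_id': 4645839, 'rating': '3.0'}],
    ]
})

# One column per list position (0, 1, ...); each cell holds a single dict.
# Assumption: all rows contain the same number of dicts.
expanded = pd.DataFrame(df['json_data'].tolist(), index=df.index)

# Expand each position into its own frame and suffix its columns with the 1-based position.
parts = [
    pd.DataFrame(expanded[pos].tolist(), index=df.index).add_suffix(f"_{pos + 1}")
    for pos in expanded.columns
]

# Place the per-position frames side by side.
wide = pd.concat(parts, axis=1)
print(wide)
```

Because the suffix comes from the position, the same code should handle three or more reviews per row without editing a list of column names.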
      

      【讨论】:

      • Thanks, alex. Have you tried U10's solution below? It looks more elegant. Cheers.