【Question Title】:Every data point in a column has a list of dictionaries. How do I turn those entries into columns?
【Posted】:2019-04-20 03:18:13
【Question Description】:

Suppose I have a DataFrame like this:

Name    Classes

Bill    [{'class': CS152, 'time': 2:00 PM}, {'class': PHYS162, 'time': 3:30 PM}]
Adam    [{'class': EE193, 'time': 1:00 PM}, {'class': PHYS162, 'time': 2:30 PM}]
Sara    [{'class': CS152, 'time': 4:00 PM}, {'class': BIO182, 'time': 6:30 PM}]
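
The listing above is printed output, so the class codes and times appear unquoted. Here is a minimal sketch for rebuilding the same frame in code. It assumes every value is a plain string and the columns are named `Name` and `Classes` as shown:

```python
import pandas as pd

# Assumption: class codes and times are plain strings.
df = pd.DataFrame({
    "Name": ["Bill", "Adam", "Sara"],
    "Classes": [
        [{"class": "CS152", "time": "2:00 PM"}, {"class": "PHYS162", "time": "3:30 PM"}],
        [{"class": "EE193", "time": "1:00 PM"}, {"class": "PHYS162", "time": "2:30 PM"}],
        [{"class": "CS152", "time": "4:00 PM"}, {"class": "BIO182", "time": "6:30 PM"}],
    ],
})

# Each cell in 'Classes' holds a Python list of dicts.
print(df.loc[0, "Classes"][1])  # {'class': 'PHYS162', 'time': '3:30 PM'}
```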

How can I reshape the DataFrame so that it looks like this:

Name    CS152     PHYS162    EE193      BIO182

Bill    2:00 PM   3:30 PM    NaN        NaN
Adam    NaN       2:30 PM    1:00 PM    NaN
Sara    4:00 PM   NaN        NaN        6:30 PM

【Comments】:

    Tags: python mongodb pandas dataframe pymongo


    【Answer 1】:

    There may be a more elegant way, but here is one possibility:

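    import pandas as pd

    # Sample input; this answer uses lowercase column names 'name' and 'classes'
    data = pd.DataFrame({
        'name': ['Bill', 'Adam', 'Sara'],
        'classes': [
            [{'class': 'CS152', 'time': '2:00 PM'}, {'class': 'PHYS162', 'time': '3:30 PM'}],
            [{'class': 'EE193', 'time': '1:00 PM'}, {'class': 'PHYS162', 'time': '2:30 PM'}],
            [{'class': 'CS152', 'time': '4:00 PM'}, {'class': 'BIO182', 'time': '6:30 PM'}],
        ],
    })


    def to_frame(key, classes):
        """Expand one group's lists of dicts into a DataFrame indexed by the group key."""
        records = [d for row in classes for d in row]
        return pd.DataFrame(records, index=[key] * len(records))


    res = (
        # expand the nested lists into one row per (name, class) pair
        pd.concat([
            to_frame(key, classes) for key, classes in data.groupby('name')['classes']
        ])
        .reset_index()
        .rename(columns={'index': 'name'})
        # pivot to one column per class; 'first' keeps the first time if a class repeats
        .pivot_table(index='name', columns='class', values='time', aggfunc='first')
        .reset_index()
    )
    res.columns.name = None
    print(res)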
    
           name   BIO182    CS152    EE193  PHYS162
    0      Adam      NaN      NaN  1:00 PM  2:30 PM
    1      Bill      NaN  2:00 PM      NaN  3:30 PM
    2      Sara  6:30 PM  4:00 PM      NaN      NaN
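
The same expand-then-pivot idea can also be written with `DataFrame.explode` (available from pandas 0.25) instead of a `groupby`/`concat` helper. This is a sketch under the same assumption that the values are strings. It uses `pivot` rather than `pivot_table`, so it raises `ValueError` if a person lists the same class twice. In that case, answer 1's `pivot_table(..., aggfunc='first')` is the safer choice.

```python
import pandas as pd

# Sample input; assumes class codes and times are plain strings.
df = pd.DataFrame({
    "Name": ["Bill", "Adam", "Sara"],
    "Classes": [
        [{"class": "CS152", "time": "2:00 PM"}, {"class": "PHYS162", "time": "3:30 PM"}],
        [{"class": "EE193", "time": "1:00 PM"}, {"class": "PHYS162", "time": "2:30 PM"}],
        [{"class": "CS152", "time": "4:00 PM"}, {"class": "BIO182", "time": "6:30 PM"}],
    ],
})

# 1. One row per (Name, dict) pair; an empty list explodes to NaN, so drop those rows.
exploded = df.explode("Classes").dropna(subset=["Classes"]).reset_index(drop=True)

# 2. Spread each dict into 'class' and 'time' columns and reattach Name.
long = pd.concat([exploded["Name"], pd.DataFrame(exploded["Classes"].tolist())], axis=1)

# 3. Long -> wide: one row per person, one column per class, NaN where absent.
wide = long.pivot(index="Name", columns="class", values="time")
wide.columns.name = None
print(wide)
```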
    


      【Answer 2】:

      One way to do this, though it could be optimized:

      import pandas as pd

      so = pd.DataFrame([['Bill', [{'class': 'CS152', 'time': '2:00 PM'}, {'class': 'PHYS162', 'time': '3:30 PM'}]],
                         ['Adam', [{'class': 'EE193', 'time': '1:00 PM'}, {'class': 'PHYS162', 'time': '2:30 PM'}]],
                         ['Sara', [{'class': 'CS152', 'time': '4:00 PM'}, {'class': 'BIO182', 'time': '6:30 PM'}]]
                        ], columns=['Name', 'Classes'])

      newdict = {}  # maps each name to a Series of {class: time}
      for idx in so.index:
          name = so.loc[idx, 'Name']
          classes = so.loc[idx, 'Classes']
          # create series data for individual person
          seriesdata = pd.Series(dtype=object)

          for rowclass in classes:
              classname = rowclass['class']
              classtime = rowclass['time']
              seriesdata[classname] = classtime
          # store the person's Series under their name
          newdict[name] = seriesdata

      # columns are names and rows are classes; transpose so each person is a row
      df = pd.DataFrame(newdict)
      print(df.T)
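
Answer 2's loop grows a `Series` one element at a time. A leaner variant of the same idea, sketched under the same string-value assumption, builds a plain `{class: time}` dict per person. It then lets the `DataFrame` constructor align the dicts, filling in NaN for classes a person does not take:

```python
import pandas as pd

# Sample input; assumes class codes and times are plain strings.
df = pd.DataFrame({
    "Name": ["Bill", "Adam", "Sara"],
    "Classes": [
        [{"class": "CS152", "time": "2:00 PM"}, {"class": "PHYS162", "time": "3:30 PM"}],
        [{"class": "EE193", "time": "1:00 PM"}, {"class": "PHYS162", "time": "2:30 PM"}],
        [{"class": "CS152", "time": "4:00 PM"}, {"class": "BIO182", "time": "6:30 PM"}],
    ],
})

# One {class: time} dict per person; the constructor takes the union of keys
# as columns and leaves NaN where a person has no entry for a class.
records = [{d["class"]: d["time"] for d in classes} for classes in df["Classes"]]
wide = pd.DataFrame(records, index=df["Name"])
print(wide)
```

If a person lists the same class twice, the dict comprehension keeps the last time. By contrast, `aggfunc='first'` in answer 1 keeps the first. Call `wide.reset_index()` to turn `Name` back into a regular column.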
      
