【Question title】: List of tuples to data frame
【Posted】: 2017-07-21 07:14:14
【Question】:

I'm looking for a way to solve the following problem:

I have a data frame in which one column contains a list of tuples, like this:

import pandas as pd

mydf = pd.DataFrame({ 
        'Field1' : ['A','B','C'],
        'Field2' : ['1','2','3'],
        'WeirdField' :[ 
                      [ ('xxx', 'F1'), ('yyy','F2') ],
                      [ ('asd', 'F3'), ('bla','F4') ],
                      [ ('123', 'F2'), ('www','F5') ]
                      ]
        })

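I want each element in the second position of the tuples to become a column of the data frame, holding the corresponding value from the first position. For the data frame above, this is what I expect:

The lists can contain more than two elements (two is just the example), and the number of elements can vary from row to row.

Can anyone suggest an easy way to achieve this?

Thanks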

【Question discussion】:

    Tags: python python-2.7 dataframe


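    【Solution 1】:

    First, I flatten the mydf['WeirdField'] column so that we only see the values and the column names, without worrying about which list they are in. Next, you can use itertools.groupby to collect all of the corresponding values and row indices for each "F" column.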

    import itertools
    
    # Must first sort the list by F column, or groupby won't work                  
    flatter = sorted([list(x) + [idx] for idx, y in enumerate(mydf['WeirdField']) 
                      for x in y], key = lambda x: x[1]) 
    
    # Find all of the values that will eventually go in each F column                
    for key, group in itertools.groupby(flatter, lambda x: x[1]):
        list_of_vals = [(val, idx) for val, _, idx in group]
    
        # Add each value at the appropriate index and F column
        for val, idx in list_of_vals:
            mydf.loc[idx, key] = val
    

    This produces:

    In [84]: mydf
    Out[84]: 
      Field1 Field2              WeirdField   F1   F2   F3   F4   F5
    0      A      1  [(xxx, F1), (yyy, F2)]  xxx  yyy  NaN  NaN  NaN
    1      B      2  [(asd, F3), (bla, F4)]  NaN  NaN  asd  bla  NaN
    2      C      3  [(123, F2), (www, F5)]  NaN  123  NaN  NaN  www
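To make the sort-then-groupby step more concrete, here is a minimal sketch that rebuilds the answer's `flatter` list for the question's data and shows what each group contains. It reuses the answer's own list comprehension; the `groups` dict is a hypothetical name added only for illustration.

```python
import itertools

import pandas as pd

mydf = pd.DataFrame({
    'Field1': ['A', 'B', 'C'],
    'Field2': ['1', '2', '3'],
    'WeirdField': [
        [('xxx', 'F1'), ('yyy', 'F2')],
        [('asd', 'F3'), ('bla', 'F4')],
        [('123', 'F2'), ('www', 'F5')],
    ],
})

# Each entry becomes [value, column_name, row_position], sorted by column name
flatter = sorted(
    [list(x) + [idx] for idx, y in enumerate(mydf['WeirdField']) for x in y],
    key=lambda x: x[1],
)
print(flatter)

# groupby only merges *adjacent* items with equal keys, hence the sort above
groups = {
    key: [(val, idx) for val, _, idx in group]
    for key, group in itertools.groupby(flatter, key=lambda x: x[1])
}
print(groups)
```

Both F2 entries (from rows 0 and 2) end up next to each other after sorting, which is why they land in a single group.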
    

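    【Discussion】:

    • Thanks, great. I like that nothing is hard-coded, neither values nor field names. I only had to change enumerate to mydf['WeirdField'].iteritems() because I need to keep the data frame's index.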
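The index-preserving change from the comment can be sketched as follows. `Series.iteritems()` was deprecated in pandas 1.5 and removed in 2.0, so this sketch uses `Series.items()`, which yields the same `(index_label, value)` pairs. The non-default index labels 10, 20, 30 are made up purely to show that values land on the right rows.

```python
import itertools

import pandas as pd

# Question data, but with a hypothetical non-default index
mydf = pd.DataFrame({
    'Field1': ['A', 'B', 'C'],
    'Field2': ['1', '2', '3'],
    'WeirdField': [
        [('xxx', 'F1'), ('yyy', 'F2')],
        [('asd', 'F3'), ('bla', 'F4')],
        [('123', 'F2'), ('www', 'F5')],
    ],
}, index=[10, 20, 30])

# .items() yields (index_label, list_of_tuples), so idx is the real label
flatter = sorted(
    [list(x) + [idx] for idx, y in mydf['WeirdField'].items() for x in y],
    key=lambda x: x[1],
)

for key, group in itertools.groupby(flatter, lambda x: x[1]):
    for val, _, idx in group:
        # .loc is label-based, so each value goes to the matching row
        mydf.loc[idx, key] = val

print(mydf)
```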
    【Solution 2】:
    import pandas as pd
    
    mydf = pd.DataFrame({ 
            'Field1' : ['A','B','C'],
            'Field2' : ['1','2','3'],
            'WeirdField' :[ 
                          [ ('xxx', 'F1'), ('yyy','F2'),('xyz','F6') ],
                          [ ('asd', 'F3'), ('bla','F4') ],
                          [ ('123', 'F2'), ('www','F5') ,('mno','F1') ]
                          ]
            })
    
    print(mydf.head())
    
    # Create a new data frame with just Field1 and Field2 (same index as mydf)
    newdf = mydf[['Field1', 'Field2']].copy()
    
    # Collect the column names (the second element of every tuple)
    column_names = []
    for index, row in mydf.iterrows():
        for value, name in row['WeirdField']:
            column_names.append(name)
    
    # Unique column names, sorted (a plain set has no guaranteed order)
    new_column_names = sorted(set(column_names))
    
    # Add the columns to the new data frame, filled with None
    for i, name in enumerate(new_column_names):
        newdf.insert(i + 2, name, None)
    
    # Now put each value into its column; .at replaces DataFrame.set_value,
    # which was deprecated and later removed from pandas
    for index, row in mydf.iterrows():
        for value, name in row['WeirdField']:
            newdf.at[index, name] = value
    
    print(newdf.head())
    

    Output:

      Field1 Field2    F1    F2    F3    F4    F5    F6
    0      A      1   xxx   yyy  None  None  None   xyz
    1      B      2  None  None   asd   bla  None  None
    2      C      3   mno   123  None  None   www  None
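A shorter variant of the same idea, sketched here as an alternative rather than as this answer's method, builds one `{column_name: value}` dict per row and lets the `DataFrame` constructor align the columns, filling missing ones with NaN. Note the assumption that a column name appears at most once per row: with a dict comprehension, a repeated name keeps the last value.

```python
import pandas as pd

mydf = pd.DataFrame({
    'Field1': ['A', 'B', 'C'],
    'Field2': ['1', '2', '3'],
    'WeirdField': [
        [('xxx', 'F1'), ('yyy', 'F2'), ('xyz', 'F6')],
        [('asd', 'F3'), ('bla', 'F4')],
        [('123', 'F2'), ('www', 'F5'), ('mno', 'F1')],
    ],
})

# One dict per row: {column_name: value}; keys missing from a row become NaN
wide = pd.DataFrame(
    [{name: value for value, name in row} for row in mydf['WeirdField']],
    index=mydf.index,
)

# Sort the new columns by name, then attach them to the plain fields
result = mydf[['Field1', 'Field2']].join(wide.sort_index(axis=1))
print(result)
```

Unlike the loop above, missing cells here are NaN rather than None.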
    

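    【Discussion】:

      【Solution 3】:

      Consider a pivot_table solution after zipping the column values together. This works for any number of tuples in WeirdField, assuming no F label is repeated within the same row (if one is, the maximum value is taken):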

      import numpy as np
      import pandas as pd

      data = []
      # APPEND TO LIST
      for f1,f2,w in zip(mydf['Field1'].values, mydf['Field2'].values, mydf['WeirdField'].values):
          for i in w:
              data.append((f1, f2) + i)
      # CAST LIST OF TUPLES TO DATAFRAME
      df = pd.DataFrame(data, columns=['Field1', 'Field2', 'Value', 'Indicator'])
      
      # PIVOT DATAFRAME
      pvt = df.pivot_table(index=['Field1', 'Field2'], columns=['Indicator'],
                           values='Value', aggfunc='max', fill_value=np.nan).reset_index()
      pvt.columns.name = None
      
      #   Field1 Field2   F1   F2   F3   F4   F5
      # 0      A      1  xxx  yyy  NaN  NaN  NaN
      # 1      B      2  NaN  NaN  asd  bla  NaN
      # 2      C      3  NaN  123  NaN  NaN  www
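The same "long format, then pivot" idea can be sketched with `DataFrame.explode` in place of the manual zip loop. This is a swapped-in technique, not the answer's code, and it assumes pandas 0.25 or later (where `explode` was added); the names `long` and `pvt` are only illustrative.

```python
import pandas as pd

mydf = pd.DataFrame({
    'Field1': ['A', 'B', 'C'],
    'Field2': ['1', '2', '3'],
    'WeirdField': [
        [('xxx', 'F1'), ('yyy', 'F2')],
        [('asd', 'F3'), ('bla', 'F4')],
        [('123', 'F2'), ('www', 'F5')],
    ],
})

# One row per tuple; the original row index is repeated for each tuple
long = mydf.explode('WeirdField')

# Split each (value, indicator) tuple into two plain columns (positional lists)
long['Value'] = [t[0] for t in long['WeirdField']]
long['Indicator'] = [t[1] for t in long['WeirdField']]

# Wide format: one column per indicator; 'max' resolves repeats within a row
pvt = long.pivot_table(index=['Field1', 'Field2'], columns='Indicator',
                       values='Value', aggfunc='max').reset_index()
pvt.columns.name = None
print(pvt)
```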
      

      【Discussion】:
