【Question Title】:Convert nested list with tuple elements into dataframe [closed]
【Posted】:2022-01-10 09:37:12
【Question】:

I have a nested list containing tuples, which looks like this:

[[(0, 0.01581311),
  (1, 0.00818853),
  (2, 0.01093196),
  (3, 0.95393395),
  (4, 0.0111324545)],
 [(0, 0.0026787873),
  (1, 0.001387138),
  (2, 0.9921792),
  (3, 0.0018690126),
  (4, 0.0018858392)],
 [(0, 0.013304136),
  (1, 0.0068892473),
  (2, 0.96115804),
  (3, 0.009282486),
  (4, 0.009366056)]]

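I want to convert it into a DataFrame whose column names are the first element of each tuple (in this example the numbers 0 1 2 3 4). Alternatively, maybe there is a way to drop the first element before the comma, which would give a result like this: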

[[(0.01581311),
  (0.00818853),
  (0.01093196),
  (0.95393395),
  (0.0111324545)],
 [(0.0026787873),
  (0.001387138),
  (0.9921792),
  (0.0018690126),
  (0.0018858392)],
 [(0.013304136),
  (0.0068892473),
  (0.96115804),
  (0.009282486),
  (0.009366056)]]

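【Question Comments】:

  • Please include what you have tried and the code you wrote to reach your goal. See stackoverflow.com/help/how-to-ask for what a good question should contain. Thanks!
  • DataFrame.from_records() would work if you convert the input into a list of tuples and discard the column numbers.
  • Your example column numbers 0,1,2,3,4 are a poor example, because a solution can simply ignore them. Could you edit the example so that the column numbers are not consecutive integers starting from 0?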
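
The from_records() suggestion in the comments could look roughly like the sketch below. The helper name values_only is hypothetical, and the sketch assumes every inner list holds (column number, value) pairs in the same column order, so the numbers can be dropped safely.

```python
import pandas as pd

nested = [[(0, 0.01581311), (1, 0.00818853), (2, 0.01093196),
           (3, 0.95393395), (4, 0.0111324545)],
          [(0, 0.0026787873), (1, 0.001387138), (2, 0.9921792),
           (3, 0.0018690126), (4, 0.0018858392)],
          [(0, 0.013304136), (1, 0.0068892473), (2, 0.96115804),
           (3, 0.009282486), (4, 0.009366056)]]


def values_only(rows):
    """Drop the leading column number from every (number, value) pair."""
    return [tuple(value for _, value in row) for row in rows]


# Each tuple of values becomes one row; columns are numbered 0..n-1 by default.
df = pd.DataFrame.from_records(values_only(nested))
print(df)
```

Because the column numbers are discarded, this only gives the right labels when they happen to be 0, 1, 2, ... in order, which is the concern raised in the last comment.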

Tags: python dataframe tuples nested-lists


【Solution 1】:

This function should give the expected result for both forms of the input:

import numpy as np
import pandas as pd

def list_to_dataframe(lis):
    """Convert a nested list of values or (label, value) tuples to a DataFrame."""
    lis = np.array(lis)
    shape = lis.shape
    series = []
    if len(shape) == 3:
        # 3D input: every cell is a (label, value) tuple,
        # so read the column labels from the first row
        columns = lis[0, :, 0].astype(int)
        get_section = lambda l, i: l[i, :, 1]
    else:
        # 2D input: plain values, so number the columns 0..n-1
        columns = np.arange(0, shape[1]).astype(int)
        get_section = lambda l, i: l[i, :]
    # Build one Series per row
    for i in range(len(lis)):
        sec = get_section(lis, i)
        series.append(pd.Series(sec))
    df = pd.DataFrame(series)
    df.columns = columns
    return df

How I used it:

l1 = [[(0, 0.01581311),
  (1, 0.00818853),
  (2, 0.01093196),
  (3, 0.95393395),
  (4, 0.0111324545)],
 [(0, 0.0026787873),
  (1, 0.001387138),
  (2, 0.9921792),
  (3, 0.0018690126),
  (4, 0.0018858392)],
 [(0, 0.013304136),
  (1, 0.0068892473),
  (2, 0.96115804),
  (3, 0.009282486),
  (4, 0.009366056)]]

l2 = [[(0.01581311),
  (0.00818853),
  (0.01093196),
  (0.95393395),
  (0.0111324545)],
 [(0.0026787873),
  (0.001387138),
  (0.9921792),
  (0.0018690126),
  (0.0018858392)],
 [(0.013304136),
  (0.0068892473),
  (0.96115804),
  (0.009282486),
  (0.009366056)]]


list_to_dataframe(l1)
list_to_dataframe(l2)
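
The function above reads the column labels from the first row only. As a rough alternative sketch, each inner list of (label, value) pairs can be passed to dict(), so that every row carries its own labels; this also handles labels that are not consecutive integers. The data and the name pairs_to_dataframe below are hypothetical, chosen only to illustrate the idea.

```python
import pandas as pd

# Hypothetical data: labels are not consecutive and the first row is out of order.
rows = [
    [(10, 0.1), (30, 0.7), (20, 0.2)],
    [(10, 0.3), (20, 0.4), (30, 0.3)],
]


def pairs_to_dataframe(rows):
    """Build a DataFrame whose columns are the first elements of the tuples."""
    # dict(row) maps each label to its value; pandas aligns the dicts by key.
    return pd.DataFrame([dict(row) for row in rows])


df = pairs_to_dataframe(rows)
print(df)
```

Values are matched to columns by label rather than by position, so a row that lists its pairs in a different order still ends up in the right columns.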

【Comments】:

    【Solution 2】:

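    It is not much different from @enke's answer: it loops over the given lists, creates a temporary DataFrame for each one, and adds its row of values to the combined DataFrame.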

    import pandas as pd
    import numpy as np
    
    lst = [[(0, 0.01581311),
      (1, 0.00818853),
      (2, 0.01093196),
      (3, 0.95393395),
      (4, 0.0111324545)],
     [(0, 0.0026787873),
      (1, 0.001387138),
      (2, 0.9921792),
      (3, 0.0018690126),
      (4, 0.0018858392)],
     [(0, 0.013304136),
      (1, 0.0068892473),
      (2, 0.96115804),
      (3, 0.009282486),
      (4, 0.009366056)]]
    
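    rows = []
    for i in range(len(lst)):
        # Temporary DataFrame: column 0 holds the labels, column 1 the values
        df = pd.DataFrame(list(lst[i]))
        # Keep only the row of values from the transposed frame
        rows.append(df.T.iloc[1])
    # Stack the collected rows (DataFrame.append was removed in pandas 2.0)
    dfs = pd.DataFrame(rows).reset_index(drop=True)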
    
                  0         1         2         3         4
    0  0.015813  0.008189  0.010932  0.953934  0.011132
    1  0.002679  0.001387  0.992179  0.001869  0.001886
    2  0.013304  0.006889  0.961158  0.009282  0.009366
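
The same table can probably be produced without a loop by slicing a NumPy array. This is only a sketch: it assumes the nested list is rectangular (every inner list has the same length) and that the column labels in the first row apply to all rows.

```python
import numpy as np
import pandas as pd

lst = [[(0, 0.01581311), (1, 0.00818853), (2, 0.01093196),
        (3, 0.95393395), (4, 0.0111324545)],
       [(0, 0.0026787873), (1, 0.001387138), (2, 0.9921792),
        (3, 0.0018690126), (4, 0.0018858392)],
       [(0, 0.013304136), (1, 0.0068892473), (2, 0.96115804),
        (3, 0.009282486), (4, 0.009366056)]]

# Shape is (rows, columns, 2): index 0 of the last axis is the label, index 1 the value.
arr = np.asarray(lst, dtype=float)

# Values come from the second tuple element, labels from the first row's first elements.
df = pd.DataFrame(arr[:, :, 1], columns=arr[0, :, 0].astype(int))
print(df)
```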
    

    【Comments】:
