Create a multi-column pandas DataFrame from a list: answers

【Question title】: Create multiple-columns pandas dataframe from list
【Posted】: 2023-01-14 02:00:51
【Question description】:

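I can't figure out how to create a (multi-column) pandas DataFrame from a list. Some of the lines start with the character ">", and I want those to become the column headers. The number of lines after each header is not the same.

My list: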

>header
a
b
>header2
c
d
e
f
>header3
g
h
i

The dataframe I want to create:

>header1   >header2   >header3
a           c          g
b           d          h
            e          i
            f

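【Question discussion】:

Tags: pandas list multiple-columns

【Solution 1】:

Simply iterate over the lines and treat every line that starts with ">" as a header. The challenge, however, is creating a df from a dictionary of lists of unequal length.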

import pandas as pd

# The given list
lines = [">header", "a", "b", ">header2", "c", "d", "e", "f", ">header3", "g", "h", "i"]

# Iterate through the lines and create a sublist for each header
data = {}
column = ''
for line in lines:
    if line.startswith('>'):
        column = line
        data[column] = []
        continue
    data[column].append(line)

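# Create the DataFrame: orient='index' builds one row per header and pads
# the shorter rows with None; .T then turns those rows into columns
df = pd.DataFrame.from_dict(data, orient='index').T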

Output:

  >header >header2 >header3
0       a        c        g
1       b        d        h
2    None        e        i
3    None        f     None
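
As a possible alternative for the unequal-length problem, each list can be wrapped in a pd.Series before building the frame. This is only a minimal sketch: the dictionary below is written out by hand to match what the loop above produces, and the name df_series is made up for illustration. Note that the padding is NaN here rather than None.

```python
import pandas as pd

# Grouped values as produced by the loop above (lists of unequal length)
data = {
    ">header": ["a", "b"],
    ">header2": ["c", "d", "e", "f"],
    ">header3": ["g", "h", "i"],
}

# Each Series gets its own integer index; pandas aligns the columns on that
# index and fills the missing positions of the shorter columns with NaN
df_series = pd.DataFrame({name: pd.Series(values) for name, values in data.items()})
print(df_series)
```

Whether None or NaN padding is preferable depends on what is done with the frame afterwards; df_series.fillna('') would give empty strings instead.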

【Discussion】:

【Solution 2】:

I assume you have a text containing this list. You can split it with str.splitlines() and then construct the dataframe with the help of itertools.zip_longest:

from itertools import zip_longest

import pandas as pd

text = '''
>header
a
b
>header2
c
d
e
f
>header3
g
h
i'''

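current, data = None, {}
for line in text.splitlines():
    line = line.strip()
    if not line:
        # Skip blank lines: text starts with a newline, which would
        # otherwise end up as a value under a None column
        continue
    if line.startswith('>'):
        current = line
    else:
        data.setdefault(current, []).append(line)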

df = pd.DataFrame(zip_longest(*data.values(), fillvalue=''), columns=list(data))
print(df)

Prints:

  >header >header2 >header3
0       a        c        g
1       b        d        h
2                e        i
3                f
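
To see what zip_longest contributes on its own, here is a small sketch without pandas. The variable names columns and rows are made up for illustration; the lists are the three groups of values from the question.

```python
from itertools import zip_longest

# The three groups of values, one list per header
columns = [["a", "b"], ["c", "d", "e", "f"], ["g", "h", "i"]]

# zip_longest walks the lists in parallel and yields one tuple per row;
# once a shorter list is exhausted, its slot is filled with fillvalue
rows = list(zip_longest(*columns, fillvalue=""))
print(rows)
# [('a', 'c', 'g'), ('b', 'd', 'h'), ('', 'e', 'i'), ('', 'f', '')]
```

Plain zip would stop after the second row because the shortest list has only two items, which is why zip_longest is used here.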

【Discussion】: