Python pandas: flatten with arrays in column

【Question title】: Python pandas: flatten with arrays in column
【Posted】: 2017-03-09 21:09:19
【Question】:

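I have a pandas DataFrame with a column that contains arrays. I would like to "flatten" it by repeating the values of the other columns for each element of the array.

I managed to do this by iterating over each row and building a temporary list of values, but that is "pure Python" and slow.

Is there a way to do this in pandas/numpy? In other words, I am trying to improve the flatten function in the example below.

Many thanks.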

import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

def flatten(df):
    tmp = []
    def backend(r):
        x = r['x']
        y = r['y']
        zz = r['z']
        for z in zz:
            tmp.append({'x': x, 'y': y, 'z': z})
    df.apply(backend, axis=1)
    return pd.DataFrame(tmp)

print(flatten(toConvert).to_string(index=False))

This gives one row per element of z, with the x and y values repeated (five rows for the example above).

【Comments】:

Tags: python arrays performance pandas numpy

【Solution 1】:

Here is a NumPy-based solution -

import numpy as np

# list(...) is required on Python 3, where map returns an iterator
np.column_stack((toConvert[['x','y']].values.\
     repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))

Sample run -

In [78]: toConvert
Out[78]: 
   x   y                z
0  1  10  (101, 102, 103)
1  2  20       (201, 202)

In [79]: np.column_stack((toConvert[['x','y']].values.\
    ...:      repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))
Out[79]: 
array([[  1,  10, 101],
       [  1,  10, 102],
       [  1,  10, 103],
       [  2,  20, 201],
       [  2,  20, 202]])
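
The result above is a plain NumPy array, so the column names are lost. As a minimal sketch of the same repeat-and-stack idea wrapped back into a DataFrame: the helper name flatten_numpy and its parameters are hypothetical (not part of the original answer), and it assumes a non-empty frame whose fixed columns share one dtype.

```python
import numpy as np
import pandas as pd

def flatten_numpy(df, fixed_cols, list_col):
    """Hypothetical helper: repeat fixed_cols once per element of list_col."""
    # Number of elements in each row's sequence, e.g. [3, 2]
    lengths = [len(seq) for seq in df[list_col]]
    # Repeat row i of the fixed columns lengths[i] times
    repeated = df[fixed_cols].values.repeat(lengths, axis=0)
    # Join all per-row sequences into one flat 1-D array
    flat = np.concatenate([np.asarray(seq) for seq in df[list_col]])
    out = pd.DataFrame(repeated, columns=fixed_cols)
    out[list_col] = flat
    return out

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

result = flatten_numpy(toConvert, ['x', 'y'], 'z')
print(result)
```

If the fixed columns have mixed dtypes, .values falls back to an object array, so the dtypes of the result may need to be restored afterwards.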

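【Discussion】:

【Solution 2】:

You need numpy.repeat together with str.len to build columns x and y; for z, flatten the nested tuples with itertools.chain.from_iterable: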

import pandas as pd
import numpy as np
from itertools import chain

df = pd.DataFrame({
        "x": np.repeat(toConvert.x.values, toConvert.z.str.len()),
        "y": np.repeat(toConvert.y.values, toConvert.z.str.len()),
        "z": list(chain.from_iterable(toConvert.z))})

print(df)
   x   y    z
0  1  10  101
1  1  10  102
2  1  10  103
3  2  20  201
4  2  20  202
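
For readers on a newer pandas, the same flattening may also be expressed with the built-in DataFrame.explode. This is a sketch under the assumption of pandas 0.25 or later (the release that introduced explode); it is not part of the original answers, which predate that method.

```python
import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

# explode emits one row per element of each tuple in 'z'
# and repeats the values of the other columns
exploded = toConvert.explode('z').reset_index(drop=True)

# The exploded column keeps object dtype; cast it back to integers
exploded['z'] = exploded['z'].astype('int64')

print(exploded)
```

Without reset_index(drop=True), the exploded rows keep the index label of the row they came from, so the index would contain repeated labels.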

【Discussion】: