How to repeat a Pandas DataFrame? Answers

【Question title】: How to repeat a Pandas DataFrame?
【Posted】: 2014-07-16 06:36:41
【Question】:

This is my DataFrame, which should be repeated 5 times:

>>> x = pd.DataFrame({'a':1,'b':2}, index = range(1))
>>> x
   a  b
0  1  2

I want a result like this:

>>> x.append(x).append(x).append(x)
   a  b
0  1  2
0  1  2
0  1  2
0  1  2

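But there must be a smarter way than appending 4 times. The DataFrame I am actually working with should be repeated 50 times.

I haven't found anything practical, including things like np.repeat, which just doesn't work on DataFrames.

Could anyone help?

【Comments】:

Tags: python pandas duplicates dataframe repeat

【Solution 1】: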

You can use the concat function:

In [13]: pd.concat([x]*5)
Out[13]: 
   a  b
0  1  2
0  1  2
0  1  2
0  1  2
0  1  2

If you only want to repeat the values and not the index, you can do:

In [14]: pd.concat([x]*5, ignore_index=True)
Out[14]: 
   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
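
The same pattern applies to frames with more than one row. A minimal sketch, assuming a hypothetical two-row frame `df` and repeat count `n` (neither comes from the question):

```python
import pandas as pd

# Hypothetical two-row frame and repeat count, used only for illustration.
df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})
n = 3

# [df] * n builds a list holding n references to the same frame;
# concat stacks them, and ignore_index=True renumbers the rows from 0.
repeated = pd.concat([df] * n, ignore_index=True)
print(repeated)
```

The whole frame is repeated as a block here (rows 0, 1, 0, 1, ...), not row by row.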

【Discussion】:

【Solution 2】:

I think it's cleaner/faster to use iloc nowadays:

In [11]: np.full(3, 0)
Out[11]: array([0, 0, 0])

In [12]: x.iloc[np.full(3, 0)]
Out[12]:
   a  b
0  1  2
0  1  2
0  1  2

More generally, you can use tile or repeat together with arange:

In [21]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

In [22]: df
Out[22]:
   A  B
0  1  2
1  3  4

In [23]: np.tile(np.arange(len(df)), 3)
Out[23]: array([0, 1, 0, 1, 0, 1])

In [24]: np.repeat(np.arange(len(df)), 3)
Out[24]: array([0, 0, 0, 1, 1, 1])

In [25]: df.iloc[np.tile(np.arange(len(df)), 3)]
Out[25]:
   A  B
0  1  2
1  3  4
0  1  2
1  3  4
0  1  2
1  3  4

In [26]: df.iloc[np.repeat(np.arange(len(df)), 3)]
Out[26]:
   A  B
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4

Note: this also works for DataFrames (and Series) with a non-integer index.
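
To illustrate the note about non-integer indexes, here is a small sketch. The string labels `"x"` and `"y"` are made up for the example; because `iloc` selects by position, the labels play no part in the selection and are simply carried into the result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string labels, used only for illustration.
df = pd.DataFrame({"A": [1, 3], "B": [2, 4]}, index=["x", "y"])

# Positions 0, 1, 0, 1: the whole frame repeated twice.
tiled = df.iloc[np.tile(np.arange(len(df)), 2)]
print(tiled)
```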

【Discussion】:

Why is this cleaner than the other solutions?
This is a better solution.

【Solution 3】:

Try using numpy.repeat:

>>> import numpy as np
>>> df = pd.DataFrame(np.repeat(x.to_numpy(), 5, axis=0), columns=x.columns)
>>> df
   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
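
One caveat, shown as a sketch and not as part of the original answer: `to_numpy()` on a frame with mixed column types returns a single array with one common dtype, so the rebuilt frame may not keep the original column dtypes. The frame `mixed` below is hypothetical, and casting back with `astype` is just one possible way to restore the dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one float column.
mixed = pd.DataFrame({"a": [1], "b": [2.5]})

# to_numpy() upcasts both columns to float64 before repeating.
out = pd.DataFrame(np.repeat(mixed.to_numpy(), 3, axis=0), columns=mixed.columns)
print(out.dtypes)

# Cast each column back to its original dtype.
restored = out.astype(mixed.dtypes.to_dict())
print(restored.dtypes)
```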

【Discussion】:

This is at least 2x faster than pd.concat.
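
Timings depend on the frame size and the pandas version, so a claim like this is best checked locally. A rough sketch using `timeit`: the repeat count of 50 comes from the question, while the helper names and the number of timing runs are arbitrary choices for this example.

```python
import timeit

import numpy as np
import pandas as pd

x = pd.DataFrame({"a": 1, "b": 2}, index=range(1))

def with_concat():
    # Solution 1: stack 50 references to the same frame.
    return pd.concat([x] * 50, ignore_index=True)

def with_repeat():
    # Solution 3: repeat the underlying array and rebuild the frame.
    return pd.DataFrame(np.repeat(x.to_numpy(), 50, axis=0), columns=x.columns)

t_concat = timeit.timeit(with_concat, number=200)
t_repeat = timeit.timeit(with_repeat, number=200)
print(f"concat: {t_concat:.4f}s  np.repeat: {t_repeat:.4f}s")
```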

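【Solution 4】:

I would generally not repeat and/or append unless your problem really requires it. It is highly inefficient and typically comes from not understanding the proper way to attack the problem.

I don't know your exact use case, but if your values are stored as an array, then

import numpy as np
import pandas as pd

values = np.array([1, 2])  # one value per column
df2 = pd.DataFrame(index=np.arange(0, 50), columns=['a', 'b'])
df2[['a', 'b']] = values  # each value is broadcast down its column

will do the job. Perhaps you could explain better what you are trying to achieve?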

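【Discussion】:

I have a DataFrame that is missing one row per identifier. I want to insert that row, so what I do is repeat the row N times, append it to the original DataFrame, and then work with the result.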
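
The use case described in this comment could be handled roughly as follows. This is only a sketch: the column names `id` and `value`, the sample data, and the template row are all hypothetical, since the comment gives no details.

```python
import pandas as pd

# Hypothetical data: each identifier is missing one row.
df = pd.DataFrame({"id": ["a", "a", "b"], "value": [1, 2, 3]})
# Hypothetical row that should be added once per identifier.
template = pd.DataFrame({"value": [0]})

ids = df["id"].unique()

# Repeat the template once per identifier, then fill in the identifiers.
extra = pd.concat([template] * len(ids), ignore_index=True)
extra["id"] = ids

# Append the new rows and group them with their identifiers.
result = (
    pd.concat([df, extra], ignore_index=True)
    .sort_values("id", kind="stable")
    .reset_index(drop=True)
)
print(result)
```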

【Solution 5】:

append should also work (on pandas versions before 2.0, where DataFrame.append still exists):

In [589]: x = pd.DataFrame({'a':1,'b':2},index = range(1))

In [590]: x
Out[590]: 
   a  b
0  1  2

In [591]: x.append([x]*5, ignore_index=True) #Ignores the index as per your need
Out[591]: 
   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
5  1  2

In [592]: x.append([x]*5)
Out[592]: 
   a  b
0  1  2
0  1  2
0  1  2
0  1  2
0  1  2
0  1  2
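
On pandas 2.0 and later, where `DataFrame.append` is no longer available, the same result can be obtained with `pd.concat`. A sketch of the equivalent call; note that appending five copies to `x` yields six rows in total, as in the output above.

```python
import pandas as pd

x = pd.DataFrame({"a": 1, "b": 2}, index=range(1))

# Equivalent of x.append([x] * 5, ignore_index=True): x plus five copies.
out = pd.concat([x] + [x] * 5, ignore_index=True)
print(out)
```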

【Discussion】:

【Solution 6】:

In my opinion, apply with a lambda is a general-purpose approach (with axis=0 the lambda receives each column in turn):

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

df.apply(lambda col: col.repeat(2), axis=0) #.reset_index()

Out[1]: 
    A   B
0   1   2
0   1   2
1   3   4
1   3   4

【Discussion】:

【Solution 7】:

Without numpy, we can also use Index.repeat + loc (or reindex):

out = x.loc[x.index.repeat(5)].reset_index(drop=True)

or

out = x.reindex(x.index.repeat(5)).reset_index(drop=True)

Output:

   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
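
`Index.repeat` also accepts one count per label, which makes it possible to repeat each row a different number of times. A small sketch with a hypothetical two-row frame `df` and arbitrarily chosen counts:

```python
import pandas as pd

# Hypothetical frame and per-row repeat counts, used only for illustration.
df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})
counts = [1, 3]  # keep row 0 once, repeat row 1 three times

out = df.loc[df.index.repeat(counts)].reset_index(drop=True)
print(out)
```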

【Discussion】: