How to keep column titles when inserting a sum row in a Pandas dataframe (answers)

[Question title]: How to keep column titles when inserting a sum row in Pandas dataframe
[Posted]: 2018-03-21 13:04:29
[Question]:

I have a dataframe:

       Name    y1    y2   y3                  
 1     Ben     01    02   03
 2     Jane    04    05   06
 3     Sarah   07    07   06

I'm trying to add a row to my dataframe that gives the total of each column. My code is:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.insert(df.values, 0, values=[df.sum(axis=0)], axis=0))
df.iat[0, 0] = 'total'  # was df.set_value(0, 0, 'total'); set_value was removed in pandas 1.0
df.head()

This works, but it also drops my column names, like this:

       0       1     2    3                     
 0     Total   12    14   15
 1     Ben     01    02   03
 2     Jane    04    05   06
 3     Sarah   07    07   06

instead of returning what I want:

       Name    y1    y2   y3                      
 0     Total   12    14   15
 1     Ben     01    02   03
 2     Jane    04    05   06
 3     Sarah   07    07   06

I tried inserting

Index(['Name'], name=df.index.name)

into

df = pd.DataFrame(np.insert(df.values, 0, values=[df.sum(axis=0)], Index(['Name'], name=df.index.name) axis=0))

but this just returns the error

TypeError: unhashable type: 'Index'

Where am I going wrong?
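The column names disappear because np.insert operates on df.values, a bare NumPy array with no labels, so pd.DataFrame(ndarray) falls back to default integer column names. A minimal sketch of the same np.insert approach that simply passes columns=df.columns back to the constructor. It assumes the sample data from the question with plain integer values and that Name is the first column; the infer_objects() call is an addition that converts the object columns produced by the mixed-type array back to numeric dtypes.

```python
import numpy as np
import pandas as pd

# Sample data from the question (values assumed to be plain integers)
df = pd.DataFrame(
    {"Name": ["Ben", "Jane", "Sarah"],
     "y1": [1, 4, 7], "y2": [2, 5, 7], "y3": [3, 6, 6]},
    index=[1, 2, 3],
)

# Total row: the label first, then the sums of every column after Name
totals = ["Total"] + df.iloc[:, 1:].sum().tolist()

# df.values is an unlabeled object array; re-attach the titles via columns=
out = pd.DataFrame(np.insert(df.values, 0, totals, axis=0), columns=df.columns)
# The mixed-type array made every column object dtype; restore numeric dtypes
out = out.infer_objects()
print(out)
```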

[Discussion]:

stackoverflow.com/questions/24284342/…
Every OP's dream: lots of perfect answers :)

Tags: python pandas dataframe indexing

[Solution 1]:

IIUC, you can do it with select_dtypes, assign and pd.concat:

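import numpy as np
import pandas as pd

# concat aligns rows on column names; [df.columns] keeps Name as the first column
pd.concat([df.select_dtypes(include=np.number)
             .sum()
             .to_frame()
             .T
             .assign(Name='Total'), df])[df.columns]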

Output:

    Name  y1  y2  y3
0  Total  12  14  15
1    Ben   1   2   3
2   Jane   4   5   6
3  Sarah   7   7   6

[Solution 2]:

One way to avoid this is to add the new row with .loc and then move it to the top:

df.loc[len(df)+1] = ['Total'] + df.iloc[:, 1:].sum(axis=0).tolist()

df = df.loc[[df.index[-1]] + df.index[:-1].tolist(), :]

#     Name  y1  y2  y3
# 4  Total  12  14  15
# 1    Ben   1   2   3
# 2   Jane   4   5   6
# 3  Sarah   7   7   6

If it matters to you, you can call df.reset_index(drop=True) afterwards to renumber the index from 0.
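A runnable sketch of Solution 2 with the reset_index(drop=True) step chained on, assuming the sample data from the question with plain integer values and Name as the first column:

```python
import pandas as pd

# Sample data from the question (values assumed to be plain integers)
df = pd.DataFrame(
    {"Name": ["Ben", "Jane", "Sarah"],
     "y1": [1, 4, 7], "y2": [2, 5, 7], "y3": [3, 6, 6]},
    index=[1, 2, 3],
)

# Append the total row under a new label (len(df) + 1 == 4 here)
df.loc[len(df) + 1] = ["Total"] + df.iloc[:, 1:].sum(axis=0).tolist()

# Move the last row to the top, then renumber the index 0..n-1
df = df.loc[[df.index[-1]] + df.index[:-1].tolist(), :].reset_index(drop=True)
print(df)
```

Because .loc assigns by column label, the existing titles are never touched; only the row order and the index change.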

[Solution 3]:

You can use pandas.concat to stack the two dataframes:

import pandas as pd
df = ...

df_total = pd.DataFrame(df.iloc[:, 1:].sum(), columns=["Total"]).T.reset_index()
df_total.columns = df.columns
df = pd.concat([df_total, df])
#     Name  y1  y2  y3
# 0  Total  12  14  15
# 1    Ben   1   2   3
# 2   Jane   4   5   6
# 3  Sarah   7   7   6

[Solution 4]:

You can try:

s=df.sum()    
s.loc['Name']='Total'
df.loc[0]=s    
df.sort_index()
Out[457]: 
    Name  y1  y2  y3
0  Total  12  14  15
1    Ben   1   2   3
2   Jane   4   5   6
3  Sarah   7   7   6
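Two details of Solution 4 are easy to miss: a plain df.sum() also aggregates the Name column (the result is only overwritten afterwards), and sort_index() returns a new frame rather than sorting in place, so the sorted result has to be kept. A sketch under the same sample-data assumption that uses numeric_only=True and assigns the sorted frame:

```python
import pandas as pd

# Sample data from the question (values assumed to be plain integers)
df = pd.DataFrame(
    {"Name": ["Ben", "Jane", "Sarah"],
     "y1": [1, 4, 7], "y2": [2, 5, 7], "y3": [3, 6, 6]},
    index=[1, 2, 3],
)

# Sum only the numeric columns so Name is not aggregated
s = df.sum(numeric_only=True)
s.loc["Name"] = "Total"

# Assigning a Series by row label aligns it to df's columns by name
df.loc[0] = s
# sort_index() returns a new frame, so keep the result
df = df.sort_index()
print(df)
```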

[Solution 5]:

A solution with np.insert should be very fast, but you first need to move the non-numeric column into the index:

import numpy as np
import pandas as pd

# create index from `Name` column
df = df.set_index('Name')

# add first value to index
idx = np.insert(df.index, 0, 'Total')
# add columns and index parameters to DataFrame constructor and last reset index
df = pd.DataFrame(np.insert(df.values, 0, df.sum(), axis=0),
                  columns=df.columns,
                  index=idx).reset_index()
print(df)
    Name  y1  y2  y3
0  Total  12  14  15
1    Ben   1   2   3
2   Jane   4   5   6
3  Sarah   7   7   6

[Discussion]:

This also works for larger datasets. I also replaced .sum with .mean, which might be useful to others who want to do something similar:
idx = np.insert(df.index, 0, 'Mean')
df = pd.DataFrame(np.insert(df.values, 0, df.mean(), axis=0),
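The mean variant has one catch worth checking: np.insert casts the inserted values to the dtype of the existing array, so with integer data like the sample here the means are truncated (14/3 would be stored as 4). A minimal sketch, assuming the sample data from the question, that converts the values to float before inserting. It also swaps np.insert on the index for pandas' Index.insert, which returns a new Index with the same name, so reset_index() restores the Name column.

```python
import numpy as np
import pandas as pd

# Sample data from the question (values assumed to be plain integers)
df = pd.DataFrame(
    {"Name": ["Ben", "Jane", "Sarah"],
     "y1": [1, 4, 7], "y2": [2, 5, 7], "y3": [3, 6, 6]}
)

df = df.set_index("Name")
# Index.insert prepends "Mean" and keeps the index name "Name"
idx = df.index.insert(0, "Mean")
# np.insert casts inserted values to the array's dtype, so convert the
# integer data to float first; otherwise the means would be truncated
values = np.insert(df.values.astype(float), 0, df.mean(), axis=0)
df = pd.DataFrame(values, columns=df.columns, index=idx).reset_index()
print(df)
```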