How to delete a duplicate column read from Excel in pandas: answers

【Question title】: How to delete a duplicate column read from Excel in pandas
【Posted】: 2015-08-12 06:38:12
【Question description】:

Data in Excel:

a   b   a   d
1   2   3   4
2   3   4   5
3   4   5   6
4   5   6   7

Code:

import pandas as pd

df = pd.read_excel(r"sample.xlsx", sheet_name="Sheet1")
df
   a  b  a.1  d
0  1  2    3  4
1  2  3    4  5
2  3  4    5  6
3  4  5    6  7

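How do I delete column a.1?

When pandas reads the data from Excel, it automatically renames the second column named a to a.1.

I tried df.drop("a.1",index=1), but that does not work.

I have a huge Excel file with duplicate column names, and I am only interested in a few of the columns.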
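
To reproduce the renaming without a spreadsheet file, here is a minimal sketch. It assumes that reading the same table from CSV text with `pd.read_csv` is an acceptable stand-in for `read_excel`; the `csv_text` variable is made up for illustration and is not part of the original question.

```python
import io

import pandas as pd

# Same table as in the question, with the header "a" appearing twice.
# csv_text is a hypothetical stand-in for sample.xlsx.
csv_text = "a,b,a,d\n1,2,3,4\n2,3,4,5\n3,4,5,6\n4,5,6,7\n"

df = pd.read_csv(io.StringIO(csv_text))

# The second "a" is renamed so that every column label is unique.
print(df.columns.tolist())  # ['a', 'b', 'a.1', 'd']
```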

【Comments】:

【Solution 1】:

You need to pass axis=1 for drop to work:

In [100]:
df.drop('a.1', axis=1)

Out[100]:
   a  b  d
0  1  2  4
1  2  3  5
2  3  4  6
3  4  5  7
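
As a rough illustration of why the axis matters, the sketch below rebuilds the frame by hand (so no Excel file is needed) and compares the calls. With the default `axis=0`, `drop` looks for the label in the row index, which is why it cannot find `'a.1'`. The `columns=` keyword is an equivalent spelling in newer pandas versions. The variable names are only for illustration.

```python
import pandas as pd

# Hand-built copy of the frame from the question.
df = pd.DataFrame(
    [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]],
    columns=["a", "b", "a.1", "d"],
)

# Default axis=0: 'a.1' is searched for among the row labels 0..3 and not found.
try:
    df.drop("a.1")
    label_missing_from_rows = False
except KeyError:
    label_missing_from_rows = True

# Two equivalent ways of dropping a column.
by_axis = df.drop("a.1", axis=1)
by_keyword = df.drop(columns="a.1")

# drop returns a new DataFrame; df itself still has all four columns
# unless the result is assigned back.
print(by_axis.columns.tolist())  # ['a', 'b', 'd']
print(df.columns.tolist())       # ['a', 'b', 'a.1', 'd']
```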

Or simply pass a list of the columns you are interested in to select them:

In [102]:
cols = ['a','b','d']
df[cols]

Out[102]:
   a  b  d
0  1  2  4
1  2  3  5
2  3  4  6
3  4  5  7

This also works with "fancy indexing" (.loc here, since .ix has been removed from pandas):

In [103]:
df.loc[:,cols]

Out[103]:
   a  b  d
0  1  2  4
1  2  3  5
2  3  4  6
3  4  5  7

【Discussion】:

【Solution 2】:

If you know the name of the column you want to drop:

df = df[[col for col in df.columns if col != 'a.1']]

If you have several columns to drop:

columns_to_drop = ['a.1', 'b.1', ... ]
df = df[[col for col in df.columns if col not in columns_to_drop]]
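
For a large file where listing every renamed column by hand is impractical, one possible approach is to detect them by pattern. The sketch below is only an assumption-laden illustration: the helper `drop_mangled_duplicates` is hypothetical (not a pandas function), and it assumes that any column named `<name>.<number>` whose `<name>` is also a column was produced by the automatic renaming. A genuine column that happens to follow that pattern would be dropped too.

```python
import re

import pandas as pd


def drop_mangled_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop columns that look like renamed duplicates.

    A column is dropped when its name is '<base>.<digits>' and '<base>'
    is itself a column of the frame.
    """
    names = {str(col) for col in df.columns}
    pattern = re.compile(r"^(.*)\.(\d+)$")
    to_drop = []
    for col in df.columns:
        match = pattern.match(str(col))
        if match and match.group(1) in names:
            to_drop.append(col)
    return df.drop(columns=to_drop)


# Hand-built copy of the frame from the question.
df = pd.DataFrame(
    [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]],
    columns=["a", "b", "a.1", "d"],
)

cleaned = drop_mangled_duplicates(df)
print(cleaned.columns.tolist())  # ['a', 'b', 'd']
```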

【Discussion】: