Accessing a column in pandas in different ways

【Question title】: Accessing a column in pandas in different ways
【Posted】: 2016-06-16 07:21:25
【Question】:

I have a dataset that looks like this:

   Id  Economics  English  History  Literature
0  56          1        1        2           1
1  11          1        0        0           1
2   6          0        1        1           0
3  43          2        0        1           1
4  14          0        1        1           0

I built this dataset by reading several CSV files, and I could easily access the columns with, for example, df['Economics']. I then saved it to a file:

df.to_csv(file_path, sep='\t')

But when I reopen the dataset in another function, for a different purpose, and try to access a column in the same way, i.e.

df=pd.read_csv(file_path, sep='\t')
print df['Economics']

I get

KeyError: 'Economics'

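I tried several encodings when reading the file and checked whether the result is a multi-index DataFrame, but the encoding and the index are both fine. I found that there is another way to do it, df.get('Economics'), which works fine in this case. However, if I iterate over the column names looking for 'Economics', I get a KeyError again.

So my question is: why does this happen? Why can I sometimes access a column directly with df['column_name'], while at other times I need to use df.get('column_name')? And how should I deal with the column names if the first approach does not work?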

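【Comments】:

Can you provide a self-contained example that demonstrates the problem?
@BrenBarn, what do you mean by a self-contained example? I have updated the question, in case that helps to understand the problem.
I mean a piece of code together with sample data, so that other people can actually run the code and reproduce your error. It is hard for anyone to help you from a description of the problem alone; you need an actual example that others can use to replicate it.
@Amanda, please run the following script and update your question with its output: "... print("before: %s" % df.columns); df.to_csv(...) ; df=pd.read_csv(...); print("after: %s" % df.columns); "
Can you post the output of df.columns.tolist()?

Tags: python pandas dataframe keyerror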

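【Answer 1】:

My guess is that you either have trailing spaces in some or all of the column names, or you even have only a single column, as in the following test example:

Test data: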

Id  Economics     English   History   Literature  
56  1   1   2   1
11  1   0   0   1
6   1   1   0   0
43  2   0   1   1
14  1   1   1   0

Test code:

import pandas as pd

df = pd.read_csv('test.csv', sep='\t')
print(df)
print(df.columns.tolist())

Output:

  Id  Economics     English   History   Literature
0                                  56  1   1   2   1
1                                  11  1   0   0   1
2                                  6   1   1   0   0
3                                  43  2   0   1   1
4                                  14  1   1   1   0
['Id  Economics     English   History   Literature  ']

The DataFrame has only one column: 'Id  Economics     English   History   Literature  '

Let's change sep='\t' to sep='\s+' in pd.read_csv() and run the same test code against the same dataset:

   Id  Economics  English  History  Literature
0  56          1        1        2           1
1  11          1        0        0           1
2   6          1        1        0           0
3  43          2        0        1           1
4  14          1        1        1           0
['Id', 'Economics', 'English', 'History', 'Literature']
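
The comparison above can also be reproduced without a file on disk. The following is only a minimal sketch: it assumes the data is space-aligned with no tab characters, and it feeds a shortened copy of the test data through io.StringIO. The names raw, df_tab and df_ws are illustrative and do not come from the original post.

```python
import io
import pandas as pd

# Space-aligned text containing no tab characters (shortened copy of the test data)
raw = (
    "Id  Economics     English   History   Literature  \n"
    "56  1   1   2   1\n"
    "11  1   0   0   1\n"
)

# With sep='\t' no delimiter is ever found, so every line becomes a single field
df_tab = pd.read_csv(io.StringIO(raw), sep="\t")
print(df_tab.columns.tolist())

# With a whitespace regex the header is split into separate column names
df_ws = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df_ws.columns.tolist())
```

If the first list contains a single long string, df['Economics'] cannot succeed, because no column carries exactly that name.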

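【Comments】:

No, it is not a separator problem, because when I print df.columns I get a valid list of all the columns.
@Amanda, a "KeyError" clearly says that you are trying to access a column that does not exist. So I don't think I can help you until you post the output of "df.columns.tolist()", taken right after your last "pd.read_csv()" call. Anyway, good luck!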

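【Answer 2】:

There seem to be some unwanted characters in the column name. Perhaps it is something like 'Economics ' (with a stray trailing character).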

In that case df.get('Economics') does not raise a KeyError; it simply returns None.
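
The difference between the two access styles can be sketched as follows. The frame here is hypothetical: it assumes a column name with a trailing space, which has not been confirmed for the asker's data.

```python
import pandas as pd

# Hypothetical frame whose second column name carries a stray trailing space
df = pd.DataFrame({"Id": [56, 11], "Economics ": [1, 1]})

# .get() returns its default (None) for a missing key, so no exception is raised
result = df.get("Economics")
print(result is None)

# Bracket access with the same missing key raises KeyError
try:
    df["Economics"]
    raised = False
except KeyError:
    raised = True
print(raised)
```

So a df.get() call that "works" may only be hiding the missing column behind a None result.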

Try inspecting the output of df.columns and checking the length of the column name with len(df.columns[1]).
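
One way to carry out that check is sketched below, again on a hypothetical frame with a trailing space in one name. Printing repr() of each name makes surrounding whitespace visible, and str.strip() on the column index removes leading and trailing whitespace; other invisible characters would need separate handling.

```python
import pandas as pd

# Hypothetical frame; the trailing space in "Economics " is an assumption
df = pd.DataFrame({"Id": [56, 11], "Economics ": [1, 1]})

# repr() quotes each name, so stray whitespace shows up next to its length
for name in df.columns:
    print(repr(name), len(name))

# Strip leading and trailing whitespace from every column name
df.columns = df.columns.str.strip()
print(df["Economics"].tolist())
```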

【Comments】:

There are no extra trailing characters; I checked.