Error in Jupyter Notebooks but not in Mac Terminal or Visual Studio Code

【Question title】: Error in Jupyter Notebooks but not in Mac Terminal or Visual Studio Code
【Posted】: 2020-06-10 13:40:36
【Question】:

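I have the following code, which is supposed to: take the census data, clean it (keep only the county rows, i.e. those with SUMLEV == 50, and only the columns needed), set the state column as the index, sort by state and by county population, keep only the three most populous counties in each state, sum the populations of those three counties, and return the three states with the largest totals.

The code runs fine in the Mac Terminal and in VSC, but raises an error in Coursera's Jupyter Notebooks. I tried restarting the kernel, with the same result. Any idea why?

Thanks.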

import pandas as pd

census_df = pd.read_csv('census.csv')
census_df.head()

def answer_six():
    census = census_df[census_df['SUMLEV']==50] 
    colstokeep = ['STNAME', 'CTYNAME', 'CENSUS2010POP']
    census = census[colstokeep]
    census = census.set_index(['STNAME'])
    census = census.sort_values(['STNAME', 'CENSUS2010POP'], ascending= (True, False))
    census = census.groupby(level=0).head(3)
    final = census.groupby(['STNAME']).sum()
    final = final.sort_values(['CENSUS2010POP'], ascending=False)

    final_indexes = final.index.values.tolist()
    answ = final_indexes[:3]
    return answ

answer_six()

The error I get in JN:

KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'STNAME'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-12-5fdb76484a21> in <module>()
     14     return answ
     15 
---> 16 answer_six()

<ipython-input-12-5fdb76484a21> in answer_six()
      5     census = census[colstokeep]
      6     census = census.set_index(['STNAME'])
----> 7     census = census.sort_values(['STNAME', 'CENSUS2010POP'], ascending= (True, False))
      8     census = census.groupby(level=0).head(3)
      9     final = census.groupby(['STNAME']).sum()

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position)
   3216             keys = []
   3217             for x in by:
-> 3218                 k = self.xs(x, axis=other_axis).values
   3219                 if k.ndim == 2:
   3220                     raise ValueError('Cannot sort by duplicate column %s' %

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   1768 
   1769         if axis == 1:
-> 1770             return self[key]
   1771 
   1772         self._consolidate_inplace()

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/conda/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'STNAME'

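【Comments on the question】:

What do you mean by "Coursera's Jupyter Notebooks"?
The course assignments on Coursera run in Jupyter Notebooks.
Do you have Jupyter installed locally? Are you running it remotely somehow?
No download needed; it is used online. All the other functions in the assignment run as expected.
It is basically telling you that there is no column named "STNAME". To debug, can you try printing census.columns?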
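
The check suggested in the last comment can be sketched as below. This is only an illustration: the three-row `demo` frame is made up and stands in for census.csv, and the names `demo`, `indexed`, `by` and `missing` are hypothetical. Printing the pandas version in both environments is also worth doing. The thread does not confirm it, but the traceback paths (pandas/indexes/base.py) point to an old pandas release on Coursera, and as far as I know newer releases (0.23 and later) accept index level names in sort_values, which would explain why the same code runs locally.

```python
import pandas as pd

# Compare this value between the local setup and the hosted notebook.
print(pd.__version__)

# Made-up stand-in for the census data (not real figures).
demo = pd.DataFrame({
    'STNAME': ['A', 'A', 'B'],
    'CENSUS2010POP': [10, 20, 30],
})

indexed = demo.set_index('STNAME')

# After set_index, STNAME is the index name and no longer a column.
print(list(indexed.columns))   # ['CENSUS2010POP']
print(indexed.index.name)      # STNAME

# Names passed to sort_values that are not columns any more.
by = ['STNAME', 'CENSUS2010POP']
missing = [name for name in by if name not in indexed.columns]
print(missing)                 # ['STNAME']
```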

Tags: python pandas dataframe jupyter-notebook

【Solution 1】:

Here is the problem with your code:

colstokeep = ['STNAME', 'CTYNAME', 'CENSUS2010POP'] 
census = census[colstokeep]                              # Keep only some columns
census = census.set_index(['STNAME'])                    # turn STNAME into an index
                                                         # at this point, it's an 
                                                         # index and no longer a column

census = census.sort_values(['STNAME', 'CENSUS2010POP'], # now try to sort on a column that
                             ascending= (True, False))   # no longer exists - and 
                                                         # you get an error

To fix it, swap the two lines:

# first sort 
census = census.sort_values(['STNAME', 'CENSUS2010POP'], ascending= (True, False))
# then set the index
census = census.set_index(['STNAME'])
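
For reference, here is a minimal sketch of the whole function with the two lines swapped and every later groupby done on the index level rather than on the column name, so nothing refers to STNAME as a column after it has become the index. The function name `top_three_states` and the sample rows are made up for illustration and are not taken from census.csv.

```python
import pandas as pd

def top_three_states(census_df):
    # Keep county rows only (SUMLEV == 50) and the columns we need.
    counties = census_df[census_df['SUMLEV'] == 50]
    counties = counties[['STNAME', 'CTYNAME', 'CENSUS2010POP']]
    # Sort first, while STNAME is still a column ...
    counties = counties.sort_values(['STNAME', 'CENSUS2010POP'],
                                    ascending=[True, False])
    # ... then move it into the index.
    counties = counties.set_index('STNAME')
    # Three most populous counties per state, grouped by the index level.
    top3 = counties.groupby(level=0).head(3)
    # Sum those counties per state, again by index level.
    totals = top3.groupby(level=0)['CENSUS2010POP'].sum()
    return totals.sort_values(ascending=False).index.tolist()[:3]

# Made-up sample data: SUMLEV 40 rows are state totals and must be ignored.
sample = pd.DataFrame({
    'SUMLEV': [40, 50, 50, 50, 50, 40, 50, 50, 40, 50, 50, 50, 50, 40, 50, 50],
    'STNAME': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B',
               'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D'],
    'CTYNAME': ['A', 'a1', 'a2', 'a3', 'a4', 'B', 'b1', 'b2',
                'C', 'c1', 'c2', 'c3', 'c4', 'D', 'd1', 'd2'],
    'CENSUS2010POP': [100, 10, 20, 30, 40, 105, 100, 5,
                      151, 50, 50, 50, 1, 3, 1, 2],
})

result = top_three_states(sample)
print(result)  # ['C', 'B', 'A']
```

Selecting the CENSUS2010POP column before calling sum() keeps the county names out of the aggregation, which is a choice made for this sketch rather than something the answer prescribes.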

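【Comments】:

I did that; it runs fine in VSC, but I get the same error in JN. I think it is because the code still has "final = census.groupby(['STNAME']).sum()", which groups by a column that no longer exists. But if I move "census = census.set_index(['STNAME'])" to the end of the code, everything breaks.
I swapped the lines as you said, then changed "['STNAME']" to "level=0" in "final = census.groupby(['STNAME']).sum()", and it worked. Thank you so much for your help; I was going crazy trying to figure out where to look for the problem.
If this answered your question, it would be great if you could accept my answer :)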