【Question title】: Error in Jupyter Notebooks but not in Mac Terminal or Visual Studio Code
【Posted】: 2020-06-10 13:40:36
【Question】:

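I have the following code, which is supposed to do this: take the census data, clean it (keep only the county rows, i.e. those with SUMLEV == 50, and only the columns needed), set the state column as the index, sort the counties within each state by population, keep only the 3 most populous counties per state, sum the population of those 3 counties, and return the 3 states with the highest totals.

The code runs fine in the Mac Terminal and in VSC, but raises an error in Coursera's Jupyter Notebooks. I tried restarting the kernel and got the same result. Any idea why?

Thanks.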

import pandas as pd

census_df = pd.read_csv('census.csv')
census_df.head()

def answer_six():
    census = census_df[census_df['SUMLEV']==50] 
    colstokeep = ['STNAME', 'CTYNAME', 'CENSUS2010POP']
    census = census[colstokeep]
    census = census.set_index(['STNAME'])
    census = census.sort_values(['STNAME', 'CENSUS2010POP'], ascending= (True, False))
    census = census.groupby(level=0).head(3)
    final = census.groupby(['STNAME']).sum()
    final = final.sort_values(['CENSUS2010POP'], ascending=False)

    final_indexes = final.index.values.tolist()
    answ = final_indexes[:3]
    return answ

answer_six()

The error I get in Jupyter Notebooks (JN):
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'STNAME'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-12-5fdb76484a21> in <module>()
     14     return answ
     15 
---> 16 answer_six()

<ipython-input-12-5fdb76484a21> in answer_six()
      5     census = census[colstokeep]
      6     census = census.set_index(['STNAME'])
----> 7     census = census.sort_values(['STNAME', 'CENSUS2010POP'], ascending= (True, False))
      8     census = census.groupby(level=0).head(3)
      9     final = census.groupby(['STNAME']).sum()

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position)
   3216             keys = []
   3217             for x in by:
-> 3218                 k = self.xs(x, axis=other_axis).values
   3219                 if k.ndim == 2:
   3220                     raise ValueError('Cannot sort by duplicate column %s' %

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   1768 
   1769         if axis == 1:
-> 1770             return self[key]
   1771 
   1772         self._consolidate_inplace()

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/conda/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'STNAME'

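【Comments】:

  • What do you mean by "Coursera's Jupyter Notebooks"?
  • The assignments for the Coursera course run in Jupyter Notebooks.
  • Do you have Jupyter installed locally? Or are you somehow running it remotely?
  • There is nothing to download; it is used online. All the other functions in the assignment run as expected.
  • It is basically telling you that there is no column named "STNAME". To debug, can you try printing census.columns?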
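
The check suggested in the last comment can be sketched as follows. The small DataFrame is made-up stand-in data, not the real census.csv; the point is only to show where 'STNAME' lives before and after `set_index`.

```python
import pandas as pd

# Tiny stand-in for census.csv; the values are invented for illustration.
census = pd.DataFrame({
    "STNAME": ["A", "A", "B"],
    "CTYNAME": ["A1", "A2", "B1"],
    "CENSUS2010POP": [10, 20, 30],
})

print(pd.__version__)             # worth comparing between the two environments
print(list(census.columns))       # STNAME is still a column here

indexed = census.set_index(["STNAME"])
print(list(indexed.columns))      # STNAME is no longer among the columns
print(list(indexed.index.names))  # it is now the name of the index
```

If 'STNAME' shows up only under `index.names`, any call that looks it up as a column can raise the KeyError shown in the traceback.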

Tags: python pandas dataframe jupyter-notebook


【Answer 1】:

Here is the problem with your code:

colstokeep = ['STNAME', 'CTYNAME', 'CENSUS2010POP'] 
census = census[colstokeep]                              # Keep only some columns
census = census.set_index(['STNAME'])                    # turn STNAME into an index
                                                         # at this point, it's an 
                                                         # index and no longer a column

census = census.sort_values(['STNAME', 'CENSUS2010POP'], # now try to sort on a column that
                             ascending= (True, False))   # no longer exists - and 
                                                         # you get an error

To fix it, swap the two lines:

# first sort 
census = census.sort_values(['STNAME', 'CENSUS2010POP'], ascending= (True, False))
# then set the index
census = census.set_index(['STNAME'])
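
For context, here is a minimal sketch of the whole function with the two lines swapped. It is not the asker's final code: the function name `top_three_states` and the sample data are hypothetical, and the grouping steps use `level=0` so that nothing refers to 'STNAME' as a column once it has become the index.

```python
import pandas as pd

def top_three_states(census_df):
    """Return the three states with the largest combined population
    across their three most populous counties."""
    census = census_df[census_df["SUMLEV"] == 50]            # county rows only
    census = census[["STNAME", "CTYNAME", "CENSUS2010POP"]]
    # Sort while STNAME is still a column ...
    census = census.sort_values(["STNAME", "CENSUS2010POP"],
                                ascending=[True, False])
    # ... and only then move it into the index.
    census = census.set_index("STNAME")
    top3 = census.groupby(level=0).head(3)                   # 3 largest counties per state
    totals = top3.groupby(level=0)["CENSUS2010POP"].sum()
    return totals.sort_values(ascending=False).index.tolist()[:3]

# Invented sample data: one state-level row (SUMLEV 40) plus county rows.
sample = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50],
    "STNAME": ["A", "A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "D"],
    "CTYNAME": ["A", "A1", "A2", "A3", "A4", "B1", "B2",
                "C1", "C2", "C3", "C4", "D1"],
    "CENSUS2010POP": [1000, 100, 90, 80, 70, 200, 10, 50, 50, 50, 50, 300],
})

result = top_three_states(sample)
print(result)  # ['D', 'A', 'B']
```

The top-three totals in the sample are D = 300, A = 270, B = 210 and C = 150, so C drops out and the state-level row for A is ignored.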

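【Comments】:

  • I did that; it works fine in VSC but gives the same error in JN. I think it's because the code still has "final = census.groupby(['STNAME']).sum()", which groups by a column that no longer exists. But if I move "census = census.set_index(['STNAME'])" to the end of the code, everything breaks.
  • I swapped the lines as you said, then changed "STNAME" to "level=0" in "final = census.groupby(['STNAME']).sum()", and it worked. Thank you very much for your help; I was going crazy trying to figure out where to look for the problem.
  • If this answered your question, it would be great if you could accept my answer :)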
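
As for why the same code behaved differently in the two environments, the thread does not say, but a likely explanation is a pandas version difference. The traceback paths (`pandas/indexes/base.py`) point to an older pandas release on the Coursera side, whereas more recent releases, as far as I know, let `sort_values` and `groupby` resolve index level names as well as column names. The probe below is a rough sketch on made-up data for checking which behaviour a given environment has.

```python
import pandas as pd

# Made-up data with STNAME already moved into the index.
df = pd.DataFrame({
    "STNAME": ["B", "A", "A"],
    "CENSUS2010POP": [1, 2, 3],
}).set_index("STNAME")

try:
    # Refers to STNAME by name even though it is an index level, not a column.
    df.sort_values(["STNAME", "CENSUS2010POP"], ascending=[True, False])
    accepts_index_names = True
except KeyError:
    accepts_index_names = False

print(pd.__version__, accepts_index_names)
```

Running this in both places should show whether the local install tolerates index level names where the hosted notebook does not.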