【Question title】: Finding the max element of a DataFrame column raises an error
【Posted】: 2018-12-09 00:12:51
【Question】:

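I am trying to find the maximum element of a column in my DataFrame, but it raises the error below. I have tested the other columns and they all work; only this column name fails.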

This is how I create the DataFrame from the file posts1.csv:

import pandas as pd

posts_n = pd.read_csv('posts1.csv',encoding='latin-1')
posts=posts_n.fillna(0)

When I try to find the maximum element of one particular column, 'score',

max_post = posts['score'].max()
max_post

I get the following error:

KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2441             try:
-> 2442                 return self._engine.get_loc(key)
   2443             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'score'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-12-09c353ba0de2> in <module>()
     34 #MAximum posts done by a user
     35 
---> 36 max_post = posts['score'].max()
     37 max_post
     38 #scr=posts.iloc[:,4]

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   1962             return self._getitem_multilevel(key)
   1963         else:
-> 1964             return self._getitem_column(key)
   1965 
   1966     def _getitem_column(self, key):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   1969         # get column
   1970         if self.columns.is_unique:
-> 1971             return self._get_item_cache(key)
   1972 
   1973         # duplicate columns & possible reduce dimensionality

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1643         res = cache.get(item)
   1644         if res is None:
-> 1645             values = self._data.get(item)
   1646             res = self._box_item_values(item, values)
   1647             cache[item] = res

~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3588 
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2442                 return self._engine.get_loc(key)
   2443             except KeyError:
-> 2444                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2445 
   2446         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'score'

This is what the data in posts1.csv looks like:

【Comments】:

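  • Please add posts.head() to your post.
  • @tobsecret That didn't work.
  • What I mean is: after posts=posts_n.fillna(0), add print(posts.head()) and edit your post with the output, so I can see what your DataFrame actually contains.
  • I found a solution, and this works: posts_n = pd.read_csv('posts1.csv',encoding='latin-1',sep='\s*,\s*')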
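
The fix in the last comment suggests that the header of posts1.csv has whitespace around its commas, so the column was loaded under a padded label rather than as 'score'. Here is a minimal sketch of that situation. The CSV text is a made-up stand-in, since the real file is not shown in the thread. A regex separator needs the Python parsing engine, so it is requested explicitly here.

```python
import io
import pandas as pd

# Hypothetical stand-in for posts1.csv; note the spaces around the commas.
raw = "id , score , title\n1 , 10 , first\n2 , 25 , second\n"

# Default parsing keeps the padding, so the label is ' score ', not 'score'.
padded = pd.read_csv(io.StringIO(raw))
print(padded.columns.tolist())

# The fix from the comment: treat optional whitespace as part of the separator.
# A regex separator is only supported by the python engine.
posts = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")
max_post = posts["score"].max()
print(max_post)

# Alternative: parse normally, then strip whitespace from the column labels.
stripped = pd.read_csv(io.StringIO(raw))
stripped.columns = stripped.columns.str.strip()
print(stripped.columns.tolist())
```

Stripping the labels after loading keeps the faster default parser, but it only cleans the header. The regex separator also removes the padding from the values in each row.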

Tags: pandas dataframe machine-learning max slice


【Solution 1】:

'score' is not in the (column) index, which suggests that the first row of the CSV was read as data instead of being loaded as the header row.

Try the following:

posts = pd.read_csv('posts1.csv', header=1)
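
Before changing how the file is read, it may help to check which labels pandas actually assigned to the columns. The sketch below assumes a header with a stray space before the column name. The CSV text is invented for illustration and is not the asker's file. Printing each label with repr() makes invisible whitespace visible.

```python
import io
import pandas as pd

# Hypothetical CSV whose header has a space after the comma.
raw = "id, score\n1,10\n2,25\n"
posts = pd.read_csv(io.StringIO(raw)).fillna(0)

# repr() shows quotes around each label, exposing leading/trailing spaces.
labels = [repr(c) for c in posts.columns]
print(labels)

# Look up the column by its stripped name instead of the exact label.
matches = [c for c in posts.columns if str(c).strip() == "score"]
max_post = posts[matches[0]].max() if matches else None
print(max_post)
```

If the printed labels differ from 'score' only by whitespace, the header row was read correctly and the labels themselves need cleaning.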

【Comments】:

  • I wrote posts_n = pd.read_csv('posts1.csv',encoding='latin-1',header=1) and still get the same error.