【Question title】: pandas: filter out column values containing empty list
【Posted】: 2017-08-15 08:08:20
【Question description】:

I have the following DataFrame, my_df:

col_A    col_B
---------------
John     []
Mary     ['A','B','C']
Ann      ['B','C']

I want to drop the rows where col_B holds an empty list, so that the new DataFrame is:

col_A    col_B
---------------
Mary     ['A','B','C']
Ann      ['B','C']

Here is what I tried:

my_df[ len(my_df['col_B']) >0 ]

But I get the following error:


KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: True

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-27-75da0b0af6a1> in <module>()
----> 1 records_df_pair_count[ len(records_df_pair_count['stable_seq']) >0 ]

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in get(self, item, fastpath)
   3539 
   3540             if not isnull(item):
-> 3541                 loc = self.items.get_loc(item)
   3542             else:
   3543                 indexer = np.arange(len(self.items))[isnull(self.items)]

/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: True

Any idea what I am doing wrong here? Thanks!

【Comments】:

    Tags: python-3.x pandas filter


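    【Solution 1】:

    You can use the Series.str.len() method, which returns the length of each element rather than the length of the Series: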

    my_df[my_df['col_B'].str.len() > 0]
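
A minimal, self-contained sketch of this approach follows. The DataFrame is rebuilt from the sample in the question, and the names `lengths` and `filtered` are only illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question.
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# .str.len() is applied element-wise, so each list gets its own length.
lengths = my_df["col_B"].str.len()

# Boolean mask: keep only rows whose list has at least one item.
filtered = my_df[lengths > 0]
print(filtered)
```

The mask `lengths > 0` is a boolean Series with one entry per row, which is what DataFrame indexing needs in order to select rows.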
    

    【Comments】:

      【Solution 2】:

      Another approach is to apply len to each element:

      my_df[my_df['col_B'].apply(len) > 0]
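
As a rough illustration, the sketch below runs the `apply` approach on the sample data from the question. The `map(bool)` variant is an addition that is not part of the original answer; it relies on an empty list being falsy in Python.

```python
import pandas as pd

# Rebuild the sample frame from the question.
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# Call the built-in len on every element of the column.
mask_len = my_df["col_B"].apply(len) > 0

# Assumed alternative (not from the answer): an empty list is falsy,
# so bool() on each element yields the same mask.
mask_bool = my_df["col_B"].map(bool)

filtered = my_df[mask_len]
print(filtered)
```

Note that the column label is case-sensitive: the frame has `col_B`, so `my_df['col_b']` would raise a KeyError.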
      

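      【Comments】:

        【Solution 3】:

        You already have a couple of answers that solve the problem, but I thought I would chime in to explain why your attempt does not work.

        This gives a pandas Series: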

        my_df['col_B']
        

        So this gives the length of the Series itself (the number of rows), not the length of each list in it:

        len(my_df['col_B'])
        

        Since your Series is non-empty, this evaluates to the single value True:

        len(my_df['col_B']) >0
        

        And so this:

        my_df[ len(my_df['col_B']) >0 ]
        

        evaluates to:

        my_df[True]
        

        my_df has no column labelled True, so the lookup fails with KeyError: True.
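
The chain of evaluations described in this answer can be reproduced step by step. This is a small sketch on the sample data from the question; the names `n`, `flag`, and `raised` are illustrative only.

```python
import pandas as pd

# Rebuild the sample frame from the question.
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# len() of a Series is its number of rows, a plain int.
n = len(my_df["col_B"])

# Comparing that int with 0 gives one Python bool, not a per-row mask.
flag = n > 0

# Indexing with a scalar is treated as a column lookup, and there is
# no column labelled True, so pandas raises KeyError.
try:
    my_df[flag]
    raised = False
except KeyError:
    raised = True

print(n, flag, raised)
```

The fix in the other answers works because their comparison produces a boolean Series with one value per row instead of a single bool.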

        【Comments】:
