【Question title】: Trying to understand why a compare does not work but a filter does (pandas)
【Posted】: 2018-07-22 17:22:50
【Question description】:

dfclean = dfclean[dfclean['Count'] > 1]

I used this to clear rows out of a DataFrame.

dfsorted = dfbottom.groupby("ST").filter(lambda dfbottom:dfbottom.shape[0] > 1)

I used this to filter out values that occur only once. I arrived at it after poring over Stack Overflow for a while and finding code I could understand.

dfbottom = dfbottom[dfbottom.groupby("ST").count() > 1]

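What I need help understanding, if possible, is why this does not work. In my mind it should do a similar cleanup: look at the "ST" column, count the values, and keep the data wherever the count is > 1. Instead, the DataFrame ends up with all NaN values. If I run just the dfbottom code, I get a table of "True" and "False". That table is correct, but I am clearly missing the right way to use it to create a new DataFrame.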

【Question discussion】:

Tags: python pandas

【Solution 1】:

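The problem is that .count aggregates the DataFrame: the result has one row per group and is indexed by the ST values rather than by the original row labels, so the boolean table no longer lines up with dfbottom. Indexing a DataFrame with a boolean DataFrame works like DataFrame.where, which aligns on labels first; since no labels match, every cell becomes NaN.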
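To see where the NaN values come from, here is a minimal sketch. The `val` column is an assumption added for illustration (the question does not show the real columns of dfbottom); any non-key column would behave the same way.

```python
import pandas as pd

# Hypothetical data: 'val' stands in for the question's other columns.
df = pd.DataFrame({"ST": list("abbbcec"), "val": range(7)})

# One row per group, indexed by the ST values, not by the original rows.
mask = df.groupby("ST").count() > 1
print(mask)

# A boolean DataFrame used as an indexer is applied like df.where(mask).
# The row labels 0..6 never match 'a', 'b', 'c', 'e', so every cell is masked.
result = df[mask]
print(result)
```

The mask itself is correct per group, which matches what the question describes; it just carries the wrong index for selecting rows.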

The solution is to use GroupBy.transform, which returns a Series the same size as the original DataFrame, so it can be used as a filter:

dfbottom = dfbottom[dfbottom.groupby("ST")['ST'].transform('count') > 1]

Sample:

import pandas as pd

dfbottom = pd.DataFrame({'ST':list('abbbcec')})
print (dfbottom)
  ST
0  a
1  b
2  b
3  b
4  c
5  e
6  c

dfbottom = dfbottom[dfbottom.groupby("ST")['ST'].transform('count') > 1]
print (dfbottom)
  ST
1  b
2  b
3  b
4  c
6  c

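Details (each expression below is evaluated on the original 7-row dfbottom, before the filtering step):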

print (dfbottom.groupby("ST")['ST'].transform('count'))
0    1
1    3
2    3
3    3
4    2
5    1
6    2
Name: ST, dtype: int64

print (dfbottom.groupby("ST")['ST'].transform('count') > 1)
0    False
1     True
2     True
3     True
4     True
5    False
6     True
Name: ST, dtype: bool

If you try to filter by the aggregated values instead:

print (dfbottom.groupby("ST")['ST'].count())
ST
a    1
b    3
c    2
e    1
Name: ST, dtype: int64

print (dfbottom.groupby("ST")['ST'].count() > 1)
ST
a    False
b     True
c     True
e    False
Name: ST, dtype: bool

print (dfbottom[dfbottom.groupby("ST")['ST'].count() > 1])

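IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

This cannot work because the boolean mask does not line up with the DataFrame: in this sample the mask has length 4 and is indexed by the group labels (a, b, c, e), while the original DataFrame has 7 rows indexed 0 to 6.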
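If you do want to start from the aggregated counts, one possible workaround (not part of the original answer) is to broadcast them back onto the rows with Series.map, so that the mask shares the original index. A sketch under that assumption, using the sample data from this answer:

```python
import pandas as pd

dfbottom = pd.DataFrame({"ST": list("abbbcec")})

# Aggregated counts: one value per group, indexed by the ST labels.
counts = dfbottom.groupby("ST")["ST"].count()

# Look up each row's ST value in counts, giving a 7-row Series
# that shares the original index 0..6.
mask = dfbottom["ST"].map(counts) > 1

filtered = dfbottom[mask]
print(filtered)
```

This is effectively what transform('count') does in a single step, which is why the transform version is the shorter choice here.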

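【Discussion】:

Thanks, this gives the same result as the filter approach. I am now reading pandas.pydata.org/pandas-docs/stable/groupby.html to help me understand groupby better; I still have a lot to learn. Thanks.
@Kafka - Sure, .filter works here too, but for larger DataFrames the .transform solution is faster, so I prefer it :)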
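
For anyone who wants to check the two approaches against each other, a small sketch on the answer's sample data. It only compares the results; the speed difference mentioned above is not measured here and would need timing on your own data.

```python
import pandas as pd

dfbottom = pd.DataFrame({"ST": list("abbbcec")})

# Approach from the question: keep groups that have more than one row.
via_filter = dfbottom.groupby("ST").filter(lambda g: g.shape[0] > 1)

# Approach from the answer: per-row group size used as a boolean mask.
via_transform = dfbottom[dfbottom.groupby("ST")["ST"].transform("count") > 1]

# Compare values, index, and dtypes of the two results.
print(via_filter.equals(via_transform))
```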