Showing columns in pandas: answers

【Question title】: Showing columns in pandas
【Posted】: 2017-01-14 17:09:50
【Question】:

I have a term x document matrix in pandas (built from a CSV) in the following format:

cheese, milk, bread, butter
0,2,1,0
1,1,0,0
1,1,1,1
0,1,0,1

So suppose I say 'give me the columns at index 1 and 2 where, for a given row, both values are > 0'.

I want to end up with this:

cheese, milk,
[omitted]
1,1
1,1
[omitted]

That way I can take number of rows / number of documents and get a frequent itemset, i.e. (cheese, milk) --[2/4 support]

I have already tried this approach, as shown in a separate Stack Overflow thread:

fil_df.select([fil_df.columns[1] > 0 and fil_df.columns[2] > 0], [fil_df.columns[1], fil_df.columns[2]])

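But unfortunately it does not work for me. I get the error:

TypeError: unorderable types: str() > int()

I'm not sure how to fix this, since I can't make the row cells integers when I build the dataframe from the csv.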

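【Question discussion】:

fil_df.columns[1] returns the column name, not the column itself. Hence the TypeError.
Python is also zero-based, so if you want the first two columns you should use 0 and 1 as the indices.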
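
The two comments above can be sketched roughly as follows. This is only an illustration: the inline CSV string stands in for the asker's real file, and `skipinitialspace=True` is an assumption added here to strip the blank after each comma in the header.

```python
import io
import pandas as pd

# Sample data from the question (assumption: the real file looks like this)
csv_text = "cheese, milk, bread, butter\n0,2,1,0\n1,1,0,0\n1,1,1,1\n0,1,0,1\n"
fil_df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# columns[1] is only the label (a str), so comparing it with an int raises TypeError
label = fil_df.columns[1]
try:
    label > 0
except TypeError as exc:
    error_message = str(exc)

# Positional access with iloc returns the column itself (zero-based, so 0 is the first);
# comparing that with 0 gives a boolean Series instead of an error
first_col_positive = fil_df.iloc[:, 0] > 0
print(first_col_positive)
```

With this sample the cells are parsed as integers by `read_csv`, so the error seems to come from comparing the label rather than from the cell types.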

Tags: python pandas indexing conditional-statements multiple-columns

【Solution 1】:

You can use iloc together with boolean indexing:

# get the 1st and 2nd columns
subset = df.iloc[:, [0,1]]
print (subset)
   cheese  milk
0       0     2
1       1     1
2       1     1
3       0     1

# boolean mask: True where the value is > 0
print ((subset > 0))
   cheese  milk
0   False  True
1    True  True
2    True  True
3   False  True

# rows where every selected column is True
print ((subset > 0).all(axis=1))
0    False
1     True
2     True
3    False
dtype: bool

# names of the first and second columns
print (df.columns[[0,1]])
Index(['cheese', 'milk'], dtype='object')

# select the masked rows and the two columns by label
print (df.loc[(subset > 0).all(axis=1), df.columns[[0,1]]])
   cheese  milk
1       1     1
2       1     1
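
For completeness, here is a self-contained sketch that goes from the sample data to the support value the question asks about. The inline CSV string, `skipinitialspace=True`, and the variable names `cols`, `mask`, `itemset_rows` and `support` are assumptions made for this illustration, not part of the answer above.

```python
import io
import pandas as pd

# Sample data from the question; skipinitialspace strips the blank after each comma
csv_text = "cheese, milk, bread, butter\n0,2,1,0\n1,1,0,0\n1,1,1,1\n0,1,0,1\n"
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

cols = df.columns[[0, 1]]            # labels of the first two columns
mask = (df[cols] > 0).all(axis=1)    # True where both columns are > 0 in that row
itemset_rows = df.loc[mask, cols]    # the rows that contain the whole itemset

# support = rows containing the itemset / total number of documents
support = mask.sum() / len(df)
print(itemset_rows)
print(tuple(cols), "support:", mask.sum(), "/", len(df))
```

Because `mask` is a boolean Series, `mask.sum()` counts the matching rows directly, so the filtered frame is only needed if the rows themselves are of interest.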

【Discussion】:

【Solution 2】:

df.loc[[1, 2], df.loc[[1, 2]].gt(0).all()]
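
This one-liner is given without explanation. Broken into steps, it appears to treat 1 and 2 as row labels and to keep the columns whose values are > 0 in both of those rows. A sketch with the sample data (the inline CSV and the names `rows`, `keep` and `result` are assumptions for illustration):

```python
import io
import pandas as pd

# Sample data from the question; skipinitialspace strips the blank after each comma
csv_text = "cheese, milk, bread, butter\n0,2,1,0\n1,1,0,0\n1,1,1,1\n0,1,0,1\n"
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

rows = df.loc[[1, 2]]           # the two rows labelled 1 and 2
keep = rows.gt(0).all()         # per column: True if > 0 in both of those rows
result = df.loc[[1, 2], keep]   # same rows, restricted to the columns that passed
print(result)
```

On this data the result matches the desired output, but the rows are fixed up front rather than found by a mask, so it answers a slightly different reading of the question than Solution 1.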

【Discussion】: