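[Posted]: 2020-08-07 16:07:16
[Question]:
I'm trying to figure out how to compare and rank multiple rows in a pandas DataFrame based on two conditions.
These are the conditions: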
rule1 < rule2
    if support(rule1) <= support(rule2) and confidence(rule1) < confidence(rule2)
    or support(rule1) < support(rule2) and confidence(rule1) <= confidence(rule2)
rule1 = rule2
    if support(rule1) = support(rule2) and confidence(rule1) = confidence(rule2)
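As a minimal sketch, the two conditions can be written as a single comparison function on (support, confidence) pairs. The name compare_rules and its return convention are hypothetical, and the "greater than" case is assumed to be the mirror image of the stated "less than" case. Pairs that satisfy neither condition are reported as None rather than forced into an order.

```python
def compare_rules(rule1, rule2):
    """Compare two (support, confidence) pairs under the conditions above.

    Returns -1 if rule1 < rule2, 0 if rule1 = rule2, 1 if rule1 > rule2
    (assumed symmetric to the "<" case), and None if no condition applies.
    """
    s1, c1 = rule1
    s2, c2 = rule2
    if s1 == s2 and c1 == c2:
        return 0
    # With the equal case excluded, "<= on both" means strictly less on at
    # least one measure, which is exactly the two "<" conditions combined.
    if s1 <= s2 and c1 <= c2:
        return -1
    if s1 >= s2 and c1 >= c2:
        return 1
    return None  # one measure is higher, the other lower: not comparable


print(compare_rules((0.00106, 0.012060), (0.00106, 0.012060)))  # 0
print(compare_rules((0.00141, 0.533333), (0.0048, 0.873015)))   # -1
print(compare_rules((0.0048, 0.873015), (0.0085, 0.593220)))    # None
```

The last call uses rules (4444, 5555) and (7414, 1214) from the data: one has the higher confidence and the other the higher support, so the conditions as stated do not order them.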
This is how my DataFrame is set up:
import pandas as pd
data = {
'rules': [(4444, 5555), (8747, 1254), (7414, 1214), (5655, 6651), (4454, 3321), (4893, 4923), (1271, 8330), (9112, 4722), (4511, 6722), (1102, 5789), (2340, 5720), (9822, 5067)],
'support': [0.0048, 0.00141, 0.0085, 0.00106, 0.00106, 0.00038, 0.00179, 0.00913, 0.00221, 0.00173, 0.00098, 0.00024],
'confidence': [0.873015, 0.533333, 0.593220, 0.012060, 0.012060, 0.237699, 0.453423, 0.097672, 0.116983, 0.541221, 0.743222, 0.378219]
}
df = pd.DataFrame(data=data, index=data['rules']).drop(columns=['rules'])
(Index)
Rules           Support    Confidence
(4444, 5555)    0.0048     0.873015
(8747, 1254)    0.00141    0.533333
(7414, 1214)    0.0085     0.593220
(5655, 6651)    0.00106    0.012060
(4454, 3321)    0.00106    0.012060
(4893, 4923)    0.00038    0.237699
(1271, 8330)    0.00179    0.453423
(9112, 4722)    0.00913    0.097672
(4511, 6722)    0.00221    0.116983
(1102, 5789)    0.00173    0.541221
(2340, 5720)    0.00098    0.743222
(9822, 5067)    0.00024    0.378219
This is what I want the DataFrame to look like (I'm not sure what the ranks would actually be; these are hypothetical):
(Index)
Rules           Support    Confidence    Rank
(7414, 1214)    0.0085     0.593220      1
(4444, 5555)    0.0048     0.873015      2
(5655, 6651)    0.00106    0.012060      3
(4454, 3321)    0.00106    0.012060      3
(8747, 1254)    0.00141    0.533333      4
(1271, 8330)    0.00179    0.453423      5
(1102, 5789)    0.00173    0.541221      6
(2340, 5720)    0.00098    0.743222      7
(9822, 5067)    0.00024    0.378219      8
(9112, 4722)    0.00913    0.097672      9
(4511, 6722)    0.00221    0.116983      10
(4893, 4923)    0.00038    0.237699      11
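I have some ideas on how to get this code working, but I'm not sure how to compare every rule against every other rule. I want the best rules, according to the conditions, to float to the top. It's not a big DataFrame (
This is the code I have so far: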
def rank_rules(confidence, support):
    # IF / ELSE goes here
    df['rank'] = some_var.rank(method='max')
    df.sort_values(by=['rank'], ascending=False)
    return df

df = df.apply(lambda x: rank_rules(x['confidence'], x['support']), axis=1)
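One possible way to compare every rule against every other rule is to rank by layers: rules that no other rule beats get rank 1, then those are set aside and the next layer gets rank 2, and so on. This is only a sketch under that assumption, not the method from the question or the paper. The helper names dominates and pareto_ranks are hypothetical, and the code works on plain (support, confidence) tuples rather than on the DataFrame.

```python
def dominates(a, b):
    """True if b < a under the stated conditions (a is at least as good on
    both measures and the two pairs are not identical)."""
    return a != b and a[0] >= b[0] and a[1] >= b[1]


def pareto_ranks(points):
    """Rank 1 goes to points no other point dominates; each later rank goes
    to the points left undominated once earlier ranks are removed."""
    ranks = [None] * len(points)
    remaining = set(range(len(points)))
    level = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)}
        for i in front:
            ranks[i] = level
        remaining -= front
        level += 1
    return ranks


# (support, confidence) pairs in the same row order as the question's data.
points = [
    (0.0048, 0.873015), (0.00141, 0.533333), (0.0085, 0.593220),
    (0.00106, 0.012060), (0.00106, 0.012060), (0.00038, 0.237699),
    (0.00179, 0.453423), (0.00913, 0.097672), (0.00221, 0.116983),
    (0.00173, 0.541221), (0.00098, 0.743222), (0.00024, 0.378219),
]
ranks = pareto_ranks(points)
print(ranks)  # [1, 3, 1, 4, 4, 4, 2, 1, 2, 2, 2, 4]
```

With the DataFrame from the question, something like df['rank'] = pareto_ranks(list(zip(df['support'], df['confidence']))) followed by df.sort_values('rank') should attach and apply the result. Rules that the conditions cannot order share a rank, so the output has ties and does not reproduce the hypothetical 1 to 11 ranking shown above. The pairwise check is quadratic, which seems acceptable for a DataFrame of this size.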
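[Comments]:
- What happens if (a) support(rule1) <= support(rule2) and confidence(rule1) > confidence(rule2), or (b) support(rule1) > support(rule2) and confidence(rule1) <= confidence(rule2)?
- Good point. My boss sent me the conditions I posted above. Apparently there is also a lower bound in addition to those conditions. I'm following the research paper "Mining the Most Interesting Rules" by Roberto Bayardo. Now that you've poked a hole in the logic, it looks like the lower bound also needs to be posted before this can be solved.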
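The gap raised in the comments can be checked against the sample data. The sketch below lists the pairs of rules that the posted conditions leave unordered; the helper name incomparable is hypothetical, and the dictionary simply restates the question's values.

```python
from itertools import combinations

# (support, confidence) per rule, copied from the question's data.
rules = {
    (4444, 5555): (0.0048, 0.873015),
    (8747, 1254): (0.00141, 0.533333),
    (7414, 1214): (0.0085, 0.593220),
    (5655, 6651): (0.00106, 0.012060),
    (4454, 3321): (0.00106, 0.012060),
    (4893, 4923): (0.00038, 0.237699),
    (1271, 8330): (0.00179, 0.453423),
    (9112, 4722): (0.00913, 0.097672),
    (4511, 6722): (0.00221, 0.116983),
    (1102, 5789): (0.00173, 0.541221),
    (2340, 5720): (0.00098, 0.743222),
    (9822, 5067): (0.00024, 0.378219),
}


def incomparable(a, b):
    """True when each pair is strictly better than the other on one measure,
    so neither the '<' nor the '=' condition applies."""
    a_le_b = a[0] <= b[0] and a[1] <= b[1]
    b_le_a = b[0] <= a[0] and b[1] <= a[1]
    return not (a_le_b or b_le_a)


pairs = [(r1, r2) for r1, r2 in combinations(rules, 2)
         if incomparable(rules[r1], rules[r2])]
total = len(rules) * (len(rules) - 1) // 2
print(len(pairs), "of", total, "pairs are left unordered")
for r1, r2 in pairs[:3]:
    print(r1, rules[r1], "vs", r2, rules[r2])
```

Any pair printed here needs an additional rule, such as the lower bound mentioned above, before a strict ranking is possible.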
Tags: python pandas analytics ranking