【Posted】:2017-12-04 01:17:21
【Question】:
I have this dataset:
import numpy as np
import pandas as pd
from itertools import product
A= ['ABC', 'DEF']
M= ['X', 'Y', 'Z']
F= ['plus', 'minus', 'star']
# Create every combination (Cartesian product) of <A, M, F>
df = pd.DataFrame(list(product(A,M,F)), columns=['A', 'M', 'F'])
df['value'] = np.random.uniform(0, 1, df.shape[0])
The first rows of the dataset look like this:
     A  M      F     value
0  ABC  X   plus  0.666602
1  ABC  X  minus  0.716765
2  ABC  X   star  0.032931
3  ABC  Y   plus  0.275616
4  ABC  Y  minus  0.489233
Here, I want to get the top k combinations of sets that maximize my objective:
My goal is to maximize: sum(values of the sets in a combination) + sum(pairwise distances between the sets in that combination)
Here is my code:
# diversity/distance function
def diversity(a, b):
    c = a.intersection(b)
    d = float(len(c)) / (len(a) + len(b) - len(c))
    return 1 - d
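As a quick check of what this function computes (it is the Jaccard distance between two sets), here is a minimal sketch applied to three hand-picked rows. The sets `s1`, `s2`, `s3` and the variable names are chosen only for illustration; they correspond to one of the combinations in the output further down.

```python
def diversity(a, b):
    # Jaccard distance: 1 - |a ∩ b| / |a ∪ b|
    common = a.intersection(b)
    jaccard = len(common) / (len(a) + len(b) - len(common))
    return 1 - jaccard


# Illustrative rows taken from the <A, M, F> columns
s1 = {"ABC", "X", "plus"}
s2 = {"ABC", "Y", "minus"}
s3 = {"DEF", "Z", "star"}

d12 = diversity(s1, s2)  # one shared element out of five distinct: 1 - 1/5
d13 = diversity(s1, s3)  # nothing shared: 1 - 0/6
d23 = diversity(s2, s3)  # nothing shared: 1 - 0/6
total = d12 + d13 + d23
print(round(total, 1))  # 2.8
```

Because A has only two possible values, any three rows must share A in at least one pair, so 2.8 is the largest distance sum a combination of three rows can reach here.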
And the code that builds the combinations:
from itertools import combinations
k = 3
max_distance = []
# I drop the 'value' column because the sets I want to compare are <A, M, F>
df_distance = df.drop(['value'], axis=1)
series_set = df_distance.apply(lambda row: set(row), axis=1)
data = series_set
for z in combinations(data, k):
    dis = 0
    sum_values = 0
    for a in combinations(z, 2):
        dis += diversity(*a)
    # I am stuck here: I want to sum the values, but I don't know how to
    # get the value of each set in the combination and add them up
    max_distance.append((dis, tuple(z)))
max_distance.sort(key=lambda x: x[0], reverse=True)
print(max_distance[:k])
Output:
[(2.8, ({'plus', 'ABC', 'X'}, {'Y', 'minus', 'ABC'}, {'Z', 'star', 'DEF'})), (2.8, ({'plus', 'ABC', 'X'}, {'Y', 'star', 'ABC'}, {'Z', 'minus', 'DEF'})), (2.8, ({'plus', 'ABC', 'X'}, {'Z', 'minus', 'ABC'}, {'Y', 'star', 'DEF'}))]
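In the code above I only compute the sum of the distances; the value 2.8 is just that sum. The distance between sets should be computed only from the columns [A, M, F], and I also want to add up the values of the rows. The expected output is (sum of distances + sum of values), ranked so that the best-scoring combinations of sets come first.
I am really confused about how to sum the values of the rows in a combination.
Expected output: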
[(sum(distance) + sum(values), ({'plus', 'ABC', 'X'}, {'Y', 'minus', 'ABC'}, {'Z', 'star', 'DEF'})), (sum(distance) + sum(values), ({'plus', 'ABC', 'X'}, {'Y', 'star', 'ABC'}, {'Z', 'minus', 'DEF'})), (sum(distance) + sum(values), ({'plus', 'ABC', 'X'}, {'Z', 'minus', 'ABC'}, {'Y', 'star', 'DEF'}))]
Please let me know if anything is unclear, and sorry for my English.
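One possible way to get at the values, sketched under assumptions: instead of dropping the `value` column before building the sets, keep each row's set paired with its value, so both are available inside the combination loop. The helper name `top_k_combinations`, its parameters `key_cols` and `value_col`, and the fixed random seed are hypothetical and not part of the original code. Like the original, this enumerates every combination, so it only suits small inputs.

```python
from itertools import combinations, product

import numpy as np
import pandas as pd


def diversity(a, b):
    # Jaccard distance between two sets, as in the question
    common = a & b
    return 1 - len(common) / (len(a) + len(b) - len(common))


def top_k_combinations(df, k, key_cols=("A", "M", "F"), value_col="value"):
    """Score each k-row combination as sum(pairwise distance) + sum(values)."""
    # Keep every row's set together with its value (hypothetical pairing step)
    rows = [
        (frozenset(keys), value)
        for keys, value in zip(
            df[list(key_cols)].itertuples(index=False, name=None),
            df[value_col],
        )
    ]
    scored = []
    for combo in combinations(rows, k):
        # Distance uses only the sets; the values are summed separately
        dis = sum(diversity(a[0], b[0]) for a, b in combinations(combo, 2))
        sum_values = sum(value for _, value in combo)
        scored.append((dis + sum_values, tuple(set(s) for s, _ in combo)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


A = ["ABC", "DEF"]
M = ["X", "Y", "Z"]
F = ["plus", "minus", "star"]
df = pd.DataFrame(list(product(A, M, F)), columns=["A", "M", "F"])
# Fixed seed is an assumption, used only to make the sketch repeatable
df["value"] = np.random.default_rng(0).uniform(0, 1, len(df))

best = top_k_combinations(df, k=3)
for score, sets in best:
    print(round(score, 4), sets)
```

Each entry of `best` has the same shape as the expected output: a combined score followed by the tuple of sets. Note that turning a row into a set collapses repeated entries, so this assumes no value appears in more than one of the key columns.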
【Comments】:
Tags: python pandas dataframe combinations itertools