【Question title】: Sum values inside itertools.combinations in Python
【Posted】: 2017-12-04 01:17:21
【Question】:

I have this dataset:

import numpy as np
import pandas as pd
from itertools import product

A= ['ABC', 'DEF'] 
M= ['X', 'Y', 'Z']
F= ['plus', 'minus', 'star']

# Create every combination (Cartesian product) of <A, M, F>
df = pd.DataFrame(list(product(A,M,F)), columns=['A', 'M', 'F'])
df['value'] = np.random.uniform(0, 1, df.shape[0])

The dataset looks like this:

     A  M   F        value
0   ABC X   plus    0.666602
1   ABC X   minus   0.716765
2   ABC X   star    0.032931
3   ABC Y   plus    0.275616
4   ABC Y   minus   0.489233

Here I want to get the top k combinations of sets that maximize my objective:

My objective: maximize sum(values of the sets in a combination) + sum(pairwise distances between the sets in that combination)

Here is my diversity (distance) function:

#diversity/distance function
def diversity(a, b):
    c = a.intersection(b)
    d = float(len(c)) / (len(a) + len(b) - len(c))
    return 1 - d
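
For reference, `diversity` is the Jaccard distance between two sets. The small check below is only an illustrative sketch (the variable names `same_a` and `disjoint` are made up here): two rows that share one label out of five distinct labels come out at about 0.8, and two rows with nothing in common at 1.0. A triple with two disjoint pairs and one pair sharing a single label therefore scores 1 + 1 + 0.8 = 2.8.

```python
def diversity(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    c = a.intersection(b)
    d = float(len(c)) / (len(a) + len(b) - len(c))
    return 1 - d

# The two rows share only 'ABC': 1 common label, 5 distinct labels overall.
same_a = diversity({'ABC', 'X', 'plus'}, {'ABC', 'Y', 'minus'})

# The two rows have no label in common.
disjoint = diversity({'ABC', 'X', 'plus'}, {'DEF', 'Y', 'minus'})

print(same_a, disjoint)
```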

The rest of my code:

from itertools import combinations

k = 3

max_distance = []

# Drop the 'value' column because the sets I want to compare contain only <A, M, F>
df_distance = df.drop(['value'],axis=1)
series_set = df_distance.apply(lambda row: set(row), axis=1)
data = series_set

for z in combinations(data, k):
    dis = 0
    sum_values = 0
    for a in combinations(z, 2):
        dis += diversity(*a)
        # I am stuck here: I want to sum the values too, but I don't know how to look up each set's value inside the combination
    max_distance.append((dis, tuple(z)))

max_distance.sort(key=lambda x: x[0], reverse=True)
print(max_distance[:k])

Output:

[(2.8, ({'plus', 'ABC', 'X'}, {'Y', 'minus', 'ABC'}, {'Z', 'star', 'DEF'})), (2.8, ({'plus', 'ABC', 'X'}, {'Y', 'star', 'ABC'}, {'Z', 'minus', 'DEF'})), (2.8, ({'plus', 'ABC', 'X'}, {'Z', 'minus', 'ABC'}, {'Y', 'star', 'DEF'}))]

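The code above only computes the sum of distances; the value 2.8 is just that sum. I want to sum the distances between the sets, computed only from the columns [A, M, F], and I also want to sum the values. The expected output is (sum of distances + sum of values), keeping the best-scoring combinations of sets.

What confuses me is how to sum the values inside a combination.

Expected output: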

  [(sum(distance) + sum(values), ({'plus', 'ABC', 'X'}, {'Y', 'minus', 'ABC'}, {'Z', 'star', 'DEF'})), (sum(distance) + sum(values), ({'plus', 'ABC', 'X'}, {'Y', 'star', 'ABC'}, {'Z', 'minus', 'DEF'})), (sum(distance) + sum(values), ({'plus', 'ABC', 'X'}, {'Z', 'minus', 'ABC'}, {'Y', 'star', 'DEF'}))]

Please let me know if you have any questions, and sorry for my English.

【Discussion】:

    Tags: python pandas dataframe combinations itertools


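    【Solution 1】:

    See the slightly modified version of your code below; I think it does what you want. I moved the set conversion into the diversity function so that each element of series_set can be a tuple. That tuple can then be used to index a DataFrame that has a MultiIndex.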
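
To illustrate the tuple lookup on its own, here is a minimal sketch on a tiny hand-written frame. The three rows and their values are made up for the example, and the `sort_index()` call is an optional addition that is not part of the answer's code.

```python
import pandas as pd

# Tiny frame with made-up values, using the same column names as the question.
df = pd.DataFrame(
    [('ABC', 'X', 'plus', 0.5),
     ('ABC', 'X', 'minus', 0.25),
     ('DEF', 'Y', 'star', 0.75)],
    columns=['A', 'M', 'F', 'value'],
)

# Move the three label columns into a MultiIndex; sorting is optional here.
df_sum = df.set_index(['A', 'M', 'F']).sort_index()

# A full (A, M, F) tuple identifies exactly one row, so this returns a scalar.
key = ('ABC', 'X', 'minus')
looked_up = df_sum.loc[key, 'value']
print(looked_up)
```

Because the key has to be hashable and ordered to match the index levels, a tuple works here where a set would not.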

    import numpy as np
    import pandas as pd
    from itertools import product, combinations
    
    A = ['ABC', 'DEF']
    M = ['X', 'Y', 'Z']
    F = ['plus', 'minus', 'star']
    
    # Create every combination (Cartesian product) of <A, M, F>
    df = pd.DataFrame(list(product(A,M,F)), columns=['A', 'M', 'F'])
    df['value'] = np.random.uniform(0, 1, df.shape[0])
    
    
    # diversity/distance function
    def diversity(a, b):
        c = set(a).intersection(b)
        d = float(len(c)) / (len(a) + len(b) - len(c))
        return 1 - d
    
    k = 3
    max_distance = []
    max_values = []
    
    # Drop the 'value' column because the sets to compare contain only <A, M, F>
    df_distance = df.drop(['value'],axis=1)
    df_sum = df.set_index(['A', 'M', 'F'])
    series_set = df_distance.apply(lambda row: tuple(row), axis=1)
    data = series_set
    
    for z in combinations(data, k):
        dis = 0
        sum_values = 0
        for a in combinations(z, 2):
            dis += diversity(*a)
            sum_values += df_sum.loc[a[0], 'value'] + df_sum.loc[a[1], 'value']
        max_distance.append((dis, tuple(z)))
        max_values.append((sum_values, tuple(z)))
    
    max_distance.sort(key=lambda x: x[0], reverse=True)
    print(max_distance[:k])
    
    max_values.sort(key=lambda x: x[0], reverse=True)
    print(max_values[:k])
    

    -- Update --

    max_total = []
    for z in combinations(data, k):
        dis = 0
        sum_values = 0
        for a in combinations(z, 2):
            dis += diversity(*a)
            sum_values += df_sum.loc[a[0], 'value'] + df_sum.loc[a[1], 'value']
        total_sum = dis + sum_values
        max_total.append((total_sum, tuple(z)))
    
    max_total.sort(key=lambda x: x[0], reverse=True)
    print(max_total[:k])
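
A note on the totals: the loop above adds both rows' values for every pair, so with k = 3 each row's value ends up counted k - 1 = 2 times. That appears consistent with the figure reported in the comments (2.8 + 2 × 2.8 = 8.4). If each value should count once, one possible variant is sketched below. It is not part of the original answer; the `value_of` dictionary and the fixed seed are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from itertools import product, combinations

A = ['ABC', 'DEF']
M = ['X', 'Y', 'Z']
F = ['plus', 'minus', 'star']

df = pd.DataFrame(list(product(A, M, F)), columns=['A', 'M', 'F'])
# Seeded generator so the run is repeatable (the seed is an arbitrary choice).
df['value'] = np.random.default_rng(0).uniform(0, 1, len(df))


def diversity(a, b):
    """Jaccard distance between two rows given as tuples of labels."""
    c = set(a).intersection(b)
    return 1 - len(c) / (len(a) + len(b) - len(c))


k = 3

# Hypothetical helper: map each (A, M, F) tuple to its value for direct lookup.
value_of = dict(zip(zip(df['A'], df['M'], df['F']), df['value']))

max_total = []
for z in combinations(value_of, k):
    # Pairwise distances are summed over all pairs in the combination...
    dis = sum(diversity(a, b) for a, b in combinations(z, 2))
    # ...while each row's value is added exactly once.
    sum_values = sum(value_of[row] for row in z)
    max_total.append((dis + sum_values, z))

max_total.sort(key=lambda x: x[0], reverse=True)
print(max_total[:k])
```

With 18 rows and k = 3 this enumerates 816 combinations, so brute force is fine at this size; it grows quickly for larger k.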
    

    【Comments】:

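    • Hi, thanks for your answer, but here I want to do max_total.sort(key=lambda x: x[0], reverse=True) and then print(max_total[:k]). I mean that the summed values should also be used to decide which combination of sets is the maximum. The combination with the largest total should be the result.
    • It also gives me an error: C:\Anaconda\lib\site-packages\ipykernel_launcher.py:35: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing. See the documentation here: pandas.pydata.org/pandas-docs/stable/…
    • Hi Ido S, thanks, but if I'm not mistaken the total seems to be wrong. For example, for (8.4226393121550736, (('ABC', 'X', 'plus'), ('DEF', 'Y', 'minus'), ('DEF', 'Z', 'star'))): total distance = 2.8, not 8.422; total values = 0.98 + 0.94 + 0.88 = 2.8; so max_total should be 5.6.
    • I used the data generated in your post, which seems to differ from the data you mention here? For this combination I get [(4.8669925765449484, (('ABC', 'X', 'plus'), ('DEF', 'Y', 'minus'), ('DEF', 'Z', 'star')))]. According to your data table, the values should be [0.64515316530289457, 0.58542398144044849, 0.036449070043609644], assuming they simply correspond to the value field of the rows matching the tuples.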