PuLP optimization without repeated permutations/combinations: answers

【Question title】: Pulp optimization not repeated permutation combination
【Posted】: 2020-08-18 10:29:05
【Question】:

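I am working on an optimization problem where the goal is to maximize profit from selling products in pairs, with the constraint that a pair must not be repeated.

I am using PuLP to optimize the solution, but the code is inefficient and appears to get stuck in an infinite loop.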

import pandas as pd
import pulp

file = pd.read_csv('input_file.csv')

main_product_1 = list(file['Product ID'].unique())
main_product_2 = list(file['Product ID 2'].unique())

file.set_index(['Product ID', 'Product ID 2'], inplace=True)
file = file['Profit']

# Target Variable
combine = pulp.LpVariable.dicts("combine",
                                ((product_1, product_2) for product_1 in main_product_1 for
                                 product_2 in main_product_2 if product_1 != product_2),
                                cat='Binary')

# Initializing the model

model = pulp.LpProblem("prof_max", pulp.LpMaximize)

# Objective Function optimization
model += pulp.lpSum(
        [combine[product_1, product_2] * file.loc[(product_1, product_2)] for product_1 in
         main_product_1 for product_2 in main_product_2 if product_1 != product_2])

# Constraints for optimization
for area in set_plant:  # note: set_plant is not defined anywhere in the posted code
  model += pulp.lpSum([combine[area, other] for other in main_product_1 if area != other]
                      + [combine[other, area] for other in main_product_2 if area != other]) == 1

model.solve()
print(pulp.LpStatus[model.status])

# Check
set_index = set(file.index)
set_expected = set(
        [(product_1, product_2) for product_1 in main_product_1 for product_2 in main_product_2 if
         product_1 != product_2])
len(set_expected - set_index)

The problem is that the code appears to enter an infinite loop and I never get a result. Is there a more efficient way to run this?

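【Comments】:

Are you talking about permutations or combinations of product IDs? Can (1000, 1001) and (1001, 1000) both be in the result, or only one of them?
If we already have a pair containing 1000 and 1001, neither of them may appear again in any other pair, whether as Product ID or as Product ID 1.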

Tags: python math optimization pulp

【Answer 1】:

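The problem is that you are adding a very large number of integer variables, and integer linear programming is a hard problem that can be very slow to solve. There are, however, some improvements you can probably still make.

Consider the following construction: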

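If you have M instances of product 1 and N instances of product 2, then you have MN binary variables x_mn;
Each of these variables contributes its profit P to the objective function if x_mn is 1;
In your constraint you say x_mn + x_nm == 1, but it should really be x_mn + x_nm <= 1. Otherwise you are saying that every combination must appear in your list, which can make the problem infeasible;
If you are considering combinations rather than permutations (i.e. [1000, 1001] and [1001, 1000] are the same), then M = N and you can drop about half of the variables, leaving only @987654330@. (If you picture the value space as an M×M square, you keep roughly one triangle, because the other triangle is equivalent);
If you restrict the variable space as described above, you no longer need that pairwise constraint at all: if x_nm does not exist, then x_mn <= 1 is trivially true for a binary variable.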
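To make the size reduction concrete, here is a minimal sketch that counts the binary variables before and after keeping only pairs with the smaller ID first. The product IDs are made up for illustration, and only the standard library is used.

```python
from itertools import combinations, permutations

# Hypothetical product IDs, for illustration only.
products = [1000, 1001, 1002, 1003]

# One binary variable per ordered pair (A, B) with A != B: permutations.
ordered_pairs = list(permutations(products, 2))

# One binary variable per unordered pair, stored as (smaller ID, larger ID).
unordered_pairs = list(combinations(sorted(products), 2))

print(len(ordered_pairs))    # M * (M - 1) variables
print(len(unordered_pairs))  # M * (M - 1) / 2 variables
```

With M products this halves the number of variables, which is the "one triangle of the square" picture from the list above.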

import pandas as pd
import numpy as np
import pulp

file = pd.read_csv('file.csv')
file = file[file['Product ID'] < file['Product ID 2']]
file.set_index(['Product ID', 'Product ID 2'], inplace=True)
file = file['Profit']

combinations = file.index
individual_products = set()

for product_1, product_2 in combinations:
    individual_products.add(product_1)
    individual_products.add(product_2)

# Target Variable
combine = pulp.LpVariable.dicts("combine", combinations, cat='Binary')

# Initializing the model

model = pulp.LpProblem("prof_max", pulp.LpMaximize)

# Objective Function optimization
model += pulp.lpSum([combine[i] for i in file.index] * file)

# All individual products can only be used once
for product in individual_products:
    # Every pair that contains this product, on either side
    matching_combinations = combinations[(combinations.to_frame() == product).any(axis=1)]
    model += pulp.lpSum([combine[i] for i in matching_combinations]) <= 1

model.solve()
print(pulp.LpStatus[model.status])
print([v for v in model.variables() if v.varValue > 0])

With these changes you eliminate roughly 75% of the constraints without changing the problem or its underlying implementation.
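On a tiny instance, the "each product used at most once" constraint can be sanity-checked without a solver. The sketch below is a brute-force enumeration, not the PuLP model; the profits and the helper names `is_valid` and `best_selection` are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical profits per unordered pair (smaller ID first), for illustration only.
profit = {
    (1000, 1001): 5.0,
    (1000, 1002): 4.0,
    (1001, 1002): 3.0,
    (1002, 1003): 6.0,
}


def is_valid(pairs):
    """Return True if no product ID appears in more than one pair."""
    used = [product for pair in pairs for product in pair]
    return len(used) == len(set(used))


def best_selection(profit):
    """Try every subset of pairs and keep the most profitable valid one."""
    pairs = list(profit)
    best, best_value = (), 0.0
    for size in range(1, len(pairs) + 1):
        for subset in combinations(pairs, size):
            if is_valid(subset):
                value = sum(profit[pair] for pair in subset)
                if value > best_value:
                    best, best_value = subset, value
    return best, best_value


selection, total = best_selection(profit)
print(selection, total)
```

The enumeration is exponential in the number of pairs, so it is only useful for comparing against the solver's output on a handful of products.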

【Comments】:

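Thanks Ruben. I am getting a "Too many indexers" error when running the line under "# Objective Function optimization".
Note again that the file does not have the required structure: it still contains many combinations of both (A, B) and (B, A). Can you clean that up?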
I can clean it up with "file = file[~file[['Product ID', 'Product ID 1']].apply(frozenset, axis=1).duplicated()]", but I still get the same "Too many indexers" error.
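What if you use file.loc[product_1, product_2] without the () brackets?
Use print statements to find the problem: is it file or combine? One of print(combine[product_1, product_2]) and print(file.loc[product_1, product_2]) should raise the error. I am starting to think you need to use combine[(product_1, product_2)].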