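Python / Pandas / PuLP optimization on a column: answers

【Question title】: Python / Pandas / PuLP optimization on a column
【Posted】: 2020-09-15 10:28:52
【Question description】:

I'm trying to optimize a column of data in a Pandas dataframe. I've looked through past posts but couldn't find one that deals with optimizing the values in a dataframe column. This is my first post and I'm relatively new to coding, so apologies in advance. Below is the code I'm using: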

from pandas import DataFrame
import numpy as np
from pulp import *

heading = [184, 153, 140, 122, 119]
df = DataFrame (heading, columns=['heading'])
df['speed'] = 50 
df['ratio'] = df.speed/df.heading
conditions = [
    (df['ratio'] < 0.1),
    (df['ratio'] >= 0.1 ) & (df['ratio'] < 0.2),
    (df['ratio'] >= 0.2 ) & (df['ratio'] < 0.3),
    (df['ratio'] >= 0.3 ) & (df['ratio'] < 0.4),
    (df['ratio'] >= 0.4 )]  # >= so that a ratio of exactly 0.4 is not left unmatched
choices = [3, 1, 8, 5, 2]
df['choice'] = np.select(conditions, choices)
df['final_column'] = df.choice * df.heading

print(np.sum(df.final_column))

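I use np.select to search the 'conditions' and return the appropriate 'choice'. It works like the vlookup I would use in Excel.

I'm trying to get PuLP, or any other suitable optimization tool, or maybe even just a loop, to find the best values for df.speed (I started with a placeholder value of 50) so as to maximize the sum of the values in 'final_column'. Below is the code I've tried, but it doesn't work.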

prob = LpProblem("Optimal Values",LpMaximize)
speed_vars = LpVariable("Variable",df.speed,lowBound=0,cat='Integer')
prob += lpSum(df.new_column_final)
prob.solve()

This is the error I get:

speed_vars = LpVariable("Variable",df.speed,lowBound=0,cat='Integer')
TypeError: __init__() got multiple values for argument 'lowBound'

Any help would be greatly appreciated!

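【Question comments】:

What do you mean by "optimize a column of data"?
Sorry - I mean that I want the value in each row of that column to be the best possible choice, so that each row takes its highest value and the sum over all rows is as high as possible.
Could you provide a minimal reproducible example - basically a toy problem that includes all the data needed by the code in your question, and where the answer is obvious. That makes it easier for people to answer, and you can then extend/apply the approach to your larger, more complex problem.
To add, please show us a sample of the current data and the desired result to better illustrate your problem. Since human language is not always precise, numbers tend to help more than words. Finally, don't assume we know your domain, such as pulp.
Thanks kabdulla and Parfait - simplified and edited accordingly.

Tags: python pandas pulp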

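【Solution 1】:

First, the specific error message you are getting: TypeError: __init__() got multiple values for argument 'lowBound'

When you call a function in Python you can pass arguments by 'position' - meaning the order in which you pass them tells the function what each one is - or by naming them. If you look up pulp.LpVariable in the documentation you'll see that the second positional argument is 'lowBound'. You pass df.speed in that position and then pass lowBound again as a named argument - hence the error message.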
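The clash can be reproduced without PuLP at all. The sketch below uses a hypothetical stand-in function, make_variable, whose leading parameters are assumed to mirror those of pulp.LpVariable (name, lowBound, upBound, cat); it is not the real class, only enough to trigger the same TypeError.

```python
# Hypothetical stand-in: assumed to mirror the leading parameters of
# pulp.LpVariable. It is not the real class.
def make_variable(name, lowBound=None, upBound=None, cat="Continuous"):
    return {"name": name, "lowBound": lowBound, "upBound": upBound, "cat": cat}


error_message = ""
try:
    # 50 is taken positionally as lowBound, then lowBound=0 names it again.
    make_variable("Variable", 50, lowBound=0, cat="Integer")
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Naming every optional argument removes the ambiguity.
ok = make_variable("Variable", lowBound=0, cat="Integer")
print(ok)
```

With the real class the same fix applies: pass only the name positionally and name everything else, for example LpVariable("speed_0", lowBound=0, cat='Integer').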

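I think you may also have a slight misunderstanding of how dataframes work. It isn't like Excel, where you set a 'formula' in a column and it stays updated as other elements on that row change. You can assign values to a column, but if the input data changes, the cells only update when you run that bit of code again.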
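A small illustration of that point, using the headings from the question: the ratio column keeps its old values after speed changes, until the assignment is run again. The variable names stale and refreshed are only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"heading": [184, 153, 140, 122, 119]})
df["speed"] = 50
df["ratio"] = df["speed"] / df["heading"]
ratio_before = df["ratio"].copy()

# Changing the input column does not touch the derived column.
df["speed"] = 100
stale = df["ratio"].equals(ratio_before)

# The derived column only changes once the assignment is re-run.
df["ratio"] = df["speed"] / df["heading"]
refreshed = not df["ratio"].equals(ratio_before)

print(stale, refreshed)
```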

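In terms of solving your problem - I'm not convinced I've fully understood what you are trying to do, but this is what I have understood:

We want to choose the values of df['speed'] to maximize the sum-product of the heading and choice columns
The value in the choice column depends on the ratio of speed to heading (according to the 5 ranges given)
The heading column is fixed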
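Written out as a formula, the three points above amount to the problem below. The symbols are notation introduced here, not taken from the question: h_j is the fixed heading of row j, s_j the speed to be chosen, and c(r) the lookup from ratio to choice. Assigning r = 0.4 to the last band is an assumption.

```latex
\[
\max_{s_1,\dots,s_n}\ \sum_{j=1}^{n} h_j \, c\!\left(\frac{s_j}{h_j}\right),
\qquad
c(r) =
\begin{cases}
3 & r < 0.1 \\
1 & 0.1 \le r < 0.2 \\
8 & 0.2 \le r < 0.3 \\
5 & 0.3 \le r < 0.4 \\
2 & r \ge 0.4
\end{cases}
\]
```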

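By inspection, the optimum is achieved by setting every speed so that its ratio lies in the range [0.2, 0.3), since that range has the largest choice value (8); where in that range each ratio falls doesn't matter. Code to do this in PuLP within a pandas dataframe is below. It relies on binary variables to track which range each ratio falls in.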
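The "by inspection" claim can be checked numerically without a solver. This is a minimal sketch that reuses the np.select lookup from the question; the helper name total_for and the test ratios 0.22 and 0.28 are illustrative choices, and a ratio of exactly 0.4 is assumed to belong to the last band.

```python
import numpy as np

headings = np.array([184.0, 153.0, 140.0, 122.0, 119.0])
choices = [3, 1, 8, 5, 2]


def total_for(speeds):
    """Sum of choice * heading for the given speeds (illustrative helper)."""
    ratio = np.asarray(speeds, dtype=float) / headings
    conditions = [
        ratio < 0.1,
        (ratio >= 0.1) & (ratio < 0.2),
        (ratio >= 0.2) & (ratio < 0.3),
        (ratio >= 0.3) & (ratio < 0.4),
        ratio >= 0.4,  # assumption: 0.4 itself falls in the last band
    ]
    choice = np.select(conditions, choices)
    return float(np.sum(choice * headings))


baseline = total_for(np.full(5, 50.0))   # the placeholder speed of 50
in_band_a = total_for(0.22 * headings)   # every ratio inside [0.2, 0.3)
in_band_b = total_for(0.28 * headings)   # a different point in the same band

# No row can score more than max(choices) * heading, so this bounds the total.
upper_bound = float(max(choices) * headings.sum())

print(baseline, in_band_a, in_band_b, upper_bound)
```

Both in-band totals equal the upper bound, so any set of speeds that keeps every ratio in that band is optimal.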

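The syntax is a bit awkward, though. I would recommend doing the optimization entirely outside the dataframe and just loading the results in at the end, using the LpVariable.dicts method to create the arrays of variables.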
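For comparison, here is a sketch of that recommendation: the same model built with LpVariable.dicts outside the dataframe, with the results loaded into pandas only at the end. It assumes PuLP is installed with its bundled CBC solver. The names rows, bands and in_band, the lowBound=0 on the speeds, and the msg=False solver option are choices made for this sketch rather than part of the code in this answer.

```python
import pandas as pd
from pulp import LpMaximize, LpProblem, LpVariable, PULP_CBC_CMD, lpSum, value

headings = [184.0, 153.0, 140.0, 122.0, 119.0]
choices = [3, 1, 8, 5, 2]
max_speed = 500.0
max_ratio = max_speed / min(headings)
lower = [0.0, 0.1, 0.2, 0.3, 0.4]
upper = [0.1, 0.2, 0.3, 0.4, max_ratio]

rows = range(len(headings))
bands = range(len(choices))

prob = LpProblem("max_pd_column", LpMaximize)

# One continuous speed per row, and one binary per (row, band) pair.
speed = LpVariable.dicts("speed", rows, lowBound=0)
in_band = LpVariable.dicts("aux", (rows, bands), cat="Binary")

# Objective: sum of choice * heading for the selected band of each row.
prob += lpSum(in_band[j][i] * choices[i] * headings[j]
              for j in rows for i in bands)

for j in rows:
    # Exactly one band is selected per row.
    prob += lpSum(in_band[j][i] for i in bands) == 1
    for i in bands:
        ratio = speed[j] * (1.0 / headings[j])
        # The selected band's limits apply; the others are relaxed.
        prob += ratio <= upper[i] + (1 - in_band[j][i]) * max_ratio
        prob += ratio >= lower[i] * in_band[j][i]

prob.solve(PULP_CBC_CMD(msg=False))

# Load the results into a dataframe only once the solve is done.
df = pd.DataFrame({"heading": headings})
df["speed_opt"] = [speed[j].varValue for j in rows]
df["ratio_opt"] = df["speed_opt"] / df["heading"]
total = value(prob.objective)

print(df)
print(total)
```

Keeping the LpVariable objects in plain dictionaries avoids storing Python objects in dataframe cells, and the dataframe ends up holding only numbers.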

from pandas import DataFrame
import numpy as np
from pulp import *

headings = [184.0, 153.0, 140.0, 122.0, 119.0]
df = DataFrame (headings, columns=['heading'])
df['speed'] = 50
max_speed = 500.0
max_ratio = max_speed / np.min(headings)
df['ratio'] = df.speed/df.heading
conditions_lb = [0,   0.1, 0.2, 0.3, 0.4]
conditions_ub = [0.1, 0.2, 0.3, 0.4, max_speed / np.min(headings)]
choices = [3, 1, 8, 5, 2]
n_range = len(choices)
n_rows = len(df)

# Create primary speed variables - one for each row:
df['speed_vars'] = [LpVariable("speed_"+str(j)) for j in range(n_rows)]

# Create auxiliary variables - binaries to control
# which bit of the range each speed is in
df['aux_vars'] = [[LpVariable("aux_"+str(i)+"_"+str(j), cat='Binary')
                   for i in range(n_range)]
                   for j in range(n_rows)]

# Declare problem
prob = LpProblem("max_pd_column",LpMaximize)

# Define objective function
prob += lpSum([df['aux_vars'][j][i]*choices[i]*headings[j] for i in range(n_range)
               for j in range(n_rows)])

# Constrain only one range to be selected for each row
for j in range(n_rows):
    prob += lpSum([df['aux_vars'][j][i] for i in range(n_range)]) == 1

# Constrain the value of the speed by the ratio range selected
for j in range(n_rows):
    for i in range(n_range):
        prob += df['speed_vars'][j]*(1.0/df['heading'][j]) <= \
                        conditions_ub[i] + (1-df['aux_vars'][j][i])*max_ratio
        prob += df['speed_vars'][j]*(1.0/df['heading'][j]) >= \
                        conditions_lb[i]*df['aux_vars'][j][i]

# Solve problem and print results
prob.solve()

# Display the optimum value of each variable in the problem
for v in prob.variables ():
    print (v.name, "=", v.varValue)

# Set values in dataframe and print:
df['speed_opt'] = [df['speed_vars'][j].varValue for j in range(n_rows)]
df['ratio_opt'] = df.speed_opt/df.heading
print(df)

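The last part of the output looks like this (the column labels differ slightly from the names used in the code above):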

   heading speed_vars                                    b  spd_opt  rat_opt
0    184.0    speed_0  [b_0_0, b_1_0, b_2_0, b_3_0, b_4_0]     36.8      0.2
1    153.0    speed_1  [b_0_1, b_1_1, b_2_1, b_3_1, b_4_1]     30.6      0.2
2    140.0    speed_2  [b_0_2, b_1_2, b_2_2, b_3_2, b_4_2]     28.0      0.2
3    122.0    speed_3  [b_0_3, b_1_3, b_2_3, b_3_3, b_4_3]     24.4      0.2
4    119.0    speed_4  [b_0_4, b_1_4, b_2_4, b_3_4, b_4_4]     23.8      0.2

【Comments】: