Optimization function across multiple factors: answers

【Question title】: Optimization function across multiple factors
【Posted】: 2019-04-06 21:46:09
【Question description】:

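I am trying to determine the thresholds for two activities that yield the highest success rate.

Below is an example of what I am trying to accomplish. For each location, I want to determine the thresholds to use for activity 1 and activity 2, so that if either criterion is met we guess "yes" (1). I then need to make sure that we only guess "yes" for a certain percentage of each location's total volume, and that we maximize our accuracy (a guess of "yes" coincides with an "outcome" of 1).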

set.seed(145)  # set the seed before sampling so the sample data is reproducible
location <- c(1,2,3)
# the 20 sampled values in each column are recycled across the 60 rows
testFile <- data.frame(location = rep.int(location, 20),
                       activity1 = round(rnorm(20, mean = 10, sd = 3)),
                       activity2 = round(rnorm(20, mean = 20, sd = 3)),
                       outcome = rbinom(20,1,0.5)
                       )
act_1_thresholds <- seq(7,12,1)
act_2_thresholds <- seq(19,24,1)

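I was able to do this by creating a table of all possible unique threshold combinations for activities 1 and 2 and then merging it with every observation in the sample data set. However, the actual data set has about 200 locations, each with thousands of observations, and I quickly ran out of space.

I would like to create a function that takes the location ID and a set of possible thresholds for activity 1 and activity 2, then calculates how often we would guess "yes" (i.e. the value in 'activity1' or 'activity2' meets or exceeds the respective threshold being tested), to make sure our application rate stays within the desired range (50% - 75%). Then, among the sets of thresholds that yield an application rate within the desired range, we want to store only the set that maximizes accuracy, together with its location ID, application rate and accuracy rate. The desired output is listed below.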

  location act_1_thresh act_2_thresh application_rate accuracy_rate
1        1           13           19             0.52          0.45
2        2           11           24             0.57          0.53
3        3           14           21             0.67          0.42

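I tried writing this as a for loop, but I could not work through all of these conditions given the number of nested arguments I would have to create. I would appreciate help from anyone who has tackled a similar problem. Thanks!

Below is an example of how to calculate the application rate and accuracy rate for a single set of thresholds.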

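library(data.table)

### Create yard IDs
location <- c(1,2,3)

### Create a single set of thresholds
single_act_1_threshold <- 12
single_act_2_threshold <- 20

### Calculate the simulated application rate and success rate of the thresholds above using historical data
### Note: nrow(testFile) is the row count across all locations; .N would give the per-location row count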
as.data.table(testFile)[,
                        list(
                        application_rate = round(sum(ifelse(single_act_1_threshold <= activity1 | single_act_2_threshold <= activity2, 1, 0))/
                                                   nrow(testFile),2),
                        accuracy_rate = round(sum(ifelse((single_act_1_threshold <= activity1 | single_act_2_threshold <= activity2) & (outcome == 1), 1, 0))/
                                                sum(ifelse(single_act_1_threshold <= activity1 | single_act_2_threshold <= activity2, 1, 0)),2)
                        ),
                        by = location]
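The same two rates can be written more compactly by taking the mean of a logical vector. The following is only a sketch under stated assumptions: the `toy` table is hypothetical sample data with 20 independent observations per location (not the poster's `testFile`), and the application rate is assumed to be meant per location, so the denominator is the group size rather than `nrow(testFile)`.

```r
library(data.table)

set.seed(145)  # seed set before sampling so the toy data is reproducible

# Hypothetical toy data (an assumption, not the original testFile)
toy <- data.table(
  location  = rep(1:3, each = 20),
  activity1 = round(rnorm(60, mean = 10, sd = 3)),
  activity2 = round(rnorm(60, mean = 20, sd = 3)),
  outcome   = rbinom(60, 1, 0.5)
)

th1 <- 12  # threshold for activity1
th2 <- 20  # threshold for activity2

rates <- toy[, {
  # TRUE where either activity meets or exceeds its threshold
  guess_yes <- activity1 >= th1 | activity2 >= th2
  list(
    # share of this location's rows for which we guess "yes"
    application_rate = round(mean(guess_yes), 2),
    # share of the "yes" guesses whose outcome is 1 (NaN if there are no "yes" guesses)
    accuracy_rate = round(mean(outcome[guess_yes] == 1), 2)
  )
}, by = location]

rates
```

Because `mean()` of a logical vector is the proportion of `TRUE` values, it replaces the `sum(ifelse(..., 1, 0)) / n` pattern, and the per-group denominator comes for free inside `by = location`.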

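【Question discussion】:

What is location in the sample data (i.e. rep.int(location, 20))? Also, please show (rather than describe in words) the code that calculates your thresholds and rates for one location ID, so that we can help scale it to all IDs.
Please see the adjusted description and script above. Note that this only shows the calculation of the application rate and accuracy rate for a single set of thresholds. I would like to be able to iterate over the unique combinations of the two sets of thresholds.

Tags: r for-loop optimization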

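【Solution 1】:

Consider expand.grid, which builds a data frame of all combinations of the two threshold vectors. Then use Map to iterate element-wise over the two columns of that data frame and build a list of data tables (each of which now includes a column for each threshold).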

library(data.table)

act_1_thresholds <- seq(7,12,1)
act_2_thresholds <- seq(19,24,1)

# ALL COMBINATIONS
thresholds_df <- expand.grid(th1=act_1_thresholds, th2=act_2_thresholds)

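# USER-DEFINED FUNCTION
# (as in the question, application_rate divides by nrow(testFile), the row count across all locations)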
calc <- function(th1, th2)
     as.data.table(testFile)[, list(
                                  act_1_thresholds = th1,     # NEW COLUMN
                                  act_2_thresholds = th2,     # NEW COLUMN                      
                                  application_rate = round(sum(ifelse(th1 <= activity1 | th2 <= activity2, 1, 0)) /
                                                           nrow(testFile),2),
                                  accuracy_rate = round(sum(ifelse((th1 <= activity1 | th2 <= activity2) & (outcome == 1), 1, 0)) /
                                                        sum(ifelse(th1 <= activity1 | th2 <= activity2, 1, 0)),2)
                                ), by = location]    
# LIST OF DATA TABLES
dt_list <- Map(calc, thresholds_df$th1, thresholds_df$th2)

# NAME ELEMENTS OF LIST
names(dt_list) <- paste(thresholds_df$th1, thresholds_df$th2, sep="_")

# SAME RESULT AS POSTED EXAMPLE
dt_list$`12_20`  
#    location act_1_thresholds act_2_thresholds application_rate accuracy_rate
# 1:        1               12               20             0.23           0.5
# 2:        2               12               20             0.23           0.5
# 3:        3               12               20             0.23           0.5

If you need to append all of the elements together, use data.table's rbindlist:

final_dt <- rbindlist(dt_list)
final_dt

#      location act_1_thresholds act_2_thresholds application_rate accuracy_rate
#   1:        1                7               19             0.32          0.47
#   2:        2                7               19             0.32          0.47
#   3:        3                7               19             0.32          0.47
#   4:        1                8               19             0.32          0.47
#   5:        2                8               19             0.32          0.47
#  ---                                                                          
# 104:        2               11               24             0.20          0.42
# 105:        3               11               24             0.20          0.42
# 106:        1               12               24             0.15          0.56
# 107:        2               12               24             0.15          0.56
# 108:        3               12               24             0.15          0.56
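The question also asks to keep, for each location, only the threshold pair with the highest accuracy among those whose application rate falls between 50% and 75%. One way this last step might look is sketched below. It is not part of the original answer: the `toy` data, the `score` helper and the per-location denominator are all assumptions made for illustration.

```r
library(data.table)

set.seed(145)  # seed set before sampling so the toy data is reproducible

# Hypothetical toy data (an assumption): 20 independent observations per location
toy <- data.table(
  location  = rep(1:3, each = 20),
  activity1 = round(rnorm(60, mean = 10, sd = 3)),
  activity2 = round(rnorm(60, mean = 20, sd = 3)),
  outcome   = rbinom(60, 1, 0.5)
)

# All combinations of the two threshold vectors
grid <- expand.grid(th1 = 7:12, th2 = 19:24)

# Hypothetical helper: one small summary row per location for a threshold pair.
# Only summaries are kept, so the raw data is never merged with the grid.
score <- function(th1, th2) {
  toy[, {
    guess_yes <- activity1 >= th1 | activity2 >= th2
    list(
      act_1_thresh     = th1,
      act_2_thresh     = th2,
      application_rate = mean(guess_yes),                 # per-location share of "yes" guesses
      accuracy_rate    = mean(outcome[guess_yes] == 1)    # share of "yes" guesses with outcome 1
    )
  }, by = location]
}

all_rates <- rbindlist(Map(score, grid$th1, grid$th2))

# Keep threshold pairs whose application rate is inside the desired range
in_range <- all_rates[application_rate >= 0.50 & application_rate <= 0.75]

# For each location, keep the single row with the highest accuracy
best <- in_range[, .SD[which.max(accuracy_rate)], by = location]
best
```

A location with no threshold pair inside the range simply drops out of `best`, and ties in accuracy are resolved by taking the first pair in grid order; both behaviours may need adjusting for real data.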

【Discussion】: