How to sample parameters without duplicates in optuna?

【Question title】: How to sample parameters without duplicates in optuna?
【Posted】: 2019-11-12 14:22:38
【Question description】:

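I am using optuna for parameter optimisation of my custom models.

Is there any way to keep sampling parameters until the current parameter set is one that has not been tested before? I mean, if some past trial already used the same set of parameters, try sampling another set.

In some cases this is impossible, for example when there is a categorical distribution and n_trials is greater than the number of possible unique sampled values.

What I want: a config parameter such as num_attempts, so that parameters are sampled up to num_attempts times in a for-loop until a set comes up that has not been tested before; otherwise, the trial runs on the last sampled set.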
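The retry idea described above can be sketched in plain Python, independent of Optuna. This is only an illustration under stated assumptions: `sample_unseen`, `sample_fn`, and the `seen` list are hypothetical names, not part of the Optuna API, and a parameter set is assumed to be a plain dict.

```python
import random


def sample_unseen(sample_fn, seen, num_attempts):
    """Sample up to num_attempts times until an untested set appears.

    Falls back to the last sampled set if every attempt was a duplicate.
    """
    if num_attempts < 1:
        raise ValueError("num_attempts must be at least 1")
    params = None
    for _ in range(num_attempts):
        params = sample_fn()
        if params not in seen:  # dict equality, checked against each past set
            break
    seen.append(params)
    return params


# Hypothetical search space, used only for this illustration.
def sample_fn():
    return {"x": random.randint(0, 10), "y": random.choice([-10, -5, 0, 5, 10])}


seen = []
first = sample_unseen(sample_fn, seen, num_attempts=5)
second = sample_unseen(sample_fn, seen, num_attempts=5)
```

With a small search space and many past trials, every attempt may be a duplicate, in which case the last sampled set is returned anyway, as the description asks.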

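Why I need this: simply because it costs too much to run heavy models several times on the same parameters.

What I do now: just this "for-loop" thing, but it is messy.

If there is another, smarter way to do it, I would be very grateful for the information.

Thanks!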

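【Question comments】:

Tags: python optuna

【Solution 1】:

As far as I know, there is currently no direct way to handle your case. As a workaround, you can check for duplicated parameters and skip the evaluation, as follows: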

import optuna
# Newer Optuna releases expose TrialState here; the original answer
# used `optuna.structs.TrialState`.
from optuna.trial import TrialState


def objective(trial: optuna.Trial):
    # Sample parameters.
    x = trial.suggest_int('x', 0, 10)
    y = trial.suggest_categorical('y', [-10, -5, 0, 5, 10])

    # Check for duplication and skip the evaluation if it is detected.
    for t in trial.study.trials:
        if t.state != TrialState.COMPLETE:
            continue

        if t.params == trial.params:
            # Return the previous value without re-evaluating it.
            #
            # Note that if duplicate parameter sets are suggested too frequently,
            # you can use the pruning mechanism of Optuna to mitigate the problem.
            # By raising `TrialPruned` instead of just returning the previous value,
            # the sampler is more likely to avoid sampling the parameters in the
            # succeeding trials:
            #
            # raise optuna.TrialPruned('Duplicate parameter set')
            return t.value

    # Evaluate parameters.
    return x + y


# Start study.
study = optuna.create_study()

# Run one trial at a time until 20 unique parameter sets have been tried.
unique_trials = 20
while unique_trials > len(set(str(t.params) for t in study.trials)):
    study.optimize(objective, n_trials=1)
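【Comments】:

Thank you! This looks much cleaner than my custom loop. The only problem I see is that if duplicated values are sampled, the current trial is "used up" and there is no attempt to sample non-duplicated values (I mean, the number of unique trials will be lower). That is why I don't want to use pruning: the actual number of n_trials becomes lower than expected. But anyway, if there is no such option, I will use your approach, as it looks more concise :)
I'm glad my answer helped you. By the way, to solve the issue you mentioned (i.e., "the actual number of n_trials becomes lower than expected"), I slightly updated my example code. Please see the bottom three lines of the code.
Cool! Thanks a lot :) This is exactly what I wanted! It definitely looks much better than messy for-loops.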

【Solution 2】:

To second @sile's code comment, you could write a pruner such as:

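from optuna.pruners import BasePruner
from optuna.study import Study
from optuna.trial import FrozenTrial, TrialState


class RepeatPruner(BasePruner):
    # Optuna only consults the pruner when the objective calls
    # `trial.should_prune()`.
    def prune(self, study: Study, trial: FrozenTrial) -> bool:
        # Parameter sets of all trials that have already completed.
        trials = study.get_trials(deepcopy=False)
        completed_params = [t.params for t in trials if t.state == TrialState.COMPLETE]

        # Prune the current trial if its parameters were already evaluated.
        # With no completed trials the list is empty, so this returns False.
        return trial.params in completed_params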

Then pass the pruner when creating the study:

study = optuna.create_study(study_name=study_name, storage=storage, load_if_exists=True, pruner=RepeatPruner())

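【Comments】:

Thanks for your answer! One question: am I right that if I set n_trials = 5 and two of the sampled parameter sets are identical (so 4 unique sets are sampled), then with the pruner I will only have 4 real attempts/trials? I mean, the pruner drops the duplicated sample and does not try to resample another parameter set.
You can still use @sile's while loop to force the optimizer to try the exact number of times you want. Hope that answers your question.
Note that pruners are not supported for multi-objective optimization.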
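The difference discussed in these comments, a fixed n_trials budget versus @sile's while loop, can be illustrated without Optuna. The sketch below is a plain-Python simulation under stated assumptions: the function names and the `stream` of sampled values are hypothetical, and each value stands in for one sampled parameter set.

```python
def unique_with_fixed_budget(samples, n_trials):
    """Consume exactly n_trials samples; duplicates still use up budget."""
    seen = set()
    for _ in range(n_trials):
        seen.add(next(samples))
    return len(seen)


def unique_with_while_loop(samples, unique_target, max_total):
    """Keep sampling until unique_target distinct values have been seen.

    max_total is a safety cap so the loop ends even if the search space
    holds fewer than unique_target distinct values.
    """
    seen = set()
    total = 0
    while len(seen) < unique_target and total < max_total:
        seen.add(next(samples))
        total += 1
    return len(seen), total


# Hypothetical stream of sampled parameter sets; "b" is drawn twice.
stream = ["a", "b", "b", "c", "d", "e"]

print(unique_with_fixed_budget(iter(stream), n_trials=5))      # 4
print(unique_with_while_loop(iter(stream), 5, max_total=100))  # (5, 6)
```

The cap reflects the case raised in the question, where the search space has fewer distinct combinations than the requested number of unique trials; without some limit of that kind, a loop that waits for more unique sets than exist would never finish.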