【Question title】: Using balance_classes on AutoML H2O generated error "java.lang.IllegalArgumentException: Error during sampling - too few points?"
【Posted】: 2019-12-10 18:22:15
【Question】:


I am trying to train a multiclass problem with AutoML H2O, with nfolds=5 and balance_classes enabled:

The data frame has three distinct labels:

target           Count
-------------  -------
não conhecido     3789
não provido      11039
provido           3225

[3 rows x 2 columns]

All models fail with the message "java.lang.IllegalArgumentException: Error during sampling - too few points?".

I don't think there are too few points. Can anyone explain this problem?
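To put numbers on "not too few points", here is a small sketch that works only from the counts in the table above. It computes each class's share, its ratio to the largest class, and a rough per-fold count for nfolds=5. The per-fold figure assumes an even split, which is an illustration and not a statement about how H2O assigns folds or performs its balancing.

```python
# Class counts copied from the table in the question.
counts = {"não conhecido": 3789, "não provido": 11039, "provido": 3225}
nfolds = 5  # value stated in the question

total = sum(counts.values())      # matches the 18053 rows reported in the log
largest = max(counts.values())

summary = {}
for label, n in counts.items():
    summary[label] = {
        "share": n / total,                 # fraction of all rows
        "ratio_to_largest": largest / n,    # how many times smaller than the majority class
        "rows_per_fold": n // nfolds,       # rough count, assuming an even split across folds
    }

for label, stats in summary.items():
    print(f"{label:15s} share={stats['share']:.3f} "
          f"ratio_to_largest={stats['ratio_to_largest']:.2f} "
          f"rows_per_fold={stats['rows_per_fold']}")
```

Even the smallest class keeps several hundred rows per fold under this assumption, which is consistent with the impression that the data set is not small.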

Parameters used:

        include_algos = ["DRF", "GBM", "StackedEnsemble"],
        seed=1234,
        nfolds = nfolds,
        balance_classes = True,
        max_runtime_secs = 86400,
        max_models=8,
        max_runtime_secs_per_model = 1200,
        keep_cross_validation_predictions = True,
        verbosity = "debug",
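For context, the parameters above would typically be passed to `H2OAutoML` roughly as in the sketch below. The surrounding call is not shown in the question, so the function `run_automl`, its `train_frame` argument, and the use of `nfolds=5` (taken from the question text) are assumptions; the response column name `target` comes from the log.

```python
# Keyword arguments as listed in the question; nfolds is set to 5 per the question text.
automl_params = dict(
    include_algos=["DRF", "GBM", "StackedEnsemble"],
    seed=1234,
    nfolds=5,
    balance_classes=True,
    max_runtime_secs=86400,
    max_models=8,
    max_runtime_secs_per_model=1200,
    keep_cross_validation_predictions=True,
    verbosity="debug",
)


def run_automl(train_frame, response="target"):
    """Hypothetical wrapper: train AutoML on an existing H2OFrame."""
    # Imported lazily; requires the h2o package and a running H2O cluster.
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(**automl_params)
    aml.train(y=response, training_frame=train_frame)
    return aml
```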

Log:

Executando o treinamento do modelo do problema < tipo_decisao >...
AutoML progress: |
02:51:01.681: Project: automl_py_488_sid_932d
02:51:01.681: AutoML job created: 2019.12.10 02:51:01.680
02:51:01.681: Disabling Algo: DeepLearning as requested by the user.
02:51:01.682: Disabling Algo: XGBoost as requested by the user.
02:51:01.682: Disabling Algo: GLM as requested by the user.
02:51:01.682: Build control seed: 1234
02:51:01.706: training frame: Frame key: automl_training_py_488_sid_932d    cols: 1225    rows: 18053  chunks: 200    size: 192349542  checksum: 7379304490974335888
02:51:01.706: validation frame: NULL
02:51:01.706: leaderboard frame: NULL
02:51:01.706: blending frame: NULL
02:51:01.706: response column: target
02:51:01.706: fold column: null
02:51:01.706: weights column: null
02:51:01.737: Setting stopping tolerance adaptively based on the training frame: 0.007442610801832542
02:51:01.799: AutoML build started: 2019.12.10 02:51:01.799

█
02:51:04.812: Default Random Forest build failed: java.lang.IllegalArgumentException: Error during sampling - too few points?

██
02:51:07.831: GBM 1 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?

██
02:51:10.844: GBM 2 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?

██████
02:51:14.878: GBM 3 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?

███
02:51:18.897: GBM 4 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?

███
02:51:19.915: GBM 5 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?

███
02:51:22.954: Extremely Randomized Trees (XRT) Random Forest build failed: java.lang.IllegalArgumentException: Error during sampling - too few points?
02:51:22.954: AutoML: starting GBM hyperparameter search

████████████████████████████████████| 100%

02:51:41.57: No models were built, due to timeouts or the exclude_algos option. StackedEnsemble builds skipped.
02:51:41.57: AutoML build stopped: 2019.12.10 02:51:41.57
02:51:41.57: AutoML build done: built 0 models
02:51:41.57: AutoML duration: 39.258 sec

【Comments】:

    Tags: h2o sampling


    【Answer 1】:

    I checked the source code, and it does not look like this is caused by having too few observations.

    Could you run just a single GBM model with balance_classes enabled and provide the H2O logs? http://docs.h2o.ai/h2o/latest-stable/h2o-docs/logs.html#logging-in-python
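A minimal sketch of that reproduction might look like the following. The file path argument `train_path` and the function name are hypothetical; `nfolds=5` and `seed=1234` simply mirror the AutoML call in the question, and `target` is the response column named in the log. The training error is caught so that the logs are still downloaded when the model fails.

```python
# Settings for the single-model reproduction, mirroring the question's AutoML call.
gbm_params = dict(balance_classes=True, nfolds=5, seed=1234)


def reproduce_with_single_gbm(train_path, log_dir="."):
    """Train one GBM with balance_classes=True, then download the H2O logs.

    train_path is a placeholder for wherever the training data lives.
    Returns the path of the downloaded log archive.
    """
    # Imported lazily; requires the h2o package and a reachable H2O cluster.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()
    train = h2o.import_file(train_path)
    # Make sure the response is treated as categorical (multiclass).
    train["target"] = train["target"].asfactor()

    gbm = H2OGradientBoostingEstimator(**gbm_params)
    try:
        gbm.train(y="target", training_frame=train)
    except Exception as err:
        # The failure itself is what the logs should capture, so keep going.
        print(f"GBM training failed: {err}")

    return h2o.download_all_logs(dirname=log_dir)
```

The returned archive is what the answer asks to be shared.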

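    I'm not sure the current logs will give us enough information to resolve this, but I will make a change so that more information is included in the next release.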

    【Comments】:
