Regression with hyperparameter optimization: answers

【Question title】: Regression with hyperparameter optimization
【Posted】: 2021-09-05 11:32:18
【Question description】:

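I'm trying to solve a time-series regression problem with PyTorch, using Optuna for hyperparameter optimization. I tried to adapt this example from the Optuna documentation, which was written for MNIST-style image classification.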

I customized the setup function of the FashionMNISTDataModule class so that it takes my pandas dataframe:

from sklearn.model_selection import train_test_split
...

class FashionMNISTDataModule(pl.LightningDataModule):
    
    ...

    def setup(self, stage: Optional[str] = None) -> None:

        #self.mnist_test = datasets.FashionMNIST(
        #    self.data_dir, train=False, download=0, transform=transforms.ToTensor()
        #)
        #mnist_full = datasets.FashionMNIST(
        #    self.data_dir, train=True, download=0, transform=transforms.ToTensor()
        #)

        # inputs
        X = df[['x1','x2','x3','x4']]

        # output
        y = df[['y']]

        # separate into training/testing data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=500)

        # convert to tensors
        X_train = torch.from_numpy(X_train.to_numpy()).float()
        y_train = torch.squeeze(torch.from_numpy(y_train.to_numpy()).float())
        X_test = torch.from_numpy(X_test.to_numpy()).float()
        y_test = torch.squeeze(torch.from_numpy(y_test.to_numpy()).float())

        self.mnist_test = X_test
        mnist_full = y_test

        self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])

    ...

My sample data is:

>> df

                    y       x1        x2         x3      x4  
Date                                                          
2018-03-05   73.68750   2204.0  108.6875   5.964844  2018.0   
2018-03-12   65.06250   2244.0  106.0000  11.164062  2102.0   
2018-03-19   61.28125   2240.0  106.8750   8.304688  2130.0   
2018-03-26   57.87500   2256.0  107.5625  16.750000  2154.0   
2019-03-04  173.37500   1826.0  113.8125  16.328125  2130.0   
2019-03-11  199.75000   1789.0  110.3750   6.386719  2038.0   
2019-03-18  206.25000   1809.0  109.6250   4.468750  1958.0   
2019-03-25  186.50000   1780.0  111.1875  17.375000  1949.0   
2020-03-02   63.81250   2586.0  113.2500   8.281250  2108.0   
2020-03-09   52.75000   2514.0  111.6875  12.937500  2088.0   
2020-03-16   72.12500   2468.0  109.7500  15.960938  2058.0   
2020-03-23   75.87500   2394.0  111.0000  18.890625  2023.0   
2020-03-30   51.71875   2298.0   95.1250  10.843750  2122.0

If I run the code with my new function (keeping everything else more or less the same), I get this error:

usage: ipykernel_launcher.py [-h] [--pruning]
ipykernel_launcher.py: error: unrecognized arguments: -f /home/<username>/.local/share/jupyter/runtime/kernel-027f206b-6952-4ff2-bdbe-c947aad00191.json

An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

This may be because I am passing tensors to a function that does not accept tensors, but I am not sure what to change.

If I run the code as a .py file instead of in JupyterLab, I get this error:

ValueError('Sum of input lengths does not equal the length of the input dataset!')
Traceback (most recent call last):
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\optuna\_optimize.py", line 216, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\<username>\Downloads\mymodel_v12beta.py", line 169, in objective
    trainer.fit(model, datamodule=datamodule)
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 458, in fit
    self._run(model)
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 713, in _run
    self.call_setup_hook(model)  # allow user to setup lightning_module in accelerator environment
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1159, in call_setup_hook
    self.datamodule.setup(stage=fn)
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\core\datamodule.py", line 384, in wrapped_fn
    return fn(*args, **kwargs)
  File "C:\Users\<username>\Downloads\mymodel_v12beta.py", line 129, in setup
    self.mnist_train, self.mnist_val = random_split(mnist_full, [10, 3])
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataset.py", line 332, in random_split
    raise ValueError("Sum of input lengths does not equal the length of the input dataset!")
ValueError: Sum of input lengths does not equal the length of the input dataset!
Traceback (most recent call last):
  File "C:\Users\<username>\Downloads\mymodel_v12beta.py", line 190, in <module>
    study.optimize(objective, n_trials=100, timeout=600)
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\optuna\study.py", line 401, in optimize
    _optimize(
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\optuna\_optimize.py", line 65, in _optimize
    _optimize_sequential(
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\optuna\_optimize.py", line 162, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\optuna\_optimize.py", line 267, in _run_trial
    raise func_err
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\optuna\_optimize.py", line 216, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\<username>\Downloads\mymodel_v12beta.py", line 169, in objective
    trainer.fit(model, datamodule=datamodule)
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 458, in fit
    self._run(model)
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 713, in _run
    self.call_setup_hook(model)  # allow user to setup lightning_module in accelerator environment
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1159, in call_setup_hook
    self.datamodule.setup(stage=fn)
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\core\datamodule.py", line 384, in wrapped_fn
    return fn(*args, **kwargs)
  File "C:\Users\<username>\Downloads\mymodel_v12beta.py", line 129, in setup
    self.mnist_train, self.mnist_val = random_split(mnist_full, [10, 3])
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\data\dataset.py", line 332, in random_split
    raise ValueError("Sum of input lengths does not equal the length of the input dataset!")
ValueError: Sum of input lengths does not equal the length of the input dataset!

【Question comments】:

Tags: python machine-learning optimization neural-network pytorch

【Solution 1】:

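The error seems to come from a conflict between the script's command-line argument parser and Jupyter: the notebook kernel is started with an extra -f <connection file>.json argument, which argparse does not recognize, so it exits with SystemExit: 2.

Have you tried running it from the command line (as a .py file rather than in a notebook)?

You can try some of the solutions here.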
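One common workaround, sketched here under the assumption that the script defines a single --pruning flag (as the usage line in the error suggests), is to call parse_known_args() instead of parse_args(). It returns unrecognized arguments as a list instead of exiting, so Jupyter's extra -f argument is simply ignored. The helper name parse_cli_args is hypothetical.

```python
import argparse


def parse_cli_args(argv=None):
    """Hypothetical helper: parse the script's flags, ignoring unknown ones.

    argv=None means "use sys.argv[1:]", the same default as parse_args().
    """
    parser = argparse.ArgumentParser(description="Optuna + PyTorch Lightning regression.")
    parser.add_argument(
        "--pruning",
        action="store_true",
        help="Activate Optuna's pruning feature.",
    )
    # parse_known_args() returns (namespace, leftovers) instead of calling
    # sys.exit(2) on arguments it does not know, such as Jupyter's "-f kernel.json".
    args, _unknown = parser.parse_known_args(argv)
    return args
```

In a notebook you could also pass an explicit list, for example parse_cli_args([]) or parse_cli_args(["--pruning"]), so that sys.argv is never consulted.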

【Comments】:

Thanks, I tried again with a .py file and got the error ValueError: Sum of input lengths does not equal the length of the input dataset!
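Do you have the full traceback, especially the line it points to? I think it may be this line: self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000]). You are no longer using the FashionMNIST dataset, so those lengths are wrong; you need to compute them from your own data. Also, I'm not sure using mnist_full here is what you intend, since it contains only the targets. It would be better to split your training set into training and validation sets, as you already did with train_test_split().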
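To illustrate both points, here is a minimal sketch, assuming the features and targets are already tensors as in the question's setup(). It pairs them in a TensorDataset so each sample carries both x and y (rather than only the targets), and derives the random_split lengths from len() so they always add up to the dataset size. The helper name split_train_val, the 20% validation fraction and the seed are hypothetical choices, not part of the original code.

```python
import torch
from torch.utils.data import TensorDataset, random_split


def split_train_val(features, targets, val_fraction=0.2, seed=500):
    """Hypothetical helper: split paired tensors into train/val subsets.

    The lengths come from the dataset itself, so their sum always equals
    len(dataset) and random_split() does not raise ValueError.
    """
    dataset = TensorDataset(features, targets)  # each item is an (x, y) pair
    n_total = len(dataset)
    # At least one validation sample, but always leave one for training.
    n_val = min(max(1, round(n_total * val_fraction)), n_total - 1)
    n_train = n_total - n_val  # remainder, so n_train + n_val == n_total
    generator = torch.Generator().manual_seed(seed)  # reproducible shuffle
    return random_split(dataset, [n_train, n_val], generator=generator)
```

Inside setup(), a call such as self.mnist_train, self.mnist_val = split_train_val(X_train, y_train) would replace the hard-coded [55000, 5000].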
I have added the error message to the post. I tried changing self.mnist_train, self.mnist_val = random_split(mnist_full, ...) to match the size of my dataset, but I get the same error.
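mnist_full = y_test holds only 0.2 × 13 of your rows, i.e. 2 or 3 depending on rounding (train_test_split rounds the test size up, so here it is 3). Your split [10, 3] adds up to 13, the size of the whole dataset, so it is still wrong. You could try:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=500)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=500)
and then convert these to tensors and assign them to self.mnist_train, self.mnist_val and self.mnist_test.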
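The two-step split suggested above can be sketched as a standalone helper, assuming the column names from the question's dataframe (y and x1 to x4). The names split_dataframe, FEATURES and TARGET and the default arguments are hypothetical; setup() could call the helper and assign the three results to self.mnist_train, self.mnist_val and self.mnist_test. With the 13-row sample data, train_test_split rounds each test size up, which gives 3 test rows, then 1 validation row and 9 training rows.

```python
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import TensorDataset

# Hypothetical constants matching the question's column names.
FEATURES = ["x1", "x2", "x3", "x4"]
TARGET = "y"


def _to_dataset(X: pd.DataFrame, y: pd.Series) -> TensorDataset:
    """Convert a feature frame and a target series into a float32 TensorDataset."""
    features = torch.from_numpy(X.to_numpy()).float()
    targets = torch.from_numpy(y.to_numpy()).float()  # already 1-D, no squeeze needed
    return TensorDataset(features, targets)


def split_dataframe(df, test_size=0.2, val_size=0.1, random_state=500):
    """Hypothetical helper: split df into train/val/test TensorDatasets."""
    X = df[FEATURES]
    y = df[TARGET]  # a Series rather than a one-column DataFrame
    # First hold out the test set, then carve validation out of what remains.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=val_size, random_state=random_state
    )
    return (
        _to_dataset(X_train, y_train),
        _to_dataset(X_val, y_val),
        _to_dataset(X_test, y_test),
    )
```

Because every sample is an (x, y) pair and the sizes come from the data, no hard-coded lengths are needed anywhere.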