【Question title】: Hyperopt mongotrials issue with Pickle: AttributeError: 'module' object has no attribute
【Posted】: 2017-05-13 10:27:01
【Question】:

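I am trying to use Hyperopt parallel search with MongoDB and have run into some issues with MongoTrials, which have been discussed here. I have tried all of the approaches suggested there, but I still cannot find a solution to my specific problem. The model I am trying to minimize is RandomForestRegressor from sklearn.

I followed this tutorial, and I can print out the computed "fmin" without any problem.

Here are my steps so far:

1) Activate a virtual environment named "tensorflow" (all my libraries are installed there)

2) Start MongoDB: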

(tensorflow) bash-3.2$ mongod --dbpath . --port 1234 --directoryperdb --journal --nohttpinterface

3) Start the worker:

(tensorflow) bash-3.2$ hyperopt-mongo-worker --mongo=localhost:1234/foo_db --poll-interval=0.1

4) Run my Python code, which is as follows:

import numpy as np
import pandas as pd

from sklearn.metrics import mean_absolute_error

from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from hyperopt.mongoexp import MongoTrials


# Preprocessing data
train_xg = pd.read_csv('train.csv')
n_train = len(train_xg)
print "Whole data set size: ", n_train

# Creating columns for features, and categorical features
features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
for c in range(len(cat_features_col)):
    train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes

# Use this to train random forest regressor
train_xg_x = np.array(train_xg[features_col])
train_xg_y = np.array(train_xg['loss'])


space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }

trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')

def minMe(params):
    # Hyperopt tuning for hyperparameters
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestRegressor
    from hyperopt import STATUS_OK

    try:
        import dill as pickle
        print('Went with dill')
    except ImportError:
        import pickle

    def hyperopt_rf(params):
        rf = RandomForestRegressor(**params)
        return cross_val_score(rf, train_xg_x, train_xg_y).mean()

    acc = hyperopt_rf(params)
    print 'new acc:', acc, 'params: ', params
    return {'loss': -acc, 'status': STATUS_OK}

best = fmin(fn=minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
print "Best: ", best

5) After running the Python code above, I get the following error:

INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 1234
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:no job found, sleeping for 0.7s
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'minMe'
Traceback (most recent call last):
  File "/Users/WernerChao/tensorflow/bin/hyperopt-mongo-worker", line 6, in <module>
    sys.exit(hyperopt.mongoexp.main_worker())
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
    return main_worker_helper(options, args)
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
    mworker.run_one(reserve_timeout=float(options.reserve_timeout))
  File "/Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
    domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'minMe'
INFO:hyperopt.mongoexp:exiting with N=9223372036854775803 after 4 consecutive exceptions

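6) The Mongo worker then shuts down.

Things I have tried:

  • Installing "dill" as the error message suggests -> did not work
  • Moving the global imports into the objective function so that it can be pickled -> did not work
  • Wrapping the import in try/except to use either "dill" or "pickle" -> did not work

Has anyone had a similar problem? I have run out of ideas and have been working on this in vain for 2 days. I think I am missing something really simple here, but I just can't seem to find it. What am I missing? Any suggestions are welcome!

【Comments】:

    Tags: mongodb python-2.7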


    【Solution 1】:

    Try installing Dill in your tensorflow Python environment (or, if it is different, the worker's Python environment)

    /Users/WernerChao/tensorflow/lib/python2.7/site-packages/hyperopt
    

    Your goal is to get rid of the hyperopt error message:

    hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
    

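    This happens because Python cannot marshal functions by default. It needs the dill library to extend Python's pickle module for serializing/deserializing Python objects. In your case, it cannot serialize your function minMe()

    【Discussion】:

      【Solution 2】:

      I fought with this for a few days before coming up with a working solution. There are two problems:

      1. The mongo worker spawns a separate process to run the optimizer, so any context from the original Python file is lost and is not available to this new process.
      2. The imports for this new process happen in the context of the hyperopt-mongo-worker script, which in your case would be /Users/WernerChao/tensorflow/bin/.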
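
The first point can be reproduced without hyperopt or MongoDB. Below is a minimal sketch, written for Python 3, that uses a throwaway module named driver_script (a hypothetical stand-in for the script that calls fmin, not something hyperopt provides). It illustrates that pickle stores a function as a module name plus a function name rather than as code, so unpickling fails with the same kind of AttributeError whenever the loading process sees a module of that name that lacks the function.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the driver script's module (not part of hyperopt).
driver = types.ModuleType("driver_script")
exec("def minMe(params):\n    return {'loss': 0.0}", driver.__dict__)
sys.modules["driver_script"] = driver

# Pickling records only "driver_script" and "minMe", not the function body.
blob = pickle.dumps(driver.minMe)
print(b"driver_script" in blob, b"minMe" in blob)

# Simulate the worker: a module of the same name exists but lacks minMe.
sys.modules["driver_script"] = types.ModuleType("driver_script")
try:
    pickle.loads(blob)
    error = None
except AttributeError as exc:
    error = str(exc)
print("worker-side error:", error)

# Once the defining module is importable again, the same bytes load fine.
sys.modules["driver_script"] = driver
restored = pickle.loads(blob)
print(restored is driver.minMe)
```

In the question, the module in that role is __main__: in the driver it is the user's script, while in the worker it is the hyperopt-mongo-worker script, which has no minMe.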

      So my solution was to make this new optimizer function completely self-sufficient

      optimizer.py

      import numpy as np
      import pandas as pd
      
      from sklearn.metrics import mean_absolute_error
      
      # Preprocessing data
      train_xg = pd.read_csv('train.csv')
      n_train = len(train_xg)
      print "Whole data set size: ", n_train
      
      # Creating columns for features, and categorical features
      features_col = [x for x in train_xg.columns if x not in ['id', 'loss', 'log_loss']]
      cat_features_col = [x for x in train_xg.select_dtypes(include=['object']).columns if x not in ['id', 'loss', 'log_loss']]
      for c in range(len(cat_features_col)):
          train_xg[cat_features_col[c]] = train_xg[cat_features_col[c]].astype('category').cat.codes
      
      # Use this to train random forest regressor
      train_xg_x = np.array(train_xg[features_col])
      train_xg_y = np.array(train_xg['loss'])
      
      
      
      def minMe(params):
          # Hyperopt tuning for hyperparameters
          from sklearn.model_selection import cross_val_score
          from sklearn.ensemble import RandomForestRegressor
          from hyperopt import STATUS_OK
      
          try:
              import dill as pickle
              print('Went with dill')
          except ImportError:
              import pickle
      
          def hyperopt_rf(params):
              rf = RandomForestRegressor(**params)
              return cross_val_score(rf, train_xg_x, train_xg_y).mean()
      
          acc = hyperopt_rf(params)
          print 'new acc:', acc, 'params: ', params
          return {'loss': -acc, 'status': STATUS_OK}
      

      wrapper.py

      from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
      from hyperopt.mongoexp import MongoTrials
      
      import optimizer
      
      space_rf = { 'min_samples_leaf': hp.choice('min_samples_leaf', range(1,100)) }
      
      # Create the MongoTrials object before passing it to fmin
      trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')
      
      best = fmin(fn=optimizer.minMe, space=space_rf, trials=trials, algo=tpe.suggest, max_evals=100)
      print "Best: ", best
      

      Once you have this code, link optimizer.py into the bin folder

      ln -s /Users/WernerChao/Git/test/optimizer.py /Users/WernerChao/tensorflow/bin/
      

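      Now run wrapper.py and then run the mongo worker; it should be able to import optimizer from its local context and run the minMe function.

      【Discussion】:

        【Solution 3】:

        I ran into the same problem with Python 3.5. Installing Dill did not help, and neither did setting workdir in MongoTrials or in the hyperopt-mongo-worker CLI. hyperopt-mongo-worker does not seem to have access to the __main__ in which the function is defined: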

        AttributeError: Can't get attribute 'minMe' on <module '__main__' from ...hyperopt-mongo-worker
        

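        As @jaikumarm suggested, I worked around the problem by writing a module file that contains all the required functions. However, instead of soft-linking it into the bin directory, I extended PYTHONPATH before running hyperopt-mongo-worker: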

        export PYTHONPATH="${PYTHONPATH}:<dir_with_the_module.py>"
        hyperopt-mongo-worker ...
        

        This way, hyperopt-mongo-worker can import the module that contains minMe.
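
Before starting the worker, it can help to confirm that the module really is importable from the worker's environment. The following is a minimal sketch for Python 3, assuming the module is named optimizer as in the earlier answer (adjust the name to your own file); module_is_importable is a hypothetical helper, not part of hyperopt. Run it with the same virtual environment and PYTHONPATH that the worker will use.

```python
import importlib.util


def module_is_importable(name):
    """Return True if the top-level module `name` can be found on sys.path."""
    # find_spec locates a top-level module without executing its code.
    return importlib.util.find_spec(name) is not None


# "optimizer" is the module name assumed from the answer above.
for name in ("json", "optimizer"):
    print(name, "importable:", module_is_importable(name))
```

If the check reports that the module is not importable, the worker would most likely fail to unpickle the objective for the same reason.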

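        【Discussion】:

        • This approach worked for me. What made sense in my implementation was to separate the hyperparameter logic so that it only handles the input parameters, and to have a separate model_runner module that receives the parameters, reads the model file, and trains the model.
        【Solution 4】:

        I made a separate file that computes the loss and copied it to /anaconda2/bin/ and /anaconda2/lib/python2.7/site-packages/hyperopt, and it works fine.

        Here is my traceback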

        Traceback (most recent call last):
        File "/home/greatskull/anaconda2/bin/hyperopt-mongo-worker", line 6, in <module>
        sys.exit(hyperopt.mongoexp.main_worker())
        File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
        return main_worker_helper(options, args)
        File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
        mworker.run_one(reserve_timeout=float(options.reserve_timeout))
        File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1073, in run_one
        with temp_dir(workdir, erase_created_workdir), working_dir(workdir):
        File "/home/greatskull/anaconda2/lib/python2.7/contextlib.py", line 17, in __enter__
        return self.gen.next()
        File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/utils.py", line 229, in temp_dir
        os.makedirs(dir)
        File "/home/greatskull/anaconda2/lib/python2.7/os.py", line 150, in makedirs
        makedirs(head, mode)
        File "/home/greatskull/anaconda2/lib/python2.7/os.py", line 157, in makedirs
        mkdir(name, mode)
        

        【Discussion】:
