【Question Title】: Keras/Theano: Node compilation failed during training
【Posted】: 2017-03-04 06:18:32
【Question】:

I'm trying to train a compiled Keras model on Mac OS X, but I get the following error:

Problem occurred during compilation with the command line below:
/usr/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -undefined dynamic_lookup -I/usr/local/lib/python2.7/site-packages/numpy/core/include -I/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/usr/local/lib/python2.7/site-packages/theano/gof -L/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib -fvisibility=hidden -o /Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/c6acccb2fd68eac67ca5b0f0fb9ad9bb.so /Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp
/Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp:894:21: warning: comparison of array 'outputs' equal to a null pointer is always false [-Wtautological-pointer-compare]
                if (outputs == NULL) {
                    ^~~~~~~    ~~~~
/Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp:919:54: error: arithmetic on a pointer to void
                                    PyArray_DATA(V3) + data_offset,
                                    ~~~~~~~~~~~~~~~~ ^
1 warning and 1 error generated.

Traceback (most recent call last):
  File "osr.py", line 359, in <module>
    osr.train_osr_model()
  File "osr.py", line 88, in train_osr_model
    nb_worker=1)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/training.py", line 1454, in fit_generator
    self._make_train_function()
  File "/usr/local/lib/python2.7/site-packages/keras/engine/training.py", line 767, in _make_train_function
    **self._function_kwargs)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 969, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 955, in __init__
    **kwargs)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function.py", line 326, in function
    output_keys=output_keys)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1795, in orig_function
    defaults)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1661, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "/usr/local/lib/python2.7/site-packages/theano/gof/vm.py", line 1063, in make_all
    impl=impl))
  File "/usr/local/lib/python2.7/site-packages/theano/gof/op.py", line 924, in make_thunk
    no_recycling)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/op.py", line 828, in make_c_thunk
    output_storage=node_output_storage)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1190, in make_thunk
    keep_lock=keep_lock)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1131, in __compile__
    keep_lock=keep_lock)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1586, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1155, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1489, in compile_cmodule
    preargs=preargs)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 2304, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', Split{4}(InplaceDimShuffle{1,0,2}.0, TensorConstant{2}, TensorConstant{(4,) of 256}), '\n', "Compilation failed (return status=1): /Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp:894:21: warning: comparison of array 'outputs' equal to a null pointer is always false [-Wtautological-pointer-compare].                 if (outputs == NULL) {.                     ^~~~~~~    ~~~~. /Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp:919:54: error: arithmetic on a pointer to void.                                     PyArray_DATA(V3) + data_offset,.                                     ~~~~~~~~~~~~~~~~ ^. 1 warning and 1 error generated.. ", '[*1 -> Split{4}(<TensorType(float32, 3D)>, TensorConstant{2}, TensorConstant{(4,) of 256}), *1::1, *1::2, *1::3]')

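I have updated Keras and Theano, but the problem persists. I'm confused, because training the exact same model worked fine just a few days ago. Here are the functions used during training: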

def train_osr_model(self):
    """ Train the optical speech recognizer
    """
    print "\nTraining OSR"
    validation_ratio = 0.3
    batch_size = 32
    with h5py.File(self.training_save_fn, "r") as training_save_file:
        sample_count = int(training_save_file.attrs["sample_count"])
        sample_idxs = range(0, sample_count)
        sample_idxs = np.random.permutation(sample_idxs)
        training_sample_idxs = sample_idxs[0:int((1-validation_ratio)*sample_count)]
        validation_sample_idxs = sample_idxs[int((1-validation_ratio)*sample_count):]
        training_sequence_generator = self.generate_training_sequences(batch_size=batch_size, 
                                                                       training_save_file=training_save_file,
                                                                       training_sample_idxs=training_sample_idxs)
        validation_sequence_generator = self.generate_validation_sequences(batch_size=batch_size, 
                                                                           training_save_file=training_save_file,
                                                                           validation_sample_idxs=validation_sample_idxs)

        print "Sample Idxs: {0}\n".format(sample_idxs) # FOR DEBUG ONLY
        print "Training Idxs: {0}\n".format(training_sample_idxs) # FOR DEBUG ONLY
        print "Validation Idxs: {0}\n".format(validation_sample_idxs) # FOR DEBUG ONLY

        pbi = ProgressDisplay()
        self.osr.fit_generator(generator=training_sequence_generator,
                               validation_data=validation_sequence_generator,
                               samples_per_epoch=len(training_sample_idxs),
                               nb_val_samples=len(validation_sample_idxs),
                               nb_epoch=10,
                               max_q_size=1,
                               verbose=2,
                               callbacks=[pbi],
                               class_weight=None,
                               nb_worker=1)

def generate_training_sequences(self, batch_size, training_save_file, training_sample_idxs):
    """ Generates training sequences from HDF5 file on demand
    """
    while True:
        # generate sequences for training
        training_sample_count = len(training_sample_idxs)
        batches = int(training_sample_count/batch_size)
        remainder_samples = training_sample_count%batch_size
        if remainder_samples:
            batches = batches + 1
        # generate batches of samples
        for idx in xrange(0, batches):
            if idx == batches - 1:
                batch_idxs = training_sample_idxs[idx*batch_size:]
            else:
                batch_idxs = training_sample_idxs[idx*batch_size:idx*batch_size+batch_size]

            print batch_idxs # FOR DEBUG ONLY

            X = training_save_file["X"][batch_idxs]
            Y = training_save_file["Y"][batch_idxs]

            yield (np.array(X), np.array(Y))

def generate_validation_sequences(self, batch_size, training_save_file, validation_sample_idxs):
    while True:
        # generate sequences for validation
        validation_sample_count = len(validation_sample_idxs)
        batches = int(validation_sample_count/batch_size)
        remainder_samples = validation_sample_count%batch_size
        if remainder_samples:
            batches = batches + 1
        # generate batches of samples
        for idx in xrange(0, batches):
            if idx == batches - 1:
                batch_idxs = validation_sample_idxs[idx*batch_size:]
            else:
                batch_idxs = validation_sample_idxs[idx*batch_size:idx*batch_size+batch_size]

            print batch_idxs # FOR DEBUG ONLY

            X = training_save_file["X"][batch_idxs]
            Y = training_save_file["Y"][batch_idxs]

            yield (np.array(X), np.array(Y))
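Both generators above repeat the same batching logic. The following is a possible refactor, offered only as a sketch: `iter_batch_indices` and `generate_sequences` are hypothetical names, not part of Keras. It also sorts each batch, on the assumption that the data is read with h5py, whose fancy indexing expects indices in increasing order. This is unrelated to the Theano compilation error itself.

```python
import numpy as np


def iter_batch_indices(sample_idxs, batch_size):
    """Yield successive batches of indices; the last batch may be smaller.

    Each batch is sorted because h5py fancy indexing expects increasing
    indices. X and Y are read with the same sorted batch, so they stay aligned.
    """
    sample_idxs = np.asarray(sample_idxs)
    # ceil(len / batch_size) batches, the same count the original loop computes
    batches = (len(sample_idxs) + batch_size - 1) // batch_size
    for idx in range(batches):
        yield np.sort(sample_idxs[idx * batch_size:(idx + 1) * batch_size])


def generate_sequences(h5_file, sample_idxs, batch_size):
    """Endless generator of (X, Y) batches, like the two generators above."""
    while True:
        for batch_idxs in iter_batch_indices(sample_idxs, batch_size):
            X = h5_file["X"][batch_idxs]
            Y = h5_file["Y"][batch_idxs]
            yield np.array(X), np.array(Y)
```

With this helper, the training and validation generators differ only in which index array they receive, e.g. `generate_sequences(training_save_file, training_sample_idxs, batch_size)`.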

For reference, here is the model being trained:

def generate_osr_model(self):
    """ Builds the optical speech recognizer model
    """
    print "".join(["\nGenerating OSR model\n",
                   "-"*40])
    with h5py.File(self.training_save_fn, "r") as training_save_file:
        class_count = len(training_save_file.attrs["training_classes"].split(","))
    video = Input(shape=(self.frames_per_sequence,
                         3,
                         self.rows,
                         self.columns))
    cnn_base = VGG16(input_shape=(3,
                                  self.rows, 
                                  self.columns),
                     weights="imagenet",
                     include_top=False)
    cnn_out = GlobalAveragePooling2D()(cnn_base.output)
    cnn = Model(input=cnn_base.input, output=cnn_out)
    cnn.trainable = False
    encoded_frames = TimeDistributed(cnn)(video)
    encoded_vid = LSTM(256)(encoded_frames)
    hidden_layer = Dense(output_dim=1024, activation="relu")(encoded_vid)
    outputs = Dense(output_dim=class_count, activation="softmax")(hidden_layer)
    osr = Model([video], outputs)
    optimizer = Nadam(lr=0.002,
                      beta_1=0.9,
                      beta_2=0.999,
                      epsilon=1e-08,
                      schedule_decay=0.004)
    osr.compile(loss="categorical_crossentropy",
                optimizer=optimizer,
                metrics=["categorical_accuracy"])
    self.osr = osr
    print " * OSR MODEL GENERATED * "

Model summary:

Generating OSR model
----------------------------------------
 * OSR MODEL GENERATED *

*** MODEL SUMMARY ***
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 30, 3, 100, 15 0
____________________________________________________________________________________________________
timedistributed_1 (TimeDistribut (None, 30, 512)       14714688    input_1[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM)                    (None, 256)           787456      timedistributed_1[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 1024)          263168      lstm_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 3)             3075        dense_1[0][0]
====================================================================================================
Total params: 15,768,387
Trainable params: 1,053,699
Non-trainable params: 14,714,688
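The parameter counts in this summary can be checked by hand. They match the model definition, which suggests the model itself is built correctly and the failure happens later, when Theano compiles the training function. The sketch below assumes Keras's standard LSTM (four gates, each with input weights, recurrent weights and a bias) and takes the frozen VGG16 count directly from the summary.

```python
def dense_params(input_dim, units):
    """Weights (input_dim x units) plus one bias per unit."""
    return input_dim * units + units


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (input_dim * units + units * units + units)


VGG16_NO_TOP = 14714688  # frozen conv base, value taken from the summary above

lstm = lstm_params(512, 256)      # TimeDistributed VGG16 emits 512 features per frame
dense_1 = dense_params(256, 1024)
dense_2 = dense_params(1024, 3)   # 3 output classes

trainable = lstm + dense_1 + dense_2
print(lstm, dense_1, dense_2)               # 787456 263168 3075
print(trainable, trainable + VGG16_NO_TOP)  # 1053699 15768387
```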

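【Discussion】:

  • I think the problem may come from wrapping a model, rather than a layer, in TimeDistributed. Did this ever work?
  • Hi Marcin, yes, it worked on my Mac OS X machine before I updated Theano/Keras. It actually still works on my Ubuntu machine. I don't understand why it no longer works on my Mac OS X machine.
  • Did you update numpy?
  • Yes, numpy is currently up to date.
  • Can we try a simple sanity check and print osr.summary() before compiling?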

Tags: python theano keras


【Solution 1】:

The problem seems to stem from installing Theano and Keras from their GitHub repositories, like this:

pip install git+git://github.com/Theano/Theano.git
pip install git+git://github.com/fchollet/keras.git

I fixed it by uninstalling Theano and Keras and then reinstalling the released versions directly with pip:

pip uninstall Theano
pip uninstall keras
pip install Theano
pip install keras

There may be an issue with the bleeding-edge versions of Theano or Keras. Hopefully this helps others too.
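To confirm which build is actually installed after reinstalling, a small check like the following may help. It is a sketch: `installed_version` and `looks_like_dev_build` are hypothetical helpers, and the "dev" heuristic is an assumption (builds from a git checkout often carry a development marker in their version string, while PyPI releases usually do not), not something Theano or Keras guarantees.

```python
import importlib


def installed_version(module_name):
    """Return the module's version string, or None if it cannot be imported.

    Modules without a __version__ attribute report "unknown".
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def looks_like_dev_build(version):
    """Heuristic (assumption): development builds contain a "dev" marker."""
    return version is not None and "dev" in version.lower()
```

For example, `looks_like_dev_build(installed_version("theano"))` returning True would suggest the git build is still the one being imported.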

Edit: It looks like the problem really comes from Theano's master branch. Follow the issue I opened on the Theano repository for a potential permanent fix: https://github.com/Theano/Theano/issues/5655

【Discussion】:
