【Posted】: 2020-06-25 12:44:35
【Problem description】:
I am trying to train a cGAN on a g4dn.xlarge GPU EC2 instance, but every time it crashes after 8 epochs with the following message:
Traceback (most recent call last):
  File "pix2pix_tf2.py", line 841, in <module>
    main()
  File "pix2pix_tf2.py", line 802, in main
    results = sess.run(fetches, options=options, run_metadata=run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 958, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1181, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: 2 root error(s) found.
(0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
(1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
[[{{node TensorArrayV2Write/TensorListSetItem}}]]
(1) Invalid argument: 2 root error(s) found.
(0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
(1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
[[{{node TensorArrayV2Write/TensorListSetItem}}]]
[[Func/encode_images/target_pngs/while/body/_47/input/_154/_773]]
0 successful operations.
0 derived errors ignored.
Environment: TensorFlow 2.2.0, CUDA V10.0.130, cuDNN 7.6.5
【Discussion】:
Tags: tensorflow gpu cudnn