【Question title】: tensorflow 2 on g4dn.xlarge GPU crashes after 8 epochs
【Posted】: 2020-06-25 12:44:35
【Question】:

I am trying to train a cGAN on a g4dn.xlarge GPU EC2 instance, but every run crashes after 8 epochs with the following message:

Traceback (most recent call last):
  File "pix2pix_tf2.py", line 841, in <module>
    main()
  File "pix2pix_tf2.py", line 802, in main
    results = sess.run(fetches, options=options, run_metadata=run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 958, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1181, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: 2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
     [[{{node TensorArrayV2Write/TensorListSetItem}}]]
  (1) Invalid argument: 2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
     [[{{node TensorArrayV2Write/TensorListSetItem}}]]
     [[Func/encode_images/target_pngs/while/body/_47/input/_154/_773]]
0 successful operations.
0 derived errors ignored.

Environment: TensorFlow 2.2.0, CUDA V10.0.130, cuDNN 7.6.5

【Comments on the question】:

    Tags: tensorflow gpu cudnn


    【Solution 1】:

    Updating CUDA to 10.1 solved the problem.
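This fix is consistent with TensorFlow's published tested build configurations, which, as far as I know, list CUDA 10.1 for the 2.2.0 GPU build, while the question reports CUDA V10.0.130. A minimal sketch of a pre-flight check follows. The `EXPECTED_CUDA` table and both helper names are hypothetical and exist only for illustration; the table holds a single entry and should be filled in from the official compatibility table for other releases.

```python
import re

# Assumption: TensorFlow 2.2.x GPU builds expect CUDA 10.1 (taken from the
# official "tested build configurations" table). Add entries for other releases.
EXPECTED_CUDA = {"2.2": "10.1"}


def cuda_major_minor(version_string):
    """Return 'major.minor' from text such as 'V10.0.130' or `nvcc --version` output."""
    match = re.search(r"V?(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("no CUDA version found in: %r" % version_string)
    return "%s.%s" % (match.group(1), match.group(2))


def cuda_matches_tensorflow(tf_version, cuda_version):
    """Return True/False if the pairing is known, or None if tf_version is not in the table."""
    tf_key = ".".join(tf_version.split(".")[:2])
    expected = EXPECTED_CUDA.get(tf_key)
    if expected is None:
        return None
    return cuda_major_minor(cuda_version) == expected


if __name__ == "__main__":
    # Versions reported in the question, then the version from the accepted fix.
    print(cuda_matches_tensorflow("2.2.0", "V10.0.130"))
    print(cuda_matches_tensorflow("2.2.0", "10.1"))
```

In practice the first argument could come from `tf.__version__` and the second from the output of `nvcc --version`. The check compares only the major and minor CUDA version and says nothing about driver or cuDNN compatibility.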

    【Comments】:
