【Question title】: Kernel stops working after a while when fitting the model
【Posted】: 2021-10-08 22:59:32
【Question description】:

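I am trying to run the code that TensorFlow provides for image classification. I am using exactly the code TensorFlow provides, so I am not sharing it here. The code runs perfectly up to the point where the model is fitted. It prints "Epoch" once, then the kernel shuts down with "An error occurred while starting the kernel". These are the messages it produces: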

2021???????? 21:19:59.749095: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021???????? 21:20:02.178383: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021???????? 21:20:02.198734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3070 Laptop GPU computeCapability: 8.6
coreClock: 1.62GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021???????? 21:20:02.198906: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021???????? 21:20:02.204104: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021???????? 21:20:02.204165: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021???????? 21:20:02.207305: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021???????? 21:20:02.208428: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021???????? 21:20:02.213539: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021???????? 21:20:02.215481: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021???????? 21:20:02.216199: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021???????? 21:20:02.216287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021???????? 21:20:02.216750: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021???????? 21:20:02.217490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3070 Laptop GPU computeCapability: 8.6
coreClock: 1.62GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021???????? 21:20:02.217546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021???????? 21:20:02.708850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021???????? 21:20:02.708874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0 
2021???????? 21:20:02.708880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N 
2021???????? 21:20:02.709035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5484 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021???????? 21:20:04.004652: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021???????? 21:20:05.150123: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll

【Question discussion】:

    Tags: python tensorflow spyder


    【Solution 1】:

    I reproduced the same code in Colab. It ran successfully without any errors. The related code is available in this gist.

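    However, the messages you posted are only informational, since they are prefixed with I. Error messages are prefixed with E and warnings with W, as shown below: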

    2020-12-30 21:30:27.549172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_101.dll
    
    2020-12-30 21:30:27.599977: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
    
    2021-12-30 21:30:27.704083: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
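
To make the prefix rule concrete, here is a minimal sketch that reads the severity letter from a log line. The names `severity_of`, `SEVERITY_NAMES` and `LOG_LINE` are hypothetical helpers written for this illustration and are not part of TensorFlow. The sketch assumes each log line starts with a date, a time, and then the single severity letter.

```python
import re

# Hypothetical lookup table (not a TensorFlow API): the single letter that
# follows the timestamp in a TensorFlow C++ log line, mapped to its meaning.
SEVERITY_NAMES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}

# Assumed layout: "<date> <time>: <letter> <source file>:<line>] <message>"
LOG_LINE = re.compile(r"^\S+ \S+ (?P<sev>[IWEF]) ")


def severity_of(line):
    """Return the severity name for a log line, or None for other text."""
    match = LOG_LINE.match(line.strip())
    return SEVERITY_NAMES[match.group("sev")] if match else None


samples = [
    "2020-12-30 21:30:27.549172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_101.dll",
    "2020-12-30 21:30:27.599977: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.",
    "pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3070 Laptop GPU computeCapability: 8.6",
]

for sample in samples:
    print(severity_of(sample))
```

Run against the log in the question, a helper like this would report INFO for every timestamped line, which is why none of those lines explains the kernel shutdown by itself.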
    

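    You can suppress these informational and warning messages with the following code, which must run before TensorFlow is imported: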

    import os
    # Set this before importing tensorflow; '2' hides INFO and WARNING messages.
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
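
As a rough guide to choosing the value, the sketch below lists which severity prefixes each `TF_CPP_MIN_LOG_LEVEL` setting hides. `HIDDEN_BY_LEVEL` and `hidden_prefixes` are hypothetical names used only for this illustration. The TensorFlow import is left commented out so the snippet runs without TensorFlow installed.

```python
import os

# Hypothetical lookup table for illustration: severity prefixes that are
# filtered out at each TF_CPP_MIN_LOG_LEVEL value.
HIDDEN_BY_LEVEL = {
    "0": [],               # show everything (default)
    "1": ["I"],            # hide INFO
    "2": ["I", "W"],       # hide INFO and WARNING
    "3": ["I", "W", "E"],  # hide INFO, WARNING and ERROR
}


def hidden_prefixes(level):
    """Return the log prefixes hidden at the given level."""
    return HIDDEN_BY_LEVEL[str(level)]


# The variable is read when TensorFlow's native library starts logging,
# so it has to be set before the import below.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
# import tensorflow as tf  # uncomment in a real script

print(hidden_prefixes(os.environ["TF_CPP_MIN_LOG_LEVEL"]))
```

Level 2 keeps E-prefixed lines visible, so a genuine error would still appear in the console.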
    

    【Discussion】:
