Why is CUDA memory not released by torch.cuda.empty_cache()?

【Question title】: Why is the CUDA memory not released with torch.cuda.empty_cache()?
【Posted】: 2020-09-08 05:06:49
【Question】:

On my Windows 10 machine, if I create a GPU tensor directly, I can release its memory successfully.

import torch
a = torch.zeros(300000000, dtype=torch.int8, device='cuda')
del a
torch.cuda.empty_cache()

But if I create an ordinary (CPU) tensor and convert it to a GPU tensor, I can no longer release its memory.

import torch
a = torch.zeros(300000000, dtype=torch.int8)
a.cuda()
del a
torch.cuda.empty_cache()

Why does this happen?

【Comments】:

Tags: pytorch

【Answer 1】:

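At least on Ubuntu, your script does not release the memory when run in an interactive shell, but works as expected when run as a script. I think the cause is a lingering reference: a.cuda() is not an in-place operation. It returns a new GPU tensor, and an interactive shell keeps the result of a bare expression alive (for example in the _ variable), so del a only removes the CPU tensor. The following works both in an interactive shell and as a script.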

import torch
a = torch.zeros(300000000, dtype=torch.int8)
a = a.cuda()
del a
torch.cuda.empty_cache()
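
The reference problem described above can be reproduced without a GPU. The following is a minimal sketch, assuming CPython and the default sys.displayhook; FakeTensor, run_like_repl and copy_survives are hypothetical names introduced only for illustration, and FakeTensor merely stands in for a real tensor. It runs each statement the way an interactive shell would and checks whether the object returned by .cuda() is still alive after del a.

```python
import builtins
import gc
import weakref

created = []  # weak references to every object returned by FakeTensor.cuda()


class FakeTensor:
    """Hypothetical stand-in for a tensor; no GPU or PyTorch is involved."""

    def cuda(self):
        gpu_copy = FakeTensor()
        created.append(weakref.ref(gpu_copy))
        return gpu_copy


def run_like_repl(source, namespace):
    """Run one statement as an interactive shell would.

    Code compiled in "single" mode passes the value of a bare expression
    to sys.displayhook, which echoes it and binds it to builtins._.
    """
    exec(compile(source, "<stdin>", "single"), namespace)


def copy_survives(statement):
    """Return True if the object made by .cuda() outlives `del a`."""
    namespace = {"a": FakeTensor()}
    run_like_repl(statement, namespace)
    run_like_repl("del a", namespace)
    gc.collect()
    return created[-1]() is not None


discarded = copy_survives("a.cuda()")    # bare expression: result is kept in `_`
rebound = copy_survives("a = a.cuda()")  # assignment: nothing is echoed or kept
print(discarded, rebound)
```

Under these assumptions the bare a.cuda() leaves the copy reachable through builtins._, while the rebinding form lets del a drop the last reference. IPython additionally caches echoed results in its output history, so more than one reference may need to be cleared there.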

【Comments】:

【Answer 2】:

Yes, this also happens on my machine, with the following configuration:

20.04.1-Ubuntu
1.7.1+cu110

According to a discussion on the fastai forums: https://forums.fast.ai/t/gpu-memory-not-being-freed-after-training-is-over/10265/8

this is related to the Python garbage collector in the IPython environment.

import gc
import torch

def pretty_size(size):
    """Format a torch.Size object, e.g. '10 × 1000 × 1000'."""
    assert isinstance(size, torch.Size)
    return " × ".join(map(str, size))

def dump_tensors(gpu_only=True):
    """Print the tensors currently tracked by the garbage collector."""
    total_size = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    print("%s:%s%s %s" % (type(obj).__name__,
                                          " GPU" if obj.is_cuda else "",
                                          # is_pinned is a method, so it must be called
                                          " pinned" if obj.is_pinned() else "",
                                          pretty_size(obj.size())))
                    total_size += obj.numel()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                # Wrapper objects (e.g. legacy Variables) that hold a tensor in .data
                if not gpu_only or obj.data.is_cuda:
                    print("%s → %s:%s%s%s%s %s" % (type(obj).__name__,
                                                   type(obj.data).__name__,
                                                   " GPU" if obj.data.is_cuda else "",
                                                   " pinned" if obj.data.is_pinned() else "",
                                                   " grad" if getattr(obj, "requires_grad", False) else "",
                                                   # volatile no longer exists in current PyTorch
                                                   " volatile" if getattr(obj, "volatile", False) else "",
                                                   pretty_size(obj.data.size())))
                    total_size += obj.data.numel()
        except Exception:
            # Some tracked objects raise on attribute access; skip them
            pass
    print("Total size:", total_size)  # number of elements, not bytes

If I do something like

import torch as th
a = th.randn(10, 1000, 1000)
aa = a.cuda()
del aa
th.cuda.empty_cache()

you will not see any decrease in nvidia-smi/nvtop. But you can use the handy function above to find out what is going on:

dump_tensors()

You may observe output like the following:

Tensor: GPU 10 × 1000 × 1000
Total size: 10000000

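This means a tensor object that is still alive, and still tracked by the garbage collector, is holding the GPU memory.

For more on Python's gc mechanism, see this discussion:

Force garbage collection in Python to free memory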
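
To check this from inside the process rather than through nvidia-smi, the sketch below forces a collection and then compares PyTorch's own counters. It uses torch.cuda.memory_allocated() (bytes held by live tensors) and torch.cuda.memory_reserved() (bytes held by the caching allocator); cuda_memory_report and release_cached_memory are hypothetical helper names, and the guarded import is only there so the sketch also runs on a machine without PyTorch or without a GPU.

```python
import gc

try:
    import torch
except ImportError:  # assumption: PyTorch may be missing where this sketch runs
    torch = None


def cuda_memory_report():
    """Return allocated/reserved bytes for the current CUDA device, or None."""
    if torch is None or not torch.cuda.is_available():
        return None
    return {
        "allocated": torch.cuda.memory_allocated(),  # held by live tensors
        "reserved": torch.cuda.memory_reserved(),    # held by the caching allocator
    }


def release_cached_memory():
    """Collect unreachable objects, empty the CUDA cache, and report usage."""
    gc.collect()  # frees tensors that were only kept alive by reference cycles
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # returns unused cached blocks to the driver
    return cuda_memory_report()


report = release_cached_memory()
print(report)
```

If "allocated" stays high after this call, some object still references a GPU tensor, and neither gc.collect() nor empty_cache() can free it. nvidia-smi will also keep showing some usage for as long as the process lives, because the CUDA context itself occupies memory.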

【Comments】:

【Answer 3】:

I ran into the same problem. Solution:

cuda = torch.device('cuda')
a = a.to(cuda)  # .to() returns a new tensor; rebind a so that del a drops the GPU copy

【Comments】:

【Answer 4】:

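You should not use torch.cuda.empty_cache(), because it slows down your code without any benefit: PyTorch's caching allocator reuses freed memory for later allocations, so emptying the cache only forces it to request that memory from the driver again. See https://discuss.pytorch.org/t/what-is-torch-cuda-empty-cache-do-and-where-should-i-add-it/40975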

【Comments】: