【Posted】: 2021-03-24 15:14:14
【Problem description】:
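I am running a simple Python script under the SLURM scheduler on an HPC cluster. It reads in a dataset (about 6 GB), then plots and saves an image of part of the data. There are several of these data files, so I loop over them until the data from every file has been plotted.
However, for some reason memory usage increases on every iteration. I have checked my variables with getsizeof(), and their sizes do not seem to change between iterations, so I am not sure where this memory "leak" could be coming from.
Here is my script: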
import os, psutil
import sdf_helper as sh
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
plt.rcParams['figure.figsize'] = [6, 4]
plt.rcParams['figure.dpi'] = 120 # 200 e.g. is really fine, but slower
from sys import getsizeof
for i in range(5,372):
    plt.clf()
    fig, ax = plt.subplots()
    #dd gets data using the epoch specific SDF file reader sh.getdata
    dd = sh.getdata(i,'/dfs6/pub/user');
    #extract density data as 2D array
    den = dd.Derived_Number_Density_electron.data.T;
    nmin = np.min(dd.Derived_Number_Density_electron.data[np.nonzero(dd.Derived_Number_Density_electron.data)])
    #extract grid points as 2D array
    xy = dd.Derived_Number_Density_electron.grid.data
    #extract single number time
    time = dd.Header.get('time')
    #free up memory from dd
    dd = None
    #plotting
    plt.pcolormesh(xy[0], xy[1],np.log10(den), vmin = 20, vmax = 30)
    cbar = plt.colorbar()
    cbar.set_label('Density in log10($m^{-3}$)')
    plt.title("time: %1.3e s \n Min e- density: %1.2e $m^{-3}$" %(time,nmin))
    ax.set_facecolor('black')
    plt.savefig('D00%i.png'%i, bbox_inches='tight')
    print("dd: ", getsizeof(dd))
    print("den: ",getsizeof(den))
    print("nmin: ",getsizeof(nmin))
    print("xy: ",getsizeof(xy))
    print("time: ",getsizeof(time))
    print("fig: ",getsizeof(fig))
    print("ax: ",getsizeof(ax))
    process = psutil.Process(os.getpid())
    print(process.memory_info().rss)
Output
Reading file /dfs6/pub/user/0005.sdf
dd: 16
den: 112
nmin: 32
xy: 56
time: 24
fig: 48
ax: 48
8991707136
Reading file /dfs6/pub/user0006.sdf
dd: 16
den: 112
nmin: 32
xy: 56
time: 24
fig: 48
ax: 48
13814497280
Reading file /dfs6/pub/user/0007.sdf
dd: 16
den: 112
nmin: 32
xy: 56
time: 24
fig: 48
ax: 48
18648313856
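A note on the getsizeof() figures above: for a NumPy array that is only a view onto another array's buffer (the result of `.T`, for example), `sys.getsizeof` reports just the small array header and not the data the view refers to. A constant 112 bytes for `den` therefore says little about real memory use. The sketch below uses a made-up array (its shape is an assumption and has nothing to do with the SDF data) to show the difference; `ndarray.nbytes` and `ndarray.base` are usually more informative.

```python
import sys
import numpy as np

# Hypothetical stand-in for the density data; the shape is an assumption.
data = np.zeros((1000, 1000))   # owns its buffer: 8,000,000 bytes of float64
view = data.T                   # transposed view that shares data's buffer

owner_size = sys.getsizeof(data)   # header plus the owned buffer
view_size = sys.getsizeof(view)    # header only, the shared buffer is not counted

print("owner getsizeof:", owner_size)
print("view  getsizeof:", view_size)
print("view  nbytes:   ", view.nbytes)          # size of the data seen through the view
print("view shares buffer:", view.base is data)
```

Objects such as `fig` and `ax` are similar: getsizeof() measures only the top-level Python object, not everything it references.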
SLURM input
#!/bin/bash
#SBATCH -p free
#SBATCH --job-name=epochpyd1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=20000
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=**
module purge
module load python/3.8.0
python3 -u /data/homezvol0/user/CNTDensity.py > density.out
SLURM output
/data/homezvol0/user/CNTDensity.py:21: RuntimeWarning: divide by zero encountered in log10
plt.pcolormesh(xy[0], xy[1],np.log10(den), vmin = 20, vmax = 30)
/export/spool/slurm/slurmd.spool/job1910549/slurm_script: line 16: 8004 Killed python3 -u /data/homezvol0/user/CNTDensity.py > density.out
slurmstepd: error: Detected 1 oom-kill event(s) in step 1910549.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
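The RuntimeWarning in this log comes from zeros in `den` being passed to `np.log10`. It is only a warning and is probably unrelated to the out-of-memory kill. One possible way to handle it, sketched here with invented values, is to mask the non-positive cells and take the logarithm of the rest; `pcolormesh` accepts masked arrays and leaves masked cells undrawn, so the black axes background would show through.

```python
import numpy as np

# Hypothetical density values; the zeros stand for empty cells.
den = np.array([[0.0, 1.0e20],
                [1.0e25, 0.0]])

# Mask the non-positive cells, then take log10 of what remains.
masked = np.ma.masked_less_equal(den, 0.0)
log_den = np.ma.log10(masked)

print(log_den)
```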
As far as I can tell, everything seems to be working. I am not sure what could be using more than 20 GB of memory.
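The RSS values printed by the script are enough to see when the limit is reached. The short calculation below takes the three logged values and compares the trend with the job's memory request. It assumes that `--mem-per-cpu=20000` means 20000 MiB, which is how Slurm's default megabyte unit is read here.

```python
# RSS values in bytes, copied from the script output above.
rss = [8991707136, 13814497280, 18648313856]

# Assumption: --mem-per-cpu=20000 is read as 20000 MiB.
limit = 20000 * 1024**2

# Growth between consecutive files and a simple linear projection.
growth = [b - a for a, b in zip(rss, rss[1:])]
avg_growth = sum(growth) / len(growth)
projected_next = rss[-1] + avg_growth

for g in growth:
    print(f"growth per file: {g / 1024**3:.2f} GiB")
print(f"limit:           {limit / 1024**3:.2f} GiB")
print(f"projected next:  {projected_next / 1024**3:.2f} GiB")
print("exceeds limit:", projected_next > limit)
```

The process grows by roughly the same amount for each file, so under this assumption the fourth file would push it past the limit, which is consistent with the oom-kill in the log.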
Edit: I started commenting out parts of the loop from the bottom up, and it is now clear that pcolormesh is the culprit.
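I have added the following (from "Closing pyplot windows"):
fig.clear()
plt.clf()
plt.close('all')
fig = None
ax = None
del fig
del ax
to the end of the loop, but memory keeps climbing regardless. I am completely at a loss as to what is happening.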
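One way to narrow this down is to run the plotting part alone on synthetic data, with exactly one figure created and closed per iteration. In the original loop, `plt.clf()` is followed by `plt.subplots()`, which opens a new figure every time, and pyplot keeps each figure alive until it is closed. The sketch below is only a test harness and not a confirmed fix: the array shapes and values are assumptions, the image is written to an in-memory buffer instead of a PNG file, and the `Agg` backend is chosen because the job runs without a display.

```python
import gc
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for a batch job
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for one file's data; shapes and values are assumptions.
rng = np.random.default_rng(0)
x_edges = np.linspace(0.0, 1.0, 201)   # 200 cells in x
y_edges = np.linspace(0.0, 1.0, 101)   # 100 cells in y

open_figures = []
for i in range(3):
    den = rng.uniform(1e20, 1e30, size=(100, 200))

    fig, ax = plt.subplots()           # exactly one new figure per iteration
    mesh = ax.pcolormesh(x_edges, y_edges, np.log10(den), vmin=20, vmax=30)
    fig.colorbar(mesh, ax=ax).set_label("Density in log10($m^{-3}$)")
    ax.set_facecolor("black")

    buf = io.BytesIO()                 # in-memory target instead of a file on disk
    fig.savefig(buf, format="png", bbox_inches="tight")

    plt.close(fig)                     # release pyplot's reference to this figure
    del fig, ax, mesh, den
    gc.collect()

    open_figures.append(len(plt.get_fignums()))
    print(i, "open figures:", open_figures[-1], "png bytes:", buf.getbuffer().nbytes)
```

If memory stays flat here (for instance when printing `psutil.Process().memory_info().rss` each iteration, as in the script), the plotting calls alone are less likely to be the source, and the file reader or the arrays it hands back would be worth checking next.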
【Discussion】:
Tags: python memory-leaks out-of-memory slurm