【Posted】: 2019-02-07 12:31:58
【Question】:
MySQL 5.7.18
Python 2.7.5
pandas 0.17.1
CentOS 7.3
A MySQL table:
CREATE TABLE test (
id varchar(12)
) ENGINE=InnoDB;
Its size is 10GB:
select round(((data_length) / 1024 / 1024 / 1024)) "GB"
from information_schema.tables
where table_name = "test"
10GB
The box has 250GB of RAM:
$ free -hm
              total        used        free      shared  buff/cache   available
Mem:           251G         15G        214G        2.3G         21G        232G
Swap:          2.0G        1.2G        839M
Selecting the data:
import psutil
print '1 ' + str(psutil.phymem_usage())
import os
import sys
import time
import pyodbc
import mysql.connector
import pandas as pd
from datetime import date
import gc
print '2 ' + str(psutil.phymem_usage())
db = mysql.connector.connect({snip})
c = db.cursor()
print '3 ' + str(psutil.phymem_usage())
c.execute("select id from test")
print '4 ' + str(psutil.phymem_usage())
e=c.fetchall()
print 'getsizeof: ' + str(sys.getsizeof(e))
print '5 ' + str(psutil.phymem_usage())
d=pd.DataFrame(e)
print d.info()
print '6 ' + str(psutil.phymem_usage())
c.close()
print '7 ' + str(psutil.phymem_usage())
db.close()
print '8 ' + str(psutil.phymem_usage())
del c, db, e
print '9 ' + str(psutil.phymem_usage())
gc.collect()
print '10 ' + str(psutil.phymem_usage())
time.sleep(60)
print '11 ' + str(psutil.phymem_usage())
Output:
1 svmem(total=270194331648L, available=249765777408L, percent=7.6, used=39435464704L, free=230758866944L, active=20528222208, inactive=13648789504, buffers=345387008L, cached=18661523456)
2 svmem(total=270194331648L, available=249729019904L, percent=7.6, used=39472222208L, free=230722109440L, active=20563484672, inactive=13648793600, buffers=345387008L, cached=18661523456)
3 svmem(total=270194331648L, available=249729019904L, percent=7.6, used=39472222208L, free=230722109440L, active=20563484672, inactive=13648793600, buffers=345387008L, cached=18661523456)
4 svmem(total=270194331648L, available=249729019904L, percent=7.6, used=39472222208L, free=230722109440L, active=20563484672, inactive=13648793600, buffers=345387008L, cached=18661523456)
getsizeof: 1960771816
5 svmem(total=270194331648L, available=181568315392L, percent=32.8, used=107641655296L, free=162552676352L, active=88588271616, inactive=13656334336, buffers=345395200L, cached=18670243840)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 231246823 entries, 0 to 231246822
Data columns (total 1 columns):
0 object
dtypes: object(1)
memory usage: 3.4+ GB
None
6 svmem(total=270194331648L, available=181571620864L, percent=32.8, used=107638353920L, free=162555977728L, active=88587603968, inactive=13656334336, buffers=345395200L, cached=18670247936)
7 svmem(total=270194331648L, available=181571620864L, percent=32.8, used=107638353920L, free=162555977728L, active=88587603968, inactive=13656334336, buffers=345395200L, cached=18670247936)
8 svmem(total=270194331648L, available=181571620864L, percent=32.8, used=107638353920L, free=162555977728L, active=88587603968, inactive=13656334336, buffers=345395200L, cached=18670247936)
9 svmem(total=270194331648L, available=183428308992L, percent=32.1, used=105781678080L, free=164412653568L, active=86735921152, inactive=13656334336, buffers=345395200L, cached=18670260224)
10 svmem(total=270194331648L, available=183428308992L, percent=32.1, used=105781678080L, free=164412653568L, active=86735921152, inactive=13656334336, buffers=345395200L, cached=18670260224)
11 svmem(total=270194331648L, available=183427203072L, percent=32.1, used=105782812672L, free=164411518976L, active=86736560128, inactive=13656330240, buffers=345395200L, cached=18670288896)
I even deleted the database connection and called the garbage collector.
How does a 10GB table end up taking 60GB of my memory?
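To put a number on the "60GB", here is a small back-of-the-envelope calculation using only figures copied from the output above (the "used" values at checkpoints 4 and 5, the getsizeof result, and the row count from d.info()). Keep in mind that svmem is system-wide, so the growth may include whatever else was running on the box at the time; treat the per-row figure as an estimate, not a measurement of the Python process alone.

```python
# Figures copied verbatim from the svmem lines and d.info() in the question.
used_before_fetchall = 39472222208   # "used" at checkpoint 4
used_after_fetchall = 107641655296   # "used" at checkpoint 5
list_getsizeof = 1960771816          # sys.getsizeof(e)
row_count = 231246823                # entries reported by d.info()

GIB = 1024 ** 3

# Growth in system-wide "used" memory across the fetchall() call.
delta = used_after_fetchall - used_before_fetchall
delta_gib = delta / float(GIB)

# Average cost per fetched row, and how much of it the list object itself reports.
bytes_per_row = delta / float(row_count)
list_bytes_per_row = list_getsizeof / float(row_count)

print("growth across fetchall(): %d bytes (%.1f GiB)" % (delta, delta_gib))
print("average per row: %.1f bytes" % bytes_per_row)
print("list container per row: %.1f bytes" % list_bytes_per_row)
```

That works out to roughly 63.5 GiB of growth, or about 295 bytes per row, while the list reported by getsizeof accounts for only about 8.5 bytes per row. sys.getsizeof on a list counts the list's own pointer array and not the tuples and strings it points to, which is why the 1.9GB figure is so far from the observed growth.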
【Comments】:
-
What happens if you separate the DataFrame creation from c.fetchall() and print the memory usage between the two?
-
I separated the two as you suggested and replaced the code and output in the question above. It looks like all of the memory is consumed right at fetchall().
-
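Your fetchall is returning 231 million rows, which seems like a bad idea. You could set the fetch size to something more reasonable. Also, does the MySQL connector have a context handler?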
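A minimal sketch of what a smaller fetch size could look like, using the DB-API fetchmany() call in place of fetchall(). The helper name iter_rows and the chunk size are made up for illustration, and an in-memory sqlite3 database stands in for the MySQL connection so the snippet runs on its own; contextlib.closing is used as a generic way to get context-manager behaviour from anything with a close() method. Whether the driver itself still buffers the full result set on the client depends on the connector and cursor type, so this only bounds what the Python code holds at once.

```python
import sqlite3
from contextlib import closing


def iter_rows(cursor, chunk_size=10000):
    """Yield rows from an already-executed DB-API cursor, chunk_size at a time.

    Hypothetical helper: only one chunk of row tuples is held by this
    function at any moment, instead of the whole result set.
    """
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield row


# sqlite3 is a stand-in here (an assumption for the sake of a runnable
# example); with MySQL the connection would come from mysql.connector.connect.
with closing(sqlite3.connect(":memory:")) as db:
    db.execute("CREATE TABLE test (id varchar(12))")
    db.executemany(
        "INSERT INTO test VALUES (?)",
        [("id%06d" % i,) for i in range(25000)],
    )
    with closing(db.cursor()) as c:
        c.execute("select id from test")
        # Process rows as they stream past; here they are simply counted.
        row_count = sum(1 for _ in iter_rows(c))

print(row_count)
```

If the goal is still a single DataFrame, the rows have to end up in memory somewhere, so chunking mainly helps when each chunk can be reduced or converted to a compact form before the next one is fetched.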
-
Can you do a getsizeof on an element of the e list of tuples? And also on each element of that tuple?
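One way to run the measurement asked for here, sketched on a small synthetic list shaped like the fetchall() result (a list of 1-tuples holding 12-character ids). The function name size_breakdown and the sample data are assumptions for illustration; the real e from the question would be passed in the same way. The totals ignore allocator overhead and would double-count any objects shared between rows, so they are a lower-bound style estimate at best.

```python
import sys


def size_breakdown(rows):
    """Return (container, tuples, values) sizes in bytes for a list of tuples.

    sys.getsizeof is shallow, so each layer is measured separately:
    the list itself, every row tuple, and every value inside each tuple.
    """
    container = sys.getsizeof(rows)
    tuples = sum(sys.getsizeof(row) for row in rows)
    values = sum(sys.getsizeof(value) for row in rows for value in row)
    return container, tuples, values


# Synthetic stand-in for e: 1000 rows, one 12-character id per row.
e = [("id%010d" % i,) for i in range(1000)]

container, tuples, values = size_breakdown(e)
total = container + tuples + values

print("list container: %d bytes" % container)
print("row tuples:     %d bytes" % tuples)
print("values:         %d bytes" % values)
print("per row:        %.1f bytes" % (total / float(len(e))))
```

The exact numbers depend on the Python version and build, but the tuples and values layers should dominate the list container, which is the part that a plain getsizeof(e) leaves out.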
Tags: python memory-management memory-leaks mysql-connector-python