MemoryError in the "to_sql" command in Jupyter Notebook: Answers

[Question title]: MemoryError in "to_sql" command in Jupyter Notebook
[Posted]: 2019-11-30 05:17:34
[Question description]:

I'm working on a Jupyter notebook on AWS SageMaker. I've done text processing on 5,000 rows of data, and I want to write the result out to a new SQLite database with the code below.

import sqlite3

conn = sqlite3.connect('final_2.sqlite')
c = conn.cursor()
conn.text_factory = str
# Write the whole processed DataFrame `final` in a single call
final.to_sql('Reviews', conn, schema=None, if_exists='replace')

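It writes 2.09 GB and then stops. When I open the file, it isn't recognized as a valid file. I then tried writing to a .csv file instead, but had the same problem. When I download and open the csv file, I get the following error: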

Error! Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/tornado/web.py", line 1699, in _execute
    result = await result
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/notebook/services/contents/handlers.py", line 112, in get
    path=path, type=type, format=format, content=content,
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/notebook/services/contents/filemanager.py", line 438, in get
    model = self._file_model(path, content=content, format=format)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/notebook/services/contents/filemanager.py", line 365, in _file_model
    content, format = self._read_file(os_path, format)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/notebook/services/contents/fileio.py", line 309, in _read_file
    bcontent = f.read()
MemoryError

Saving disabled.
See Console for more details.

I checked my free disk space from Python, and there are still about 30 GB available.
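The question doesn't show how the free space was checked. A minimal sketch with the standard library's `shutil.disk_usage` looks like this; querying the current working directory (assumed here to be where the notebook writes its files) reports the file system that directory lives on:

```python
import shutil

# Query the file system that holds the current working directory
usage = shutil.disk_usage(".")

# disk_usage returns a named tuple of byte counts: total, used, free
free_gb = usage.free / 1024**3
total_gb = usage.total / 1024**3
print(f"Free disk space: {free_gb:.1f} GB of {total_gb:.1f} GB")
```

This measures disk space only; it says nothing about how much RAM is available.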

Can anyone tell me what's going wrong here? Thanks!

[Question discussion]:

It's probably a RAM problem, not a disk space problem.
Check out my solution. It will most likely solve your problem.
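To test the RAM hypothesis in the first comment, one place to start is how much memory the DataFrame itself occupies. pandas' `DataFrame.memory_usage(deep=True)` includes the actual size of Python string objects, which usually dominate text-processed data. The toy frame below is a hypothetical stand-in for the asker's `final`:

```python
import pandas as pd

# Hypothetical stand-in for the processed DataFrame `final`
final = pd.DataFrame({
    "Id": range(5000),
    "Text": ["some processed review text"] * 5000,
})

# deep=True measures the string objects themselves, not just the 8-byte pointers to them
bytes_used = final.memory_usage(deep=True).sum()
print(f"DataFrame uses about {bytes_used / 1024**2:.2f} MB of RAM")
```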

Tags: python amazon-web-services amazon-ec2 jupyter-notebook amazon-sagemaker

[Solution 1]:

I ran into this exact problem and solved it by increasing the amount of RAM.

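The problem occurs because the to_sql command tries to convert the entire DataFrame into SQL INSERT data in one go, and at some point it runs out of memory.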

To work around this, write the data in batches, like this:

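# `final` and `conn` are the DataFrame and SQLite connection from the question
batch_size = 10000
# Slice the DataFrame by row position and append one batch at a time
for i in range(0, len(final), batch_size):
    final.iloc[i:i + batch_size].to_sql('Reviews', con=conn, schema=None, if_exists='append')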
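pandas can also do this batching itself: `DataFrame.to_sql` accepts a `chunksize` argument that inserts rows in batches of that size instead of all at once. The sketch below uses an in-memory SQLite database and a made-up 25,000-row DataFrame (both assumptions, standing in for `final_2.sqlite` and `final`) and then counts the rows to confirm that every batch was written:

```python
import sqlite3

import pandas as pd

# In-memory database standing in for final_2.sqlite (assumption for the demo)
conn = sqlite3.connect(":memory:")

# Hypothetical DataFrame standing in for `final`
final = pd.DataFrame({"Text": [f"review {n}" for n in range(25_000)]})

# chunksize makes pandas insert 10,000 rows per batch rather than all rows at once
final.to_sql("Reviews", conn, if_exists="replace", index=False, chunksize=10_000)

# Read back the row count to check that all batches landed
row_count = conn.execute("SELECT COUNT(*) FROM Reviews").fetchone()[0]
print(row_count)  # 25000
conn.close()
```

Either way, the DataFrame itself still has to fit in RAM; batching only limits how much is converted and inserted at a time.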

[Discussion]: