How to fix a memory error in jupyter notebook

【Question title】: How can I fix a memory error in jupyter notebook
【Posted】: 2020-05-05 11:00:07
【Question】:

I'm working on an NLP project in Jupyter Notebook with a dataset of 160,000 rows. When I run the following code, I get a memory error.

import numpy as np

messages = list(zip(processed, Y))

# set a seed for reproducibility (np.random.seed must be called, not assigned to)
seed = 1
np.random.seed(seed)
np.random.shuffle(messages)

# call find_features on each comment
featuresets = [(find_features(text), label) for (text, label) in messages]

The error is:

<ipython-input-18-faca481e94f7> in find_features(message)
      3     features = {}
      4     for word in word_features:
----> 5         features[word] = (word in words)
      6 
      7     return features

MemoryError:

Is there any way to fix this? I'm on a 64-bit Windows laptop with 4 GB of RAM and an 8th-gen Core i5.

【Question discussion】:

Tags: python machine-learning deep-learning nlp jupyter-notebook

【Answer 1】:

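Not sure it will completely solve your problem, but it looks like you're building a dictionary of booleans that records, for every word in word_features, whether that word was found in words (a list/set/whatever).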

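If only a few of those words actually occur in the message, you still build a huge dictionary full of False values, when you only need the True ones (unless you need to know which words were tested). And because one such dictionary is built for every message, memory grows with the number of messages times the size of word_features; with 160,000 rows that adds up quickly.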
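
To make the size difference concrete, here is a rough sketch that measures one feature dictionary against one set of matches with sys.getsizeof and extrapolates to the 160,000 messages from the question. The 2,000-word vocabulary and the sample message are hypothetical placeholders, not the asker's data. Note that sys.getsizeof counts only the container itself (the strings and bools are shared objects), and the exact byte counts vary between Python versions, so treat the printed figures as an order-of-magnitude estimate.

```python
import sys

# Hypothetical vocabulary standing in for word_features (assumption: 2,000 words).
word_features = [f"word{i}" for i in range(2000)]
# Hypothetical tokenized message: only a handful of vocabulary words occur in it.
words = {"word1", "word42", "word999", "unrelated"}

# Original approach: one entry per vocabulary word, almost all of them False.
as_dict = {word: (word in words) for word in word_features}
# Suggested approach: keep only the vocabulary words that matched.
as_set = {word for word in word_features if word in words}

n_messages = 160_000  # dataset size given in the question
dict_bytes = sys.getsizeof(as_dict)
set_bytes = sys.getsizeof(as_set)

# Per-message container size, scaled up to the whole dataset.
print(f"dict per message: {dict_bytes:,} bytes, ~{dict_bytes * n_messages / 1e9:.1f} GB total")
print(f"set per message:  {set_bytes:,} bytes, ~{set_bytes * n_messages / 1e9:.3f} GB total")
```

The point is the scaling: the dictionary's size tracks len(word_features) for every single message, while the set only tracks how many vocabulary words the message actually contains.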

I would replace:

features = {}
for word in word_features:
    features[word] = (word in words)

with

features = set()
for word in word_features:
    if word in words:
        features.add(word)

or with a set comprehension:

features = {word for word in word_features if word in words}
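
Put together, a complete find_features built on the set comprehension might look like the minimal sketch below. The question doesn't show how words is produced, so the lowercase whitespace split and the four-word word_features list are hypothetical stand-ins. Turning the tokens into a set also makes each word in words check a hash lookup rather than a scan through a list.

```python
# Hypothetical vocabulary; in the question this is the real word_features.
word_features = ["free", "win", "money", "hello"]

def find_features(message):
    # Assumption: simple lowercase whitespace tokenization stands in for
    # whatever tokenizer the original code uses; a set gives fast lookups.
    words = set(message.lower().split())
    # Keep only the vocabulary words that occur in this message.
    return {word for word in word_features if word in words}

features = find_features("Win FREE money now")

# Membership replaces the old features[word] lookup.
if "money" in features:
    print("'money' appears in the message")
```

Here features holds only "free", "win" and "money"; "hello" is simply absent instead of being stored with a False value.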

Now, to test whether word is present in features, just use if word in features:

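Building a set that contains only the matching words drops every entry that tested False, and it drops the values as well, keeping only the keys (the words themselves).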

【Discussion】:

Heck, I didn't even know set comprehensions existed...! :)