【Posted】: 2019-04-30 18:29:16
【Question】:
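I am trying to run scikit-learn's random forest classification on 279,900 instances, each with 5 attributes and 1 class. However, I get a memory allocation error on the fit line, so the classifier cannot even be trained. Any suggestions on how to resolve this?
The data is:
x, y, day, week, accuracy
x and y are coordinates, day is the day of the month (1-30), week is the day of the week (1-7), and accuracy is an integer.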
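Given the columns described above, a single pass over the file can collect the features and the class labels together, so the two lists cannot drift out of alignment. The following is a minimal Python 3 sketch, assuming the five features sit at column indices 1 to 5 and the class at index 8 (as in the code below), and that the file has no header row. The function name `load_rows` and the sample rows are made up for illustration.

```python
import csv
import io

def load_rows(fileobj, feature_cols=(1, 2, 3, 4, 5), class_col=8):
    """Read features and class labels from a CSV file object in one pass."""
    features, labels = [], []
    for row in csv.reader(fileobj):
        if not row:  # skip blank lines
            continue
        features.append(tuple(float(row[i]) for i in feature_cols))
        labels.append(row[class_col])
    return features, labels

# Made-up sample with nine columns; only indices 1-5 and 8 are used.
sample = io.StringIO(
    "0,1.5,2.5,12,3,65,a,b,A\n"
    "\n"
    "1,4.0,0.5,30,7,10,a,b,B\n"
)
features, labels = load_rows(sample)
print(features)
print(labels)
```

With a real file under Python 3, the file would be opened in text mode, for example `open("time_data.csv", newline="")`, and passed to `load_rows`.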
Code:
import csv
import numpy as np
from sklearn.ensemble import RandomForestClassifier

with open("time_data.csv", "rb") as infile:
    re1 = csv.reader(infile)
    result=[]
    ##next(reader, None)
    ##for row in reader:
    for row in re1:
        result.append(row[8])

trainclass = result[:251900]
testclass = result[251901:279953]

with open("time_data.csv", "rb") as infile:
    re = csv.reader(infile)
    coords = [(float(d[1]), float(d[2]), float(d[3]), float(d[4]), float(d[5])) for d in re if len(d) > 0]

train = coords[:251900]
test = coords[251901:279953]
print "Done splitting data into test and train data"

clf = RandomForestClassifier(n_estimators=500,max_features="log2", min_samples_split=3, min_samples_leaf=2)
clf.fit(train,trainclass)
print "Done training"

score = clf.score(test,testclass)
print "Done Testing"
print score
Error:
line 366, in fit
builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
File "sklearn/tree/_tree.pyx", line 145, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn/tree/_tree.pyx", line 244, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn/tree/_tree.pyx", line 735, in sklearn.tree._tree.Tree._add_node
File "sklearn/tree/_tree.pyx", line 707, in sklearn.tree._tree.Tree._resize_c
File "sklearn/tree/_utils.pyx", line 39, in sklearn.tree._utils.safe_realloc
MemoryError: could not allocate 10206838784 bytes
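To put the failed allocation in perspective, here is a small Python 3 sketch of the arithmetic. It assumes (the traceback does not confirm this) that the buffer being grown in `Tree._resize_c` is the per-node class-count array, which scikit-learn stores as one 8-byte float per node, per output, per class. The helper names `to_mib` and `value_array_bytes`, and the node and class counts in the usage lines, are hypothetical.

```python
# Size of the allocation that failed, copied from the MemoryError message.
FAILED_BYTES = 10206838784

def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20

def value_array_bytes(n_nodes, n_classes, n_outputs=1, itemsize=8):
    """Rough size of a tree's per-node class-count array.

    Assumption: one float64 (8 bytes) per node, per output, per class.
    """
    return n_nodes * n_outputs * n_classes * itemsize

failed_mib = to_mib(FAILED_BYTES)
print("failed allocation: %.0f MiB (%.2f GiB)" % (failed_mib, failed_mib / 1024))

# Hypothetical numbers, for illustration only: the same tree needs
# 1000x more memory with 10,000 classes than with 10.
few = value_array_bytes(n_nodes=100000, n_classes=10)
many = value_array_bytes(n_nodes=100000, n_classes=10000)
print(few, many)
```

If the class column (index 8 in the code above) holds many distinct values, this array grows quickly with every node added, so counting the distinct labels before fitting, for example with `len(set(result))`, is a cheap check. Limiting tree growth through `max_depth`, `max_leaf_nodes`, or a larger `min_samples_leaf` should also reduce memory use, though whether that is enough for this dataset is untested.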
【Discussion】:
Tags: python scikit-learn random-forest