【发布时间】:2016-10-04 14:13:33
【问题描述】:
我正在使用 Python 2.7、sklearn 0.17.1、numpy 1.11.0 测试一个简单的预测程序。我从 LDA 模型中得到了带有概率的矩阵,现在我想创建 RandomForestClassifier 来通过概率预测结果。 我的代码是:
maxlen = 40
props = []
for doc in corpus:
topics = model.get_document_topics(doc)
tprops = [0] * maxlen
for topic in topics:
tprops[topics[0]] = topics[1]
props.append(tprops)
ntheta = np.array(props)
ny = np.array(y)
clf = RandomForestClassifier(n_estimators=100)
accuracy = cross_val_score(clf, ntheta, ny, scoring = 'accuracy')
print accuracy
ValueError Traceback (most recent call last)
<ipython-input-65-a7d276df43e9> in <module>()
1 # clf.fit(nteta, ny)
2 print nteta.shape, ny.shape
----> 3 accuracy = cross_val_score(clf, nteta, ny, scoring = 'accuracy')
4 print accuracy
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
1431 train, test, verbose, None,
1432 fit_params)
-> 1433 for train, test in cv)
1434 return np.array(scores)[:, 0]
1435
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
798 # was dispatched. In particular this covers the edge
799 # case of Parallel used with an exhausted iterator.
--> 800 while self.dispatch_one_batch(iterator):
801 self._iterating = True
802 else:
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
656 return False
657 else:
--> 658 self._dispatch(tasks)
659 return True
660
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _dispatch(self, batch)
564
565 if self._pool is None:
--> 566 job = ImmediateComputeBatch(batch)
567 self._jobs.append(job)
568 self.n_dispatched_batches += 1
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, batch)
178 # Don't delay the application, to avoid keeping the input
179 # arguments in memory
--> 180 self.results = batch()
181
182 def get(self):
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
1529 estimator.fit(X_train, **fit_params)
1530 else:
-> 1531 estimator.fit(X_train, y_train, **fit_params)
1532
1533 except Exception as e:
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/ensemble/forest.pyc in fit(self, X, y, sample_weight)
210 """
211 # Validate or convert input data
--> 212 X = check_array(X, dtype=DTYPE, accept_sparse="csc")
213 if issparse(X):
214 # Pre-sort indices to avoid that each individual tree of the
/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
405 " minimum of %d is required%s."
406 % (n_samples, shape_repr, ensure_min_samples,
--> 407 context))
408
409 if ensure_min_features > 0 and array.ndim == 2:
ValueError: Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required.
UPD 因为我得到了 2 减?让批评者具有建设性。
更新
cotique发现y填写不正确(必须是其他类)。如果 y 填写正确,则问题不会发生。在我的案例中,类是错误的,它们的计数是 39774。但理论上这不是一个答案,为什么当我们有 39774 个类并且必须预测它们时会发生错误。
【问题讨论】:
-
print ntheta.size, ny.size。赌其中一个是空的。 -
结果 1590960 39774 。没错。
标签: python numpy scikit-learn