ValueError: object of too small depth for desired array (answers)

【Question title】: ValueError: object of too small depth for desired array
【Posted】: 2020-05-03 07:53:02
【Question description】:

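The code below worked when I ran it yesterday, but when I ran it today I got this error. I thought the problem came from modifying my data, but I get the same error when I use the old data too. (I'm not sure whether it is related to the shape of the data, but I wanted to show it.) Can anyone help?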

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

print("Shape of x_train :", x_train.shape)
print("Shape of x_test :", x_test.shape)
print("Shape of y_train :", y_train.shape)
print("Shape of y_test :", y_test.shape)

Shape of x_train : (257763, 96)
Shape of x_test : (64441, 96)
Shape of y_train : (257763,)
Shape of y_test : (64441,)

from imblearn.ensemble import BalancedRandomForestClassifier 


model = BalancedRandomForestClassifier(n_estimators = 200, random_state = 0, max_depth=6)
model.fit(x_train, y_train)

Here is the full error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-7698c432c37d> in <module>
      7
      8 model = BalancedRandomForestClassifier(n_estimators = 200, random_state = 0, max_depth=6)
----> 9 model.fit(x_train, y_train)
     10 y_pred_rf = model.predict(x_test)
     11

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/imblearn/ensemble/_forest.py in fit(self, X, y, sample_weight)
    433                         s, t, self, X, y, sample_weight, i, len(trees),
    434                         verbose=self.verbose, class_weight=self.class_weight)
--> 435                     for i, (s, t) in enumerate(zip(samplers, trees)))
    436             samplers, trees = zip(*samplers_trees)
    437

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
    919             # remaining jobs.
    920             self._iterating = False
--> 921             if self.dispatch_one_batch(iterator):
    922                 self._iterating = self._original_iterator is not None
    923

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    757                 return False
    758             else:
--> 759                 self._dispatch(tasks)
    760                 return True
    761

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in _dispatch(self, batch)
    714         with self._lock:
    715             job_idx = len(self._jobs)
--> 716             job = self._backend.apply_async(batch, callback=cb)
    717             # A job can complete so quickly than its callback is
    718             # called before we get here, causing self._jobs to

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
    180     def apply_async(self, func, callback=None):
    181         """Schedule a func to be run"""
--> 182         result = ImmediateResult(func)
    183         if callback:
    184             callback(result)

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
    547         # Don't delay the application, to avoid keeping the input
    548         # arguments in memory
--> 549         self.results = batch()
    550
    551     def get(self):

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in __call__(self)
    223         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    224             return [func(*args, **kwargs)
--> 225                     for func, args, kwargs in self.items]
    226
    227     def __len__(self):

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0)
    223         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    224             return [func(*args, **kwargs)
--> 225                     for func, args, kwargs in self.items]
    226
    227     def __len__(self):

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/imblearn/ensemble/_forest.py in _local_parallel_build_trees(sampler, tree, forest, X, y, sample_weight, tree_idx, n_trees, verbose, class_weight)
     43     tree = _parallel_build_trees(tree, forest, X_resampled, y_resampled,
     44                                  sample_weight, tree_idx, n_trees,
---> 45                                  verbose=verbose, class_weight=class_weight)
     46     return sampler, tree
     47

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/sklearn/ensemble/_forest.py in _parallel_build_trees(tree, forest, X, y, sample_weight, tree_idx, n_trees, verbose, class_weight, n_samples_bootstrap)
    153         indices = _generate_sample_indices(tree.random_state, n_samples,
    154                                            n_samples_bootstrap)
--> 155         sample_counts = np.bincount(indices, minlength=n_samples)
    156         curr_sample_weight *= sample_counts
    157

<__array_function__ internals> in bincount(*args, **kwargs)

ValueError: object of too small depth for desired array

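【Comments】:

What error are you getting?
I've added my full error text.
The code looks fine to me. A ValueError means a function received a value it cannot work with, so your x or y may be corrupted. You should check the input data. Also check whether the same error occurs when you try a different algorithm (e.g. sklearn's RandomForest).
Make sure the dimensions and data types match the function's documentation.
@MertTürkyılmaz Were you able to find a solution? In my case, other algorithms don't raise the error. I only get it with BalancedRandomForest.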
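
The input checks suggested in the comments could be sketched roughly as follows. This is only an illustration: `describe_inputs` is a hypothetical helper, and `x_demo` / `y_demo` are synthetic stand-ins shaped like the question's data (96 feature columns), not the asker's real `x` and `y`.

```python
import numpy as np

def describe_inputs(x, y):
    """Hypothetical helper: summarize shape, dtype and NaN status of x and y."""
    x = np.asarray(x)
    y = np.asarray(y)
    # NaN checks only make sense for floating-point data.
    x_has_nan = bool(np.isnan(x).any()) if np.issubdtype(x.dtype, np.floating) else False
    return {
        "x_shape": x.shape,
        "y_shape": y.shape,
        "x_dtype": str(x.dtype),
        "y_dtype": str(y.dtype),
        "x_is_2d": x.ndim == 2,
        "y_is_1d": y.ndim == 1,
        "same_length": x.shape[0] == y.shape[0],
        "x_has_nan": x_has_nan,
    }

# Synthetic stand-in data; replace with the real x and y.
rng = np.random.RandomState(0)
x_demo = rng.rand(100, 96)
y_demo = rng.randint(0, 2, 100)

summary = describe_inputs(x_demo, y_demo)
for key, value in summary.items():
    print(key, "=", value)
```

If every check passes on the real data, as the shapes printed in the question suggest, the inputs are probably not the cause.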

Tags: pandas numpy scikit-learn model numpy-ndarray

【Solution 1】:

According to the traceback, the error is raised by bincount. This reproduces it:

In [13]: np.bincount(0)                                                                          
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-65825aeaf27a> in <module>
----> 1 np.bincount(0)

<__array_function__ internals> in bincount(*args, **kwargs)

ValueError: object of too small depth for desired array

In [14]: np.bincount(np.arange(5))
Out[14]: array([1, 1, 1, 1, 1])

bincount works on 1-D arrays; if it is given a scalar, it raises this error.
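
To make the scalar-versus-array rule concrete, here is a minimal sketch, assuming NumPy is installed. `safe_bincount` is a hypothetical helper written for illustration only; it is not part of NumPy, and it is not a fix for the library code in the traceback.

```python
import numpy as np

def safe_bincount(values, minlength=0):
    """Hypothetical helper: promote a scalar to a 1-D array before counting."""
    arr = np.atleast_1d(np.asarray(values))
    if arr.ndim != 1:
        raise ValueError(f"expected a scalar or 1-D input, got ndim={arr.ndim}")
    return np.bincount(arr, minlength=minlength)

# A bare scalar is wrapped into a one-element array first.
counts_from_scalar = safe_bincount(3, minlength=5)
# A 1-D array is passed through unchanged.
counts_from_array = safe_bincount(np.arange(5))

print(counts_from_scalar)
print(counts_from_array)
```

Checking `np.ndim(value)` before the call is an equally simple way to spot a scalar that was meant to be an array.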

Now go back to the traceback and work out which variable in the code is a scalar when it should be an array.
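
In the traceback, `indices` comes from `_generate_sample_indices(tree.random_state, n_samples, n_samples_bootstrap)`, and the imblearn frame calls `_parallel_build_trees` without passing `n_samples_bootstrap`. The sketch below shows one way a scalar could arise from that. It rests on an assumption the post does not show: that the missing argument ends up as `None` and is used as the `size` of a `randint` call. `generate_indices` is a hypothetical stand-in, not the scikit-learn function.

```python
import numpy as np

def generate_indices(seed, n_samples, n_samples_bootstrap):
    """Hypothetical stand-in for the index-sampling step seen in the traceback."""
    rng = np.random.RandomState(seed)
    # With size=None, randint returns a single integer rather than an array.
    return rng.randint(0, n_samples, n_samples_bootstrap)

n_samples = 10
good = generate_indices(0, n_samples, n_samples)  # 1-D array of indices
bad = generate_indices(0, n_samples, None)        # a single integer

print("ndim with a size:", np.ndim(good))
print("ndim with size=None:", np.ndim(bad))

# Counting the 1-D array works.
good_counts = np.bincount(good, minlength=n_samples)

# Counting the scalar fails in bincount.
try:
    np.bincount(bad, minlength=n_samples)
    scalar_failed = False
except ValueError as exc:
    scalar_failed = True
    print("ValueError:", exc)
```

If this is what is happening, the likely culprit is a mismatch between the installed imbalanced-learn and scikit-learn versions rather than the input data, so comparing those versions is a reasonable next step.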

【Comments】:

【Solution 2】:

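A small workaround is to use a newer version of Python in Jupyter notebook (3.7.4 worked for me). With the older Python version, the error persists.

I had the same problem. On my computer I have Jupyter notebook installed with Python 3.7.4, and BalancedRandomForestClassifier works fine there. However, when I tried to run it on an older version, Python 3.6, I hit the same failure described above.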
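
When one environment works and another does not, it can help to compare installed package versions as well as the Python version. This answer only reports the Python versions, so the package list below is an assumption about what might differ. `report_versions` is a hypothetical helper.

```python
import importlib
import platform

def report_versions(packages=("numpy", "sklearn", "imblearn", "joblib")):
    """Hypothetical helper: collect Python and package versions for comparison."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = "not installed"
    return versions

# Run this in both environments and compare the output line by line.
for name, version in report_versions().items():
    print(f"{name}: {version}")
```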

The features I created (BoW) are also a 2-D array.

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

[Image: Jupyter notebook on my machine]

[Image: Jupyter notebook on my Google Colab]

【Comments】: