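【Posted】: 2017-07-14 15:01:09
【Question】:
I'm trying to run k-fold cross-validation with sklearn in Python. I've now worked through two tutorials, but my code fails at the validation step.
Every time I try to run
cross_val_score(dt, x, y, cv=5)
I get this error: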
Traceback (most recent call last):
  File "C:/Users/djsg38/Documents/CS6001-SpatialTemporal/HW2/main.py", line 573, in <module>
    scores = cross_val_score(dt, x, y, cv=5)
  File "C:\Python27\lib\site-packages\sklearn\model_selection\_validation.py", line 128, in cross_val_score
    X, y, groups = indexable(X, y, groups)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 206, in indexable
    check_consistent_length(*result)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 177, in check_consistent_length
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 116, in _num_samples
    'estimator %s' % x)
TypeError: Expected sequence or array-like, got estimator Its official US President Barack Obama wants lawmakers weigh \
0 1 4 12 3 2 12 4 4 2
1 0 0 1 0 0 0 0 0 0
2 1 0 4 0 0 0 0 0 0
3 0 0 0 0 0 0 4 0 0
4 0 3 10 0 0 1 0 0 0
5 0 0 0 0 0 0 0 0 0
6 0 0 0 4 1 7 0 0 0
7 3 0 0 0 0 0 0 0 0
8 1 0 4 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
10 0 1 6 3 0 3 0 0 0
11 0 0 0 1 0 0 0 0 0
12 0 2 1 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0
17 0 0 5 4 1 9 1 0 0
18 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0
21 0 0 3 2 1 1 0 0 1
22 0 0 0 0 0 0 0 0 0
23 0 0 1 0 0 0 0 0 0
24 1 0 0 0 0 0 0 0 0
25 0 0 0 1 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0
27 0 0 1 0 0 0 0 0 0
28 0 0 0 0 0 0 0 0 0
29 0 1 0 0 0 0 0 0 0
.. ... ... .. ... ... ... ... ... ...
70 0 0 0 0 0 0 0 0 0
71 0 0 0 2 0 5 0 0 0
72 5 0 0 0 0 0 0 0 0
73 0 0 0 0 0 0 0 0 0
74 0 0 1 0 0 0 0 0 0
75 1 0 1 0 0 0 1 0 0
76 2 0 0 0 0 0 0 0 0
77 1 0 0 0 0 0 0 0 0
78 0 0 0 0 0 0 0 0 0
79 1 0 0 0 0 0 0 0 0
80 0 0 0 0 0 0 0 0 0
81 0 0 1 0 0 0 0 0 0
82 0 0 1 0 0 0 0 0 0
83 0 0 0 0 0 0 0 1 0
84 0 0 2 4 1 3 1 0 0
85 0 0 0 1 0 0 0 0 0
86 0 0 1 0 0 0 0 0 0
87 0 0 0 0 0 0 0 0 0
88 0 0 0 0 0 0 0 0 0
89 0 0 0 0 0 0 0 0 0
90 0 0 0 0 0 0 0 0 0
91 0 0 2 1 0 0 0 0 0
92 0 0 0 0 0 0 0 0 0
93 0 0 0 0 0 0 0 0 0
94 1 0 0 0 0 0 0 0 0
95 0 2 1 0 0 0 0 0 0
96 0 0 0 0 0 0 0 0 0
97 0 0 4 1 0 0 0 0 0
98 0 0 11 1 0 0 0 0 0
99 0 0 0 0 0 0 0 0 0
whether ... Heh heh funny disassociate personWere \
0 4 ... 0 0 0 0 0
1 0 ... 0 0 0 0 0
2 0 ... 0 0 0 0 0
3 0 ... 0 0 0 0 0
4 0 ... 0 0 0 0 0
5 0 ... 0 0 0 0 0
6 2 ... 0 0 0 0 0
7 0 ... 0 0 0 0 0
8 0 ... 0 0 0 0 0
9 0 ... 0 0 0 0 0
10 0 ... 0 0 0 0 0
11 1 ... 0 0 0 0 0
12 0 ... 0 0 0 0 0
13 1 ... 0 0 0 0 0
14 0 ... 0 0 0 0 0
15 1 ... 0 0 0 0 0
16 0 ... 0 0 0 0 0
17 1 ... 0 0 0 0 0
18 0 ... 0 0 0 0 0
19 0 ... 0 0 0 0 0
20 0 ... 0 0 0 0 0
21 8 ... 0 0 0 0 0
22 0 ... 0 0 0 0 0
23 0 ... 0 0 0 0 0
24 0 ... 0 0 0 0 0
25 0 ... 0 0 0 0 0
26 1 ... 0 0 0 0 0
27 0 ... 0 0 0 0 0
28 0 ... 0 0 0 0 0
29 0 ... 0 0 0 0 0
.. ... ... ... ... ... ... ...
70 0 ... 0 0 0 0 0
71 1 ... 0 0 0 0 0
72 0 ... 0 0 0 0 0
73 0 ... 0 0 0 0 0
74 0 ... 0 0 0 0 0
75 0 ... 0 0 0 0 0
77 0 ... 0 0 0 0 0
78 0 ... 0 0 0 0 0
79 1 ... 0 0 0 0 0
80 0 ... 0 0 0 0 0
81 3 ... 0 0 0 0 0
82 0 ... 0 0 0 0 0
83 0 ... 0 0 0 0 0
84 0 ... 0 0 0 0 0
85 0 ... 0 0 0 0 0
86 0 ... 0 0 0 0 0
87 0 ... 0 0 0 0 0
88 0 ... 0 0 0 0 0
89 1 ... 0 0 0 0 0
90 0 ... 0 0 0 0 0
91 0 ... 0 0 0 0 0
92 0 ... 0 0 0 0 0
93 0 ... 0 0 0 0 0
94 1 ... 0 0 0 0 0
95 0 ... 0 0 0 0 0
96 0 ... 0 0 0 0 0
97 0 ... 0 0 0 0 0
98 1 ... 0 0 0 0 0
99 0 ... 1 1 1 1 1
therehighlightAs indepth umpireshighlightThe headhighlightTwo \
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
11 0 0 0 0
12 0 0 0 0
13 0 0 0 0
14 0 0 0 0
15 0 0 0 0
16 0 0 0 0
17 0 0 0 0
18 0 0 0 0
19 0 0 0 0
20 0 0 0 0
21 0 0 0 0
22 0 0 0 0
23 0 0 0 0
24 0 0 0 0
25 0 0 0 0
26 0 0 0 0
27 0 0 0 0
28 0 0 0 0
29 0 0 0 0
.. ... ... ... ...
70 0 0 0 0
71 0 0 0 0
72 0 0 0 0
73 0 0 0 0
74 0 0 0 0
75 0 0 0 0
76 0 0 0 0
77 0 0 0 0
78 0 0 0 0
79 0 0 0 0
80 0 0 0 0
81 0 0 0 0
82 0 0 0 0
83 0 0 0 0
84 0 0 0 0
85 0 0 0 0
86 0 0 0 0
87 0 0 0 0
88 0 0 0 0
89 0 0 0 0
90 0 0 0 0
91 0 0 0 0
92 0 0 0 0
93 0 0 0 0
94 0 0 0 0
95 0 0 0 0
96 0 0 0 0
97 0 0 0 0
98 0 0 0 0
99 1 1 1 1
disrespect
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
.. ...
70 0
71 0
72 0
73 0
74 0
75 0
76 0
77 0
78 0
79 0
80 0
81 0
82 0
83 0
84 0
85 0
86 0
87 0
88 0
89 0
90 0
91 0
92 0
93 0
94 0
95 0
96 0
97 0
98 0
99 1
[100 rows x 12993 columns]
Here is my code:
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def encode_target(df, target_column):
    # Add an integer-coded "Target" column derived from target_column.
    df_mod = df.copy()
    targets = df_mod[target_column].unique()
    map_to_int = {name: n for n, name in enumerate(targets)}
    df_mod["Target"] = df_mod[target_column].replace(map_to_int)
    return (df_mod, targets)

df = pd.read_csv("C:/Users/djsg38/Documents/CS6001-SpatialTemporal/HW2/finalCounts.csv")
df2, targets = encode_target(df, "MYLABEL")
features = list(df2.columns[:12338])  # word-count columns
y = df2["Target"]  # integer labels added by encode_target
x = df2[features]
dt = DecisionTreeClassifier()
dt.fit(x, y)
scores = cross_val_score(dt, x, y, cv=5)
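To make the label-encoding step concrete, here is a minimal pure-Python sketch of the mapping that encode_target builds. The label values are made up for illustration, and `dict.fromkeys` stands in for pandas' `unique()`, which also keeps first-appearance order.

```python
# Hypothetical labels standing in for the "MYLABEL" column.
labels = ["sports", "politics", "sports", "tech"]

# Unique values in order of first appearance (what Series.unique() would give).
targets = list(dict.fromkeys(labels))

# Same dictionary comprehension as in encode_target: label name -> integer code.
map_to_int = {name: n for n, name in enumerate(targets)}

# Apply the mapping to every row to obtain the integer target.
encoded = [map_to_int[label] for label in labels]

print(targets)
print(encoded)
```

Each distinct label gets the index at which it first appeared, so repeated labels share one code.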
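My DecisionTreeClassifier seems to work fine, and the tree looks right when I export it as an image. The problem is in the last line.
P.S. I'm not sure whether there is a limit on the number of columns. The classic example I followed used the Iris dataset, which has only four feature columns. I, on the other hand, have 12,338 columns of data (the count of each unique word across 100 articles).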
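On the column-count question, one possibility worth ruling out (an assumption; nothing in this thread confirms it) is a column name rather than a column limit. The traceback ends in `_num_samples`, which in scikit-learn releases of that era appears to reject any input that has a `fit` attribute. Since every column here is named after a word, a column literally called `fit` would be exposed by pandas as `x.fit`. The sketch below shows that effect on a made-up table; `counts` and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the word-count table; the column names are invented.
counts = pd.DataFrame({"weigh": [4, 0, 1], "fit": [0, 2, 0], "funny": [1, 0, 0]})

# pandas exposes columns with identifier-like names as attributes,
# so the DataFrame now appears to have a "fit" attribute.
frame_has_fit = hasattr(counts, "fit")

# The underlying NumPy array carries no column names and has no such attribute.
array_has_fit = hasattr(counts.values, "fit")

print(frame_has_fit, array_has_fit)
```

If that turns out to be the cause, passing `x.values` instead of `x`, or renaming the offending column, should sidestep the check.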
【Comments】:
-
First, post the full stack trace of the error. Second, I can't follow your code. What is encode_target doing? You pass df2[features] as x to dt, but as y in cross_val_score.
-
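encode_target takes the DataFrame column named by target_column, which for me is 'MYLABEL', the label I gave it. It collects all the UNIQUE values in that column into a list, then enumerates them to assign each one an integer, since apparently the classifier can only handle integers. I'm not sure how much more I can offer; as I said in the question, I'm following an online tutorial for this classification. But supposedly the y values are the DataFrame's "Target" column, i.e. the integer mapping, and the x values are everything else.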
-
Now that I've seen the function, I understand. You should use the same X, y in cross_val_score as you used in dt.fit.
-
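You know, reading your question and being asked about it, I realize it does seem odd that I wasn't using the same thing. I'll try it today! I was just following the tutorial as closely as I could, and that seems to be how they did it. I'll update if it still doesn't work. Thanks so far, Vivek.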
-
Also, please mention the source of the tutorial you are using.
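The suggestion above, passing cross_val_score the same X and y that went into dt.fit, can be sketched as follows. The data is random and purely illustrative, since the real finalCounts.csv isn't available here, and the names `X`, `y`, and `clf` are placeholders.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative stand-in for the word-count matrix: 100 "articles", 20 "words".
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(100, 20))

# Illustrative integer labels, like the encoded target column.
y = rng.randint(0, 3, size=100)

clf = DecisionTreeClassifier(random_state=0)

# The same X and y go to cross_val_score; it fits a fresh copy of the
# estimator on each of the 5 folds and returns one score per fold.
scores = cross_val_score(clf, X, y, cv=5)

print(scores.shape)
```

Because cross_val_score does its own fitting on each fold, a separate dt.fit call beforehand isn't needed just to get the scores.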
Tags: python scikit-learn cross-validation