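[Posted]: 2017-05-08 14:52:23
[Question]:
I'm building a basic spam classifier in Python 3 using pandas, numpy, and sklearn, but I get the error below and can't work out where it comes from. I've inspected the data types of the various variables without finding the cause. (ham = not spam.) The input files are not the problem, because the same code works with Python 2.7, so this looks like either a package/module compatibility issue or a data type conversion error.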
import os
import io
import numpy
from pandas import DataFrame
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def readFiles(path):
    for root, dirnames, filenames in os.walk(path):
        for filename in filenames:
            path = os.path.join(root, filename)
            inBody = False
            lines = []
            f = io.open(path, 'r', encoding='latin1')
            for line in f:
                if inBody:
                    lines.append(line)
                elif line == '\n':
                    inBody = True
            f.close()
            message = '\n'.join(lines)
            yield path, message

def dataFrameFromDirectory(path, classification):
    rows = []
    index = []
    for filename, message in readFiles(path):
        rows.append({'message': message, 'class': classification})
        index.append(filename)
    return DataFrame(rows, index=index)

data = DataFrame({'message': [], 'class': []})
data = data.append(dataFrameFromDirectory('D:/emails/spam', 'spam'))
data = data.append(dataFrameFromDirectory('D:/emails/ham', 'ham'))
Stack trace from the IPython Notebook:
TypeError Traceback (most recent call last)
<ipython-input-5-555887356cc2> in <module>()
3 import numpy
4 from pandas import DataFrame
----> 5 from sklearn.feature_extraction.text import CountVectorizer
6 from sklearn.naive_bayes import MultinomialNB
7
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\__init__.py in <module>()
55 else:
56 from . import __check_build
---> 57 from .base import clone
58 __check_build # avoid flakes unused variable error
59
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\base.py in <module>()
10 from scipy import sparse
11 from .externals import six
---> 12 from .utils.fixes import signature
13 from .utils.deprecation import deprecated
14 from .exceptions import ChangedBehaviorWarning as _ChangedBehaviorWarning
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\utils\__init__.py in <module>()
9
10 from .murmurhash import murmurhash3_32
---> 11 from .validation import (as_float_array,
12 assert_all_finite,
13 check_random_state, column_or_1d, check_array,
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\utils\validation.py in <module>()
16
17 from ..externals import six
---> 18 from ..utils.fixes import signature
19 from .deprecation import deprecated
20 from ..exceptions import DataConversionWarning as _DataConversionWarning
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\utils\fixes.py in <module>()
404
405
--> 406 if np_version < (1, 12, 0):
407 class MaskedArray(np.ma.MaskedArray):
408 # Before numpy 1.12, np.ma.MaskedArray object is not picklable
TypeError: unorderable types: str() < int()
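[Comments]:
-
You should post the full error you are getting.
-
Post the stack trace.
-
Try to put together a minimal reproducible example. We don't have your data files either, so we can't run your code to reproduce the error.
-
If you are new to Python, you should learn the basics of the language before using a high-level framework like sklearn. That may help with debugging errors.
-
I suspect a package version mismatch. Something, probably numpy or scipy, is older than sklearn and/or pandas expects. The error occurs in the sklearn import, not in your own code.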
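The last traceback frame fails on `np_version < (1, 12, 0)`, which fits the version-mismatch suspicion above: in Python 3, a tuple comparison raises `TypeError` once it has to order a `str` against an `int`. Below is a minimal sketch of how that can happen, assuming the version string is split on dots and non-numeric pieces are kept as strings. `parse_version` is a hypothetical helper written for illustration, not sklearn's actual code, and `"1.12.0rc1"` is an assumed example of a pre-release version string, not the version installed on the asker's machine.

```python
def parse_version(version_string):
    """Hypothetical helper: turn '1.11.3' into (1, 11, 3).

    Pieces that are not plain integers (e.g. '0rc1') are kept as str,
    which is what makes the later comparison fail in Python 3.
    """
    parts = []
    for piece in version_string.split("."):
        try:
            parts.append(int(piece))
        except ValueError:
            parts.append(piece)
    return tuple(parts)


release = parse_version("1.11.3")        # all-integer tuple
prerelease = parse_version("1.12.0rc1")  # assumed example: last piece stays a str

# Integers compare against integers, so this works.
print(release < (1, 12, 0))  # True

# The first two pieces are equal, so Python must order '0rc1' against 0.
# Python 2 allowed this; Python 3 raises TypeError.
try:
    prerelease < (1, 12, 0)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

If this is the cause, comparing the output of `numpy.__version__` and `scipy.__version__` against what the installed sklearn release supports, and then upgrading whichever package is out of step, would be the natural next check.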
Tags: python-3.x pandas numpy