【Posted】: 2016-05-18 03:48:39
【Question】:
I'm trying to tokenize some documents, but I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 6: ordinal not in range(128)
import nltk
import pandas as pd
df = pd.DataFrame(pd.read_csv('status2.csv'))
documents = df['status']
result = [nltk.word_tokenize(sent) for sent in documents]
I assumed it was a Unicode problem, so I added:
documents = unicode(documents, 'utf-8')
That produced a different error:
TypeError: coercing to Unicode: need string or buffer, Series found
print documents
1 Brandon Cachia ,All I know is that,you're so n...
2 Melissa Zejtunija:HAM AND CHEESE BIEX INI??? *...
3 .........Where is my mind?????
4 Having a philosophical discussion with Trudy D...
【Comments】:
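The TypeError in the question comes from calling `unicode()` on the whole Series; a decoder expects one string at a time. A minimal sketch of element-wise decoding follows. It assumes the file is UTF-8 encoded (the question does not confirm this). The helper `to_text` and the sample byte strings are hypothetical stand-ins for the `status` column, not the asker's data.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string.

    Hypothetical helper; the UTF-8 default is an assumption about the file.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for the 'status' column: raw UTF-8 byte strings,
# including one sequence that starts with byte 0xef.
raw_documents = [
    b"Where is my mind?",
    b"caf\xc3\xa9 \xef\xbc\x9f",
]

# Decode one element at a time instead of passing the whole container
# to a decoder, which is what raised the TypeError.
documents = [to_text(doc) for doc in raw_documents]
```

With pandas, the same helper could likely be applied per row with `df['status'].apply(to_text)`, or the decoding could be left to the reader by passing `encoding='utf-8'` to `pd.read_csv`. Each decoded string can then be passed to `nltk.word_tokenize` as in the original list comprehension. Whether this removes the UnicodeDecodeError depends on the file's actual encoding.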