【Question title】: TypeError: 'WordListCorpusReader' object has no attribute '__getitem__' while using nltk.classify.apply_features
【Posted】: 2015-04-12 17:12:15
【Question description】:

I am following the tutorial on this site to learn NaiveBayes. My code is:

from nltk.corpus import names
from nltk.classify import apply_features

def gender_features(word):
  return {'last_letter': word[-1]}

labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
[(name, 'female') for name in names.words('female.txt')])

feature_sets = [(gender_features(n), gender) for (n, gender) in labeled_names]

#train_set, test_set = feature_sets[500:], feature_sets[:500]
train_set = apply_features(gender_features, names[500:])
test_set = apply_features(gender_features, names[:500])

classifier = NaiveBayesClassifier.train(train_set)

print classifier.classify(gender_features('Neo'))

Building train_set without apply_features works fine. Does anyone know how I can fix this? Thanks.

【Comments】:

    Tags: python machine-learning nlp classification nltk


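    【Solution 1】:

    First, I think the tutorial at http://www.nltk.org/book/ch06.html has a typo.

    A word-list corpus cannot be indexed or sliced like a list; call its words() method first and slice the result: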

    >>> from nltk.corpus import names
    >>> names[:5]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'LazyCorpusLoader' object has no attribute '__getitem__'
    >>> names.words()[:5]
    [u'Abagael', u'Abagail', u'Abbe', u'Abbey', u'Abbi']
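
To illustrate why slicing fails on the corpus object but works on the result of words(), here is a minimal sketch. TinyCorpusReader is a hypothetical stand-in written for this example, not an NLTK class; it only mimics the relevant behaviour, which is that it exposes a words() method but defines no __getitem__.

```python
class TinyCorpusReader(object):
    """Hypothetical stand-in for a word-list corpus reader (not an NLTK class)."""

    def __init__(self, words):
        self._words = list(words)

    def words(self):
        # Return a plain list, which does support indexing and slicing.
        return list(self._words)


reader = TinyCorpusReader(['Abagael', 'Abagail', 'Abbe', 'Abbey', 'Abbi'])

try:
    reader[:2]  # no __getitem__ is defined, so slicing raises TypeError
    slicing_failed = False
except TypeError:
    slicing_failed = True

first_two = reader.words()[:2]  # slicing the returned list works
print(slicing_failed, first_two)
```

The same pattern applies to the question: slice names.words(...) or a list built from it, not names itself.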
    

    Next, look at what apply_features does (https://github.com/nltk/nltk/blob/develop/nltk/classify/util.py#L28).

    Basically, given a list of tuples [('input_1', 'label_1'), ..., ('input_N', 'label_N')], it returns a lazy, list-like object equivalent to [(feature_func(tok), label) for (tok, label) in toks]. For example:

    # To get the input list of tuples for apply_features, we do this:
    >>> [(word,'female') for word in names.words('female.txt')[:10]]
    [(u'Abagael', 'female'), (u'Abagail', 'female'), (u'Abbe', 'female'), (u'Abbey', 'female'), (u'Abbi', 'female'), (u'Abbie', 'female'), (u'Abby', 'female'), (u'Abigael', 'female'), (u'Abigail', 'female'), (u'Abigale', 'female')]
    
    # Let's get 250 from female and 250 from male names.
    >>> train_female = [(word,'female') for word in names.words('female.txt')[:250]] 
    >>> train_male = [(word,'male') for word in names.words('male.txt')[:250]]
    >>> train_data = train_female + train_male
    >>> apply_features(gender_features, train_data)
    [({'last_letter': u'l'}, 'female'), ({'last_letter': u'l'}, 'female'), ...]
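
As a rough illustration of the mapping described above, the following pure-Python sketch builds the same (features, label) pairs eagerly. apply_features_eager is a hypothetical helper written for this example and is not part of NLTK; the real apply_features returns a lazy list-like object rather than a plain list. The sample names are just the first few shown above.

```python
def gender_features(word):
    # Same feature extractor as in the question: the last letter of the name.
    return {'last_letter': word[-1]}


def apply_features_eager(feature_func, toks):
    """Hypothetical eager counterpart of apply_features for labeled tokens.

    toks is a list of (token, label) pairs; the result is a plain list of
    (feature_dict, label) pairs.
    """
    return [(feature_func(tok), label) for (tok, label) in toks]


# A tiny hand-written sample, so no corpus download is needed.
train_data = [('Abagael', 'female'), ('Abagail', 'female'), ('Abbe', 'female')]
feature_sets = apply_features_eager(gender_features, train_data)
print(feature_sets)
```

The input has to be a list of (name, label) pairs, which is why passing a slice of the bare corpus object cannot work.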
    

    Full code to get NaiveBayes working with the names corpus in NLTK:

    from nltk.corpus import names
    from nltk.classify import apply_features, NaiveBayesClassifier
    
    def gender_features(word):
        return {'last_letter': word[-1]}
    
    
    train_female = [(word,'female') for word in names.words('female.txt')[:250]] 
    train_male = [(word,'male') for word in names.words('male.txt')[:250]]
    train_data = train_female + train_male
    train_set = apply_features(gender_features, train_data)
    
    # Do likewise for the test set.
    '''
    test_female = [(word,'female') for word in names.words('female.txt')[250:]]
    test_male = [(word,'male') for word in names.words('male.txt')[250:]] 
    test_data = test_female + test_male
    test_set = apply_features(gender_features, test_data)
    '''
    
    classifier = NaiveBayesClassifier.train(train_set)
    print classifier.classify(gender_features('Neo'))
    

    [out]:

    'male'
    

    【Comments】:

    • Thanks. Your explanation was very helpful. I realized the error was caused by names[500:]. It should be labeled_names[500:].