【Question title】: Error in Saving NLTK HMM
【Posted】: 2016-06-23 07:10:57
【Question】:

I am trying to save NLTK's HMM tagger with pickle, as shown below, but it raises the following error. Please suggest a solution.

>>> import nltk
>>> import pickle
>>> brown_a = nltk.corpus.brown.tagged_sents()[:300]
>>> hmm_tagger=nltk.HiddenMarkovModelTagger.train(brown_a)
>>> sent = nltk.corpus.brown.sents()[400]
>>> hmm_tagger.tag(sent)
[(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]
>>> f = open('my_tagger.pickle', 'wb')
>>> pickle.dump(hmm_tagger, f)

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    pickle.dump(hmm_tagger, f)
  File "C:\Python27\lib\pickle.py", line 1376, in dump
    Pickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 669, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 669, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 754, in save_global
    (obj, module, name))
PicklingError: Can't pickle <function estimator at 0x0575F6F0>: it's not found as nltk.tag.hmm.estimator
>>> 

I am using Python 2.7.11 with NLTK 3.1 on MS Windows 10.

Thanks in advance.

【Comments】:

    Tags: python python-2.7 nltk hidden-markov-models


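    【Answer 1】:

    Why pickle the model at all? Training on the Brown corpus is very fast. If you want a better part-of-speech tagger, consider https://spacy.io/, which is easy to use from Python, has excellent pickling support, and produces state-of-the-art results. HMM taggers really are poor by today's standards.

    In any case, this is an NLTK bug. You have three options:

    1. Report the bug to NLTK and/or fix it by moving the estimator function out of the _train function to module level (so that pickle can find it as nltk.tag.hmm.estimator).
    2. Provide your own estimator function, so that pickle finds it in your own module.
    3. Use a pickle alternative such as dill or cloudpickle; they may be able to handle this estimator function.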
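
    The error in the traceback can be reproduced without NLTK. The sketch below is only an illustration: `make_model`, `module_level_estimator`, and `can_pickle` are hypothetical names, and the dictionary stands in for the tagger's internal state. pickle stores a function by its importable name, so a function defined inside another function cannot be saved, while the same function defined at module level can.

```python
import pickle


def make_model():
    """Mimic a trainer that defines its helper function locally."""
    def estimator(count, bins):
        # Nested function: not reachable by name from the module.
        return (count + 0.1) / (bins + 0.1)
    # The returned state holds a reference to the nested function.
    return {"estimator": estimator}


def module_level_estimator(count, bins):
    """Same helper, but defined at module level so it can be found by name."""
    return (count + 0.1) / (bins + 0.1)


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickling fails."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Python 2 raises PicklingError; Python 3 may raise AttributeError.
        return False


print(can_pickle(make_model()))                            # state with nested function
print(can_pickle({"estimator": module_level_estimator}))   # state with module-level function
```

    Run as a script, the first check should fail and the second should succeed. That is the reasoning behind options 1 and 2: once the estimator lives at module level, plain pickle has a name to refer to.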

    Here is how to dump the tagger with dill:

    import nltk
    import dill
    
    brown_a = nltk.corpus.brown.tagged_sents()[:300]
    hmm_tagger=nltk.HiddenMarkovModelTagger.train(brown_a)
    sent = nltk.corpus.brown.sents()[400]
    hmm_tagger.tag(sent)
    # [(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]
    
    with open('my_tagger.dill', 'wb') as f:
        dill.dump(hmm_tagger, f)
    

    Now you can load the tagger:

    import nltk
    import dill
    
    with open('my_tagger.dill', 'rb') as f:
        hmm_tagger = dill.load(f)
    
    # Re-create the test sentence, since a fresh session has no `sent` variable
    sent = nltk.corpus.brown.sents()[400]
    hmm_tagger.tag(sent)
    # [(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]
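
    For option 2, a minimal sketch follows. It assumes that `HiddenMarkovModelTagger.train` forwards an `estimator` keyword argument to the trainer and that the estimator is called as `estimator(fd, bins)`; please check this against your NLTK version. The names `lidstone_estimator` and `train_and_save`, and the smoothing value 0.1, are choices made for this example and do not come from the question.

```python
import pickle


def lidstone_estimator(fd, bins):
    """Module-level estimator, so pickle can refer to it by name.

    Assumption: Lidstone smoothing with gamma=0.1 is an acceptable choice.
    """
    # Local import keeps this helper module importable before NLTK is set up.
    from nltk.probability import LidstoneProbDist
    return LidstoneProbDist(fd, 0.1, bins)


def train_and_save(tagged_sents, path):
    """Train an HMM tagger with the estimator above and pickle it to `path`."""
    import nltk
    # Assumption: train() passes the `estimator` keyword through to training.
    tagger = nltk.HiddenMarkovModelTagger.train(
        tagged_sents, estimator=lidstone_estimator
    )
    with open(path, "wb") as f:
        pickle.dump(tagger, f)
    return tagger
```

    A call such as `train_and_save(nltk.corpus.brown.tagged_sents()[:300], 'my_tagger.pickle')` would then replace the failing `pickle.dump` from the question. pickle records only the module and function name, so the code that later loads the file must be able to import `lidstone_estimator` from the same module; keeping it in a small importable file of your own is probably the safest arrangement.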
    

    【Comments】:
