【Question title】: Must capture output of a function that has no return statement
【Posted】: 2017-11-16 19:22:31
【Question】:

I'm using the NLTK package, and it has a function that tells me whether a given sentence is positive, negative, or neutral:

from nltk.sentiment.util import demo_liu_hu_lexicon

demo_liu_hu_lexicon('Today is a an awesome, happy day')
>>> Positive

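The problem is that the function has no return statement: it just prints "Positive", "Negative", or "Neutral" to stdout. All it returns, implicitly, is None (a NoneType object). (Here is the source code of the function.)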
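To make the behaviour concrete, here is a minimal sketch with a stand-in function. `print_label` is hypothetical and not part of NLTK; it only mimics a function that prints its result instead of returning it.

```python
def print_label(sentence):
    """Hypothetical stand-in for a demo function that only prints."""
    print("Positive")  # written to stdout, nothing is returned


result = print_label("Today is an awesome, happy day")
print(result is None)  # the implicit return value is None
```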

Is there any way I can capture this output (other than by messing with the NLTK source code on my machine)?

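【Comments】:

  • Just copy the function and modify it as needed. There's a reason it has "demo" in its name…
  • If an answer helped, please consider marking it as accepted.
  • Hey? You're using demo code from NLTK as a function! That's not advisable. Write your own custom function based on the demo instead =)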

Tags: python python-3.x nltk


【Solution 1】:
import sys
from io import StringIO

class capt_stdout:
    def __init__(self):
        self._stdout = None
        self._string_io = None

    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._string_io = StringIO()
        return self

    def __exit__(self, type, value, traceback):
        sys.stdout = self._stdout

    @property
    def string(self):
        return self._string_io.getvalue()

Use it like this:

with capt_stdout() as out:
    demo_liu_hu_lexicon('Today is a an awesome, happy day')
    demo_liu_hu_lexicon_output = out.string
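As an alternative sketch, the standard library's `contextlib.redirect_stdout` (Python 3.4+) can do the same job without a custom class. The helper name `capture_stdout` and the stand-in `noisy_demo` are illustrative assumptions, not part of NLTK or of the answer above.

```python
import io
from contextlib import redirect_stdout


def capture_stdout(func, *args, **kwargs):
    """Call func and return whatever it printed to stdout as a string."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    # strip() removes the trailing newline that print() appends
    return buffer.getvalue().strip()


def noisy_demo(sentence):
    """Hypothetical stand-in for demo_liu_hu_lexicon: prints, returns None."""
    print("Positive")


label = capture_stdout(noisy_demo, "Today is an awesome, happy day")
```

With NLTK installed and the opinion_lexicon corpus downloaded, passing `demo_liu_hu_lexicon` in place of `noisy_demo` should behave the same way, since the helper only relies on the function printing to stdout.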

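【Comments】:

    【Solution 2】:

    TL;DR

    The demo_liu_hu_lexicon function is a demo function that shows how to use opinion_lexicon. It is meant for testing and should not be used directly.


    In long

    Let's look at the function and see how we can recreate a similar one: https://github.com/nltk/nltk/blob/develop/nltk/sentiment/util.py#L616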

    def demo_liu_hu_lexicon(sentence, plot=False):
        """
        Basic example of sentiment classification using Liu and Hu opinion lexicon.
        This function simply counts the number of positive, negative and neutral words
        in the sentence and classifies it depending on which polarity is more represented.
        Words that do not appear in the lexicon are considered as neutral.
        :param sentence: a sentence whose polarity has to be classified.
        :param plot: if True, plot a visual representation of the sentence polarity.
        """
        from nltk.corpus import opinion_lexicon
        from nltk.tokenize import treebank
    
        tokenizer = treebank.TreebankWordTokenizer()
    

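    OK, having imports inside the function is unusual, but that's because it is a demo function used for simple testing or documentation.

    Also, using treebank.TreebankWordTokenizer() is rather odd; we can simply use nltk.word_tokenize.

    Let's move the imports out and rewrite demo_liu_hu_lexicon as a simple_sentiment function.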

    from nltk.corpus import opinion_lexicon
    from nltk import word_tokenize
    
    def simple_sentiment(text):
        pass
    

    Next, we see

    def demo_liu_hu_lexicon(sentence, plot=False):
        """
        Basic example of sentiment classification using Liu and Hu opinion lexicon.
        This function simply counts the number of positive, negative and neutral words
        in the sentence and classifies it depending on which polarity is more represented.
        Words that do not appear in the lexicon are considered as neutral.
        :param sentence: a sentence whose polarity has to be classified.
        :param plot: if True, plot a visual representation of the sentence polarity.
        """
        from nltk.corpus import opinion_lexicon
        from nltk.tokenize import treebank
    
        tokenizer = treebank.TreebankWordTokenizer()
        pos_words = 0
        neg_words = 0
        tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]
    
        x = list(range(len(tokenized_sent))) # x axis for the plot
        y = []
    

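    The function

    1. First tokenizes and lowercases the sentence.
    2. Initializes the counts of positive and negative words.
    3. Initializes x and y for plotting later, so let's ignore those.

    If we go further down the function: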

    def demo_liu_hu_lexicon(sentence, plot=False):
        from nltk.corpus import opinion_lexicon
        from nltk.tokenize import treebank
    
        tokenizer = treebank.TreebankWordTokenizer()
        pos_words = 0
        neg_words = 0
        tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]
    
        x = list(range(len(tokenized_sent))) # x axis for the plot
        y = []
    
        for word in tokenized_sent:
            if word in opinion_lexicon.positive():
                pos_words += 1
                y.append(1) # positive
            elif word in opinion_lexicon.negative():
                neg_words += 1
                y.append(-1) # negative
            else:
                y.append(0) # neutral
    
        if pos_words > neg_words:
            print('Positive')
        elif pos_words < neg_words:
            print('Negative')
        elif pos_words == neg_words:
            print('Neutral')
    
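    4. The loop simply goes through each token and checks whether the word is in the positive/negative lexicon.

    5. Finally, it compares the numbers of positive and negative words and prints the label.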
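The loop and the final comparison can be sketched without NLTK. The word sets `POSITIVE` and `NEGATIVE` below are tiny made-up lexicons for illustration only, and `classify_tokens` is a hypothetical name; the real function looks words up in opinion_lexicon.positive() and opinion_lexicon.negative().

```python
# Tiny made-up lexicons for illustration only; the real demo uses
# opinion_lexicon.positive() and opinion_lexicon.negative().
POSITIVE = {"awesome", "happy", "good"}
NEGATIVE = {"awful", "sad", "bad"}


def classify_tokens(tokens):
    """Count positive and negative tokens and return (label, pos, neg)."""
    pos_words = sum(1 for t in tokens if t in POSITIVE)
    neg_words = sum(1 for t in tokens if t in NEGATIVE)
    if pos_words > neg_words:
        label = "Positive"
    elif pos_words < neg_words:
        label = "Negative"
    else:
        label = "Neutral"
    return label, pos_words, neg_words


tokens = "today is an awesome , happy day".split()
print(classify_tokens(tokens))  # ('Positive', 2, 0)
```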

    Now that we know what demo_liu_hu_lexicon does, let's see whether we can write a better simple_sentiment function.

    The tokenization in step 1 can't be avoided, so we have:

    from nltk.corpus import opinion_lexicon
    from nltk import word_tokenize
    
    def simple_sentiment(text):
        tokens = [word.lower() for word in word_tokenize(text)]
    

    For steps 2-5, the lazy way is to copy and paste the demo code and change print() -> return:

    from nltk.corpus import opinion_lexicon
    from nltk import word_tokenize
    
    def simple_sentiment(text):
        tokens = [word.lower() for word in word_tokenize(text)]
        # counters must be initialized before the loop
        pos_words = 0
        neg_words = 0
    
        for word in tokens:
            if word in opinion_lexicon.positive():
                pos_words += 1
            elif word in opinion_lexicon.negative():
                neg_words += 1
    
        if pos_words > neg_words:
            return 'Positive'
        elif pos_words < neg_words:
            return 'Negative'
        elif pos_words == neg_words:
            return 'Neutral'
    

    Now you have a function that you can use however you like.
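One possible refinement, sketched under assumptions: build the word sets once and reuse them, so each token lookup is a cheap set membership test. `make_sentiment` is a hypothetical helper, and the word lists passed in below are made up so that the example runs without NLTK.

```python
def make_sentiment(positive_words, negative_words, tokenize=str.split):
    """Build a sentiment function around two word lists.

    The lists are converted to sets once, so each lookup is cheap.
    """
    positive = set(positive_words)
    negative = set(negative_words)

    def sentiment(text):
        tokens = [token.lower() for token in tokenize(text)]
        # +1 for every positive token, -1 for every negative token
        score = sum((t in positive) - (t in negative) for t in tokens)
        if score > 0:
            return "Positive"
        if score < 0:
            return "Negative"
        return "Neutral"

    return sentiment


# Made-up word lists; with NLTK one could pass opinion_lexicon.positive(),
# opinion_lexicon.negative() and word_tokenize instead.
simple = make_sentiment(["awesome", "happy"], ["awful", "sad"])
print(simple("Today is an awesome happy day"))  # Positive
```

Passing the tokenizer as a parameter keeps the sketch testable; the default `str.split` does not separate punctuation from words, so a real tokenizer is the better choice for actual text.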


    By the way, the demo is really odd..

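    Each positive word appends 1 to y and each negative word appends -1, and the sentence is labelled positive when pos_words > neg_words.

    Note that in the code above pos_words and neg_words are integer counters, so this is an ordinary integer comparison; y is only used for plotting. Had the lists of integers been compared instead, Python would apply its lexicographic sequence comparison, which has no linguistic or mathematical meaning here =( (see What happens when we compare list of integers?)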
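To illustrate the difference between comparing integer counts and comparing lists of integers in Python, here is a small sketch with made-up polarity lists (`y_a` and `y_b` are illustrative values, not output from NLTK):

```python
# Made-up polarity lists (1 = positive, -1 = negative, 0 = neutral).
y_a = [1, -1, -1]   # one positive word, two negative words
y_b = [0, 1, 1]     # one neutral word, two positive words

# Sequence comparison is lexicographic: the first differing element
# decides, so y_a is "greater" although its total is lower.
print(y_a > y_b)            # True
print(sum(y_a) > sum(y_b))  # False
```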

    【Comments】:

      【Solution 3】:
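      import sys
      from io import StringIO
      from nltk.sentiment.util import demo_liu_hu_lexicon
      
      stdout_ = sys.stdout              # remember the real stdout
      stream = StringIO()
      sys.stdout = stream               # redirect prints into the buffer
      demo_liu_hu_lexicon('PLACE YOUR TEXT HERE')
      sys.stdout = stdout_              # restore the real stdout
      sentiment = stream.getvalue()
      sentiment = sentiment.rstrip('\n')  # drop the trailing newline added by print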
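One caveat with swapping sys.stdout by hand: if the call raises, sys.stdout stays redirected. A minimal sketch of the same idea with try/finally follows; `failing_demo` is a hypothetical stand-in that prints and then raises, used only so the example runs without NLTK.

```python
import sys
from io import StringIO


def failing_demo(text):
    """Hypothetical stand-in that prints and then raises."""
    print("Positive")
    raise ValueError("something went wrong")


original_stdout = sys.stdout
stream = StringIO()
sys.stdout = stream
try:
    failing_demo("PLACE YOUR TEXT HERE")
except ValueError:
    pass  # handle or re-raise as appropriate
finally:
    sys.stdout = original_stdout  # always restore, even after an error

sentiment = stream.getvalue().strip()
```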
      

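      【Comments】:

      • Please consider adding some explanation of the code (the why, the how, and any insights) so that the person who asked can understand it in depth.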