【Question title】: How to replace tokens if they are used together?
【Posted】: 2020-09-01 14:29:37
【Question】:

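I want to run a sentiment analysis on COVID-19 topics using Python. The problem is that entries such as "tested positive" receive a positive polarity, even though the statement is a negative one. My current code is as follows: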

import nltk
from textblob import TextBlob
from nltk.stem import WordNetLemmatizer

# Setting the test string
test_string = "He was tested positive on Covid-19"

tokens = nltk.word_tokenize(test_string)

# Lemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

tokens_lem_list = []
for word in tokens:
    lem_tokens = wordnet_lemmatizer.lemmatize(word, pos="v")
    tokens_lem_list.append(lem_tokens)

# List to string
tokens_lem_str = ' '.join(tokens_lem_list)

# Print the polarity of the string
print(TextBlob(tokens_lem_str).sentiment.polarity)

The output is:

0.22727272727272727

Process finished with exit code 0

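So I would like to remove the tokens "test" and "positive" when they are used together and replace them with the word "ill". Should I use a loop, or would that just eat up my computing power on a large amount of text?

Thank you very much for your help!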

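【Comments】:

  • What exactly is your question? Is it about the code for changing "positive test" / "test positive" to "disease", or about time complexity?
  • Rather the first one. But I have already solved it. Thanks for your message. :)

Tags: python nlp nltk token sentiment-analysis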


【Solution 1】:

I solved my problem as follows:

# Loop that finds sentences containing "test" together with "positive" or "negative".
# sentences_list_lem is not defined in this snippet; it is assumed to be a list of
# lemmatized token lists, one list per sentence.
matches_positive = ["test", "positive"]
matches_negative = ["test", "negative"]

replaced_testing_term_sentence = []
for sentence_lem in sentences_list_lem:
    # Replace the token "positive" with "not healthy" when "test" is also present
    if all(x in sentence_lem for x in matches_positive):
        # Compare whole tokens so that words merely containing "positive" stay untouched
        sentence_lem = ["not healthy" if word == "positive" else word for word in sentence_lem]
        sentence_lem.remove("test")  # removes only the first "test" token
    # Replace the token "negative" with "not ill" when "test" is also present
    elif all(x in sentence_lem for x in matches_negative):
        sentence_lem = ["not ill" if word == "negative" else word for word in sentence_lem]
        sentence_lem.remove("test")  # removes only the first "test" token
    # Sentences that match neither pattern are kept unchanged
    replaced_testing_term_sentence.append(sentence_lem)

It does the job. The replacement terms were chosen deliberately. If anyone sees potential for optimization, I would appreciate it.
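
One possible tightening, offered only as a sketch: the loop above matches whenever both tokens occur anywhere in the sentence, so a sentence like "she tests the positive mood" would also be rewritten. The variant below only replaces "test" and "positive"/"negative" when they stand directly next to each other (in either order) and walks each token list once. The helper name replace_test_terms, the REPLACEMENTS mapping, and the sample sentences are hypothetical and not part of the original solution.

```python
# Hypothetical helper; names and sample data are illustrative assumptions.
REPLACEMENTS = {"positive": "not healthy", "negative": "not ill"}


def replace_test_terms(tokens):
    """Replace adjacent 'test' + 'positive'/'negative' (either order) in one pass."""
    result = []
    i = 0
    while i < len(tokens):
        pair = tokens[i:i + 2]
        if len(pair) == 2 and "test" in pair:
            # The partner is whichever token of the pair is not the leading "test"
            other = pair[1] if pair[0] == "test" else pair[0]
            if other in REPLACEMENTS:
                result.append(REPLACEMENTS[other])
                i += 2  # skip both tokens of the matched pair
                continue
        result.append(tokens[i])
        i += 1
    return result


# Made-up lemmatized sentences, in the shape assumed for sentences_list_lem
sentences_list_lem = [
    ["He", "be", "test", "positive", "on", "Covid-19"],
    ["The", "positive", "test", "come", "back"],
    ["She", "test", "the", "positive", "mood"],
]
replaced = [replace_test_terms(sentence) for sentence in sentences_list_lem]
```

The trade-off is that non-adjacent phrasings such as "the test was positive" are no longer caught, so whether this is an improvement depends on the data.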

【Comments】:
