【Question title】: Spacy Dependency Parsing with Pandas dataframe
【Posted】: 2021-04-18 16:35:56
【Question】:

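I want to use spaCy's dependency parser on my pandas DataFrame to extract noun-adjective pairs for aspect-based sentiment analysis. I tried this code on the Amazon Fine Food Reviews dataset from Kaggle: Named Entity Recognition in aspect-opinion extraction using dependency rule matching

However, there seems to be something wrong with the way I feed the pandas DataFrame to spaCy. The results are not what I expect. Could someone help me debug this? Thank you very much.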

!python -m spacy download en_core_web_lg
import nltk
nltk.download('vader_lexicon')

import spacy
nlp = spacy.load("en_core_web_lg")

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()


def find_sentiment(doc):
    # find roots of all entities in the text
  for i in df['Text'].tolist():
    doc = nlp(i)
    ner_heads = {ent.root.idx: ent for ent in doc.ents}
    rule3_pairs = []
    for token in doc:
        children = token.children
        A = "999999"
        M = "999999"
        add_neg_pfx = False
        for child in children:
            if(child.dep_ == "nsubj" and not child.is_stop): # nsubj is nominal subject
                if child.idx in ner_heads:
                    A = ner_heads[child.idx].text
                else:
                    A = child.text
            if(child.dep_ == "acomp" and not child.is_stop): # acomp is adjectival complement
                M = child.text
            # example - 'this could have been better' -> (this, not better)
            if(child.dep_ == "aux" and child.tag_ == "MD"): # MD is modal auxiliary
                neg_prefix = "not"
                add_neg_pfx = True
            if(child.dep_ == "neg"): # neg is negation
                neg_prefix = child.text
                add_neg_pfx = True
        if (add_neg_pfx and M != "999999"):
            M = neg_prefix + " " + M
        if(A != "999999" and M != "999999"):
            rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))
    return rule3_pairs
df['three_tuples'] = df['Text'].apply(find_sentiment) 
df.head()

My result looks like this, which clearly means something is wrong with my loop:

【Comments】:

    Tags: python pandas nlp spacy sentiment-analysis


    【Solution 1】:

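    If you call apply on df['Text'], you are essentially iterating over every value in that column and passing each value to the function in turn.

    Here, however, your function itself iterates over the same DataFrame column you are applying it to, and it also overwrites the value that was passed in (doc) at the start of the function.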
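The effect can be reproduced without spaCy. The sketch below is only an illustration: the toy DataFrame, the `buggy` and `fixed` helpers, and the use of `str.upper()` as a stand-in for `nlp(...)` are all assumptions, not part of the original post. It also assumes the `return` sits inside the loop, as the question's code is indented here; if it sat after the loop, every row would instead repeat the last row's result.

```python
import pandas as pd

# Toy data standing in for the review column (hypothetical values).
df = pd.DataFrame({"Text": ["good coffee", "stale chips", "great tea"]})


def buggy(doc):
    # Ignores the value apply() passed in and re-iterates the whole column.
    for i in df["Text"].tolist():
        doc = i.upper()  # stand-in for nlp(i); overwrites the argument
        return doc       # exits on the first row of the column, every time


def fixed(text):
    # Works only on the single value that apply() passes in.
    return text.upper()  # stand-in for the real per-document processing


buggy_result = df["Text"].apply(buggy).tolist()
fixed_result = df["Text"].apply(fixed).tolist()

print(buggy_result)  # the same value repeated for every row
print(fixed_result)  # one distinct result per row
```

With the buggy version every row ends up with the same value, which is the kind of repeated output that points to the loop inside the function rather than to spaCy itself.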

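    So I would start by rewriting the function so that it processes only the text it receives, and then check whether it produces the expected result. I can't say for sure, since you didn't post any sample data, but this should at least get things moving: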

    def find_sentiment(text):
        doc = nlp(text)
        ner_heads = {ent.root.idx: ent for ent in doc.ents}
        rule3_pairs = []
        for token in doc:
            children = token.children
            A = "999999"
            M = "999999"
            add_neg_pfx = False
            for child in children:
                if(child.dep_ == "nsubj" and not child.is_stop): # nsubj is nominal subject
                    if child.idx in ner_heads:
                        A = ner_heads[child.idx].text
                    else:
                        A = child.text
                if(child.dep_ == "acomp" and not child.is_stop): # acomp is adjectival complement
                    M = child.text
                # example - 'this could have been better' -> (this, not better)
                if(child.dep_ == "aux" and child.tag_ == "MD"): # MD is modal auxiliary
                    neg_prefix = "not"
                    add_neg_pfx = True
                if(child.dep_ == "neg"): # neg is negation
                    neg_prefix = child.text
                    add_neg_pfx = True
            if (add_neg_pfx and M != "999999"):
                M = neg_prefix + " " + M
            if(A != "999999" and M != "999999"):
                rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))
        return rule3_pairs
    
    

    【Comments】:

    • Yes, it worked. Thank you very much for the edited code and the explanation.