【Question title】: Python - Update tuple string element error
【Posted】: 2021-11-17 10:40:44
【Question description】:

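I have a dataframe in which each row is a list of tuples of the form (word, pos_tag). In each row, I want to change the word of certain tuples by wrapping it in a tag, and then update the tuple with the tagged word. For example:

Initial dataframe row: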

[('This', 'DET'), ('is', 'VERB'), ('an', 'DET'), ('example', 'NOUN'), ('text', 'NOUN'), ('that', 'DET'), ('I', 'PRON'), ('use', 'VERB'), ('in', 'ADP'), ('order', 'NOUN'), ('to', 'PART'), ('get', 'VERB'), ('an', 'DET'), ('answer', 'NOUN')]

Updated words:

updated_word: <IN>example</IN>
updated_word: <TAR>answer</TAR>

Desired output:

[('This', 'DET'), ('is', 'VERB'), ('an', 'DET'), ('<IN>example</IN>', 'NOUN'), ('text', 'NOUN'), ('that', 'DET'), ('I', 'PRON'), ('use', 'VERB'), ('in', 'ADP'), ('order', 'NOUN'), ('to', 'PART'), ('get', 'VERB'), ('an', 'DET'), ('<TAR>answer</TAR>', 'NOUN')]

But I get an error, TypeError: 'tuple' object is not callable. Can anyone help? Here is the code:

for idx, row in df.iterrows():
    doc = nlp(row['title'])
    pos_tags = [(token.text, token.pos_) for token in doc if not token.pos_ == "PUNCT"]

    for position, tuple in enumerate(pos_tags, start=1):
        word = tuple[0]
        spacy_pos_tag = tuple[1]
        word = re.sub(r'[^\w\s]', '', word)
        for col in cols:
            if position in row[col]:
                word = f'<{col.upper()}>{word}</{col.upper()}>'
            else:
                continue
            tuple = tuple(word, spacy_pos_tag)
            print(tuple)


 >>>>   Traceback (most recent call last):
 >>>>   tuple = tuple(word, spacy_pos_tag)
 >>>>   TypeError: 'tuple' object is not callable

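Updated question

I have replaced tuple with tuple_ as suggested, but I still can't get the desired output, which is a list of tuples in each row. Can anyone help with how to update the dataframe row? Here is the updated code: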

for idx, row in df.iterrows():
    doc = nlp(row['title'])
    pos_tags = [(token.text, token.pos_) for token in doc if not token.pos_ == "PUNCT"]
    # print(idx, "tokens, pos : ", pos_tags, "\n")

    for position, tuple_ in enumerate(pos_tags, start=1):
        word = tuple_[0]
        spacy_pos_tag = tuple_[1]
        word = re.sub(r'[^\w\s]', '', word)
        for col in cols:
            if position in row[col]:
                word = f'<{col.upper()}>{word}</{col.upper()}>'
            else:
                continue
            tuple_ = (word, spacy_pos_tag)
        pos_tags.append(' '.join(position, tuple_))
    # pos_tags.append(' '.join(tuple_))
    print(idx, "tokens, pos : ", pos_tags, "\n")
    
    
>>>> Traceback (most recent call last):
>>>> pos_tag(df=df_matched)
>>>> pos_tags.append(' '.join(position, tuple_))
>>>> TypeError: join() takes exactly one argument (2 given)

【Question comments】:

    Tags: python pandas nlp tuples


    【Solution 1】:

    Don't use tuple as a variable name, because it is the name of a built-in Python type. Try the following:

        for position, tuple_ in enumerate(pos_tags, start=1):
            word = tuple_[0]
            spacy_pos_tag = tuple_[1]
            word = re.sub(r'[^\w\s]', '', word)
            for col in cols:
                if position in row[col]:
                    word = f'<{col.upper()}>{word}</{col.upper()}>'
                else:
                    continue
    
                tuple_ = (word, spacy_pos_tag)
                print(tuple_)
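
To illustrate why the original code failed, here is a minimal sketch that reproduces the error without spaCy or pandas. The sample data and the function names `shadowed` and `fixed` are made up for illustration. Once the loop variable is called `tuple`, that name refers to the current `(word, pos_tag)` pair rather than the built-in type, so "calling" it raises the TypeError from the question. Note also that the built-in `tuple()` accepts at most one argument (an iterable), so `tuple(word, spacy_pos_tag)` would fail even without the shadowing; a literal `(word, spacy_pos_tag)` is the usual way to build the pair.

```python
# Hypothetical sample data standing in for the spaCy output in the question.
pairs = [("example", "NOUN"), ("answer", "NOUN")]


def shadowed(pairs):
    """Reproduce the error: the loop variable hides the built-in `tuple`."""
    for tuple in pairs:  # `tuple` now names ('example', 'NOUN'), not the type
        try:
            tuple = tuple(("x", "y"))  # calls a tuple instance, which fails
        except TypeError as exc:
            return str(exc)


def fixed(pairs):
    """Use a different name and build the new pair with a tuple literal."""
    out = []
    for word, tag in pairs:
        out.append((f"<IN>{word}</IN>", tag))
    return out


error_message = shadowed(pairs)
result = fixed(pairs)
print(error_message)
print(result)
```

Unpacking directly in the `for` statement (`for word, tag in pairs`) avoids needing a name for the pair at all.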
    

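    【Comments】:

    • OK, the error is gone now, but I'm not getting the desired output, which is a list of tuples. Instead, I get each tuple separately.
    • @joasa Please consider tlentali's answer for how to implement this properly in pandas. This answer only addresses your main problem.
    • No, @tlentali's answer doesn't help, because which word gets replaced in each row is random. Your answer does help, though I'd like to know how I can get a list of tuples in each row, as in the desired output in my question.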
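
One possible way to get a list of tuples per row, as asked in the comments above, is to collect the pairs into a new list instead of printing them or appending to `pos_tags` while iterating over it. The sketch below is not taken from any answer in this thread. The helper name `tag_tokens` and the `positions_by_col` mapping are hypothetical; the mapping stands in for `row[col]` in the question and assumes the columns are named `in` and `tar` and hold 1-based token positions.

```python
import re


def tag_tokens(pos_tags, positions_by_col):
    """Return a new list of (word, pos) tuples with selected words tagged.

    pos_tags: list of (word, pos) tuples, e.g. built from a spaCy doc.
    positions_by_col: hypothetical mapping such as {"in": [4], "tar": [14]}
        of column name to 1-based token positions.
    """
    tagged = []
    for position, (word, pos) in enumerate(pos_tags, start=1):
        word = re.sub(r"[^\w\s]", "", word)  # strip punctuation, as in the question
        for col, positions in positions_by_col.items():
            if position in positions:
                word = f"<{col.upper()}>{word}</{col.upper()}>"
        tagged.append((word, pos))  # keep every token, tagged or not
    return tagged


pos_tags = [('This', 'DET'), ('is', 'VERB'), ('an', 'DET'), ('example', 'NOUN'),
            ('text', 'NOUN'), ('that', 'DET'), ('I', 'PRON'), ('use', 'VERB'),
            ('in', 'ADP'), ('order', 'NOUN'), ('to', 'PART'), ('get', 'VERB'),
            ('an', 'DET'), ('answer', 'NOUN')]

result = tag_tokens(pos_tags, {"in": [4], "tar": [14]})
print(result)
```

Inside the `df.iterrows()` loop, the returned lists could be appended to an ordinary Python list and assigned as a new column after the loop, since writing to `row` during iteration is not guaranteed to change the dataframe. As for the second traceback in the question: `str.join` takes a single iterable of strings, so `' '.join(position, tuple_)` raises regardless of the variable names.
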
    【Solution 2】:

    Don't use "tuple" as a variable name. It is a type name.

    【Comments】:

    • Your answer duplicates an existing answer.