【Question title】:Python: Value returned by function not getting updated in pandas dataframe
【Posted】:2021-06-06 12:47:21
【Question description】:

I have a fruits dataframe with the columns (Name, Color) and a sentence dataframe with the column (Sentence).

Fruits dataframe

          Name   Color
0        Apple     Red
1        Mango  Yellow
2       Grapes   Green
3   Strawberry    Pink

Sentence dataframe

                      Sentence
0  I like Apple, Mango, Grapes
1            I like ripe Mango
2             Grapes are juicy
3           Oranges are citric

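I need to compare every row of the fruits dataframe with every row of the sentence dataframe and, where a fruit name appears in the sentence as an exact (whole-word) match, insert its color immediately before the fruit name in the sentence.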

Here is what I did using dataframe.apply():

import pandas as pd
import regex as re

# create fruit dataframe 
fruit_data = [['Apple', 'Red'], ['Mango', 'Yellow'], ['Grapes', 'Green']] 
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color']) 
print(fruit_df)

# create sentence dataframe 
sentence = ['I like Apple, Mango, Grapes', 'I like ripe Mango', 'Grapes are juicy'] 
sentence_df = pd.DataFrame(sentence, columns = ['Sentence']) 
print(sentence_df)


def search(desc, name, color):

    if re.findall(r"\b" + name + r"\b", desc):
             
            # for loop is used because fruit can appear more than once in sentence
            all_indexes = []
            for match in re.finditer(r"\b" + name + r"\b", desc):
                     all_indexes.append(match.start())
            
            arr = list(desc)
            for idx in sorted(all_indexes, reverse=True):
                       arr.insert(idx, color + " ")

            new_desc = ''.join(arr)
            return new_desc 

def compare(name, color):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
    

fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
print ("The final result is: ")
print(sentence_df['Result'])

The result I get is:

                      Sentence     Result
0  I like Apple, Mango, Grapes       None
1            I like ripe Mango       None
2             Grapes are juicy       None
3           Oranges are citric       None

Expected result:

                      Sentence                                        Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1            I like ripe Mango                      I like ripe Yellow Mango
2             Grapes are juicy                        Green Grapes are juicy
3           Oranges are citric       

I also tried iterating over fruit_df with itertuples(), but the result is still the same:

for row in fruit_df.itertuples():
   result = sentence_df['Sentence'].apply(lambda x: search(x, getattr(row, 'Name'), getattr(row, 'Color')))
   print(result)

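I don't understand why the value returned by the search function is not stored in the new column. Is this the right approach, or am I missing something?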

【Question discussion】:

    Tags: python regex pandas function dataframe


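    【Solution 1】:

    The problem is that you call compare for every row of fruit_df, but every pass uses the same input.

    I just added some debug prints to the compare function to see what happens: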

    def compare(name, color):
        print(name, color)
        sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
        print(sentence_df['Result'])
    

    This gives:

    Apple Red
    0    I like Red Apple, Mango, Grapes
    1                               None
    2                               None
    Name: Result, dtype: object
    Mango Yellow
    0    I like Apple, Yellow Mango, Grapes
    1              I like ripe Yellow Mango
    2                                  None
    Name: Result, dtype: object
    Grapes Green
    0    I like Apple, Mango, Green Grapes
    1                                 None
    2               Green Grapes are juicy
    Name: Result, dtype: object
    

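    So you do add the color when the fruit is present, but search returns None when it is absent, and every pass starts again from the original Sentence column, so only the result of the last pass is kept.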
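
Both effects can be reproduced without pandas. The following is only a minimal sketch: search here is a simplified stand-in for the question's function (it uses re.sub instead of inserting into a character list), and plain lists stand in for the dataframe columns.

```python
import re

def search(desc, name, color):
    # Same shape as the question's function: the only return is inside the if.
    pattern = r"\b" + name + r"\b"
    if re.search(pattern, desc):
        return re.sub(pattern, lambda m: color + " " + m.group(0), desc)
    # Falling off the end of a function returns None implicitly.

sentences = ["I like Apple, Mango, Grapes", "I like ripe Mango", "Grapes are juicy"]
fruits = [("Apple", "Red"), ("Mango", "Yellow"), ("Grapes", "Green")]

result = None
for name, color in fruits:
    # Every pass reads the original sentences and overwrites the previous result.
    result = [search(s, name, color) for s in sentences]

print(result)
# ['I like Apple, Mango, Green Grapes', None, 'Green Grapes are juicy']
```

Only the Grapes pass survives, and the sentence without Grapes ends up as None, which matches the last block of the debug output above.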

    How to fix it:

    1. First, add the missing return desc to search, so that it no longer returns None when the fruit is absent:

       def search(desc, name, color):
      
           if re.findall(r"\b" + name + r"\b", desc):
                   ...                 
                   new_desc = ''.join(arr)
                   return new_desc
           return desc
      
    2. Initialize sentence_df['Result'] before applying compare, and use it as the input:

       def compare(name, color):
           sentence_df['Result'] = sentence_df['Result'].apply(lambda x: search(x, name, color))
      
       sentence_df['Result'] = sentence_df['Sentence']
       fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
      

    This finally gives the expected result:

    The final result is: 
    0    I like Red Apple, Yellow Mango, Green Grapes
    1                        I like ripe Yellow Mango
    2                          Green Grapes are juicy
    Name: Result, dtype: object
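
For reference, the two fixes can be combined into one self-contained script. This is a sketch rather than the answerer's exact code: it assumes re.sub in place of the index-insertion loop (re.sub already returns the string unchanged when nothing matches), adds re.escape as a precaution, and uses a plain for loop over the fruits instead of fruit_df.apply. The fourth sentence comes from the question's table.

```python
import re
import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red"], ["Mango", "Yellow"], ["Grapes", "Green"]],
    columns=["Name", "Color"],
)
sentence_df = pd.DataFrame(
    {"Sentence": ["I like Apple, Mango, Grapes", "I like ripe Mango",
                  "Grapes are juicy", "Oranges are citric"]}
)

def search(desc, name, color):
    # Whole-word match; re.sub leaves desc untouched if the name is absent,
    # so this function never returns None.
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.sub(pattern, lambda m: color + " " + m.group(0), desc)

# Fix 2: start from a copy of Sentence and keep updating Result in place.
sentence_df["Result"] = sentence_df["Sentence"]
for name, color in zip(fruit_df["Name"], fruit_df["Color"]):
    sentence_df["Result"] = sentence_df["Result"].apply(
        lambda s: search(s, name, color)
    )

print(sentence_df["Result"].tolist())
```

Because each pass reads the Result column written by the previous pass, the colors accumulate instead of being overwritten.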
    

    【Discussion】:

    • Great explanation!
    • Thanks for the solution! Initializing the Result column did the trick.
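    【Solution 2】:

    We can build a mapping series from the fruits dataframe, then use it with Series.replace to replace each fruit name found in the Sentence column with its replacement from the mapping (Color + Fruit name). Here fruits and sentence are the two dataframes: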

    fruit = r'\b' + fruits['Name'] + r'\b'
    fruit_replacement = list(fruits['Color'] + ' ' + fruits['Name'])
    
    mapping = pd.Series(fruit_replacement, index=fruit)
    sentence['Result'] = sentence['Sentence'].replace(mapping, regex=True)
    

    >>> sentence
                          Sentence                                        Result
    0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
    1            I like ripe Mango                      I like ripe Yellow Mango
    2             Grapes are juicy                        Green Grapes are juicy
    3           Oranges are citric                            Oranges are citric
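
The snippet above relies on fruits and sentence already existing. A self-contained version might look like the sketch below; the dataframe construction is assumed from the question's tables, a plain dict is used in place of the mapping series, and re.escape is an added precaution in case a name ever contains regex metacharacters.

```python
import re
import pandas as pd

# Assumed setup, taken from the tables in the question.
fruits = pd.DataFrame(
    {"Name": ["Apple", "Mango", "Grapes"], "Color": ["Red", "Yellow", "Green"]}
)
sentence = pd.DataFrame(
    {"Sentence": ["I like Apple, Mango, Grapes", "I like ripe Mango",
                  "Grapes are juicy", "Oranges are citric"]}
)

# One whole-word regex per fruit, mapped to "Color Name".
patterns = r"\b" + fruits["Name"].map(re.escape) + r"\b"
replacements = fruits["Color"] + " " + fruits["Name"]
mapping = dict(zip(patterns, replacements))

# With regex=True the dict keys are treated as regular expressions.
sentence["Result"] = sentence["Sentence"].replace(mapping, regex=True)
print(sentence["Result"].tolist())
```

Sentences that contain no listed fruit, such as the last one, are returned unchanged rather than as None.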
    

    【Discussion】:

    • Thanks for the solution! This approach takes less time than my current one.
    • @Animeartist Happy coding!
    【Solution 3】:

    Create a mapping dictionary, then replace.

    Try:

    di = {fr: f"{co} {fr}" for fr, co in fruit_df.values}
    res = sentence_df.replace(di, regex=True)
    

    res:

        Sentence
    0   I like Red Apple, Yellow Mango, Green Grapes
    1   I like ripe Yellow Mango
    2   Green Grapes are juicy
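
One thing to watch with a plain dictionary: with regex=True the keys are patterns without word boundaries, so a fruit name can also match inside a longer word. The sketch below illustrates this with a made-up sentence ("Grapeseed oil is light" is not from the question) and shows a bounded variant that wraps each key in \b, as the question's own regex does.

```python
import re
import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red"], ["Grapes", "Green"]], columns=["Name", "Color"]
)
# The second sentence is a hypothetical example chosen to show a partial match.
sentence_df = pd.DataFrame(
    {"Sentence": ["Grapes are juicy", "Grapeseed oil is light"]}
)

pairs = list(zip(fruit_df["Name"], fruit_df["Color"]))

# Keys used as-is: "Grapes" also matches the start of "Grapeseed".
plain = {fr: f"{co} {fr}" for fr, co in pairs}
# Keys wrapped in \b: only whole words match.
bounded = {r"\b" + re.escape(fr) + r"\b": f"{co} {fr}" for fr, co in pairs}

plain_res = sentence_df["Sentence"].replace(plain, regex=True).tolist()
bounded_res = sentence_df["Sentence"].replace(bounded, regex=True).tolist()

print(plain_res)    # ['Green Grapes are juicy', 'Green Grapeseed oil is light']
print(bounded_res)  # ['Green Grapes are juicy', 'Grapeseed oil is light']
```

For the sentences in the question both variants give the same result; the difference only shows up when a fruit name is a prefix or substring of another word.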
    

    【Discussion】:

    • Thanks for the solution.