【Question title】:How do I compare two columns in a Pandas DataFrame to find the match percentage and return a value based on that logic?
【Posted】:2019-08-26 03:51:34
【Question description】:

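I need to compare two columns in a Pandas DataFrame using fuzzy matching.

If the fuzzy match score is above a certain percentage (say, 85), I need to return either that percentage or the string "Partial Match".

If the values match exactly, return "Full Match".

If they do not match, return "No Match".

Solutions I have tried:

Attempt #1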

 conditions = [
     (df['one'] == df['two']),
     fuzz.ratio((df['one'],df['two'])) > 80,
     fuzz.ratio((df['one'],df['two'])) <= 80]

 choices = ["FULL Match", fuzz.ratio((df['one'],df['two'])), "NO MATCH"]

 df['result'] = np.select(condition, choices, default = np.nan)

======================================================================

Attempt #2

 df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Partial Match", 'No Match')

 import pandas as pd
 import numpy as np
 from fuzzywuzzy import fuzz
 import os


 df = pd.read_csv('data.csv')

 x = fuzz.ratio(df['one'], df['two']) >= 85

 df['result'] = np.where(x, "Match", 'No Match')

Expected result

         one          two    result
 0    apple        Apple     Partial Match
 1  banana       bannana     Partial Match
 2     kiwi  dragonfruit     No Match
 3    mango        mango     Full Match

======================================================================

Error messages:

Attempt #1

IndexError: tuple index out of range

Attempt #2

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

【Question comments】:

  • Could you post a sample of your data?

Tags: python pandas numpy dataframe


【Solution 1】:

Try combining the last two commands into one:

df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Match", 'No Match')
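A note on the one-liner above: `fuzz.ratio` compares two strings, so passing whole columns is essentially the same call as Attempt #2 and would likely raise the same ValueError. Below is a minimal sketch of a row-wise alternative, assuming the four sample rows from the question. `difflib.SequenceMatcher` from the standard library is used as a stand-in scorer scaled to 0-100 (an assumption, not fuzzywuzzy itself), and the `score` column name is hypothetical.

```python
from difflib import SequenceMatcher

import numpy as np
import pandas as pd

# Sample rows taken from the expected-result table in the question.
df = pd.DataFrame({
    "one": ["apple", "banana", "kiwi", "mango"],
    "two": ["Apple", "bannana", "dragonfruit", "mango"],
})

# Score each row separately: the scorer takes two strings, not two Series.
# Stand-in scorer (assumption): difflib ratio scaled to 0-100.
df["score"] = [
    round(100 * SequenceMatcher(None, a, b).ratio())
    for a, b in zip(df["one"], df["two"])
]

# Conditions are checked in order; the first one that holds wins.
conditions = [df["one"] == df["two"], df["score"] >= 85]
choices = ["Full Match", "Partial Match"]
df["result"] = np.select(conditions, choices, default="No Match")
```

Computing the scores once in a plain loop keeps the string comparison per row, while `np.select` only has to handle the boolean columns it is designed for.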


【Solution 2】:

I think this should do the trick:

from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a similarity score between 0.0 and 1.0
    match_score = SequenceMatcher(None, a, b).ratio()
    if match_score == 1.0:
        result = "Full Match"
    elif match_score >= .85:
        result = "Partial Match"
    else:
        result = "No Match"
    return result

# Apply row by row (axis=1) so that similar() receives two strings
df["result"] = df[['one', 'two']].apply(lambda row: similar(row.one, row.two), axis=1)
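One thing worth checking with this approach is how letter case affects the score. The sketch below uses only the standard library and the sample pairs from the question; `classify` and its `threshold` parameter are hypothetical names introduced for illustration and mirror the 1.0 / 0.85 cut-offs used in this answer.

```python
from difflib import SequenceMatcher

def classify(a, b, threshold=0.85):
    """Label a pair of strings by their difflib similarity ratio."""
    score = SequenceMatcher(None, a, b).ratio()
    if score == 1.0:
        return "Full Match"
    if score >= threshold:
        return "Partial Match"
    return "No Match"

# Sample pairs taken from the expected-result table in the question.
pairs = [
    ("apple", "Apple"),
    ("banana", "bannana"),
    ("kiwi", "dragonfruit"),
    ("mango", "mango"),
]

# Compare as-is, then again after lowercasing both sides.
case_sensitive = [classify(a, b) for a, b in pairs]
case_insensitive = [classify(a.lower(), b.lower()) for a, b in pairs]
```

With these inputs, "apple" vs "Apple" scores 0.8 as-is, which falls under the 0.85 cut-off, and becomes an exact match once both sides are lowercased. If that pair should come out as "Partial Match", as in the expected result, the threshold probably needs to be 0.8 or lower.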
    
