Print the best match in a Python list where each element is separated internally: answers

【Question title】: Print the best match in the Python list, where each element is separated internally
【Posted】: 2015-01-11 17:52:55
【Question】:

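I created a Python list from the elements of a file: when an element of row[0] is present in row[3], I append the pair to the list 'matches', and vice versa, when an element of row[3] is present in row[0], I append them to 'matches' as well. The list looks like this: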

['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
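The question does not show how "present in" is tested, but the list above is consistent with a case-insensitive substring check in either direction ('asian' occurs inside 'caucasian', 'seizure' inside 'seizures', 'hispanic' inside 'hispanic or latino'). Below is a minimal sketch under that assumption; the rows, the ';' join, and the name `build_matches` are hypothetical, and only columns 0 and 3 are used because those are the ones the question mentions.

```python
def build_matches(rows):
    """Hypothetical reconstruction: pair row[0] with row[3] when either contains the other."""
    matches = []
    for row in rows:
        left, right = row[0].lower(), row[3].lower()
        # skip empty fields, because '' is a substring of every string
        if left and right and (left in right or right in left):
            matches.append(row[0] + ";" + row[3])
    return matches


# hypothetical rows; columns 1 and 2 are placeholders
rows = [
    ['Asian', '', '', 'caucasian'],
    ['Hispanic or Latino', '', '', 'hispanic'],
    ['Seizure', '', '', 'seizures'],
    ['Abscess', '', '', 'fever'],
]
print(build_matches(rows))
# ['Asian;caucasian', 'Hispanic or Latino;hispanic', 'Seizure;seizures']
```

A loose substring test like this is what lets partial matches such as 'Asian;caucasian' into the list, which is why a second filtering step is needed.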

I only want to print the first occurrence or the perfect match for each element, regardless of case, as below:

['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']

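As you can see, each element in the list is separated by ";". I tried to use that as the criterion for the comparison. I only want the first occurrence of each element based on the word(s) after ";", or the element where the words on both sides are the same. For example, for Peripheral Blood Mononuclear Cells the first occurrence is picked, while for caucasian the second one is picked, because it is a perfect match. Before voting, I would really appreciate any help.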

【Question discussion】:

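Try list(set(my_list))
@PadraicCunningham The second list I posted is the output I need: for each element, the first occurrence, or the perfect match between the words on either side of the semicolon.
So you only want the unique values?
@Hackaholic A set can't do what I need.
@PadraicCunningham Yes, but not with a set. It should also be case-insensitive.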
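To illustrate why `set()` falls short here: it only removes verbatim duplicates, compares case-sensitively, and does not keep the original order. A quick check on the list from the question:

```python
matches = ['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells',
           'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic',
           'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian',
           'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures',
           'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']

unique = set(matches)
# only exact duplicates are dropped: 15 entries shrink to 10, not the 7 wanted
print(len(unique))  # 10
# entries that merely share the right-hand term are all kept
print('Asian;caucasian' in unique)  # True
# the comparison is case-sensitive, so these two strings both survive
print(len(set(['Black;black', 'black;black'])))  # 2
```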

Tags: python regex list python-2.7 compare

【Solution 1】:

You need to keep track of every full string and split substring you have seen, and only add to res what has not been seen yet:

l = ['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
seen = set()
res = []
for ele in l:
    a, b = ele.split(";", 1)
    ele_l, a_l, b_l = ele.lower(), a.lower(), b.lower()
    # keep the element if we have seen neither the full string nor its left/right-hand substring,
    # or if both sides match exactly (ignoring case) and that perfect match has not been added yet
    if (ele_l not in seen and not any(x in seen for x in (a_l, b_l))) or (a_l == b_l and ele_l not in seen):
        res.append(ele)
    # keep track of all full strings and left/right substrings (lower-cased)
    seen.update([a_l, b_l, ele_l])
print(res)
# ['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']

【Discussion】:

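Thanks. But how do I include the second condition, the perfect match? The output here is missing the "caucasian" part.
Sorry, yes, I missed that second part.
I think I can do that by including if a.lower() == b.lower()
@dan if a == b, but you will get doubles; is that what you want? Try the edit.
Using 'or' gives me multiple occurrences of the same element when both conditions are met, so I wrote if a==b: and then your if statement.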
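The doubles come from keeping one entry as the first occurrence of a right-hand term and then also appending a later perfect match for the same term (for example 'White;caucasian' followed by 'caucasian;caucasian'). One way around that, sketched here as an alternative rather than the answerer's code, is to key the results on the lower-cased text after ";" (the criterion the question states) and let a perfect match replace the earlier pick in place. Unlike the answer, this ignores whether the left-hand side was seen before. `best_matches` is a hypothetical name; `collections.OrderedDict` keeps insertion order on Python 2.7 as well.

```python
from collections import OrderedDict


def best_matches(pairs):
    """Keep one entry per right-hand term, compared case-insensitively.

    The first entry for a term is kept unless a later entry for the same
    term is a perfect match (left == right, ignoring case), which replaces it.
    """
    best = OrderedDict()  # lower-cased right-hand term -> chosen entry
    for ele in pairs:
        left, right = ele.split(";", 1)
        key = right.lower()
        if key not in best:
            best[key] = ele  # first occurrence of this term
        elif left.lower() == key and best[key].split(";", 1)[0].lower() != key:
            best[key] = ele  # upgrade to the perfect match; the key keeps its position
    return list(best.values())


matches = ['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells',
           'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic',
           'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian',
           'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures',
           'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
print(best_matches(matches))
print(best_matches(['White;caucasian', 'caucasian;caucasian']))  # ['caucasian;caucasian']
```

Replacing instead of appending means each right-hand term appears at most once, so the "or" case can no longer produce doubles.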