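To keep the example simple, I used plain lists containing your dummy data. You will need to adapt it to your problem.
Also, I interpreted your sentence "The desired outcome is to keep the items in df['full_string'] that contain the text in df['substring']" as text = words.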
full_str = ['apples and bananas', 'applesandbananasamongstothers', 'something else',
            'ApplesandBananas', 'apples and Bananas', 'bananas']
sub_str = ['apples and bananas', 'red and blue']
# Extract words from sub strings
words_in_sub = [elt.split() for elt in sub_str]
# Flatten and remove duplicates
words_in_sub = list(set([item for sublist in words_in_sub for item in sublist]))
# Init output
output = list()
# Loop on the strings in full string
for full_s in full_str:
    # Loop on the words to look for
    for word in words_in_sub:
        if word.lower() in full_s.lower():
            output.append(full_s)
            break
Output:
In: output
Out:
['apples and bananas',
'applesandbananasamongstothers',
'ApplesandBananas',
'apples and Bananas',
'bananas']
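Lowercase/uppercase is handled in the if condition by lowercasing both sides. Spacing is handled by the in operator, and so is the presence of other text in full_s: in returns True whenever the word appears anywhere in the string. The only case where it returns False although the word could be considered present is when the word itself is split in two by a space, for example 'bana naan dapp les'. Such a string is not kept in the output list.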
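To make that behaviour concrete, here is a minimal sketch of the same test in isolation. The helper name contains_word is hypothetical and only used for illustration; it wraps the exact condition from the loop above. Note that short words such as 'and' will also match inside longer words.

```python
# Hypothetical helper (not part of the snippet above): wraps the same
# case-insensitive substring test used in the `if` condition.
def contains_word(word, text):
    return word.lower() in text.lower()

# Case and missing spaces do not matter.
print(contains_word('bananas', 'ApplesandBananas'))    # True
# A word split in two by a space is not found.
print(contains_word('apples', 'bana naan dapp les'))   # False
# Substring matching also hits inside longer words.
print(contains_word('and', 'sandwich'))                # True
```

If whole-word matching is what you actually need, the substring test would have to be replaced by something stricter, but that would also stop 'ApplesandBananas' from matching.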
Edit: multiple rows. You could also flatten the lists and use the first snippet.
full_str = [['apples and bananas', 'applesandbananasamongstothers', 'something else'],
            ['ApplesandBananas', 'apples and Bananas', 'bananas']]
sub_str = [['apples and bananas'], ['apples and bananas']]
# Assuming same number of rows between full_str and sub_str
# And you want to keep element of full_str[k] according to sub strings in sub_str[k]
number_of_rows = len(full_str)
# One output list per row, so earlier rows are not overwritten
outputs = list()
for k in range(number_of_rows):
    # Extract words from sub strings
    words_in_sub = [elt.split() for elt in sub_str[k]]
    # Flatten and remove duplicates
    words_in_sub = list(set([item for sublist in words_in_sub for item in sublist]))
    # Init output for this row
    output = list()
    # Loop on the strings in full string
    for full_s in full_str[k]:
        # Loop on the words to look for
        for word in words_in_sub:
            if word.lower() in full_s.lower():
                output.append(full_s)
                break
    # Store the result of row k
    outputs.append(output)
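The flattening alternative mentioned above could look roughly like this. It is only a sketch, and it assumes you do not need to keep track of which row each string came from: after flattening, every string is tested against the words of every row. The names flat_full, flat_sub and words are mine, not from your question.

```python
from itertools import chain

full_str = [['apples and bananas', 'applesandbananasamongstothers', 'something else'],
            ['ApplesandBananas', 'apples and Bananas', 'bananas']]
sub_str = [['apples and bananas'], ['apples and bananas']]

# Flatten the list of rows into single lists (row information is lost here)
flat_full = list(chain.from_iterable(full_str))
flat_sub = list(chain.from_iterable(sub_str))

# Unique lowercase words from all sub strings
words = {w.lower() for s in flat_sub for w in s.split()}

# Keep every string that contains at least one of the words
output = [s for s in flat_full if any(w in s.lower() for w in words)]
print(output)
```

With this data only 'something else' is dropped. If the rows had different sub strings, this would not give the same result as the per-row loop.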