Python: Return the words in a string that occur exactly once

【Question title】: Python: Return the words in a string that occur exactly once
【Posted】: 2017-10-03 22:16:57
【Question】:

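Suppose I have a function that takes a string, and I need to return the set of words that occur exactly once in that string. What is the best way to do this? Would using a dict help? I tried some pseudocode, for example: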

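def find_words(string):
    words = string.split()
    unique = set()  # a set has add(); a dict has no append()
    for word in words:
        # "word is unique" means it occurs exactly once in the string
        if words.count(word) == 1:
            unique.add(word)
    return unique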

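Is there a better way to implement this? Thanks!

Edit:

Suppose I have: "The boy jumped over the other boy". I want to return "jumped", "over", and "other".

Also, I want to return it as a set, not a list.

【Question comments】:

What set of words do you have?
Suppose I have a string of words such as: "The boy jumped over the other boy". I want to return "jumped", "over", and "other".

Tags: python string set find-occurrences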

【Solution 1】:

You can use Counter from collections and return the set of words that occur only once.

from collections import Counter

sent = 'this is my sentence string this is also my test string'

def find_single_words(s):
    c = Counter(s.split(' '))
    return set(k for k,v in c.items() if v==1)

find_single_words(sent)
# returns:
{'also', 'sentence', 'test'}

To do this using only basic Python built-ins, you can use a dictionary to keep track of the occurrence counts, replicating what Counter does.

sent = 'this is my sentence string this is also my test string'

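def find_single_words(s):
    c = {}  # maps each word to its number of occurrences
    for word in s.split(' '):
        if word not in c:
            c[word] = 1
        else:
            c[word] = c[word] + 1
    # set comprehension: keep only the words counted exactly once
    return {k for k, v in c.items() if v == 1}

find_single_words(sent)
# returns:
{'sentence', 'also', 'test'}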

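【Comments】:

Is there a way to do this without importing external tools such as Counter?
@J.P. collections is part of the standard library, so it is not really an external tool
@J.P. I added an extra section to the answer, see above
Hi, thanks! If you wanted to return a set instead of a list, do you know how to change it? Instead of c.items(), can you return a set?
@J.P. Sure, I modified the second part of the answer to return a set

【Solution 2】:

This may be what you have in mind.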

>>> counts = {}
>>> sentence =  "The boy jumped over the other boy"
>>> for word in sentence.lower().split():
...     if word in counts:
...         counts[word]+=1
...     else:
...         counts[word]=1
...         
>>> [word for word in counts if counts[word]==1]
['other', 'jumped', 'over']
>>> set([word for word in counts if counts[word]==1])
{'other', 'jumped', 'over'}

But as others have suggested, using defaultdict from collections would be better.
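A minimal sketch of what the defaultdict variant might look like, assuming the same lowercase, whitespace-split notion of a "word" as the session above. The function name `words_occurring_once` is hypothetical and not part of the original answer.

```python
from collections import defaultdict


def words_occurring_once(sentence):
    """Return the set of words that occur exactly once (case-insensitive)."""
    counts = defaultdict(int)  # missing keys start at 0, so no if/else is needed
    for word in sentence.lower().split():
        counts[word] += 1
    return {word for word, n in counts.items() if n == 1}


result = words_occurring_once("The boy jumped over the other boy")
print(sorted(result))  # sorted only to make the printed order stable
# prints ['jumped', 'other', 'over']
```

The only difference from the plain dict loop is that defaultdict(int) supplies the initial 0, which removes the membership check.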

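【Comments】:

The uniques should not include "the" or "boy". It should only give "jumped", "over", and "other".
Thanks! Do you know how to return it as a set instead of a list?
Added. set() converts the list to a set.

【Solution 3】:

You can try this: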

s = "The boy jumped over the other boy"
s1 = {"jumped", "over", "other"}
final_counts = [s.count(i) for i in s1]

Output:

[1, 1, 1]
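One caveat worth illustrating: str.count counts substring occurrences rather than whole words, so calling it on the sentence itself can overcount. The snippet above also starts from an already known set of candidate words. A small sketch of the difference, with illustrative variable names that are not from the answer:

```python
s = "The boy jumped over the other boy"

# str.count on the sentence matches substrings: "the" is found as the word
# "the" and again inside "other" ("The" is skipped, matching is case-sensitive).
substring_hits = s.count("the")

# list.count on the split words compares whole words only.
words = s.split()
whole_word_hits = words.count("the")

print(substring_hits, whole_word_hits)  # prints 2 1
```

Counting on the split word list is therefore the safer choice when the goal is word frequency.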

【Comments】:

【Solution 4】:

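s='The boy jumped over the other boy'
def func(s):
    l=[]
    s=s.split(' ')  #edit for case-sensitivity here
    for i in range(len(s)):
        # keep s[i] only if it appears neither after nor before position i;
        # s[:i] is used because s[i-1::-1] wraps to the whole list when i == 0
        if s[i] not in s[i+1:] and s[i] not in s[:i]:
            l.append(s[i])
    return set(l)  #convert to set and return
print(func(s))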

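This should work fine.

It checks, for each element, whether any element before or after it in the list matches it; if none does, the element is appended.

If you do not want it to be case-sensitive, you can add s=s.lower() or s=s.upper() before splitting.

【Comments】:

Scanning the entire word list for every word makes this an O(n^2) algorithm, which becomes very slow as the input grows. Using a dictionary to count occurrences scales much better to large inputs.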
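To illustrate the linear-time idea, here is a sketch that makes a single pass over the words. It uses two sets instead of a counting dictionary, which is a variation not taken from the thread; set membership tests are O(1) on average, so the whole pass is O(n). The name `once_only_words` and the lowercasing step are assumptions for illustration.

```python
def once_only_words(text):
    """Return the words seen exactly once, using one pass over the words."""
    seen = set()      # every word encountered so far
    repeated = set()  # words encountered more than once
    for word in text.lower().split():
        if word in seen:
            repeated.add(word)
        else:
            seen.add(word)
    return seen - repeated  # set difference: seen but never repeated


print(sorted(once_only_words("The boy jumped over the other boy")))
# prints ['jumped', 'other', 'over']
```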

【Solution 5】:

Try this.

>>> sentence = "The boy jumped over the other boy"
>>> words = sentence.lower().split()
>>> set(word for word in words if words.count(word) == 1)
{'other', 'over', 'jumped'}
>>>

Edit: this is easier to read:

>>> sentence = 'The boy jumped over the other boy'
>>> words = sentence.lower().split()
>>> uniques = {word for word in words if words.count(word) == 1}
>>> uniques
{'over', 'other', 'jumped'}
>>> type(uniques)
<class 'set'>

【Comments】: