Get word length in dict python [duplicate]: answers

【Question title】: Get word length in dict python [duplicate]
【Posted】: 2017-10-18 23:09:35
【Question description】:

Example:

def test_get_word_len_dict():
    text = "May your coffee be strong and your Monday be short"
    the_dict = get_word_len_dict(text)
    print_dict_in_key_order(the_dict)
    print()

    text = 'why does someone believe you when you say there are four billion stars but they have to check when you say the paint is wet'
    the_dict = get_word_len_dict(text)
    print_dict_in_key_order(the_dict)

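I want to get a dictionary whose keys are integers and whose values are lists of unique words. The list for a given key contains every unique word from the text whose length equals that key. Each list of unique words should be sorted alphabetically. I don't know how to fix my function: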

def get_len_word_dict():
     new_text = text.split()
     for k,v in new_text.items():
        if len(v) = len(k):
           return k,v

Expected:

 2 : ['be']
 3 : ['May', 'and']
 4 : ['your']
 5 : ['short']
 6 : ['Monday', 'coffee', 'strong']

 2 : ['is', 'to']
 3 : ['are', 'but', 'say', 'the', 'wet', 'why', 'you']
 4 : ['does', 'four', 'have', 'they', 'when']
 5 : ['check', 'paint', 'stars', 'there']
 7 : ['believe', 'billion', 'someone']

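【Question comments】:

For one thing, you have a single equals sign in your if statement. Can you explain what you expect your function to do, and why? Right now, even if you change = to ==, it will only look for two words of the same length and return them.
Just use one of the previous answers to the same question and sort the lists in the dict at the end. There is no need to keep asking the same question.

Tags: python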

【Solution 1】:

A defaultdict of sets should do it. First, let's define a function.

from collections import defaultdict

def get_word_counts(text):
    d = defaultdict(set)

    for word in text.split():
        d[len(word)].add(word)    # observe this bit carefully

    return {k : list(v) for k, v in d.items()}

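The idea is to find the length of each word and insert the word into the list/set it belongs to. Once the function is defined, you can call it whenever you like.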

text = "May your coffee be strong and your Monday be short"
print(get_word_counts(text))
{2: ['be'], 3: ['and', 'May'], 4: ['your'], 5: ['short'], 6: ['coffee', 'strong', 'Monday']}
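The output above is not sorted, because a set has no defined order, while the question asks for alphabetically sorted lists printed in key order. The following is a minimal sketch of one way to get there, not the answerer's own code: it reuses the question's function name `get_word_len_dict`, and the body of `print_dict_in_key_order` is an assumption, since the question calls that helper without showing it.

```python
from collections import defaultdict


def get_word_len_dict(text):
    """Map each word length to a sorted list of the unique words of that length."""
    groups = defaultdict(set)  # sets drop duplicate words automatically
    for word in text.split():
        groups[len(word)].add(word)
    # sorted() uses plain string comparison, so capitalised words come first
    return {length: sorted(words) for length, words in groups.items()}


def print_dict_in_key_order(a_dict):
    """Hypothetical helper: the question calls it but never shows its body."""
    for key in sorted(a_dict):
        print(key, ":", a_dict[key])


print_dict_in_key_order(
    get_word_len_dict("May your coffee be strong and your Monday be short")
)
```

Because `sorted()` compares strings by code point, 'May' sorts before 'and', which agrees with the expected output in the question. For a case-insensitive order, `sorted(words, key=str.lower)` would be one option.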

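【Comments】:

The dict function has no effect.
@Kee Apart from converting the defaultdict to a dict, it just looks more readable when printed. ;-)
How can I get the same result using list functions, with the output sorted?

【Solution 2】:

You can also use itertools.groupby together with sorted to get a similar, "lazy" result: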

from itertools import groupby

a = 'a long list of words in nice'
# groupby only merges adjacent items, so sort by length first
x = groupby(sorted(a.split(), key=len), len)  # lazy (length, group) pairs
print(dict((k, list(g)) for k, g in x))
{1: ['a'], 2: ['of', 'in'], 4: ['long', 'list', 'nice'], 5: ['words']}

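"Lazy" means that nothing is actually computed (useful if, say, you have a very large string) until you start iterating over it. Be careful with the iterators that groupby() returns, though! It is easy to exhaust them by accident and then try to read them a second time (and get empty lists).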

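The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.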
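The pitfall described above can be shown with a short sketch. The variable names (`words`, `stale_result`, `fresh_result`) are illustrative and not part of the original answer. Note also that `sorted()` itself builds a full list up front, so only the grouping step is lazy.

```python
from itertools import groupby

# sorted() is eager and stable: words of equal length keep their input order
words = sorted("a long list of words in nice".split(), key=len)

# Pitfall: store the group iterators first and read them later.
# By the time they are read, groupby() has already moved past them.
stale_pairs = [(length, group) for length, group in groupby(words, key=len)]
stale_result = {length: list(group) for length, group in stale_pairs}

# Safer: turn each group into a list while it is still the current group.
fresh_result = {length: list(group) for length, group in groupby(words, key=len)}

print(stale_result)  # groups read too late have lost their words
print(fresh_result)
```

The second form follows the documentation's advice: each group is stored as a list before groupby() advances to the next key.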

【Comments】: