【Question title】: Check if dict key is a substring of any other element in the dictionary in Python?
【Posted】: 2016-12-04 15:09:36
【Question】:

I have a dictionary, ngram_list, as follows:

ngram_list = dict_items([
    ('back to back breeding', {'wordcount': 4, 'count': 3}),
    ('back breeding', {'wordcount': 2, 'count': 5}),
    ('several consecutive heats', {'wordcount': 3, 'count': 2}),
    ('how often should', {'wordcount': 3, 'count': 2}),
    ('often when breeding', {'wordcount': 3, 'count': 1})
])

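I want to sort the list from the smallest word count to the largest, then loop through the dictionary and, if a key is a substring of any other item's key, delete that item (the substring one).

Expected output: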

ngram_list = dict_items([
    ('several consecutive heats', {'wordcount': 3, 'count': 2}),
    ('how often should', {'wordcount': 3, 'count': 2}),
    ('often when breeding', {'wordcount': 3, 'count': 1}),
    ('back to back breeding', {'wordcount': 4, 'count': 3})
])

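【Comments】:

  • What is your final expected output dictionary?
  • @Skycc Updated, sorry.
  • So do you want your output returned as a dictionary, or as a list of tuples like dict.items()? You will need OrderedDict to keep the items in sorted order.
  • Do you also want to remove a key that is a substring of another key but is a different word? Like "cat" and "catastrophe"?
  • @tobias_k Only complete words/ngrams/expressions, not parts of a word.

Tags: python python-2.7 python-3.x dictionary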


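【Solution 1】:

First filter the input dict to drop the unwanted items, then use the sorted function with a key to order the remaining items by word count, and finally build the dict with OrderedDict.

A plain in only checks for a substring. If you care about exact whole-word boundary matches, you may need to use a regex instead.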

from collections import OrderedDict
ngram_dict = {
    'back to back breeding': {'wordcount': 4, 'count': 3},
    'back breeding': {'wordcount': 2, 'count': 5},
    'several consecutive heats': {'wordcount': 3, 'count': 2},
    'how often should': {'wordcount': 3, 'count': 2},
    'often when breeding': {'wordcount': 3, 'count': 1}
}

# Keep only the items whose key is not a substring of a different key
ngram_filter = [i for i in ngram_dict.items() if not any(i[0] in k and i[0] != k for k in ngram_dict.keys())]
# Sort the surviving items by word count and preserve that order in an OrderedDict
final_dict = OrderedDict(sorted(ngram_filter, key=lambda x: x[1].get('wordcount')))

# final_dict = OrderedDict([('several consecutive heats', {'count': 2, 'wordcount': 3}), ('how often should', {'count': 2, 'wordcount': 3}), ('often when breeding', {'count': 1, 'wordcount': 3}), ('back to back breeding', {'count': 3, 'wordcount': 4})])
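
The whole-word caveat can be sketched as follows. This is a minimal illustration, not part of the original answer: the helper name is_whole_word_substring and the two extra keys 'cat' and 'catastrophe strikes' are assumptions added only to show the difference from a plain in check.

```python
import re
from collections import OrderedDict

ngram_dict = {
    'back to back breeding': {'wordcount': 4, 'count': 3},
    'back breeding': {'wordcount': 2, 'count': 5},
    'several consecutive heats': {'wordcount': 3, 'count': 2},
    'how often should': {'wordcount': 3, 'count': 2},
    'often when breeding': {'wordcount': 3, 'count': 1},
    # Hypothetical entries, added only to demonstrate whole-word matching
    'cat': {'wordcount': 1, 'count': 7},
    'catastrophe strikes': {'wordcount': 2, 'count': 1},
}


def is_whole_word_substring(needle, haystack):
    """Return True if needle occurs in haystack on word boundaries
    and the two strings are not identical (hypothetical helper)."""
    if needle == haystack:
        return False
    # re.escape guards against regex metacharacters inside the n-gram;
    # the lookarounds require a non-word character (or string edge) on both sides
    pattern = r'(?<!\w)' + re.escape(needle) + r'(?!\w)'
    return re.search(pattern, haystack) is not None


kept = [
    (key, value) for key, value in ngram_dict.items()
    if not any(is_whole_word_substring(key, other) for other in ngram_dict)
]
final_dict = OrderedDict(sorted(kept, key=lambda item: item[1]['wordcount']))
```

With these assumed entries, 'back breeding' is still dropped because it appears as whole words inside 'back to back breeding', while 'cat' is kept even though it is a plain substring of 'catastrophe strikes'.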

All of this can be packed into a one-liner, as follows:

from collections import OrderedDict
final_dict = OrderedDict(
    sorted(
        (i for i in ngram_dict.items() if not any(i[0] in k and i[0] != k for k in ngram_dict.keys())),
        key=lambda x: x[1].get('wordcount')
    )
)
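
If the one-liner is hard to read, the same filter-then-sort idea can be wrapped in a small function. The sketch below is an illustration rather than the answer's own code: the name drop_substring_keys is hypothetical, and returning a plain dict assumes Python 3.7 or later, where dicts preserve insertion order. On Python 2.7, OrderedDict is still needed.

```python
def drop_substring_keys(ngrams):
    """Return a new dict without keys that are substrings of another key,
    ordered by ascending 'wordcount' (hypothetical helper)."""
    keys = list(ngrams)
    # Plain substring test, same semantics as the `in` check used above
    kept = {
        key: value for key, value in ngrams.items()
        if not any(key != other and key in other for other in keys)
    }
    # sorted() is stable, so items with equal wordcount keep their input order
    return dict(sorted(kept.items(), key=lambda item: item[1]['wordcount']))


ngram_dict = {
    'back to back breeding': {'wordcount': 4, 'count': 3},
    'back breeding': {'wordcount': 2, 'count': 5},
    'several consecutive heats': {'wordcount': 3, 'count': 2},
    'how often should': {'wordcount': 3, 'count': 2},
    'often when breeding': {'wordcount': 3, 'count': 1},
}

result = drop_substring_keys(ngram_dict)
print(list(result))
```

The function leaves the input dict untouched and returns a new one, which avoids deleting keys from a dictionary while iterating over it.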

【Comments】:
