Top K frequent words stuck in one part [duplicate]: answers

【Question title】: Top K frequent words stuck in one part [duplicate]
【Posted】: 2020-06-03 13:04:55
【Question】:

This refers to the LeetCode problem https://leetcode.com/problems/top-k-frequent-words/. Here is my code:

```python
import heapq

class Solution:
    # def topKFrequent(self, words: List[str], k: int) -> List[str]:
    def topKFrequent(self, words, k):
        results = []
        wordTable = {}
        for word in words:
            if (wordTable.get(word) is None):
                wordTable[word] = 1
                continue
            wordTable[word] = (wordTable.get(word)) + 1

        heap = []
        # print(wordTable)
        heapSize = 0

        for word in wordTable.keys():
            node = [wordTable[word], word]
            if(heapSize<k):
                heapq.heappush(heap,node)
                heapSize += 1
                continue
            if(heapSize>=k):
                if (heap[0][0]< node[0]):
                    heapq.heappushpop(heap,node)
                    heapSize -= 1
                    continue
                if heap[0][0] == node[0] and heap[0][1]>node[1]:
                    heapq.heappop(heap)
                    heapq.heappush(heap,node)
                    heapSize -= 1
                    continue

        # heap.sort(key = lambda x: x.freq, reverse=True);
        print(heap)

        for i in reversed(range(k)):
            results.append(heap[i][1])
        return results
```

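The code works if all the words have different frequencies, since it uses a min-heap. But when words share the same frequency it doesn't work: because it goes in reverse order, the alphabetically larger word comes out first, which is not accepted. For example, if I have 4 words with the same frequency, say a, b, c, d, my result will be d, c, b, a, which is not acceptable. I'm not sure how to account for this case and have been stuck on it for 3 hours. Can anybody help?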
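A minimal reproduction of the tie problem, assuming four words with equal frequency stored as `[freq, word]` lists the way the question's code stores them. Python compares lists element by element, so equal frequencies fall back to comparing the words, and a min-heap therefore pops tied words in ascending order; reversing that sequence to put the highest frequency first also reverses the ties.

```python
import heapq

# Four words with the same frequency, stored as [freq, word] like in the question.
heap = []
for word in ["a", "b", "c", "d"]:
    heapq.heappush(heap, [1, word])

# A min-heap pops tied entries in ascending word order...
popped = [heapq.heappop(heap)[1] for _ in range(4)]
print(popped)        # ['a', 'b', 'c', 'd']

# ...so reversing to get "highest frequency first" also reverses the ties.
print(popped[::-1])  # ['d', 'c', 'b', 'a']
```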

【Question discussion】:

Tags: python

【Solution 1】:

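Use functools.cmp_to_key to sort the final heap with a custom comparison: higher frequency first, and alphabetical order among words with equal frequency.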

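```python
from functools import cmp_to_key

def cmp(a, b):
    # a and b are [frequency, word] nodes, as stored in the question's heap.
    if a[0] == b[0]:
        # Equal frequency: the alphabetically smaller word sorts first.
        return -1 if a[1] < b[1] else 1
    # Different frequency: the higher frequency sorts first.
    return -1 if a[0] > b[0] else 1

# At the end of topKFrequent, replace the final loop with:
#     return [node[1] for node in sorted(heap, key=cmp_to_key(cmp))]
```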
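For reference, a self-contained sketch that applies the same comparator end to end. It assumes that sorting every distinct word is acceptable instead of maintaining the question's size-k heap; `top_k_frequent` is a hypothetical name, and `collections.Counter` is swapped in here only to build the frequency table.

```python
from collections import Counter
from functools import cmp_to_key


def cmp(a, b):
    # a and b are [frequency, word] pairs.
    if a[0] == b[0]:
        # Equal frequency: the alphabetically smaller word sorts first.
        return -1 if a[1] < b[1] else 1
    # Different frequency: the higher frequency sorts first.
    return -1 if a[0] > b[0] else 1


def top_k_frequent(words, k):
    # Hypothetical helper: build [frequency, word] pairs for every distinct word.
    pairs = [[freq, word] for word, freq in Counter(words).items()]
    # Order all pairs with the comparator, then keep the first k words.
    ordered = sorted(pairs, key=cmp_to_key(cmp))
    return [word for _, word in ordered[:k]]


print(top_k_frequent(["i", "love", "leetcode", "i", "love", "coding"], 2))  # ['i', 'love']
print(top_k_frequent(["d", "c", "b", "a"], 4))  # ['a', 'b', 'c', 'd']
```

Sorting all m distinct words costs O(m log m) rather than the O(m log k) of a bounded heap, but it avoids the heap's tie-eviction logic entirely. The same order can also be obtained without `cmp_to_key` through the tuple key `lambda p: (-p[0], p[1])`.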

【Discussion】:

【Solution 2】:

This one-liner should help you.

```python
from collections import Counter

topk = lambda words, k: [t[0] for t in Counter(list(sorted(words))).most_common(k)]

print(topk(["i", "love", "leetcode", "i", "love", "coding"], k=2))
print(topk(["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k=4))

# Output
# ['i', 'love']
# ['the', 'is', 'sunny', 'day']
```

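The first step is to sort the words up front with list(sorted(words)).
Counter turns that list into word frequencies. Like heapq, it is part of the standard library.
As the name suggests, most_common(k) gives you the k most common words. Note that the words have already been sorted lexicographically: most_common lists elements with equal counts in the order they were first encountered, so the pre-sort is what makes ties come out in alphabetical order.
Finally, the outer list comprehension keeps only the first element (the word) of each (word, count) tuple returned by most_common(k).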
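A small sketch of why the pre-sort matters, assuming Python 3.7+ (where dicts, and therefore `Counter`, preserve insertion order). The two words tie, so `most_common` falls back to first-seen order, which is only alphabetical once the input has been sorted.

```python
from collections import Counter

words = ["love", "i", "i", "love"]

# Without sorting, ties keep first-seen order: "love" before "i".
unsorted_top = [w for w, _ in Counter(words).most_common(2)]
# After sorting, first-seen order is alphabetical: "i" before "love".
sorted_top = [w for w, _ in Counter(sorted(words)).most_common(2)]

print(unsorted_top)  # ['love', 'i']
print(sorted_top)    # ['i', 'love']
```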

【Discussion】: