Most efficient way to order an array of Strings by frequency: answers

【Question title】: Most efficient way to order an array of Strings by frequency
【Posted】: 2013-09-10 17:19:40
【Question description】:

I have an array of Strings:

String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

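What is the quickest/most efficient way to order this into a smaller Collection, in order of how frequent each String is?

I thought about using the String as the key in a HashMap<String,Integer>, but that would not be sorted by frequency.

Another approach I considered is a TreeMap<Integer, String[]> holding, for each count, the list of Strings with that count, but that seems to involve a lot of checking.

I'm trying to avoid using more than one loop if possible, because my String array will be much larger than the one above. Thanks!

EDIT: All I want is to be able to output the Strings in order of frequency, and preferably to pair each String with its frequency in the array, for example as two output arrays: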

["x", "y", "z", "a"]
[3,2,1,1]

If speed weren't an issue this would be a fairly simple problem, which is why I'm asking the great minds here :)

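【Question discussion】:

You can use a HashMap. Keep each String as a key and add 1 to its value every time you encounter that key. Building the resulting collection is then just a matter of sorting by value and adding each key value-many times (if key x has value 5, print x 5 times).
The first answer to this question should give you an idea of how to do this: stackoverflow.com/questions/6712587/…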

Tags: java arrays string mode

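【Solution 1】:

Use a HashMap<String,Integer> to maintain your counts. This will be the most efficient way to process an arbitrary list of Strings.

Create an ArrayList<Map.Entry<String,Integer>> from the map's entrySet().

Sort that list using Collections.sort() with a custom comparator.

Don't get hung up on micro-optimizations.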
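
The three steps above could look roughly like the sketch below. It is only one way to write it: the class name EntrySortSketch and the method name byFrequency are made up for illustration, and the order among Strings with equal counts is left unspecified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class EntrySortSketch {

    // Counts each string, then returns the map entries sorted by descending count.
    // The order among entries with equal counts is unspecified.
    static List<Map.Entry<String, Integer>> byFrequency(String[] input) {
        // Step 1: one pass over the input to build the counts.
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String s : input) {
            Integer old = counts.get(s);
            counts.put(s, old == null ? 1 : old + 1);
        }

        // Step 2: copy the entry set into a list so it can be sorted.
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());

        // Step 3: sort with a custom comparator, higher count first.
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return b.getValue().compareTo(a.getValue());
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};
        for (Map.Entry<String, Integer> e : byFrequency(stringArray)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Each entry carries both the String and its count, so the pairing the question asks for comes for free.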

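【Discussion】:

【Solution 2】:

You can solve this in two steps:

Create a counter object: a Map<String, Integer> listing, for each String in the input, how many times it occurs. In other words, a frequency map. This is O(n), since you only need to traverse the input once to build the map.
Using the previous map, create a list of its keys and sort it, using the frequency of each item (the value in the map) as the ordering criterion. This is O(n log n); you can call Collections.sort() with a Comparator that compares Strings by their frequency.

This is what I mean: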

String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

final Map<String, Integer> counter = new HashMap<String, Integer>();
for (String str : stringArray)
    counter.put(str, 1 + (counter.containsKey(str) ? counter.get(str) : 0));

List<String> list = new ArrayList<String>(counter.keySet());
Collections.sort(list, new Comparator<String>() {
    @Override
    public int compare(String x, String y) {
        return counter.get(y) - counter.get(x);
    }
});

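After the code above runs, the variable list will contain the following values (the order between elements with the same frequency is unspecified):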

[x, y, a, z]

Converting the list to an array is easy:

list.toArray(new String[list.size()])

If you need to find out the frequency of each String, just iterate over the sorted keys:

for (String str : list) {
    int frequency = counter.get(str);
    System.out.print(str + ":" + frequency + ", ");
}
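
If a deterministic order among elements with the same frequency is wanted, one option is to extend the comparator with an alphabetical tie-break. This is only a sketch and not part of the answer as originally written; the class name TieBreakSketch and the method name sortedKeys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class TieBreakSketch {

    // Returns the distinct strings ordered by descending frequency;
    // strings with the same frequency are ordered alphabetically.
    static List<String> sortedKeys(String[] input) {
        final Map<String, Integer> counter = new HashMap<String, Integer>();
        for (String str : input) {
            counter.put(str, 1 + (counter.containsKey(str) ? counter.get(str) : 0));
        }

        List<String> list = new ArrayList<String>(counter.keySet());
        Collections.sort(list, new Comparator<String>() {
            @Override
            public int compare(String x, String y) {
                // Primary criterion: higher count first.
                int byCount = counter.get(y).compareTo(counter.get(x));
                // Secondary criterion: natural String order breaks ties.
                return byCount != 0 ? byCount : x.compareTo(y);
            }
        });
        return list;
    }

    public static void main(String[] args) {
        String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};
        System.out.println(sortedKeys(stringArray)); // prints [x, y, a, z]
    }
}
```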

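【Discussion】:

How are elements with the same frequency ordered? Does your comparator ensure that elements with the same frequency end up in alphabetical order?
Read it again: "the order between elements with the same frequency is unspecified."
Sorry about that, but in examples like this one, the ordering of elements with the same frequency seems to be implicit, and it isn't entirely clear to me why.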

【Solution 3】:

String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

List<String> list = Arrays.asList(stringArray);
Collections.sort(list);

HashMap<String, Integer> map = new HashMap<String, Integer>();

for(int i = 0; i < list.size();) {

    String s = list.get(i); //get the string to count

    int count = list.lastIndexOf(s) - list.indexOf(s) + 1; //count it

    map.put(s, count); // add it

    i = list.lastIndexOf(s) + 1; // skip to the next string

}

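I think this is an elegant solution, but I don't know how well it performs. If you want the result sorted, use a TreeMap, but that is really slow.

Afterwards you can sort it (by key) like this:

TreeMap<String, Integer> sortedMap = new TreeMap<String, Integer>(map);

But note that using the Integer as the key does not work! Because keys are unique, if a and b each appear once, a will be kicked out!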

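【Discussion】:

Nice idea, I hadn't thought of that!
I was considering using the integer as the key, with an array/list of Strings as the value, holding every String that has that frequency. You would have to remove a String from one list and add it to another, but I don't know how efficient that is.
If you find out how fast this is, could you let me know? I'm curious.
I'll compare it against the answer above on a long String array and let you know.
ArrayList.indexOf() is a linear search, so the order of your algorithm becomes O(n log n + n^2) => O(n^2). This can be improved.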
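
One possible way to remove the quadratic part, as a rough sketch: after sorting, walk the array once and count each run of equal Strings, instead of calling indexOf() and lastIndexOf() for every distinct value. The class name SortedRunCountSketch and the method name countRuns are made up here, and the sketch sorts a copy so the caller's array is left untouched.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class SortedRunCountSketch {

    // Sorts a copy of the input, then counts each run of equal strings in one pass.
    static Map<String, Integer> countRuns(String[] input) {
        String[] sorted = input.clone();
        Arrays.sort(sorted); // O(n log n)

        // LinkedHashMap keeps the (alphabetical) insertion order of the runs.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        int runStart = 0;
        for (int i = 1; i <= sorted.length; i++) {
            // A run ends at the end of the array or where the value changes.
            if (i == sorted.length || !sorted[i].equals(sorted[runStart])) {
                counts.put(sorted[runStart], i - runStart);
                runStart = i;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};
        System.out.println(countRuns(stringArray)); // prints {a=1, x=3, y=2, z=1}
    }
}
```

The counting pass is O(n), so the total is dominated by the O(n log n) sort. The result is still ordered by String, not by count, so ordering by frequency needs a further sort on the values.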

【Solution 4】:

If third-party libraries are fair game, the following one-liner using Guava is asymptotically optimal:

Multisets.copyHighestCountFirst(ImmutableMultiset.copyOf(array))
   .elementSet().toArray(new String[0]);

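【Discussion】:

Quoting the elementSet() documentation: "The order of the elements in the element set is unspecified." Although the code above works, a safer option would be something like: Multisets.copyHighestCountFirst(ImmutableMultiset.copyOf(array)).stream().distinct().collect(...)
@JacobEckel, Multisets.copyHighestCountFirst returns an ImmutableMultiset, which does have a deterministic ordering. (For what it's worth, I wrote a lot of that documentation.)

【Solution 5】:

Prints the result so that: 1) Strings with different occurrence counts are sorted by count in descending order. 2) Strings with the same occurrence count are sorted alphabetically in ascending order.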

 public static void sortStringByOccurance(String[] stringArray) {
    // O(n)
    Map<String, Integer> map = new HashMap<>();
    for (String str : stringArray) {
        map.put(str, map.containsKey(str)? map.get(str)+1 : 1);
    }

    // O(n)
    TreeMap<Integer, TreeSet<String>> treemap = new TreeMap<>();
    for (String key : map.keySet()) {
        if (treemap.containsKey(map.get(key))) {
            treemap.get(map.get(key)).add(key);
        }
        else {
            TreeSet<String> set = new TreeSet<>();
            set.add(key);
            treemap.put(map.get(key), set);
        }
    }

    // O(n)
    Map<Integer, TreeSet<String>> result = treemap.descendingMap();
    for (int count : result.keySet()) {
        TreeSet<String> set = result.get(count);
        for (String word : set) {
            System.out.println(word + ":" + count);
        }
    }
}

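【Discussion】:

Your second loop is O(n log n), not O(n), because each TreeMap operation (it is implemented as a red-black tree) is only guaranteed to run in O(log n).

【Solution 6】:

It is possible with very few lines of code: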

String[] s = {"x", "y", "z", "x", "x", "y", "a"};
HashMap<String,Integer> hm = new HashMap<String,Integer>();
for(int i=0;i<s.length;i++){
    int count = hm.containsKey(s[i]) ? hm.get(s[i]) : 0;
    hm.put(s[i], count + 1);            
}

【Discussion】:

【Solution 7】:

Another solution:

String[] s = {"x", "y", "z", "x", "x", "y", "a"};
HashMap<String,Integer> hm = new HashMap<String,Integer>();

for(int i=0;i<s.length;i++){
    hm.putIfAbsent(s[i], 0);
    hm.put(s[i], hm.get(s[i]) + 1);
}
System.out.println(hm);
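
On Java 8 or later, the putIfAbsent()/put() pair can be collapsed into a single Map.merge() call. The sketch below does that and then sorts the entries by count to fill the two parallel arrays the question asked for. It is an illustration under those assumptions; the class name MergeCountSketch and the method names count and sortedEntries are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class MergeCountSketch {

    // Builds the frequency map in one pass.
    static Map<String, Integer> count(String[] s) {
        Map<String, Integer> hm = new HashMap<>();
        for (String str : s) {
            // Stores 1 for a new key, otherwise adds 1 to the existing count.
            hm.merge(str, 1, Integer::sum);
        }
        return hm;
    }

    // Returns the entries ordered by descending count (ties in unspecified order).
    static List<Map.Entry<String, Integer>> sortedEntries(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        String[] s = {"x", "y", "z", "x", "x", "y", "a"};
        List<Map.Entry<String, Integer>> entries = sortedEntries(count(s));

        // Fill two parallel arrays: keys[i] occurs frequencies[i] times.
        String[] keys = new String[entries.size()];
        int[] frequencies = new int[entries.size()];
        for (int i = 0; i < entries.size(); i++) {
            keys[i] = entries.get(i).getKey();
            frequencies[i] = entries.get(i).getValue();
        }
        System.out.println(Arrays.toString(keys));
        System.out.println(Arrays.toString(frequencies)); // prints [3, 2, 1, 1]
    }
}
```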

【Discussion】: