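Filtering a collection of strings case-insensitively: answers

【Question title】: filter collection of strings with values case insensitive
【Posted】: 2021-02-08 12:45:51
【Question description】:

How can I filter my collection so that values are compared for equality ignoring case?

Example: I have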

["Value1", "vALue1", "vALue2", "valUE2"]

and I need

["Value1", "vALue2"]

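Any solution would be fine.

For example, I could refuse to add a new string if the collection already holds one that is equal to it ignoring case,

or I could collect everything first and then filter out the strings that are equal ignoring case.

【Question discussion】:

If I find Value1 and VALUE1, which one should I keep?
In other words, you want the first value of each group of case-insensitively equal values?
Yes, I want to keep the first value I find in the collection (if there are other values equal to it ignoring case).
This should solve your problem: stackoverflow.com/questions/51552616/…

Tags: java collections java-stream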

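【Solution 1】:

This does not seem to be (easily) possible with Streams alone¹, but you can keep track of the elements already seen in a Set (O(1) lookup) and filter on whether the lowercase form of an element is already in that set (in which case Set.add returns false).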

List<String> values = List.of("Value1", "vALue1", "vALue2", "valUE2");
Set<String> seen = new HashSet<>();
List<String> res = values.stream().filter(s -> seen.add(s.toLowerCase()))
                                  .collect(Collectors.toList());
System.out.println(res);  // [Value1, vALue2]

¹⁾ For example, distinct does not accept a mapping function, and Collectors.groupingBy may not preserve order.

【Discussion】:

Why not use seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER); and drop the expensive toLowerCase() call...
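
The suggestion in this comment could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name SeenSetSketch and the method name dedupeIgnoreCase are made up for the example. Note that a TreeSet lookup is O(log n) rather than O(1), and that String.CASE_INSENSITIVE_ORDER does not take the locale into account, so whether this is actually cheaper than toLowerCase() with a HashSet depends on the data.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
class SeenSetSketch {

    // Keeps the first element of each group of case-insensitively equal strings.
    // Assumes a sequential stream: the filter mutates the shared "seen" set.
    static List<String> dedupeIgnoreCase(List<String> values) {
        // The comparator treats "Value1" and "vALue1" as the same element,
        // so no lowercase copy of each string has to be created.
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        return values.stream()
                     .filter(seen::add)   // add returns false for a duplicate
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = List.of("Value1", "vALue1", "vALue2", "valUE2");
        System.out.println(dedupeIgnoreCase(values));  // [Value1, vALue2]
    }
}
```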

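【Solution 2】:

Some Java libraries that provide a distinctBy-style function can be used to solve this task.

For example, the StreamEx library (GitHub, Maven Repo), an extension of the Stream API, can be used like this: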

import java.util.*;
import one.util.streamex.*;

public class MyClass {
    public static void main(String args[]) {
        String[] data = {
            "Value1", "vALue1", "vALue2", "valUE2"
        };
        List<String> noDups = StreamEx.of(data)
                .distinct(String::toLowerCase)
                .toList();
        System.out.println(noDups);
    }
}

Output:

[Value1, vALue2]

【Discussion】:

【Solution 3】:

Here is a minimal code example of how this can work:

String[] array = new String[] {"Value1", "vALue1", "vALue2", "valUE2"};
ArrayList<String> finalArray = new ArrayList<>();
for(String entry : array) {
    boolean alreadyContained = finalArray.stream().anyMatch(entry::equalsIgnoreCase);
    if(!alreadyContained) {
        finalArray.add(entry);
    }
}

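Basically, you build an ArrayList that holds all non-duplicate entries. For each entry, you check whether it is already contained in the ArrayList (ignoring case) and add it only if it is not.

【Discussion】:

【Solution 4】:

Edit

Here is a generic version of tobias_k's answer: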

import java.util.*;
import java.util.stream.Collectors;

public class ArrayUtils {
    public static void main(String[] args) {
        List<String> values = List.of("Value1", "vALue1", "vALue2", "valUE2");
        List<String> deduped = dedupeCaseInsensitive(values);

        System.out.println(deduped);  // [Value1, vALue2]
    }

    /* Higher-order function */
    public static List<String> dedupeCaseInsensitive(List<String> collection) {
        return dedupeWith(collection, String.CASE_INSENSITIVE_ORDER);
    }

    public static <E> List<E> dedupeWith(List<E> list, Comparator<E> comparator) {
        Set<E> seen = new TreeSet<>(comparator);
        return list.stream().filter(s -> seen.add(s)).collect(Collectors.toList());
    }
}

Original edit

Here is a stream version:

import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ArrayUtils {
    public static void main(String[] args) {
        String[] items = {"Value1", "vALue1", "vALue2", "valUE2"};
        String[] result = dedupeCaseInsensitive(items);

        // Print the resulting array.
        System.out.println(Arrays.toString(result));
    }

    public static String[] dedupeCaseInsensitive(String[] items) {
        return Arrays.stream(items)
            .collect(Collectors.toMap(
                String::toLowerCase,
                Function.identity(),
                (o1, o2) -> o1,
                LinkedHashMap::new))
            .values()
            .stream()
            .toArray(String[]::new);
    }
}

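Original answer

You can deduplicate with case-insensitive logic by populating a Map, taking its values, and sorting them back into their original order.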

import java.util.*;

public class ArrayUtils {
    public static void main(String[] args) {
        String[] items = {"Value1", "vALue1", "vALue2", "valUE2"};
        String[] result = dedupeCaseInsensitive(items);

        // Print the resulting array.
        System.out.println(Arrays.toString(result));
    }

    public static String[] dedupeCaseInsensitive(String[] items) {
        Map<String, String> map = new HashMap<String, String>();

        // Filter the values using a map of key being the transformation,
        // and the value being the original value.
        for (String item : items) {
            map.putIfAbsent(item.toLowerCase(), item);
        }

        List<String> filtered = new ArrayList<>(map.values());

        // Sort the filtered values by the original positions.
        Collections.sort(filtered,
                Comparator.comparingInt(str -> findIndex(items, str)));

        return collectionToArray(filtered);
    }

    /* Convenience methods */

    public static String[] collectionToArray(Collection<String> collection) {
        return collection.toArray(new String[collection.size()]);
    }

    public static int findIndex(String[] arr, String t) {
        // Linear search; Arrays.binarySearch would require arr to be sorted.
        return Arrays.asList(arr).indexOf(t);
    }
}

If you use a LinkedHashMap, no sorting is needed, because the entries keep their insertion order.

import java.util.*;

public class ArrayUtils {
    public static void main(String[] args) {
        String[] items = {"Value1", "vALue1", "vALue2", "valUE2"};
        String[] result = dedupeCaseInsensitive(items);

        // Print the resulting array.
        System.out.println(Arrays.toString(result));
    }

    public static String[] dedupeCaseInsensitive(String[] items) {
        Map<String, String> map = new LinkedHashMap<String, String>();

        // Filter the values using a map of key being the transformation,
        // and the value being the original value.
        for (String item : items) {
            map.putIfAbsent(item.toLowerCase(), item);
        }

        return collectionToArray(map.values());
    }

    /* Convenience methods */

    public static String[] collectionToArray(Collection<String> collection) {
        return collection.toArray(new String[collection.size()]);
    }
}

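【Discussion】:

You could still accept Collection<E> as input, which makes the method usable in more cases. By the way, .filter(s -> seen.add(s)) can also be written as .filter(seen::add). In fact, since the part to the left of :: is evaluated only once and its result is captured, you could even write .filter(new TreeSet<>(comparator)::add), and it would do the intended thing.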
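
A minimal sketch of what this comment proposes, under a few assumptions: the class name DedupeSketch is hypothetical, the comparator parameter is widened to Comparator<? super E> (not in the original answer), and the stream is sequential. The result follows the iteration order of whatever Collection is passed in, so the "first one wins" behaviour is only meaningful for ordered collections such as a List.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
class DedupeSketch {

    // Accepts any Collection<E> instead of only List<E>.
    static <E> List<E> dedupeWith(Collection<E> items, Comparator<? super E> comparator) {
        // Elements the comparator considers equal count as duplicates.
        Set<E> seen = new TreeSet<>(comparator);
        // seen::add is the method-reference form of s -> seen.add(s).
        return items.stream().filter(seen::add).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = List.of("Value1", "vALue1", "vALue2", "valUE2");
        System.out.println(dedupeWith(values, String.CASE_INSENSITIVE_ORDER));  // [Value1, vALue2]
    }
}
```

The seen variable is kept here for readability; the inline form mentioned in the comment avoids the local variable at the cost of being harder to read at a glance.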