How to Count Unique Values in an ArrayList? Answers

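【Question title】: How to Count Unique Values in an ArrayList?
【Posted】: 2012-09-25 01:26:38
【Question description】:

I have to count the number of unique words in a text document using Java. First, I have to strip the punctuation from every word. I use the Scanner class to scan each word in the document and add it to an ArrayList of Strings.

The next step is where I am stuck! How do I write a method that counts the number of unique strings in the array?

For example, if the array contains apple, bob, apple, jim, bob, the number of unique values in this array is 3.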

public void countWords() {
    try {
        Scanner scan = new Scanner(in);
        while (scan.hasNext()) {
            String words = scan.next();
            // Strings are immutable: replace() returns a new String,
            // so the result has to be assigned back to the variable.
            if (words.contains(".")) {
                words = words.replace(".", "");
            }
            if (words.contains("!")) {
                words = words.replace("!", "");
            }
            if (words.contains(":")) {
                words = words.replace(":", "");
            }
            if (words.contains(",")) {
                words = words.replace(",", "");
            }
            if (words.contains("?")) {
                words = words.replace("?", "");
            }
            if (words.contains("-")) {
                words = words.replace("-", "");
            }
            if (words.contains("‘")) {
                words = words.replace("‘", "");
            }
            wordStore.add(words.toLowerCase());
        }
    } catch (FileNotFoundException e) {
        System.out.println("File Not Found");
    }
    System.out.println("The total number of words is: " + wordStore.size());
}

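【Question comments】:

Are there any restrictions on what you can or cannot use?
No, there are no restrictions!

Tags: java string arraylist unique

【Solution 1】:

Can you use a Set? If so, a HashSet may solve your problem. A HashSet does not accept duplicates.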

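HashSet<String> noDupSet = new HashSet<String>();
noDupSet.add(yourString); // adding a value that is already present leaves the set unchanged
noDupSet.size();          // number of distinct strings added so far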

The size() method returns the number of unique words.
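
As a minimal sketch of the HashSet approach applied to the list from the question: the class name UniqueCount and the method name countUnique are made up for illustration, and the input is the apple/bob/jim example rather than words read from a file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original answer.
class UniqueCount {
    // Copies the list into a HashSet, which keeps only one copy of each value,
    // and returns how many values are left.
    static int countUnique(List<String> words) {
        Set<String> unique = new HashSet<>(words);
        return unique.size();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(countUnique(words)); // prints 3
    }
}
```

Passing the list to the HashSet constructor avoids an explicit loop; the original list is left untouched.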

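If you really must use only an ArrayList, one way to implement it would be:

1) Create a temp ArrayList
2) Iterate original list and retrieve element
3) If tempArrayList doesn't contain element, add element to tempArrayList
4) When the loop ends, tempArrayList.size() is the number of unique values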

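【Comments】:

Yes, I can use a HashSet. Can you show me how to use a HashSet?
I don't have to use only an ArrayList, I can use any approach that works. Can I create a new HashSet and add all the string values from the ArrayList to it?
Yes, you can, or you can add the elements directly to the Set, so that you don't even need the ArrayList.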
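
A rough sketch of that last suggestion, adding each word straight to a Set while scanning. Several things here are assumptions for illustration only: the names DirectSetCount and countUniqueWords, reading from a String instead of the asker's file, and treating every character that is not a letter or digit as punctuation.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical class; the original thread reads from a file via a field named "in".
class DirectSetCount {
    // Reads whitespace-separated tokens and adds each cleaned token to a Set,
    // so no intermediate ArrayList is needed.
    static int countUniqueWords(String text) {
        Set<String> unique = new HashSet<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext()) {
                // Assumption: strip anything that is not a letter or a digit.
                String word = scan.next().replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase();
                if (!word.isEmpty()) {
                    unique.add(word); // duplicates are silently ignored
                }
            }
        }
        return unique.size();
    }

    public static void main(String[] args) {
        System.out.println(countUniqueWords("Apple, bob! apple jim: Bob.")); // prints 3
    }
}
```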

【Solution 2】:

As of Java 8, you can use a Stream:

After adding the elements to the ArrayList:

long n = wordStore.stream().distinct().count();

It converts your ArrayList into a stream and then counts only the distinct elements.
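
For a self-contained version of the same idea, here is a small sketch. The class name StreamCount is hypothetical, and the extra map(String::toLowerCase) step is an addition that assumes the words have not been lower-cased yet; with the asker's wordStore it would be redundant.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the one-liner from the answer.
class StreamCount {
    static long countUnique(List<String> words) {
        return words.stream()
                .map(String::toLowerCase) // normalize case so "Bob" and "bob" match
                .distinct()               // drop repeated values
                .count();                 // count what is left
    }

    public static void main(String[] args) {
        List<String> wordStore = Arrays.asList("apple", "Bob", "apple", "jim", "bob");
        System.out.println(countUnique(wordStore)); // prints 3
    }
}
```

Note that count() returns a long, not an int.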

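【Comments】:

【Solution 3】:

I would suggest using a HashSet. It filters out duplicates automatically when the add method is called.

【Comments】:

【Solution 4】:

Although I think a Set is the simplest solution, you can still keep your original solution and just add an if statement that checks whether the value already exists in the list before adding it.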

if (!wordStore.contains(words.toLowerCase()))
    wordStore.add(words.toLowerCase());

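The number of words in your list is then the total number of unique words (i.e. wordStore.size()).

【Comments】:

Thanks for your help! Wouldn't a HashSet be more efficient, since by default it does not allow values that were already added?
It definitely would be. However, I wanted to give you an option that doesn't force you to change your existing code. Really, you were only missing one "if" statement.

【Solution 5】:

This general solution takes advantage of the fact that the Set abstract data type does not allow duplicates. The Set.add() method is especially useful because it returns a boolean flag indicating whether the "add" operation succeeded. A HashMap is used to track the occurrences of each original element. The algorithm can be adapted to variations of this kind of problem. The solution yields O(n) performance.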

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

public class DuplicateCounter { // class name is illustrative; the original shows only the two methods

    public static void main(String[] args)
    {
        String[] strArray = {"abc", "def", "mno", "xyz", "pqr", "xyz", "def"};
        System.out.printf("RAW: %s ; PROCESSED: %s \n", Arrays.toString(strArray), duplicates(strArray).toString());
    }

    // Returns a map from each distinct string to the number of times it occurs.
    public static HashMap<String, Integer> duplicates(String[] arr)
    {
        HashSet<String> distinctKeySet = new HashSet<String>();
        HashMap<String, Integer> keyCountMap = new HashMap<String, Integer>();

        for (int i = 0; i < arr.length; i++)
        {
            if (distinctKeySet.add(arr[i])) // add() returns true only for a new element
                keyCountMap.put(arr[i], 1); // unique value or first occurrence
            else
                keyCountMap.put(arr[i], keyCountMap.get(arr[i]) + 1);
        }

        return keyCountMap;
    }
}

Result:

RAW: [abc, def, mno, xyz, pqr, xyz, def] ; PROCESSED: {pqr=1, abc=1, def=2, xyz=2, mno=1}

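【Comments】:

Are you actually quoting something? If not, please don't use quote formatting. If you are quoting something, you need to attribute it properly.
This 4-year-old question already has answers that use a HashSet with O(1) performance. Your algorithm for counting word occurrences in a string array does not answer the OP's question (you are not counting the unique values in an ArrayList); nor does it improve on the current solutions. Perhaps you misunderstood the question?
Thanks for the feedback. I apologize for the confusion. I just wanted to share a solution for counting the distinct elements in an array that I thought was interesting/different and might be useful to others looking into a similar problem in the future. I probably should have added the solution to a more appropriate thread.

【Solution 6】:

You could also create a Hashtable or a HashMap. The keys would be your input strings, and the values would be the number of times each string occurs in your input array. O(N) time and space.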
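
One possible shape for the map-based idea, as a sketch only. The names WordFrequency and countOccurrences are invented here, and Map.merge requires Java 8 or later; the number of keys in the map is the number of unique strings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class illustrating a string -> occurrence-count map.
class WordFrequency {
    static Map<String, Integer> countOccurrences(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Inserts 1 for a new key, otherwise adds 1 to the stored count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countOccurrences(Arrays.asList("apple", "bob", "apple", "jim", "bob"));
        System.out.println(counts.size());       // prints 3 (unique strings)
        System.out.println(counts.get("apple")); // prints 2 (occurrences of "apple")
    }
}
```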

Second approach:

Sort the input list. Equal strings will then sit next to each other. Compare list(i) with list(i+1) and count the duplicates.
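
A sketch of the sort-based approach, under a few assumptions: the names SortedCount and countUnique are hypothetical, the list contains no null elements, and the method counts distinct values (positions where a string differs from its predecessor) rather than duplicates, since that is what the question asks for. Sorting makes this O(n log n).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class illustrating counting via sorting.
class SortedCount {
    static int countUnique(List<String> words) {
        if (words.isEmpty()) {
            return 0;
        }
        List<String> sorted = new ArrayList<>(words); // copy so the caller's list keeps its order
        Collections.sort(sorted);                     // equal strings become adjacent
        int unique = 1;                               // the first element is always new
        for (int i = 1; i < sorted.size(); i++) {
            if (!sorted.get(i).equals(sorted.get(i - 1))) {
                unique++; // a change between neighbours marks a new distinct value
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(countUnique(Arrays.asList("apple", "bob", "apple", "jim", "bob"))); // prints 3
    }
}
```

The number of duplicates, if that is what you need, is the list size minus this result.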

【Comments】:

【Solution 7】:

In short, you can do it as follows...

    ArrayList<String> duplicateList = new ArrayList<String>();
    duplicateList.add("one");
    duplicateList.add("two");
    duplicateList.add("one");
    duplicateList.add("three");

    System.out.println(duplicateList); // prints [one, two, one, three]

    HashSet<String> uniqueSet = new HashSet<String>();

    uniqueSet.addAll(duplicateList);
    System.out.println(uniqueSet); // prints the three unique values; HashSet order is not guaranteed
    System.out.println(uniqueSet.size()); // prints 3

    duplicateList.clear();
    System.out.println(duplicateList); // prints []


    duplicateList.addAll(uniqueSet);
    System.out.println(duplicateList); // prints the unique values in the set's iteration order

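【Comments】:

Personally, I don't see why I would use your shorthand approach. I can write a loop that adds the string values to a HashSet; by default, a HashSet does not allow values that were already added.
Here I have shown how to extract the unique values of an ArrayList. I think the shorthand approach is more convenient to use. But which approach is best is up to you... :)

【Solution 8】: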

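import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueinArrayList {

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        List<String> al = new ArrayList<>();
        al.add("Stack");
        al.add("Stack");
        al.add("over");
        al.add("over");
        al.add("flow");
        al.add("flow");
        System.out.println(al); // prints [Stack, Stack, over, over, flow, flow]
        // LinkedHashSet drops duplicates and keeps first-insertion order
        Set<String> s = new LinkedHashSet<>(al);
        System.out.println(s); // prints [Stack, over, flow]
        for (String word : s) {
            sb.append(word).append(" ");
        }
        System.out.println(sb.toString().trim()); // prints Stack over flow
    }

}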

【Comments】:

【Solution 9】:

3 different possible solutions:

1) Use a HashSet, as suggested above.

2) Create a temporary ArrayList and store only the unique elements in it, like this:

public static int getUniqueElement(List<String> data) {
    List<String> newList = new ArrayList<>();
    for (String eachWord : data) {
        if (!newList.contains(eachWord)) {
            newList.add(eachWord);
        }
    }
    return newList.size();
}

3) Java 8 solution

long count = data.stream().distinct().count();

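【Comments】:

I strongly advise against method 2. Compared with methods 1 and 3 it is very inefficient, especially as the list grows. Method 2 is O(n^2), whereas methods 1 and 3 are O(n). That is because the call to newList.contains is O(n), and that call itself sits inside a loop that is also O(n), so the overall complexity is O(n^2).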