Is iteration over a LinkedHashSet faster than iteration over an ArrayList? (answers)

【Question title】: Is iteration over LinkedHashSet faster than iteration over ArrayList
【Posted】: 2017-12-21 21:19:39
【Question】:

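I want to iterate over two collections of about 600 records each and compare each element of collection 1 with every element of collection 2. If I choose LinkedHashSet for my collections, I have to call an iterator on each one and use two while loops (outer and inner). If I choose ArrayList instead, I will have two for loops (outer and inner) reading the data from each collection.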

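I chose LinkedHashSet mainly because I read that it has better performance, and I also prefer a set so that duplicates are removed. But it runs very slowly: it takes about 2 hours to finish. So I thought it might be better to copy the sets into ArrayLists and iterate over those instead of the LinkedHashSets. Which option is better for reducing the running time?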

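import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map.Entry;

public ArrayList<ArrayList<Object>> processDataSourcesV2(
        LinkedHashMap<RecordId, LinkedHashSet<String>> ppmsFinalResult,
        LinkedHashMap<RecordId, LinkedHashSet<String>> productDBFinalResult) {
  // Each parameter is a map from a key (id) to a value (a set of unique terms)
  ArrayList<ArrayList<Object>> result = new ArrayList<ArrayList<Object>>();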

  Iterator<Entry<RecordId, LinkedHashSet<String>>> ppmsIterator = ppmsFinalResult.entrySet().iterator();
  Iterator<Entry<RecordId, LinkedHashSet<String>>> productIdIterator =null;
  //pair of id from each list
  ArrayList<Pair> listOfIdPair = new ArrayList<Pair>();
  while (ppmsIterator.hasNext()) {
      //RecordId object is an object containing the id and which list this id belongs to
      Entry<RecordId, LinkedHashSet<String>> currentPpmsPair = ppmsIterator.next();
      RecordId currentPpmsIDObj = currentPpmsPair.getKey(); 
      // set of unique strings
      LinkedHashSet<String> currentPpmsCleanedTerms = currentPpmsPair.getValue();
      productIdIterator = productDBFinalResult.entrySet().iterator();

      while (productIdIterator.hasNext()) {

          Entry<RecordId, LinkedHashSet<String>> currentProductDBPair = productIdIterator.next();
          RecordId currentProductIDObj = currentProductDBPair.getKey();
          LinkedHashSet<String> currentProductCleanedTerms = currentProductDBPair.getValue();
          ArrayList<Object> listOfRowByRowProcess = new ArrayList<Object>();
          Pair currentIDPair = new Pair(currentPpmsIDObj.getIdValue(),currentProductIDObj.getIdValue());              
          //check for duplicates 
          if ((currentPpmsIDObj.getIdValue()).equals(currentProductIDObj.getIdValue()) || listOfIdPair.contains(currentIDPair.reverse()) ) {
              continue;
          }
          else {
              LinkedHashSet<String> commonTerms = getCommonTerms(currentPpmsCleanedTerms,currentProductCleanedTerms);
              listOfIdPair.add(currentIDPair.reverse());
              if (commonTerms.size()>0) {
                  listOfRowByRowProcess.add(currentPpmsIDObj);
                  listOfRowByRowProcess.add(currentProductIDObj);
                  listOfRowByRowProcess.add(commonTerms);

                  result.add(listOfRowByRowProcess); 
              }
          }

      }


  }

  return result;
}



public LinkedHashSet<String> getCommonTerms(LinkedHashSet<String> setOne, LinkedHashSet<String> setTwo) {
  // copy setOne so the caller's set is not modified
  LinkedHashSet<String> setOfCommon = new LinkedHashSet<String>(setOne);
  // keep only the terms that also appear in setTwo
  setOfCommon.retainAll(setTwo);
  return setOfCommon;
}
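One likely hotspot in the code above is `listOfIdPair.contains(...)`: `listOfIdPair` is an ArrayList, so every `contains` call scans it linearly, and the list grows by one entry every time a pair is not skipped. With 600 × 600 = 360,000 inner iterations, that can add up to tens of billions of `Pair.equals` calls in the worst case. A hash-based set makes each check O(1) on average. The sketch below is an illustration under assumptions: ids are plain Strings, an unordered pair is represented by a two-element List in sorted order (Lists have value-based `equals` and `hashCode`), and the class and method names are hypothetical. The question's own `Pair` class would need consistent `equals` and `hashCode` overrides to be used in a HashSet the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PairDedupSketch {
    // Returns each unordered pair of distinct ids once, in the order first seen.
    // Ids are plain Strings here (an assumption; the question uses RecordId).
    static List<List<String>> uniquePairs(List<String> left, List<String> right) {
        Set<List<String>> seen = new HashSet<>();   // O(1) expected add/contains
        List<List<String>> pairs = new ArrayList<>();
        for (String a : left) {
            for (String b : right) {
                if (a.equals(b)) {
                    continue;                       // skip an id paired with itself
                }
                // Canonical order so that (a, b) and (b, a) map to the same key.
                List<String> key = a.compareTo(b) < 0 ? Arrays.asList(a, b) : Arrays.asList(b, a);
                if (seen.add(key)) {                // add() returns false if already present
                    pairs.add(Arrays.asList(a, b));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2", "3");
        System.out.println(uniquePairs(ids, ids)); // [[1, 2], [1, 3], [2, 3]]
    }
}
```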

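【Discussion】:

First, unless you are doing billions of comparisons, you won't see any performance difference between the two. Second, if you build your collections keyed on the attribute you are comparing, you won't need nested loops at all. Your question is very unclear: you should show the class definitions of the objects you are using and explain what you mean by "compare". Then show the code you have already written.
Do you just want to find all elements of collection 1 that are also in collection 2? Or perhaps all elements of collection 1 that are not in collection 2?
@Bohemian: Yes, exactly
@user1836957 Which one is it: elements that are in both, or elements that are in only one?
@Bohemian, I want to find all elements of collection 1 that are also in collection 2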
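The first comment's suggestion (build a structure keyed on the attribute being compared instead of nesting loops) can be sketched as an inverted index: map each term to the ids of the product records that contain it, then, for each PPMS record, look up only the records that share one of its terms. This is a sketch under assumptions: ids are plain Strings rather than the question's `RecordId`, the class and method names are hypothetical, and the result maps each PPMS id to the matching product ids and their shared terms. It does not reproduce the question's exact output format or its duplicate-pair filtering.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class TermIndexSketch {
    // For every id in `left`, finds the ids in `right` that share at least one
    // term, together with the shared terms. Ids are Strings (an assumption).
    static Map<String, Map<String, Set<String>>> commonTerms(
            Map<String, Set<String>> left, Map<String, Set<String>> right) {
        // 1. Index the right-hand records by term: term -> ids containing it.
        Map<String, Set<String>> idsByTerm = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : right.entrySet()) {
            for (String term : e.getValue()) {
                idsByTerm.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(e.getKey());
            }
        }
        // 2. For each left record, visit only the records that share a term.
        Map<String, Map<String, Set<String>>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : left.entrySet()) {
            for (String term : e.getValue()) {
                for (String otherId : idsByTerm.getOrDefault(term, Collections.emptySet())) {
                    result.computeIfAbsent(e.getKey(), k -> new LinkedHashMap<>())
                          .computeIfAbsent(otherId, k -> new LinkedHashSet<>())
                          .add(term);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> ppms = new LinkedHashMap<>();
        ppms.put("p1", new LinkedHashSet<>(Arrays.asList("red", "cotton")));
        Map<String, Set<String>> product = new LinkedHashMap<>();
        product.put("d1", new LinkedHashSet<>(Arrays.asList("cotton", "shirt")));
        product.put("d2", new LinkedHashSet<>(Arrays.asList("wool")));
        System.out.println(commonTerms(ppms, product)); // {p1={d1=[cotton]}}
    }
}
```

The work is proportional to the number of (term, matching record) hits rather than to all 600 × 600 record pairs, which pays off when most records share no terms.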

Tags: java big-o

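【Solution 1】:

Arrays are faster to iterate over than any other structure because their elements are stored contiguously in memory (for an ArrayList, it is the references that are contiguous; the objects they point to live elsewhere on the heap). On the other hand, inserting and removing elements is slower, because the remaining elements have to be shifted to keep the storage contiguous. Iterating over a linked list is slower because following node pointers scattered across memory causes cache misses (and, in the worst case, page faults). So the right choice depends on which operations dominate your workload.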
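A crude way to observe this is to traverse the same elements once as a LinkedHashSet and once as an ArrayList copy. This is a rough sketch, not a rigorous benchmark (single run, no JIT warm-up; a harness such as JMH would be needed for trustworthy numbers), and the class name and element count are arbitrary. Note that both collections work with the same enhanced for loop, so the choice does not force explicit iterators. At about 600 elements, any difference between the two is a matter of microseconds, far too small to explain a two-hour run.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IterationSketch {
    // Sums the lengths of all strings. The work per element is identical,
    // so any timing difference comes from how the collection is traversed.
    static long sumLengths(Iterable<String> items) {
        long total = 0;
        for (String s : items) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            set.add("item" + i);
        }
        List<String> list = new ArrayList<>(set); // same elements, same order

        // Crude single-shot timing: JIT compilation and GC noise are included.
        long t0 = System.nanoTime();
        long fromSet = sumLengths(set);
        long t1 = System.nanoTime();
        long fromList = sumLengths(list);
        long t2 = System.nanoTime();

        System.out.println("LinkedHashSet: " + (t1 - t0) / 1_000 + " us");
        System.out.println("ArrayList:     " + (t2 - t1) / 1_000 + " us");
        System.out.println("same result: " + (fromSet == fromList));
    }
}
```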

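【Discussion】:

【Solution 2】:

If you want to find out which elements are in both collections, make one of them a Set and take its intersection with the other collection: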

Collection<T> collection1, collection2; // given these

Set<T> intersection = new HashSet<T>(collection1);
intersection.retainAll(collection2);

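This runs in expected O(n) time, where n is the size of collection1, as long as collection2.contains is constant-time (for example, when collection2 is itself a HashSet): retainAll iterates over the elements of intersection and calls collection2.contains on each one. If collection2 is a List, each contains call is a linear scan and the total becomes O(n·m), where m is the size of collection2.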
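A minimal sketch of that intersection that makes no assumption about the concrete type of collection2: wrapping it in a HashSet before calling `retainAll` costs one O(m) copy and guarantees constant-time `contains` checks, for O(n + m) expected time overall. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IntersectionSketch {
    // Elements present in both collections.
    // retainAll walks `result` and calls contains() on its argument for each
    // element, so the argument is copied into a HashSet to keep contains() O(1).
    static <T> Set<T> intersect(Collection<T> collection1, Collection<T> collection2) {
        Set<T> result = new HashSet<>(collection1);
        result.retainAll(new HashSet<>(collection2));
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "y", "z");
        List<String> b = new ArrayList<>(Arrays.asList("y", "z", "w"));
        // prints y and z (HashSet iteration order is unspecified)
        System.out.println(intersect(a, b));
    }
}
```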

My guess is that you are checking every element of collection1 against every element of collection2, which is O(n²).
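To make the difference concrete, here is a sketch contrasting a nested-loop intersection with a hash-based lookup; the class and method names are hypothetical, and Integer elements stand in for the real data. With two lists of 600 elements, the nested version performs 600 × 600 = 360,000 `equals` calls, while the hashed version makes one pass over each list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class NestedVsHashedSketch {
    static long comparisons = 0; // equals() calls made by the nested version

    // O(n * m): every element of `a` is compared with every element of `b`.
    static <T> Set<T> nestedIntersect(List<T> a, List<T> b) {
        Set<T> common = new LinkedHashSet<>();
        for (T x : a) {
            for (T y : b) {
                comparisons++;
                if (x.equals(y)) {
                    common.add(x);
                }
            }
        }
        return common;
    }

    // O(n + m) expected: one pass to build the hash set, one pass to probe it.
    static <T> Set<T> hashedIntersect(List<T> a, List<T> b) {
        Set<T> lookup = new HashSet<>(b);
        Set<T> common = new LinkedHashSet<>();
        for (T x : a) {
            if (lookup.contains(x)) {
                common.add(x);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>();
        List<Integer> b = new ArrayList<>();
        for (int i = 0; i < 600; i++) {
            a.add(i);        // 0..599
            b.add(i + 300);  // 300..899, so 300..599 are common
        }
        Set<Integer> slow = nestedIntersect(a, b);
        Set<Integer> fast = hashedIntersect(a, b);
        System.out.println(slow.equals(fast)); // true
        System.out.println(comparisons);       // 360000
    }
}
```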

【Discussion】: