【Question title】: Improving Duplicate Finder and For Loop Performance
【Posted】: 2017-09-26 13:18:08
【Question】:

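I wrote a script that finds duplicates of one key in an array of objects. If a duplicate is found, a new key and value ("Duplicate": true) is added to every object that has the duplicated key value.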

Sample data

{
    "Id": "1",
    "NI Number": "NG111111A",
    "Full Name": "Test Test Tester",
    "Address Line 1": "My House",
    "Address Line 2": "My Road",
    "Address Line 3": "My Suburb",
    "City / Town": "My Town",
    "Country": "United Kingdom",
    "PostCode": "",
    "Creation Date": "24 December 2014"
},
{
    "Id": "2",
    "NI Number": "NM123405C",
    "Full Name": "A Dummy",
    "Address Line 1": "Dummy 1",
    "Address Line 2": "Dummy 2",
    "Address Line 3": "Dummy 3",
    "City / Town": "Dummy 4",
    "Country": "United Kingdom",
    "PostCode": "",
    "Creation Date": "09 February 2015"
}

Script

for (let i = 0, len = cleanedData.length; i < len; i++) {

    let foundDuplicate = false;

    if (cleanedData[i]["Duplicate"] === "false" || cleanedData[i]["Duplicate"] === undefined) {

        for (let t = i + 1, len = cleanedData.length; t < len; t++) {

            if (cleanedData[i]["NI Number"] === cleanedData[t]["NI Number"]) {
                foundDuplicate = true;
                cleanedData[t]["Duplicate"] = true;
            }

        }

        if (foundDuplicate === true) {
            cleanedData[i]["Duplicate"] = true;
        } else {
             cleanedData[i]["Duplicate"] = false;
        }
    }

}

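I am trying to find duplicate "NI Number" values across 33,000 records. An NI Number can be duplicated more than once. The script currently works as expected but takes over 70 seconds to run. If possible, I would like to cut that to 35 seconds.

I am new to JavaScript, but from what I have read, a for loop with a cached length is a fast way to iterate over an array. I have also read that Map and Set can improve performance, but I am not sure how to implement them in my script.

Is there any way to improve the performance of my code?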

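【Comments】:

  • You have some issues, e.g. cleanedData[i]["Duplicate"] === "false" will always be false, because you set the value to a boolean and then do a strict comparison against a string. Consider if (!cleanedData[i].Duplicate) {...}
  • You could write it asynchronously, but you would have to split it into two functions, one of them a callback, so that every object is checked for duplicates at the same time.
  • @RobG Thanks, well spotted.
  • Your script does on the order of n^2 comparisons. A simple solution is to create a lookup variable, iterate over your data, and check whether the NI Number exists in the lookup variable; if it does, mark the current item as a duplicate, otherwise add the key to the lookup variable. This loops n times.
  • @SalmanA: the first duplicate must also be marked as a duplicate, so the lookup also needs to store the index of the first instance of each value, e.g. {"NG111111A":0,"NM123405C":1,...}.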

Tags: javascript arrays object for-loop duplicates


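【Answer 1】:

You have not provided any benchmarking tool, so hopefully the following helps.

You can reduce the number of tests and do as few lookups as possible. Also, some of your tests are wrong, because you assign booleans but test against strings. Hopefully the comments are sufficient.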

// Reduce test data to minimum required, increase sample size
var cleanedData = [
  {"Id": "1","NI Number": "NG111111A"},
  {"Id": "2","NI Number": "NM123405C"}, // Duplicate
  {"Id": "3","NI Number": "NM123405D"},
  {"Id": "4","NI Number": "NM123405E"}, // Duplicate
  {"Id": "5","NI Number": "NM123405C"}, // Duplicate
  {"Id": "4","NI Number": "NM123405E"}, // Duplicate
  {"Id": "4","NI Number": "NM123405F"}, 
  {"Id": "4","NI Number": "NM123405E"}  // Duplicate
 ];

// Use var for better compatibility unless you really need let
for (var i = 0, iLen = cleanedData.length; i < iLen; i++) {
   
  // Store ref to current object for performance
  var obj = cleanedData[i];

  // If doesn't have Duplicate property, add and set to false
  // Only test if not already marked as a duplicate
  if (!('Duplicate' in obj)) {
    obj.Duplicate = false;

    // Reuse iLen, don't need to get length of array again
    for (var t = i + 1; t < iLen; t++) {

      // Store ref to test obj for performance
      var tObj = cleanedData[t];

      // Simplify assignment to both objects if duplicate found
      if (obj['NI Number'] === tObj['NI Number']) {
        obj.Duplicate = true;
        tObj.Duplicate = true;
      }
    }
  }
}

console.log(cleanedData)

Let me know if it helps performance. You could also use array methods such as forEach, but I think for loops are at least as fast, and much faster in some hosts.
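Since no benchmark was supplied, a rough timing harness along the following lines may help when comparing variants. It is only a sketch: `makeRecords` and `timeIt` are hypothetical helpers, the data is synthetic (every tenth record repeats an earlier NI Number), and `Date.now()` gives only millisecond resolution, so treat the numbers as indicative.

```javascript
// Hypothetical helper: builds `count` synthetic records in which every
// tenth record reuses the NI Number of the record nine places before it.
function makeRecords(count) {
  var records = [];
  for (var i = 0; i < count; i++) {
    var n = (i % 10 === 9) ? i - 9 : i;
    records.push({ "Id": String(i + 1), "NI Number": "NI" + n });
  }
  return records;
}

// Nested-loop marking, same logic as the loop shown above
function markDuplicatesNested(data) {
  for (var i = 0, iLen = data.length; i < iLen; i++) {
    var obj = data[i];
    if (!('Duplicate' in obj)) {
      obj.Duplicate = false;
      for (var t = i + 1; t < iLen; t++) {
        var tObj = data[t];
        if (obj['NI Number'] === tObj['NI Number']) {
          obj.Duplicate = true;
          tObj.Duplicate = true;
        }
      }
    }
  }
  return data;
}

// Hypothetical helper: runs fn(data) once and returns elapsed milliseconds
function timeIt(fn, data) {
  var start = Date.now();
  fn(data);
  return Date.now() - start;
}

var sample = makeRecords(2000);
var elapsedMs = timeIt(markDuplicatesNested, sample);
console.log('nested loop, 2000 records: ' + elapsedMs + ' ms');
```

Raising the record count towards 33,000 and passing a different marking function to `timeIt` should show how each variant scales. Build a fresh array for each run, because the functions mutate the records.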

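Implementing an index as Salman A suggested would look something like the following. It is concise (well, one line shorter than the first approach) and likely to be very fast because it loops only once, but it does rely on a lookup into the index for every record.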

var cleanedData = [
  {"Id": "1","NI Number": "NG111111A"},
  {"Id": "2","NI Number": "NM123405C"}, // Duplicate
  {"Id": "3","NI Number": "NM123405D"},
  {"Id": "4","NI Number": "NM123405E"}, // Duplicate
  {"Id": "5","NI Number": "NM123405C"}, // Duplicate
  {"Id": "4","NI Number": "NM123405E"}, // Duplicate
  {"Id": "4","NI Number": "NM123405F"}, 
  {"Id": "4","NI Number": "NM123405E"}  // Duplicate
 ];
 
// Store NI Number value and index first found
var index = {};

for (var i=0, iLen=cleanedData.length; i<iLen; i++) {
  var obj = cleanedData[i];
  var value = obj['NI Number'];

  // If obj doesn't have Duplicate property, add as false
  if (!('Duplicate' in obj)) obj.Duplicate = false;

  // If value is in index, mark duplicates
  if (value in index) {
    obj.Duplicate = true;
    cleanedData[index[value]].Duplicate = true;
    
   // Otherwise, add value to index
  } else {
    index[value] = i;
  }
}

console.log(cleanedData);
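The question mentions Map, so here is a sketch of the same single pass with a `Map` in place of the plain object. It assumes an ES2015 environment such as a current Node.js, and `markDuplicates` is a hypothetical name. One practical difference: `in` on a plain object also matches inherited names such as "constructor", whereas `Map.prototype.has` only sees keys that were explicitly set. Unlike the loop above, this version overwrites any existing Duplicate flag.

```javascript
// Hypothetical function name; sets each record's Duplicate flag in place.
function markDuplicates(records, key) {
  // Maps each key value to the index where it was first seen
  const firstSeen = new Map();

  for (let i = 0; i < records.length; i++) {
    const record = records[i];
    const value = record[key];

    if (firstSeen.has(value)) {
      // Repeat: flag this record and the first record with the same value
      record.Duplicate = true;
      records[firstSeen.get(value)].Duplicate = true;
    } else {
      // First occurrence: treat as unique until a repeat turns up
      record.Duplicate = false;
      firstSeen.set(value, i);
    }
  }
  return records;
}

const people = [
  {"Id": "1", "NI Number": "NG111111A"},
  {"Id": "2", "NI Number": "NM123405C"},
  {"Id": "3", "NI Number": "NM123405D"},
  {"Id": "4", "NI Number": "NM123405C"},
  {"Id": "5", "NI Number": "constructor"} // made-up value, safe with a Map
];

markDuplicates(people, "NI Number");
console.log(people.map(function (p) { return p.Id + ":" + p.Duplicate; }).join(" "));
// prints "1:false 2:true 3:false 4:true 5:false"
```

Whether a Map is measurably faster than the plain-object index for 33,000 records depends on the engine, so it is worth timing both.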

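【Comments】:

  • Your code improved performance by almost 20%, which is great, thank you. It now completes in just over 50 seconds. I am running it with nodejs on my PC. Do you think there are any other performance gains we could get? P.S. I did try to upvote your answer, but I don't have enough rep.
  • Wow... it now takes 1 second to complete. My problem is solved, thanks again!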