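【Question Title】: Array algorithm too slow
【Posted】: 2020-12-13 23:39:02
【Question】:

I have written an algorithm that finds all the lines belonging to each hexagon in a huge array of objects.

The array contains about 80,000 to 100,000 elements (line coordinates from start to end).

Each hexagon consists of 6 lines, so the array holds about 15,000 hexagons.

The structure of the objects (NOT sorted!!!) looks like this: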

const stamps = [
  { 
    vertices: [
      {x: 114.5116411118, y: 342.9815785601},
      {x: 115.6663416502, y: 344.9815785601}
    ]
  },
  {
    vertices: [
      {x: 115.6663416502, y: 340.9815785601},
      {x: 114.5116411118, y: 342.9815785601}
    ]
  },
  {
    vertices: [
      {x: 122.6663416502, y: 364.9815785601},
      {x: 147.9757427269, y: 314.9815785601},
    ]
  },
  {
    vertices: [
      {x: 117.9757427269, y: 340.9815785601},
      {x: 115.6663416502, y: 340.9815785601},
    ]
  },
  {
    vertices: [
      {x: 119.1304432653, y: 342.9815785601},
      {x: 117.9757427269, y: 340.9815785601},
    ]
  },
  {
    vertices: [
      {x: 117.9757427269, y: 344.9815785601},
      {x: 119.1304432653, y: 342.9815785601},
    ]
  },
  {
    vertices: [
      {x: 115.6663416502, y: 344.9815785601},
      {x: 117.9757427269, y: 344.9815785601},
    ]
  },
];

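To find the lines of each hexagon, my idea is that there must be 2 elements with the same coordinates. If that is the case, I jump to that element's index and repeat the process until I have all 6 lines of the hexagon.

It works this way, but it is really, really slow. For an array with 80,000 elements it takes about 3 minutes.

Algorithm: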

function findHexPolyPoints() {
  const hexCoordinates = [];
  let activeArrayPos = 0;
  let position = 0;
  while (1) {
    let foundPair = false;
    if (stamps.length < 6) break;
    for (let k = 0; k < stamps.length; ++k) {
      if (k === position) continue;
      if (stamps[position].vertices[0].x === stamps[k].vertices[1].x && stamps[position].vertices[0].y === stamps[k].vertices[1].y) {
        if (hexCoordinates[activeArrayPos]) {
          hexCoordinates[activeArrayPos].push(stamps[k].vertices[0].x, stamps[k].vertices[0].y);
        } else {
          hexCoordinates.push([stamps[position].vertices[0].x, stamps[position].vertices[0].y, stamps[k].vertices[0].x, stamps[k].vertices[0].y]);
        }
        foundPair = true;
      } else if (stamps[position].vertices[1].x === stamps[k].vertices[0].x && stamps[position].vertices[1].y === stamps[k].vertices[0].y) {
        if (hexCoordinates[activeArrayPos]) {
          hexCoordinates[activeArrayPos].push(stamps[k].vertices[1].x, stamps[k].vertices[1].y);
        } else {
          hexCoordinates.push([stamps[position].vertices[1].x, stamps[position].vertices[1].y, stamps[k].vertices[1].x, stamps[k].vertices[1].y]);
        }
        foundPair = true;
      }
      if (foundPair) {
        stamps.splice(position, 1);
        if (k > position) {
          position = k - 1;
        } else {
          position = k;
        }
        if (hexCoordinates[activeArrayPos].length < 12) break;
      }
      if (hexCoordinates[activeArrayPos] && hexCoordinates[activeArrayPos].length === 12) {
        if (k > position) stamps.splice(k - 1, 1);
        else stamps.splice(k, 1);
        activeArrayPos += 1;
        position = 0;
        break;
      }
      if (k === stamps.length - 1) {
        stamps.splice(position, 1);
        break;
      }
    }
  }
  sortHexagons(hexCoordinates);
}

Is there any way to speed up my algorithm? I have read that a simple for loop is still faster than some built-in JS array methods such as .map, .filter and similar.

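【Comments】:

So what is the query: a line? A coordinate? And what should the answer be?
What does sortHexagons do? Also, do all the hexagons have the same vertex coordinates?
@WillJenkins That's pretty much irrelevant. I just sort the array afterwards.
As I scrolled down: wow, I didn't want to read your code. The cyclomatic complexity is very high.
Again... do the hexagons have the same vertex coordinates, and if not, how many different sets are there?

Tags: javascript algorithm performance for-loop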

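【Solution 1】:

Algorithm

Since this algorithm is O(n^2) and n is large, the biggest improvement comes from rethinking the algorithm. If I understand the problem correctly, one way to rework it is:

1. Split the list into two lists, one for each point of the lines.
2. In both lists, store the index of the corresponding point in the other list.
3. Sort the lists by xy value, keeping the indices updated.
4. Loop through the points, using binary search to find whether a matching point is available in the other list.
5. Mark points as needed (instead of deleting them, so the indices don't get messed up), and skip marked points while looping.

This will probably give the biggest speedup, since the preparation (sorting, O(n log n)) can be done rather quickly.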
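A minimal JavaScript sketch of the sort-and-binary-search idea, under two assumptions the answer does not state: every line is oriented consistently (each line's end point is the start point of the next line in its hexagon, as in the sample data and as the question's own matching of vertices[0] against vertices[1] expects), and shared points are exactly equal. It simplifies the steps slightly: only the start points are sorted, and each line's end point is looked up among them. The names comparePoints, lowerBound and findHexagonsSorted are made up for this illustration.

```javascript
// Order points by x, then by y (exact comparison, no tolerance).
function comparePoints(a, b) {
  return a.x - b.x || a.y - b.y;
}

// Binary search: first index in the sorted array `arr` whose point is >= p.
function lowerBound(arr, p) {
  let lo = 0;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (comparePoints(arr[mid], p) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

function findHexagonsSorted(stamps) {
  const n = stamps.length;
  // Start point of every line, tagged with its stamp index, sorted by (x, y).
  const starts = stamps.map((s, i) => ({ x: s.vertices[0].x, y: s.vertices[0].y, i }));
  starts.sort(comparePoints);
  // Mark used stamps instead of splicing, so every index stays valid.
  const used = new Uint8Array(n);
  const hexagons = [];

  // Index of an unused stamp that starts at point p, or -1 if there is none.
  function nextStamp(p) {
    for (let k = lowerBound(starts, p); k < n && comparePoints(starts[k], p) === 0; k++) {
      if (!used[starts[k].i]) return starts[k].i;
    }
    return -1;
  }

  for (let first = 0; first < n; first++) {
    if (used[first]) continue;
    // Follow end point -> matching start point until 6 lines are chained.
    const chain = [first];
    let current = first;
    while (chain.length < 6) {
      current = nextStamp(stamps[current].vertices[1]);
      if (current === -1 || chain.includes(current)) break;
      chain.push(current);
    }
    // Accept only if 6 lines were found and the last one closes the loop.
    if (chain.length === 6 &&
        comparePoints(stamps[current].vertices[1], stamps[first].vertices[0]) === 0) {
      const coords = [];
      for (const idx of chain) {
        used[idx] = 1;
        coords.push(stamps[idx].vertices[0].x, stamps[idx].vertices[0].y);
      }
      hexagons.push(coords);
    }
  }
  return hexagons;
}
```

With the question's sample stamps, findHexagonsSorted(stamps) returns one array of 12 numbers; the stray line simply stays unused. Each lookup costs O(log n) instead of a scan over the whole array.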

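General

Some general performance improvements:

stamps[position].vertices[0/1].x/y is read on every iteration of the inner loop. Since it is not changed inside the loop, you can save a lot of memory accesses by reading it once at the start: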

while (1) {
    let foundPair = false;
    if (stamps.length < 6) break;
    // current positions vertices, only needs access once per loopthrough
    const pv0x = stamps[position].vertices[0].x;
    const pv0y = stamps[position].vertices[0].y;
    const pv1x = stamps[position].vertices[1].x;
    const pv1y = stamps[position].vertices[1].y;

Do the same for the values being tested against. This obviously won't make a big difference, but consistency is nice:

 for (let k = 0; k < stamps.length; ++k) {
   if (k === position) continue;
   const sv0x = stamps[k].vertices[0].x;
   const sv0y = stamps[k].vertices[0].y;
   const sv1x = stamps[k].vertices[1].x;
   const sv1y = stamps[k].vertices[1].y;

Resizing arrays is expensive and should be avoided inside loops. A simple replacement:

if (pv0x === sv1x && pv0y === sv1y) {
    if (arrayPos > 0) {
      hexCoordinates[activeArrayPos][arrayPos++] = sv0x;
      hexCoordinates[activeArrayPos][arrayPos++] = sv0y;
    } else {
      hexCoordinates[activeArrayPos][arrayPos++] = pv0x;
      hexCoordinates[activeArrayPos][arrayPos++] = pv0y;
      hexCoordinates[activeArrayPos][arrayPos++] = sv0x;
      hexCoordinates[activeArrayPos][arrayPos++] = sv0y;
    }
...
    if (arrayPos === 12) {
        stamps.splice(k > position ? k - 1 : k, 1);
        activeArrayPos += 1;
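        // Also at the start of the loop: hexCoordinates[0] = new Float64Array(12);
        // (Float64Array rather than Float32Array: 32-bit floats keep only about
        // 7 significant digits and would round coordinates like 114.5116411118.)
        hexCoordinates[activeArrayPos] = new Float64Array(12);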
        arrayPos = 0;

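An even more efficient approach is to create one large typed array at the start and access it with hexCoordinates[activeArrayPos * 12 + arrayPos], but that changes the output format.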
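A sketch of that single-typed-array layout, assuming an upper bound of floor(stamps.length / 6) hexagons (each hexagon uses six lines). allocateHexCoordinates, writeCorner and readHexagon are hypothetical helper names, not part of the answer. Float64Array is used so the coordinates keep full double precision.

```javascript
// Hypothetical helper: one flat buffer, 12 numbers (6 corners x/y) per hexagon.
// Upper bound on the hexagon count: every hexagon consumes 6 lines.
function allocateHexCoordinates(lineCount) {
  return new Float64Array(Math.floor(lineCount / 6) * 12);
}

// Write point (x, y) as corner `pointIndex` (0..5) of hexagon `hexIndex`,
// i.e. at offset hexIndex * 12 + pointIndex * 2.
function writeCorner(hexCoordinates, hexIndex, pointIndex, x, y) {
  const offset = hexIndex * 12 + pointIndex * 2;
  hexCoordinates[offset] = x;
  hexCoordinates[offset + 1] = y;
}

// Copy hexagon `hexIndex` back out as a plain array of 12 numbers.
function readHexagon(hexCoordinates, hexIndex) {
  const start = hexIndex * 12;
  return Array.from(hexCoordinates.subarray(start, start + 12));
}
```

In the search loop you would call writeCorner(hexCoordinates, activeArrayPos, corner, x, y) instead of pushing into a nested array; the caller still has to remember how many hexagons were actually filled, since the buffer is sized for the worst case.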

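For the same reason, use a lookup const removedLine = Array(stamps.length).fill(0) and set it to 1 at the index of a stamp that should be removed, instead of splicing it out (here you also need to keep a counter of how many stamps remain). Basically: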

let k = 0, idx = 0;
while (idx < nRemainingStamps) {
    if (removedLine[k++]) continue; // skip stamps marked as removed
    // ...rest of loop; the current stamp is stamps[k - 1]
    idx++;
}
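A runnable version of the mark-instead-of-splice idea, assuming the flags live in a Uint8Array (Array(n).fill(0), as in the answer, works too). createRemovalTracker is a hypothetical name. Each splice shifts every later element, so it costs O(n) per call; setting a flag is O(1), and the remaining counter takes over the role of stamps.length in the loop condition.

```javascript
// Hypothetical helper: track removed stamps by index instead of splicing them.
function createRemovalTracker(count) {
  const removedLine = new Uint8Array(count); // 0 = still present, 1 = removed
  let remaining = count;
  return {
    // Mark stamp i as removed; removing it twice does not change the counter.
    remove(i) {
      if (!removedLine[i]) {
        removedLine[i] = 1;
        remaining--;
      }
    },
    isRemoved(i) {
      return removedLine[i] === 1;
    },
    get remaining() {
      return remaining;
    },
    // Visit every index that has not been removed, in ascending order.
    forEachRemaining(callback) {
      for (let k = 0; k < removedLine.length; k++) {
        if (removedLine[k]) continue;
        callback(k);
      }
    },
  };
}
```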

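【Comments】:

【Solution 2】:

The following O(n) algorithm assumes that:

1. two different hexagons have no overlapping vertices;
2. no more than two identical vertices appear in the array (meaning there may be an orphan that belongs to no hexagon, but its coordinates must not equal any hexagon vertex);
3. there is no floating-point error in the coordinates (meaning two vertices that should be equal are exactly equal under ===);
4. 6 connected vertices are assumed to form a hexagon... (there is no barycenter computation or check to make sure it actually is a hexagon).

(If points 1. and 2. do not hold, the algorithm needs more work: it has to try all the possibilities (in the overt[x_y] array, see below) to avoid non-hexagons or overlapping coordinates. Depending on how many overlapping hexagons or orphans you expect, the complexity may then exceed O(n).)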
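If you are not sure your real data satisfies assumptions 1 and 2, a quick pre-check is to count how many line endpoints share each x_y key. checkVertexSharing is a hypothetical helper, not part of the answer; it uses a Map instead of a plain object and the same exact-equality key, so it still relies on assumption 3.

```javascript
// Hypothetical pre-check: count line endpoints per "x_y" key.
function checkVertexSharing(stamps) {
  const counts = new Map();
  for (const { vertices } of stamps) {
    for (const v of vertices) {
      const key = v.x + "_" + v.y;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  const overShared = []; // > 2 endpoints: violates assumption 1 or 2
  const orphans = [];    // exactly 1 endpoint: a dangling line end
  for (const [key, n] of counts) {
    if (n > 2) overShared.push(key);
    else if (n === 1) orphans.push(key);
  }
  return { overShared, orphans };
}
```

With the question's sample data, overShared is empty and orphans holds the two ends of the stray line. A non-empty overShared list means the simple walk in the processing step may follow the wrong neighbour.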

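The idea is to use a map (looking up an object value by key), whose lookups are considered O(1) on average.

To make working with vertex coordinates easier, we can concatenate x and y into a single string:

x:123.456, y:789.321

gives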

x_y = "123.456_789.321"
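A small sketch of the x_y key and of why assumption 3 matters. JavaScript converts a number to the shortest string that reads back as the same number, so exactly equal coordinates always produce the same key, while values that differ only by rounding error produce different keys. roundedVertexKey is a hypothetical variant, not part of the answer, that rounds to a fixed number of decimals first; it merges such near-duplicates, with the caveat that two values straddling a rounding boundary can still land in different keys.

```javascript
// Key used by the answer: x and y joined with "_".
function vertexKey(v) {
  return v.x + "_" + v.y;
}

// Hypothetical variant: round both coordinates to `decimals` places first,
// so tiny floating-point differences map to the same key.
function roundedVertexKey(v, decimals) {
  return v.x.toFixed(decimals) + "_" + v.y.toFixed(decimals);
}

console.log(vertexKey({ x: 123.456, y: 789.321 }));        // 123.456_789.321
console.log(vertexKey({ x: 0.1 + 0.2, y: 1 }));            // 0.30000000000000004_1
console.log(vertexKey({ x: 0.3, y: 1 }));                  // 0.3_1
console.log(roundedVertexKey({ x: 0.1 + 0.2, y: 1 }, 6));  // 0.300000_1.000000
```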

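Let's create 3 variables: avert = [], overt = {}, hexas = []

avert is the array of all vertices (one entry per line); avert[index] is an array(2) of the line's x_y vertex coordinates
overt is an object that, for each x_y coordinate, gives the array of indexes in avert (its size should not exceed 2, as stated above, and there is no >2 check)
hexas is an array of array(6), the list of hexagons found (each coordinate in x_y format)

avert and overt are built in the first forEach.

The next forEach processes all avert vertices [x_y1, x_y2]:

starting from the first vertex, try to find 6 points that form a hexagon
add each vertex to a hexagon array hexa, starting from the next one (after the first)
the coordinates are assumed not to be sorted, so make sure we don't go back to the previous vertex
skip vertices already used (by a hexagon)
make sure the last vertex found has the same coordinates as the origin (the first one)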

Initialization

let avert = [], overt = {}, hexas = [];

stamps.forEach(function(e, i){
    let xy1 = e['vertices'][0]['x']+'_'+e['vertices'][0]['y'];
    let xy2 = e['vertices'][1]['x']+'_'+e['vertices'][1]['y'];
    // overt[XY] (array) should have two elements at most (no overlapping),
    // one if orphan
    if ( ! overt[xy1]) overt[xy1] = [];
    overt[xy1].push( i );
    if ( ! overt[xy2]) overt[xy2] = [];
    overt[xy2].push( i );
    avert.push([ xy1, xy2 ]);
});

Processing

avert.forEach(function (e){
    let j,coord = e[0];   // first coords x_y
    let origin = coord;
    let hexa = [];
    let lastindex = -1;   // avoid going back!
    // try to find 5 connected vertices + origin
    for(j=0 ; j<6 ; j++) {
        let o = overt[coord];
        if ( o === undefined || o.length < 2 ) {
           break; // not found(already processed), or orphan!
        }
        let index = o[0] == lastindex ? o[1] : o[0];  // no turn back!
        lastindex = index;
        coord = avert[index][0] === coord ? avert[index][1] : avert[index][0];
        hexa.push(coord);
    }
    if (j >= 6) { // found all vertices
        // check that the last coord is the starting point
        if (hexa[5] === origin) { // got it
             hexas.push( hexa );
             hexa.forEach(function(h){ // delete used vertices
                delete overt[h];
             });
        }
    }
});

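All the hexagons should end up in hexas.

【Comments】:

Wow. I'm really amazed at how fast this algorithm is. With my approach it took about 3 minutes. With your algorithm (slightly modified) it takes only 2-3 seconds. Really, really great. Thanks!
A very interesting approach, too. It really improved my knowledge of writing (high-performance) algorithms.
Thanks for the kind comments, David.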

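【Solution 3】:

You can avoid the nested loop over all the data by using a hash map. Key the individual vertices by a hash (for example their JSON representation), and store the corresponding x, y coordinates together with a list of neighbouring objects.

Once you have that, it is easy to walk through the graph and identify the hexagons.

Runnable snippet using the example data you provided: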

function findHexPolyPoints(stamps) {
    // Create graph
    let map = new Map;
    for (let {vertices} of stamps) {
        // Get unique identifier for each vertex (its JSON notation)
        let keys = vertices.map(v => JSON.stringify(v));
        // Create "nodes" for each vertex, keyed by their key
        let nodes = keys.map(function (key, i) {
            let {x, y} = vertices[i];
            let node = map.get(key);
            if (!node) map.set(key, node = { key, x, y, neighbors: [] });
            return node;
        });
        // Link these two nodes in both directions
        nodes[0].neighbors.push(nodes[1]);
        nodes[1].neighbors.push(nodes[0]);
    }
    
    // Walk through the graph to detect and collect hexagons
    let hexagons = [];
    for (let [key, vertex] of map) {
        let hexagon = [];
        while (vertex && hexagon.push(vertex) < 6) {
            vertex = vertex.neighbors.find(neighbor => !hexagon.includes(neighbor));
        }
        // Remove collected nodes so they don't get visited a second time
        for (let {key} of hexagon) map.delete(key);
        // Verify that they form indeed a hexagon:
        if (vertex && vertex.neighbors.includes(hexagon[0])) {
            // Simplify the hexagon to only coordinates (12 coordinates)
            hexagons.push(hexagon.flatMap(({x, y}) => [x, y]));
        }
    }
    return hexagons;
}

// Demo. Just replace `stamps` with your actual data.
const stamps = [{vertices: [{x: 114.5116411118, y: 342.9815785601},{x: 115.6663416502, y: 344.9815785601}]},{vertices: [{x: 115.6663416502, y: 340.9815785601},{x: 114.5116411118, y: 342.9815785601}]},{vertices: [{x: 122.6663416502, y: 364.9815785601},{x: 147.9757427269, y: 314.9815785601},]},{vertices: [{x: 117.9757427269, y: 340.9815785601},{x: 115.6663416502, y: 340.9815785601},]},{vertices: [{x: 119.1304432653, y: 342.9815785601},{x: 117.9757427269, y: 340.9815785601},]},{vertices: [{x: 117.9757427269, y: 344.9815785601},{x: 119.1304432653, y: 342.9815785601},]},{vertices: [{x: 115.6663416502, y: 344.9815785601},{x: 117.9757427269, y: 344.9815785601},]},];
let hexagons = findHexPolyPoints(stamps);
console.log(hexagons);

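It is true that plain old for loops are somewhat faster than .map, .forEach, .reduce, .find, etc., but I have used them here anyway, because the main speedup really comes from using a hash map.

【Comments】: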