A faster way to look for duplicate vertices in OBJ and Collada files: answers

【Question title】: A faster way to look for duplicate vertices in OBJ and Collada files
【Posted】: 2021-04-05 22:31:18
【Question】:

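I have read this post about loading texture coordinates correctly, and everything works, but the problem is speed.

Basically, the idea is to look for a previously processed vertex that has exactly the same attribute values as the vertex currently being processed. If such a vertex exists, use its index in your indexBuffer and continue. It is a very simple concept and implementation, and this is how I did it: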

class Vertex
{
 //The index values read from a file, either from the f attribute in obj files or the <p> attribute for meshes in collada files
 private final int
 vertexIndex,
 texCoordIndex,
 normalIndex,
 colorIndex;

 //The actual values for each attribute used in the mesh
 private final Vector3f
 vertex=new Vector3f(),
 normal=new Vector3f(),
 color=new Vector3f();
 private final Vector2f texCoord=new Vector2f();

 //Constructor added so that the final index fields are initialized
 Vertex(int vertexIndex,int texCoordIndex,int normalIndex,int colorIndex)
 {
  this.vertexIndex=vertexIndex;
  this.texCoordIndex=texCoordIndex;
  this.normalIndex=normalIndex;
  this.colorIndex=colorIndex;
 }

 @Override
 public boolean equals(Object obj)//The key method used for finding duplicate vertices in the list
 {
  if(!(obj instanceof Vertex)){return false;}//Also covers null
  Vertex v=(Vertex)obj;

  //Check if every index of both vertices is the same
  return    this.vertexIndex==v.vertexIndex
         && this.texCoordIndex==v.texCoordIndex
         && this.normalIndex==v.normalIndex
         && this.colorIndex==v.colorIndex;
 }
}

Finally, we have an array list:

ArrayList<Vertex> vertices=new ArrayList<>();

For each vertex read from the file, the idea is simple:

Vertex newVertex=readFromFile();

int prev=vertices.indexOf(newVertex);
if(prev!=-1)//We have found an identical vertex from before, so use that
{
 indexBuffer.add(prev); //the indexBuffer will use the vertex at that position
}
else
{
 vertices.add(newVertex); //Add  new vertex
 indexBuffer.add(vertices.size()-1);//New Vertex Index is simply the index of last element in the list
}

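While this produces the correct results, the problem is performance: for every nth vertex added we have to do a "linear search!!!" over the n-1 previously added vertices to find a duplicate, which is terrible. It took me 7 seconds to load the Stanford dragon model, but if I drop the lookup entirely and just use the duplicates it takes only 1.5 seconds.

One optimization I thought of, since I am using Java, was to leverage the power of parallel streams in Java 14 to look for such duplicates.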

Optional<Vertex> possibleDuplicate=vertices.stream()
                                           .parallel()
                                           .filter(prev->prev.equals(newVertex))
                                           .findFirst();

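But that was an even worse idea, since it now takes 12 seconds to load. One possible reason is that spawning 100 threads for every newVertex to be processed is a huge overhead.

The author of that post mentions that he uses binary search on sorted vertices to find duplicates faster, but I have some questions about that:

When a vertex has multiple attributes, which attribute do I sort the vertices by?

One way to binary search an ArrayList is to use the built-in collections framework, but how do I tell the comparator whether one vertex is smaller or greater than another?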
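As a point of reference for the two questions above, one common way to order multi-attribute keys is lexicographically: compare the first index, and fall through to the next one only on a tie. The following is a minimal sketch under that assumption; `VertexKey`, `ORDER`, and `SortedLookupDemo` are hypothetical names that do not come from the post being discussed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical key holding only the four indices read from the file
final class VertexKey {
    final int vertexIndex, texCoordIndex, normalIndex, colorIndex;

    VertexKey(int vertexIndex, int texCoordIndex, int normalIndex, int colorIndex) {
        this.vertexIndex = vertexIndex;
        this.texCoordIndex = texCoordIndex;
        this.normalIndex = normalIndex;
        this.colorIndex = colorIndex;
    }

    // Lexicographic order: position index first, later indices only break ties
    static final Comparator<VertexKey> ORDER =
            Comparator.comparingInt((VertexKey k) -> k.vertexIndex)
                      .thenComparingInt(k -> k.texCoordIndex)
                      .thenComparingInt(k -> k.normalIndex)
                      .thenComparingInt(k -> k.colorIndex);
}

final class SortedLookupDemo {
    // Returns the position of key in the sorted list, or a negative value if absent
    static int find(List<VertexKey> sorted, VertexKey key) {
        return Collections.binarySearch(sorted, key, VertexKey.ORDER);
    }

    public static void main(String[] args) {
        List<VertexKey> keys = new ArrayList<>();
        keys.add(new VertexKey(2, 1, 1, 0));
        keys.add(new VertexKey(1, 3, 2, 0));
        keys.add(new VertexKey(1, 2, 3, 0));
        keys.sort(VertexKey.ORDER); // binarySearch requires the same order

        System.out.println(find(keys, new VertexKey(1, 3, 2, 0)) >= 0); // found
        System.out.println(find(keys, new VertexKey(9, 9, 9, 9)) >= 0); // not found
    }
}
```

Note that keeping an ArrayList sorted while vertices are still being added costs a shift of up to n elements per insertion, so this only pays off if the list is sorted once up front or a sorted structure such as TreeMap is used with the same comparator.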

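For large models it becomes so slow that I have had to give the user the option of enabling duplicate elimination with a flag.

Is there a better way?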

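【Question discussion】:

Tags: java duplicates vertex

【Solution 1】:

Searching the list of vertices is going to be very slow. Each indexOf call is a linear scan, so loading n vertices costs O(n²) comparisons overall, which is not pretty.

Instead, use some sort of hashing mechanism to look up existing vertices. Here is a snippet of code I implemented to build a model from an OBJ file that contains duplicate vertices: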

public static class IndexedBuilder extends Builder {
    private final List<Integer> index = new ArrayList<>();
    private final Map<Vertex, Integer> map = new HashMap<>();

    @Override
    public IndexedBuilder add(Vertex vertex) {
        // Look up the index of a previously added identical vertex
        final Integer prev = map.get(vertex);

        if(prev == null) {
            // New vertex: its index is the number of unique vertices so far
            // (index.size() would also count repeated entries)
            final int next = map.size();
            index.add(next);
            map.put(vertex, next);
            super.add(vertex);
        }
        else {
            // Existing vertex
            index.add(prev);
        }

        return this;
    }
}

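The map is essentially a table of vertices with their associated indices.

The only work performed for each vertex is computing its hash code (plus an equals check against any entry that lands in the same bucket), which will be much faster than searching, and considerably simpler.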
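To make the idea concrete without the surrounding Builder class, here is a minimal self-contained sketch. It assumes each vertex is identified by three integer indices and uses a `List<Integer>` as the map key, because lists already provide value-based equals and hashCode; `DedupSketch` and its fields are hypothetical names, not part of the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DedupSketch {
    // De-duplicated vertices, in the order they were first seen
    final List<List<Integer>> unique = new ArrayList<>();
    // One entry per input vertex, pointing into the unique list
    final List<Integer> indexBuffer = new ArrayList<>();
    // Maps a vertex to its position in the unique list
    private final Map<List<Integer>, Integer> seen = new HashMap<>();

    void add(int position, int texCoord, int normal) {
        List<Integer> key = Arrays.asList(position, texCoord, normal);
        Integer prev = seen.get(key); // expected O(1) instead of a linear scan
        if (prev == null) {
            prev = unique.size();     // index the new vertex will occupy
            seen.put(key, prev);
            unique.add(key);
        }
        indexBuffer.add(prev);
    }

    public static void main(String[] args) {
        DedupSketch d = new DedupSketch();
        d.add(1, 1, 1);
        d.add(2, 2, 1);
        d.add(1, 1, 1); // duplicate of the first vertex
        d.add(3, 3, 1);
        System.out.println(d.indexBuffer);   // [0, 1, 0, 2]
        System.out.println(d.unique.size()); // 3
    }
}
```

Boxing three integers per vertex is not the cheapest possible key, so a dedicated key class may be preferable in a real loader; the sketch only illustrates the lookup pattern.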

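Edit: obviously this requires a decent hash code implementation (and an equals method consistent with it) on your vertex class, something like this: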

class Point {
    public final float x, y, z;

    @Override
    public int hashCode() {
        return Objects.hash(x, y, z);
    }
}

// Similar for normals & texture coordinates

class Vertex {
    private final Point point;
    private final Vector normal;
    private final TextureCoordinate coords;

    @Override
    public int hashCode() {
        return Objects.hash(point, normal, coords);
    }
}
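A HashMap only treats two keys as the same when their hash codes match and equals returns true, so each of these classes also needs an equals method that compares the same fields. The sketch below shows the pairing for a point type; `Point3` and `EqualsHashDemo` are hypothetical names chosen to avoid clashing with the classes above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class Point3 {
    final float x, y, z;

    Point3(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Point3)) return false;
        Point3 p = (Point3) obj;
        // Float.compare agrees with the boxed Float hashing used by Objects.hash
        return Float.compare(x, p.x) == 0
            && Float.compare(y, p.y) == 0
            && Float.compare(z, p.z) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, z);
    }
}

final class EqualsHashDemo {
    public static void main(String[] args) {
        Map<Point3, Integer> map = new HashMap<>();
        map.put(new Point3(1f, 2f, 3f), 0);

        // A distinct object with the same coordinates finds the stored index
        System.out.println(map.get(new Point3(1f, 2f, 3f))); // 0
        // Same values in a different order is a different key
        System.out.println(map.get(new Point3(3f, 2f, 1f))); // null
    }
}
```

Without the equals override, the first lookup would return null as well, because the default implementation compares object identity.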

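【Discussion】:

Good suggestion, but as you say, the hash function itself could be another problem. How do I take 4 integer values and compute a unique hash that differs not only with the values but also with the order in which they are supplied? For example, 1,2,3,4 and 1,3,2,4 must produce different hashes, because the first vertex references position 1 and normal 2 while the second references position 1 and normal 3, considering only the first two attributes. Do you know of a good algorithm?
Assuming your vertex implementation is composed of a point, a normal, a texture coordinate, and so on, each of those can compute its hash code using docs.oracle.com/javase/8/docs/api/java/util/… and the vertex does the same over each of its "components". This built-in function delegates to a method that implicitly takes "order" into account. I added a small edit to the answer to illustrate.
It would be faster if we just used the integer index values of each attribute as read from the file, but I will try both and let you know.
The reason the code I posted computes the hash from the actual values (rather than using the index values from the file) is that in almost every OBJ model I have tried, the index values are the source of the duplication! Often they are just an incrementing value and frankly could be ignored. But I would be interested to hear what you conclude from your experiments.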
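The order question raised in the first comment can be checked directly. The sketch below assumes the index-based key the asker proposes and hashes it with `Objects.hash`, which combines its arguments in sequence so that swapping two values changes the result for this example; `IndexKey` and `IndexKeyDemo` are hypothetical names. A hash does not have to be unique for the map to be correct, since equals settles any collision.

```java
import java.util.Objects;

// Hypothetical key built from the four indices read from the file
final class IndexKey {
    final int vertexIndex, texCoordIndex, normalIndex, colorIndex;

    IndexKey(int vertexIndex, int texCoordIndex, int normalIndex, int colorIndex) {
        this.vertexIndex = vertexIndex;
        this.texCoordIndex = texCoordIndex;
        this.normalIndex = normalIndex;
        this.colorIndex = colorIndex;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof IndexKey)) return false;
        IndexKey k = (IndexKey) obj;
        return vertexIndex == k.vertexIndex
            && texCoordIndex == k.texCoordIndex
            && normalIndex == k.normalIndex
            && colorIndex == k.colorIndex;
    }

    @Override
    public int hashCode() {
        // Each argument is folded in turn, so argument order affects the result
        return Objects.hash(vertexIndex, texCoordIndex, normalIndex, colorIndex);
    }
}

final class IndexKeyDemo {
    public static void main(String[] args) {
        IndexKey a = new IndexKey(1, 2, 3, 4);
        IndexKey b = new IndexKey(1, 3, 2, 4);
        System.out.println(a.hashCode() == b.hashCode());       // false
        System.out.println(a.equals(b));                        // false
        System.out.println(a.equals(new IndexKey(1, 2, 3, 4))); // true
    }
}
```

As the last comment points out, keys built from file indices only merge vertices whose indices match, so they will not catch duplicates that are stored under different indices but carry identical values.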