最接近的点对 O(nlogn) 算法 - c++ 实现中的一些数据问题答案

【问题标题】：Closest pair of points O(nlogn) algorithm - problem with some data in c++ implementation最接近的点对 O(nlogn) 算法 - c++ 实现中的一些数据问题
【发布时间】：2019-02-12 17:26:58
【问题描述】：

我对寻找最近点的分而治之算法有疑问。我在此页面上检查了 C++ 实现https://www.geeksforgeeks.org/closest-pair-of-points-onlogn-implementation/ 但是这段代码有问题。在大多数情况下它工作正常，但对于某些数据，此实现返回其他结果，而不是蛮力方法。例如，我们取十个点 (x, y)：

(795 981)
(1905 4574)
(8891 665)
(6370 1396)
(93 8603)
(302 7099)
(326 5318)
(4493 3977)
(429 8687)
(9198 1558)

对于此数据，O(n log n) 算法返回 944.298 而不是暴力给出的 346.341。为什么会这样？

这正是 geeksforgeeks 用我的示例数据实现的：

#include <iostream>
#include <float.h>
#include <stdlib.h>
#include <math.h>
using namespace std;

struct Point
{
    int x, y;
};

int compareX(const void* a, const void* b)
{
    Point *p1 = (Point *)a,  *p2 = (Point *)b;
    return (p1->x - p2->x);
}

int compareY(const void* a, const void* b)
{
    Point *p1 = (Point *)a,   *p2 = (Point *)b;
    return (p1->y - p2->y);
}


float dist(Point p1, Point p2)
{
    return sqrt( (p1.x - p2.x)*(p1.x - p2.x) +
                 (p1.y - p2.y)*(p1.y - p2.y)
    );
}

float bruteForce(Point P[], int n)
{
    float min = FLT_MAX;
    for (int i = 0; i < n; ++i)
        for (int j = i+1; j < n; ++j)
            if (dist(P[i], P[j]) < min)
                min = dist(P[i], P[j]);
    return min;
}

float min(float x, float y)
{
    return (x < y)? x : y;
}


float stripClosest(Point strip[], int size, float d)
{
    float min = d;  // Initialize the minimum distance as d 

    for (int i = 0; i < size; ++i)
        for (int j = i+1; j < size && (strip[j].y - strip[i].y) < min; ++j)
            if (dist(strip[i],strip[j]) < min)
                min = dist(strip[i], strip[j]);

    return min;
}

float closestUtil(Point Px[], Point Py[], int n)
{
    // If there are 2 or 3 points, then use brute force 
    if (n <= 3)
        return bruteForce(Px, n);

    // Find the middle point 
    int mid = n/2;
    Point midPoint = Px[mid];


    Point Pyl[mid+1];   // y sorted points on left of vertical line 
    Point Pyr[n-mid-1];  // y sorted points on right of vertical line 
    int li = 0, ri = 0;  // indexes of left and right subarrays 
    for (int i = 0; i < n; i++)
    {
        if (Py[i].x <= midPoint.x)
            Pyl[li++] = Py[i];
        else
            Pyr[ri++] = Py[i];
    }

    float dl = closestUtil(Px, Pyl, mid);
    float dr = closestUtil(Px + mid, Pyr, n-mid);

    // Find the smaller of two distances 
    float d = min(dl, dr);

    Point strip[n];
    int j = 0;
    for (int i = 0; i < n; i++)
        if (abs(Py[i].x - midPoint.x) < d)
            strip[j] = Py[i], j++;

    return min(d, stripClosest(strip, j, d) );
}

float closest(Point P[], int n)
{
    Point Px[n];
    Point Py[n];
    for (int i = 0; i < n; i++)
    {
        Px[i] = P[i];
        Py[i] = P[i];
    }

    qsort(Px, n, sizeof(Point), compareX);
    qsort(Py, n, sizeof(Point), compareY);

    // Use recursive function closestUtil() to find the smallest distance 
    return closestUtil(Px, Py, n);
}

// Driver program to test above functions 
int main()
{
    Point P[] = {{795, 981}, {1905, 4574}, {8891, 665}, {6370, 1396}, {93, 8603}, {302, 7099},
                 {326, 5318}, {4493, 3977}, {429, 8687}, {9198, 1558}};
    int n = sizeof(P) / sizeof(P[0]);
    cout << closest(P, n) << std::endl;
    cout << bruteForce(P, n) << std::endl;
    return 0;
}

有人知道这里出了什么问题吗？我已经尝试修复它几个小时，但我真的不明白为什么会出现这个问题。

【问题讨论】：

无法编译。看起来您使用 VLA（可变长度数组是非标准扩展）
哪两个点最接近？我想通过数学来验证。
它不需要-std=c++11，它需要一个为C++语言（如gcc）提供非标准VLA扩展的编译器来处理，例如Point Pyl[mid+1]; 和 Point Px[n];（小心你在互联网上找到的代码，尤其是从像极客这样的地方......大部分 C++ 只是将 C 重新命名为 C++）
他们的实现包含一个或两个错误，并且它不会处理具有相同 x 坐标的多个点的输入。它也是用极其糟糕的 C++ 编写的。 How to debug small programs

标签： c++ points closest

【解决方案1】：

由于Pyl和Pyr的大小分别为mid+1和n-mid-1，下面两行

float dl = closestUtil(Px,     Pyl, mid  );
float dr = closestUtil(Px+mid, Pyr, n-mid);

应该改写如下：

float dl = closestUtil(Px,       Pyl, mid+1  );
float dr = closestUtil(Px+mid+1, Pyr, n-mid-1);

另外，正如this site的源代码中所注释的，在上面的代码中假设所有的x坐标都是不同的。例如，如果所有 x 坐标都相同，则 li 从 0 递增到 n 并且在 Pyl[li++] = Py[i]; 发生意外行为时 li >= mid+1。

顺便说一句，C++ 规范中根本不存在 VLA（可变 Vength 数组）。由于数组Px、Py、Pyl 和Pyr 在堆栈上创建并具有自动存储持续时间，因此它们的大小应在编译时确定。但是包括 GNU 编译器在内的一些 C++ 编译器支持 VLA 作为编译器扩展，并允许声明具有动态长度的 C 样式数组。（如何为 VLA 分配内存是特定于实现的。）但是 C++ 通过std::vector 提供了动态数组功能，这可能使我们的代码更具可读性、可移植性和健壮性。

【讨论】：

感谢您的回答:)