Object Detection using SURF: Answers

【Question title】: Object Detection using Surf
【Posted】: 2013-07-10 12:16:04
【Question】:

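I am trying to detect vehicles in a video. Eventually this will run in a real-time application, but for now, to understand the approach better, I am working on a recorded video. The code is below: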

void surf_detection(Mat img_1,Mat img_2); /** @function main */

int main( int argc, char** argv )
{

 int i;
 int key;

 CvCapture* capture = cvCaptureFromAVI("try2.avi");// Read the video file

 if (!capture){

     std::cout <<" Error in capture video file";
     return -1;
 }

 Mat img_template = imread("images.jpg"); // read template image

int numFrames = (int) cvGetCaptureProperty(capture,  CV_CAP_PROP_FRAME_COUNT);



IplImage* img = 0; 

for(i=0;i<numFrames;i++){
  cvGrabFrame(capture);          // capture a frame
  img=cvRetrieveFrame(capture);  // retrieve the captured frame


  surf_detection (img_template,img);

  cvShowImage("mainWin", img); 
  key=cvWaitKey(20);           

}

 return 0;
 }

void surf_detection(Mat img_1,Mat img_2)
{ 

if( !img_1.data || !img_2.data )
{ 
    std::cout<< " --(!) Error reading images " << std::endl; 

}




//-- Step 1: Detect the keypoints using SURF Detector
int minHessian = 400;
SurfFeatureDetector detector( minHessian );
std::vector<KeyPoint> keypoints_1, keypoints_2;

std::vector< DMatch > good_matches;

do{ 

detector.detect( img_1, keypoints_1 );
detector.detect( img_2, keypoints_2 );

//-- Draw keypoints

Mat img_keypoints_1; Mat img_keypoints_2;
drawKeypoints( img_1, keypoints_1, img_keypoints_1, Scalar::all(-1), DrawMatchesFlags::DEFAULT );
drawKeypoints( img_2, keypoints_2, img_keypoints_2, Scalar::all(-1), DrawMatchesFlags::DEFAULT );

//-- Step 2: Calculate descriptors (feature vectors)
SurfDescriptorExtractor extractor;
Mat descriptors_1, descriptors_2;
extractor.compute( img_1, keypoints_1, descriptors_1 );
extractor.compute( img_2, keypoints_2, descriptors_2 );


//-- Step 3: Matching descriptor vectors using FLANN matcher
FlannBasedMatcher matcher;
std::vector< DMatch > matches;
matcher.match( descriptors_1, descriptors_2, matches );
double max_dist = 0; 
double min_dist = 100;

//-- Quick calculation of max and min distances between keypoints
for( int i = 0; i < descriptors_1.rows; i++ )
{ 
    double dist = matches[i].distance;
if( dist < min_dist )
    min_dist = dist;
if( dist > max_dist ) 
    max_dist = dist;
}


//-- Draw only "good" matches (i.e. whose distance is less than 2*min_dist )


for( int i = 0; i < descriptors_1.rows; i++ )
{ 
    if( matches[i].distance < 2*min_dist )
        { 
                good_matches.push_back( matches[i]);
        }
}

}while(good_matches.size()<100);

//-- Draw only "good" matches
Mat img_matches;
drawMatches( img_1, keypoints_1, img_2, keypoints_2,good_matches, img_matches, Scalar::all(-1), Scalar::all(-1),
vector<char>(), DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS );

//-- Localize the object
std::vector<Point2f> obj;
std::vector<Point2f> scene;
for( int i = 0; i < good_matches.size(); i++ )
{
//-- Get the keypoints from the good matches
obj.push_back( keypoints_1[ good_matches[i].queryIdx ].pt );
scene.push_back( keypoints_2[ good_matches[i].trainIdx ].pt );
}


Mat H = findHomography( obj, scene, CV_RANSAC );


//-- Get the corners from the image_1 ( the object to be "detected" )
std::vector<Point2f> obj_corners(4);
obj_corners[0] = Point2f(0,0); 
obj_corners[1] = Point2f( img_1.cols, 0 );
obj_corners[2] = Point2f( img_1.cols, img_1.rows ); 
obj_corners[3] = Point2f( 0, img_1.rows );
std::vector<Point2f> scene_corners(4);

perspectiveTransform( obj_corners, scene_corners, H);

//-- Draw lines between the corners (the mapped object in the scene - image_2 )
line( img_matches, scene_corners[0] , scene_corners[1] , Scalar(0, 255, 0), 4 );
line( img_matches, scene_corners[1], scene_corners[2], Scalar( 0, 255, 0), 4 );
line( img_matches, scene_corners[2] , scene_corners[3], Scalar( 0, 255, 0), 4 );
line( img_matches, scene_corners[3] , scene_corners[0], Scalar( 0, 255, 0), 4 );
imshow( "Good Matches & Object detection", img_matches );

}

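I get the following output

and from std::cout

the value of H:

My question is why no rectangle is drawn on the detected object:

I am doing this on simple videos and images, but when I run it on a still camera, it could be difficult without that rectangle.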

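【Comments】:

As mentioned in an answer below, this question is a duplicate of stackoverflow.com/questions/11049081/…
@masad I don't think that answer works for me, could you take a look?
Check the homography matrix H and post the result here. With the new OpenCV interface it can be printed with cout.
@mrgloom I have updated the question with the result for Mat H.
You can't apply SIFT features to "different" objects such as those in the picture i.stack.imgur.com/RfrYH.png, but you could try a constellation model based on SIFT features (which would involve machine learning).

Tags: opencv image-processing computer-vision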

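【Answer 1】:

First of all, in the image you show, no rectangle is drawn at all. Can you draw a rectangle in the middle of the image?

Then, looking at the following code: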

int x1 , x2 , y1 , y2 ;
x1 = scene_corners[0].x + Point2f( img_1.cols, 0).x ; 
y1 = scene_corners[0].y + Point2f( img_1.cols, 0).y ; 
x2 = scene_corners[0].x + Point2f( img_1.cols, 0).x + in_box.width ; 
y2 = scene_corners[0].y + Point2f( img_1.cols, 0).y + in_box.height ;

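I don't understand why you add in_box.width and in_box.height to each corner (where are they defined?). You should use scene_corners[2] instead. Still, the commented-out lines should print a rectangle somewhere.

Since you asked for more details, let's see what happens in your code.

First, how do you get to `perspectiveTransform()`?

1. You detect feature points with detector.detect. This gives you the points of interest in both images.
2. You describe these features with extractor.compute. This gives you a way to compare points of interest. Comparing the descriptors of two features answers the question: how similar are these points?*
3. In practice, you compare each feature of the first image with (more or less) all the features of the second image and keep the best match for each feature. At this point you know which pairs of features look most similar.
4. You keep only the good_matches. It can happen that, for a given feature, the most similar one in the other image is actually completely different (it is still the most similar because nothing better is available). This is a first filter to remove wrong matches.
5. You find the homography that corresponds to the matches you kept. That is, you try to find how the points of the first image are projected into the second image. The homography matrix you obtain then lets you project any point of the first image onto its corresponding position in the second image.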
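The filtering step (keeping only the good_matches) can be sketched without OpenCV as follows. This is a minimal illustration, not the asker's code: `Match` is a hypothetical stand-in for `cv::DMatch` that carries only the fields used here, and `filterGoodMatches` with its `ratio` parameter is a name introduced for this example. It applies the same "distance less than 2*min_dist" rule as the question's code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for cv::DMatch: only the fields used in this sketch.
struct Match {
    int queryIdx;    // index of the keypoint in the first (object) image
    int trainIdx;    // index of the keypoint in the second (scene) image
    float distance;  // descriptor distance; smaller means more similar
};

// Keep only the matches whose distance is below ratio * (smallest distance).
// Returns an empty vector when there are no matches at all.
// Note: if the smallest distance is exactly 0, nothing passes the strict test.
std::vector<Match> filterGoodMatches(const std::vector<Match>& matches,
                                     float ratio = 2.0f) {
    std::vector<Match> good;
    if (matches.empty()) return good;

    // First pass: find the smallest descriptor distance.
    float minDist = matches.front().distance;
    for (const Match& m : matches) minDist = std::min(minDist, m.distance);

    // Second pass: keep the matches that are close enough to the best one.
    for (const Match& m : matches) {
        if (m.distance < ratio * minDist) good.push_back(m);
    }
    return good;
}
```

Because the threshold is relative to the best match, it always keeps something plausible-looking even when every match is wrong, which is why it is only a first filter and the homography still needs to be checked afterwards.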

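Second, what do you do with it?

Now it gets interesting. You have a homography matrix that lets you project any point of the first image onto its corresponding position in the second image. So you can decide to draw a rectangle around the object (that is obj_corners) and project it into the second image (perspectiveTransform( obj_corners, scene_corners, H);). The result is in scene_corners.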
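To make the projection concrete: for a 2D point (x, y) and a 3x3 homography H, the mapped point is x' = (h11 x + h12 y + h13) / (h31 x + h32 y + h33) and y' = (h21 x + h22 y + h23) / (h31 x + h32 y + h33). The sketch below performs that computation for a single point. `Pt`, `Homography` and `projectPoint` are hypothetical names used only for this illustration (`Pt` stands in for `cv::Point2f`); in the actual program `perspectiveTransform()` does this for every corner.

```cpp
#include <array>

// Hypothetical stand-in for cv::Point2f.
struct Pt { float x; float y; };

// Row-major 3x3 homography: H[row][col].
using Homography = std::array<std::array<double, 3>, 3>;

// Map p through H, including the perspective division by w.
// If w is zero the point maps to infinity; the caller has to handle that case.
Pt projectPoint(const Homography& H, Pt p) {
    const double xh = H[0][0] * p.x + H[0][1] * p.y + H[0][2];
    const double yh = H[1][0] * p.x + H[1][1] * p.y + H[1][2];
    const double w  = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
    return Pt{static_cast<float>(xh / w), static_cast<float>(yh / w)};
}
```

Applying `projectPoint` to the four entries of obj_corners would give the four entries of scene_corners, expressed in the coordinate system of the second image alone.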

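Now you want to draw a rectangle using scene_corners. But there is one more thing: drawMatches() puts your two images side by side in img_matches, whereas the projection (the homography matrix) was computed on the separate images. This means each scene_corner has to be translated accordingly. Since the scene image is drawn to the right of the object image, you have to add the width of the object image to each scene_corner to shift it to the right.

This is why you add 0 to y1 and y2: you don't have to translate them vertically. For x1 and x2, however, you have to add img_1.cols.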

//-- Draw lines between the corners (the mapped object in the scene - image_2 )
line( img_matches, scene_corners[0] + Point2f( img_1.cols, 0), scene_corners[1] + Point2f( img_1.cols, 0), Scalar(0, 255, 0), 4 );
line( img_matches, scene_corners[1] + Point2f( img_1.cols, 0), scene_corners[2] + Point2f( img_1.cols, 0), Scalar( 0, 255, 0), 4 );
line( img_matches, scene_corners[2] + Point2f( img_1.cols, 0), scene_corners[3] + Point2f( img_1.cols, 0), Scalar( 0, 255, 0), 4 );
line( img_matches, scene_corners[3] + Point2f( img_1.cols, 0), scene_corners[0] + Point2f( img_1.cols, 0), Scalar( 0, 255, 0), 4 );

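So I suggest you uncomment these lines and see whether a rectangle is drawn. If not, try hard-coded values (e.g. Point2f(0, 0) and Point2f(100, 100)) until your rectangle is drawn successfully. Maybe your problem comes from mixing cvPoint and Point2f. Also try Scalar(0, 255, 0, 255)...

Hope this helps.

*It is important to understand that two points can look exactly the same without actually corresponding to the same point. Think of a truly repetitive pattern, such as the corners of the windows of a building. All the windows look alike, so the corners of two different windows can look very similar even though matching them is clearly wrong.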

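【Comments】:

Thanks for such a good explanation, it is what I was waiting for, +1. The lines you asked me to uncomment are not commented out in my implementation. I want to draw a square around the detected object, as in the sample picture I showed above.
Did you manage to draw a hard-coded rectangle as I suggested? Your image shows no rectangle at all, which suggests that the drawing itself is not working. Try drawing a rectangle between the points (0, 0) and (100, 100) and let me know whether that works.
Yes, when I use the point (0,0) or other values, it draws a point somewhere on the image. I am detecting vehicles in a video; when it detects a vehicle I want it to be marked with a square or rectangle while the video keeps playing. I have updated my code.
Can you tell me what the values of scene_corners are after perspectiveTransform()?
What does he want to write in the output video? I don't understand this point.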

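【Answer 2】:

You performed the following steps:

1. Matched keypoints between the 2 images.
2. Computed a homography (projection matrix), assuming the matches are correct.
3. Projected the corners of the original image using the homography to draw a quadrilateral (which you call a rectangle), i.e. the perspective transform.

The problem is that when step 1 fails, you get a wrong homography (a wrong matrix) in step 2, and when you project the corners in step 3 they may fall outside the image, so you don't see the lines.

What you really want is a way to know whether the homography you computed has the right form. For that, see the answer here: How to check if obtained homography matrix is good? Use it to test whether your homography is correct. If it is not, you know the matching caused the failure. If it is, you can draw the rectangle and you will see it, although it may not be very accurate if the keypoint matches are imprecise.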
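As a rough illustration of such a test, the sketch below checks two things about the projected corners: that they all lie inside the scene image, and that they form a convex quadrilateral. These are simple heuristics assumed for this example, not necessarily the criteria given in the linked answer; `Pt`, `cornersInsideImage` and `isConvexQuad` are hypothetical names, with `Pt` standing in for `cv::Point2f`.

```cpp
#include <array>

// Hypothetical stand-in for cv::Point2f.
struct Pt { float x; float y; };

// True when every projected corner lies inside a width x height image.
bool cornersInsideImage(const std::array<Pt, 4>& c, int width, int height) {
    for (const Pt& p : c) {
        if (p.x < 0 || p.y < 0 || p.x >= width || p.y >= height) return false;
    }
    return true;
}

// True when the quadrilateral c[0]..c[3] is convex, i.e. the cross products
// of all pairs of consecutive edges have the same sign. A self-intersecting
// ("bow-tie") or degenerate result from a bad homography fails this test.
bool isConvexQuad(const std::array<Pt, 4>& c) {
    int sign = 0;
    for (int i = 0; i < 4; ++i) {
        const Pt& a = c[i];
        const Pt& b = c[(i + 1) % 4];
        const Pt& d = c[(i + 2) % 4];
        const double cross = (b.x - a.x) * (d.y - b.y) - (b.y - a.y) * (d.x - b.x);
        if (cross == 0.0) return false;  // three collinear corners
        const int s = cross > 0 ? 1 : -1;
        if (sign == 0) sign = s;
        else if (s != sign) return false;
    }
    return true;
}
```

A frame whose scene_corners fail either check could simply be skipped instead of drawn. Note that an object partly outside the frame would also fail the first check, so it may be too strict for some videos.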

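Finally, I think your algorithmic approach is wrong. Recognizing or detecting vehicles seen from above by matching them against an image of a vehicle seen from the front is a dead end. You should not use keypoint matching at all. Just label all the vehicles in your images by hand and feed them to an SVM. If that is too much work, have them labeled through the Mechanical Turk platform. In short, keypoint matching does not fit your needs because it strongly assumes that the car looks similar in both images. In your case the images differ too much (because of the 3D structure of the car and the different viewpoints).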

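【Comments】:

I am not sure SURF is the right feature. Why do you think SURF features would capture the differences between cars? SURF captures very small local variations (such as a phone on the dashboard or a person in the driver's seat), whereas car makes differ at a large scale (shape/size). I would use large-scale features such as the car's width-to-length ratio, its colour and the shape of its outline, as well as image patches of the front and rear of the car, a patch of the front that captures the emblem, and so on. If you do use local features, compute a histogram of them and use that as the descriptor.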

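【Answer 3】:

What you are actually doing is finding reference points (keypoints) in the images and comparing them with each other to find where they reappear in the other image (based on SURF feature vectors). This is an important step in object detection and recognition, but it should not be confused with image segmentation (http://en.wikipedia.org/wiki/Image_segmentation) or object localization, where you find the exact outline (or the set of pixels or superpixels) of the desired object.

Getting the bounding rectangle of an object, especially one seen in perspective as in your example, is not a trivial task. You can start from the bounding box of the keypoints you have found, but that will only cover part of the object. The perspective bounding box in your example may be hard to find, particularly without 3D registration of the image, i.e. without knowing the third-dimension value (z value, depth) of each pixel.

【Comments】:

I looked at the SURF documentation and found that rectangle shown as the output; segmentation is not mentioned there.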

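【Answer 4】:

Is this the same as this one? Drawing rectangle around detected object using SURF

As far as I can tell, the only reason the outline is not drawn is that the part of the code that does it is commented out, so uncomment it. This part of the code outlines a test image for me: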

/*   
//-- Draw lines between the corners (the mapped object in the scene - image_2 )
line( img_matches, scene_corners[0] + Point2f( img_1.cols, 0), scene_corners[1] + Point2f( img_1.cols, 0), Scalar(0, 255, 0), 4 );
line( img_matches, scene_corners[1] + Point2f( img_1.cols, 0), scene_corners[2] + Point2f( img_1.cols, 0), Scalar( 0, 255, 0), 4 );
line( img_matches, scene_corners[2] + Point2f( img_1.cols, 0), scene_corners[3] + Point2f( img_1.cols, 0), Scalar( 0, 255, 0), 4 );
line( img_matches, scene_corners[3] + Point2f( img_1.cols, 0), scene_corners[0] + Point2f( img_1.cols, 0), Scalar( 0, 255, 0), 4 );   */

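You probably don't want to draw a rectangle around the matched template in the video image, since it may be distorted. Connect the warped scene_corners with lines instead. I would remove all the x1, x2, y1, y2 and cvRect square stuff.

Note that scene_corners will not give you a rectangle, because the object may be rotated differently in the video than in the template image. The phone picture posted above is a good example: the green outline around the phone screen is a quadrilateral. If you want a rectangular ROI that contains the whole object, consider finding the bounding rectangle that encloses the whole object in the video. Here is how I would do it: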

// draw the *rectangle* that contains the entire detected object (a quadrilateral)
// i.e. bounding box in the scene (not the corners)

// upper left corner of bounding box
cv::Point2f low_bound = cv::Point2f( min(scene_corners[0].x, scene_corners[3].x) , min(scene_corners[0].y, scene_corners[1].y) );

// lower right corner of bounding box
cv::Point2f high_bound = cv::Point2f( max(scene_corners[2].x, scene_corners[1].x) , max(scene_corners[2].y, scene_corners[3].y) );

// bounding box offset introduced by displaying the images side-by-side
// *only for side-by-side display*
cv::Point2f matches_offset = cv::Point2f( img_1.cols, 0);

// draw the bounding rectangle in the side-by-side display
cv::rectangle( img_matches , low_bound +  matches_offset , high_bound + matches_offset , cv::Scalar::all(255) , 2 );

/* 
if you want the rectangle around the object in the original video image, don't add the
offset and draw on the scene image (img_2 in the question's code) instead:

cv::rectangle( img_2 , low_bound , high_bound , cv::Scalar::all(255) , 2 );
*/

// Here is the actual rectangle, which you can use as the ROI in your video images:
cv::Rect video_rect = cv::Rect( low_bound , high_bound );

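The last line in the code block above probably contains the rectangle you were trying to obtain in the code you originally posted. It should be a rectangle in the video image, img. You can use it to process the subset of the image that contains the object (the ROI).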
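One caveat when using such a rectangle as an ROI: the projected corners can lie partly outside the frame, and for a strongly rotated object the extreme coordinates need not come from the corner pairs compared above. The following sketch, written under those assumptions, takes the minimum and maximum over all four corners and clips the result to the image. `Pt`, `Box` and `clippedBoundingBox` are hypothetical names for this example (stand-ins for `cv::Point2f` and `cv::Rect`), not part of the answer's code.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

struct Pt { float x; float y; };                      // stand-in for cv::Point2f
struct Box { int x; int y; int width; int height; };  // stand-in for cv::Rect

// Axis-aligned bounding box of all four projected corners, clipped to a
// width x height image. The returned box has zero width and/or height when
// the quadrilateral lies entirely outside the image.
Box clippedBoundingBox(const std::array<Pt, 4>& c, int width, int height) {
    float minX = c[0].x, maxX = c[0].x, minY = c[0].y, maxY = c[0].y;
    for (const Pt& p : c) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    // Round outwards, then clip to the image borders.
    const int x0 = std::max(0, static_cast<int>(std::floor(minX)));
    const int y0 = std::max(0, static_cast<int>(std::floor(minY)));
    const int x1 = std::min(width, static_cast<int>(std::ceil(maxX)));
    const int y1 = std::min(height, static_cast<int>(std::ceil(maxY)));
    return Box{x0, y0, std::max(0, x1 - x0), std::max(0, y1 - y0)};
}
```

Checking that the returned width and height are both positive before cropping avoids asking for an empty or out-of-range region of the frame.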

As Anum mentioned, you are also mixing the old and new OpenCV styles. You can clean this up by consistently using Point2f instead of cvPoint, and so on.

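【Comments】:

Grigole, what are low_bound, high_bound and matches_offset?
See the update above. scene_corners do not necessarily form an upright rectangle in the video, since the object may have turned or rotated. low_bound and high_bound give the upper-left and lower-right corners of the bounding rectangle that contains your object. matches_offset is used for the rectangle when the images are displayed side by side after calling drawMatches. If you want the rectangle in the original video image, you don't need to add it.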
