Multi-Source Multi-Scale Counting in Extremely Dense Crowd Images (CVPR 2013): Paper Notes

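This paper differs from the ones I have read before. It is built on traditional machine learning, statistics, and frequency-domain analysis, whereas the earlier papers were all deep-learning based, specifically convolutional neural networks. The main reason is that this paper dates from 2013.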

1 Abstract

Our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region.
We employ a global consistency constraint on counts using Markov Random Field.

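Unlike previous methods, we train our approach on a new dataset. It contains 50 images with about 64,000 people, and the count per image ranges from 94 to 4,543, whereas the images used by previous methods contain only a dozen or so people each.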

3. Framework

3.1. Counting in Patches

We estimate the number of people from three different and complementary sources. The three sources are later combined to obtain a single estimate of count for that patch using the individual counts and confidences.

3.1.1 HOG based Head Detections

For each patch, we use the number of detections, ηH, the mean and variance of scale, µH,s and σH,s, and the mean and variance of confidence, µH,c and σH,c. The consistency in scale and confidence is a measure of how reliable head detections are in that patch. There are many false negatives and positives since the images are inherently difficult (see Fig. 2).
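
To make the five per-patch features concrete, here is a minimal sketch of how they could be computed from a list of head detections. The input format (a list of `(scale, confidence)` pairs) and the function name `head_detection_features` are assumptions made for illustration; the notes do not describe the detector's actual output format.

```python
from statistics import mean, pvariance

def head_detection_features(detections):
    """Summarize the head detections of one patch.

    `detections` is a hypothetical list of (scale, confidence) pairs,
    one pair per detected head in the patch.
    """
    count = len(detections)
    if count == 0:
        # No detections: return zeros so the feature vector keeps a fixed length.
        return {"eta_H": 0, "mu_s": 0.0, "var_s": 0.0, "mu_c": 0.0, "var_c": 0.0}
    scales = [scale for scale, _ in detections]
    confidences = [confidence for _, confidence in detections]
    return {
        "eta_H": count,                # number of detections
        "mu_s": mean(scales),          # mean scale
        "var_s": pvariance(scales),    # variance of scale
        "mu_c": mean(confidences),     # mean confidence
        "var_c": pvariance(confidences),  # variance of confidence
    }

# Three made-up detections in one patch.
patch_detections = [(1.0, 0.5), (1.2, 0.75), (0.8, 0.25)]
features = head_detection_features(patch_detections)
```

A low variance in scale and confidence would suggest that the detector is behaving consistently within the patch, which matches the reliability argument quoted above.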

3.1.2 Fourier Analysis

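In a very dense crowd image, a head may occupy only a few pixels, and with some distortion added, individuals cannot be told apart from a distance, so the image looks somewhat like one person repeated across it (A crowd is inherently repetitive in nature, since all humans appear the same from a distance). In the frequency domain, this periodic repetition of heads shows up as peaks, as illustrated in Fig. 3 (Crowd density in the patch is uniform, can be captured by Fourier Transform, f(ξ), where the periodic occurrence of heads shows as peaks in the frequency domain).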

For a given patch P, we compute the gradient image, ∇(P), and then apply a low-pass filter to remove the very high-frequency content.
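
The idea can be illustrated with a one-dimensional toy signal. The sketch below is not the paper's pipeline: it skips the gradient step, works on a 1D row instead of a 2D patch, and uses a naive DFT written with the standard library. The `cutoff` parameter and the final step of counting local maxima in the filtered signal are assumptions added for illustration.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def low_pass(spectrum, cutoff):
    """Zero every bin whose frequency index exceeds `cutoff` (hypothetical parameter)."""
    n = len(spectrum)
    return [value if min(k, n - k) <= cutoff else 0j
            for k, value in enumerate(spectrum)]

def count_local_maxima(signal):
    """Count samples larger than both neighbors, treating the signal as circular."""
    n = len(signal)
    return sum(1 for t in range(n)
               if signal[t] > signal[t - 1] and signal[t] > signal[(t + 1) % n])

# Toy row of 64 samples: "heads" repeat 8 times, plus a weaker high-frequency texture.
n = 64
signal = [math.cos(2 * math.pi * 8 * t / n) + 0.3 * math.cos(2 * math.pi * 29 * t / n)
          for t in range(n)]

spectrum = dft(signal)
# The strongest non-DC bin reveals how many times the pattern repeats.
dominant = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
# After removing the high frequencies, each remaining peak stands for one repetition.
smoothed = inverse_dft(low_pass(spectrum, cutoff=10))
estimated_heads = count_local_maxima(smoothed)
```

In this toy case both the dominant frequency bin and the number of peaks in the filtered signal equal 8, the number of repetitions placed in the signal.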

3.1.3 Interest Points based Counting

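Irrelevant content such as sky, buildings, and trees often appears in outdoor images, and because Fourier analysis is crowd-blind, this content distorts the head estimates. It is therefore necessary to discard it and compute counts only in the regions of interest. To obtain counts from sparse SIFT features, we use Support Vector Regression (In order to obtain counts or densities using sparse SIFT features, we use Support Vector Regression using the counts computed at each patch from ground truth).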

Poisson distribution

N(I) = N(P1 ∪ P2 ∪ ... ∪ Pn) = N(P1) + N(P2) + ... + N(Pn)    (1)

[Equation (2): image not available]

The above equation gives us a confidence for presence of crowd in a patch. The resulting confidence maps are shown in Fig. 4 for two images.
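
As a rough illustration of how a Poisson model can produce such a confidence, the sketch below computes a log-likelihood ratio between a "crowd" rate and a "background" rate for the frequency of one feature in a patch. This is an assumption about the general form, not a reproduction of Eq. 2; the function names and the rates `lam_crowd` and `lam_background` are hypothetical.

```python
import math

def poisson_log_pmf(k, lam):
    """Log of the Poisson probability of observing k events at rate lam."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def crowd_log_likelihood_ratio(feature_count, lam_crowd, lam_background):
    """Positive values favor 'crowd', negative values favor 'background'.

    Both rates are hypothetical and would have to be estimated from
    annotated crowd and non-crowd patches.
    """
    return (poisson_log_pmf(feature_count, lam_crowd)
            - poisson_log_pmf(feature_count, lam_background))

# A feature that occurs about 4 times in crowd patches and about once elsewhere.
seen_often = crowd_log_likelihood_ratio(5, lam_crowd=4.0, lam_background=1.0)
not_seen = crowd_log_likelihood_ratio(0, lam_crowd=4.0, lam_background=1.0)
```

With these made-up rates, observing the feature five times pushes the ratio above zero, while not observing it at all pushes the ratio below zero.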

3.2. Fusion of Three Sources

After computing counts and confidences from the three sources, we scale the individual features and regress using ε-SVR, with the counts computed from the annotations as targets.
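
The two ingredients named here, feature scaling and ε-SVR, can be sketched without a machine learning library. The block below shows min-max scaling of the per-patch feature columns and the ε-insensitive loss that ε-SVR minimizes; it does not train a regressor. Min-max scaling is an assumption, since the notes do not say which scaling is used, and the feature values are made up.

```python
def min_max_scale(rows):
    """Scale every feature column to [0, 1]; constant columns become 0."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [[(value - low) / (high - low) if high > low else 0.0
             for value, low, high in zip(row, lows, highs)]
            for row in rows]

def epsilon_insensitive_loss(predicted, target, epsilon):
    """Errors smaller than epsilon cost nothing; larger errors cost linearly."""
    return max(0.0, abs(predicted - target) - epsilon)

# Each row is one patch: a made-up (detection count, interest-point count) pair.
raw_features = [[0, 10], [5, 20], [10, 30]]
scaled = min_max_scale(raw_features)

inside_tube = epsilon_insensitive_loss(predicted=105, target=100, epsilon=10)
outside_tube = epsilon_insensitive_loss(predicted=120, target=100, epsilon=10)
```

Scaling keeps a feature with a large numeric range, such as a raw count, from dominating features with a small range, such as a confidence.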

3.3. Counting in Images

The graph can be represented with [inline formula: image not available]; N are the four neighbors at the same level and the intermediate nodes that connect a patch to the layers above and below it.

Energy function:

[Equation (3): image not available]

where labeling ℓ assigns a label ℓp ∈ L = {0, 1, 2, ..., Cmax} to every patch p ∈ P.

The inference starts by sweeping in four directions at the bottom level using Eq. 4.

[Equation (4): image not available]

The beliefs are then evaluated for each patch using Eq. 5.

[Equation (5): image not available]

Fig. 6 shows three instances where the estimated count of a patch was improved based on its neighbors (both spatial and layer).
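
The effect of neighbors on a single patch can be reproduced with a small min-sum belief propagation example. The sketch below is a simplification under stated assumptions: it runs on a 1D chain of patches at a single level, so it sweeps in two directions rather than four and has no layers, and both cost functions (absolute difference to the observed count, and a weighted absolute difference between neighboring labels) are assumptions, since the equations are not reproduced in these notes.

```python
def min_sum_chain(observed, c_max, weight=1.0):
    """Min-sum belief propagation on a chain of patches.

    `observed` holds one noisy count per patch; labels are 0..c_max.
    The data and smoothness costs below are illustrative assumptions.
    """
    labels = range(c_max + 1)
    n = len(observed)

    def data_cost(patch, label):
        return abs(label - observed[patch])

    def smooth_cost(label_a, label_b):
        return weight * abs(label_a - label_b)

    zero = [0.0] * (c_max + 1)
    from_left = [zero] + [None] * (n - 1)    # message into patch p from p - 1
    from_right = [None] * (n - 1) + [zero]   # message into patch p from p + 1

    # Left-to-right sweep.
    for p in range(1, n):
        q = p - 1
        from_left[p] = [min(data_cost(q, lq) + from_left[q][lq] + smooth_cost(lq, lp)
                            for lq in labels) for lp in labels]
    # Right-to-left sweep.
    for p in range(n - 2, -1, -1):
        q = p + 1
        from_right[p] = [min(data_cost(q, lq) + from_right[q][lq] + smooth_cost(lq, lp)
                             for lq in labels) for lp in labels]

    # Belief of each label = own data cost plus all incoming messages.
    result = []
    for p in range(n):
        belief = [data_cost(p, l) + from_left[p][l] + from_right[p][l] for l in labels]
        result.append(min(labels, key=lambda l: belief[l]))
    return result

# The middle patch has an implausibly low estimate compared with its neighbors.
noisy_counts = [10, 10, 2, 10, 10]
smoothed_counts = min_sum_chain(noisy_counts, c_max=12)
```

With these made-up costs, the outlier in the middle is pulled up to agree with its neighbors, which is the kind of correction the figure illustrates.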

——20190411