Venue: CVPR 2016

Key Question

  • The training set is distinguished by a large imbalance between the small number of annotated objects and the vast number of background examples.

Contribution

  • Makes training more effective and efficient.
  • OHEM is a simple and intuitive algorithm that eliminates several heuristics and hyperparameters commonly used in region-based ConvNets.
  • Candidate examples are subsampled according to a distribution that favors diverse, high-loss instances.
  • It yields consistent and significant boosts in mean average precision.
  • Its effectiveness increases as the training set becomes larger and more difficult, as demonstrated by results on the MS COCO dataset.

Architecture

Fast R-CNN

Training Region-based Object Detectors with Online Hard Example Mining

OHEM

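The core online selection step can be sketched as follows (a minimal sketch; the function and variable names are mine, not the paper's): after a forward pass over all RoIs in the mini-batch, rank them by loss and backpropagate only the hardest ones, so easy examples contribute zero gradient.

```python
import numpy as np

def select_hard_examples(roi_losses, batch_size):
    """Pick the batch_size RoIs with the highest loss; only these
    contribute to the backward pass (all others get zero gradient)."""
    order = np.argsort(roi_losses)[::-1]  # indices sorted by descending loss
    return order[:batch_size]

# toy example: 6 candidate RoIs, keep the 2 hardest
losses = np.array([0.1, 2.3, 0.05, 1.7, 0.4, 0.9])
hard = select_hard_examples(losses, 2)  # → indices [1, 3]
```

Because the selection happens inside each SGD iteration, the mined set always reflects the current model, unlike offline bootstrapping.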

Experiments


Conclusion

  • OHEM eliminates several heuristics and hyperparameters in common use by automatically selecting hard examples, thus simplifying training.
  • Though we used Fast R-CNN throughout this paper, OHEM can be used for training any region-based ConvNet detector.

Unknown Key Words

  • bootstrapping (hard negative mining) relies on the aforementioned alternation template: (a) for some period of time, a fixed model is used to find new examples to add to the active training set; (b) then, for some period of time, the model is trained on the fixed active training set.
  • hard negative example = false positive example (a background region the current model confidently scores as an object)
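The mining half of the alternation, step (a), can be illustrated with a toy example (everything here is hypothetical; a simple score threshold stands in for the frozen detector):

```python
def mine_hard_negatives(scores, labels, threshold):
    """(a) Model fixed: collect false positives, i.e. background
    examples (label 0) that the current model scores above threshold."""
    return [i for i, (s, y) in enumerate(zip(scores, labels))
            if y == 0 and s > threshold]

# toy data: detection scores and ground-truth labels (1 = object, 0 = background)
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 0, 0, 1]
hard_negatives = mine_hard_negatives(scores, labels, 0.5)  # → [2]
```

In step (b), the mined indices would be added to the active training set before the model is trained again on that fixed set.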

Questions

  • However, there is a small caveat: co-located RoIs with high overlap are likely to have correlated losses.
    • Use standard non-maximum suppression (NMS) to deduplicate the hard examples before selection.
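The NMS-based deduplication can be sketched as follows (a minimal NumPy sketch, not the paper's implementation): rank RoIs by loss, then greedily keep the highest-loss RoI and suppress co-located RoIs whose IoU with it exceeds a threshold, so correlated losses are not selected twice.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms_dedup(boxes, losses, iou_thresh=0.7):
    """Greedy NMS driven by loss: repeatedly keep the highest-loss box
    and suppress co-located boxes overlapping it above iou_thresh."""
    order = np.argsort(losses)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        ious = np.array([iou(boxes[i], boxes[j]) for j in rest])
        order = rest[ious <= iou_thresh]
    return keep

# two nearly identical boxes plus one distinct box
boxes = [[0, 0, 10, 10], [0, 0, 10, 11], [50, 50, 60, 60]]
losses = np.array([1.0, 2.0, 0.5])
keep = nms_dedup(boxes, losses)  # → [1, 2]: box 0 is suppressed by box 1
```

Hard-example selection then proceeds on the surviving RoIs only.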

Self-Learning

  • Plain SGD does not fit the bootstrapping alternation template: the model must be frozen for long stretches while new hard examples are mined, which makes training slow.
  • 2 methods of hard example mining:
    • Remove easy examples and add some hard examples.
    • Add false positives to the dataset and train the model again.
  • A proposal whose IoU with the ground truth falls in the interval [bg_lo, 0.5), with bg_lo = 0.1, is treated as background; this heuristic is helpful but ignores some infrequent, but important, difficult background regions.
  • OHEM is robust in case one needs fewer images per batch in order to reduce GPU memory usage.
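The bg_lo heuristic above can be sketched as follows (names are mine, not the paper's); OHEM removes this rule and lets loss-based selection decide instead:

```python
def background_rois(max_ious, bg_lo=0.1, bg_hi=0.5):
    """Fast R-CNN background heuristic (which OHEM removes): a RoI is
    a background example only if its best IoU with any ground-truth box
    falls in [bg_lo, bg_hi). RoIs below bg_lo are discarded, dropping
    some infrequent but important difficult background regions."""
    return [i for i, v in enumerate(max_ious) if bg_lo <= v < bg_hi]

# per-RoI maximum IoU with any ground-truth box
ious = [0.0, 0.05, 0.3, 0.6, 0.45]
bg = background_rois(ious)  # → [2, 4]; RoIs 0 and 1 are silently dropped
```

With OHEM, RoIs like indices 0 and 1 remain candidates and are kept whenever their loss is high.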

Related Articles: