Wider Perspective on the Progress in Object Detection

Object Detection is one of the most mature fields in computer vision. In the last year alone we have seen many novel ideas in object detection that have introduced significant improvements in detection accuracy.

I’ve gathered what are, in my opinion, the 9 most important and useful papers (since October 2016) for a talk I recently presented at several conferences. I thought it might be helpful for people in the field to see the full picture of the progress and to have the list with references.

If you’re interested in learning more about detection and building a solid intuition for the leading algorithms, see my introductory talk on the subject from the PyData conference: https://youtu.be/51HU2Z3J3G4

I’ve divided the leading algorithms by the module they improved (architecture [feature extractor], meta-architecture [detection algorithm] and post-processing), although the line between architecture and meta-architecture improvements is sometimes debatable.

If you have any thoughts about this list or if you think something significant is not on it, please leave a comment. Note: I did not include new “feature extractor” generations, such as squeeze-and-excitation, on purpose. They are not unique contributions to detection, even though most of them do improve detection accuracy.

Short paragraph about each of the improvements:

* Detection without pre-training – Demonstrated performance comparable to the state of the art in certain cases, without pre-training on ImageNet classification.

* Deformable Convolutions and ROI-Pooling – Enables the 3×3 convolution kernel to have any shape (non-rectangular), and learns the optimal shape from the data. Used in the entry that won 2nd place in COCO detection 2016.

* Focal Loss – Novel loss function that gives a higher weight to hard examples. Demonstrated the best single-model detection performance to date (together with several other improvements).

* Multi-Task Learning – Improves the detection accuracy by having a single network learn both detection and instance segmentation, which are both done in a fully-convolutional (efficient) manner. Won first place in the COCO instance segmentation 2016 and 3rd place in detection.

* Feature-Pyramid Networks – Most effective and efficient way demonstrated to date for using feature maps of several depths for improving detection of smaller objects. Used in current best single model.

* Detection on 9,000 classes – The COCO detection dataset contains only 80 object categories, and scaling up the number of classes is very expensive. The authors introduced a clever way to train the detection algorithm on ImageNet classification in parallel, enabling detection on 9,000 classes in real time (currently with relatively low accuracy).

* Soft NMS – Improves the traditional detection post-processing (NMS) to better detect different objects that partially overlap with each other.

* Learned NMS – NMS is currently one of the last components of the detection meta-architecture which is not learned end-to-end, and this paper proposes a way to change it.
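To make the Focal Loss item above more concrete, here is a minimal sketch of the binary form of the loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names are my own, and the defaults gamma=2 and alpha=0.25 are the values I recall from the paper; treat them as assumptions rather than a reference implementation.

```python
import math


def binary_cross_entropy(p, y):
    """Plain cross-entropy for one prediction (p = predicted P(positive), y in {0, 1})."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one prediction (illustrative sketch; names and defaults are assumptions).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified (easy)
    examples, so hard examples dominate the total loss.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy positive (p = 0.9) versus a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        print(p, binary_cross_entropy(p, 1), binary_focal_loss(p, 1))
```

Setting gamma to 0 removes the modulating factor and leaves an alpha-weighted cross-entropy, which is a handy sanity check when experimenting with the parameters.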
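The Soft NMS item above can be sketched in a few lines. Instead of deleting every box that overlaps the current top-scoring box, the Gaussian variant decays the neighbor's score by exp(-IoU^2 / sigma). This is a simplified pure-Python sketch, not the authors' code: the function names, the (x1, y1, x2, y2) box format, and the sigma and score_threshold defaults are assumptions chosen for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS sketch; returns (box, score) pairs in selection order."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the highest-scoring box that is still in play.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best)
        kept.append((best_box, best_score))
        # Decay the scores of the rest according to their overlap with it,
        # dropping only boxes whose score falls below the threshold.
        decayed = []
        for box, score in remaining:
            new_score = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if new_score >= score_threshold:
                decayed.append((box, new_score))
        remaining = decayed
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    for box, score in soft_nms(boxes, scores):
        print(box, round(score, 3))
```

In this toy input the second box overlaps the first heavily. Traditional NMS with a typical 0.5 IoU threshold would discard it outright, whereas here it survives with a reduced score, which is what helps when two real objects partially overlap.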
