Object Detection
Papers
Deep Neural Networks for Object Detection
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
method | ILSVRC 2013 mAP |
---|---|
OverFeat | 24.3% |
- intro: A deep version of the sliding-window method that predicts a bounding box directly from each location of the topmost feature map, once the confidences of the underlying object categories are known.
- arXiv: http://arxiv.org/abs/1312.6229
- code: https://github.com/sermanet/OverFeat
- code: http://cilvr.nyu.edu/doku.php?id=software:overfeat:start
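The dense prediction described in the intro can be sketched as below. This is a minimal illustration, not OverFeat's code. It assumes a hypothetical layout in which every feature-map cell carries one score per class and one box, given as edge offsets from the cell centre. It also assumes that a cell at (row, col) maps to image coordinates through a fixed stride. OverFeat's step of merging overlapping predictions is omitted.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in image pixels
Box = Tuple[float, float, float, float]


def decode_dense_predictions(
    scores: List[List[List[float]]],  # scores[row][col][class], hypothetical layout
    offsets: List[List[Box]],  # offsets[row][col]: box edges relative to the cell centre
    stride: int,  # assumed image-pixel distance between neighbouring cells
    threshold: float,
) -> List[Tuple[int, float, Box]]:
    """Return (class_id, confidence, box) for every cell whose best class clears the threshold."""
    detections = []
    for row, row_scores in enumerate(scores):
        for col, cls_scores in enumerate(row_scores):
            # First decide which category this location is confident about ...
            best = max(range(len(cls_scores)), key=cls_scores.__getitem__)
            confidence = cls_scores[best]
            if confidence < threshold:
                continue
            # ... then read the box predicted at the same location.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            dx1, dy1, dx2, dy2 = offsets[row][col]
            detections.append((best, confidence, (cx + dx1, cy + dy1, cx + dx2, cy + dy2)))
    return detections


# A 1 x 2 grid with two classes: only the first cell is confident enough.
dets = decode_dense_predictions(
    scores=[[[0.1, 0.9], [0.6, 0.3]]],
    offsets=[[(-10.0, -10.0, 10.0, 10.0), (-5.0, -5.0, 5.0, 5.0)]],
    stride=32,
    threshold=0.7,
)
print(dets)  # [(1, 0.9, (6.0, 6.0, 26.0, 26.0))]
```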
R-CNN
Rich feature hierarchies for accurate object detection and semantic segmentation (R-CNN)
method | VOC 2007 mAP | VOC 2010 mAP | VOC 2012 mAP | ILSVRC 2013 mAP |
---|---|---|---|---|
R-CNN,AlexNet | 54.2% | 50.2% | 49.6% | |
R-CNN,bbox reg,AlexNet | 58.5% | 53.7% | 53.3% | 31.4% |
R-CNN,bbox reg,ZFNet | 59.2% | | | |
R-CNN,VGG-Net | 62.2% | | | |
R-CNN,bbox reg,VGG-Net | 66.0% | | | |
- arXiv: http://arxiv.org/abs/1311.2524
- slides: http://www.image-net.org/challenges/LSVRC/2013/slides/r-cnn-ilsvrc2013-workshop.pdf
- slides: http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf
- code: https://github.com/rbgirshick/rcnn
- notes: http://zhangliliang.com/2014/07/23/paper-note-rcnn/
- caffe-pr(“Make R-CNN the Caffe detection example”): https://github.com/BVLC/caffe/pull/482
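The "bbox reg" rows above add R-CNN's class-specific bounding-box regression. The paper describes a proposal P and its ground-truth box G by centre, width and height, and regresses the targets t_x = (G_x - P_x)/P_w, t_y = (G_y - P_y)/P_h, t_w = log(G_w/P_w) and t_h = log(G_h/P_h). The minimal sketch below only encodes and decodes these targets. The ridge regressors that R-CNN trains on pool5 features to predict them are left out, and the example boxes are illustrative.

```python
import math
from typing import Tuple

# (centre_x, centre_y, width, height)
Box = Tuple[float, float, float, float]
Deltas = Tuple[float, float, float, float]


def encode(proposal: Box, gt: Box) -> Deltas:
    """Regression targets (t_x, t_y, t_w, t_h) that move `proposal` onto `gt`."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode(proposal: Box, deltas: Deltas) -> Box:
    """Apply predicted deltas to a proposal (inverse of `encode`)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
gt = (55.0, 45.0, 40.0, 20.0)
t = encode(proposal, gt)
print(t)  # approximately (0.25, -0.125, 0.693, -0.693)
print(decode(proposal, t))  # approximately (55.0, 45.0, 40.0, 20.0)
```

Scale-normalising the centre shift and taking logs of the size ratios makes the targets independent of the proposal's absolute size. This lets one regressor serve small and large boxes.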
MultiBox
Scalable Object Detection using Deep Neural Networks (MultiBox)
- intro: Trains a CNN to predict a set of class-agnostic bounding boxes (regions of interest), each with a confidence score.
- arXiv: http://arxiv.org/abs/1312.2249
- code: https://github.com/google/multibox
SPP-Net
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
method | VOC 2007 mAP | ILSVRC 2013 mAP |
---|---|---|
SPP_net(ZF-5),1-model | 54.2% | 31.84% |
SPP_net(ZF-5),2-model | 60.9% | |
SPP_net(ZF-5),6-model | | 35.11% |
- arXiv: http://arxiv.org/abs/1406.4729
- code: https://github.com/ShaoqingRen/SPP_net
- notes: http://zhangliliang.com/2014/09/13/paper-note-sppnet/
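Spatial pyramid pooling is what lets SPP-net run the convolutional layers once per image and still feed fixed-length vectors to the fully connected layers. Each region is max-pooled into an n × n grid for several values of n, and the results are concatenated. Below is a minimal pure-Python sketch. The pyramid levels (1, 2, 4) and the floor/ceil bin boundaries are illustrative choices, not the exact configuration used in the paper.

```python
import math
from typing import List, Sequence

# A feature map is C channels of H x W values: fmap[channel][row][col].
FeatureMap = List[List[List[float]]]


def spatial_pyramid_pool(fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool each channel into n x n bins per level and concatenate the results.

    The output length is C * sum(n * n for n in levels), whatever H and W are.
    """
    pooled: List[float] = []
    for n in levels:
        for channel in fmap:
            h, w = len(channel), len(channel[0])
            for i in range(n):
                # Bin edges: floor at the start, ceil at the end, so no bin is empty
                # even when the map is smaller than n x n.
                r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
                for j in range(n):
                    c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                    pooled.append(max(channel[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


small = [[[float(r * 3 + c) for c in range(3)] for r in range(3)]]  # 1 x 3 x 3
large = [[[float(r * 7 + c) for c in range(7)] for r in range(5)]]  # 1 x 5 x 7
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))  # 21 21
```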
Learning Rich Features from RGB-D Images for Object Detection and Segmentation
Scalable, High-Quality Object Detection
DeepID-Net
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
method | VOC 2007 mAP | ILSVRC 2013 mAP |
---|---|---|
DeepID-Net | 64.1% | 50.3% |
Object Detection Networks on Convolutional Feature Maps
method | Trained on | mAP |
---|---|---|
NoC | 07+12 | 68.8% |
NoC,bb | 07+12 | 71.6% |
NoC,+EB | 07+12 | 71.8% |
NoC,+EB,bb | 07+12 | 73.3% |
Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction
Model | BBoxReg? | VOC 2007 mAP(IoU>0.5) |
---|---|---|
R-CNN(AlexNet) | No | 54.2% |
R-CNN(VGG) | No | 60.6% |
+StructObj | No | 61.2% |
+StructObj-FT | No | 62.3% |
+FGS | No | 64.8% |
+StructObj+FGS | No | 65.9% |
+StructObj-FT+FGS | No | 66.5% |
Model | BBoxReg? | VOC 2007 mAP(IoU>0.5) |
---|---|---|
R-CNN(AlexNet) | Yes | 58.5% |
R-CNN(VGG) | Yes | 65.4% |
+StructObj | Yes | 66.6% |
+StructObj-FT | Yes | 66.9% |
+FGS | Yes | 67.2% |
+StructObj+FGS | Yes | 68.5% |
+StructObj-FT+FGS | Yes | 68.4% |
- arXiv: http://arxiv.org/abs/1504.03293
- slides: http://www.ytzhang.net/files/publications/2015-cvpr-det-slides.pdf
- code: https://github.com/YutingZhang/fgs-obj
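The "IoU>0.5" in the table headers is the PASCAL VOC matching rule. A detection counts as correct only if its intersection-over-union with a ground-truth box of the same class exceeds 0.5. Below is a minimal sketch, assuming corner-format boxes in continuous coordinates. The official VOC devkit adds one pixel to widths and heights, which this sketch does not do.

```python
from typing import Tuple

# (x1, y1, x2, y2) with x1 < x2 and y1 < y2
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """VOC-style criterion as written in the table header: IoU strictly above the threshold."""
    return iou(pred, gt) > threshold


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.142857... (1 / 7)
print(is_match((0, 0, 10, 10), (1, 0, 11, 10)))  # True (IoU = 90 / 110)
```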
Fast R-CNN
Fast R-CNN
method | data | VOC 2007 mAP |
---|---|---|
FRCN,VGG16 | 07 | 66.9% |
FRCN,VGG16 | 07+12 | 70.0% |
method | data | VOC 2010 mAP |
---|---|---|
FRCN,VGG16 | 12 | 66.1% |
FRCN,VGG16 | 07++12 | 68.8% |
method | data | VOC 2012 mAP |
---|---|---|
FRCN,VGG16 | 12 | 65.7% |
FRCN,VGG16 | 07++12 | 68.4% |
- arXiv: http://arxiv.org/abs/1504.08083
- slides: http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
- github: https://github.com/rbgirshick/fast-rcnn
- webcam demo: https://github.com/rbgirshick/fast-rcnn/pull/29
- notes: http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/
- notes: http://blog.csdn.net/linj_m/article/details/48930179
- github(“Train Fast-RCNN on Another Dataset”): https://github.com/zeyuanxy/fast-rcnn/tree/master/help/train
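Every mAP figure in these tables is the mean, over the 20 VOC classes, of per-class average precision (AP). Below is a minimal sketch of the 11-point interpolated AP used by the VOC 2007 devkit. VOC 2010 and later instead use the area under the full precision/recall curve. The sketch assumes detections have already been matched to ground truth, for example with the IoU > 0.5 rule and with duplicate detections counted as false positives. Each detection therefore arrives as a (confidence, is_true_positive) pair.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence, is_true_positive)


def voc07_average_precision(detections: List[Detection], num_gt: int) -> float:
    """11-point interpolated AP for one class over the whole test set."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        tp += int(is_tp)
        fp += int(not is_tp)
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    ap = 0.0
    for t in (i / 10 for i in range(11)):  # recall thresholds 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= t (0 if never reached).
        ap += max((p for r, p in zip(recalls, precisions) if r >= t), default=0.0) / 11
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Average the per-class APs; per_class maps class name -> (detections, num_gt)."""
    aps = [voc07_average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)


dets = [(0.9, True), (0.8, False), (0.7, True)]
print(voc07_average_precision(dets, num_gt=2))  # 28/33, about 0.848
```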
DeepBox
DeepBox: Learning Objectness with Convolutional Networks
MR-CNN
Object detection via a multi-region & semantic segmentation-aware CNN model (MR-CNN)
Model | Trained on | VOC 2007 mAP |
---|---|---|
VGG-net | 07+12 | 78.2% |
VGG-net | 07 | 74.9% |
Model | Trained on | VOC 2012 mAP |
---|---|---|
VGG-net | 07+12 | 73.9% |
VGG-net | 12 | 70.7% |
- arXiv: http://arxiv.org/abs/1505.01749
- code: “Pdf and code will appear here shortly – stay tuned” http://imagine.enpc.fr/~komodakn/
- notes: http://zhangliliang.com/2015/05/17/paper-note-ms-cnn/
- notes: http://blog.cvmarcher.com/posts/2015/05/17/multi-region-semantic-segmentation-aware-cnn/
Faster R-CNN
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (NIPS 2015)
method | training data | test data | mAP | time/img |
---|---|---|---|---|
Faster RCNN, VGG-16 | 07 | VOC 2007 test | 69.9% | 198ms |
Faster RCNN, VGG-16 | 07+12 | VOC 2007 test | 73.2% | 198ms |
Faster RCNN, VGG-16 | 12 | VOC 2012 test | 67.0% | 198ms |
Faster RCNN, VGG-16 | 07++12 | VOC 2012 test | 70.4% | 198ms |
- arXiv: http://arxiv.org/abs/1506.01497
- github: https://github.com/ShaoqingRen/faster_rcnn
- github: https://github.com/rbgirshick/py-faster-rcnn
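A Region Proposal Network slides a small network over the conv feature map and, at every location, scores and regresses k reference boxes called anchors. The paper uses k = 9 anchors per location: box areas of 128², 256² and 512² pixels at aspect ratios 1:1, 1:2 and 2:1. Below is a minimal sketch of generating them. It assumes a feature-map stride of 16 pixels (VGG-16's conv5) and anchors centred on each cell. The released implementations may differ in how they place centres and round coordinates.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def anchors_at(cx: float, cy: float, sides=(128, 256, 512), ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """k = len(sides) * len(ratios) anchors centred at (cx, cy).

    Each anchor keeps the area side * side; ratio is height / width.
    """
    boxes = []
    for side in sides:
        for ratio in ratios:
            w = side / math.sqrt(ratio)
            h = side * math.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def grid_anchors(feat_h: int, feat_w: int, stride: int = 16) -> List[Box]:
    """All anchors for a feat_h x feat_w map; stride 16 is an assumption (VGG-16 conv5)."""
    out: List[Box] = []
    for row in range(feat_h):
        for col in range(feat_w):
            out.extend(anchors_at((col + 0.5) * stride, (row + 0.5) * stride))
    return out


# A ~1000 x 600 image gives roughly a 60 x 40 conv5 map, i.e. about 20000 anchors.
print(len(grid_anchors(40, 60)))  # 21600
```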
YOLO
You Only Look Once: Unified, Real-Time Object Detection (YOLO)
- intro: YOLO uses the whole topmost feature map to predict both confidences for multiple categories and bounding boxes (which are shared for these categories).
- arXiv: http://arxiv.org/abs/1506.02640
- code: http://pjreddie.com/darknet/yolo/
- github: https://github.com/pjreddie/darknet
- reddit: https://www.reddit.com/r/MachineLearning/comments/3a3m0o/realtime_object_detection_with_yolo/
- github(YOLO_tensorflow): https://github.com/gliese581gg/YOLO_tensorflow
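In the YOLO paper, the "bounding boxes shared across categories" from the intro work as follows. The image is divided into an S × S grid. Each cell predicts B boxes (x, y, w, h, confidence) plus one set of C conditional class probabilities shared by those boxes. For VOC (S = 7, B = 2, C = 20), the output is a 7 × 7 × 30 tensor. At test time, each box's class-specific score is its confidence multiplied by each class probability. The sketch below assumes a hypothetical per-cell layout (B box records first, then the C class probabilities). Darknet's actual memory layout may differ.

```python
from typing import List

S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (PASCAL VOC setting)


def output_size(s: int = S, b: int = B, c: int = C) -> int:
    """Number of values YOLO predicts for one image: S * S * (B * 5 + C)."""
    return s * s * (b * 5 + c)


def class_specific_scores(cell: List[float], b: int = B, c: int = C) -> List[List[float]]:
    """scores[k][i] = confidence of box k * Pr(class i | object) for one grid cell.

    Hypothetical layout: b records of (x, y, w, h, confidence), then c class probabilities.
    """
    assert len(cell) == b * 5 + c
    class_probs = cell[b * 5:]
    return [[cell[k * 5 + 4] * p for p in class_probs] for k in range(b)]


print(output_size())  # 1470 (= 7 * 7 * 30)
cell = [
    0.5, 0.5, 0.2, 0.3, 0.5,  # box 0: x, y, w, h, confidence
    0.4, 0.6, 0.1, 0.1, 1.0,  # box 1
    0.2, 0.3, 0.5,  # Pr(class i | object) for 3 classes
]
print(class_specific_scores(cell, b=2, c=3))  # [[0.1, 0.15, 0.25], [0.2, 0.3, 0.5]]
```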
R-CNN minus R
DenseBox
DenseBox: Unifying Landmark Localization with End to End Object Detection
- arXiv: http://arxiv.org/abs/1509.04874
- demo: http://pan.baidu.com/s/1mgoWWsS
- KITTI result: http://www.cvlibs.net/datasets/kitti/eval_object.php
SSD
SSD: Single Shot MultiBox Detector
- arXiv: http://arxiv.org/abs/1512.02325
- github: https://github.com/weiliu89/caffe/tree/ssd
- video: http://weibo.com/p/2304447a2326da963254c963c97fb05dd3a973
Inside-Outside Net
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
Detection results on VOC 2007 test:
Method | R | S | W | D | Train | mAP |
---|---|---|---|---|---|---|
FRCN | | | | | 07+12 | 70.0 |
RPN | | | | | 07+12 | 73.2 |
MR-CNN | | | √ | | 07+12 | 78.2 |
ION | | | | | 07+12 | 74.6 |
ION | √ | | | | 07+12 | 75.6 |
ION | √ | √ | | | 07+12+S | 76.5 |
ION | √ | √ | √ | | 07+12+S | 78.5 |
ION | √ | √ | √ | √ | 07+12+S | 79.2 |
Detection results on VOC 2012 test:
Method | R | S | W | D | Train | mAP |
---|---|---|---|---|---|---|
FRCN | | | | | 07++12 | 68.4 |
RPN | | | | | 07++12 | 70.4 |
FRCN+YOLO | | | | | 07++12 | 70.4 |
HyperNet | | | | | 07++12 | 71.4 |
MR-CNN | | | √ | | 07+12 | 73.9 |
ION | √ | √ | √ | √ | 07+12+S | 76.4 |
- intro: “0.8s per image on a Titan X GPU (excluding proposal generation) without two-stage bounding-box regression and 1.15s per image with it”.
- arxiv: http://arxiv.org/abs/1512.04143
- slides: http://www.seanbell.ca/tmp/ion-coco-talk-bell2015.pdf
- coco-leaderboard: http://mscoco.org/dataset/#detections-leaderboard
G-CNN
G-CNN: an Iterative Grid Based Object Detector
Learning Deep Features for Discriminative Localization
- homepage: http://cnnlocalization.csail.mit.edu/
- arxiv: http://arxiv.org/abs/1512.04150
Factors in Finetuning Deep Model for object detection
We don’t need no bounding-boxes: Training object class detectors using only human verification
Specific Object Detection
End-to-end people detection in crowded scenes
- arXiv: http://arxiv.org/abs/1506.04878
- code: https://github.com/Russell91/reinspect
- ipn: http://nbviewer.ipython.org/github/Russell91/ReInspect/blob/master/evaluation_reinspect.ipynb
Tutorials
Convolutional Feature Maps: Elements of efficient (and accurate) CNN-based object detection
Codes
TensorBox: a simple framework for training neural networks to detect objects in images
- intro: “The basic model implements the simple and robust GoogLeNet-OverFeat algorithm. We additionally provide an implementation of the ReInspect algorithm”
- github: https://github.com/Russell91/TensorBox
Blogs
Convolutional Neural Networks for Object Detection
http://rnd.azoft.com/convolutional-neural-networks-object-detection/