DenseNet: Replacing HOG with Deep Convnet Pyramids for Object Detection

Forrest Iandola, Sergey Karayev, Ross Girshick, Matt Moskewicz, Yangqing Jia, Kurt Keutzer, and Trevor Darrell

forresti@eecs.berkeley.edu

University of California, Berkeley

Overview

Object Detection
• Selective Search + ConvNets
• Multiscale Pyramid Descriptors
• DenseNet: ConvNet Pyramids for improved efficiency
• DenseNet code is available – give it a try in your pipeline

Deep Convolutional Neural Networks

• 1989: high-quality digit recognition (Bell Labs – LeCun)
• 2012: best ImageNet Classification (Toronto)
• 2013: best PASCAL Detection (Berkeley)
• 2014: efficient detection + replace HOG with ConvNets

Regions with CNN Features

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv 2013.

"Selective Search" region proposals (Uijlings et al., IJCV 2013)

Caffe: efficient ConvNet GPU implementation from Berkeley (http://caffe.berkeleyvision.org)

Linear Classifier: > 50% mAP on PASCAL 07 detection

Efficiency Issues with R-CNN

2000 windows = 100x the input image size
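One way to read the 100x figure is as the total pixel area of all proposal windows divided by the area of the image. The sketch below does that bookkeeping. The function name `relative_workload`, the (x1, y1, x2, y2) box format, and the example boxes are assumptions made for illustration, and the sketch ignores any resizing of windows before they reach the ConvNet.

```python
def relative_workload(windows, image_width, image_height):
    """Total area of all windows divided by the area of the image."""
    image_area = image_width * image_height
    total_window_area = 0
    for x1, y1, x2, y2 in windows:
        total_window_area += (x2 - x1) * (y2 - y1)
    return total_window_area / image_area


# Hypothetical example: 2000 windows, each covering 5% of a 500x400 image.
windows = [(0, 0, 100, 100)] * 2000
ratio = relative_workload(windows, 500, 400)
print(ratio)
```

With these made-up boxes the ratio comes out at 100, i.e. the ConvNet processes 100 images' worth of pixels for a single input image.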

Sliding-Window Detection on HOG Pyramids

pyra = featpyramid(image)

Can add parts, if desired

33% mAP on PASCAL 07 detection
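Sliding-window detection on a feature pyramid can be sketched as scoring a linear template at every position of every pyramid level. This is a minimal NumPy illustration, not the code behind the slides: the pyramid is assumed to be a list of (H, W, D) arrays, and `score_level` and `detect` are hypothetical names.

```python
import numpy as np


def score_level(features, template):
    """Dense sliding-window scores of a linear template on one pyramid level.

    features: (H, W, D) feature map; template: (h, w, D) filter.
    Returns an (H - h + 1, W - w + 1) array of dot products.
    """
    H, W, _ = features.shape
    h, w, _ = template.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            scores[y, x] = np.sum(features[y:y + h, x:x + w] * template)
    return scores


def detect(pyramid, template, threshold):
    """Return (level, y, x, score) for every window scoring above threshold."""
    detections = []
    h, w, _ = template.shape
    for level, features in enumerate(pyramid):
        if features.shape[0] < h or features.shape[1] < w:
            continue  # template does not fit at this scale
        scores = score_level(features, template)
        for y, x in zip(*np.nonzero(scores > threshold)):
            detections.append((level, int(y), int(x), float(scores[y, x])))
    return detections
```

Because the detector only sees a list of feature maps, the same loop applies whether the levels hold HOG cells or ConvNet activations.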

Efficiency of HOG Pyramids

Pyramid = 8x the input image size

Typical settings:
• 5 octaves
• 10 scales per octave
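The 8x figure is consistent with a simple geometric-series estimate. The sketch assumes the pyramid starts at the input resolution and that each level shrinks both image sides by a factor of 2^(-1/10), so the image halves in size every 10 levels. The function name `pyramid_relative_area` is hypothetical.

```python
def pyramid_relative_area(octaves=5, scales_per_octave=10):
    """Sum of the areas of all pyramid levels, relative to the input image."""
    step = 2.0 ** (-1.0 / scales_per_octave)  # per-level scale factor per side
    num_levels = octaves * scales_per_octave
    # Area scales with the square of the side length.
    return sum((step ** k) ** 2 for k in range(num_levels))


total = pyramid_relative_area()
print(round(total, 2))
```

Under these assumptions the total comes to roughly 7.7 times the input image area, which rounds to the 8x quoted above.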

Sliding-Window Detection on ConvNet Pyramids

Pyramid = 8x the input image size

Efficiency of HOG + Accuracy of Deep Learning

Easy to use:

pyra = convnet_featpyramid(image)

Implementing ConvNet Pyramids

State-of-the-art ConvNet implementations (e.g. Caffe):
• Can handle any input image size
• BUT, need batches of same-sized images to saturate GPU

Easy to use:

pyra = convnet_featpyramid(image)
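One possible way to reconcile variable-sized pyramid levels with same-sized batches is to pad every level onto a common canvas. The slides do not spell out how DenseNet does this, so the sketch below is an illustration under that assumption only. `pad_levels_to_batch` is a hypothetical helper, and the levels are assumed to be (H, W, C) NumPy arrays.

```python
import numpy as np


def pad_levels_to_batch(levels, pad_value=0.0):
    """Pad (H, W, C) images of varying size onto a common canvas.

    Returns an (N, H_max, W_max, C) batch and the original (H, W) sizes,
    so that per-level outputs can be cropped back out afterwards.
    """
    h_max = max(level.shape[0] for level in levels)
    w_max = max(level.shape[1] for level in levels)
    channels = levels[0].shape[2]
    batch = np.full((len(levels), h_max, w_max, channels), pad_value,
                    dtype=np.float32)
    sizes = []
    for i, level in enumerate(levels):
        h, w = level.shape[:2]
        batch[i, :h, :w] = level  # place each level in the top-left corner
        sizes.append((h, w))
    return batch, sizes
```

Plain padding spends computation on empty canvas for the small levels. Packing several small levels onto one canvas would reduce that waste, at the cost of more bookkeeping.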

Computational Performance

• Selective Search: 2000 windows = 100x the input image size; 1/10 fps
• Pyramids: Pyramid = 8x the input image size; 1 fps

Future Applications

for each of the 6000 papers citing HOG:
    pyra = featpyramid(image)          # HOG Pyramid
    pyra = convnet_featpyramid(image)  # ConvNet Pyramid as a replacement

• Exemplar-SVM (Alyosha Efros)
• RGB-D Recognition (Saurabh Gupta)
• Tracking Algorithms (TTI-Japan)
