DenseNet: Replacing HOG with Deep Convnet Pyramids for Object Detection. Forrest Iandola, Sergey Karayev, Ross Girshick, Matt Moskewicz, Yangqing Jia, Kurt Keutzer, and Trevor Darrell.


DenseNet: Replacing HOG with Deep Convnet Pyramids for Object Detection

Forrest Iandola, Sergey Karayev, Ross Girshick, Matt Moskewicz, Yangqing Jia, Kurt Keutzer, and Trevor Darrell

forresti@eecs.berkeley.edu

University of California, Berkeley

Overview

Object Detection
• Selective Search + ConvNets
• Multiscale Pyramid Descriptors
• DenseNet: ConvNet Pyramids for improved efficiency
• DenseNet code is available: give it a try in your pipeline

Deep Convolutional Neural Networks

• 1989: high-quality digit recognition (Bell Labs, LeCun)
• 2012: best ImageNet Classification (Toronto)
• 2013: best PASCAL Detection (Berkeley)
• 2014: efficient detection + replace HOG with ConvNets

Regions with CNN Features

"Selective Search" region proposals (Uijlings et al., IJCV 2013)

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv 2013.


Regions with CNN Features

Caffe: efficient ConvNet GPU implementation from Berkeley
http://caffe.berkeleyvision.org

Regions with CNN Features

Linear Classifier > 50% mAP on PASCAL 07 detection



Efficiency Issues with R-CNN

2000 windows = 100x the input image size


Sliding-Window Detection on HOG Pyramids

pyra = featpyramid(image)
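A minimal sketch of what sliding-window detection over a feature pyramid can look like, assuming each pyramid level is a NumPy array of shape (height, width, channels) and the detector is a single linear template. The names `score_level` and `detect` are hypothetical and are not part of any released `featpyramid` code.

```python
import numpy as np


def score_level(features, template):
    """Score a linear template at every valid position of one pyramid level."""
    fh, fw, _ = features.shape
    th, tw, _ = template.shape
    # Levels smaller than the template yield an empty score map.
    out = np.zeros((max(fh - th + 1, 0), max(fw - tw + 1, 0)))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = features[y:y + th, x:x + tw]
            out[y, x] = float(np.sum(window * template))
    return out


def detect(pyramid, template, threshold):
    """Return (level, y, x, score) for every window scoring above threshold."""
    detections = []
    for level, features in enumerate(pyramid):
        scores = score_level(features, template)
        ys, xs = np.nonzero(scores > threshold)
        for y, x in zip(ys, xs):
            detections.append((level, int(y), int(x), float(scores[y, x])))
    return detections
```

Because the template has a fixed size, running it over every level is what lets it match objects of different sizes in the original image.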


Sliding-Window Detection on HOG Pyramids

• Can add parts, if desired
• 33% mAP on PASCAL 07 detection

Efficiency of HOG Pyramids

Pyramid = 8x the input image size

Typical settings:
• 5 octaves
• 10 scales per octave
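A rough check of the 8x figure, assuming the finest level is at the input resolution and each level shrinks both sides by a factor of 2^(1/10), so that 10 levels span one octave. The function names are hypothetical; actual implementations may add upsampled levels or stop early, which changes the total.

```python
def pyramid_scales(octaves=5, scales_per_octave=10):
    """Per-side scale factor of every level, finest level first."""
    step = 2.0 ** (-1.0 / scales_per_octave)
    return [step ** i for i in range(octaves * scales_per_octave)]


def relative_pixel_count(scales):
    """Total pixels over all levels, as a multiple of the input image."""
    # Pixel count scales with the square of the per-side scale factor.
    return sum(s * s for s in scales)


scales = pyramid_scales()
total = relative_pixel_count(scales)
print(f"{len(scales)} levels, {total:.2f}x the input pixels")
```

Under these assumptions the 50 levels add up to about 7.7x the input pixels, in line with the roughly 8x quoted on the slide.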


Sliding-Window Detection on ConvNet Pyramids

Pyramid = 8x the input image size

Efficiency of HOG + Accuracy of Deep Learning

Easy to use:

pyra = convnet_featpyramid(image)


Implementing ConvNet Pyramids

State-of-the-art ConvNet implementations (e.g. Caffe):
• Can handle any input image size
• BUT, need batches of same-sized images to saturate GPU
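One simple way to turn differently sized pyramid levels into a same-sized batch is to zero-pad every level to the size of the largest one. This is only an illustrative sketch and not necessarily how DenseNet packs its pyramid; `pad_levels_to_batch` is a hypothetical helper, and the (height, width, channels) array layout is an assumption.

```python
import numpy as np


def pad_levels_to_batch(levels):
    """Zero-pad image levels to a common size and stack them into one batch.

    Each level is an array of shape (height, width, channels); the result has
    shape (num_levels, max_height, max_width, channels).
    """
    max_h = max(level.shape[0] for level in levels)
    max_w = max(level.shape[1] for level in levels)
    channels = levels[0].shape[2]
    batch = np.zeros((len(levels), max_h, max_w, channels), dtype=levels[0].dtype)
    for i, level in enumerate(levels):
        h, w = level.shape[:2]
        # Place each level in the top-left corner; the rest stays zero.
        batch[i, :h, :w, :] = level
    return batch
```

Padding spends computation on empty pixels in the smaller levels, which is the price this scheme pays for uniform batch shapes.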


Implementing ConvNet Pyramids

Easy to use:

pyra = convnet_featpyramid(image)


Computational Performance

Selective Search
• 2000 windows = 100x the input image size
• 1/10 fps

Pyramids
• Pyramid = 8x the input image size
• 1 fps
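The figures above can be put side by side with a little arithmetic. The sketch below uses only the numbers quoted on the slide (100x vs. 8x the input pixels, 1/10 fps vs. 1 fps); the variable and function names are hypothetical.

```python
from fractions import Fraction

# Figures as quoted on the slide.
SELECTIVE_SEARCH = {"relative_pixels": 100, "fps": Fraction(1, 10)}
PYRAMID = {"relative_pixels": 8, "fps": Fraction(1)}


def seconds_per_image(fps):
    """Time to process one image at the given frame rate."""
    return 1 / fps


# How many times fewer pixels the ConvNet has to process with a pyramid.
pixel_reduction = Fraction(SELECTIVE_SEARCH["relative_pixels"], PYRAMID["relative_pixels"])
# How many times faster the pyramid approach runs.
speedup = PYRAMID["fps"] / SELECTIVE_SEARCH["fps"]

print(f"pixels processed: {float(pixel_reduction)}x fewer")
print(f"speedup: {float(speedup)}x")
print(f"seconds per image: {seconds_per_image(SELECTIVE_SEARCH['fps'])} vs {seconds_per_image(PYRAMID['fps'])}")
```

The 12.5x drop in pixels and the 10x gain in frame rate are of the same order, consistent with the amount of pixel data pushed through the ConvNet dominating the runtime.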


Future Applications

for each of the 6000 papers citing HOG:
    pyra = featpyramid(image) # HOG Pyramid
    pyra = convnet_featpyramid(image)

• Exemplar-SVM (Alyosha Efros)
• RGB-D Recognition (Saurabh Gupta)
• Tracking Algorithms (TTI-Japan)
