[CV2019/PaperSummary] EfficientDet: Scalable and Efficient Object Detection

  1. Bi Directional Feature Pyramid Network (BiFPN)
  2. A compound scaling method that uniformly scales resolution, depth, and width for the backbone, feature network, and box/class prediction networks at the same time.
  1. BiFPN for easy and fast multi feature fusion
  2. Compound scaling to jointly scale the backbone, the feature and prediction networks, and the resolution
  3. EfficientDet = BiFPN + compound scaling, for better accuracy and efficiency across a wide variety of resource constraints
  1. Multi-scale Feature Fusion
  2. Efficient bi-directional Cross Scale connections
  3. Weighted Feature Fusion
Fig1: Feature network design. (a) FPN introduces a top-down pathway to fuse multi-scale features from level 3 to 7 (P3 to P7); (b) PANet adds an additional bottom-up pathway on top of FPN; (c) NAS-FPN uses neural architecture search to find an irregular feature network topology; (d)-(f) are three alternatives studied in this paper. (d) adds expensive connections from all input features to output features; (e) simplifies PANet by removing nodes if they only have one input edge; (f) is our BiFPN with better accuracy and efficiency trade-offs.
Eq-a: Multi-scale fusion formula
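As a reading aid, the top-down FPN aggregation of Fig1(a) can be written roughly as below. The notation is assumed here rather than quoted: P_l^in and P_l^out are the input and output features at level l, Resize is an up- or downsampling step that matches resolutions, and Conv is a convolution applied after the sum.

```latex
\begin{aligned}
P_7^{out} &= \mathrm{Conv}\left(P_7^{in}\right) \\
P_6^{out} &= \mathrm{Conv}\left(P_6^{in} + \mathrm{Resize}\left(P_7^{out}\right)\right) \\
          &\;\;\vdots \\
P_3^{out} &= \mathrm{Conv}\left(P_3^{in} + \mathrm{Resize}\left(P_4^{out}\right)\right)
\end{aligned}
```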
  1. Remove nodes that have only one input edge. The intuition: a node with a single input edge performs no feature fusion, so it contributes less to a feature network whose purpose is to fuse different features.
  2. Add an extra edge from the original input node to the output node when they are at the same level, in order to fuse more features without adding much cost.
  3. Treat each bidirectional (top-down and bottom-up) path as one feature network layer and repeat it multiple times to enable more high-level feature fusion, as in Fig1(f). The compound scaling method determines the number of layers for each resource constraint.
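The three steps above can be sketched as a toy layer in plain Python. This is a simplified illustration, not the authors' implementation: features are 1-D lists whose length halves at each level, `fuse` is a plain average standing in for weighted fusion plus convolution, and the helper names (`upsample`, `downsample`, `fuse`, `bifpn_layer`, `bifpn`) are hypothetical.

```python
def upsample(x):
    """Nearest-neighbour 2x upsampling of a 1-D feature vector."""
    return [v for v in x for _ in range(2)]


def downsample(x):
    """2x downsampling by max-pooling adjacent pairs."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


def fuse(*feats):
    """Plain average of equally sized inputs (stand-in for weighted fusion + conv)."""
    return [sum(vals) / len(vals) for vals in zip(*feats)]


def bifpn_layer(p_in):
    """One bidirectional layer; p_in maps level number -> feature vector."""
    lo, hi = min(p_in), max(p_in)

    # Top-down pass. The top level has a single input edge, so it gets no
    # intermediate node (step 1) and its input is passed down directly.
    td = {hi: p_in[hi]}
    for lvl in range(hi - 1, lo, -1):
        td[lvl] = fuse(p_in[lvl], upsample(td[lvl + 1]))

    # Bottom-up pass. Inner levels also receive their original input through
    # the extra same-level edge (step 2).
    out = {lo: fuse(p_in[lo], upsample(td[lo + 1]))}
    for lvl in range(lo + 1, hi):
        out[lvl] = fuse(p_in[lvl], td[lvl], downsample(out[lvl - 1]))
    out[hi] = fuse(p_in[hi], downsample(out[hi - 1]))
    return out


def bifpn(p_in, num_layers):
    """Repeat the bidirectional layer num_layers times (step 3)."""
    feats = p_in
    for _ in range(num_layers):
        feats = bifpn_layer(feats)
    return feats


# Toy inputs for levels P3..P7 with lengths 64, 32, 16, 8, 4.
features = {lvl: [float(lvl)] * (64 >> (lvl - 3)) for lvl in range(3, 8)}
outputs = bifpn(features, num_layers=3)
```

Each output level keeps the resolution of its input level, which is what lets the layer be stacked repeatedly.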
  1. Unbounded fusion:
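Unbounded fusion sums the input features after multiplying each by a learnable scalar weight, O = Σ w_i · I_i, with no constraint on the weights. A minimal sketch, assuming the inputs are already resized to the same shape and flattened to lists; the function name `unbounded_fusion` and the weight values are made up for illustration.

```python
def unbounded_fusion(inputs, weights):
    """Return O = sum_i w_i * I_i for equally sized flat feature vectors."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    size = len(inputs[0])
    if any(len(feat) != size for feat in inputs):
        raise ValueError("inputs must be resized to the same shape first")
    return [
        sum(w * feat[k] for w, feat in zip(weights, inputs))
        for k in range(size)
    ]


# Two toy features at the same level; the weights are arbitrary examples.
p_in = [1.0, 1.0, 1.0, 1.0]
p_td = [2.0, 2.0, 2.0, 2.0]
fused = unbounded_fusion([p_in, p_td], weights=[0.5, 3.0])
```

Because nothing bounds the weights, the scale of the fused output can grow freely during training, which is the instability risk associated with this variant.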
Figure 2: EfficientDet architecture. It employs EfficientNet [5] as the backbone network, BiFPN as the feature network, and a shared class/box prediction network. Both BiFPN layers and class/box net layers are repeated multiple times based on different resource constraints as shown in Table 1.
  1. Backbone network: The authors reuse the same width/depth scaling coefficients as EfficientNet-B0 to B6 [5], so that they can easily reuse the ImageNet-pretrained checkpoints.
  2. BiFPN network: The authors grow the BiFPN width (#channels) exponentially, as done in EfficientNets, but increase the depth (#layers) linearly, since depth needs to be rounded to small integers.
Eq1: Width equation of BiFPN network
Eq2: Depth equation of BiFPN network
Eq3: Input image resolution equation
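The scaling rules can be sketched as one small function of the compound coefficient φ. The constants below (base width 64, growth factor 1.35, base resolution 512, step 128, head depth 3 + ⌊φ/3⌋) are recalled from the paper and should be checked against Table 1; the BiFPN base depth is left as a parameter because its value is an assumption here. The function name `efficientdet_scaling` is hypothetical.

```python
def efficientdet_scaling(phi, bifpn_base_depth=3):
    """Return un-rounded scaling values for compound coefficient phi.

    bifpn_base_depth is an assumed default, not a value taken from the text.
    """
    return {
        # Width (#channels) grows exponentially with phi.
        "bifpn_width": 64 * (1.35 ** phi),
        # Depth (#layers) grows linearly with phi.
        "bifpn_depth": bifpn_base_depth + phi,
        # Box/class prediction net depth grows slowly (integer division).
        "head_depth": 3 + phi // 3,
        # Input resolution grows linearly in steps of 128 pixels.
        "input_resolution": 512 + 128 * phi,
    }


for phi in range(4):
    print(phi, efficientdet_scaling(phi))
```

The width printed here is the raw value; published configurations round it to a hardware-friendly channel count, so it will not match a table entry digit for digit.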
Table 1: Scaling configs for EfficientDet D0-D7. φ is the compound coefficient that controls all other scaling dimensions; BiFPN, box/class net, and input size are scaled up using equations 1, 2, and 3 respectively. D7 has the same settings as D6 except for a larger input size.
Table 2: EfficientDet performance on COCO. Results are for single-model single-scale. #Params and #FLOPS denote the number of parameters and multiply-adds. LAT denotes inference latency with batch size 1. AA denotes auto-augmentation [6]. We group models together if they have similar accuracy, and compare the ratio or speedup between EfficientDet and other detectors in each group.
Fig3: Model size and inference latency comparison. Latency is measured with batch size 1 on the same machine equipped with a Titan V GPU and Xeon CPU. AN denotes AmoebaNet + NAS-FPN trained with auto-augmentation [6]. Our EfficientDet models are 4x to 6.6x smaller, 2.3x to 3.2x faster on GPU, and 5.2x to 8.1x faster on CPU than other detectors.
Table 3: Disentangling backbone and BiFPN
Table 4: Comparison of different feature networks
Table 5: Comparison of different feature fusion
Fig4: Comparison of different scaling methods
  1. The proposed weighted bidirectional feature network and the customized compound scaling method improve both the accuracy and the efficiency of the object detector.
  2. These optimizations led to a new family of detectors, named EfficientDet.
  3. EfficientDet-D7 achieves state-of-the-art accuracy with an order-of-magnitude fewer parameters and FLOPS than the best existing detector.
  4. EfficientDet is also up to 3.2x faster on GPUs and 8.1x faster on CPUs.

Final Words ….


