[CVPR2021/PaperSummary] YOLOX: Exceeding YOLO Series in 2021

Fig 1: Speed-accuracy trade-off of accurate models (top) and size-accuracy curve of lite models (bottom)


Real-time object detection always involves a trade-off between speed and accuracy. YOLOv5 holds the best trade-off, with 48.2% AP on COCO at 13.7 ms, but recent research has shifted toward newer detector designs. The sections below summarize how YOLOX brings those advances into the YOLO series.

2. Proposed Architecture

2.1. YOLOX-DarkNet53

The baseline is trained with the following settings:

  1. EMA weight updating
  2. Cosine LR schedule
  3. IoU loss together with an IoU-aware branch
  4. BCE loss for training the cls and obj branches
  5. IoU loss for training the reg branch
  6. RandomHorizontalFlip, ColorJitter, and multi-scale training for data augmentation
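
The training ingredients listed above can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the decay value of 0.999, the base learning rate, the absence of warmup, and the `1 - IoU` form of the IoU loss are all assumptions chosen for illustration.

```python
import math

def ema_update(ema_weights, weights, decay=0.999):
    """Blend the current weights into a running EMA copy (decay is an assumed value)."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]

def cosine_lr(step, total_steps, base_lr=0.01, min_lr=0.0):
    """Cosine learning-rate schedule without warmup (base_lr and min_lr are assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability (cls or obj branch)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred_box, gt_box):
    """One common form of IoU loss for the reg branch: 1 - IoU (an assumption here)."""
    return 1.0 - iou(pred_box, gt_box)

if __name__ == "__main__":
    print(ema_update([1.0, 2.0], [0.0, 0.0], decay=0.9))
    print(cosine_lr(step=50, total_steps=100))
    print(bce_loss(0.9, 1.0))
    print(iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

In a real training loop the EMA copy would be updated after every optimizer step and used for evaluation, while the learning rate would be recomputed from the current step.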
Tab 1: Roadmap of YOLOX-DarkNet53 in terms of AP (%) on COCO val.
Fig 2: Training curves for detectors with YOLOv3 head or decoupled head.
Tab 2: The effect of decoupled head for end-to-end YOLO in terms of AP (%) on COCO.
Fig 3: Illustration of the difference between YOLOv3 head and the proposed decoupled head.
Eq 1: SimOTA cost function
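
For readers without the paper at hand, the SimOTA cost can be written out as follows. The notation follows the YOLOX paper as best it can be restated here: the cost of pairing ground truth g_i with prediction p_j is the classification loss plus a weighted regression loss, with λ acting as a balancing coefficient.

```latex
c_{ij} = L_{ij}^{\mathrm{cls}} + \lambda \, L_{ij}^{\mathrm{reg}}
```

Here L_{ij}^{cls} and L_{ij}^{reg} are the classification and regression losses between g_i and p_j. For each ground truth, the k predictions with the lowest cost inside a fixed center region are taken as positive samples, and k can differ from one ground truth to another.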

2.2. Other Backbones

Besides DarkNet53, the authors also test YOLOX with other backbones of different sizes, where it consistently outperforms the corresponding counterparts.

Table 3: Comparison of YOLOX and YOLOv5
Table 4: Comparison of YOLOX-Tiny and YOLOX-Nano
Table 5: Effects of data augmentation under different model sizes.

3. Comparison with the SOTA

Tab. 6 gives the SOTA comparison, while Fig. 1 plots a somewhat controlled speed/accuracy curve. Some high-performance YOLO-series models with larger model sizes, such as Scaled-YOLOv4 [5] and YOLOv5-P6 [11], are also noted, and current Transformer-based detectors [9] push the accuracy SOTA to ∼60 AP.

Table 6: Comparison of the speed and accuracy of different object detectors on COCO 2017 test-dev

4. Conclusion

The authors apply several recent advances, namely a decoupled head, an anchor-free design, and an advanced label assignment strategy, to the YOLOv3 architecture, which is still one of the most widely used detectors in industry due to its broad compatibility. The resulting architecture, YOLOX, achieves a better trade-off between speed and accuracy than its counterparts across all model sizes.

Writer’s Conclusion


  • Implements advanced techniques that boost performance.
  • Offers a good trade-off between accuracy and speed, which makes it usable for real-time applications.
  • The architecture is implemented consistently across the whole series, i.e. for both large and small models.
  • Incorporating attention-based methods would likely have boosted performance further.
  • Needs better accuracy on small-object detection datasets and on smaller objects in general.
  • Fails in night scenes and in heavily occluded scenarios.


[1] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.


