Real-time object detection is the task of detecting objects in video as the frames arrive. Many network architectures have been published over the years: in our previous article we discussed EfficientDet, which has already been outperformed by YOLOv4. Today we are going to discuss YOLOv5.
YOLO, short for “You Only Look Once”, is one of the most versatile and famous object detection models. For real-time object detection work, YOLO is often the first choice of data scientists and machine learning engineers. YOLO algorithms divide each input image into an S×S grid. Each grid cell is responsible for detecting the objects whose center falls inside it, and it predicts the bounding boxes for those objects. Every box has five main attributes: x and y for the center coordinates, w and h for the width and height of the object, and a confidence score for the probability that the box contains an object.
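To make the grid idea concrete, here is a minimal sketch of how a box's center selects the grid cell responsible for it. The names `Box` and `responsible_cell`, the normalized-coordinate convention, and the default grid size of 7 (the value used in the original YOLO paper) are assumptions made for illustration; this is not code from any YOLO release.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Box:
    """The five attributes predicted for every box (hypothetical container)."""
    x: float           # center x, normalized to [0, 1] of the image width
    y: float           # center y, normalized to [0, 1] of the image height
    w: float           # box width, normalized to the image width
    h: float           # box height, normalized to the image height
    confidence: float  # probability that the box contains an object


def responsible_cell(box: Box, s: int = 7) -> Tuple[int, int]:
    """Return (row, col) of the S x S grid cell that contains the box center."""
    col = min(int(box.x * s), s - 1)  # clamp so that x == 1.0 stays inside the grid
    row = min(int(box.y * s), s - 1)
    return row, col


box = Box(x=0.52, y=0.31, w=0.20, h=0.40, confidence=0.9)
print(responsible_cell(box))  # (2, 3)
```

With a 7×7 grid, a center at (0.52, 0.31) lands in column 3 and row 2, so that cell would be the one expected to predict this box.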
History
YOLO v1
YOLO v1 was introduced in 2016 by Joseph Redmon et al. in a research paper called “You Only Look Once: Unified, Real-Time Object Detection”. This was the initial paper by Redmon, and it revolutionized the industry and completely changed real-time object detection methods.
By looking at the image just once, it can detect objects at a speed of 45 fps (frames per second). Another variant, Fast YOLOv1, was able to achieve 155 fps with slightly lower accuracy.
Architecture
YOLOv1 used the Darknet framework for training and inference, and its convolutional layers were pretrained on the ImageNet 1000-class dataset. But YOLOv1 has several limitations, such as:
- it can’t detect objects properly when they are small
- it also struggles to generalize to objects with new or unusual aspect ratios
YOLOv2(YOLO9000)
YOLOv2, the second version of YOLO, was released in 2017 by Joseph Redmon and Ali Farhadi. This time Joseph collaborated with Ali on major fixes and accuracy improvements. The research paper they published was “YOLO9000: Better, Faster, Stronger”, and the second version of YOLO is also referred to as YOLO9000. The major competitors of YOLO9000 were Faster R-CNN, an object detection algorithm that uses a Region Proposal Network, and SSD (Single Shot MultiBox Detector), both of which identify multiple objects in an image.
Some of the features of YOLOv2 are:
- Batch normalization on all convolutional layers, which normalizes each layer's inputs over the mini-batch.
- Higher-resolution input: the input size used when pretraining the classifier has been increased from 224×224 to 448×448.
- Anchor boxes.
- Multi-Scale training.
- Darknet-19 architecture with 19 convolutional layers and 5 max-pooling layers.
YOLOv2 performance on MS COCO dataset
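The anchor boxes listed among the YOLOv2 features can be illustrated with a small sketch. In the YOLO9000 paper, the network predicts offsets (tx, ty, tw, th) relative to a grid cell at (cx, cy) and an anchor (prior) of size (pw, ph); the box is then bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw·e^tw, bh = ph·e^th. The function name `decode_box` and the sample numbers below are assumptions for illustration only, and all values are in grid-cell units.

```python
import math
from typing import Tuple


def sigmoid(t: float) -> float:
    """Logistic function, squashes t into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(
    t: Tuple[float, float, float, float],
    cell: Tuple[int, int],
    anchor: Tuple[float, float],
) -> Tuple[float, float, float, float]:
    """Turn raw offsets into a box (center x, center y, width, height)."""
    tx, ty, tw, th = t
    cx, cy = cell    # top-left corner of the grid cell
    pw, ph = anchor  # width and height of the anchor (prior) box
    bx = sigmoid(tx) + cx   # the sigmoid keeps the center inside the cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # the anchor is scaled, never made negative
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Zero offsets give the cell center and the unchanged anchor size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 2), anchor=(1.5, 2.0)))
# (3.5, 2.5, 1.5, 2.0)
```

Because the center offset passes through a sigmoid, the predicted center cannot leave its grid cell, which is the constraint the paper credits with making training more stable.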
YOLOv3
A year later, on March 25, 2018, Joseph Redmon and Ali Farhadi came up with another version of YOLO and a research paper called “YOLOv3: An Incremental Improvement”.
At 320×320, YOLOv3 runs in 22 ms at 28.2 mAP, as shown in the video above. That is as accurate as SSD but three times faster, and at similar accuracy it is almost four times faster than RetinaNet.
YOLOv3 followed the methodology of the previous version, YOLOv2 (YOLO9000). In this approach, Redmon uses the Darknet-53 architecture, a significantly improved backbone with 53 convolutional layers.
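Some of the new, improved features in YOLOv3 were:
- Class predictions made with independent logistic classifiers instead of a softmax, so a box can carry more than one label
- Predictions across three scales, using a concept similar to Feature Pyramid Networks (FPN)
- Darknet-53 architecture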
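On the class predictions point: the YOLOv3 paper replaces the softmax with independent logistic classifiers, which matters when labels overlap. The sketch below contrasts the two on a single box; the logit values, the label names, and the 0.5 threshold are hypothetical and chosen only to show the effect.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Mutually exclusive class probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def independent_sigmoids(logits: List[float]) -> List[float]:
    """One logistic classifier per class; the scores do not compete."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Hypothetical raw scores for the labels "person", "woman", "car".
logits = [4.0, 3.5, -2.0]
threshold = 0.5

soft = softmax(logits)
sig = independent_sigmoids(logits)

print([p > threshold for p in soft])  # [True, False, False]
print([p > threshold for p in sig])   # [True, True, False]
```

With softmax, "person" and "woman" compete for the same probability mass, so only one passes the threshold. With independent sigmoids both can be kept, which suits datasets whose labels overlap.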
YOLOv4
Since Redmon had stepped away from computer vision (CV) work, a new team of three developers released YOLOv4: Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. Alexey is the one who developed the Windows version of YOLO back in the day.
YOLOv4 runs twice as fast as EfficientDet with comparable performance, as shown in the diagram below, which was published in the YOLOv4 research paper.
Some of the new features of YOLOv4 are:
- Anyone with a 1080 Ti or 2080 Ti GPU can train and run the YOLOv4 model.
- YOLOv4 includes CBN (Cross-Iteration Batch Normalization) and PAN (Path Aggregation Network) methods.
- Weighted-Residual-Connections (WRC).
- Cross-Stage-Partial connections (CSP), a new backbone to enhance the CNN (convolutional neural network).
- Self-adversarial-training (SAT): a new data augmentation technique.
- DropBlock regularization.
YOLOv5
On 27 May 2020, shortly after the release of the YOLOv4 model, YOLOv5 was released by Glenn Jocher (Founder & CEO of Ultralytics). It was publicly released on GitHub. Glenn introduced a PyTorch-based approach, and yes, YOLOv5 is written in the PyTorch framework.