Object Detection: The YOLO Algorithm in AI

Written by Mason Kramer

Have you heard of the YOLO algorithm? Interested in knowing how it is used in Object Detection? Read on to learn how it is used to track objects!

About Real Time Object Detection and the YOLO Algorithm

Object detection is a subset of computer vision with numerous applications: it allows us to detect and track objects of a given class. These capabilities are essential to fields such as facial recognition, video surveillance, and autonomous vehicles.

Today, we see a variety of approaches to accomplish this task, yet none have stood out quite like the “You Only Look Once” method. Also known as YOLO, this method was originally developed in 2016 and continues to be improved and updated today.

The YOLO method was designed for real-time object detection, mimicking the human visual system. The approach is fast and accurate, with applications extending far beyond what was once thought possible with artificial intelligence.

The YOLO Algorithm and Self-Driving Cars

YOLO could allow computers to drive cars without any specialized sensors, convey real-time scene information to the user, and support truly responsive robotics systems.

YOLO represents the new wave of AI in the field of object detection, so it is critical to understand how the method works and how it improves upon more traditional systems.

How the YOLO Algorithm Works

The YOLO algorithm is based on a convolutional neural network, a class of neural networks most commonly applied to analyzing visual imagery.

The development of convolutional neural networks was inspired by the organization of neurons in an animal’s visual cortex. While these networks use relatively little pre-processing compared to other image classification algorithms, the input images still have to be reformatted to 416 x 416 pixels.

To do this, the image is resized and a grey fill is added so as not to change the aspect ratio. After the image is resized, it is divided into a grid of cells. Each cell is responsible for two tasks: deciding what is in the cell and predicting one bounding box for the object it has detected.
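This resizing step is often called letterboxing. Below is a minimal sketch of it using Pillow; the 416 x 416 target size comes from the article, while the particular grey value (128) and the function name are assumptions made for illustration.

```python
from PIL import Image

def letterbox(image: Image.Image, target: int = 416) -> Image.Image:
    """Resize an image to fit in a target x target square without changing
    its aspect ratio, padding the leftover space with grey."""
    scale = target / max(image.width, image.height)
    new_w, new_h = int(image.width * scale), int(image.height * scale)
    resized = image.resize((new_w, new_h))

    # Paste the resized image onto a grey canvas, centered.
    canvas = Image.new("RGB", (target, target), (128, 128, 128))
    canvas.paste(resized, ((target - new_w) // 2, (target - new_h) // 2))
    return canvas

# Example: letterbox(Image.open("street.jpg")) yields a 416 x 416 input
# ready to be split into a grid of cells.
```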

AI Object Identification

These detections are made at three different scales, and each scale is responsible for detecting a different size of object. The first detection happens at a 13 x 13 grid resolution and catches the large objects. Next, the 26 x 26 detection happens and catches medium-sized objects.

Finally, the last detection happens at a 52 x 52 grid resolution and catches the smallest objects in the image. After this step, the algorithm has located what it believes to be all of the objects in the image, yet it still has to predict each object’s class and define its borders.
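The three grid sizes follow directly from the 416 x 416 input. The short sketch below shows the relationship; the strides (32, 16, 8) are the downsampling factors commonly used in YOLOv3 and should be treated as an assumption of this sketch.

```python
# How the three detection grids relate to the 416 x 416 input image.
INPUT_SIZE = 416
STRIDES = {"large objects": 32, "medium objects": 16, "small objects": 8}

for target, stride in STRIDES.items():
    cells = INPUT_SIZE // stride
    print(f"{target}: {cells} x {cells} grid ({cells * cells} cells)")

# large objects: 13 x 13 grid (169 cells)
# medium objects: 26 x 26 grid (676 cells)
# small objects: 52 x 52 grid (2704 cells)
```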

The AI now must make a prediction.

AI Makes Prediction

This prediction is made using the aforementioned convolutional neural network, which treats detection as a regression problem. Each cell is rated with a list of probabilities, one for each class, and the class with the highest probability becomes the YOLO algorithm’s prediction for that cell. Now bounding boxes can be formed.
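In plain terms, the per-cell decision is just “pick the most probable class.” Here is a tiny sketch of that step; the class names and scores are made up purely for illustration.

```python
# One cell's class probabilities, one value per class (illustrative numbers).
class_names = ["person", "car", "bicycle", "dog"]
cell_scores = [0.05, 0.85, 0.07, 0.03]

# The class with the highest probability becomes the prediction for this cell.
best = max(range(len(class_names)), key=lambda i: cell_scores[i])
print(class_names[best], cell_scores[best])  # car 0.85
```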

Bounding Boxes and Thresholds

Bounding boxes are highlighted borders that contain an identified object. Each bounding box has six attributes. Four of them define the shape of the box: the x and y values of its center coordinate and the height and width of the box. The other two attributes are the predicted class of the bounding box and the AI’s confidence that the object is contained within it. However, each cell can propose many bounding boxes, which is why some of them must be filtered out.
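The six attributes can be bundled into a small record like the one below. The field names are an assumption for this sketch; real YOLO implementations store the same information in different layouts.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    x: float           # x coordinate of the box center
    y: float           # y coordinate of the box center
    width: float       # width of the box
    height: float      # height of the box
    class_id: int      # predicted class of the object inside the box
    confidence: float  # the AI's confidence that the object is contained within
```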

Thresholds are used to determine which bounding boxes are worth keeping. The two thresholds are the “Object Threshold” and the “Overlap Threshold.” The Object Threshold is the minimum confidence the AI must have in its prediction. The Overlap Threshold makes sure that no two bounding boxes are kept for the same object, which matters because multiple boxes can be proposed for a single object. By fine-tuning these thresholds, the program is left with a cleaner, more accurate result.
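A simplified sketch of both filters follows: boxes below the object threshold are dropped, and when two boxes overlap too much, only the more confident one is kept (a basic form of non-maximum suppression). The threshold values and the corner-format box representation are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_boxes(boxes, object_threshold=0.5, overlap_threshold=0.45):
    """boxes: list of (corners, confidence) pairs, e.g. ((x1, y1, x2, y2), 0.9)."""
    # Object Threshold: discard low-confidence predictions.
    boxes = [b for b in boxes if b[1] >= object_threshold]
    # Overlap Threshold: keep only the most confident box among overlapping ones.
    boxes.sort(key=lambda b: b[1], reverse=True)
    kept = []
    for corners, conf in boxes:
        if all(iou(corners, k[0]) < overlap_threshold for k in kept):
            kept.append((corners, conf))
    return kept
```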

After this step is completed, the YOLO algorithm has identified the locations, boundaries, and class of each object both quickly and accurately.

The YOLO Algorithm in Object Detection

The YOLO algorithm runs at 65 frames per second, which means that, assuming you read this post at the average reading speed of about 300 words per minute, the YOLO algorithm could have processed over 9,568 images in the time it took you to get here. The applications for such a technology are limitless, spanning from autonomous driving to wildlife identification.
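For a quick back-of-the-envelope check of that figure, the arithmetic looks like this; the post length of roughly 736 words is an assumption used only to make the numbers work out.

```python
words = 736           # assumed length of this post
reading_speed = 300   # words per minute
fps = 65              # frames YOLO processes per second

reading_time_seconds = words / reading_speed * 60  # about 147 seconds
frames_processed = reading_time_seconds * fps      # about 9,568 images
print(round(frames_processed))
```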

The YOLO algorithm will play a huge role in object detection in the future, no matter the discipline to which it is applied.
