|
First, it divides the image into a 13×13 grid of cells.
The size of these 169 cells vary depending on the size of the input.
For a 416×416 input size that we used in our experiments, the cell size was 32×32.
Each cell is then responsible for predicting a number of boxes in the image.
For each bounding box, the network also predicts the confidence that the bounding box actually encloses an object, and the probability of the enclosed object being a particular class.
Most of these bounding boxes are eliminated because their confidence is low or because they are enclosing the same object as another bounding box with very high confidence score.
This technique is called non-maximum suppression.
The authors of YOLOv3, Joseph Redmon and Ali Farhadi, have made YOLOv3 faster and more accurate than their previous work YOLOv2. YOLOv3 handles multiple scales better.
FThey have also improved the network by making it bigger and taking it towards residual networks by adding shortcut connections.
|