January 18, 2025

General Segmentation Algorithms

  1. Semantic Segmentation
    • In semantic segmentation, the task is to predict which class each pixel in the image belongs to. The implementation is very simple, and it’s possible to classify all pixels with a single computational flow. Thus, one can say that the computation speed is fast. However, on the other hand, it cannot perform instance segmentation. Therefore, it can’t determine which object a pixel belongs to.
  2. Instance Segmentation
    • Instance segmentation, represented by algorithms like MaskRCNN, is a two-stage method where objects are detected and then cropped, and pixels within the cropped area are classified. By zooming in and segmenting, the accuracy of segmentation is very high. However, due to the two-stage computational method, it often takes more time to compute.

Segmentation in YOLOv8

However, YOLO’s computational procedure is entirely different, making it difficult to incorporate the aforementioned methods. Therefore, YOLOv8 employs a different approach for segmentation. This method seems to be the same as the one called “YOLOACT.” In YOLOACT, as illustrated below (though no diagram is provided), there are two computational flows, and the segmentation mask is estimated through the following steps:

  1. In one computational flow, it determines the object’s class, BBOX’s position, height, width, and mask coefficients.
  2. In another computational flow, it predicts several segmentation masks called prototypes.
  3. The predicted segmentation mask is multiplied by the mask coefficient, and through additive linear combination, the resultant segmentation mask is estimated.
mask_{result} = \sum_{i} c_i \times \text{mask}_{i}

This approach allows YOLOv8 to perform segmentation tasks effectively while maintaining the real-time capabilities characteristic of the YOLO architecture.