Validating YOLOv8 Detection, Segmentation, and Pose Accuracy#

Introduction#

This tutorial demonstrates how to validate the accuracy (mAP 0.50:0.95) of pretrained YOLOv8 checkpoints on the COCO dataset. YOLOv8, developed by Ultralytics, is a family of state-of-the-art models for object detection, instance segmentation, and pose estimation. The tutorial is suitable for users who wish to validate publicly available models or their own custom-trained ones.

Environment Setup#

First, create a new Python virtual environment with the MemryX SDK. Instructions can be found in Installing MemryX SDK Tools. This tutorial assumes Python 3.10 is being used. Next, install the ultralytics package:

pip install ultralytics
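
The validators later in this tutorial also use onnxruntime to run the post-processing graph on the host. If it is not already present in your environment, install it alongside ultralytics:

pip install onnxruntime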

Download and Compile Model#

When you initialize a YOLO model, the specified checkpoint is downloaded automatically if it is not already available locally. YOLOv8 models come in various sizes (‘n’, ‘s’, ‘m’, ‘l’, ‘x’). For this tutorial, we’ll use the medium size (‘m’). Load the checkpoint for the application you are validating:

from ultralytics import YOLO

model = YOLO("weights/yolov8m.pt")       # Detection
model = YOLO("weights/yolov8m-seg.pt")   # Segmentation
model = YOLO("weights/yolov8m-pose.pt")  # Pose

You can optionally run validation on the COCO dataset using your CPU or GPU. If the COCO dataset is not already on your system, it will be downloaded (approximately 20 GB). Running this validation establishes the baseline mAP 0.50:0.95 of 50.2% for the medium detection model:

model.val()

Sample output:

...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.502
...

Before compiling, export the model to a supported format such as ONNX:

model.export(format='onnx', simplify=True, batch=1)
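
If you plan to validate all three applications, the same export step applies to each checkpoint. For example:

from ultralytics import YOLO

# Export each checkpoint to ONNX with a fixed batch size of 1
for name in ("yolov8m", "yolov8m-seg", "yolov8m-pose"):
    YOLO(f"weights/{name}.pt").export(format="onnx", simplify=True, batch=1)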

Note

A batch size of 1 is much slower but greatly simplifies the rest of the implementation. Running larger batches requires modifying the dataloader to yield only full batches and modifying the processing steps between the accelerator and ONNX runtimes. Refer to High Precision Output Channels with YOLOv7 for an example.

To compile the model for use with MemryX accelerators, navigate to the weights folder and execute the following commands (alternatively, use the MemryX Python API):

mx_nc -v --autocrop -m yolov8m.onnx
mx_nc -v --autocrop -m yolov8m-seg.onnx
mx_nc -v --autocrop -m yolov8m-pose.onnx

For the detection model, this outputs yolov8m.dfp, which contains the main body of the model to run on the accelerator, and yolov8m_post.onnx, which contains the post-processing steps to run on the host. The segmentation and pose models produce analogous file pairs.
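
If you prefer to compile from Python instead of the command line, a minimal sketch is shown below. It assumes the NeuralCompiler interface from the memryx package; parameter names may differ across SDK versions, so verify against the SDK documentation.

from memryx import NeuralCompiler

# Compile each exported model; autocrop mirrors the --autocrop CLI flag
# (parameter names assumed here, check your SDK version)
for name in ("yolov8m", "yolov8m-seg", "yolov8m-pose"):
    nc = NeuralCompiler(models=f"weights/{name}.onnx", autocrop=True, verbose=1)
    nc.run()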

Evaluation on MXA#

To run validation on the MXA, we need to define a custom validator class that the ultralytics API can use. The only method we must override is BaseValidator.__call__, which contains the validation loop. The key change is using the MXA and ONNX runtimes, rather than the torch model, to produce the model outputs.

The implementations for each application are provided below. All three closely follow the original ultralytics validation loop, with some simplifications, and are written for clarity rather than speed. As mentioned earlier, batching would speed them up considerably at the cost of a more complex implementation.

Detection:

import torch
import numpy as np
import json
from pathlib import Path

from ultralytics.models.yolo.detect.val import DetectionValidator
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import LOGGER, TQDM

import memryx as mx
import onnxruntime as ort


class MxaDetectionValidator(DetectionValidator):
    """
    The Validator must be a child of BaseValidator which is the parent
    of DetectionValidator. The BaseValidator defines the __call__
    method which we need to override.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Set required attributes
        self.stride = 32
        self.training = False

        model_name = Path(self.args.model).stem
        LOGGER.info(f"\033[32mRunning {model_name} inference on MXA\033[0m")

        # Ensure your paths/naming scheme matches
        self.mxa = mx.SyncAccl(f"weights/{model_name}.dfp")
        self.ort = ort.InferenceSession(f"weights/{model_name}_post.onnx")

    def __call__(self, model):
        model.eval()

        # Create COCO dataloader
        self.data = check_det_dataset(self.args.data)
        self.dataloader = self.get_dataloader(
            self.data.get(self.args.split), self.args.batch
        )

        # Validation Loop
        self.init_metrics(model)
        self.jdict = []
        progress_bar = TQDM(
            self.dataloader, desc=self.get_desc(), total=len(self.dataloader)
        )
        for batch in progress_bar:
            batch = self.preprocess(batch)
            preds = self.mxa_detect(batch["img"])
            preds = self.postprocess(preds)
            self.update_metrics(preds, batch)

        # Compute and print stats
        stats = self.get_stats()
        self.check_stats(stats)
        self.finalize_metrics()
        self.print_results()

        # Save predictions and evaluate on pycocotools
        with open(str(self.save_dir / "predictions.json"), "w") as f:
            LOGGER.info(f"Saving {f.name}...")
            json.dump(self.jdict, f)
        stats = self.eval_json(stats)

        return stats

    def mxa_detect(self, img):
        """
        Detection using MXA accelerator.

        Args:
            img (torch.Tensor): Input image. (1, 3, 640, 640)

        Returns:
            preds (list): List of length 2.
                preds[0] (torch.Tensor): Predictions. (1, 84, 8400)
                preds[1] (None): Unused fmaps
        Notes:
            Fj in (64, 80) and Fi in (80, 40, 20)
        """
        # Pass images through accelerator
        img = img.detach().cpu().numpy()  # (1, 3, 640, 640)
        img = np.transpose(img, (2, 3, 0, 1))  # (640, 640, 1, 3)
        accl_out = self.mxa.run(img)  # (6, Fi, Fi, Fj)

        # Process accl out for onnxruntime
        onnx_inp_names = [inp.name for inp in self.ort.get_inputs()]
        onnx_inps = [
            np.transpose(fmap, (2, 0, 1))[np.newaxis, ...] for fmap in accl_out
        ]  # (6, 1, Fj, Fi, Fi)
        input_feed = {k: v for k, v in zip(onnx_inp_names, onnx_inps)}

        # Pass fmaps through onnxruntime
        onnx_out = self.ort.run(None, input_feed)
        out = torch.from_numpy(onnx_out[0])  # (1, 84, 8400)

        preds = [out, None]
        return preds


Segmentation:

import torch
import numpy as np
import json
from pathlib import Path

from ultralytics.models.yolo.segment.val import SegmentationValidator
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import LOGGER, TQDM

import memryx as mx
import onnxruntime as ort


class MxaSegmentationValidator(SegmentationValidator):
    """
    The Validator must be a child of BaseValidator which is the parent
    of SegmentationValidator. The BaseValidator defines the __call__
    method which we need to override.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Set required attributes
        self.stride = 32
        self.training = False
        self.args.plots = False

        model_name = Path(self.args.model).stem
        LOGGER.info(f"\033[32mRunning {model_name} inference on MXA\033[0m")

        # Ensure your paths/naming scheme matches
        self.mxa = mx.SyncAccl(f"weights/{model_name}.dfp")
        self.ort = ort.InferenceSession(f"weights/{model_name}_post.onnx")

    def __call__(self, model):
        model.eval()

        # Create COCO dataloader
        self.data = check_det_dataset(self.args.data)
        self.dataloader = self.get_dataloader(
            self.data.get(self.args.split), self.args.batch
        )

        # Validation Loop
        self.init_metrics(model)
        self.jdict = []
        progress_bar = TQDM(
            self.dataloader, desc=self.get_desc(), total=len(self.dataloader)
        )
        for i, batch in enumerate(progress_bar):
            self.batch_i = i  # For plots
            batch = self.preprocess(batch)
            preds = self.mxa_segment(batch["img"])
            preds = self.postprocess(preds)
            self.update_metrics(preds, batch)

        # Compute and print stats
        stats = self.get_stats()
        self.check_stats(stats)
        self.finalize_metrics()
        self.print_results()

        # Save predictions and evaluate on pycocotools
        with open(str(self.save_dir / "predictions.json"), "w") as f:
            LOGGER.info(f"Saving {f.name}...")
            json.dump(self.jdict, f)
        stats = self.eval_json(stats)

        return stats

    def mxa_segment(self, img):
        """
        Segmentation using MXA accelerator.

        Args:
            img (torch.Tensor): Input image. (1, 3, 640, 640)

        Returns:
            preds (list): List of length 2.
                preds[0] (torch.Tensor): Boxes (1, 116, 8400)
                preds[1] (torch.Tensor): Masks (1, 32, 160, 160)

        Notes:
            For shapes: Fj in (64, 80) and Fi in (80, 40, 20)
        """
        # Pass images through accelerator
        img = img.detach().cpu().numpy()  # (1, 3, 640, 640)
        img = np.transpose(img, (2, 3, 0, 1))  # (640, 640, 1, 3)
        accl_out = self.mxa.run(img)  # (10, ...)

        # Prepare accelerator output as input to onnx post-processor
        # Reorder names to match accl output order
        onnx_inp_names = [inp.name for inp in self.ort.get_inputs()]
        for i, j in [(3, 7), (6, 8)]:
            onnx_inp_names.insert(i, onnx_inp_names.pop(j))

        # Trailing reshapes need to be handled manually
        onnx_inps = [
            np.transpose(fmap, (2, 0, 1))[np.newaxis, ...] for fmap in accl_out
        ]  # (10, 1, ...)
        input_feed = {
            name: (
                fmap
                if "Reshape" not in name
                else np.reshape(fmap, (1, fmap.shape[1], -1))
            )
            for name, fmap in zip(onnx_inp_names, onnx_inps)
        }

        onnx_out = self.ort.run(None, input_feed)
        preds = [
            torch.from_numpy(onnx_out[1]),  # Boxes (1, 116, 8400)
            torch.from_numpy(onnx_out[0]),  # Masks (1, 32, 160, 160)
        ]
        return preds


Pose:

import torch
import numpy as np
import json
from pathlib import Path

from ultralytics.models.yolo.pose.val import PoseValidator
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import LOGGER, TQDM

import memryx as mx
import onnxruntime as ort


class MxaPoseValidator(PoseValidator):
    """
    The Validator must be a child of BaseValidator which is the parent
    of PoseValidator. The BaseValidator defines the __call__ method
    which we need to override.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Set required attributes
        self.stride = 32
        self.training = False

        model_name = Path(self.args.model).stem
        LOGGER.info(f"\033[32mRunning {model_name} inference on MXA\033[0m")

        # Create MXA and Onnx runtimes
        self.mxa = mx.SyncAccl(f"weights/{model_name}.dfp")
        self.ort = ort.InferenceSession(f"weights/{model_name}_post.onnx")

    def __call__(self, model):
        model.eval()

        # Create COCO dataloader
        self.data = check_det_dataset(self.args.data)
        self.dataloader = self.get_dataloader(
            self.data.get(self.args.split), self.args.batch
        )

        # Validation Loop
        self.init_metrics(model)
        self.jdict = []
        progress_bar = TQDM(
            self.dataloader, desc=self.get_desc(), total=len(self.dataloader)
        )
        for batch in progress_bar:
            batch = self.preprocess(batch)
            preds = self.mxa_pose(batch["img"])
            preds = self.postprocess(preds)
            self.update_metrics(preds, batch)

        # Compute and print stats
        stats = self.get_stats()
        self.check_stats(stats)
        self.finalize_metrics()
        self.print_results()

        # Save predictions and evaluate on pycocotools
        with open(str(self.save_dir / "predictions.json"), "w") as f:
            LOGGER.info(f"Saving {f.name}...")
            json.dump(self.jdict, f)
        stats = self.eval_json(stats)

        return stats

    def mxa_pose(self, img):
        """
        Pose Estimation using MXA accelerator.

        Args:
            img (torch.Tensor): Input image. (1, 3, 640, 640)

        Returns:
            preds (list): List of length 2.
                preds[0] (torch.Tensor): Predictions. (1, 56, 8400)
                preds[1] (None): Unused loss output
        """
        # Pass images through accelerator
        img = img.detach().cpu().numpy()  # (1, 3, 640, 640)
        img = np.transpose(img, (2, 3, 0, 1))  # (640, 640, 1, 3)
        accl_out = self.mxa.run(img)  # (9, ...)

        # Process accl out for onnxruntime
        # Reorder names to match accl output order
        onnx_inp_names = [inp.name for inp in self.ort.get_inputs()]
        for i, j in [(2, 6), (5, 7)]:
            onnx_inp_names.insert(i, onnx_inp_names.pop(j))
        # Trailing reshapes need to be handled manually
        onnx_inps = [
            np.transpose(fmap, (2, 0, 1))[np.newaxis, ...] for fmap in accl_out
        ]  # (9, 1, ...)
        input_feed = {
            name: (
                fmap
                if "Reshape" not in name
                else np.reshape(fmap, (1, fmap.shape[1], -1))
            )
            for name, fmap in zip(onnx_inp_names, onnx_inps)
        }

        # Pass fmaps through onnxruntime
        onnx_out = self.ort.run(None, input_feed)
        out = torch.from_numpy(onnx_out[0])  # (1, 56, 8400)

        preds = [out, None]
        return preds


Once the validator is set up, run the command matching your application to validate the model on the MXA. For the medium detection model, you should see a mAP 0.50:0.95 of approximately 49.9%:

model.val(validator=MxaDetectionValidator, batch=1, rect=False)
model.val(validator=MxaSegmentationValidator, batch=1, rect=False)
model.val(validator=MxaPoseValidator, data="coco-pose.yaml", batch=1, rect=False)

Sample output:

...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.499
...
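
Putting it together, an end-to-end detection run might look like the following, assuming the weights/ layout used throughout this tutorial:

from ultralytics import YOLO

# The custom validator resolves weights/yolov8m.dfp and
# weights/yolov8m_post.onnx from the checkpoint name
model = YOLO("weights/yolov8m.pt")

# batch=1 matches the exported ONNX batch size; rect=False disables
# rectangular batching so every image is letterboxed to the fixed
# 640x640 input the DFP was compiled for
metrics = model.val(validator=MxaDetectionValidator, batch=1, rect=False)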

Results#

YOLOv8 Detection Accuracy#

| Model Size | mAP CUDA | mAP MXA |
|------------|----------|---------|
| Nano       | 37.3     | 37.1    |
| Small      | 44.9     | 44.7    |
| Medium     | 50.2     | 49.9    |
| Large      | 52.9     | 52.6    |
| X-Large    | 53.9     | 53.6    |

YOLOv8 Segmentation Accuracy#

| Model Size | mAP Box CUDA | mAP Box MXA | mAP Mask CUDA | mAP Mask MXA |
|------------|--------------|-------------|---------------|--------------|
| Nano       | 36.7         | 36.3        | 30.5          | 30.0         |
| Medium     | 49.9         | 49.6        | 40.8          | 40.3         |
| Large      | 52.3         | 52.0        | 42.6          | 42.1         |
| X-Large    | 53.4         | 53.2        | 43.4          | 42.8         |

YOLOv8 Pose Accuracy#

| Model Size | mAP Pose CUDA | mAP Pose MXA |
|------------|---------------|--------------|
| Nano       | 50.4          | 49.5         |
| Small      | 60.0          | 59.0         |
| Medium     | 65.0          | 63.5         |
| Large      | 67.6          | 66.4         |
| X-Large    | 69.2          | 68.0         |

Note

All mAP numbers refer to mAP 0.50:0.95.

Third-Party Licenses#

This tutorial utilizes models and APIs from ultralytics, which is distributed under the AGPL-3.0 license.

Summary#

In this tutorial, we demonstrated how to validate the accuracy of pretrained YOLOv8 models on the COCO dataset using the MemryX accelerator. The results show only a slight decrease in accuracy: for the medium detection model, mAP 0.50:0.95 drops from 50.2% on CUDA to 49.9% on the MXA. As the tables above show, similarly small reductions were observed across all model sizes within each application. This accuracy was achieved without any tuning or retraining, simply by running the models out of the box on MemryX hardware.