Validating YOLOv8 Detection, Segmentation, and Pose Accuracy#

Introduction#

This tutorial demonstrates how to validate the accuracy (mAP 0.50:0.95) of pretrained YOLOv8 checkpoints on the COCO dataset. YOLOv8, developed by Ultralytics, is a family of state-of-the-art models for object detection, instance segmentation, and pose estimation. The tutorial is suitable for users who wish to validate publicly available models or their own custom-trained ones.

Environment Setup#

First, create a new Python virtual environment with the MemryX SDK. Instructions can be found in Installing MemryX SDK Tools. This tutorial assumes Python 3.10 is being used. Next, install the ultralytics package:

pip install ultralytics
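
The validators later in this tutorial also use onnxruntime to run the post-processing graph on the host. If it is not already present in your environment, install it alongside ultralytics:

pip install onnxruntime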

Download and Compile Model#

When you initialize a YOLO model, the specified checkpoint is downloaded automatically if it is not already available locally. YOLOv8 models come in various sizes (‘n’, ‘s’, ‘m’, ‘l’, ‘x’). For this tutorial, we’ll use the medium size (‘m’). Load the checkpoint for the application you are validating:

from ultralytics import YOLO

model = YOLO("weights/yolov8m.pt")       # Detection
model = YOLO("weights/yolov8m-seg.pt")   # Segmentation
model = YOLO("weights/yolov8m-pose.pt")  # Pose

You can optionally run validation on the COCO dataset using your CPU or GPU. If the COCO dataset is not already on your system, it will be downloaded (approximately 20 GB). Running this validation establishes the baseline mAP 0.50:0.95 of 50.2% for the medium detection model:

model.val()

Sample output:

...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.502
...

Before compiling, export the model to a supported format such as ONNX:

model.export(format='onnx', simplify=True, batch=1)
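
If you plan to validate all three applications, the same export step applies to each checkpoint. For example:

from ultralytics import YOLO

# Export each checkpoint to ONNX with a fixed batch size of 1
for name in ("yolov8m", "yolov8m-seg", "yolov8m-pose"):
    YOLO(f"weights/{name}.pt").export(format="onnx", simplify=True, batch=1)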

Note

A batch size of 1 is much slower but greatly simplifies the rest of the implementation. Running larger batches requires modifying the dataloader to yield only full batches and modifying the processing steps between the accelerator and ONNX runtimes. Refer to High Precision Output Channels with YOLOv7 for an example.

To compile the model for use with MemryX accelerators, navigate to the weights folder and execute the following commands (alternatively, use the MemryX Python API):

mx_nc -v --autocrop -m yolov8m.onnx
mx_nc -v --autocrop -m yolov8m-seg.onnx
mx_nc -v --autocrop -m yolov8m-pose.onnx

For the detection model, this outputs yolov8m.dfp, which contains the main body of the model to run on the accelerator, and yolov8m_post.onnx, which contains the post-processing steps to run on the host. The segmentation and pose models produce analogous file pairs.
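
If you prefer to compile from Python instead of the command line, a minimal sketch is shown below. It assumes the NeuralCompiler interface from the memryx package; parameter names may differ across SDK versions, so verify against the SDK documentation.

from memryx import NeuralCompiler

# Compile each exported model; autocrop mirrors the --autocrop CLI flag
# (parameter names assumed here, check your SDK version)
for name in ("yolov8m", "yolov8m-seg", "yolov8m-pose"):
    nc = NeuralCompiler(models=f"weights/{name}.onnx", autocrop=True, verbose=1)
    nc.run()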

Evaluation on MXA#

To run validation on the MXA, we need to define a custom validator class that the ultralytics API can use. The only method we must override is BaseValidator.__call__, which contains the validation loop. The key change is using the MXA and ONNX runtimes, rather than the torch model, to produce the model outputs.

The implementations for each application are provided below. All three closely follow the original ultralytics validation loop, with some simplifications, and are written for clarity rather than speed. As mentioned earlier, batching would speed them up considerably at the cost of a more complex implementation.

Detection:

import torch
import numpy as np
import json
from pathlib import Path

from ultralytics.models.yolo.detect.val import DetectionValidator
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import LOGGER, TQDM

import memryx as mx
import onnxruntime as ort


class MxaDetectionValidator(DetectionValidator):
    """
    The Validator must be a child of BaseValidator which is the parent
    of DetectionValidator. The BaseValidator defines the __call__
    method which we need to override.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Set required attributes
        self.stride = 32
        self.training = False

        model_name = Path(self.args.model).stem
        LOGGER.info(f"\033[32mRunning {model_name} inference on MXA\033[0m")

        # Ensure your paths/naming scheme matches
        self.mxa = mx.SyncAccl(f"weights/{model_name}.dfp")
        self.ort = ort.InferenceSession(f"weights/{model_name}_post.onnx")

    def __call__(self, model):
        model.eval()

        # Create COCO dataloader
        self.data = check_det_dataset(self.args.data)
        self.dataloader = self.get_dataloader(
            self.data.get(self.args.split), self.args.batch
        )

        # Validation Loop
        self.init_metrics(model)
        self.jdict = []
        progress_bar = TQDM(
            self.dataloader, desc=self.get_desc(), total=len(self.dataloader)
        )
        for batch in progress_bar:
            batch = self.preprocess(batch)
            preds = self.mxa_detect(batch["img"])
            preds = self.postprocess(preds)
            self.update_metrics(preds, batch)

        # Compute and print stats
        stats = self.get_stats()
        self.check_stats(stats)
        self.finalize_metrics()
        self.print_results()

        # Save predictions and evaluate on pycocotools
        with open(str(self.save_dir / "predictions.json"), "w") as f:
            LOGGER.info(f"Saving {f.name}...")
            json.dump(self.jdict, f)
        stats = self.eval_json(stats)

        return stats

    def mxa_detect(self, img):
        """
        Detection using MXA accelerator.

        Args:
            img (torch.Tensor): Input image. (1, 3, 640, 640)

        Returns:
            preds (list): List of length 2.
                preds[0] (torch.Tensor): Predictions. (1, 84, 8400)
                preds[1] (None): Unused fmaps
        Notes:
            Fj in (64, 80) and Fi in (80, 40, 20)
        """
        # Pass images through accelerator
        img = img.detach().cpu().numpy()  # (1, 3, 640, 640)
        img = np.transpose(img, (2, 3, 0, 1))  # (640, 640, 1, 3)
        accl_out = self.mxa.run(img)  # (6, Fi, Fi, Fj)

        # Process accl out for onnxruntime
        onnx_inp_names = [inp.name for inp in self.ort.get_inputs()]
        onnx_inps = [
            np.transpose(fmap, (2, 0, 1))[np.newaxis, ...] for fmap in accl_out
        ]  # (6, 1, Fj, Fi, Fi)
        input_feed = {k: v for k, v in zip(onnx_inp_names, onnx_inps)}

        # Pass fmaps through onnxruntime
        onnx_out = self.ort.run(None, input_feed)
        out = torch.from_numpy(onnx_out[0])  # (1, 84, 8400)

        preds = [out, None]
        return preds


Segmentation:

import torch
import numpy as np
import json
from pathlib import Path

from ultralytics.models.yolo.segment.val import SegmentationValidator
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import LOGGER, TQDM

import memryx as mx
import onnxruntime as ort


class MxaSegmentationValidator(SegmentationValidator):
    """
    The Validator must be a child of BaseValidator which is the parent
    of SegmentationValidator. The BaseValidator defines the __call__
    method which we need to override.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Set required attributes
        self.stride = 32
        self.training = False
        self.args.plots = False

        model_name = Path(self.args.model).stem
        LOGGER.info(f"\033[32mRunning {model_name} inference on MXA\033[0m")

        # Ensure your paths/naming scheme matches
        self.mxa = mx.SyncAccl(f"weights/{model_name}.dfp")
        self.ort = ort.InferenceSession(f"weights/{model_name}_post.onnx")

    def __call__(self, model):
        model.eval()

        # Create COCO dataloader
        self.data = check_det_dataset(self.args.data)
        self.dataloader = self.get_dataloader(
            self.data.get(self.args.split), self.args.batch
        )

        # Validation Loop
        self.init_metrics(model)
        self.jdict = []
        progress_bar = TQDM(
            self.dataloader, desc=self.get_desc(), total=len(self.dataloader)
        )
        for i, batch in enumerate(progress_bar):
            self.batch_i = i  # For plots
            batch = self.preprocess(batch)
            preds = self.mxa_segment(batch["img"])
            preds = self.postprocess(preds)
            self.update_metrics(preds, batch)

        # Compute and print stats
        stats = self.get_stats()
        self.check_stats(stats)
        self.finalize_metrics()
        self.print_results()

        # Save predictions and evaluate on pycocotools
        with open(str(self.save_dir / "predictions.json"), "w") as f:
            LOGGER.info(f"Saving {f.name}...")
            json.dump(self.jdict, f)
        stats = self.eval_json(stats)

        return stats

    def mxa_segment(self, img):
        """
        Segmentation using MXA accelerator.

        Args:
            img (torch.Tensor): Input image. (1, 3, 640, 640)

        Returns:
            preds (list): List of length 2.
                preds[0] (torch.Tensor): Boxes (1, 116, 8400)
                preds[1] (torch.Tensor): Masks (1, 32, 160, 160)

        Notes:
            For shapes: Fj in (64, 80) and Fi in (80, 40, 20)
        """
        # Pass images through accelerator
        img = img.detach().cpu().numpy()  # (1, 3, 640, 640)
        img = np.transpose(img, (2, 3, 0, 1))  # (640, 640, 1, 3)
        accl_out = self.mxa.run(img)  # (10, ...)

        # Prepare accelerator output as input to onnx post-processor
        # Reorder names to match accl output order
        onnx_inp_names = [inp.name for inp in self.ort.get_inputs()]
        for i, j in [(3, 7), (6, 8)]:
            onnx_inp_names.insert(i, onnx_inp_names.pop(j))

        # Trailing reshapes need to be handled manually
        onnx_inps = [
            np.transpose(fmap, (2, 0, 1))[np.newaxis, ...] for fmap in accl_out
        ]  # (10, 1, ...)
        input_feed = {
            name: (
                fmap
                if "Reshape" not in name
                else np.reshape(fmap, (1, fmap.shape[1], -1))
            )
            for name, fmap in zip(onnx_inp_names, onnx_inps)
        }

        onnx_out = self.ort.run(None, input_feed)
        preds = [
            torch.from_numpy(onnx_out[1]),  # Boxes (1, 116, 8400)
            torch.from_numpy(onnx_out[0]),  # Masks (1, 32, 160, 160)
        ]
        return preds


Pose:

import torch
import numpy as np
import json
from pathlib import Path

from ultralytics.models.yolo.pose.val import PoseValidator
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import LOGGER, TQDM

import memryx as mx
import onnxruntime as ort


class MxaPoseValidator(PoseValidator):
    """
    The Validator must be a child of BaseValidator which is the parent
    of PoseValidator. The BaseValidator defines the __call__ method
    which we need to override.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Set required attributes
        self.stride = 32
        self.training = False

        model_name = Path(self.args.model).stem
        LOGGER.info(f"\033[32mRunning {model_name} inference on MXA\033[0m")

        # Create MXA and Onnx runtimes
        self.mxa = mx.SyncAccl(f"weights/{model_name}.dfp")
        self.ort = ort.InferenceSession(f"weights/{model_name}_post.onnx")

    def __call__(self, model):
        model.eval()

        # Create COCO dataloader
        self.data = check_det_dataset(self.args.data)
        self.dataloader = self.get_dataloader(
            self.data.get(self.args.split), self.args.batch
        )

        # Validation Loop
        self.init_metrics(model)
        self.jdict = []
        progress_bar = TQDM(
            self.dataloader, desc=self.get_desc(), total=len(self.dataloader)
        )
        for batch in progress_bar:
            batch = self.preprocess(batch)
            preds = self.mxa_pose(batch["img"])
            preds = self.postprocess(preds)
            self.update_metrics(preds, batch)

        # Compute and print stats
        stats = self.get_stats()
        self.check_stats(stats)
        self.finalize_metrics()
        self.print_results()

        # Save predictions and evaluate on pycocotools
        with open(str(self.save_dir / "predictions.json"), "w") as f:
            LOGGER.info(f"Saving {f.name}...")
            json.dump(self.jdict, f)
        stats = self.eval_json(stats)

        return stats

    def mxa_pose(self, img):
        """
        Pose Estimation using MXA accelerator.

        Args:
            img (torch.Tensor): Input image. (1, 3, 640, 640)

        Returns:
            preds (list): List of length 2.
                preds[0] (torch.Tensor): Predictions. (1, 56, 8400)
                preds[1] (None): Unused loss output
        """
        # Pass images through accelerator
        img = img.detach().cpu().numpy()  # (1, 3, 640, 640)
        img = np.transpose(img, (2, 3, 0, 1))  # (640, 640, 1, 3)
        accl_out = self.mxa.run(img)  # (9, ...)

        # Process accl out for onnxruntime
        # Reorder names to match accl output order
        onnx_inp_names = [inp.name for inp in self.ort.get_inputs()]
        for i, j in [(2, 6), (5, 7)]:
            onnx_inp_names.insert(i, onnx_inp_names.pop(j))
        # Trailing reshapes need to be handled manually
        onnx_inps = [
            np.transpose(fmap, (2, 0, 1))[np.newaxis, ...] for fmap in accl_out
        ]  # (9, 1, ...)
        input_feed = {
            name: (
                fmap
                if "Reshape" not in name
                else np.reshape(fmap, (1, fmap.shape[1], -1))
            )
            for name, fmap in zip(onnx_inp_names, onnx_inps)
        }

        # Pass fmaps through onnxruntime
        onnx_out = self.ort.run(None, input_feed)
        out = torch.from_numpy(onnx_out[0])  # (1, 56, 8400)

        preds = [out, None]
        return preds


Once the validator is set up, run the command matching your application to validate the model on the MXA. For the medium detection model, you should see a mAP 0.50:0.95 of approximately 49.9%:

model.val(validator=MxaDetectionValidator, batch=1, rect=False)
model.val(validator=MxaSegmentationValidator, batch=1, rect=False)
model.val(validator=MxaPoseValidator, data="coco-pose.yaml", batch=1, rect=False)

Sample output:

...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.499
...
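
Putting it together, an end-to-end detection run might look like the following, assuming the weights/ layout used throughout this tutorial:

from ultralytics import YOLO

# The custom validator resolves weights/yolov8m.dfp and
# weights/yolov8m_post.onnx from the checkpoint name
model = YOLO("weights/yolov8m.pt")

# batch=1 matches the exported ONNX batch size; rect=False disables
# rectangular batching so every image is letterboxed to the fixed
# 640x640 input the DFP was compiled for
metrics = model.val(validator=MxaDetectionValidator, batch=1, rect=False)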

Results#

YOLOv8 Detection Accuracy#

| Model Size | mAP CUDA | mAP MXA |
|------------|----------|---------|
| Nano       | 37.3     | 37.1    |
| Small      | 44.9     | 44.7    |
| Medium     | 50.2     | 49.9    |
| Large      | 52.9     | 52.6    |
| X-Large    | 53.9     | 53.6    |

YOLOv8 Segmentation Accuracy#

| Model Size | mAP Box CUDA | mAP Box MXA | mAP Mask CUDA | mAP Mask MXA |
|------------|--------------|-------------|---------------|--------------|
| Nano       | 36.7         | 36.3        | 30.5          | 30.0         |
| Medium     | 49.9         | 49.6        | 40.8          | 40.3         |
| Large      | 52.3         | 52.0        | 42.6          | 42.1         |
| X-Large    | 53.4         | 53.2        | 43.4          | 42.8         |

YOLOv8 Pose Accuracy#

| Model Size | mAP Pose CUDA | mAP Pose MXA |
|------------|---------------|--------------|
| Nano       | 50.4          | 49.5         |
| Small      | 60.0          | 59.0         |
| Medium     | 65.0          | 63.5         |
| Large      | 67.6          | 66.4         |
| X-Large    | 69.2          | 68.0         |

Note

All mAP numbers refer to mAP 0.50:0.95.

Third-Party Licenses#

This tutorial utilizes models and APIs from ultralytics, which is distributed under the AGPL-3.0 license.

Summary#

In this tutorial, we demonstrated how to validate the accuracy of pretrained YOLOv8 models on the COCO dataset using the MemryX accelerator. The results show only a slight decrease in accuracy: for the medium detection model, mAP 0.50:0.95 drops from 50.2% on CUDA to 49.9% on the MXA. As the tables above show, similarly small reductions were observed across all model sizes within each application. This accuracy was achieved without any tuning or retraining, simply by running the models out of the box on MemryX hardware.