Validating YOLOv8 Detection, Segmentation, and Pose Accuracy#
Introduction#
This tutorial demonstrates how to validate the accuracy (mAP 0.50:0.95) of pretrained YOLOv8 checkpoints on the COCO dataset. YOLOv8, developed by Ultralytics, is a state-of-the-art model family for object detection, instance segmentation, and pose estimation. The tutorial is suitable for users who wish to validate publicly available models or their own custom-trained ones.
Environment Setup#
First, create a new Python virtual environment with the MemryX SDK installed. Instructions can be found in Installing MemryX SDK Tools. This tutorial assumes Python 3.10 is being used. Next, install the ultralytics package:
pip install ultralytics
Download and Compile Model#
When you initialize a YOLOv8 model, the specified checkpoint is downloaded automatically if it's not already available. YOLOv8 models come in various sizes ('n', 's', 'm', 'l', 'x'). For this tutorial, we'll use the medium size ('m'):
from ultralytics import YOLO

model = YOLO("weights/yolov8m.pt")
model = YOLO("weights/yolov8m-seg.pt")
model = YOLO("weights/yolov8m-pose.pt")
You can optionally run validation on the COCO dataset first using your CPU or GPU. If the COCO dataset is not already on your system, it will be downloaded automatically (approximately 20 GB). Running this validation establishes a baseline of mAP 0.50:0.95 = 50.2% for the medium detection model:
model.val()
Sample output:
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.502
...
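For reference, the checkpoint download and baseline validation steps can be combined into a short script; metrics.box.map is the ultralytics accessor for mAP 0.50:0.95:

from ultralytics import YOLO

# Downloads the checkpoint (and COCO on the first val run) if missing
model = YOLO("weights/yolov8m.pt")
metrics = model.val()
print(f"mAP 0.50:0.95: {metrics.box.map:.3f}")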
Before compiling the model, export it to a supported format like ONNX. Run the following to export the model:
model.export(format='onnx', simplify=True, batch=1)
Note
A batch size of 1 is much slower but greatly simplifies the rest of the implementation. Running larger batches requires modifying the dataloader to yield only full batches (as sketched below) and adapting the processing steps between the accelerator and ONNX runtimes. Refer to High Precision Output Channels with YOLOv7 for an example.
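For illustration, the dataloader side of that change could look like the following sketch: a hypothetical wrapper that drops the final partial batch, assuming ultralytics-style batches (dicts containing an "img" tensor):

def full_batches(dataloader, batch_size):
    # Hypothetical helper: yield only full batches so the accelerator
    # always receives a fixed batch dimension
    for batch in dataloader:
        if batch["img"].shape[0] == batch_size:
            yield batch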
To compile the model for use with MemryX accelerators, navigate to the weights
folder and execute the following commands (alternatively, use the MemryX Python API):
mx_nc -v --autocrop -m yolov8m.onnx
mx_nc -v --autocrop -m yolov8m-seg.onnx
mx_nc -v --autocrop -m yolov8m-pose.onnx
For the detection model, this outputs yolov8m.dfp, which contains the main body of the model to run on the accelerator, and yolov8m_post.onnx, which contains the post-processing steps to run on the host.
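If you prefer the Python API over the CLI, the compile step might look like the sketch below. This assumes the NeuralCompiler class mirrors the mx_nc flags; check the SDK documentation for the exact signature:

from memryx import NeuralCompiler

# Assumed to mirror `mx_nc -v --autocrop -m yolov8m.onnx`;
# verify parameter names against your SDK version
nc = NeuralCompiler(models="weights/yolov8m.onnx", autocrop=True, verbose=1)
dfp = nc.run()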
Evaluation on MXA#
To run validation using the MXA, we need to define a custom BaseValidator class that the ultralytics API can use. The only method we must override is BaseValidator.__call__, which contains the validation loop (reference). The key change is to use the MXA and ONNX runtimes, instead of the torch model, to produce the model outputs.
The implementations for each application are provided below. All three largely mirror the original validators, with some simplifications; they are written to be simple rather than optimal. As mentioned earlier, batching would speed them up considerably at the cost of being harder to follow.
import torch
import numpy as np
import json
from pathlib import Path
from ultralytics import YOLO
from ultralytics.models.yolo.detect.val import DetectionValidator
from ultralytics.models.yolo.pose.val import PoseValidator
from ultralytics.models.yolo.segment.val import SegmentationValidator
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import LOGGER, TQDM
import memryx as mx
import onnxruntime as ort
class MxaDetectionValidator(DetectionValidator):
    """
    The Validator must be a child of BaseValidator, which is the parent
    of DetectionValidator. The BaseValidator defines the __call__
    method, which we need to override.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Set required attributes
        self.stride = 32
        self.training = False

        model_name = Path(self.args.model).stem
        LOGGER.info(f"\033[32mRunning {model_name} inference on MXA\033[0m")

        # Ensure your paths/naming scheme matches
        self.mxa = mx.SyncAccl(f"weights/{model_name}.dfp")
        self.ort = ort.InferenceSession(f"weights/{model_name}_post.onnx")

    def __call__(self, model):
        model.eval()

        # Create COCO dataloader
        self.data = check_det_dataset(self.args.data)
        self.dataloader = self.get_dataloader(
            self.data.get(self.args.split), self.args.batch
        )

        # Validation loop
        self.init_metrics(model)
        self.jdict = []
        progress_bar = TQDM(
            self.dataloader, desc=self.get_desc(), total=len(self.dataloader)
        )
        for batch in progress_bar:
            batch = self.preprocess(batch)
            preds = self.mxa_detect(batch["img"])
            preds = self.postprocess(preds)
            self.update_metrics(preds, batch)

        # Compute and print stats
        stats = self.get_stats()
        self.check_stats(stats)
        self.finalize_metrics()
        self.print_results()

        # Save predictions and evaluate with pycocotools
        with open(str(self.save_dir / "predictions.json"), "w") as f:
            LOGGER.info(f"Saving {f.name}...")
            json.dump(self.jdict, f)
        stats = self.eval_json(stats)

        return stats

    def mxa_detect(self, img):
        """
        Detection using the MXA accelerator.

        Args:
            img (torch.Tensor): Input image. (1, 3, 640, 640)

        Returns:
            preds (list): List of length 2.
                preds[0] (torch.Tensor): Predictions. (1, 84, 8400)
                preds[1] (None): Unused fmaps

        Notes:
            For shapes: Fj in (64, 80) and Fi in (80, 40, 20)
        """
        # Pass images through the accelerator
        img = img.detach().cpu().numpy()       # (1, 3, 640, 640)
        img = np.transpose(img, (2, 3, 0, 1))  # (640, 640, 1, 3)
        accl_out = self.mxa.run(img)           # (6, Fi, Fi, Fj)

        # Process accelerator output for onnxruntime
        onnx_inp_names = [inp.name for inp in self.ort.get_inputs()]
        onnx_inps = [
            np.transpose(fmap, (2, 0, 1))[np.newaxis, ...] for fmap in accl_out
        ]  # (6, 1, Fj, Fi, Fi)
        input_feed = {k: v for k, v in zip(onnx_inp_names, onnx_inps)}

        # Pass fmaps through onnxruntime
        onnx_out = self.ort.run(None, input_feed)
        out = torch.from_numpy(onnx_out[0])  # (1, 84, 8400)
        preds = [out, None]

        return preds
import torch
import numpy as np
import json
from pathlib import Path
from ultralytics import YOLO
from ultralytics.models.yolo.detect.val import DetectionValidator
from ultralytics.models.yolo.pose.val import PoseValidator
from ultralytics.models.yolo.segment.val import SegmentationValidator
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import LOGGER, TQDM
import memryx as mx
import onnxruntime as ort
class MxaSegmentationValidator(SegmentationValidator):
    """
    The Validator must be a child of BaseValidator, which is the parent
    of SegmentationValidator. The BaseValidator defines the __call__
    method, which we need to override.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Set required attributes
        self.stride = 32
        self.training = False
        self.args.plots = False

        model_name = Path(self.args.model).stem
        LOGGER.info(f"\033[32mRunning {model_name} inference on MXA\033[0m")

        # Ensure your paths/naming scheme matches
        self.mxa = mx.SyncAccl(f"weights/{model_name}.dfp")
        self.ort = ort.InferenceSession(f"weights/{model_name}_post.onnx")

    def __call__(self, model):
        model.eval()

        # Create COCO dataloader
        self.data = check_det_dataset(self.args.data)
        self.dataloader = self.get_dataloader(
            self.data.get(self.args.split), self.args.batch
        )

        # Validation loop
        self.init_metrics(model)
        self.jdict = []
        progress_bar = TQDM(
            self.dataloader, desc=self.get_desc(), total=len(self.dataloader)
        )
        for i, batch in enumerate(progress_bar):
            self.batch_i = i  # For plots
            batch = self.preprocess(batch)
            preds = self.mxa_segment(batch["img"])
            preds = self.postprocess(preds)
            self.update_metrics(preds, batch)

        # Compute and print stats
        stats = self.get_stats()
        self.check_stats(stats)
        self.finalize_metrics()
        self.print_results()

        # Save predictions and evaluate with pycocotools
        with open(str(self.save_dir / "predictions.json"), "w") as f:
            LOGGER.info(f"Saving {f.name}...")
            json.dump(self.jdict, f)
        stats = self.eval_json(stats)

        return stats

    def mxa_segment(self, img):
        """
        Segmentation using the MXA accelerator.

        Args:
            img (torch.Tensor): Input image. (1, 3, 640, 640)

        Returns:
            preds (list): List of length 2.
                preds[0] (torch.Tensor): Boxes (1, 116, 8400)
                preds[1] (torch.Tensor): Masks (1, 32, 160, 160)

        Notes:
            For shapes: Fj in (64, 80) and Fi in (80, 40, 20)
        """
        # Pass images through the accelerator
        img = img.detach().cpu().numpy()       # (1, 3, 640, 640)
        img = np.transpose(img, (2, 3, 0, 1))  # (640, 640, 1, 3)
        accl_out = self.mxa.run(img)           # (10, ...)

        # Prepare accelerator output as input to the onnx post-processor
        # Reorder names to match the accelerator output order
        onnx_inp_names = [inp.name for inp in self.ort.get_inputs()]
        for i, j in [(3, 7), (6, 8)]:
            onnx_inp_names.insert(i, onnx_inp_names.pop(j))

        # Trailing reshapes need to be handled manually
        onnx_inps = [
            np.transpose(fmap, (2, 0, 1))[np.newaxis, ...] for fmap in accl_out
        ]  # (10, 1, ...)
        input_feed = {
            name: (
                fmap
                if "Reshape" not in name
                else np.reshape(fmap, (1, fmap.shape[1], -1))
            )
            for name, fmap in zip(onnx_inp_names, onnx_inps)
        }

        # Pass fmaps through onnxruntime
        onnx_out = self.ort.run(None, input_feed)
        preds = [
            torch.from_numpy(onnx_out[1]),  # Boxes (1, 116, 8400)
            torch.from_numpy(onnx_out[0]),  # Masks (1, 32, 160, 160)
        ]

        return preds
import torch
import numpy as np
import json
from pathlib import Path
from ultralytics import YOLO
from ultralytics.models.yolo.detect.val import DetectionValidator
from ultralytics.models.yolo.pose.val import PoseValidator
from ultralytics.models.yolo.segment.val import SegmentationValidator
from ultralytics.data.utils import check_det_dataset
from ultralytics.utils import LOGGER, TQDM
import memryx as mx
import onnxruntime as ort
class MxaPoseValidator(PoseValidator):
    """
    The Validator must be a child of BaseValidator, which is the parent
    of PoseValidator. The BaseValidator defines the __call__ method,
    which we need to override.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Set required attributes
        self.stride = 32
        self.training = False

        model_name = Path(self.args.model).stem
        LOGGER.info(f"\033[32mRunning {model_name} inference on MXA\033[0m")

        # Create MXA and ONNX runtimes
        self.mxa = mx.SyncAccl(f"weights/{model_name}.dfp")
        self.ort = ort.InferenceSession(f"weights/{model_name}_post.onnx")

    def __call__(self, model):
        model.eval()

        # Create COCO dataloader
        self.data = check_det_dataset(self.args.data)
        self.dataloader = self.get_dataloader(
            self.data.get(self.args.split), self.args.batch
        )

        # Validation loop
        self.init_metrics(model)
        self.jdict = []
        progress_bar = TQDM(
            self.dataloader, desc=self.get_desc(), total=len(self.dataloader)
        )
        for batch in progress_bar:
            batch = self.preprocess(batch)
            preds = self.mxa_pose(batch["img"])
            preds = self.postprocess(preds)
            self.update_metrics(preds, batch)

        # Compute and print stats
        stats = self.get_stats()
        self.check_stats(stats)
        self.finalize_metrics()
        self.print_results()

        # Save predictions and evaluate with pycocotools
        with open(str(self.save_dir / "predictions.json"), "w") as f:
            LOGGER.info(f"Saving {f.name}...")
            json.dump(self.jdict, f)
        stats = self.eval_json(stats)

        return stats

    def mxa_pose(self, img):
        """
        Pose estimation using the MXA accelerator.

        Args:
            img (torch.Tensor): Input image. (1, 3, 640, 640)

        Returns:
            preds (list): List of length 2.
                preds[0] (torch.Tensor): Predictions. (1, 56, 8400)
                preds[1] (None): Unused loss output
        """
        # Pass images through the accelerator
        img = img.detach().cpu().numpy()       # (1, 3, 640, 640)
        img = np.transpose(img, (2, 3, 0, 1))  # (640, 640, 1, 3)
        accl_out = self.mxa.run(img)           # (9, ...)

        # Process accelerator output for onnxruntime
        # Reorder names to match the accelerator output order
        onnx_inp_names = [inp.name for inp in self.ort.get_inputs()]
        for i, j in [(2, 6), (5, 7)]:
            onnx_inp_names.insert(i, onnx_inp_names.pop(j))

        # Trailing reshapes need to be handled manually
        onnx_inps = [
            np.transpose(fmap, (2, 0, 1))[np.newaxis, ...] for fmap in accl_out
        ]  # (9, 1, ...)
        input_feed = {
            name: (
                fmap
                if "Reshape" not in name
                else np.reshape(fmap, (1, fmap.shape[1], -1))
            )
            for name, fmap in zip(onnx_inp_names, onnx_inps)
        }

        # Pass fmaps through onnxruntime
        onnx_out = self.ort.run(None, input_feed)
        out = torch.from_numpy(onnx_out[0])  # (1, 56, 8400)
        preds = [out, None]

        return preds
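Note that the index pairs used to reorder the ONNX input names in the segmentation and pose validators depend on how the compiler splits the graph. If your compiled model differs, you can inspect the post-processing model's expected inputs with a quick check:

import onnxruntime as ort

# Print the post-processing model's input names and shapes to
# match them against the accelerator's output order
sess = ort.InferenceSession("weights/yolov8m-seg_post.onnx")
for inp in sess.get_inputs():
    print(inp.name, inp.shape)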
Once the validator is defined, you can run one of the following calls to validate the corresponding model on the MXA. You should see a mAP 0.50:0.95 of approximately 49.9% for the medium detection model:
model.val(validator=MxaDetectionValidator, batch=1, rect=False)
model.val(validator=MxaSegmentationValidator, batch=1, rect=False)
model.val(validator=MxaPoseValidator, data="coco-pose.yaml", batch=1, rect=False)
Sample output:
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.499
...
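Putting it all together, a minimal end-to-end driver might look like the following sketch (the module name mxa_validators is illustrative and refers to wherever you saved the validator classes above):

from ultralytics import YOLO
from mxa_validators import MxaDetectionValidator  # hypothetical module name

model = YOLO("weights/yolov8m.pt")
metrics = model.val(validator=MxaDetectionValidator, batch=1, rect=False)
print(f"mAP 0.50:0.95 on MXA: {metrics.box.map:.3f}")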
Results#
Detection:

| Model Size | mAP CUDA | mAP MXA |
|---|---|---|
| Nano | 37.3 | 37.1 |
| Small | 44.9 | 44.7 |
| Medium | 50.2 | 49.9 |
| Large | 52.9 | 52.6 |
| X-Large | 53.9 | 53.6 |
Segmentation:

| Model Size | mAP Box CUDA | mAP Box MXA | mAP Mask CUDA | mAP Mask MXA |
|---|---|---|---|---|
| Nano | 36.7 | 36.3 | 30.5 | 30.0 |
| Medium | 49.9 | 49.6 | 40.8 | 40.3 |
| Large | 52.3 | 52.0 | 42.6 | 42.1 |
| X-Large | 53.4 | 53.2 | 43.4 | 42.8 |
Pose:

| Model Size | mAP Pose CUDA | mAP Pose MXA |
|---|---|---|
| Nano | 50.4 | 49.5 |
| Small | 60.0 | 59.0 |
| Medium | 65.0 | 63.5 |
| Large | 67.6 | 66.4 |
| X-Large | 69.2 | 68.0 |
Note
All mAP numbers refer to mAP 0.50:0.95.
Third-Party Licenses#
This tutorial utilizes models and APIs from ultralytics. The licenses for these dependencies are outlined below:
Models: YOLOv8 models from Ultralytics (AGPL-3.0 license)
Code and Pre/Post-Processing: The Validator APIs were sourced from the ultralytics GitHub repository (AGPL-3.0 license)
Summary#
In this tutorial, we demonstrated how to validate the accuracy of pretrained YOLOv8 models on the COCO dataset using the MemryX accelerator. The results show a slight decrease in accuracy, with mAP 0.50:0.95 dropping from 50.2% on CUDA to 49.9% on the MXA for the medium detection model. As the tables above show, similarly small reductions were observed across all model sizes in each application. This accuracy was achieved without any tuning or retraining, simply by running the models out-of-the-box on MemryX hardware.