YOLOv6 - High-Performance Object Detection on the Edge with Qualcomm RB5

May 22, 2024

Meituan's YOLOv6 object detection model can run on the Qualcomm Robotics RB5 AI acceleration platform, which makes the high-performance YOLOv6 model available for real-time object detection in edge AI applications.

YOLOv6 is a significant step forward for the YOLO series when compared with other YOLO models such as YOLOv5, YOLOv7, and YOLOv8. It introduces several enhancements to the network architecture and training strategy that raise accuracy while maintaining the high inference speed the YOLO family is known for.

YOLOv6: Next-Gen Object Detection

Introduction:

The YOLOv6 v3.0 paper presents updates and enhancements to the YOLOv6 real-time object detection model.
These updates span the network architecture and the training strategy, and aim to boost accuracy while maintaining high inference speed.

Network Design:

The addition of a Bi-directional Concatenation (BiC) module in the neck provides more accurate localization signals by fusing low-level feature maps.
The SPPF block is simplified into a SimCSPSPPF block to improve representational ability while maintaining efficiency.
A RepBi-PAN neck is introduced, incorporating BiC and SimCSPSPPF.

Anchor-Aided Training (AAT):

AAT combines the advantages of anchor-based and anchor-free paradigms during training by adding auxiliary anchor-based branches.
This boosts accuracy, especially for small objects, without compromising inference speed.

Self-Distillation Enhancements:

For large models, a weight decay strategy is applied to the knowledge distillation loss for enhanced performance.
For small models, a Decoupled Localization Distillation (DLD) method integrates a heavy auxiliary regression branch only during distillation training.
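
To make the weight-decay idea for large models more concrete, here is a minimal sketch of a cosine schedule that shrinks the weight of the distillation loss as training progresses. The function names `kd_weight` and `total_loss`, the start and end values, and the exact shape of the curve are illustrative assumptions; the schedule used by the YOLOv6 authors may differ.

```python
import math

def kd_weight(epoch, max_epochs, start=1.0, end=0.01):
    """Cosine-decayed weight for the distillation loss (illustrative values)."""
    # progress runs from 0.0 at the first epoch to 1.0 at the last
    progress = min(max(epoch / max_epochs, 0.0), 1.0)
    # the cosine term falls smoothly from 1 to 0 as progress goes from 0 to 1
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return end + (start - end) * cosine

def total_loss(det_loss, kd_loss, epoch, max_epochs):
    """Combine the detection loss with the down-weighted distillation loss."""
    return det_loss + kd_weight(epoch, max_epochs) * kd_loss
```

With a schedule of this kind, the teacher's soft labels dominate early in training and the ground-truth detection loss takes over toward the end.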

Deeper Models:

YOLOv6 is extended to M6 and L6 versions with additional backbone/neck stages to improve detection of small and large objects at higher resolutions.

Results:

YOLOv6-N hits 37.5% AP on COCO at 1187 FPS on a Tesla T4.
YOLOv6-S achieves 45.0% AP at 484 FPS, outperforming peers like YOLOv5-S.
YOLOv6-M and YOLOv6-L also outperform other detectors at similar speeds.
YOLOv6-L6 achieves a new state-of-the-art 57.2% AP for real-time detection.
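
For a rough sense of scale, the FPS figures above can be turned into an average time per image. The short sketch below does only that arithmetic; the helper name `ms_per_image` is made up for illustration. The result is an average derived from throughput, not a measured single-image latency, and it applies to the Tesla T4 numbers quoted above rather than to the RB5.

```python
def ms_per_image(fps):
    """Average processing time per image, in milliseconds, for a given FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps

for name, fps in [("YOLOv6-N", 1187), ("YOLOv6-S", 484)]:
    print(f"{name}: {ms_per_image(fps):.2f} ms per image")
```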

The impressive speed and accuracy of the YOLOv6 family make it an excellent choice for deploying high-performance object detection at the edge.

The conversion process: Key Steps

Set up the host environment on an x86 Ubuntu 18.04 machine
Download the required conversion tools
Convert the YOLOv6 PyTorch model to ONNX format
Convert the ONNX model to a Qualcomm .dlc format for RB5

Required Hardware:

x86 host PC running Ubuntu 18.04
Qualcomm Robotics RB5 acceleration platform

Step 1: Set Up the Host Environment:

Install Python 3 and pip, then set up the environment for Qualcomm's SNPE conversion tools:

sudo apt install python3-pip
pip3 install --upgrade pip
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.6 1
wget https://xxxx/snpe-1.68.0.zip
unzip snpe-1.68.0.zip
export SNPE_ROOT=/path/to/snpe-1.68.0
export PYTHONPATH=$PYTHONPATH:$SNPE_ROOT/lib/python
source $SNPE_ROOT/bin/dependencies.sh
source $SNPE_ROOT/bin/check_python_depends.sh
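
Before moving on, it can help to confirm that the variables exported above are actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `check_snpe_env` is hypothetical and is not part of SNPE, and it checks only the two variables set in this step.

```python
import os

def check_snpe_env(env=None):
    """Return a list of problems found in the SNPE-related environment."""
    env = os.environ if env is None else env
    problems = []
    root = env.get("SNPE_ROOT", "")
    if not root:
        problems.append("SNPE_ROOT is not set")
    elif not os.path.isdir(root):
        problems.append(f"SNPE_ROOT does not exist: {root}")
    else:
        # The export above appends $SNPE_ROOT/lib/python to PYTHONPATH
        lib_python = os.path.join(root, "lib", "python")
        entries = env.get("PYTHONPATH", "").split(os.pathsep)
        if lib_python not in entries:
            problems.append(f"PYTHONPATH does not include {lib_python}")
    return problems
```

Calling `check_snpe_env()` in the same shell session returns an empty list when both variables are consistent, and a list of human-readable messages otherwise.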

Step 2: Download the YOLOv6 conversion code (the SNPE conversion tool was downloaded in Step 1):

wget https://xxxx/YOLOv6.tar.gz

Step 3: Convert to ONNX:

Use the YOLOv6 export script to convert the PyTorch model to ONNX

tar -zxvf YOLOv6.tar.gz
cd YOLOv6/deploy/ONNX/
python3 export_onnx.py --weights yolov6n.pt --img 288 --batch 1

This will generate a yolov6n.onnx model file.
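
As a quick sanity check on the exported model, the number of candidate boxes it should produce can be estimated from the input size. The sketch below assumes three detection scales with strides 8, 16, and 32 and one prediction per grid cell. Neither detail is stated above, so treat the result as an estimate to compare against the real output shape of yolov6n.onnx. The function name `num_predictions` is illustrative.

```python
def num_predictions(img_size, strides=(8, 16, 32)):
    """Count grid cells across all detection scales for a square input."""
    total = 0
    for stride in strides:
        if img_size % stride:
            raise ValueError(f"{img_size} is not divisible by stride {stride}")
        cells = img_size // stride   # grid cells per side at this scale
        total += cells * cells       # one prediction per cell (assumed)
    return total

print(num_predictions(288))  # 36*36 + 18*18 + 9*9 = 1701
print(num_predictions(640))  # 80*80 + 40*40 + 20*20 = 8400
```

Under these assumptions, the 288 x 288 input used in the export command yields about a fifth of the candidates of a 640 x 640 input, which is one reason a smaller input size suits an edge device.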

Step 4: Convert to .dlc:

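Use the SNPE conversion tool to convert the ONNX model to the .dlc format for RB5. The command below expects yolov6n.onnx in the current directory; if it is elsewhere, pass the full path to the file generated in Step 3.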

cd $SNPE_ROOT/bin/x86_64-linux-clang
./snpe-onnx-to-dlc --input_network yolov6n.onnx --output_path yolov6n.dlc

The yolov6n.dlc file can now be loaded onto the Qualcomm RB5 for accelerated YOLOv6 inference at the edge.
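
If the conversion needs to be scripted, the same command can be assembled from Python. This is a minimal sketch, not part of SNPE: the function names `build_dlc_command` and `convert_to_dlc` are hypothetical, and only the two flags shown in Step 4 are used. Model paths are made absolute so the call does not depend on the current directory.

```python
import os
import subprocess

def build_dlc_command(onnx_path, dlc_path, snpe_root):
    """Build the snpe-onnx-to-dlc command line with absolute model paths."""
    converter = os.path.join(
        snpe_root, "bin", "x86_64-linux-clang", "snpe-onnx-to-dlc"
    )
    return [
        converter,
        "--input_network", os.path.abspath(onnx_path),
        "--output_path", os.path.abspath(dlc_path),
    ]

def convert_to_dlc(onnx_path, dlc_path, snpe_root=None):
    """Run the converter; raises CalledProcessError on a non-zero exit code."""
    snpe_root = snpe_root or os.environ["SNPE_ROOT"]
    subprocess.run(build_dlc_command(onnx_path, dlc_path, snpe_root), check=True)
```

For example, `convert_to_dlc("YOLOv6/deploy/ONNX/yolov6n.onnx", "yolov6n.dlc")` would run the converter using the `SNPE_ROOT` set in Step 1, assuming the ONNX file sits at that relative path.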

By following these steps, developers can take advantage of the high accuracy of the YOLOv6 model while benefiting from the performance and power efficiency of the Qualcomm Robotics RB5 platform for edge AI deployments.