This project provides a pipeline for deploying and performing inference with a YOLOv8 object detection model using Triton Inference Server on Google Cloud's Vertex AI, locally, or on Docker-based systems. The repository includes scripts for automating the deployment process, a graphical user interface for inference, and performance-analysis tools for optimizing the model.
The repository is organized as follows:

- `requirements.txt`: Lists the external libraries and dependencies required for the project.
- `server/`: Contains scripts for deploying the model to Triton Inference Server.
- `signature-detection/`: Contains scripts for performing inference with the YOLOv8 model.
  - `analyzer/`: Contains results and configuration for performance analysis using Triton Model Analyzer.
  - `inference/`: Scripts for performing inference using Triton Client, Vertex AI, or locally, plus a GUI for visualization.
    - `inference_onnx.py`: Script for performing inference with the ONNX Runtime locally.
    - `inference_pipeline.py`: Script for performing inference on images using different methods.
    - `predictors.py`: Contains the predictor classes for different inference methods. You can add new predictors for custom inference methods.
  - `gui/`: Contains the Gradio interface for interacting with the deployed model. The `inference_gui.py` script can be used to test the model in real time. The UI has built-in examples and plots of results and performance.
  - `models/`: Contains the Model Repository for Triton Server, including the YOLOv8 model and pre/post-processing scripts in an Ensemble Model.
  - `data/`: Contains the datasets and data processing scripts.
  - `utils/`: Scripts for uploading/downloading the model to/from Google Cloud Storage or Azure Storage and exporting the model to ONNX/TensorRT format.
- `Dockerfile`: Contains the configuration for building the Docker image for Triton Inference Server.
- `Dockerfile.dev`: Contains the configuration for building the Docker image for local development.
- `docker-compose.yml`: Contains the configuration for running `Dockerfile.dev`.
- `entrypoint.sh`: Script for initializing the Triton Inference Server with the required configurations.
- `LICENSE`: The license for the project.

To get started, clone the repository and install the dependencies:

```bash
git clone https://github.com/your-username/t4ai-triton-server.git
pip install -r requirements.txt
```
Then:

- Run `deploy_vertex_ai.sh` to deploy the model to a Vertex AI Endpoint, or deploy it programmatically using `nvidia_triton_custom_container_prediction.ipynb`.
- The `serve_triton_local_.py` script can be used to start the server locally, or use `docker-compose.yml`.
- Run `inference_gui.py` to test the deployed model and visualize the results.
- Use the `inference_pipeline.py` script to select a predictor and perform inference on test dataset images.
- Use the `inference_onnx.py` script to perform inference with the ONNX Runtime locally.

The repository includes an Ensemble Model for the YOLOv8 object detection model. The Ensemble Model combines the YOLOv8 model with pre- and post-processing scripts to perform inference on images. The model repository is located in the `models/` directory.
```mermaid
flowchart TB
subgraph "Triton Inference Server"
direction TB
subgraph "Ensemble Model Pipeline"
direction TB
subgraph Input
raw["raw_image
(UINT8, [-1])"]
conf["confidence_threshold
(FP16, [1])"]
iou["iou_threshold
(FP16, [1])"]
end
subgraph "Preprocess Py-Backend"
direction TB
pre1["Decode Image
BGR to RGB"]
pre2["Resize (640x640)"]
pre3["Normalize (/255.0)"]
pre4["Transpose
[H,W,C]->[C,H,W]"]
pre1 --> pre2 --> pre3 --> pre4
end
subgraph "YOLOv8 Model ONNX Backend"
yolo["Inference YOLOv8s"]
end
subgraph "Postproces Python Backend"
direction TB
post1["Transpose
Outputs"]
post2["Filter Boxes (confidence_threshold)"]
post3["NMS (iou_threshold)"]
post4["Format Results [x,y,w,h,score]"]
post1 --> post2 --> post3 --> post4
end
subgraph Output
result["detection_result
(FP16, [-1,5])"]
end
raw --> pre1
pre4 --> |"preprocessed_image (FP32, [3,-1,-1])"| yolo
yolo --> |"output0"| post1
conf --> post2
iou --> post3
post4 --> result
end
end
subgraph Client
direction TB
client_start["Client Application"]
response["Detections Result
[x,y,w,h,score]"]
end
client_start -->|"HTTP/gRPC Request
with raw image
confidence_threshold
iou_threshold"| raw
result -->|"HTTP/gRPC Response with detections"| response
style Client fill:#e6f3ff,stroke:#333
style Input fill:#f9f,stroke:#333
style Output fill:#9ff,stroke:#333
```
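As a reference, here is a minimal client sketch for calling the ensemble through Triton's HTTP API, using the input and output tensors shown in the diagram. The model name (`ensemble_model`), server URL, and image path are placeholders; adjust them to your deployment.

```python
# Minimal sketch using the official Triton Python client (tritonclient).
# Tensor names/dtypes follow the diagram above; model name, URL, and paths are assumptions.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# raw_image (UINT8, [-1]): the encoded image bytes
with open("input/test_image.jpg", "rb") as f:
    raw = np.frombuffer(f.read(), dtype=np.uint8)

inputs = [
    httpclient.InferInput("raw_image", [raw.size], "UINT8"),
    httpclient.InferInput("confidence_threshold", [1], "FP16"),
    httpclient.InferInput("iou_threshold", [1], "FP16"),
]
inputs[0].set_data_from_numpy(raw)
inputs[1].set_data_from_numpy(np.array([0.5], dtype=np.float16))
inputs[2].set_data_from_numpy(np.array([0.5], dtype=np.float16))

result = client.infer(
    model_name="ensemble_model",  # placeholder: use your ensemble's actual name
    inputs=inputs,
    outputs=[httpclient.InferRequestedOutput("detection_result")],
)
detections = result.as_numpy("detection_result")  # shape [N, 5]: [x, y, w, h, score]
print(detections)
```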
The inference module allows you to perform image analysis using different methods, leveraging both local and cloud-based solutions. The pipeline is designed to be flexible and supports multiple prediction methods, making it easy to experiment and deploy in different environments.

The pipeline supports the following inference methods:

- Triton Client (HTTP/gRPC) via `TritonClientPredictor`
- Direct HTTP requests via `HttpPredictor`
- Vertex AI endpoints via `VertexAIPredictor`
- Local inference with the ONNX Runtime (`inference_onnx.py`)

The inference module provides both a graphical user interface (GUI) and command-line tools for performing inference.

The GUI allows you to interactively test the deployed model and visualize the results in real time. Run `inference_gui.py`:

```bash
python signature-detection/gui/inference_gui.py --triton-url {triton_url}
```

https://github.com/user-attachments/assets/d41a45a1-8783-41a6-b963-b315d0e994b4

The CLI tool provides a flexible way to perform inference on a dataset using different predictors. Run `inference_pipeline.py`:

```bash
python signature-detection/inference/inference_pipeline.py
```

This script calculates inference-time metrics and produces a tabulated final report like this:
```
+------------------------+----------------------+
| Metric                 | Value                |
+========================+======================+
| Mean time (ms)         | 141.20447635650635   |
+------------------------+----------------------+
| Std deviation (ms)     | 17.0417248165512     |
+------------------------+----------------------+
| Max time (ms)          | 175.67205429077148   |
+------------------------+----------------------+
| Min time (ms)          | 125.48470497131348   |
+------------------------+----------------------+
| Total time (min)       | 00:02:541            |
+------------------------+----------------------+
| Number of inferences   | 18                   |
+------------------------+----------------------+
```
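The timing statistics above are simple aggregates over per-image latencies. The snippet below is an illustrative way to compute them (not necessarily the script's exact code); it assumes an `InferencePipeline` instance as described in the class structure further down.

```python
# Illustrative only: aggregate per-inference latencies the way the report does.
import statistics
import time

def timed_inferences(pipeline, image_paths):
    latencies_ms = []
    for path in image_paths:
        start = time.perf_counter()
        pipeline.run(path)  # works with any predictor the pipeline was built with
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms

# latencies = timed_inferences(pipeline, test_images)
# print(statistics.mean(latencies), statistics.stdev(latencies),
#       max(latencies), min(latencies), len(latencies))
```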
For local inference without relying on external services, you can use the ONNX Runtime. Run `inference_onnx.py`:

```bash
python signature-detection/inference/inference_onnx.py \
    --model_path {onnx_model_path} \
    --img './input/test_image.jpg' \
    --conf-thres 0.5 \
    --iou-thres 0.5
```

All arguments are optional; the default values are:

- `--model_path`: `signature-detection/models/yolov8s.onnx`
- `--img`: a random image from the test dataset
- `--conf-thres`: `0.5`
- `--iou-thres`: `0.5`
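For reference, the sketch below shows roughly what the local ONNX path involves, mirroring the preprocessing steps from the ensemble diagram (BGR to RGB, resize to 640x640, normalize, HWC to CHW). The paths are placeholders and the postprocessing (confidence filtering and NMS) is omitted; see `inference_onnx.py` for the full implementation.

```python
# Rough sketch only: preprocessing mirrors the ensemble's Python backend,
# paths are placeholders, and postprocessing (filtering + NMS) is omitted.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("signature-detection/models/yolov8s.onnx")

img = cv2.imread("input/test_image.jpg")                      # decoded as BGR
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)                    # BGR -> RGB
img = cv2.resize(img, (640, 640)).astype(np.float32) / 255.0  # resize + normalize
img = np.transpose(img, (2, 0, 1))[np.newaxis, ...]           # HWC -> NCHW

input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: img})
print(outputs[0].shape)  # raw YOLOv8 output; filter by confidence and apply NMS next
```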
If you need to extend the inference pipeline or add custom prediction methods, you can:

1. Create a new predictor class that inherits from `BasePredictor`.
2. Implement the required methods (`request`, `format_response`, etc.).
3. Update `InferencePipeline` to support the new predictor.

The inference pipeline is built around a modular class structure that allows for easy extension and customization. Here's the class hierarchy:
```mermaid
classDiagram
class ABC {
}
class BasePredictor {
+__init__()
+request(input)
+format_response(response)
+predict(input)
}
class HttpPredictor {
+__init__(url)
~_create_payload(image)
+request(input)
+format_response(response)
}
class VertexAIPredictor {
+__init__(url, access_token)
~_get_google_access_token()
}
class TritonClientPredictor {
+__init__(url, endpoint, scheme)
+request()
+format_response(response)
}
class InferencePipeline {
+__init__(predictor)
+run(image_path)
~_process_response(response)
}
ABC <|-- BasePredictor
BasePredictor <|-- HttpPredictor
HttpPredictor <|-- VertexAIPredictor
BasePredictor <|-- TritonClientPredictor
InferencePipeline --> BasePredictor : uses
```
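The sketch below illustrates that extension pattern, following the class diagram: a hypothetical custom predictor overriding `request` and `format_response`, plugged into `InferencePipeline`. The import paths, payload handling, and example URL are assumptions; adapt them to `predictors.py` in this repository.

```python
# Hypothetical sketch of a custom predictor; method names follow the class
# diagram above, while import paths and request handling are assumptions.
from predictors import BasePredictor              # assumed module path
from inference_pipeline import InferencePipeline  # assumed module path


class MyCustomPredictor(BasePredictor):
    def __init__(self, url):
        super().__init__()
        self.url = url

    def request(self, input):
        # Send the input image to your custom backend and return the raw response.
        raise NotImplementedError

    def format_response(self, response):
        # Convert the raw response into the expected [x, y, w, h, score] format.
        raise NotImplementedError


# Usage: InferencePipeline delegates to whichever predictor it receives.
pipeline = InferencePipeline(MyCustomPredictor(url="http://localhost:9000"))
# pipeline.run("path/to/image.jpg")
```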
To control access to specific server protocols, the server uses the `--http-restricted-api` and `--grpc-restricted-protocol` flags. These flags ensure that only requests containing the required `admin-key` header with the correct value can reach the restricted endpoints.

In this project, the `entrypoint.sh` script restricts access to the server's administrative endpoints (model repository, model configuration, shared memory, statistics, and tracing) via both HTTP and gRPC, so only requests carrying the `admin-key` header with the correct value are allowed:
```bash
tritonserver \
    --model-repository=${TRITON_MODEL_REPOSITORY} \
    --model-control-mode=explicit \
    --load-model=* \
    --log-verbose=1 \
    --allow-metrics=false \
    --allow-grpc=true \
    --grpc-restricted-protocol=model-repository,model-config,shared-memory,statistics,trace:admin-key=${TRITON_ADMIN_KEY} \
    --http-restricted-api=model-repository,model-config,shared-memory,statistics,trace:admin-key=${TRITON_ADMIN_KEY}
```
Requests to these restricted endpoints must include the `admin-key` header with the correct value defined in the `.env` file. This configuration ensures that sensitive operations and configurations are protected, while still allowing regular inference requests to proceed without restrictions.
The Triton Model Analyzer can be used to profile the model and generate performance reports. The `metrics-model-inference.csv` file contains performance metrics for various configurations of the YOLOv8 model.
You can run the Model Analyzer using the following command:
```bash
docker run -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v $(pwd)/signature-detection/models:/signature-detection/models \
    --net=host nvcr.io/nvidia/tritonserver:24.11-py3-sdk
```

Then, inside the SDK container, run the profile and report commands:

```bash
model-analyzer profile -f perf.yaml \
    --triton-launch-mode=remote --triton-http-endpoint=localhost:8000 \
    --output-model-repository-path /signature-detection/analyzer/configs \
    --export-path profile_results --override-output-model-repository \
    --collect-cpu-metrics --monitoring-interval=5

model-analyzer report --report-model-configs yolov8s_config_0,yolov8s_config_12,yolov8s_config_4,yolov8s_config_8 ... --export-path /workspace --config-file perf.yaml
```
You can modify the `perf.yaml` file to experiment with different configurations and analyze the performance of the model in your deployment environment. See the Triton Model Analyzer documentation for more details.
This project uses a custom-trained YOLOv8 model for signature detection. All model weights, training artifacts, and the dataset are hosted on Hugging Face to comply with Ultralytics’ YOLO licensing requirements and to ensure proper versioning and documentation.
- Model Repository: Contains the trained model weights, ONNX exports, and a comprehensive model card detailing the training process, performance metrics, and usage guidelines.
- Dataset Repository: Includes the training dataset, validation splits, and detailed documentation about data collection and preprocessing steps.
- Demo Space: Provides a live demo for testing the model and dataset on Hugging Face Spaces.
The `utils/` folder contains scripts designed to simplify interactions with cloud storage providers and the process of exporting machine learning models. Below is an overview of the available scripts and their usage examples.

The `download_from_cloud.py` script allows you to download models or other files from Google Cloud Storage (GCP) or Azure Blob Storage. Use the appropriate arguments to specify the provider, storage credentials, and paths.

```bash
python signature-detection/utils/download_from_cloud.py --provider gcp --bucket-name <your-bucket-name>
python signature-detection/utils/download_from_cloud.py --provider az --container-name <your-container-name> --connection-string "<your-connection-string>"
```
Arguments:
- `--provider`: Specify the cloud provider (`gcp` or `az`).
- `--bucket-name`: GCP bucket name (required for `gcp`).
- `--container-name`: Azure container name (required for `az`).
- `--connection-string`: Azure connection string (required for `az`).
- `--local-folder`: Local folder to save downloaded files (default: `models` folder).
- `--remote-folder`: Remote folder path in the cloud (default: `triton-server/image/signature-detection/models`).

The `upload_models_to_cloud.py` script allows you to upload models or files from a local directory to either GCP or Azure storage.

```bash
python signature-detection/utils/upload_models_to_cloud.py --provider gcp --bucket-name <your-bucket-name>
python signature-detection/utils/upload_models_to_cloud.py --provider az --container-name <your-container-name> --connection-string "<your-connection-string>"
```
Arguments:
- `--provider`: Specify the cloud provider (`gcp` or `az`).
- `--bucket-name`: GCP bucket name (required for `gcp`).
- `--container-name`: Azure container name (required for `az`).
- `--connection-string`: Azure connection string (required for `az`).
- `--local-folder`: Local folder containing files to upload (default: `models` folder).
- `--remote-folder`: Remote folder path in the cloud (default: `triton-server/image/signature-detection/models`).

The `export_model.py` script simplifies the process of exporting YOLOv8 models to either ONNX or TensorRT formats. This is useful for deploying models in environments requiring specific formats.

```bash
python signature-detection/utils/export_model.py --model-path /path/to/yolov8s.pt --output-path model.onnx --format onnx
python signature-detection/utils/export_model.py --model-path /path/to/yolov8s.pt --output-path model.engine --format tensorrt
```
Arguments:
- `--model-path`: Path to the input model file (e.g., a YOLOv8 `.pt` file).
- `--output-path`: Path to save the exported model.
- `--format`: Export format (`onnx` or `tensorrt`).
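Under the hood, YOLOv8 exports of this kind are typically done through the Ultralytics API. The snippet below is a minimal, hedged equivalent, not necessarily how `export_model.py` is implemented, and the weights path is a placeholder.

```python
# Minimal sketch using the Ultralytics export API; not necessarily identical
# to export_model.py. The weights path is a placeholder.
from ultralytics import YOLO

model = YOLO("/path/to/yolov8s.pt")
model.export(format="onnx")      # writes the ONNX file next to the weights
# model.export(format="engine")  # TensorRT export (requires a GPU with TensorRT)
```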