signature-detection

Object Detection with Triton Inference Server

This project provides a pipeline for deploying and running inference with a YOLOv8 object detection model using Triton Inference Server on Google Cloud's Vertex AI, locally, or on Docker-based systems. The repository includes scripts for automating the deployment process, a graphical user interface for inference, and performance-analysis tools for optimizing the model.

Table of Contents

📁 Project Structure

Key Files

🛠️ Features

💻 Installation

  1. Clone the repository:
    git clone https://github.com/your-username/t4ai-triton-server.git
    
  2. Install dependencies (Optional: Create a virtual environment):
    pip install -r requirements.txt
    
  3. Configure your environment: Set up your Google Cloud credentials and .env file (see .env.example).
  4. Build and deploy: use the repository's deployment scripts to build the Docker image and deploy the server locally or to Vertex AI.
  5. Run inference: The scripts in signature-detection/inference can be used to perform inference on images using different methods (requests, Triton client, Vertex AI).
    • GUI: Use the inference_gui.py to test the deployed model and visualize the results.
    • CLI: Use the inference_pipeline.py script to select predictor and perform inference on test dataset images.
    • ONNX: Use the inference_onnx.py script to perform inference with the ONNX runtime locally.

🧩 Ensemble Model

The repository includes an Ensemble Model for YOLOv8 object detection. It combines the YOLOv8 model with pre- and post-processing scripts to perform inference on images. The model repository is located in the models/ directory.

flowchart TB
    subgraph "Triton Inference Server"
        direction TB
        subgraph "Ensemble Model Pipeline"
            direction TB
            subgraph Input
                raw["raw_image
                 (UINT8, [-1])"]
                conf["confidence_threshold
                 (FP16, [1])"]
                iou["iou_threshold
                 (FP16, [1])"]
            end

            subgraph "Preprocess Py-Backend"
                direction TB
                pre1["Decode Image
                    BGR to RGB"]
                pre2["Resize (640x640)"]
                pre3["Normalize (/255.0)"]
                pre4["Transpose
                [H,W,C]->[C,H,W]"]
                pre1 --> pre2 --> pre3 --> pre4
            end

            subgraph "YOLOv8 Model ONNX Backend"
                yolo["Inference YOLOv8s"]
            end

            subgraph "Postproces Python Backend"
                direction TB
                post1["Transpose
                   Outputs"]
                post2["Filter Boxes (confidence_threshold)"]
                post3["NMS (iou_threshold)"]
                post4["Format Results [x,y,w,h,score]"]
                post1 --> post2 --> post3 --> post4
            end

            subgraph Output
                result["detection_result
                    (FP16, [-1,5])"]
            end

            raw --> pre1
            pre4 --> |"preprocessed_image (FP32, [3,-1,-1])"| yolo
            yolo --> |"output0"| post1
            conf --> post2
            iou --> post3
            post4 --> result
        end
    end

    subgraph Client
        direction TB
        client_start["Client Application"]
        response["Detections Result
                [x,y,w,h,score]"]
    end

    client_start -->|"HTTP/gRPC Request
          with raw image
          confidence_threshold
          iou_threshold"| raw
    result -->|"HTTP/gRPC Response with detections"| response

    style Client fill:#e6f3ff,stroke:#333
    style Input fill:#f9f,stroke:#333
    style Output fill:#9ff,stroke:#333
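
The diagram above defines the ensemble's input and output tensors. As an illustration, the sketch below calls such an ensemble through the Triton client SDK; the model name ensemble_model, the server URL, and the image path are assumptions to adapt to your model repository.

# Minimal sketch: querying the ensemble with the tensor names, dtypes and
# shapes shown in the diagram. Model name, URL and image path are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# raw_image is the encoded image file as a flat UINT8 byte tensor
with open("input/test_image.jpg", "rb") as f:
    raw_image = np.frombuffer(f.read(), dtype=np.uint8)

inputs = [
    httpclient.InferInput("raw_image", [raw_image.shape[0]], "UINT8"),
    httpclient.InferInput("confidence_threshold", [1], "FP16"),
    httpclient.InferInput("iou_threshold", [1], "FP16"),
]
inputs[0].set_data_from_numpy(raw_image)
inputs[1].set_data_from_numpy(np.array([0.5], dtype=np.float16))
inputs[2].set_data_from_numpy(np.array([0.5], dtype=np.float16))

response = client.infer(model_name="ensemble_model", inputs=inputs)
detections = response.as_numpy("detection_result")  # [N, 5] -> x, y, w, h, score
print(detections)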

⚡ Inference

The inference module allows you to perform image analysis using different methods, leveraging both local and cloud-based solutions. The pipeline is designed to be flexible and supports multiple prediction methods, making it easy to experiment and deploy in different environments.

Available Methods

The pipeline supports the following inference methods:

  1. Triton Client: Inference using the Triton Inference Server SDK.
  2. Vertex AI: Inference using Google Cloud’s Vertex AI Endpoint.
  3. HTTP: Inference using HTTP requests to the Triton Inference Server.
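
For the HTTP method, a request is a plain POST against Triton's v2 inference endpoint. The sketch below is a rough illustration using the ensemble's tensor names from the diagram above; the model name and URL are placeholders.

import requests

with open("input/test_image.jpg", "rb") as f:
    image_bytes = f.read()

payload = {
    "inputs": [
        # Encoded image bytes as a flat UINT8 tensor
        {"name": "raw_image", "shape": [len(image_bytes)], "datatype": "UINT8",
         "data": list(image_bytes)},
        {"name": "confidence_threshold", "shape": [1], "datatype": "FP16", "data": [0.5]},
        {"name": "iou_threshold", "shape": [1], "datatype": "FP16", "data": [0.5]},
    ]
}

response = requests.post("http://localhost:8000/v2/models/ensemble_model/infer", json=payload)
print(response.json()["outputs"])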

How To Use

The inference module provides both a graphical user interface (GUI) and command-line tools for performing inference.

1. Graphical User Interface (GUI)

The GUI allows you to interactively test the deployed model and visualize the results in real-time.

python signature-detection/gui/inference_gui.py --triton-url {triton_url} 

https://github.com/user-attachments/assets/d41a45a1-8783-41a6-b963-b315d0e994b4

2. Command-Line Interface (CLI)

The CLI tool provides a flexible way to perform inference on a dataset using different predictors.

python signature-detection/inference/inference_pipeline.py
💡 This script collects inference-time metrics and prints a tabulated report like the following:

+----------------------+--------------------+
| Metric               | Value              |
+======================+====================+
| Mean time (ms)       | 141.20447635650635 |
+----------------------+--------------------+
| Std. deviation (ms)  | 17.0417248165512   |
+----------------------+--------------------+
| Max time (ms)        | 175.67205429077148 |
+----------------------+--------------------+
| Min time (ms)        | 125.48470497131348 |
+----------------------+--------------------+
| Total time (min)     | 00:02:541          |
+----------------------+--------------------+
| Number of inferences | 18                 |
+----------------------+--------------------+

3. ONNX Runtime

For local inference without relying on external services, you can use the ONNX runtime.

python signature-detection/inference/inference_onnx.py \
  --model_path {onnx_model_path} \
  --img './input/test_image.jpg' \
  --conf-thres 0.5 \
  --iou-thres 0.5
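
For reference, plain ONNX Runtime inference following the pre-processing steps shown in the ensemble diagram looks roughly like the sketch below; the model path and the batch dimension are assumptions.

import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("models/yolov8s/1/model.onnx")   # assumed model path

img = cv2.imread("input/test_image.jpg")                        # BGR, HWC
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640)).astype(np.float32) / 255.0    # normalize to [0, 1]
img = np.transpose(img, (2, 0, 1))[np.newaxis, ...]             # [1, 3, 640, 640]

input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: img})                  # raw YOLOv8 output
print(outputs[0].shape)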

Extending the Pipeline

If you need to extend the inference pipeline or add custom prediction methods, you can:

  1. Create a new predictor class that inherits from BasePredictor.
  2. Implement the required methods (request, format_response, etc.).
  3. Update the InferencePipeline to support the new predictor.
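
As a sketch of those steps, a custom predictor could look like the following; the import path and exact method signatures are assumptions based on the class diagram below.

from predictors import BasePredictor  # assumed import path

class MyCustomPredictor(BasePredictor):
    """Hypothetical predictor targeting a new backend."""

    def __init__(self, url):
        super().__init__()
        self.url = url

    def request(self, input):
        # Send the image to the backend and return the raw response
        raise NotImplementedError

    def format_response(self, response):
        # Convert the raw response into [x, y, w, h, score] detections
        raise NotImplementedError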

Class Diagram

The inference pipeline is built around a modular class structure that allows for easy extension and customization. Here’s the class hierarchy:

classDiagram
    class ABC {
    }
    class BasePredictor {
        +__init__()
        +request(input)
        +format_response(response)
        +predict(input)
    }
    class HttpPredictor {
        +__init__(url)
        ~_create_payload(image)
        +request(input)
        +format_response(response)
    }
    class VertexAIPredictor {
        +__init__(url, access_token)
        ~_get_google_access_token()
    }
    class TritonClientPredictor {
        +__init__(url, endpoint, scheme)
        +request()
        +format_response(response)
    }
    class InferencePipeline {
        +__init__(predictor)
        +run(image_path)
        ~_process_response(response)
    }
  
    ABC <|-- BasePredictor
    BasePredictor <|-- HttpPredictor
    HttpPredictor <|-- VertexAIPredictor
    BasePredictor <|-- TritonClientPredictor
    InferencePipeline --> BasePredictor : uses
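
A predictor is then plugged into the pipeline roughly as follows (a sketch; import paths and the endpoint URL are assumptions):

from inference_pipeline import InferencePipeline  # assumed import paths
from predictors import HttpPredictor

predictor = HttpPredictor(url="http://localhost:8000")   # placeholder endpoint
pipeline = InferencePipeline(predictor)
detections = pipeline.run("input/test_image.jpg")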

🔒 Limit Endpoint Access

To control access to specific server protocols, the server uses the --http-restricted-api and --grpc-restricted-protocol flags. These flags ensure that only requests containing the required admin-key header with the correct value will have access to restricted endpoints.

In this project, the entrypoint configuration restricts access to the following endpoints over both the HTTP and gRPC protocols:

Restricted endpoints: model-repository, model-config, shared-memory, statistics, and trace.

Entry Point Configuration

The entrypoint.sh script restricts access to the server's administrative endpoints. Access control is enforced over both HTTP and gRPC, so only requests that include the admin-key header with the correct value are allowed.

tritonserver \
  --model-repository=${TRITON_MODEL_REPOSITORY} \
  --model-control-mode=explicit \
  --load-model=* \
  --log-verbose=1 \
  --allow-metrics=false \
  --allow-grpc=true \
  --grpc-restricted-protocol=model-repository,model-config,shared-memory,statistics,trace:admin-key=${TRITON_ADMIN_KEY} \
  --http-restricted-api=model-repository,model-config,shared-memory,statistics,trace:admin-key=${TRITON_ADMIN_KEY}

Key Points:

  1. Inference Access: The server allows inference requests from any user.
  2. Admin Access: Access to the restricted endpoints (model-repository, model-config, etc.) is limited to requests that include the admin-key header with the correct value defined in the .env file.
  3. gRPC Protocol: The gRPC protocol is enabled and restricted in the same way as HTTP, providing consistent security across both protocols.

This configuration ensures that sensitive operations and configurations are protected, while still allowing regular inference requests to proceed without restrictions.
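
For example, a call to one of the restricted APIs, such as the model repository index (part of the model-repository group), must carry the admin-key header. A sketch with placeholder values:

import requests

# Restricted API (model-repository group): requires the admin-key header
response = requests.post(
    "http://localhost:8000/v2/repository/index",
    headers={"admin-key": "<TRITON_ADMIN_KEY value from .env>"},
)
print(response.status_code, response.json())

# Regular inference requests need no admin-key header and remain open to all users.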

📊 Model Analyzer

The Triton Model Analyzer can be used to profile the model and generate performance reports. The metrics-model-inference.csv file contains performance metrics for various configurations of the YOLOv8 model.

You can run the Model Analyzer using the following command:

docker run -it  \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v $(pwd)/signature-detection/models:/signature-detection/models \
    --net=host nvcr.io/nvidia/tritonserver:24.11-py3-sdk 
model-analyzer profile -f perf.yaml \
    --triton-launch-mode=remote --triton-http-endpoint=localhost:8000  \
    --output-model-repository-path /signature-detection/analyzer/configs  \
    --export-path profile_results --override-output-model-repository \
    --collect-cpu-metrics --monitoring-interval=5
model-analyzer report --report-model-configs yolov8s_config_0,yolov8s_config_12,yolov8s_config_4,yolov8s_config_8 ... --export-path /workspace --config-file perf.yaml 

You can modify the perf.yaml file to experiment with different configurations and analyze the performance of the model in your deployment environment. See the Triton Model Analyzer documentation for more details.

🤗 Model & Dataset Resources

This project uses a custom-trained YOLOv8 model for signature detection. All model weights, training artifacts, and the dataset are hosted on Hugging Face to comply with Ultralytics’ YOLO licensing requirements and to ensure proper versioning and documentation.

🧰 Utils

The utils/ folder contains scripts designed to simplify interactions with cloud storage providers and the process of exporting machine learning models. Below is an overview of the available scripts and their usage examples.

1. Downloading Models from Cloud Storage

The download_from_cloud.py script allows you to download models or other files from Google Cloud Storage (GCP) or Azure Blob Storage. Use the appropriate arguments to specify the provider, storage credentials, and paths.
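
For reference, a direct download from GCS with the official google-cloud-storage SDK looks like the sketch below; this illustrates the underlying operation only, and the bucket name and paths are placeholders rather than the script's actual arguments.

from google.cloud import storage

client = storage.Client()                      # uses your Google Cloud credentials
bucket = client.bucket("my-models-bucket")     # placeholder bucket name
blob = bucket.blob("signature-detection/yolov8s/model.onnx")
blob.download_to_filename("models/yolov8s/1/model.onnx")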

Arguments:

2. Uploading Models to Cloud Storage

The upload_models_to_cloud.py script allows you to upload models or files from a local directory to either GCP or Azure storage.

Arguments:

3. Exporting Models

The export_model.py script simplifies the process of exporting YOLOv8 models to either ONNX or TensorRT formats. This is useful for deploying models in environments requiring specific formats.
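
For context, YOLOv8 models are typically exported through the Ultralytics export API; the sketch below illustrates the underlying call rather than the script's own interface, and the weights path is a placeholder.

from ultralytics import YOLO

model = YOLO("weights/yolov8s.pt")          # placeholder path to the trained weights
model.export(format="onnx", dynamic=True)   # or format="engine" for TensorRT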

Arguments:

🤝 Contributors

Samuel Lima Braz
Jorge Willians
Nixon Silva
ronaldobalzi-tech4h

Contributing

First off, thanks for taking the time to contribute! Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make will benefit everybody else and are greatly appreciated. Please read our contribution guidelines (/signature-detection/docs/CONTRIBUTING.md), and thank you for being involved!

License

This project is licensed under the Apache Software License 2.0. See LICENSE (/signature-detection/LICENSE) for more information.