Triton Client Libraries and Examples

To simplify communication with Triton, the Triton project provides several client libraries and examples of how to use those libraries. Ask questions or report problems in the main Triton issues page.

The provided client libraries are:

C++ and Python APIs that make it easy to communicate with Triton from your C++ or Python application. Using these libraries you can send either HTTP/REST or GRPC requests to Triton to access all its capabilities: inferencing, status and health, statistics and metrics, model repository management, etc. These libraries also support using system and CUDA shared memory for passing inputs to and receiving outputs from Triton.
Java API (contributed by Alibaba Cloud PAI Team) that makes it easy to communicate with Triton from your Java application using HTTP/REST requests. For now, only a limited feature subset is supported.
The protoc compiler can generate a GRPC API in a large number of programming languages.
- See src/grpc_generated/go for an example for the Go programming language.
- See src/grpc_generated/java for an example for the Java and Scala programming languages.
- See src/grpc_generated/javascript for an example with JavaScript programming language.

There are also many example applications that show how to use these libraries. Many of these examples use models from the example model repository.

C++ and Python versions of image_client, an example application that uses the C++ or Python client library to execute image classification models on Triton. See Image Classification Example.
Several simple C++ examples show how to use the C++ library to communicate with Triton to perform inferencing and other task. The C++ examples demonstrating the HTTP/REST client are named with a simple_http_ prefix and the examples demonstrating the GRPC client are named with a simple_grpc_ prefix. See Simple Example Applications.
Several simple Python examples show how to use the Python library to communicate with Triton to perform inferencing and other task. The Python examples demonstrating the HTTP/REST client are named with a simple_http_ prefix and the examples demonstrating the GRPC client are named with a simple_grpc_ prefix. See Simple Example Applications.
Several simple Java examples show how to use the Java API to communicate with Triton to perform inferencing and other task.
A couple of Python examples that communicate with Triton using a Python GRPC API generated by the protoc compiler. grpc_client.py is a simple example that shows simple API usage. grpc_image_client.py is functionally equivalent to image_client but that uses a generated GRPC client stub to communicate with Triton.

Getting the Client Libraries And Examples

The easiest way to get the Python client library is to use pip to install the tritonclient module. You can also download the C++, Python and Java client libraries from Triton GitHub release, or download a pre-built Docker image containing the client libraries from NVIDIA GPU Cloud (NGC).

It is also possible to build the client libraries with cmake.

Download Using Python Package Installer (pip)

The GRPC and HTTP client libraries are available as a Python package that can be installed using a recent version of pip.

$ pip install tritonclient[all]

Using all installs both the HTTP/REST and GRPC client libraries. There are two optional packages available, grpc and http that can be used to install support specifically for the protocol. For example, to install only the HTTP/REST client library use,

$ pip install tritonclient[http]

There is another optional package namely cuda, that must be installed in order to use cuda_shared_memory utilities. all specification will install the cuda package by default but in other cases cuda needs to be explicitly specified for installing client with cuda_shared_memory support.

$ pip install tritonclient[http, cuda]

The components of the install packages are:

http
grpc [ service_pb2, service_pb2_grpc, model_config_pb2 ]
utils [ linux distribution will include shared_memory and cuda_shared_memory]

Download From GitHub

The client libraries can be downloaded from the Triton GitHub release page corresponding to the release you are interested in. The client libraries are found in the "Assets" section of the release page in a tar file named after the version of the release and the OS, for example, v2.3.0_ubuntu2004.clients.tar.gz.

The pre-built libraries can be used on the corresponding host system or you can install them into the Triton container to have both the clients and server in the same container.

$ mkdir clients
$ cd clients
$ wget https://github.com/triton-inference-server/server/releases/download/<tarfile_path>
$ tar xzf <tarfile_name>

After installing, the libraries can be found in lib/, the headers in include/, the Python wheel files in python/, and the jar files in java/. The bin/ and python/ directories contain the built examples that you can learn more about below.

Download Docker Image From NGC

A Docker image containing the client libraries and examples is available from NVIDIA GPU Cloud (NGC). Before attempting to pull the container ensure you have access to NGC. For step-by-step instructions, see the NGC Getting Started Guide.

Use docker pull to get the client libraries and examples container from NGC.

$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk

Where <xx.yy> is the version that you want to pull. Within the container the client libraries are in /workspace/install/lib, the corresponding headers in /workspace/install/include, and the Python wheel files in /workspace/install/python. The image will also contain the built client examples.

Important Note: When running either the server or the client using Docker containers and using the CUDA shared memory feature you need to add --pid host flag when launching the containers. The reason is that CUDA IPC APIs require the PID of the source and destination of the exported pointer to be different. Otherwise, Docker enables PID namespace which may result in equality between the source and destination PIDs. The error will be always observed when both of the containers are started in the non-interactive mode.

Build Using CMake

The client library build is performed using CMake. To build the client libraries and examples with all features, first change directory to the root of this repo and checkout the release version of the branch that you want to build (or the main branch if you want to build the under-development version).

$ git checkout main

If building the Java client you must first install Maven and a JDK appropriate for your OS. For example, for Ubuntu you should install the default-jdk package:

$ apt-get install default-jdk maven

Building on Windows vs. non-Windows requires different invocations because Triton on Windows does not yet support all the build options.

Non-Windows

Use cmake to configure the build. You should adjust t

Client

Install / Use

README