
<div align="center"> <a href="http://camma.u-strasbg.fr/"> <img src="https://github.com/CAMMA-public/rendezvous/raw/main/files/CammaLogo.png" width="18%"> </a> </div> <br/>

PyTorch TensorFlow

Tripnet: Recognition of instrument-tissue interactions in endoscopic videos via action triplets

<i>CI Nwoye, C Gonzalez, T Yu, P Mascagni, D Mutter, J Marescaux, and N Padoy</i>

<img src="https://github.com/CAMMA-public/rendezvous/raw/main/files/examples-1.png" width="100%">

This repository contains the implementation code, inference code, and evaluation scripts.<br />ArXiv paper Journal Publication

News

  • [ 17/09/2025 ]: Check out our CAMMA Dataset Overlaps repository for an analysis of video overlaps across Cholec80, CholecT50, and Endoscapes to ensure fair dataset splits.

Abstract

Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity.

To this end, we introduce a new laparoscopic dataset, <i>CholecT40</i>, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes.

Furthermore, we present an approach to recognize these triplets directly from the video data. It relies on a module called <i>class activation guide</i>, which uses the instrument activation maps to guide the verb and target recognition. To model the recognition of multiple triplets in the same frame, we also propose a trainable <i>3D interaction space (3Dis)</i>, which captures the associations between the triplet components. Finally, we demonstrate the significance of these contributions via several ablation studies and comparisons to baselines on CholecT40.

<br />

News and Updates

  • <b>[2023.02.20]:</b> CholecT50 dataset is now public!
  • <b>[2022.05.09]:</b> TensorFlow v2 implementation code released!
  • <b>[2022.05.09]:</b> TensorFlow v1 implementation code released!
  • <b>[2022.05.03]:</b> PyTorch implementation code released!
<br />

Model Overview

<img src="files/tripnet.png" width="98%">

The Tripnet model is composed of:

  • Feature Extraction layer: extracts high- and low-level features from an input video frame
  • Encoder: encodes the triplet components
    • Weakly-Supervised Localization (WSL) Layer: localizes the instruments
    • Class Activation Guide (CAG): detects the verbs and targets by leveraging the instrument activations
  • Decoder: associates the triplet components across multiple instances
    • 3D interaction space (3Dis): learns to associate instrument-verb-target via a learned projection and performs the final triplet classification
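
As a rough illustration (not the repository's exact implementation), the 3D interaction space can be sketched as an outer product of the three component logits followed by a learned projection onto the triplet classes; the module name and layer sizes below are assumptions, with class counts matching CholecT50 (6 instruments, 10 verbs, 15 targets, 100 triplets):

```python
import torch
import torch.nn as nn

class Interaction3D(nn.Module):
    """Hypothetical sketch of a 3D interaction space (3Dis): builds a
    (B, I, V, T) association volume from the component logits, then
    projects it onto the triplet classes with a learned linear layer."""
    def __init__(self, num_tools=6, num_verbs=10, num_targets=15, num_triplets=100):
        super().__init__()
        self.project = nn.Linear(num_tools * num_verbs * num_targets, num_triplets)

    def forward(self, tool_logits, verb_logits, target_logits):
        # Outer product over the three component dimensions -> (B, I, V, T)
        space = torch.einsum('bi,bv,bt->bivt', tool_logits, verb_logits, target_logits)
        # Flatten the interaction volume and project to triplet logits
        return self.project(space.flatten(1))

x_i = torch.randn(2, 6)     # instrument logits
x_v = torch.randn(2, 10)    # verb logits
x_t = torch.randn(2, 15)    # target logits
out = Interaction3D()(x_i, x_v, x_t)
print(tuple(out.shape))  # (2, 100)
```

Because the association is learned jointly over all three axes, multiple triplets sharing a component in the same frame can still be scored independently.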

We hope this repo will help researchers and engineers develop surgical action recognition systems. For algorithm development, we provide training data, baseline models, and evaluation methods to make a level playing field. For application usage, we also provide a small video demo that takes raw videos as input without any bells and whistles.

<br />

Performance

Results Table

| Dataset | AP<sub>I</sub> | AP<sub>V</sub> | AP<sub>T</sub> | AP<sub>IV</sub> | AP<sub>IT</sub> | AP<sub>IVT</sub> |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| CholecT40 | 89.7 | 60.7 | 38.3 | 35.5 | 19.9 | 19.0 |
| CholecT45 | 89.9 | 59.9 | 37.4 | 31.8 | 27.1 | 24.4 |
| CholecT50 | 92.1 | 54.5 | 33.2 | 29.7 | 26.4 | 20.0 |

(AP<sub>I</sub>, AP<sub>V</sub>, AP<sub>T</sub> are component APs; AP<sub>IV</sub>, AP<sub>IT</sub>, AP<sub>IVT</sub> are association APs.)

<br />

Installation

Requirements

The model depends on the following libraries:

  1. sklearn
  2. PIL
  3. Python >= 3.5
  4. ivtmetrics
  5. Developer's framework:
    1. For TensorFlow version 1:
      • TF >= 1.10
    2. For TensorFlow version 2:
      • TF >= 2.1
    3. For PyTorch version:
      • PyTorch >= 1.10.1
      • TorchVision >= 0.11
<br />

System Requirements:

The code has been tested on Linux. It runs on both CPU and GPU. Equivalents of basic OS commands such as unzip, cd, and wget will be needed to run it on Windows or macOS.

<br />

Quick Start

  • clone the git repository: git clone https://github.com/CAMMA-public/tripnet.git
  • install all the required libraries according to your chosen framework.
  • download the dataset
  • download model's weights
  • train
  • evaluate
<br />

Dataset Zoo

<br />

Data Preparation

  • All frames are resized to 256 x 448 during training and evaluation.
  • Image data are mean normalized.
  • The dataset variants are tagged in this code as follows:
    • cholect50 = CholecT50 with split used in the original paper.
    • cholect50-challenge = CholecT50 with split used in the CholecTriplet challenge.
    • cholect45-crossval = CholecT45 with the official cross-val split (currently publicly released).
    • cholect50-crossval = CholecT50 with official cross-val split.
<br />

Evaluation Metrics

The ivtmetrics library computes AP for triplet recognition. It also supports evaluating the recognition of the individual triplet components.

pip install ivtmetrics

or

conda install -c nwoye ivtmetrics

Usage guide is found on pypi.org.
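
For intuition, the per-class AP that such a metric averages can be sketched with sklearn (already a listed dependency); this is an illustration of the computation, not ivtmetrics' actual implementation, and the toy labels/scores are invented:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Toy multi-label setup: 4 frames, 3 triplet classes.
y_true = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_score = np.array([[0.9, 0.5, 0.1],
                    [0.8, 0.4, 0.2],
                    [0.3, 0.6, 0.1],
                    [0.1, 0.2, 0.9]])

# AP per triplet class, then mean AP over classes.
ap = [average_precision_score(y_true[:, c], y_score[:, c]) for c in range(3)]
print([round(a, 2) for a in ap])  # [1.0, 0.83, 1.0]
```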

<br />

Running the Model

The code can be run in training mode (-t), testing mode (-e), or both (-t -e) if you want to evaluate at the end of training:

<br />

Training on CholecT45/CholecT50 Dataset

Simple training on CholecT50 dataset:

python run.py -t  --data_dir="/path/to/dataset" --dataset_variant=cholect50 --version=1

You can include more details such as epoch, batch size, cross-validation and evaluation fold, weight initialization, learning rates for all subtasks, etc.:

python3 run.py -t -e  --data_dir="/path/to/dataset" --dataset_variant=cholect45-crossval --kfold=1 --epochs=180 --batch=64 --version=2 -l 1e-2 1e-3 1e-4 --pretrain_dir='path/to/imagenet/weights'

All the flags can be seen in the run.py file. The experimental setup of the published model is described in the paper.

<br />

Testing

python3 run.py -e --dataset_variant=cholect45-crossval --kfold 3 --batch 32 --version=1 --test_ckpt="/path/to/model-k3/weights" --data_dir="/path/to/dataset"
<br />

Training on Custom Dataset

Adding a custom dataset is quite simple; you need to:

  • organize your annotation files in the same format as in the CholecT45 dataset.
  • modify the final model layers to suit your task by changing the class sizes (num_tool_classes, num_verb_classes, num_target_classes, num_triplet_classes) in the argparse.
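
A hedged sketch of what those class-size flags might look like (flag names mirror the names above; the defaults are CholecT50's class counts, and the 128-triplet override matches CholecT40 from the abstract):

```python
import argparse

parser = argparse.ArgumentParser(description='Tripnet on a custom dataset (sketch)')
# Class sizes control the widths of the final layers; set them to match your annotations.
parser.add_argument('--num_tool_classes', type=int, default=6)
parser.add_argument('--num_verb_classes', type=int, default=10)
parser.add_argument('--num_target_classes', type=int, default=15)
parser.add_argument('--num_triplet_classes', type=int, default=100)

# Example: a custom dataset annotated with 128 triplet classes.
args = parser.parse_args(['--num_triplet_classes', '128'])
print(args.num_tool_classes, args.num_triplet_classes)  # 6 128
```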
<br />

Model Zoo

  • N.B. Download links to models' weights will not be provided until after the CholecTriplet2022 challenge.

PyTorch

| Network | Base | Resolution | Dataset | Data split | Link |
|---|---|---|---|---|---|
| Tripnet | ResNet-18 | Low | CholecT50 | RDV | Download |
| Tripnet | ResNet-18 | High | CholecT50 | RDV | [Download] |
| Tripnet | ResNet-18 | Low | CholecT50 | Challenge | Download |
| Tripnet | ResNet-18 | Low | CholecT50 | crossval k1 | Download |
| Tripnet | ResNet-18 | Low | CholecT50 | crossval k2 | Download |
| Tripnet | ResNet-18 | Low | CholecT50 | crossval k3 | Download |
| Tripnet | ResNet-18 | Low | CholecT50 | crossval k4 | Download |
| Tripnet | ResNet-18 | Low | CholecT50 | crossval k5 | Download |
| Tripnet | ResNet-18 | Low | CholecT45 | crossval k1 | Download |
| Tripnet | ResNet-18 | Low | CholecT45 | crossval k2 | Download |
| Tripnet | ResNet-18 | Low | CholecT45 | crossval k3 | Download |
| Tripnet | ResNet-18 | Low | CholecT45 | crossval k4 | Download |
| Tripnet | ResNet-18 | Low | CholecT45 | crossval k5 | [Download](https://s3.unistra.fr/camma_public/gith |
