<div align=center> <img src="./docs/figs/logo.png" width = 20% height = 20% /> </div> <div align=center>

</div>

Contents Overview

1. Overview of iVSR
2. Setup iVSR env on linux
3. How to use iVSR
   - Run with iVSR SDK sample
   - Run with FFmpeg
4. Model files
5. License

1. Overview of iVSR

1.1 What is iVSR

iVSR facilitates AI media processing with exceptional quality and performance on Intel hardware.

iVSR offers a patch-based, heterogeneous, multi-GPU, and multi-algorithm solution, harnessing the full capabilities of Intel CPUs and GPUs. It is adaptable for deployment on a single device, a distributed system, cloud infrastructure, edge cloud, or K8S environment.

<div align=center> <img src="./docs/figs/iVSR.png" width = 75% height = 75% /> </div>

1.2 Why is iVSR needed

- Simple APIs ensure that any changes to the OpenVINO API remain hidden.
- A patch-based solution facilitates inference on hardware with limited memory capacity, which is particularly useful for super-resolution of high-resolution input videos, such as 4K.
- The iVSR SDK includes features to safeguard AI models created by Intel, which contain Intel IP.
- The iVSR SDK is versatile and supports a wide range of AI media processing algorithms.
- For specific algorithms, performance optimization can be executed to better align with customer requirements.
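
The patch-based idea above can be illustrated with a minimal sketch. It is not the iVSR SDK implementation: the function names, the optional overlap parameter, and the edge-handling rule (the last patch is shifted back so it ends flush with the frame border) are assumptions made for illustration only.

```python
def patch_origins(length, patch, overlap=0):
    """Return start offsets of patches that cover [0, length) along one axis.

    Hypothetical helper: the last patch is shifted back so that it ends
    exactly at the frame border instead of running past it.
    """
    if patch >= length:
        return [0]
    stride = patch - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the patch size")
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)
    return starts


def split_into_patches(height, width, patch_h, patch_w, overlap=0):
    """Return (y, x, h, w) rectangles that together cover the whole frame."""
    h = min(patch_h, height)
    w = min(patch_w, width)
    return [
        (y, x, h, w)
        for y in patch_origins(height, patch_h, overlap)
        for x in patch_origins(width, patch_w, overlap)
    ]


if __name__ == "__main__":
    # A 4K frame (2160 x 3840) split into 1080 x 1920 patches.
    for rect in split_into_patches(2160, 3840, 1080, 1920):
        print(rect)
```

Each rectangle can then be inferred separately, so peak memory depends on the patch size rather than on the full frame resolution.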

1.3 iVSR Components

This repository or package includes the following major components:

1.3.1 iVSR SDK

The iVSR SDK is a middleware library that supports various AI video processing filters. It is designed to accommodate different AI inference backends, although currently, it only supports OpenVINO.<br> For a detailed introduction to the iVSR SDK API, please refer to this introduction.

We've also included a vsr_sample as a demonstration of its usage.

1.3.2 iVSR FFmpeg plugin

In order to support the widely used media processing solution FFmpeg, we've provided an iVSR SDK plugin to simplify integration.<br> This plugin is integrated into FFmpeg's dnn_processing filter in the libavfilter library, serving as a new ivsr backend for this filter. Please note that the patches provided in this project are specifically for FFmpeg n7.1.<br>

1.3.3 OpenVINO patches and extension

In this folder, you'll find patches for OpenVINO that enable the Enhanced BasicVSR model. These patches utilize OpenVINO's Custom OpenVINO™ Operations feature, which allows users to support models with custom operations not inherently supported by OpenVINO.<br> These patches are specifically for OpenVINO 2022.3, meaning the Enhanced BasicVSR model will only work on OpenVINO 2022.3 with these patches applied.<br>

1.4 Capabilities of iVSR

Currently, iVSR offers two AI media processing functionalities: Video Super Resolution (VSR) and Smart Video Processing (SVP) for bandwidth optimization. Both functionalities can run on Intel CPUs and Intel GPUs (including Flex 170 and Arc A770) via OpenVINO and FFmpeg.

1.4.1 Video Super Resolution (VSR)

Video Super Resolution (VSR) is a technique extensively employed in the AI media enhancement domain to upscale low-resolution videos to high-resolution. iVSR supports Enhanced BasicVSR, Enhanced EDSR, and TSENet. It also has the capability to be extended to support additional models.

i. Enhanced BasicVSR

BasicVSR is a publicly available AI-based VSR algorithm. For more details on the public BasicVSR, please refer to this paper.<br><br> We have improved the public model to attain superior visual quality and reduced computational complexity. This improved model is named Enhanced BasicVSR. The inference performance of the Enhanced BasicVSR model has also been optimized for Intel GPUs. Please note that this optimization is specific to OpenVINO 2022.3. Therefore, the Enhanced BasicVSR model only works with OpenVINO 2022.3 with the patches applied.<br><br> The input and output shapes of this model are:
```
Input shape: [1, (channels)3, (frames)3, H, W]
Output shape: [1, (channels)3, (frames)3, 2xH, 2xW]
```
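
As a quick sanity check of the shapes above, the sketch below computes the expected output shape from an input shape. The helper name `upscaled_shape` is hypothetical and not part of the iVSR SDK; it only encodes the 2x spatial scaling listed above.

```python
def upscaled_shape(input_shape, scale=2):
    """Scale the last two dimensions (H, W) and keep the leading ones."""
    *leading, h, w = input_shape
    return [*leading, h * scale, w * scale]


if __name__ == "__main__":
    # Three 960x540 frames in, three 1920x1080 frames out.
    print(upscaled_shape([1, 3, 3, 540, 960]))
```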
ii. Enhanced EDSR

EDSR is another publicly available AI-based single image SR algorithm. For more details on the public EDSR, please refer to this paper.<br><br> We have improved the public EDSR model, reducing its computational complexity by over 79% compared to Enhanced BasicVSR while maintaining similar visual quality. This improved model is named Enhanced EDSR.<br><br> The input and output shapes of this model are:
```
Input shape: [1, (channels)3, H, W]
Output shape: [1, (channels)3, 2xH, 2xW]
```
iii. TSENet

TSENet is a multi-frame SR algorithm derived from ETDS.<br><br> We provide a preview version of the feature to support this model in the SDK and its plugin. Please contact your Intel representative to obtain the model package.<br><br> The input and output shapes of this model are:
```
Input shape: [1, (channels * frames)9, H, W]
Output shape: [1, (channels)3, 2xH, 2xW]
```
For each inference, the input data is the (n-1)th, (n)th, and (n+1)th frames combined, and the output data is the (n)th frame. For the first frame, the input data is the 1st, 1st, and 2nd frames combined. For the last frame, the input data is the (n-1)th, (n)th, and (n)th frames combined.
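
The frame selection rule described above can be sketched as follows. The function name `tsenet_window` and the 1-based indexing are assumptions for illustration; how the SDK actually buffers and stacks the frames is not shown here.

```python
def tsenet_window(n, num_frames):
    """Return the 1-based indices of the three input frames for output frame n.

    The first and last frames are repeated at the clip boundaries, as
    described in the text above.
    """
    if not 1 <= n <= num_frames:
        raise ValueError("n must be between 1 and num_frames")
    prev_idx = max(n - 1, 1)
    next_idx = min(n + 1, num_frames)
    return (prev_idx, n, next_idx)


if __name__ == "__main__":
    # Print the input window for every frame of a 5-frame clip.
    for n in range(1, 6):
        print(n, tsenet_window(n, 5))
```

With three RGB frames per window, the combined input has 3 x 3 = 9 channels, matching the `[1, 9, H, W]` input shape.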

1.4.2 Smart Video Processing (SVP)

SVP is an AI-based video prefilter that enhances perceptual rate-distortion in video encoding. With SVP, encoded video streams maintain the same visual quality while reducing bandwidth usage.<br>

Two SVP model variants are provided:

- SVP-Basic: This model is designed for efficiency, preserving fidelity while reducing the encoded bitrate. Modifications made by SVP-Basic are imperceptible to the human eye but can be measured as minor BD-rate degradation when evaluated using SSIM or MS-SSIM metrics. SVP-Basic is adaptable to various video scenarios, including live sports, gaming, livestream sales, VOD, video conferencing, video surveillance, and 5G video streaming.
- SVP-SE: This model focuses on subjective video quality preservation, achieving up to 50% bitrate savings. It enhances visuals by reducing complex details and noise that are less perceptible to human eyes. As a result, it cannot be evaluated by traditional full-reference visual quality metrics like PSNR, SSIM, or VMAF. SVP-SE improves the visibility and quality of visuals, making them more vivid and appealing, which is beneficial in industries such as entertainment, media, and advertising.

The input and output shapes are:

RGB based model:

Input shape: [1, (channels)3, H, W]
Output shape: [1, (channels)3, H, W]

Y based model:

Input shape: [1, (channels)1, H, W]
Output shape: [1, (channels)1, H, W]

<br>

2. Setup iVSR env on linux

The software was validated on:

- Intel Xeon hardware platform
- (Optional) Intel® Data Center GPU Flex 170 (aka ATS-M1 150W)
- Host OS: Linux-based OS (Ubuntu 22.04 or Rocky Linux 9.3)
- Docker-based OS: Ubuntu 22.04 or Rocky Linux 9.3
- OpenVINO: 2022.3, 2023.2, or 2024.5
- FFmpeg: n7.1

Building iVSR requires the installation of the GPU driver (o
