Provena

Provena is an information system built to capture, manage, and analyse provenance records and their artefacts. Provena additionally implements a robust data storage solution for seamless interaction with, and inspection of, the data referenced in provenance records.

Description

To deliver this functionality, Provena implements a suite of components. Each component comprises a Docker-deployed Python FastAPI serverless microservice and, where user interaction is required, a paired front-end web application. The components are described under Brief Description of Components.

Citations

Conference paper

Yu, J., Baker, P., Cox, S.J.D., Petridis, R., Freebairn, A.C., Mirza, F., Thomas, L., Tickell, S., Lemon, D. and Rezvani, M., Provena: A provenance system for large distributed modelling and simulation workflows, In Vaze, J., Chilcott, C., Hutley, L. and Cuddy, S.M. (eds) MODSIM2023, 25th International Congress on Modelling and Simulation. Modelling and Simulation Society of Australia and New Zealand, July 2023, pp. 14–20. ISBN: 978-0-9872143-0-0. https://doi.org/10.36334/modsim.2023.yu90

Software citation

Yu, Jonathan; Baker, Peter; Petridis, Ross; Freebairn, Andrew; Thomas, Linda; Mirza, Fareed; Tickell, Sharon; Lemon, David; Rezvani, Mojtaba; Hou, Xinyu; Cox, Simon (2023): Provena. v1. CSIRO. Software Collection. https://doi.org/10.25919/1edm-0612

See also https://data.csiro.au/collection/csiro:59335

Brief Description of Components

Identity Service (API)

The Identity Service is a heavily used component of the Provena information system. It mints and manages persistent web identifiers which resolve to a configurable domain/endpoint. It is implemented as a Python FastAPI wrapper of the ARDC Handle Service.

For more info on the Identity Service API:
Identity Service API Readme

Registry (API and UI)

The Provena Registry API enables the registration, updating, management, and exploration of persistently identified resources, such as people, organisations, and models.

The Provena Registry UI is a TypeScript React GUI which enables user-friendly interaction with the Provena Registry API.

For more info on the Provena Registry API and UI:
Registry API Readme
Registry UI Readme

Data Store (API and UI)

The Provena Data Store API facilitates the registration of dataset entities in the Provena Registry and the storage of the datasets themselves in AWS S3. Datasets form the major inputs and outputs of the workflow provenance recorded in the Provena Provenance Store. The Data Store API acts as a proxy to the underlying storage layer, providing a simple, unified Provena authorisation interface for data access.

The Provena Data Store UI is a TypeScript React GUI which enables user-friendly interaction with the Provena Data Store API.

For more info on the Provena Data Store API and UI:
Data Store API Readme
Data Store UI Readme

Provenance Store (API and UI)

The Provena Provenance API enables the registration of workflow provenance records according to the Provena workflow provenance data models. The Provena data model represents modelling as activities which consume inputs and produce outputs at a particular point in time, and which are associated with People and Organisations in the Registry.

There are two primary registration mechanisms:

  • individual record registration (using the Pydantic models directly)
  • CSV bulk model run record ingestion (using a job queue workflow deployed separately)
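
As a rough illustration of the two mechanisms, the sketch below models a single model run record and then reads several of them from CSV text. It uses standard-library dataclasses rather than Provena's actual Pydantic models, and every field name and CSV column shown here is a hypothetical stand-in, not the real Provena schema. Consult the Provenance API Readme for the real models and CSV template.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ModelRunRecord:
    """Hypothetical, simplified stand-in for a workflow provenance record."""

    display_name: str
    start_time: datetime
    end_time: datetime
    input_dataset_ids: List[str] = field(default_factory=list)
    output_dataset_ids: List[str] = field(default_factory=list)
    agent_id: str = ""  # a Person registered in the Registry
    organisation_id: str = ""  # an Organisation registered in the Registry


def parse_ids(cell: str) -> List[str]:
    """Split a semicolon-separated cell into a list of identifiers."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def records_from_csv(text: str) -> List[ModelRunRecord]:
    """Build one record per CSV row (the column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        ModelRunRecord(
            display_name=row["display_name"],
            start_time=datetime.fromisoformat(row["start_time"]),
            end_time=datetime.fromisoformat(row["end_time"]),
            input_dataset_ids=parse_ids(row["inputs"]),
            output_dataset_ids=parse_ids(row["outputs"]),
            agent_id=row["agent_id"],
            organisation_id=row["organisation_id"],
        )
        for row in reader
    ]


# Made-up sample data; the identifiers are placeholders, not real handles.
SAMPLE_CSV = """display_name,start_time,end_time,inputs,outputs,agent_id,organisation_id
Run A,2023-01-01T00:00:00,2023-01-01T01:00:00,ds-1;ds-2,ds-3,person-1,org-1
Run B,2023-01-02T00:00:00,2023-01-02T02:30:00,ds-3,ds-4;ds-5,person-1,org-1
"""

if __name__ == "__main__":
    for record in records_from_csv(SAMPLE_CSV):
        print(record.display_name, record.input_dataset_ids, record.output_dataset_ids)
```

In a real deployment the individual path would submit one validated record to the Provenance API, while the bulk path would hand the CSV to the separately deployed job queue workflow.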

Workflow provenance supports transparency of modelling activities and facilitates repetition of the knowledge generation activity. Provenance can also help in assessing the integrity of knowledge, and of the activities that generated it, where important decisions are based on that knowledge.

The Provena Provenance Store UI is a TypeScript React GUI which enables user-friendly interaction with the Provena Provenance API and Registry API.

This UI facilitates

  • interactive visualisation of provenance graphs
  • tooling to assist with bulk registration of provenance model run activities

and more...

For more info on the Provena Provenance API and UI:
Provenance API Readme
Provenance UI Readme

Authorisation Service (API)

The Provena Auth API provides a set of user and admin tools which assist with:

  • creating, viewing and managing system access requests
  • generating access reports which assess the current user's access to the components configured in the system Authorisation Model
  • creating, querying and managing user groups
  • managing the User to Registry Person Link

The Auth API is used heavily by other APIs as it communicates with the User groups table to enable querying of group membership. This group information is used to enforce resource level permissions on registry items.
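
To make the idea of group-based, resource-level permissions concrete, here is a minimal sketch of how a service might combine a user's group membership with the access settings stored on a registry item. The class, the role names, and the rule that owners hold every role are all assumptions made for illustration. They are not taken from the Provena source.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical role names; the roles Provena actually defines may differ.
ALL_ROLES = frozenset({"read", "write", "admin"})


@dataclass
class ResourceAccess:
    """Hypothetical access settings attached to one registry item."""

    owner: str
    # Maps a group id to the roles that group holds on this resource.
    group_roles: Dict[str, Set[str]] = field(default_factory=dict)


def roles_for_user(username: str, groups: Set[str], access: ResourceAccess) -> Set[str]:
    """Return the union of roles granted through ownership or group membership."""
    if username == access.owner:
        return set(ALL_ROLES)  # assumption: the owner holds every role
    roles: Set[str] = set()
    for group_id in groups:
        roles |= access.group_roles.get(group_id, set())
    return roles


def is_allowed(username: str, groups: Set[str], access: ResourceAccess, role: str) -> bool:
    """Check a single role, as an API might do before serving a request."""
    return role in roles_for_user(username, groups, access)


if __name__ == "__main__":
    access = ResourceAccess(owner="alice", group_roles={"modellers": {"read", "write"}})
    print(is_allowed("bob", {"modellers"}, access, "write"))
    print(is_allowed("carol", {"visitors"}, access, "read"))
```

In this sketch the set of groups would come from a group membership query of the kind the Auth API provides, while the access settings would be stored alongside the registry item.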

For more info on the Auth API:
Authentication API Readme

Search Service (API)

This API exposes a registry search endpoint which queries an Elastic Search index of the Registry.

The Search API is used across many of the Provena UIs as a simple way to search for registered items.

For more info on the Search API:
Search API Readme

Technologies Used

Please visit the READMEs of the respective components for information on the technologies used in each component. Links are included in the component descriptions above.

Getting Started

Deploying and managing Provena

We recommend that new users who want to deploy Provena follow our thorough deployment guide: Provena from Scratch.

Deploying APIs and UIs

Once a Provena deployment has been configured and deployed as per the above guide, individual APIs and UIs can be deployed for development, debugging or testing purposes by following instructions in their respective READMEs.

Usage

Provena comes with a suite of fully developed user interface web pages that provide user-friendly interaction with the Provena APIs and resources.

Additionally, all Provena services can be accessed by authorised users through the system APIs. Provena has well-documented REST APIs (see the API endpoint documentation in the component READMEs) which can be used directly with a REST client such as Postman or Thunder Client, or programmatically using, for example, Python. Programmatic interaction is recommended for power users, and we have a guide on authentication for this purpose: API access docs.
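
For programmatic access, a request might be prepared along the following lines. This is a minimal sketch using only the Python standard library. The base URL and endpoint path are placeholders, and passing the token as a bearer Authorization header is an assumption. Follow the API access docs and the component READMEs for the real endpoints and authentication flow.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(
    base_url: str,
    path: str,
    token: str,
    payload: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Prepare an authorised JSON request without sending it."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {
        # Assumption: the API expects an access token as a bearer header.
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values: substitute your deployment's API URL, a documented
    # endpoint path, and a token obtained as described in the API access docs.
    request = build_request("https://api.example.org", "/placeholder/path", "YOUR_TOKEN")
    print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the authorisation logic easy to inspect and test without any network access.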

Testing

Component Unit tests

Each API component has a dedicated unit test suite. Unlike the integration tests, which test user stories and interactions between multiple APIs at once, the unit tests are designed to test individual endpoints and key functionality of the Provena source code. All unit tests operate within a mocked environment. Each API's unit tests are located in a dedicated tests directory within the component's source code. See each component's README for more info on how to run the unit tests.
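
The general shape of a unit test in a mocked environment can be sketched as follows. The helper function and the client's fetch_item method are hypothetical, and the example uses the standard-library unittest and unittest.mock modules purely for illustration. The test framework and mocks used in each Provena component are described in that component's README.

```python
import unittest
from unittest.mock import Mock


def fetch_display_name(registry_client, item_id: str) -> str:
    """Hypothetical helper: look up an item and return its display name."""
    item = registry_client.fetch_item(item_id)
    if item is None:
        raise KeyError(f"No registry item with id {item_id!r}")
    return item["display_name"]


class FetchDisplayNameTests(unittest.TestCase):
    def test_returns_display_name(self):
        # The mock replaces the real client, so no service is contacted.
        client = Mock()
        client.fetch_item.return_value = {"display_name": "Example model"}
        self.assertEqual(fetch_display_name(client, "id-1"), "Example model")
        client.fetch_item.assert_called_once_with("id-1")

    def test_missing_item_raises(self):
        client = Mock()
        client.fetch_item.return_value = None
        with self.assertRaises(KeyError):
            fetch_display_name(client, "missing")


if __name__ == "__main__":
    unittest.main()
```

Because the dependency is mocked, the test exercises only the logic of the function under test, which is the distinction drawn above between unit tests and integration tests.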

Integration Tests

The Provena repo contains a dedicated integration testing directory, separate from the unit tests defined for each component. Whereas the unit tests target lower-level functionality of the Provena source code, the integration tests target typical user stories. That is, integration tests are high-level tests which depend minimally on our source code and instead interface with several exposed APIs to test a particular user story. For more info and to run the tests, see the integration tests readme.

System Tests

These are some basic end-to-end system tests which test core functionality pathways to detect and prevent feature regression as new features are developed. These tests interact with the User Interfaces. For more info and to run the tests, see the system tests readme.

Configuring Provena

Provena features a detached configuration management approach. This means that configuration should be stored in a separate private repository. Provena provides a set of utilities which interact with this configuration repository, primarily the ./config bash script.

config - Configuration management tool for interacting with a private configuration repository

Usage:
  config NAMESPACE STAGE [OPTIONS]
  config --help | -h
  config --version | -v

Options:
  --target, -t REPO_CLONE_STRING
    The repository clone string

  --repo-dir, -d PATH
    Path to the pre-cloned repository

  --help, -h
    Show this help

  --version, -v
    Show version number

Arguments:
  NAMESPACE
    The namespace to use (e.g., 'rrap')

  STAGE
    The stage to use (e.g., 'dev', 'stage')

Environment Variables:
  DEBUG
    Set to 'true' for verbose output

The central idea of this configuration approach is that each namespace/stage combination contains a set of files which are 'merged' into the user's clone of the Provena repository. These files are gitignored by default in Provena, so the merge allows temporary access to private information without exposing it in git.
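
The merge idea can be illustrated with a short Python sketch. This is not the ./config script itself, which is a bash script. The sketch assumes, purely for illustration, that the configuration repository stores each combination under a NAMESPACE/STAGE directory whose contents mirror the layout of the Provena clone. The real repository layout may differ, so refer to the template config repo.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def merge_config(config_repo: Path, namespace: str, stage: str, target: Path) -> List[Path]:
    """Copy every file for one namespace/stage into the target clone."""
    # Assumed layout: <config_repo>/<namespace>/<stage>/<files mirroring the clone>
    source_root = config_repo / namespace / stage
    if not source_root.is_dir():
        raise FileNotFoundError(f"No configuration found for {namespace}/{stage}")
    copied: List[Path] = []
    for source in sorted(source_root.rglob("*")):
        if source.is_file():
            destination = target / source.relative_to(source_root)
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, destination)
            copied.append(destination)
    return copied


def demo() -> List[str]:
    """Run the merge against throwaway directories and list what was copied."""
    with tempfile.TemporaryDirectory() as tmp:
        config_repo = Path(tmp) / "config-repo"
        clone = Path(tmp) / "provena-clone"
        stage_dir = config_repo / "example-namespace" / "dev"
        (stage_dir / "deploy").mkdir(parents=True)
        clone.mkdir()
        # Made-up file names and contents standing in for private settings.
        (stage_dir / ".env").write_text("EXAMPLE_SETTING=value\n")
        (stage_dir / "deploy" / "settings.json").write_text("{}\n")
        copied = merge_config(config_repo, "example-namespace", "dev", clone)
        return [path.relative_to(clone).as_posix() for path in copied]


if __name__ == "__main__":
    for relative_path in demo():
        print(relative_path)
```

Because the merged paths are gitignored in the Provena clone, files copied in this way stay out of version control.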

Sample configuration files are provided in various spots to help get started, including the template config repo.

Config path caching

The script builds in functionality to cache the repo which makes available
