.. |icon| image:: ./media/icon.svg
   :width: 35

###################
|icon| Codescribe
###################
|Code style: black|
Overview
========
Codescribe is an AI-assisted framework designed to streamline Fortran-to-C++ code translation and to support the development and maintenance of scientific codebases. It automates the generation of corresponding C++ source files and the creation of Fortran-C++ interfaces, simplifying the integration of Fortran and C++. Users can interface with large language models (LLMs) through API endpoints or locally through the Transformers library, and can create custom prompts tailored to the specific needs of their source code. Codescribe empowers research software engineers by complementing existing tools like OpenAI Codex and addressing the niche requirements of scientific software development.
Resources
=========

- Papers:

  - https://arxiv.org/abs/2410.24119

- Tutorials:

  - https://anl.box.com/s/zv3zdbphqprdz8rjh1c84xpeqd8yg32u
  - https://github.com/akashdhruv/codescribe-tutorial.git

- Use cases:

  - https://erf.readthedocs.io/en/latest/CouplingToNoahMP.html
  - https://mcfm.fnal.gov

- Demo:

  - https://doi.org/10.5281/zenodo.18853292
Key Features
============

- Incremental Translation: Translate Fortran codebases into C++
  incrementally, creating Fortran-C++ layers for seamless
  interoperability.

  |fig1|

- Custom Prompts: Automatically generate prompts for generative AI to
  assist with the conversion process.

- Language Model Integration: Leverage LLMs through the Transformers API
  to refine the translation and improve accuracy.

  |fig2|

- Fortran-C++ Interfaces: Generate the necessary interface layers
  between Fortran and C++ for easy function and subroutine conversion.

- Code Generation and Update: Create new source files or modify existing
  ones from natural-language prompts.
Statement of Need
=================
In scientific computing, translating legacy Fortran codebases to C++ is necessary to leverage modern libraries and ensure performance portability across various heterogeneous high-performance computing (HPC) platforms. However, bulk translation of entire codebases often results in broken functionality and unmanageable complexity. Incremental translation, which involves creating Fortran-C++ layers, testing, and iteratively converting the code, is a more practical approach. Codescribe supports this process by automating the creation of these interfaces and assisting with generative AI to improve efficiency and accuracy, ensuring that performance and functionality are maintained throughout the conversion. Additionally, Codescribe facilitates code generation and updates, enabling users to create new applications or modify existing files seamlessly.
Installation
============

At present, we recommend installing Codescribe in a virtual environment:

.. code:: bash

   python3 -m venv env
   source env/bin/activate
   pip install --upgrade pip
Then install Codescribe using pip in editable mode:

.. code:: bash

   pip install -e .
Editable mode enables testing of features/updates directly from the source code and is an effective method for debugging.
Usage
=====

You can use the --help option with every command to get a better
understanding of its functionality.

.. code:: bash

   ▶ code-scribe --help
   Usage: code-scribe [OPTIONS] COMMAND [ARGS]...

     Software development tool for converting code from Fortran to C++

   Options:
     -v, --version
     --help         Show this message and exit.

   Commands:
     draft      Perform a draft conversion from Fortran to C++
     format     Format TOML seed prompt files
     generate   Perform AI-based code generation
     index      Index Fortran files along a project directory tree
     inspect    Perform AI code inspection on files
     translate  Perform AI-based code conversion of Fortran files
     update     Perform AI-based code update on files
Following is a brief overview of different commands:
#. code-scribe index <project_root_dir>: Parses the project directory
   tree and creates a scribe.yaml file at each node along the tree.
   These YAML files contain metadata about functions, modules, and
   subroutines in the source files. This information is used during the
   conversion process to guide LLMs in understanding the structure of
   the code.
   .. code:: yaml

      # Example contents of scribe.yaml
      directory: src
      files:
        module1.f90:
          modules:
            - module1
          subroutines:
            - subroutine1
            - subroutine2
          functions:
            - function1
        module2.f90:
          modules: []
          subroutines:
            - subroutineA
          functions:
            - functionB
#. code-scribe draft <filelist>: Takes a list of files and generates
   draft versions of the corresponding C++ files. The draft files are
   saved with a .scribe extension and include prompts tailored to each
   statement in the original source code.
#. code-scribe translate <filelist> -m <model_name_or_path> -p
   <seed_prompt.toml>: Performs neural translation using generative AI.
   You can either download a model locally from HuggingFace and provide
   its path to -m, or simply set -m openai-gpt-4o to use the OpenAI API
   for code translation. Note that -m openai-gpt-4o requires the
   environment variable OPENAI_API_KEY to be set. The <seed_prompt.toml>
   file is a chat template that guides the AI to perform code
   translation using the source and draft .scribe files.
   .. code:: toml

      # Example contents of seed_prompt.toml

      [[chat.user]]
      content = "<Rules and syntax-related instructions for code conversion>"

      [[chat.assistant]]
      content = "I am ready. Please give me a test problem."

      [[chat.user]]
      content = "<Template of contents in a source file>"

      [[chat.assistant]]
      content = "<Desired contents of the converted file. Syntactically correct code>"

      [[chat.user]]
      content = "<Append code from a source file>"
#. code-scribe translate <filelist> -p <seed_prompt.toml>
   --save-prompts: Generates file-specific JSON chat templates that one
   can copy/paste into chat interfaces like that of ChatGPT to generate
   the source code. The JSON files are created from the seed prompt file
   and appended with source and draft code.
#. code-scribe inspect <filelist> -q <query_prompt> --save-prompts:
   Creates a scribe.json file that you can copy/paste into chat
   interfaces.
#. code-scribe inspect <filelist> -q <query_prompt> -m
   <model_name_or_path>: Performs a query on a set of source files using
   a single prompt. This is useful for navigating and understanding the
   source code.
#. code-scribe generate <seed_prompt> -m <model_name_or_path>:
   Generates new source files or applications based on specifications in
   the prompt.
#. code-scribe generate "<natural_language_prompt>" -m
   <model_name_or_path> -r <reference_file1> -r <reference_file2>:
   Generates new source files or applications from a natural-language
   prompt, using reference files for additional context. This offers
   great flexibility in generating source code and specification files.
#. code-scribe update <filelist> -p <seed_prompt.toml> -m
   <model_name_or_path>: Modifies or extends existing source files using
   seed prompt files.
#. code-scribe update <filelist> -q "<natural_language_prompt>" -r
   <reference_file1> -r <reference_file2> -m <model_name_or_path>:
   Updates files using natural-language prompts and reference files,
   offering great flexibility in modifying existing code.
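To give a feel for the kind of metadata the index step collects, the sketch below scans one Fortran source string with regular expressions. This is a hypothetical illustration only, not Codescribe's actual parser; the index_fortran helper and its regexes are assumptions, and they cover common free-form declarations only.

.. code:: python

   import re

   # Hypothetical sketch of the metadata extraction that the index step
   # performs; Codescribe's actual parser may differ. These regexes skip
   # "end module"/"end function" lines but do not handle parameterized
   # return types such as real(dp) or "module procedure" statements.
   MODULE_RE = re.compile(r"^\s*module\s+(\w+)", re.IGNORECASE | re.MULTILINE)
   SUBROUTINE_RE = re.compile(r"^\s*subroutine\s+(\w+)", re.IGNORECASE | re.MULTILINE)
   FUNCTION_RE = re.compile(
       r"^\s*(?:(?!end\b)\w+\s+)?function\s+(\w+)", re.IGNORECASE | re.MULTILINE
   )

   def index_fortran(source: str) -> dict:
       """Collect module/subroutine/function names from one Fortran file."""
       return {
           "modules": MODULE_RE.findall(source),
           "subroutines": SUBROUTINE_RE.findall(source),
           "functions": FUNCTION_RE.findall(source),
       }

Running this over each file in a directory and dumping the result with a YAML library would produce metadata shaped like the scribe.yaml example above.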
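As an illustration of how a seed prompt like the TOML template above can become an LLM chat payload, the sketch below parses the [[chat.user]] and [[chat.assistant]] tables and interleaves them user-first, appending the code to translate to the final user turn. The build_messages helper and the user-first alternation are assumptions for illustration, not Codescribe's actual implementation.

.. code:: python

   import tomllib  # Python 3.11+; on older versions, install and import "tomli"

   def build_messages(seed_toml: str, source_code: str) -> list:
       """Interleave seed-prompt turns and append the code to translate.

       Assumes the [[chat.user]]/[[chat.assistant]] layout shown above,
       with turns alternating user-first and one more user turn than
       assistant turns. Illustrative only, not Codescribe's internals.
       """
       chat = tomllib.loads(seed_toml)["chat"]
       users, assistants = chat["user"], chat.get("assistant", [])
       messages = []
       for i, user in enumerate(users):
           messages.append({"role": "user", "content": user["content"]})
           if i < len(assistants):
               messages.append(
                   {"role": "assistant", "content": assistants[i]["content"]}
               )
       # The final user turn carries the source (and draft) code to convert.
       messages[-1]["content"] += "\n" + source_code
       return messages

A message list in this role/content shape is what chat-style LLM APIs generally accept.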
Integrating LLM of Choice
=========================
#. OpenAI Models: Codescribe supports OpenAI's GPT models (such as
   gpt-4, gpt-3.5-turbo, etc.) via the OpenAI API. To use OpenAI's
   models, specify -m openai-gpt-4o when executing the commands, as
   shown below:
   .. code:: bash

      ▶ code-scribe translate <filelist> -m openai-gpt-4o -p <seed_prompt.toml>
   Ensure that the environment variable OPENAI_API_KEY is set with your
   OpenAI API key. You can set it by running the following command in
   your terminal:

   .. code:: bash

      export OPENAI_API_KEY="your_openai_api_key_here"

   Also ensure that you have installed the OpenAI library:

   .. code:: bash

      pip install openai
#. Hugging Face Transformers (TFModel): If you want to use a Hugging
   Face model, such as those found on the Hugging Face model hub (e.g.,
   Mistral, Llama), you can specify the path to the pre-trained model or
   use a model directly from the Hugging Face library. Codescribe
   supports this integration with the TFModel class.

   To use a Hugging Face model, first install the necessary libraries if
   not already installed:

   .. code:: bash

      pip install transformers torch
   Then specify the path to the pre-trained model using the -m flag in
   the command. For example, to use a locally downloaded model:

   .. code:: bash

      ▶ code-scribe translate <filelist> -m <path_to_model> -p <seed_prompt.toml>

   You can download a model from the Hugging Face model hub by visiting
   https://huggingface.co/models and choosing one that fits your needs.
#. ARGO Models: Codescribe also supports integration with Argonne's
   ARGO models, such as argo-gpt4o. These models are accessible on the
   Argonne network by setting the environment variables ARGO_USER and
   ARGO_API_ENDPOINT. To use ARGO models, specify -m argo-gpt4o or any
   other ARGO-supported model of your choice when executing commands, as
   shown below:

   .. code:: bash

      ▶ code-scribe translate <filelist> -m argo-gpt4o -p <seed_prompt.toml>
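Under the hood, a -m value like the ones above has to be routed to the right backend. The helper below is a hypothetical sketch of such dispatch based only on the naming conventions described in this section (openai-*, argo-*, otherwise a local Transformers model path); it does not reflect Codescribe's actual internals.

.. code:: python

   def resolve_backend(model_arg: str) -> str:
       """Map a -m value to a backend name, mirroring the naming scheme above.

       Hypothetical helper for illustration; Codescribe's real dispatch
       logic may differ.
       """
       if model_arg.startswith("openai-"):
           return "openai"    # e.g. openai-gpt-4o -> OpenAI API
       if model_arg.startswith("argo-"):
           return "argo"      # e.g. argo-gpt4o -> Argonne's ARGO endpoint
       return "transformers"  # anything else is treated as a local model path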