OSACA

Open Source Architecture Code Analyzer

.. image:: docs/img/osaca-logo.png
   :alt: OSACA logo
   :width: 80%

For an innermost loop kernel in assembly, this tool provides automatic instruction fetching from the assembly code and automatic runtime prediction, including throughput analysis and detection of the critical path and loop-carried dependencies.

.. image:: https://github.com/RRZE-HPC/OSACA/workflows/test-n-publish/badge.svg?branch=master&event=push
   :target: https://github.com/RRZE-HPC/OSACA/actions
   :alt: Build Status

.. image:: https://codecov.io/github/RRZE-HPC/OSACA/coverage.svg?branch=master
   :target: https://codecov.io/github/RRZE-HPC/OSACA?branch=master
   :alt: Code Coverage

.. image:: https://readthedocs.org/projects/osaca/badge/?version=latest
   :target: https://osaca.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://img.shields.io/badge/read-the_docs-blue
   :target: https://osaca.readthedocs.io/
   :alt: Docs

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/ambv/black
   :alt: Code Style

Getting started

OSACA is a Python module with a command line interface.

OSACA is also integrated into the Compiler Explorer at `godbolt.org <https://godbolt.org>`_, which allows using OSACA from a browser without any installation. To analyze an assembly snippet, go to https://godbolt.org, change the language to "Analysis", insert AArch64 or x86 assembly code, and make sure OSACA is selected in the corresponding analysis panel, e.g., https://godbolt.org/z/shK4f8. When analyzing high-level language code, use the "Add tool..." menu in the compiler output panel to add the OSACA analysis, e.g., https://godbolt.org/z/hbMoPn. To change the microarchitecture model, add ``--arch`` and the µarch short name (e.g., ``SKX`` for Skylake, ``ZEN2``, ``N1`` for ARM Neoverse) to the "Compiler options..." (when using "Analysis" mode) or "Arguments" (when analyzing the compiler output of high-level code).

Installation

On most systems with Python pip and setuptools installed, just run:

.. code:: bash

    pip install --user osaca

for the latest release.

To build OSACA from source, clone this repository using ``git clone https://github.com/RRZE-HPC/OSACA`` and run in the root directory:

.. code:: bash

    python ./setup.py install

After installation, OSACA can be started with the command ``osaca`` in the CLI.

Dependencies:

Necessary requirements are:

  • `Python3 <https://www.python.org/>`_

  • `Graphviz <https://www.graphviz.org/>`_ for dependency graph creation (minimal dependency is ``libgraphviz-dev`` on Ubuntu)

  • Python packages:

    • `networkx <https://networkx.org/>`_
    • `pyparsing <https://github.com/pyparsing/pyparsing>`_
    • `ruamel.yaml <https://pypi.org/project/ruamel.yaml/>`_

Optional requirements are:

  • `Kerncraft <https://github.com/RRZE-HPC/kerncraft>`__ >=v0.8.4 for marker insertion
  • `ibench <https://github.com/RRZE-HPC/ibench>`__ or `asmbench <https://github.com/RRZE-HPC/asmbench/>`__ for throughput/latency measurements
  • `BeautifulSoup4 <https://www.crummy.com/software/BeautifulSoup/bs4/doc/>`__ for scraping instruction form information for the x86 ISA (experimental)

Design

A schematic design of OSACA's workflow is shown below:

.. image:: docs/img/osaca-workflow.png
   :alt: OSACA workflow
   :width: 80%

Usage

OSACA's command line usage is:

.. code:: bash

    osaca [-h] [-V] [--arch ARCH] [--fixed] [--lines LINES]
          [--ignore-unknown] [--lcd-timeout SECONDS]
          [--db-check] [--import MICROBENCH] [--insert-marker]
          [--export-graph GRAPHNAME] [--consider-flag-deps]
          [--out OUT] [--yaml-out YAML_OUT] [--verbose]
          FILEPATH

-h, --help
    Prints the help message.
-V, --version
    Shows the program's version number.
--arch ARCH
    Needs to be replaced with the target architecture abbreviation.
    See `the table of supported microarchitectures below <https://github.com/RRZE-HPC/OSACA?tab=readme-ov-file#supported-microarchitectures>`__ for all possible options.
    If no microarchitecture is given, OSACA assumes a default architecture for x86/AArch64.
--syntax SYNTAX
    Defines the assembly syntax (ATT, Intel) for x86.
    If no syntax is given, OSACA tries to determine the syntax automatically.
--fixed
    Runs the throughput analysis with fixed port utilization for all suitable ports per instruction.
    Otherwise, OSACA prints the optimal port utilization for the kernel.
--lines LINES
    Defines the lines that should be included in the analysis. This option overrides any range defined by markers in the assembly.
    Add either single lines or ranges defined by "-" or ":", with entries separated by commas, e.g., ``--lines 1,2,8-18,20:24``.
--db-check
    Runs a sanity check on the database specified by ``--arch``. The output depends on the verbosity level.
    Keep in mind that you have to provide an existing (dummy) filename in any case.
--import MICROBENCH
    Imports a given microbenchmark output file into the corresponding architecture instruction database.
    Define the type of microbenchmark as either "ibench" or "asmbench".
--insert-marker
    OSACA calls the Kerncraft module for the interactive insertion of `IACA <https://software.intel.com/en-us/articles/intel-architecture-code-analyzer>`__ byte markers or OSACA AArch64 byte markers in suggested assembly blocks.
--export-graph EXPORT_PATH
    Output path for the .dot file export. If "." is given, the file will be stored as "./osaca_dg.dot".
    After the file has been created, you can convert it to a PDF file using `dot <https://graphviz.gitlab.io/_pages/pdf/dotguide.pdf>`__.
--ignore-unknown
    Forces OSACA to apply a throughput and latency of 0.0 cy for all unknown instruction forms.
    If not specified, a warning is printed instead if one or more instruction forms are unknown to OSACA.
--lcd-timeout SECONDS
    Sets the timeout in seconds for the LCD analysis. After the timeout, OSACA continues its analysis with the dependency paths found up to this point.
    Defaults to 10.
-f, --consider-flag-deps
    Considers flag dependencies for the critical path and loop-carried dependency analysis.
    By default, those dependencies are ignored.
-v, --verbose
    Increases the verbosity level.
-o OUT, --out OUT
    Writes the analysis to this file (defaults to stdout).
--yaml-out YAML_OUT
    Writes the analysis as a YAML representation to this file.

``FILEPATH`` is the path to the file to work with and is always required; use "-" to read from stdin.
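The ``--lines`` syntax described above can be illustrated with a small Python sketch. The helper ``expand_line_spec`` is hypothetical and not part of OSACA's API. It assumes that both range forms ("-" and ":") are inclusive on both ends, which the option description does not state explicitly.

```python
def expand_line_spec(spec):
    """Expand a --lines style string such as "1,2,8-18,20:24" into line numbers.

    Hypothetical helper for illustration only; assumes ranges are inclusive.
    """
    lines = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        # Treat ":" and "-" as equivalent range separators.
        normalized = entry.replace(":", "-")
        if "-" in normalized:
            start, end = (int(part) for part in normalized.split("-", 1))
            if end < start:
                raise ValueError(f"descending range: {entry!r}")
            lines.extend(range(start, end + 1))
        else:
            lines.append(int(normalized))
    return lines


if __name__ == "__main__":
    print(expand_line_spec("1,2,8-18,20:24"))
```

Under the inclusive-range assumption, the example string from the option description selects lines 1 and 2, lines 8 through 18, and lines 20 through 24.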

Supported microarchitectures

x86 CPUs

+----------+-----------------+------------+
| Designer | Model/microarch | OSACA flag |
+==========+=================+============+
| Intel    | Sandy Bridge    | SNB        |
+----------+-----------------+------------+
| Intel    | Ivy Bridge      | IVB        |
+----------+-----------------+------------+
| Intel    | Haswell         | HSW        |
+----------+-----------------+------------+
| Intel    | Broadwell       | BDW        |
+----------+-----------------+------------+
| Intel    | Skylake-X       | SKX        |
+----------+-----------------+------------+
| Intel    | Cascadelake-X   | CSX        |
+----------+-----------------+------------+
| Intel    | Icelake client  | ICL        |
+----------+-----------------+------------+
| Intel    | Icelake server  | ICX        |
+----------+-----------------+------------+
| Intel    | Sapphire Rapids | SPR        |
+----------+-----------------+------------+
| AMD      | Naples / Zen 1  | ZEN1       |
+----------+-----------------+------------+
| AMD      | Rome / Zen 2    | ZEN2       |
+----------+-----------------+------------+
| AMD      | Milan / Zen 3   | ZEN3       |
+----------+-----------------+------------+
| AMD      | Genoa / Zen 4   | ZEN4       |
+----------+-----------------+------------+
| AMD      | Turin / Zen 5   | ZEN5       |
+----------+-----------------+------------+

ARM AArch64 CPUs

+-----------+-------------------+-------------+
| Designer  | Model/microarch   | OSACA flag  |
+===========+===================+=============+
| ARM       | Cortex-A72        | A72         |
+-----------+-------------------+-------------+
| ARM       | Neoverse N1       | N1          |
+-----------+-------------------+-------------+
| ARM       | Neoverse V2       | V2          |
+-----------+-------------------+-------------+
| Marvell   | ThunderX2         | TX2         |
+-----------+-------------------+-------------+
| Fujitsu   | FX700/A64FX       | A64FX       |
+-----------+-------------------+-------------+
| HiSilicon | TaiShan v110      | TSV110      |
+-----------+-------------------+-------------+
| Apple     | M1-Firestorm      | M1          |
+-----------+-------------------+-------------+
| NVIDIA    | Neoverse V2/Grace | V2          |
+-----------+-------------------+-------------+


The following sections describe OSACA's functionality.

Throughput & Latency analysis

As its main functionality, OSACA analyzes a marked assembly file. Start the analysis by running the following command with any of the optional parameters:

.. code-block:: bash

    osaca --arch ARCH [--fixed] [--ignore-unknown]
                      [--export-graph EXPORT_PATH]
          file

The ``file`` parameter specifies the target assembly file and is always mandatory.

The parameter ``ARCH`` selects the microarchitecture model for the analysis and must be replaced by the target architecture abbreviation.
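As a sketch of how the invocation above might be scripted, the following Python snippet assembles the argument list and runs it only if ``osaca`` is found on the PATH. The function name ``build_osaca_command`` and the file name ``kernel.s`` are hypothetical placeholders. Only the flags ``--arch``, ``--fixed``, ``--ignore-unknown``, and ``--export-graph`` are taken from the synopsis above.

```python
import shutil
import subprocess


def build_osaca_command(filepath, arch=None, fixed=False,
                        ignore_unknown=False, export_graph=None):
    """Assemble an argv list for the osaca CLI (hypothetical helper)."""
    cmd = ["osaca"]
    if arch is not None:
        cmd += ["--arch", arch]
    if fixed:
        cmd.append("--fixed")
    if ignore_unknown:
        cmd.append("--ignore-unknown")
    if export_graph is not None:
        cmd += ["--export-graph", export_graph]
    # The assembly file always comes last.
    cmd.append(filepath)
    return cmd


if __name__ == "__main__":
    # "kernel.s" is a placeholder for a marked assembly file.
    cmd = build_osaca_command("kernel.s", arch="SKX", export_graph=".")
    print(" ".join(cmd))
    # Only run the analysis if osaca is installed and on the PATH.
    if shutil.which("osaca") is not None:
        subprocess.run(cmd, check=False)
```

Passing the argument list directly to ``subprocess.run`` avoids shell quoting issues with file paths. With ``--export-graph .``, the resulting ``./osaca_dg.dot`` file could then be converted using Graphviz, for example with ``dot -Tpdf osaca_dg.dot -o osaca_dg.pdf``.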

OSACA assumes an optimal scheduling for all instructions and assumes the processor to be able to schedule instructions in a way that it achieves a minimal reciprocal throughput. However, in older versions (<=v0.2.2) of OSACA, a fixed probability for port utilization was assumed. This means, instructions with N available ports for execution were scheduled with a probabil
