OSACA
Open Source Architecture Code Analyzer
.. image:: docs/img/osaca-logo.png
   :alt: OSACA logo
   :width: 80%
For an innermost loop kernel in assembly, this tool automatically extracts the instructions from the assembly code and predicts the runtime, including throughput analysis and detection of the critical path and loop-carried dependencies.
.. image:: https://github.com/RRZE-HPC/OSACA/workflows/test-n-publish/badge.svg?branch=master&event=push
   :target: https://github.com/RRZE-HPC/OSACA/actions
   :alt: Build Status

.. image:: https://codecov.io/github/RRZE-HPC/OSACA/coverage.svg?branch=master
   :target: https://codecov.io/github/RRZE-HPC/OSACA?branch=master
   :alt: Code Coverage

.. image:: https://readthedocs.org/projects/osaca/badge/?version=latest
   :target: https://osaca.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://img.shields.io/badge/read-the_docs-blue
   :target: https://osaca.readthedocs.io/
   :alt: Docs

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/ambv/black
   :alt: Code Style
Getting started
OSACA is a Python module with a command-line interface.
OSACA is also integrated into the Compiler Explorer at `godbolt.org <https://godbolt.org>`_, which allows using OSACA from a browser without any installation. To analyze an assembly snippet, go to https://godbolt.org, change the language to "Analysis", insert AArch64 or x86 assembly code, and make sure OSACA is selected in the corresponding analysis panel, e.g., https://godbolt.org/z/shK4f8. When analyzing high-level language code, use the "Add tool..." menu in the compiler output panel to add the OSACA analysis, e.g., https://godbolt.org/z/hbMoPn. To change the microarchitecture model, add ``--arch`` and the µarch short name (e.g., SKX for Skylake, ZEN2, N1 for ARM Neoverse) to the "Compiler options..." field (when using "Analysis" mode) or to "Arguments" (when analyzing the compiler output of high-level code).
Installation
On most systems with Python, pip and setuptools installed, just run:
.. code:: bash

    pip install --user osaca

for the latest release.
To build OSACA from source, clone this repository using ``git clone https://github.com/RRZE-HPC/OSACA`` and run the following in the root directory:
.. code:: bash

    python ./setup.py install

After installation, OSACA can be started with the command ``osaca`` on the command line.
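To confirm from a script that the installation worked, one option is to look for the ``osaca`` executable and ask it for its version. The following is only a minimal sketch: the helper name ``osaca_version`` is hypothetical, and it relies solely on the ``--version`` flag documented in the usage section.

```python
import shutil
import subprocess


def osaca_version():
    """Return the text printed by ``osaca --version``, or None if osaca is not on PATH.

    Hypothetical helper for illustration; it is not part of OSACA itself.
    """
    exe = shutil.which("osaca")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # Fall back to stderr in case the version string is not written to stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = osaca_version()
    print(version if version else "osaca not found on PATH")
```

If the function returns ``None``, the directory that ``pip install --user`` installs scripts into is probably not on ``PATH``.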
Dependencies:
Necessary requirements are:

- `Python3 <https://www.python.org/>`_
- `Graphviz <https://www.graphviz.org/>`_ for dependency graph creation (minimal dependency is ``libgraphviz-dev`` on Ubuntu)
- Python packages:

  - `networkx <https://networkx.org/>`_
  - `pyparsing <https://github.com/pyparsing/pyparsing>`_
  - `ruamel.yaml <https://pypi.org/project/ruamel.yaml/>`_

Optional requirements are:

- `Kerncraft <https://github.com/RRZE-HPC/kerncraft>`__ >=v0.8.4 for marker insertion
- `ibench <https://github.com/RRZE-HPC/ibench>`__ or `asmbench <https://github.com/RRZE-HPC/asmbench/>`__ for throughput/latency measurements
- `BeautifulSoup4 <https://www.crummy.com/software/BeautifulSoup/bs4/doc/>`__ for scraping instruction form information for the x86 ISA (experimental)

Design
A schematic design of OSACA's workflow is shown below:
.. image:: docs/img/osaca-workflow.png
   :alt: OSACA workflow
   :width: 80%
Usage
The command-line usage of OSACA is:
.. code:: bash

    osaca [-h] [-V] [--arch ARCH] [--fixed] [--lines LINES]
          [--ignore-unknown] [--lcd-timeout SECONDS]
          [--db-check] [--import MICROBENCH] [--insert-marker]
          [--export-graph GRAPHNAME] [--consider-flag-deps]
          [--out OUT] [--yaml-out YAML_OUT] [--verbose]
          FILEPATH

-h, --help
prints out the help message.
-V, --version
shows the program’s version number.
--arch ARCH
needs to be replaced with the target architecture abbreviation.
See the `table of supported microarchitectures below <https://github.com/RRZE-HPC/OSACA?tab=readme-ov-file#supported-microarchitectures>`__ for all possible options. If no microarchitecture is given, OSACA assumes a default architecture for x86/AArch64.
--syntax SYNTAX
Define the assembly syntax (ATT, Intel) for x86. If no syntax is given, OSACA tries to determine the syntax automatically.
--fixed
Run the throughput analysis with fixed port utilization for all suitable ports per instruction.
Otherwise, OSACA will print out the optimal port utilization for the kernel.
--lines
Define the lines that should be included in the analysis. This option overrides any range defined by markers in the assembly. Add either single lines or ranges defined by "-" or ":", with entries separated by commas, e.g.: --lines 1,2,8-18,20:24
--db-check
Run a sanity check on the database specified by "--arch".
The output depends on the verbosity level.
Keep in mind that you have to provide an existing (dummy) filename in any case.
--import MICROBENCH
Import a given microbenchmark output file into the corresponding architecture instruction database.
Define the type of microbenchmark either as "ibench" or "asmbench".
--insert-marker
OSACA calls the Kerncraft module for the interactive insertion of `IACA <https://software.intel.com/en-us/articles/intel-architecture-code-analyzer>`__ byte markers or OSACA AArch64 byte markers in suggested assembly blocks.
--export-graph EXPORT_PATH
Output path for .dot file export. If "." is given, the file will be stored as "./osaca_dg.dot".
After the file has been created, you can convert it to a PDF file using `dot <https://graphviz.gitlab.io/_pages/pdf/dotguide.pdf>`__.
--ignore-unknown
Force OSACA to apply a throughput and latency of 0.0 cy for all unknown instruction forms.
If not specified, a warning is printed instead if one or more instruction forms are unknown to OSACA.
--lcd-timeout SECONDS
Set timeout in seconds for LCD analysis. After timeout, OSACA will continue its analysis with the dependency paths found up to this point.
Defaults to 10.
-f, --consider-flag-deps
Consider flag dependencies for the critical path and loop-carried dependency analysis. By default, those dependencies are ignored.
-v, --verbose
Increases verbosity level
-o OUT, --out OUT
Write analysis to this file (defaults to stdout)
--yaml-out YAML_OUT
Write analysis as YAML representation to this file
FILEPATH is the path to the file to analyze and is always required. Use "-" to read from stdin.
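The ``--lines`` syntax described above (single lines and ranges written with "-" or ":", separated by commas) can be illustrated with a small parser. This is a sketch of the notation only, not OSACA's own implementation: the function name ``parse_lines_spec`` is hypothetical, and it assumes that ranges are inclusive on both ends.

```python
def parse_lines_spec(spec):
    """Expand a spec such as "1,2,8-18,20:24" into a sorted list of line numbers.

    Assumption (not confirmed by the OSACA documentation): both range ends
    are included in the selection.
    """
    selected = set()
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # A range uses either "-" or ":" as the separator.
        for sep in ("-", ":"):
            if sep in entry:
                start, end = (int(part) for part in entry.split(sep, 1))
                if start > end:
                    raise ValueError(f"invalid range: {entry!r}")
                selected.update(range(start, end + 1))
                break
        else:
            # No separator found: the entry is a single line number.
            selected.add(int(entry))
    return sorted(selected)


if __name__ == "__main__":
    print(parse_lines_spec("1,2,8-10,20:22"))
```

Under the inclusive-range assumption, the example call selects lines 1, 2, 8 to 10 and 20 to 22.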
Supported microarchitectures
x86 CPUs
+----------+-----------------+------------+
| Designer | Model/microarch | OSACA flag |
+==========+=================+============+
| Intel | Sandy Bridge | SNB |
+----------+-----------------+------------+
| Intel | Ivy Bridge | IVB |
+----------+-----------------+------------+
| Intel | Haswell | HSW |
+----------+-----------------+------------+
| Intel | Broadwell | BDW |
+----------+-----------------+------------+
| Intel | Skylake-X | SKX |
+----------+-----------------+------------+
| Intel | Cascadelake-X | CSX |
+----------+-----------------+------------+
| Intel | Icelake client | ICL |
+----------+-----------------+------------+
| Intel | Icelake server | ICX |
+----------+-----------------+------------+
| Intel | Sapphire Rapids | SPR |
+----------+-----------------+------------+
| AMD | Naples / Zen 1 | ZEN1 |
+----------+-----------------+------------+
| AMD | Rome / Zen 2 | ZEN2 |
+----------+-----------------+------------+
| AMD | Milan / Zen 3 | ZEN3 |
+----------+-----------------+------------+
| AMD | Genoa / Zen 4 | ZEN4 |
+----------+-----------------+------------+
| AMD | Turin / Zen 5 | ZEN5 |
+----------+-----------------+------------+
ARM AArch64 CPUs
+-----------+-------------------+-------------+
| Designer | Model/microarch | OSACA flag |
+===========+===================+=============+
| ARM | Cortex-A72 | A72 |
+-----------+-------------------+-------------+
| ARM | Neoverse N1 | N1 |
+-----------+-------------------+-------------+
| ARM | Neoverse V2 | V2 |
+-----------+-------------------+-------------+
| Marvell | ThunderX2 | TX2 |
+-----------+-------------------+-------------+
| Fujitsu | FX700/A64FX | A64FX |
+-----------+-------------------+-------------+
| HiSilicon | TaiShan v110 | TSV110 |
+-----------+-------------------+-------------+
| Apple | M1-Firestorm | M1 |
+-----------+-------------------+-------------+
| NVIDIA | Neoverse V2/Grace | V2 |
+-----------+-------------------+-------------+
The following sections describe OSACA's functionality.
Throughput & Latency analysis
The main functionality of OSACA is the throughput and latency analysis of a marked assembly file. It is started by running the following command with one or more of the optional parameters:
.. code-block:: bash

    osaca --arch ARCH [--fixed] [--ignore-unknown]
                      [--export-graph EXPORT_PATH]
                      file

The file parameter specifies the target assembly file and is always mandatory.
The placeholder ARCH must be replaced by the target architecture abbreviation.
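When the analysis is driven from a script, the command above can be assembled programmatically. The sketch below uses only the flags listed in this README. The helper name ``build_osaca_command`` and the file name ``kernel.s`` are hypothetical placeholders.

```python
import shlex


def build_osaca_command(filepath, arch=None, fixed=False,
                        ignore_unknown=False, export_graph=None):
    """Assemble an argument list for the throughput and latency analysis.

    Hypothetical helper; it only concatenates flags documented in this README.
    """
    cmd = ["osaca"]
    if arch:
        cmd += ["--arch", arch]
    if fixed:
        cmd.append("--fixed")
    if ignore_unknown:
        cmd.append("--ignore-unknown")
    if export_graph:
        cmd += ["--export-graph", export_graph]
    # The assembly file always comes last and is mandatory.
    cmd.append(filepath)
    return cmd


if __name__ == "__main__":
    # "kernel.s" is a placeholder for a marked assembly file.
    print(shlex.join(build_osaca_command("kernel.s", arch="SKX", export_graph=".")))
```

The returned list can be passed directly to ``subprocess.run`` on a system where ``osaca`` is installed.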
OSACA assumes an optimal scheduling for all instructions and assumes the processor to be able to schedule instructions in a way that it achieves a minimal reciprocal throughput. However, in older versions (<=v0.2.2) of OSACA, a fixed probability for port utilization was assumed. This means, instructions with N available ports for execution were scheduled with a probabil
