ttsim, a fast full-system simulator of Tenstorrent hardware

ttsim provides a virtual Wormhole or Blackhole device that can run on any Linux/x86_64 system (including Windows via WSL2), without Tenstorrent silicon required. It is slower than silicon but still fast enough that you can run interesting workloads with good productivity, allowing you to explore and experiment with Tenstorrent's hardware and programming model before purchasing silicon.

Each simulator consists of a single libttsim.so file compiled for a specific chip architecture (Wormhole or Blackhole). This library exports a simple API that TT-Metalium knows how to communicate with.

Distribution

We currently provide binary releases for Linux/x86_64 only, with plans to release source code in the future under the Apache License. Visit the latest release page to download the latest version.

Chip Status

Wormhole/Blackhole: Nearing feature complete, with a small number of remaining features and bugs under active debug. Can run many tt-metal, ttnn, and tt-forge examples/tests in slow dispatch mode. Can also run numerous LLK tests.

Getting Started

Prerequisites

TT-Metalium installed and built
Set TT_METAL_HOME environment variable (e.g., export TT_METAL_HOME=~/tt-metal)

Installation

Download the simulator binary for your target chip from the releases page. Replace vX.Y with the desired version number.

mkdir -p ~/sim
cd ~/sim

# Download simulators
wget https://github.com/tenstorrent/ttsim/releases/download/vX.Y/libttsim_wh.so
wget https://github.com/tenstorrent/ttsim/releases/download/vX.Y/libttsim_bh.so

Running with TT-Metalium

Metal has simulator support out of the box, enabled by setting the TT_METAL_SIMULATOR environment variable to point to the simulator .so file. The libttsim.so must live in a directory also containing an SOC descriptor YAML file.

Wormhole

export TT_METAL_SIMULATOR=~/sim/libttsim_wh.so
cp $TT_METAL_HOME/tt_metal/soc_descriptors/wormhole_b0_80_arch.yaml ~/sim/soc_descriptor.yaml

cd $TT_METAL_HOME
TT_METAL_SLOW_DISPATCH_MODE=1 ./build/programming_examples/metal_example_add_2_integers_in_riscv

Blackhole

export TT_METAL_SIMULATOR=~/sim/libttsim_bh.so
cp $TT_METAL_HOME/tt_metal/soc_descriptors/blackhole_140_arch.yaml ~/sim/soc_descriptor.yaml

cd $TT_METAL_HOME
TT_METAL_SLOW_DISPATCH_MODE=1 ./build/programming_examples/metal_example_add_2_integers_in_riscv

Known Issues

Fast dispatch is not working, though work is in progress to support it. You must set TT_METAL_SLOW_DISPATCH_MODE=1.

SFPLOADMACRO is not supported in the SFPU. Set TT_METAL_DISABLE_SFPLOADMACRO=1 to disable its usage.

There are a variety of other unimplemented features in the simulator at present. We are working to fill in the gaps, but this will take time. Error messages will include one of the following categories:

UndefinedBehavior, UnpredictableValueUsed, NonContractualBehavior: See tt-isa-documentation glossary
UntestedFunctionality: Feature is implemented but lacks sufficient test coverage to be enabled
UnimplementedFunctionality: Feature not yet implemented but planned for future support
UnsupportedFunctionality: Feature unlikely to be implemented without strong justification
MissingSpecification: Feature requires additional internal specification work before implementation can proceed
SystemError/ConfigurationError: OS errors or issues with command line options, environment variables, or configuration files
AssertionFailure: Internal simulator bug

Numerical Accuracy

ttsim is designed to provide bit-exact numerical results relative to silicon for all computations, floating point and otherwise. The goal is to match all hardware computations bit-for-bit across all instructions, opcodes, functional units, and special cases, including the precise bit representation of NaNs produced by operations. While bugs are inevitable and some code paths are not yet bit-exact, this fidelity is the intended target.

While most code will achieve bit-exact results, cases that can produce divergent results include:

Computations with timing-dependent variation in operand order.
Reads from hardware entropy sources or random number generators.
Reads from performance counters, cycle counters, or timers.
Missing synchronization, cache flushes, or memory fences.
Execution of UndefinedBehavior or UnpredictableValue cases.
Any other violations of ISA specification requirements.

For timing-dependent computations, ttsim may evaluate operations in any order permitted by software synchronization. This may include operation orders that are extremely unlikely on silicon. To ensure an exact match with silicon, avoid algorithms where the runtime order of operations affects the result. For example, floating-point reductions using addition will diverge unless each addition is explicitly serialized in a deterministic order.

Current implementation status within the simulated Tensix:

Unpacker, SFPU, and packer: believed to be fully bit-accurate.
FPU MOV*, ELW*, and GMPOOL opcodes: believed to be fully bit-accurate.
FPU MVMUL and GAPOOL opcodes: not yet bit-accurate, but planned to be fixed to the extent possible.

Contributing

We welcome bug reports and feature requests! Please see CONTRIBUTING.md for guidelines.

Note: We do not accept pull requests. All development happens in an internal repository, and this public repository contains filtered binary releases. Please file issues for bugs or suggestions.

Support and Issues

If you encounter problems:

Check the Known Issues section above
Search existing issues to see if it's already reported
Open a new issue with details about your problem

For security vulnerabilities, please follow our Security Policy.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details. Additional information is available in LICENSE_understanding.txt.

Ttsim

Install / Use

README