# Pluto: An automatic polyhedral parallelizer and locality optimizer
## Overview
<img src="poly_hyperplane.png" width="50%"/><br/>
PLUTO is an automatic parallelization tool based on the polyhedral model. The polyhedral model for compiler optimization provides an abstraction to perform high-level transformations such as loop-nest optimization and parallelization on affine loop nests. Pluto transforms C programs from source to source for coarse-grained parallelism and data locality simultaneously. The core transformation framework mainly works by finding affine transformations for efficient tiling. The scheduling algorithm used by Pluto has been published in [1].

OpenMP parallel code for multicores can be automatically generated from sequential C program sections. Outer (communication-free), inner, or pipelined parallelization is achieved purely with OpenMP `parallel for` pragmas; the code is also optimized for locality and made amenable to auto-vectorization. An experimental evaluation and comparison with previous techniques can be found in [2].

Though the tool is fully automatic (C to OpenMP C), a number of options are provided (both command-line and through meta files) to tune aspects like tile sizes, unroll factors, and the outer loop fusion structure. CLooG is used for code generation.
This is the chain of the entire source-to-source system that polycc runs:

```
C code → Polyhedral extraction (Clan or PET)
       → Dependence analysis (Candl or ISL)
       → Pluto transformer (core Pluto algorithm + post-transformation)
       → CLooG code generation (CLooG + CLAST processing to mark loops parallel, ivdep)
       → C code with OpenMP and ivdep pragmas
```
[1] Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model. Uday Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. International Conference on Compiler Construction (ETAPS CC), Apr 2008, Budapest, Hungary.

[2] A Practical Automatic Polyhedral Parallelizer and Locality Optimizer. Uday Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Jun 2008, Tucson, Arizona.
This package includes both the tool `pluto` and `libpluto`. The `pluto` tool is a source-to-source transformer meant to be run via the `polycc` script; `libpluto` provides a thread-safe library interface.
## License
Pluto and libpluto are available under the MIT LICENSE. Please see the file
LICENSE in the top-level directory for more details.
## Installing Pluto

### Prerequisites
- A Linux distribution. Pluto has been tested on x86 and x86-64 machines running Fedora, Ubuntu, and CentOS.
- In order to use the development version from Pluto's git repository, the autotools build system (autoconf, automake, and libtool) is needed.
- LLVM/Clang (15.x recommended; 11.x, 12.x, and 14.x are tested to work as well), along with its development/header files, is needed for the pet submodule. These packages are available in standard distribution repositories, or LLVM and Clang can be built from source. See `pet/README` for additional details. On most modern distributions, they can be installed from the repositories. Example:

  ```shell
  # On an Ubuntu.
  sudo apt install -y llvm-14-dev libclang-14-dev
  # On a Fedora.
  sudo dnf -y install llvm15-devel clang15-devel
  ```

- LLVM `FileCheck` is used for Pluto's test suite. (On a Fedora, this is part of the `llvm` package.)
- GMP (the GNU multi-precision arithmetic library) is needed by ISL (one of the included libraries). If it is not already on your system, it can be installed easily with, e.g., `sudo yum -y install gmp gmp-devel` on a Fedora (`sudo apt-get install libgmp3-dev` or something similar on an Ubuntu).
Pluto includes all polyhedral libraries on which it depends. See pet/README for
pet's pre-requisites.
### Building Pluto

#### Stable release
Download the latest stable release from GitHub releases.
```shell
tar zxvf pluto-<version>.tar.gz
cd pluto-<version>/
./configure [--with-clang-prefix=<clang install location>]
make -j 32
make check-pluto
```
`configure` can be provided `--with-isl-prefix=<isl install location>` to build with another isl version; otherwise, the bundled isl is used.
#### Development version from Git
```shell
git clone git@github.com:bondhugula/pluto.git
cd pluto/
git submodule init
git submodule update
./autogen.sh
./configure [--enable-debug] [--with-clang-prefix=<clang headers/libs location>]
# Example: on an Ubuntu: --with-clang-prefix=/usr/lib/llvm-14; on a Fedora,
# typically, it's /usr/lib64/llvm14.
make -j 32
make check-pluto
```
- Use `--with-clang-prefix=<location>` to point to the specific clang to build with.
- Use `--with-isl-prefix=<isl install location>` to compile and link with an already installed isl. By default, the version of isl bundled with Pluto will be used.
`polycc` is the wrapper script around `src/pluto` (the core transformer) and all other components. `polycc` runs all of these in sequence on an input C program (with the section to parallelize/optimize marked) and is what a user should run on the input. The output generated is OpenMP parallel C code that can be readily compiled and run on shared-memory parallel machines like general-purpose multicores.
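To give a flavor of that output, generated code consists of tiled loop nests with the parallel loops marked with OpenMP pragmas. The snippet below is a hand-written illustration of that shape only; the array, sizes, and tile size are made up, and actual Pluto output is generated by CLooG and is more involved.

```c
#define N 64
#define T 32 /* tile size */

double A[N][N];

/* Illustration only: a tiled loop nest with the outermost (tile) loop
 * marked parallel for OpenMP. Without -fopenmp, the pragma is simply
 * ignored and the code runs sequentially. */
void tiled_init(void) {
#pragma omp parallel for
  for (int it = 0; it < N; it += T)
    for (int jt = 0; jt < N; jt += T)
      for (int i = it; i < it + T; i++)
        for (int j = jt; j < jt + T; j++)
          A[i][j] = i + j;
}
```

The tile loops (`it`, `jt`) iterate over blocks, and the intra-tile loops (`i`, `j`) sweep each block for cache locality.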
`libpluto.{so,a}` is also built and can be found in `src/.libs/`. `make install` will install it.
## Using Pluto
- Use `#pragma scop` and `#pragma endscop` around the section of code you want to parallelize/optimize.
- Then, run:

  ```shell
  ./polycc <C source file> [--pet]
  ```

  The output file will be named `<original prefix>.pluto.c` unless `-o <filename>` is supplied. When `--debug` is used, the `.cloog` file used to generate code is not deleted and is named similarly. The pet frontend (`--pet`) is needed to process many of the test cases/examples.
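For instance, a marked region might look like the following (a hand-written, hypothetical sketch; the function and array names are made up):

```c
#define N 1024

double a[N][N];

/* Hypothetical example: a Jacobi-like stencil sweep. The pragmas delimit
 * the region Pluto will extract and transform; to other compilers they
 * are unknown pragmas and are ignored. */
void smooth(int steps) {
#pragma scop
  for (int t = 0; t < steps; t++)
    for (int i = 1; i < N - 1; i++)
      for (int j = 1; j < N - 1; j++)
        a[i][j] = (a[i - 1][j] + a[i][j - 1] + a[i][j] +
                   a[i][j + 1] + a[i + 1][j]) / 5.0;
#pragma endscop
}
```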
Please refer to the documentation of Clan or PET for information on the kind of code around which one can put `#pragma scop` and `#pragma endscop`. Even if your program does not strictly satisfy the constraints, it is often possible to work around them.
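As a rough guide (a sketch of the general requirement, not Clan's or PET's exact rules), the marked region should be a static control part: loop bounds and array subscripts must be affine functions of the surrounding loop iterators and program parameters.

```c
#define N 512

int a[N], x[N], idx[N];

/* Affine: bounds and subscripts are affine in i and j --
 * suitable for a scop. */
void affine_ok(void) {
  for (int i = 0; i < N; i++)
    for (int j = i; j < N; j++)
      a[j] += x[i];
}

/* Non-affine: the subscript idx[i] is data-dependent, so the access
 * cannot be modeled polyhedrally -- keep such code outside the scop. */
void non_affine(void) {
  for (int i = 0; i < N; i++)
    a[idx[i]] += x[i];
}
```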
### Trying a new example
- Use `#pragma scop` and `#pragma endscop` around the section of code you want to parallelize/optimize.
- Then, just run `./polycc <C source file> --pet`. The transformation is also printed out, and `test.par.c` will have the parallelized code. If you want to see intermediate files, like the `.cloog` file generated (`.opt.cloog`, `.tiled.cloog`, or `.par.cloog`, depending on the command-line options provided), use `--debug` on the command line.
- Tile sizes can be specified in a file `tile.sizes`; otherwise, default sizes will be set. See further below for details/instructions on how to specify/force custom sizes.
To run a good number of experiments on a code, it is best to use the setup created for the example codes in the `examples/` directory. If you do not have ICC (the Intel C compiler), uncomment line 9 and comment line 8 of `examples/common.mk` to use GCC.
- Just copy one of the sample directories in `examples/` and edit `Makefile` (`SRC =`).
- Do a `make` (this will build all executables; `orig` is the original code compiled with the native compiler, `tiled` is the tiled code, and `par` is the OpenMP parallelized + locality-optimized code). One could do `make <target>`, where target can be orig, orig_par, opt, tiled, par, pipepar, etc. (see `examples/common.mk` for a complete list).
- `make check-pluto` to test for correctness; `make perf` to compare performance.
### Command-line options

```shell
./polycc -h
```
### Specifying custom tile sizes through the tile.sizes file
A `tile.sizes` file in the current working directory can be used to manually specify tile sizes. Specify one tile size on each line, with as many tile sizes as there are hyperplanes in the outermost non-trivial permutable band. When specifying tile sizes for multiple levels (with `--second-level-tile`), first specify the first-level tile sizes, then the second-to-first tile size ratios. See `examples/matmul/tile.sizes` for an example. If 8x128x8 is the first-level tile size and 128x256x128 the second-level, the `tile.sizes` file will be:
```
# First level tile size 8x128x8.
8
128
8
# Second level is 16*8 x 2*128 x 16*8.
16
2
16
```
The default tile size in the absence of a `tile.sizes` file is 32 (along all dimensions), and the default second-to-first ratio is 8 (when using `--second-level-tile`). The sizes specified correspond to the transformed loops, in that order. For example, for heat-3d, you will see the following output when you run Pluto:
```
# With default tile sizes.
../../polycc test/3d7pt.c --pet
[pluto] compute_deps (isl)
[pluto] Number of statements: 1
[pluto] Total number of loops: 4
[pluto] Number of deps: 15
[pluto] Maximum domain dimensionality: 4
[pluto] Number of parameters: 0
[pluto] Concurrent start hyperplanes found
[pluto] Affine transformations [<iter coeff's> <param> <const>]
T(S1): (t-i, t+i, t+j, t+k)
loop types (loop, loop, loop, loop)
[Pluto] After tiling:
T(S1): ((t-i)/32, (t+i)/32, (t+j)/32, (t+k)/32, t-i, t+i, t+j, t+k)
loop types (loop, loop, loop, loop, loop, loop, loop, loop)
[Pluto] After intra_tile reschedule
T(S1): ((t-i)/32, (t+i)/32, (t+j)/32, (t+k)/32, t, t+i, t+j, t+k)
```
