Pal

An optimized C library for math, parallel processing and data movement

PAL: The Parallel Architectures Library


The Parallel Architectures Library (PAL) is a compact C library with optimized routines for math, synchronization, and inter-processor communication.

Content

  1. Why?

  2. Design goals

  3. License

  4. Contribution Wanted!

  5. A Simple Example

  6. Build Instructions

  7. Library API reference
    7.0 Syntax
    7.1 Program Flow
    7.2 Data Movement
    7.3 Synchronization
    7.4 Basic Math
    7.5 Basic DSP
    7.6 Image Processing
    7.7 FFT (FFTW)
    7.8 Linear Algebra (BLAS)
    7.9 System Calls

  8. Status Report

  9. Benchmarking


Why?

Any sane and informed person knows that the future of computing is massively parallel. Unfortunately the energy needed to escape the current "von Neumann potential well" seems to be approaching infinity. The legacy programming stack is so effective and so easy to use that developers and companies simply cannot afford to choose the better (parallel) solution. To make parallel computing ubiquitous our only choice is to rewrite the whole software stack from scratch, including: algorithms, run-times, libraries, and applications. The goal of the Parallel Architectures Library project is to establish the lowest layer of this brave new programming stack.

Design Goals

  • Fast (Super fast but no "belt AND suspenders")
  • Compact (Small enough to work for memory limited processors with <32KB RAM)
  • Scalable (Thread and data scalable)
  • Portable (Portable across different ISAs and systems)
  • Permissive (Apache 2.0 license to maximize industry adoption)

License

Unless otherwise specified, the PAL source code is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Contribution

Our goal is to make PAL a broad community project from day one. If just 100 people contribute one function each, we'll be done in a couple of days! If you know C, you are ready to contribute!!

Instructions for contributing can be found HERE.

Build Instructions

Install Prerequisites

$ sudo apt-get install libtool build-essential pkg-config autoconf automake doxygen

Build Sequence

$ ./bootstrap
$ ./configure --enable-device-epiphany
$ make

Testing

To run the automated unit tests, run:

$ make check

A Simple Example

The following sample shows how to use PAL to launch a simple task on a remote processor within the system. The program flow should be familiar to anyone who has used accelerator programming frameworks.

Manager Code

#include <pal.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    // Stack variables
    char *file = "./hello_task.elf";
    char *func = "main";
    int status, i, all, nargs = 1;
    char *args[nargs];
    char argbuf[20];

    // References as opaque structures
    p_dev_t dev0;
    p_prog_t prog0;
    p_team_t team0;

    // Execution setup
    dev0 = p_init(P_DEV_DEMO, 0);        // initialize the run time on the demo device
    prog0 = p_load(dev0, file, func, 0); // load a program from file system
    all = p_query(dev0, P_PROP_NODES);   // find number of nodes in system
    team0 = p_open(dev0, 0, all);        // create a team

    // Running program
    for (i = 0; i < all; i++) {
        // main() takes string arguments, so pass the index as text
        snprintf(argbuf, sizeof(argbuf), "%d", i);
        args[0] = argbuf;
        status = p_run(prog0, team0, i, 1, nargs, args, 0);
    }
    p_wait(team0);    // not needed
    p_close(team0);   // close team
    p_finalize(dev0); // clean up the run time

    return 0;
}

Worker Code (hello_task.elf)

#include <stdio.h>
#include <stdlib.h> // atoi()

int main(int argc, char *argv[])
{
    int pid = 0;

    pid = atoi(argv[2]); // processor id, passed as a string by the manager
    printf("--Processor %d says hello!--\n", pid);
    return 0;
}
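
Because the manager passes the processor id as text, the worker may want to validate it before use; `atoi()` silently returns 0 on bad input. The sketch below is one way to do that with `strtol()`. The helper name `demo_parse_pid` is hypothetical and not part of PAL, and the argument index is left to the caller because it depends on how the device passes arguments.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Hypothetical helper (not part of PAL): parse argv[index] as a
// non-negative processor id. Returns the id, or -1 on any error.
int demo_parse_pid(int argc, char *argv[], int index)
{
    if (index < 0 || index >= argc || argv[index] == NULL)
        return -1; // argument is missing

    char *end = NULL;
    errno = 0;
    long value = strtol(argv[index], &end, 10);

    if (errno != 0 || end == argv[index] || *end != '\0')
        return -1; // overflow, empty string, or trailing garbage
    if (value < 0 || value > INT_MAX)
        return -1; // outside the range of a valid id

    return (int)value;
}
```

In the worker above, `pid = demo_parse_pid(argc, argv, 2);` would replace the `atoi()` call, with a negative result treated as an error.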

PAL LIBRARY API REFERENCE

SYNTAX

PROGRAM FLOW

These program flow functions are used to manage the system and to execute programs. All PAL objects are referenced via handles (opaque objects).

| FUNCTION      | NOTES                                        |
| ------------- | -------------------------------------------- |
| p_init()      | initialize the run time                      |
| p_query()     | query a device object                        |
| p_load()      | load binary elf file into memory             |
| p_run()       | run a program on a team of processors        |
| p_open()      | open a team of processors                    |
| p_append()    | add members to team                          |
| p_remove()    | remove members from team                     |
| p_close()     | close a team of processors                   |
| p_barrier()   | team barrier                                 |
| p_wait()      | wait for team to finish                      |
| p_fence()     | memory fence                                 |
| p_finalize()  | cleans up run time                           |
| p_error()     | get error code (if any).                     |
| p_mem_error() | get error code for a memory object (if any). |

MEMORY ALLOCATION

These functions are used for creating memory objects. The functions return a unique PAL handle for each new memory object. This handle can then be used by functions like p_read() and p_write() to access data within the memory object.

| FUNCTION    | NOTES                               | STATUS |
| ----------- | ----------------------------------- | ------ |
| p_malloc()  | allocate memory on local processor  |        |
| p_rmalloc() | allocate memory on remote processor |        |
| p_free()    | free memory                         |        |
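
To illustrate the handle-based pattern described above, here is a minimal, self-contained sketch in plain C. It does not use PAL: `demo_mem_t`, `demo_malloc()`, `demo_write()`, `demo_read()`, and `demo_free()` are hypothetical stand-ins, and their signatures are assumptions chosen for illustration, not the signatures of `p_malloc()`, `p_write()`, `p_read()`, or `p_free()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for a memory handle; NOT PAL's p_mem_t.
typedef struct demo_mem {
    unsigned char *base; // backing storage, never touched directly by callers
    size_t size;         // capacity in bytes
} demo_mem_t;

// Allocate a memory object; base is NULL if the allocation failed.
static demo_mem_t demo_malloc(size_t size)
{
    demo_mem_t m = { malloc(size), size };
    if (m.base == NULL)
        m.size = 0;
    return m;
}

// Copy nb bytes from src into the object at offset; -1 if out of range.
static long demo_write(demo_mem_t *m, const void *src, size_t offset, size_t nb)
{
    if (m->base == NULL || offset > m->size || nb > m->size - offset)
        return -1;
    memcpy(m->base + offset, src, nb);
    return (long)nb;
}

// Copy nb bytes from the object at offset into dst; -1 if out of range.
static long demo_read(const demo_mem_t *m, void *dst, size_t offset, size_t nb)
{
    if (m->base == NULL || offset > m->size || nb > m->size - offset)
        return -1;
    memcpy(dst, m->base + offset, nb);
    return (long)nb;
}

// Release the backing storage and invalidate the handle.
static void demo_free(demo_mem_t *m)
{
    free(m->base);
    m->base = NULL;
    m->size = 0;
}

// Round-trip four ints through a handle; returns 0 on success.
int demo_mem_roundtrip(void)
{
    int in[4] = {1, 2, 3, 4};
    int out[4] = {0};

    demo_mem_t m = demo_malloc(sizeof in);
    if (m.base == NULL)
        return -1;

    long written = demo_write(&m, in, 0, sizeof in);
    long read = demo_read(&m, out, 0, sizeof out);
    long rejected = demo_write(&m, in, 8, sizeof in); // would overrun the object
    demo_free(&m);

    return (written == (long)sizeof in && read == (long)sizeof out &&
            rejected == -1 && memcmp(in, out, sizeof in) == 0) ? 0 : 1;
}
```

The point of the pattern is that callers only ever hold the handle, so the library is free to place the backing storage wherever the target architecture requires.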

DATA MOVEMENT

The data movement functions move blocks of data between opaque memory objects and locations specified by pointers. The memory object is specified by a PAL handle returned by a previous API call. The exception is the p_memcpy() function, which copies blocks of bytes within a shared memory architecture only.

| FUNCTION    | NOTES                     |
| ----------- | ------------------------- |
| p_gather()  | gather operation          |
| p_memcpy()  | fast memcpy()             |
| p_read()    | read from a memory object |
| p_scatter() | scatter operation         |
| p_write()   | write to a memory object  |
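
For readers unfamiliar with the scatter and gather terms, the sketch below shows the general idea in plain C: scatter splits one contiguous buffer into per-node chunks, and gather reassembles the chunks in node order. It is only an illustration of the concept. The `demo_` functions and constants are hypothetical, the "nodes" are simulated with local arrays, and nothing here reflects the actual signatures or semantics of `p_scatter()` and `p_gather()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NODES 4 // number of simulated nodes (assumption for the demo)
#define DEMO_CHUNK 2 // elements handed to each node

// Scatter: copy consecutive chunks of src into one buffer per node.
static void demo_scatter(const int *src, int dst[][DEMO_CHUNK], size_t nodes)
{
    for (size_t n = 0; n < nodes; n++)
        memcpy(dst[n], src + n * DEMO_CHUNK, DEMO_CHUNK * sizeof(int));
}

// Gather: copy each node's buffer back into dst, in node order.
static void demo_gather(int src[][DEMO_CHUNK], int *dst, size_t nodes)
{
    for (size_t n = 0; n < nodes; n++)
        memcpy(dst + n * DEMO_CHUNK, src[n], DEMO_CHUNK * sizeof(int));
}

// Scatter, let each node double its chunk, then gather.
// Returns 0 if every output element is twice its input.
int demo_scatter_gather(void)
{
    int input[DEMO_NODES * DEMO_CHUNK] = {0, 1, 2, 3, 4, 5, 6, 7};
    int local[DEMO_NODES][DEMO_CHUNK];
    int output[DEMO_NODES * DEMO_CHUNK] = {0};

    demo_scatter(input, local, DEMO_NODES);

    // Each simulated node works only on its own chunk.
    for (size_t n = 0; n < DEMO_NODES; n++)
        for (size_t i = 0; i < DEMO_CHUNK; i++)
            local[n][i] *= 2;

    demo_gather(local, output, DEMO_NODES);

    for (size_t i = 0; i < DEMO_NODES * DEMO_CHUNK; i++)
        if (output[i] != 2 * input[i])
            return 1;
    return 0;
}
```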

SYNCHRONIZATION

The synchronization functions are useful for program sequencing and resource locking in shared memory systems.

| FUNCTION            | NOTES                       |
| ------------------- | --------------------------- |
| p_mutex_lock()      | lock a mutex                |
| p_mutex_trylock()   | try locking a mutex once    |
| p_mutex_unlock()    | unlock (clear) a mutex      |
| p_mutex_init()      | initialize a mutex          |
| p_atomic_add()      | atomic fetch and add        |
| p_atomic_sub()      | atomic fetch and sub        |
| p_atomic_and()      | atomic fetch and 'and'      |
| p_atomic_xor()      | atomic fetch and 'xor'      |
| p_atomic_or()       | atomic fetch and 'or'       |
| p_atomic_swap()     | atomic exchange             |
| p_atomic_compswap() | atomic compare and exchange |
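
The operation names in the table follow common atomic terminology. As a rough illustration of what "fetch and add", "exchange", "compare and exchange", and "try locking once" mean, the sketch below uses the standard C11 `<stdatomic.h>` facilities. These are stand-ins chosen because they are widely available; they are not the PAL functions, and the PAL signatures may differ. The `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

// A one-shot try-lock built on atomic_flag (C11), for illustration only.
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

// Try to take the lock once; returns true if this caller acquired it.
static bool demo_trylock(void)
{
    return !atomic_flag_test_and_set(&demo_lock);
}

// Release (clear) the lock.
static void demo_unlock(void)
{
    atomic_flag_clear(&demo_lock);
}

// Walk through the basic operations; returns 0 if all behave as expected.
int demo_sync(void)
{
    atomic_int counter = 5;

    // Fetch-and-add returns the value held BEFORE the addition.
    int old = atomic_fetch_add(&counter, 3);
    if (old != 5 || atomic_load(&counter) != 8)
        return 1;

    // Exchange stores a new value and returns the previous one.
    old = atomic_exchange(&counter, 1);
    if (old != 8 || atomic_load(&counter) != 1)
        return 2;

    // Compare-and-exchange stores only if the current value matches
    // 'expected'; on failure it writes the current value into 'expected'.
    int expected = 7;
    bool swapped = atomic_compare_exchange_strong(&counter, &expected, 2);
    if (swapped || expected != 1)
        return 3;
    swapped = atomic_compare_exchange_strong(&counter, &expected, 2);
    if (!swapped || atomic_load(&counter) != 2)
        return 4;

    // A second try-lock fails until the holder unlocks.
    bool first = demo_trylock();
    bool second = demo_trylock();
    demo_unlock();
    bool third = demo_trylock();
    demo_unlock();
    if (!first || second || !third)
        return 5;

    return 0;
}
```

The sketch is single-threaded so that its results are deterministic; in real use the same operations would be issued concurrently from several processors.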

MATH

The math functions replace the traditional math lib functions and extend them to include support for data as well as task parallelism.

| FUNCTION    | NOTES                 |
| ----------- | --------------------- |
| p_abs()     | absolute value        |
| p_absdiff() | absolute difference   |
| p_add()     | add                   |
| p_acos()    | arc cosine            |
| p_acosh()   | arc hyperbolic cosine |
| p_asin()    | arc sine              |
| p_asinh()   | arc hyperbolic sine   |
| p_cbrt()    | cube root             |
| p_cos()     | cosine                |
| p_cosh()    | hyperbolic cosine     |
