Inplace

CUDA and OpenMP implementations of C2R/R2C inplace transposition

CUDA and OpenMP implementations of the C2R and R2C in-place transposition algorithms. These algorithms are described in our PPoPP paper.

We have included a specialization for very tall, skinny matrices that yields good performance for in-place conversions between Arrays of Structures and Structures of Arrays.
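To see why a tall, skinny transpose amounts to an Array of Structures to Structure of Arrays conversion, consider N records of 3 floats each. Stored back to back they form a row-major N x 3 matrix, and transposing it to 3 x N puts all x values first, then all y values, then all z values. The sketch below illustrates this on the host. `transpose_cycles` is a hypothetical helper, not part of this library, and it uses plain cycle following with a visited bitmap rather than the C2R/R2C algorithms.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the Inplace library): transposes a
// row-major rows x cols matrix in place by following permutation cycles.
// This is the textbook cycle-following method, NOT the C2R/R2C algorithm.
void transpose_cycles(std::vector<float>& a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    const std::size_t size = rows * cols;
    std::vector<bool> moved(size, false);  // one bit per element, for simplicity
    for (std::size_t start = 0; start < size; ++start) {
        if (moved[start]) continue;
        std::size_t cur = start;
        float carried = a[start];
        do {
            // The element at (r, c) of the rows x cols input belongs at
            // (c, r) of the cols x rows output.
            const std::size_t r = cur / cols;
            const std::size_t c = cur % cols;
            const std::size_t dst = c * rows + r;
            std::swap(carried, a[dst]);  // drop the carried value, pick up the next
            moved[dst] = true;
            cur = dst;
        } while (cur != start);
    }
}

// AoS -> SoA for records of `fields` floats: a tall, skinny transpose.
void aos_to_soa(std::vector<float>& data, std::size_t count, std::size_t fields) {
    transpose_cycles(data, count, fields);
}
```

For example, four xyz points stored as `x0 y0 z0 x1 y1 z1 ...` come out of `aos_to_soa(data, 4, 3)` as `x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3`.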

The code includes both OpenMP and CUDA implementations. The OpenMP implementation is declared in <inplace/openmp.h>. The CUDA implementation is declared in <inplace/transpose.h> and has the following signatures:

namespace inplace {

void transpose(bool row_major, float* data, int m, int n);
void transpose(bool row_major, double* data, int m, int n);

}
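When checking results, a small host-side reference can be handy. The sketch below is not part of the library: `sketch::reference_transpose` is a hypothetical helper that mirrors the parameter order of the signatures above. It assumes that `m` and `n` are the row and column counts of the input and that `row_major` selects the storage order; neither assumption is confirmed by this README. It copies into a temporary buffer, so it shows only the expected result, not an in-place algorithm.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical namespace, not the library's

// Host-side reference for the ASSUMED effect of a transpose call: the
// m x n input in `data` is overwritten with its n x m transpose, kept in
// the same storage order. Uses a temporary copy, so it is not in-place.
void reference_transpose(bool row_major, float* data, int m, int n) {
    const std::size_t rows = static_cast<std::size_t>(m);
    const std::size_t cols = static_cast<std::size_t>(n);
    const std::vector<float> src(data, data + rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            // Offset of input element (i, j) in the m x n layout.
            const std::size_t in = row_major ? i * cols + j : j * rows + i;
            // Offset of output element (j, i) in the n x m layout.
            const std::size_t out = row_major ? j * rows + i : i * cols + j;
            data[out] = src[in];
        }
    }
}

}  // namespace sketch
```

Under these assumptions, a row-major 2 x 3 buffer `1 2 3 4 5 6` becomes `1 4 2 5 3 6`.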