# Juzhen (矩阵)

Juzhen is a set of C++ APIs for matrix operations. It provides a higher-level interface to lower-level numerical calculation software such as CBLAS and CUDA. It also offers a neural net API similar to those used in PyTorch or TensorFlow.

This API is developed under C++17 and CUDA 11.8.
## Example

You can perform matrix operations like this:
```cpp
#include <iostream>
#include "cpp/juzhen.hpp"
using namespace std;

int compute() {
    // declare matrices like you would in MATLAB or NumPy.
    Matrix<float> A = {"A", {{1, 2, 3}, {4, 5, 6}}};
    cout << A << endl;
    Matrix<float> B = {"B", {{.1, .2}, {.3, .4}, {.5, .6}}};
    cout << B << endl << endl;

    // compute with matrices as if you were using MATLAB.
    cout << log(exp(A * B) + 1.0f) / 5.0f << endl;
    return 0;
}
```
Then compile:

```sh
mkdir bin
clang++ -x c++ -std=c++20 -DCPU_ONLY -DLOGGING_OFF -O3 ./cpp/launcher.cu ./helloworld.cu -o bin/helloworld.out -lopenblas
bin/helloworld.out
```
or, on the GPU:

```cpp
#include <iostream>
#include "cpp/juzhen.hpp"
using namespace std;

int compute() {
    // matrices on the GPU
    Matrix<CUDAfloat> A(Matrix<float>("A", {{1, 2, 3}, {4, 5, 6}}));
    cout << A << endl;
    Matrix<CUDAfloat> B(Matrix<float>("B", {{.1, .2}, {.3, .4}, {.5, .6}}));
    cout << B << endl << endl;

    // matrices are downloaded from the GPU only when you try to print them.
    cout << (log(exp(A * B) + 1.0f) / 5.0f) << endl;
    return 0;
}
```
Then compile:

```sh
mkdir bin
nvcc -std=c++20 -DLOGGING_OFF -O3 ./cpp/launcher.cu ./cpp/cumatrix.cu ./cpp/cukernels.cu ./helloworld.cu -o bin/helloworld.out -lcublas -lcurand
bin/helloworld.out
```
Both programs print:

```
A 2 by 3
1 2 3
4 5 6

B 3 by 2
0.1 0.2
0.3 0.4
0.5 0.6

logM 2 by 2
0.461017 0.571807
0.981484 1.28033
```
You can verify the result using MATLAB:

```matlab
>> A = [1,2,3;4,5,6];
>> B = [.1,.2;.3,.4;.5,.6];
>> (log(exp(A*B)+1.0)./5.0)

ans =

    0.4610    0.5718
    0.9815    1.2803

>>
```
Juzhen also supports matrix slicing:

```cpp
#include <iostream>
using namespace std;
#include "cpp/juzhen.hpp"

int compute() {
    Matrix<float> A = { "A", {{1, 2, 3}, {4, 5, 6}} };
    cout << A.columns(0, 2).inv() << endl;
    return 0;
}
```
The code above is equivalent to the following MATLAB code:

```matlab
>> A = [1,2,3;4,5,6];
>> inv(A(:,1:2))

ans =

   -1.6667    0.6667
    1.3333   -0.3333
```
You can also use the higher-level neural net API to write neural network applications:
```cpp
// helloworldnn.cu
#include "ml/layer.hpp"
using namespace std;
using namespace Juzhen;

#define FLOAT float
inline Matrix<float> randn(int m, int n) { return Matrix<float>::randn(m, n); }
inline Matrix<float> ones(int m, int n) { return Matrix<float>::ones(m, n); }
#define MatrixI Matrix<int>

int compute()
{
    // problem setup
    const int n = 5000, batchsize = 50, numbatches = n / batchsize;

    // regression dataset generation
    auto X = randn(10, n), beta = randn(10, 1);
    auto Y = beta.T() * X + randn(1, n);
    auto XT = randn(10, n);
    auto YT = beta.T() * XT + randn(1, n);

    // define layers
    Layer<float> L0(16, 10, batchsize), L1(4, 16, batchsize);
    LinearLayer<float> L2(1, 4, batchsize);
    // least squares loss
    LossLayer<float> L3t(n, YT);

    // nns are linked lists containing layers
    list<Layer<float> *> trainnn({&L2, &L1, &L0}), testnn({&L3t, &L2, &L1, &L0});

    // sgd
    int iter = 0;
    while (iter < 10000)
    {
        int batch_id = (iter % numbatches);

        // obtain batches
        auto X_i = X.columns(batchsize * batch_id, batchsize * (batch_id + 1));
        auto Y_i = Y.columns(batchsize * batch_id, batchsize * (batch_id + 1));

        // forward-backward pass
        forward(trainnn, X_i);
        LossLayer<float> L3(batchsize, Y_i);
        trainnn.push_front(&L3);
        backprop(trainnn, X_i);
        trainnn.pop_front();

        // print progress
        if (iter % 1000 == 0)
        {
            cout << "testing loss: " << Juzhen::forward(testnn, XT).elem(0, 0) << endl;
        }
        iter++;
    }
    return 0;
}
```
Then compile:

```sh
mkdir bin
clang++ -x c++ -std=c++20 -DCPU_ONLY -DLOGGING_OFF -O3 ./cpp/launcher.cu ./helloworldnn.cu -o bin/helloworldnn.out -lopenblas
bin/helloworldnn.out
```
## Should I use Juzhen?

NO(*). MATLAB and NumPy are popular among data scientists for a reason. If you are interested in numerical C++ coding, I recommend Eigen or Armadillo.

This software was written with educational purposes in mind. If you still want to use it, please read on.
## Get Started

- First, install CBLAS and LAPACK and set the environment variables.
  - On Ubuntu: `sudo apt install libopenblas-dev liblapack-dev`
  - On Windows, you can download precompiled binaries from OpenBLAS.
  - On macOS, BLAS and LAPACK are part of the Accelerate framework, which comes with Xcode.
- To use Juzhen without GPU support, copy the "cpp" folder to your C++ project directory and include the header file `juzhen.hpp`.
- Suppose you wrote your code in `main.cu`. Compile it using

  ```sh
  clang++ -x c++ -I cpp/ -DLOGGING_OFF -D CPU_ONLY -O3 cpp/launcher.cu main.cu -o bin/main.out -llapack -lopenblas
  ```

  Do not forget to link LAPACK and BLAS!
- If you want to use the GPU APIs, install CUDA and compile your code using `nvcc`:

  ```sh
  nvcc -I cpp/ -O3 -DLOGGING_OFF main.cu cpp/cumatrix.cu cpp/cukernels.cu cpp/launcher.cu -o bin/main.out -lcublas -lcurand -llapack -lopenblas --extended-lambda
  ```

  Note that we dropped the `-D CPU_ONLY` flag.
## Example Code

- helloworld-cpu
- Regression using the neural net APIs (on CPU or GPU).
- Binary logistic regression using a neural network.
- KNN classification on the MNIST dataset.
- 10-class logistic regression using a neural network.
- Rectified Flow to transform between two Gaussians.

```sh
# to build all examples, using CMake
mkdir build
cd build && cmake ..
cmake --build . --config=Release
# to run
./helloworld
```
## Supported Platforms

- Linux (CPU/GPU)
- macOS (CPU)
- Windows (CPU/GPU*): you will need Visual Studio 2019 to compile the code.
## std::move Semantics

Consider the following examples:
- Copy:

  ```cpp
  Matrix<float> A = {"A", {{1,2},{3,4},{5,6}}}; // A is created; memory allocated.
  auto B = A;  // B is a copy of A; extra space allocated for B.
  B.zeros();   // B is zero, but A remains the same.
  ```

- Ownership transfer:

  ```cpp
  Matrix<float> A = {"A", {{1,2},{3,4},{5,6}}};
  auto B = std::move(A); // transfer the memory owned by A to B.
  ```

- Return value:

  ```cpp
  Matrix<float> A = {"A", {{1,2},{3,4},{5,6}}};
  auto B = exp(A); // new memory allocated for B.
  B.zeros();       // B is zero, but A is not affected.
  ```

- Sacrificial intermediate results:

  ```cpp
  Matrix<float> A = {"A", {{1,2},{3,4},{5,6}}};
  auto B = exp(std::move(A)); // reuse the memory owned by A to create B.
                              // A no longer owns any memory.
  ```
Allocating memory is expensive, so make good use of std::move semantics and steal memory from sacrificial intermediate results.
## Garbage Collection

Allocated memory is not released immediately, so later computations can reclaim that space without calling memory allocation functions. To release the memory before the scope exits, remember to add a line at the beginning of the scope:

```cpp
MemoryDeleter<T> md1;
```

where T is the type of your matrix.
## Profiling

main.cu:

```cpp
#include <iostream>
using namespace std;
#include "cpp/juzhen.hpp"

int compute() {
    Matrix<float> A = Matrix<float>::randn(500, 1000);
    {
        Profiler p; // start the profiler
        for (int i = 0; i < 1000; i++)
        {
            auto &&C = A * A.T();
        }
    } // the profiler automatically stops and prints the elapsed time when the current scope exits.
    return 0;
}
```
Compile and run:

```sh
$ clang++ -x c++ -I cpp/ -DLOGGING_OFF -D CPU_ONLY -O3 cpp/launcher.cu main.cu -o bin/main.out -llapack -lopenblas
$ bin/main.out
Time: 1247.48 ms
Total memory released: 2.86102 MB.
```
Compare it with MATLAB:

```matlab
>> A = randn(500,1000,'single');
tic;
for i=1:1000
    C = A*A';
end;
toc;
Elapsed time is 0.7590
```
