TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions

This repository is the official implementation of TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions, by Taehyung Kwon, Jihoon Ko, Jinghong Jung, and Kijung Shin (ICDM 2023).

Requirements

To run the provided code, you need to install PyTorch. Since the installation command depends on your environment, please follow the guide at https://pytorch.org/get-started/locally/ to install the package.
Run the code from the folder (./) that contains the TensorCodec folder. The dataset files should be located in ./input.

Initializing the orders

Initialization of the orders of the tensor is implemented in init_order.py.

Arguments

-sp, --save_path: path for saving the reordered tensor.
-lp, --load_path: path for loading the original tensor.

Example commands and results

  python TensorCodec/init_order.py -lp input/action_orig.npy -sp input/action.npy

  order: 0, loss before: 2387.244534244066, loss after: 2387.244534244066
  order: 1, loss before: 46618.97891356207, loss after: 7292.559081679066
  order: 2, loss before: 20461.291436824664, loss after: 14315.363870162731
  Total elapsed time: 8.323100328445435
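
The log above can be summarized per mode with a short script. This is a minimal sketch that only parses the printed lines; it reads each order value as a mode index, which is an assumption, and it makes no claim about how init_order.py computes the loss. The names parse_order_log and relative_reduction are hypothetical helpers.

```python
import re

# Log lines copied from the example output above.
LOG = """\
order: 0, loss before: 2387.244534244066, loss after: 2387.244534244066
order: 1, loss before: 46618.97891356207, loss after: 7292.559081679066
order: 2, loss before: 20461.291436824664, loss after: 14315.363870162731
"""

LINE = re.compile(r"order: (\d+), loss before: ([\d.eE+-]+), loss after: ([\d.eE+-]+)")

def parse_order_log(text):
    """Return {mode: (loss_before, loss_after)} for each 'order:' line."""
    result = {}
    for match in LINE.finditer(text):
        mode = int(match.group(1))
        result[mode] = (float(match.group(2)), float(match.group(3)))
    return result

def relative_reduction(before, after):
    """Fraction of the loss removed by reordering (0.0 means no change)."""
    return 0.0 if before == 0 else (before - after) / before

losses = parse_order_log(LOG)
for mode, (before, after) in sorted(losses.items()):
    print(f"mode {mode}: loss {relative_reduction(before, after):.1%} lower")
```

In this run the loss of mode 0 is unchanged, while modes 1 and 2 end with a lower loss after initialization.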

Running TensorCodec

The training (compressing) and evaluating (decompressing) processes are implemented in main.py.

Positional argument

action: train for compressing the tensor, or test for checking the reconstruction loss of the trained model.
-d, --dataset: name of the dataset to be compressed.

Optional arguments (common)

-de, --device: GPU id(s) for execution.
-rk, --rank: rank of TT cores.
-hs, --hidden_size: size of the hidden dimension.
-m, --model: type of the model (gru, lstm, mha). The default is lstm.
-nb, --num_batch: the number of mini-batches for training.
-b, --batch_size: the number of tensor entries that are processed simultaneously on the GPUs.

Optional arguments for training

-lr, --lr: learning rate.
-e, --epoch: maximum number of epochs.
-sp, --save_path: path for saving the parameters of the trained model and the new orders of the indices of the tensor (bijective function from indices of the reordered tensor to the indices of the original tensor).
-tol, --tol: tolerance for training.
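
The index orders saved with -sp can be pictured as one permutation per mode. The sketch below illustrates that idea on a hypothetical 2 x 3 matrix with plain lists; the layout perms[k][i] (original index for position i of mode k) is an assumption made for illustration and is not the file format written by main.py.

```python
# Hypothetical 2 x 3 example; perms[k][i] is the original index that
# position i of mode k in the reordered tensor maps to (assumed layout).
original = [[10, 11, 12],
            [20, 21, 22]]
perms = [[1, 0], [2, 0, 1]]

def is_bijection(perm):
    """A valid order is a permutation of 0..n-1."""
    return sorted(perm) == list(range(len(perm)))

def reorder(matrix, perms):
    """Build reordered[i][j] = matrix[perms[0][i]][perms[1][j]]."""
    return [[matrix[r][c] for c in perms[1]] for r in perms[0]]

def restore(reordered, perms):
    """Invert the mapping to recover the original layout."""
    rows, cols = len(perms[0]), len(perms[1])
    out = [[None] * cols for _ in range(rows)]
    for i, r in enumerate(perms[0]):
        for j, c in enumerate(perms[1]):
            out[r][c] = reordered[i][j]
    return out

assert all(is_bijection(p) for p in perms)
reordered = reorder(original, perms)
print(reordered)
print(restore(reordered, perms) == original)
```

Because each order is a bijection, applying the inverse mapping to a reconstructed tensor returns its entries to their original positions.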

Optional arguments for evaluating

-lp, --load_path: path for loading the parameters of the trained model and the orders of the indices of the tensor.
-sp, --save_path: path for saving the reconstructed tensor.

Example command

  # Training
  python TensorCodec/main.py train -d action -de 0 1 2 3 -rk 6 -hs 8 -sp output/action_r6_h8 -e 5000 -lr 1 -m lstm -nb 100 -t 100 -b 2097152

  # Evaluating
  python TensorCodec/main.py test -lp output/action_r6_h8 -d action -de 0 1 2 3 -rk 6 -hs 8 -sp output/action_recon.npy

Evaluating the trained model

Command

We uploaded the trained models for the three smallest 3-order tensors and the smallest 4-order tensor to the folder 'trained model'.
The hyperparameters (rank and hidden dimension) correspond to the models with the fewest parameters shown in Figure 3 of the main paper for all datasets.
You can run the code with the following commands. Note that the device option should be changed depending on the available GPUs.

  python TensorCodec/main.py test -d action -lp 'TensorCodec/trained model/action_r6_h8.pt' -de 0 1 2 3 -rk 6 -hs 8 -sp action_recon.npy
  python TensorCodec/main.py test -d airquality -lp 'TensorCodec/trained model/airquality_r7_h11.pt' -de 0 1 2 3 -rk 7 -hs 11 -sp airquality_recon.npy
  python TensorCodec/main.py test -d uber -lp 'TensorCodec/trained model/uber_r8_h7.pt' -de 0 1 2 3 -rk 8 -hs 7 -sp uber_recon.npy
  python TensorCodec/main.py test -d nyc -lp 'TensorCodec/trained model/nyc_r2_h5.pt' -de 0 1 2 3 -rk 2 -hs 5 -sp nyc_recon.npy

Expected results

||action|airquality|uber|nyc|
|-|-|-|-|-|
|Fitness|0.65|0.648|0.669|0.558|
|Compressed Size (bytes)|11686|26031|11870|5227|
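
This README does not define fitness. The sketch below assumes the common definition 1 - ||X - X_hat||_F / ||X||_F, where X is the original tensor and X_hat the reconstruction; treat both the formula and the helper name fitness as assumptions rather than the exact computation in main.py. The entries are hypothetical and passed as flat lists.

```python
import math

def fitness(original, reconstructed):
    """1 - ||X - X_hat||_F / ||X||_F over two equally long flat sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("tensors must have the same number of entries")
    error = math.sqrt(sum((x - y) ** 2 for x, y in zip(original, reconstructed)))
    norm = math.sqrt(sum(x * x for x in original))
    return 1.0 - error / norm

# Hypothetical flattened entries, not taken from any dataset.
x = [3.0, 4.0, 0.0, 0.0]
x_hat = [3.0, 4.0, 0.0, 1.0]

print(fitness(x, x))      # a perfect reconstruction
print(fitness(x, x_hat))  # one entry off by 1
```

Under this definition a perfect reconstruction scores 1, and the score drops as the reconstruction error grows relative to the norm of the original tensor.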

Real-world datasets we used

|Name|Shape|Density|Source|Link|
|-|-|-|-|-|
|Uber|183 x 24 x 1,140|0.138|FROSTT|Link|
|Air Quality|5,600 x 362 x 6|0.917|Air Korea|Link|
|Action|100 x 570 x 567|0.393|Multivariate LSTM-FCNs|Link|
|PEMS-SF|963 x 144 x 440|0.999|The UEA & UCR Time Series Classification Repository|Link|
|Activity|337 x 570 x 320|0.569|Multivariate LSTM-FCNs|Link|
|Stock|1,317 x 88 x 916|0.816|Zoom-Tucker|Link|
|NYC|265 x 265 x 28 x 35|0.118|New York City Government|Link|
|Absorb|192 x 228 x 30 x 120|1.000|Climate Data at the National Center for Atmospheric Research|Link|
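
To put the compressed sizes under "Expected results" in context, the shapes in this table give a rough dense size for each tensor. The sketch assumes every entry is stored as an 8-byte double, which this README does not state, so the ratios are back-of-the-envelope estimates rather than reported results. DATASETS and dense_bytes are hypothetical names.

```python
import math

BYTES_PER_ENTRY = 8  # assumption: double-precision entries

# Shapes from the dataset table; compressed sizes from "Expected results".
DATASETS = {
    "action": ((100, 570, 567), 11686),
    "airquality": ((5600, 362, 6), 26031),
    "uber": ((183, 24, 1140), 11870),
    "nyc": ((265, 265, 28, 35), 5227),
}

def dense_bytes(shape):
    """Bytes needed to store every entry of a dense tensor of this shape."""
    return math.prod(shape) * BYTES_PER_ENTRY

for name, (shape, compressed) in DATASETS.items():
    ratio = dense_bytes(shape) / compressed
    print(f"{name}: {dense_bytes(shape)} dense bytes, about {ratio:.0f}x the compressed size")
```

A different storage precision scales every dense size, and therefore every ratio, by the same factor.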
