
Dataloader

check python dataloading speed



Fast JPEG loader even on hard drives

On a standard hard drive:

  • PyTorch default ImageFolder and DataLoader: 30.02 img/sec
  • cpploader ImageFolder: 105.67 img/sec
  • cpploader ZippedImageFolder: 2023.00 img/sec
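To put these figures in perspective, the short Python sketch below divides each quoted throughput by the PyTorch baseline. The numbers are copied from the list above. The name `speedups` is only illustrative and is not part of cpploader.

```python
# Throughputs (images/sec) quoted above for a standard hard drive.
THROUGHPUT = {
    "PyTorch ImageFolder + DataLoader": 30.02,
    "cpploader ImageFolder": 105.67,
    "cpploader ZippedImageFolder": 2023.00,
}


def speedups(throughput, baseline):
    """Return each loader's throughput relative to `baseline`, rounded to 2 decimals."""
    base = throughput[baseline]
    return {name: round(rate / base, 2) for name, rate in throughput.items()}


if __name__ == "__main__":
    for name, factor in speedups(THROUGHPUT, "PyTorch ImageFolder + DataLoader").items():
        print(f"{name}: {factor}x")
```

On these numbers, cpploader ImageFolder is roughly 3.5x faster than the PyTorch default, and ZippedImageFolder is roughly 67x faster.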

Fast enough to keep the GPU saturated with fresh data at all times!

PyTorch Extension

Easy Install

python setup.py install --user

Minimal Example

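import cpploader

# NOTE: `backend` and `args` are not defined in this snippet; they are assumed
# to come from the surrounding script (e.g. parsed command-line arguments).

print('Creating Dataset')
folder = cpploader.Dataset(backend, '/imagenet_folder_train/', True)

print('Creating Sampler')
# 'RandomSampler' over all folder.size() images, seeded with args.seed
sampler = cpploader.Sampler('RandomSampler', folder.size(), args.seed)

print('Setting up Loader')
loader = cpploader.Loader(
    folder,
    sampler,
    args.batch_size,
    args.threads,    # number of worker threads
    args.buffering,  # prefetching/buffering size
    args.seed,
    args.mx_io
)

# Fetch one batch of images and their labels
batch, targets = loader.next()

# Move to the GPU: images as float32, labels as int64
batch = batch.float().cuda()
targets = targets.long().cuda()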
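The `'RandomSampler'` argument together with `args.seed` suggests a shuffled but reproducible visiting order. The sketch below shows what such a sampler typically produces. `random_sampler` is a hypothetical pure-Python stand-in, not cpploader's implementation. It assumes that a given seed should always yield the same permutation of the indices `0 .. size-1`.

```python
import random


def random_sampler(size, seed):
    """Hypothetical stand-in for a seeded random sampler.

    Returns every dataset index in [0, size) exactly once, in an order that is
    shuffled but reproducible: the same seed always gives the same order.
    """
    indices = list(range(size))
    random.Random(seed).shuffle(indices)  # private RNG; global random state untouched
    return indices


if __name__ == "__main__":
    order = random_sampler(10, seed=42)
    print(order)
    # Every index appears once, and re-seeding reproduces the same order.
    assert sorted(order) == list(range(10))
    assert order == random_sampler(10, seed=42)
```

Using a private `random.Random` instance keeps the shuffle reproducible without disturbing any other code that relies on the global random state.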

IO Benchmark

./io_benchmark --data /imagenet_folder_train/ -n 32 -b 32 -j 2

    --data  path to the image folder
    -n      number of batches to fetch  defaults to 32
    -b      batch size                  defaults to 32
    -j      number of threads           defaults to 16
    -io     number of IO threads        defaults to 4
    -p      prefetching/buffering size  defaults to 3
    --seed  PRNG seed                   defaults to time(nullptr)
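The -j, -b and -p options describe a common loading pattern:

  • a pool of worker threads reads and decodes images;
  • a bounded number of batches is prefetched ahead of the consumer.

The sketch below illustrates that general technique with Python's `concurrent.futures`. It is an assumption about the pattern, not cpploader's C++ implementation. `load_image` is a hypothetical callable standing in for read, decode and scale, and the mapping of `threads` and `buffering` onto -j and -p is illustrative.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetching_loader(load_image, indices, batch_size=32, threads=16, buffering=3):
    """Yield batches (lists) of load_image(i) results, in sampler order.

    Sketch of the general technique only: `threads` workers run one task per
    image, and at most `buffering` batches are in flight at any time.
    """
    batches = (indices[i:i + batch_size] for i in range(0, len(indices), batch_size))

    with ThreadPoolExecutor(max_workers=threads) as pool:
        pending = deque()  # per-batch lists of futures, oldest first

        def submit_next():
            # Schedule the next batch, if any, as one task per image.
            idxs = next(batches, None)
            if idxs is not None:
                pending.append([pool.submit(load_image, j) for j in idxs])

        for _ in range(buffering):  # fill the prefetch buffer
            submit_next()

        while pending:
            futures = pending.popleft()
            batch = [f.result() for f in futures]  # waits here if loading is slower than consuming
            submit_next()  # keep up to `buffering` batches in flight
            yield batch


if __name__ == "__main__":
    # Toy loader: "decoding" image i just returns i * 2.
    for batch in prefetching_loader(lambda i: i * 2, list(range(10)),
                                    batch_size=4, threads=2, buffering=2):
        print(batch)  # [0, 2, 4, 6], then [8, 10, 12, 14], then [16, 18]
```

Capping the number of in-flight batches bounds memory use while still overlapping disk reads with the consumer's work.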

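After running, io_benchmark prints a report like the one below. This sample was generated with the dataset stored on two spinning disks in RAID 0, and disk IO was clearly the bottleneck. Reading files took 66.57 s of the 97.60 s spent per thread (about 68%), far more than transforming, decoding and scaling combined. (Mo/sec stands for megaoctets, i.e. megabytes, per second.)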

     - 32842 images found
     - 100 classes found
     - took 2.07719s to initialize
    ---------------------------------------------------
                        REPORT
    ---------------------------------------------------
            Per Thread     |  Overall
    1. read 1026 images
     -    66.5692      sec |     4.1606      sec
     -    15.4125 file/sec |   246.6006 file/sec
     -     9.2535   Mo/sec |   148.0565   Mo/sec
    2. transform 1026 images
     -    19.7767      sec |     1.2360      sec
     -    51.8792 file/sec |   830.0670 file/sec
     -    31.1983   Mo/sec |   499.1728   Mo/sec
    3. decode 1026 images
     -     9.8698      sec |     0.6169      sec
     -   103.9539 file/sec |  1663.2621 file/sec
     -   311.8616   Mo/sec |  4989.7863   Mo/sec
    4. scaling 1026 images
     -     1.3806      sec |     0.0863      sec
     -   743.1643 file/sec | 11890.6291 file/sec
     -   106.4768   Mo/sec |  1703.6281   Mo/sec
    Total 1026 images
     -    97.5962      sec |    16.6930      sec
     -    10.5127 file/sec |    61.4628 file/sec
     - Overhead -80.9032 sec 
     - Compression Ratio before scaling 4.9921
     - Compression Ratio after scaling 0.2389
     - IO wait 165.3841 sec
    ---------------------------------------------------
    Dataloader 32 batchs 
     -  Sched   0.0042 sec
     -  Batch  16.6606 sec |    61.4625 img/sec
     - Reduce   0.0280 sec | 36557.4010 img/sec
     -  Total  16.6928 sec |    61.3439 img/sec
    ---------------------------------------------------
    Thread Pool Report
       ID       WORK       IDLE   (%)  TASKS  WORK/TASK
        0    16.4763     0.0043  99.97    64     0.2574
        1    16.6905     0.0043  99.97    65     0.2568
        2    16.6916     0.0043  99.97    65     0.2568
        3    16.6141     0.0043  99.97    64     0.2596
        4    16.6100     0.0043  99.97    64     0.2595
        5    16.4774     0.0043  99.97    63     0.2615
        6    16.5331     0.0043  99.97    64     0.2583
        7    16.4817     0.0042  99.97    64     0.2575
        8    16.5182     0.0042  99.97    64     0.2581
        9    16.5331     0.0042  99.97    64     0.2583
       10    16.4014     0.0042  99.97    64     0.2563
       11    16.6132     0.0042  99.97    64     0.2596
       12    16.4809     0.0042  99.97    64     0.2575
       13    16.6858     0.0042  99.97    63     0.2649
       14    16.6135     0.0042  99.97    64     0.2596
       15    16.5208     0.0032  99.98    65     0.2542
    Total   264.9417     0.0668  99.97  1025     0.2585

                     TIME   (%)
    Empty Queue      0.00   0.02
    Full  Queue      0.00   0.00
       All Time     16.70 100.00

        Arrival Rate      7.69
      Departure Rate      7.83
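The per-stage numbers in the report can be checked with a little arithmetic. The sketch below copies the per-thread timings from the sample report and computes each stage's share of the total. It also checks that the "Overall" column is consistent with per-thread time divided by the 16 worker threads listed in the thread pool report. That reading of the column is inferred from the numbers, not documented.

```python
# Per-thread seconds per stage, copied from the sample report.
PER_THREAD = {"read": 66.5692, "transform": 19.7767, "decode": 9.8698, "scaling": 1.3806}
# "Overall" seconds per stage, from the same report.
OVERALL = {"read": 4.1606, "transform": 1.2360, "decode": 0.6169, "scaling": 0.0863}
N_THREADS = 16  # the thread pool report lists worker IDs 0-15


def stage_shares(per_thread):
    """Fraction of total per-thread time spent in each stage."""
    total = sum(per_thread.values())
    return {stage: secs / total for stage, secs in per_thread.items()}


def overall_matches(per_thread, overall, n_threads, tol=1e-4):
    """True if every Overall value equals per-thread time / n_threads (within tol)."""
    return all(abs(per_thread[s] / n_threads - overall[s]) < tol for s in per_thread)


if __name__ == "__main__":
    shares = stage_shares(PER_THREAD)
    for stage, share in shares.items():
        print(f"{stage:>9}: {share:6.1%}")
    print("bottleneck:", max(shares, key=shares.get))
    print("Overall = per-thread / threads:", overall_matches(PER_THREAD, OVERALL, N_THREADS))
```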

Compilation

  • Built and tested with GCC 8.2 using the -std=c++17 flag (required for <filesystem>)

# Fetch the sources and libtorch
git clone --recurse-submodules -j8 ...
cd dataloader
./download_libtorch.sh

# Build
mkdir build
cd build
cmake -DBUILDING_TEST=ON ..
make

# Run tests
make test