HugeCTR
HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training.
HugeCTR is a GPU-accelerated recommender framework designed for training and inference of large deep learning models.
Design Goals:
- Fast: HugeCTR performs outstandingly in recommendation benchmarks including MLPerf.
- Easy: Whether you are a data scientist or a machine learning practitioner, HugeCTR is easy to pick up thanks to extensive documentation, notebooks, and samples.
- Domain Specific: HugeCTR provides the essentials so that you can efficiently deploy recommender models with very large embeddings.
NOTE: If you have any questions about using HugeCTR, please file an issue or join our Slack channel for more interactive discussions.
Table of Contents
- Core Features
- Getting Started
- HugeCTR SDK
- Support and Feedback
- Contributing to HugeCTR
- Additional Resources
Core Features
HugeCTR supports a variety of features, including the following:
- High-level abstracted Python interface
- Model parallel training
- Optimized GPU workflow
- Multi-node training
- Mixed precision training
- HugeCTR to ONNX Converter
- Sparse Operation Kit
To learn about our latest enhancements, refer to our release notes.
Getting Started
If you'd like to quickly train a model using the Python interface, do the following:
- Build the HugeCTR Docker image: From version 25.03, HugeCTR only provides the Dockerfile source, so you need to build the image yourself. To build the hugectr image, use the Dockerfile located at `tools/dockerfiles/Dockerfile.base` with the following command:

      docker build --build-arg RELEASE=true -t hugectr:release -f tools/dockerfiles/Dockerfile.base .

- Start the container with your local host directory (/your/host/dir) mounted by running the following command:

      docker run --gpus=all --rm -it --cap-add SYS_NICE -v /your/host/dir:/your/container/dir -w /your/container/dir -u $(id -u):$(id -g) hugectr:release

  NOTE: The host directory /your/host/dir is visible inside the container as /your/container/dir, which is also your starting directory.

  NOTE: HugeCTR uses NCCL to share data between ranks, and NCCL may require shared memory for IPC and pinned (page-locked) system memory resources. It is recommended that you increase these resources by adding the following options to the `docker run` command:

      --shm-size=1g --ulimit memlock=-1
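To show where the shared-memory and memlock options fit into the full command, here is a minimal sketch that assembles the `docker run` argument list in Python. The helper name `docker_run_command` and its default `uid`/`gid` values are illustrative assumptions, not part of HugeCTR; Docker itself spells the two resource options `--shm-size` and `--ulimit`.

```python
import shlex


def docker_run_command(host_dir, container_dir, image="hugectr:release",
                       shm_size="1g", uid=1000, gid=1000):
    """Build the `docker run` argument list (hypothetical helper).

    Mirrors the command from the step above and adds the two resource
    options recommended for NCCL.
    """
    return [
        "docker", "run", "--gpus=all", "--rm", "-it",
        "--cap-add", "SYS_NICE",
        f"--shm-size={shm_size}",         # shared memory for NCCL IPC
        "--ulimit", "memlock=-1",         # unlimited pinned (page-locked) memory
        "-v", f"{host_dir}:{container_dir}",
        "-w", container_dir,
        "-u", f"{uid}:{gid}",             # placeholder for $(id -u):$(id -g)
        image,
    ]


cmd = docker_run_command("/your/host/dir", "/your/container/dir")
print(shlex.join(cmd))
```

Keeping the arguments as a list rather than one string avoids quoting problems if the result is later passed to `subprocess.run`.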
- Write a simple Python script to generate a synthetic dataset:

      # dcn_parquet_generate.py
      import hugectr
      from hugectr.tools import DataGeneratorParams, DataGenerator

      data_generator_params = DataGeneratorParams(
        format = hugectr.DataReaderType_t.Parquet,
        label_dim = 1,
        dense_dim = 13,
        num_slot = 26,
        i64_input_key = False,
        source = "./dcn_parquet/file_list.txt",
        eval_source = "./dcn_parquet/file_list_test.txt",
        slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                           39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                           63, 63,
                           39884, 39043, 17289, 7420, 20263, 3, 7120, 1543],
        dist_type = hugectr.Distribution_t.PowerLaw,
        power_law_type = hugectr.PowerLaw_t.Short)
      data_generator = DataGenerator(data_generator_params)
      data_generator.generate()
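The `slot_size_array` in the generator script has to line up with `num_slot`. The following plain-Python check needs no HugeCTR installation; it assumes that each entry of `slot_size_array` is the number of distinct keys in one slot, so the sum is the total vocabulary size.

```python
# Values copied from dcn_parquet_generate.py.
num_slot = 26
slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                   63, 63,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543]

# One entry per slot is expected.
assert len(slot_size_array) == num_slot, "slot_size_array must have num_slot entries"

# Assumption: each entry is a per-slot key count, so the sum is the
# total number of distinct keys across all slots.
total_keys = sum(slot_size_array)
print(f"slots: {len(slot_size_array)}, total keys: {total_keys}")
```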
- Generate the Parquet dataset for your DCN model by running the following command:

      python dcn_parquet_generate.py

  NOTE: The generated dataset will reside in the folder `./dcn_parquet`, which contains training and evaluation data.
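Before training, it can help to confirm that the generated file lists are readable. The sketch below assumes the file list layout is a first line holding the number of data files, followed by one path per line; verify this against the files the generator actually wrote. The helper `read_file_list` and the demo file names are hypothetical.

```python
import os
import tempfile


def read_file_list(path):
    """Return the data file paths named in a file list (hypothetical helper).

    Assumed layout: first line = number of files, then one path per line.
    """
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    if not lines:
        raise ValueError(f"{path} is empty")
    expected = int(lines[0])
    files = lines[1:]
    if len(files) != expected:
        raise ValueError(
            f"{path} declares {expected} files but lists {len(files)}")
    return files


# Demo on a throwaway file list with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "file_list.txt")
    with open(demo_path, "w") as f:
        f.write("2\n./dcn_parquet/train/gen_0.parquet\n"
                "./dcn_parquet/train/gen_1.parquet\n")
    demo_files = read_file_list(demo_path)
    print(demo_files)
```

To check a real dataset, call `read_file_list("./dcn_parquet/file_list.txt")` from the directory that contains `dcn_parquet`.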
- Write a simple Python script for training:

      # dcn_parquet_train.py
      import hugectr
      from mpi4py import MPI

      solver = hugectr.CreateSolver(max_eval_batches = 1280,
                                    batchsize_eval = 1024,
                                    batchsize = 1024,
                                    lr = 0.001,
                                    vvgpu = [[0]],
                                    repeat_dataset = True)
      reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
                                        source = ["./dcn_parquet/file_list.txt"],
                                        eval_source = "./dcn_parquet/file_list_test.txt",
                                        slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                                                           39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                                                           63, 63,
                                                           39884, 39043, 17289, 7420, 20263, 3, 7120, 1543])
      optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
                                          update_type = hugectr.Update_t.Global)
      model = hugectr.Model(solver, reader, optimizer)
      model.add(hugectr.Input(label_dim = 1, label_name = "label",
                              dense_dim = 13, dense_name = "dense",
                              data_reader_sparse_param_array =
                              [hugectr.DataReaderSparseParam("data1", 1, True, 26)]))
      model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
                                        workspace_size_per_gpu_in_mb = 75,
                                        embedding_vec_size = 16,
                                        combiner = "sum",
                                        sparse_embedding_name = "sparse_embedding1",
                                        bottom_name = "data1",
                                        optimizer = optimizer))
      model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                                   bottom_names = ["sparse_embedding1"],
                                   top_names = ["reshape1"],
                                   leading_dim=416))
      model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                                   bottom_names = ["reshape1", "dense"],
                                   top_names = ["concat1"]))
      model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.MultiCross,
                                   bottom_names = ["concat1"],
                                   top_names = ["multicross1"],
                                   num_layers=6))
      model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                                   bottom_names = ["concat1"],
                                   top_names = ["fc1"],
                                   num_output=1024))
      model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                                   bottom_names = ["fc1"],
                                   top_names = ["relu1"]))
      model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                                   bottom_names = ["relu1"],
                                   top_names = ["dropout1"],
                                   dropout_rate=0.5))
      model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                                   bottom_names = ["dropout1", "multicross1"],
                                   top_names = ["concat2"]))
      model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                                   bottom_names = ["concat2"],
                                   top_names = ["fc2"],
                                   num_output=1))
      model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                                   bottom_names = ["fc2", "label"],
                                   top_names = ["loss"]))
      model.compile()
      model.summary()
      model.graph_to_json(graph_config_file = "dcn.json")
      model.fit(max_iter = 5120, display = 200, eval_interval = 1000, snapshot = 5000, snapshot_prefix = "dcn")

  NOTE: Ensure that the paths to the synthetic datasets are correct with respect to this Python script. `data_reader_type`, `check_type`, `label_dim`, `dense_dim`, and `data_reader_sparse_param_array` should be consistent with the generated dataset.
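The consistency requirement extends to the layer widths in the training script. As a worked check in plain Python (no HugeCTR needed), the sketch below recomputes the `leading_dim` of the Reshape layer and the widths of the two Concat layers. It assumes one key per slot, as `DataReaderSparseParam("data1", 1, True, 26)` suggests, and that MultiCross outputs the same width as its input, as cross layers normally do.

```python
# Values copied from dcn_parquet_generate.py and dcn_parquet_train.py.
num_slot = 26             # slots in the generated dataset
embedding_vec_size = 16   # SparseEmbedding(embedding_vec_size=...)
dense_dim = 13            # dense features per sample
leading_dim = 416         # Reshape(leading_dim=...)
fc1_units = 1024          # InnerProduct "fc1" num_output

# Each slot contributes one embedding vector (assumption: one key per
# slot, combined with "sum"), so the flattened width is slots * vec size.
flattened = num_slot * embedding_vec_size
assert flattened == leading_dim, "leading_dim must equal num_slot * embedding_vec_size"

# concat1 joins the flattened embeddings with the dense features.
concat1_width = leading_dim + dense_dim

# Assumption: MultiCross preserves the width of its input.
multicross_width = concat1_width

# concat2 joins the dropout output (same width as fc1) with multicross1.
concat2_width = fc1_units + multicross_width

print(f"reshape1: {flattened}, concat1: {concat1_width}, concat2: {concat2_width}")
```

If you change `num_slot` or `embedding_vec_size`, `leading_dim` has to change with them.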
- Train the model by running the following command:
