ATOS
Multi-GPU dynamic scheduler using PGAS style cross-GPU communication
Content
- dataset: Folder containing the graph dataset downloader for each graph.
- single-GPU: Folder containing single-GPU Atos asynchronous graph analytics implementations.
- bfs_nvlink: Folder containing multi-GPU Atos asynchronous BFS implementations for NVLink-connected multi-GPU systems.
- pr_nvlink: Folder containing multi-GPU Atos asynchronous PageRank implementations for NVLink-connected multi-GPU systems.
- bfs_ib: Folder containing multi-GPU Atos asynchronous BFS implementations for InfiniBand (IB)-connected multi-GPU systems.
- pr_ib: Folder containing multi-GPU Atos asynchronous PageRank implementations for InfiniBand (IB)-connected multi-GPU systems.
- comm: Folder containing implementations of distributed standard/priority queues and the communication aggregator.
- perf_data: Folder containing the performance results of BFS and PageRank on NVLink and IB systems.
Prerequisites
Single-GPU Atos Graph Analytics
- CUDA (V11.4.120 or newer)
- GCC (9.4.0 or newer)
- boost (1.63 or newer)
Required Environment Variables
Set the following environment variables according to the install paths of your dependencies:
- CUDA_HOME
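For example, in a bash-compatible shell, assuming CUDA is installed under /usr/local/cuda (a common default, but substitute the path on your system):

```shell
# Assumed install location; replace with your actual CUDA path.
export CUDA_HOME=/usr/local/cuda
# Optional, common convention (not required by this README): put nvcc on PATH.
export PATH="$CUDA_HOME/bin:$PATH"
```

Adding these lines to your shell profile keeps the setting across sessions.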
Multi-GPU Atos BFS and PageRank
- CUDA (V11.4.120 or newer)
- GCC (9.4.0 or newer)
- boost (1.63 or newer)
- NVSHMEM (can be downloaded from https://developer.nvidia.com/nvshmem after joining the NVIDIA Developer Program)
- METIS (https://github.com/KarypisLab/METIS), compiled with the 64-bit option
- OpenMPI (4.0.5 or newer) or IBM Spectrum MPI on Summit
Required Environment Variables
Set the following environment variables according to the install paths of your dependencies:
- CUDA_HOME
- METIS64_HOME
- NVSHMEM_HOME
- MPI_HOME
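Before compiling, a small check like the following can catch a variable that is unset or points at a directory that does not exist. check_atos_env is a hypothetical helper written for this sketch, not a script shipped with the repository.

```shell
# Hypothetical helper: report each required variable that is unset or
# does not name an existing directory. Returns 0 only if all four are usable.
check_atos_env() {
    missing=0
    for var in CUDA_HOME METIS64_HOME NVSHMEM_HOME MPI_HOME; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            missing=1
        elif [ ! -d "$value" ]; then
            echo "$var=$value is not a directory" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (paths are placeholders):
#   export METIS64_HOME=/path/to/metis64
#   check_atos_env && echo "all four variables look usable"
```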
Reproduce Multi-GPU Performance Test on BFS and PageRank
Download Datasets
Several tested graph datasets are included under the datasets directory. Under each graph dataset folder, run make to download the dataset. The downloaded graph datasets are in either .mtx or .csr format. Atos uses the .csr format; for datasets in .mtx format, use the gen_csr tool under the datasets directory to convert them to .csr format.
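The download step can be looped over every dataset folder as sketched below, assuming each graph folder carries its own Makefile. The helper names are hypothetical; the sketch only lists the .mtx files that still need converting, because the exact gen_csr invocation is not documented here.

```shell
# Hypothetical helper: run make in every subfolder that has a Makefile.
# $1 is the path to the datasets directory.
download_datasets() {
    datasets_dir=$1
    for d in "$datasets_dir"/*/; do
        # Skip folders without a Makefile.
        [ -f "${d}Makefile" ] || continue
        make -C "$d" || return 1
    done
}

# Hypothetical helper: list downloaded .mtx files, i.e. the ones that
# still need converting to .csr with the gen_csr tool.
list_mtx_files() {
    find "$1" -type f -name '*.mtx'
}

# Usage: download_datasets ./datasets && list_mtx_files ./datasets
```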
Compile BFS and PageRank
- BFS implementations for NVLink systems are under the bfs_nvlink directory; BFS implementations for InfiniBand (IB) systems are under the bfs_ib directory.
- Under the bfs_nvlink and bfs_ib directories, run make to compile the code.
- PageRank implementations for NVLink systems are under the pr_nvlink directory; PageRank implementations for InfiniBand (IB) systems are under the pr_ib directory.
- Under the pr_nvlink and pr_ib directories, run make to compile the code.
Run Performance Test for BFS on NVLink System
- Go to the bfs_nvlink folder.
- To test BFS on the graphs under the datasets directory, run figure5_persist.sh and figure5_discrete.sh to generate and extract the performance data for BFS on NVLink.
- The script figure5_persist.sh generates performance results for the BFS implementation using a standard queue and the persistent kernel scheme.
- The script figure5_discrete.sh generates performance results for the BFS implementation using a priority queue and the discrete kernel scheme.
- Both figure5_persist.sh and figure5_discrete.sh first run the performance tests and write the results to a temporary file; then they extract and print the performance results.
Note: If figure5_persist.sh or figure5_discrete.sh produces abnormal results, re-run the performance tests: output printed by multiple processes can interleave so that the script fails to extract the performance results correctly.
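Both scripts can be run in sequence while keeping a copy of everything they print, which makes a run with tangled multi-process output easier to spot before re-running. run_bfs_nvlink and the .log file names are hypothetical; the sketch assumes the scripts are executable and are meant to be run from inside bfs_nvlink.

```shell
# Hypothetical helper: run both figure-5 scripts from inside bfs_nvlink,
# saving each script's full output to <script-name>.log.
# $1 is the path to the ATOS checkout.
run_bfs_nvlink() {
    (
        cd "$1/bfs_nvlink" || exit 1
        for script in figure5_persist.sh figure5_discrete.sh; do
            ./"$script" 2>&1 | tee "${script%.sh}.log"
        done
    )
}

# Usage: run_bfs_nvlink "$HOME/ATOS"
```

The subshell keeps the cd from changing your interactive shell's working directory.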
Run Performance Test for BFS on InfiniBand (IB) System
- Go to the bfs_ib folder.
- To test on the graphs under the datasets directory, run run_bfs.sh (or run_bfs.batch if you are on Summit) to generate the performance results for BFS on the IB system.
- Then extract the performance results with
./figure10_bfs.sh outputfile
Note: If figure10_bfs.sh produces abnormal results, re-run the performance tests: output printed by multiple processes can interleave so that the script fails to extract the performance results correctly.
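On a non-Summit IB cluster, the two steps can be chained as sketched below. This assumes run_bfs.sh reports its results on stdout/stderr, so capturing that stream gives the outputfile that figure10_bfs.sh expects; run_bfs_ib and the default file name bfs_ib_output.txt are hypothetical. On Summit, run_bfs.batch is used instead and goes through the system's batch scheduler, which is not shown here.

```shell
# Hypothetical helper: capture run_bfs.sh output into a file, then
# extract results from it with figure10_bfs.sh.
# $1 is the path to the ATOS checkout; $2 optionally names the output file.
run_bfs_ib() {
    (
        cd "$1/bfs_ib" || exit 1
        outfile=${2:-bfs_ib_output.txt}
        ./run_bfs.sh > "$outfile" 2>&1 || exit 1
        ./figure10_bfs.sh "$outfile"
    )
}

# Usage: run_bfs_ib "$HOME/ATOS"
```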
Run Performance Test for PageRank on NVLink System
- Go to the pr_nvlink folder.
- To test on the graphs under the datasets directory, run figure7_persist.sh and figure7_discrete.sh to generate and extract the performance data for PageRank on NVLink.
- The script figure7_persist.sh generates performance results for the PageRank implementation using a standard queue and the persistent kernel scheme.
- The script figure7_discrete.sh generates performance results for the PageRank implementation using a standard queue and the discrete kernel scheme.
- Both figure7_persist.sh and figure7_discrete.sh first run the performance tests and write the results to a temporary file; then they extract and print the performance results.
Note: If figure7_persist.sh or figure7_discrete.sh produces abnormal results, re-run the performance tests: output printed by multiple processes can interleave so that the script fails to extract the performance results correctly.
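Following the note on re-running, one option is to repeat a script a few times and keep each attempt's output separately, so that an attempt whose extraction was garbled can be discarded. rerun_pr_nvlink, its default of three attempts, and the .runN.txt naming are hypothetical choices; run it from inside pr_nvlink.

```shell
# Hypothetical helper: run one figure-7 script several times, saving
# attempt i to <script-name>.run<i>.txt for side-by-side comparison.
# $1 is the script name; $2 optionally sets the number of attempts.
rerun_pr_nvlink() {
    script=$1
    attempts=${2:-3}
    i=1
    while [ "$i" -le "$attempts" ]; do
        ./"$script" > "${script%.sh}.run$i.txt" 2>&1
        i=$((i + 1))
    done
}

# Usage, from inside pr_nvlink: rerun_pr_nvlink figure7_persist.sh 3
```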
Run Performance Test for PageRank on InfiniBand (IB) System
- Go to the pr_ib folder.
- To test on the graphs under the datasets directory, run run_pr.sh (or run_pr.batch if you are on Summit) to generate the performance results for PageRank on the IB system.
- Then extract the performance results with
./figure11_pr.sh outputfile
Note: If figure11_pr.sh produces abnormal results, re-run the performance tests: output printed by multiple processes can interleave so that the script fails to extract the performance results correctly.
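The PageRank steps on a non-Summit IB cluster can be chained the same way, with one extra guard: skip extraction when the captured output file is empty, since there is nothing for figure11_pr.sh to parse. This assumes run_pr.sh reports its results on stdout/stderr; run_pr_ib and pr_ib_output.txt are hypothetical names.

```shell
# Hypothetical helper: capture run_pr.sh output, refuse to extract from
# an empty file, then run figure11_pr.sh on it.
# $1 is the path to the ATOS checkout; $2 optionally names the output file.
run_pr_ib() {
    (
        cd "$1/pr_ib" || exit 1
        outfile=${2:-pr_ib_output.txt}
        ./run_pr.sh > "$outfile" 2>&1 || exit 1
        if [ ! -s "$outfile" ]; then
            echo "run_pr.sh produced no output; not extracting" >&2
            exit 1
        fi
        ./figure11_pr.sh "$outfile"
    )
}

# Usage: run_pr_ib "$HOME/ATOS"
```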
Pre-generated Performance Data
Pre-generated performance data are in the perf_data directory. To extract the performance results, run figure5_persist.sh, figure5_prio.sh and figure10_bfs.sh under the bfs_nvlink and bfs_ib directories, and run figure7_discrete.sh, figure7_persist.sh and figure11_pr.sh under the pr_nvlink and pr_ib directories.
