OFED and GPUDirect
Technology use
The information on this page is provided to assist with the technologies available within Baskerville. We are unable to assist with migrating code to use the technologies outlined.
Mellanox OFED
Baskerville uses Mellanox OFED (MOFED) for the InfiniBand and high-speed network drivers. This page tracks the versions of MOFED deployed over time.
June 2021
# ofed_info -s
MLNX_OFED_LINUX-5.3-1.0.0.1:
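When recording the deployed version in job logs or comparing nodes, it can be convenient to reduce this string to the bare version number. The following is a minimal sketch only: `mofed_version` is a hypothetical helper, not part of MOFED, and it assumes the `MLNX_OFED_LINUX-<version>:` format shown above.

```shell
# Hypothetical helper: strip the "MLNX_OFED_LINUX-" prefix and the
# trailing colon from a string in the format printed by `ofed_info -s`.
mofed_version() {
    local raw="$1"
    raw="${raw#MLNX_OFED_LINUX-}"   # drop the product prefix
    printf '%s\n' "${raw%:}"        # drop the trailing colon, if any
}

# Example using the June 2021 string from this page.
# On a node with MOFED installed: mofed_version "$(ofed_info -s)"
mofed_version "MLNX_OFED_LINUX-5.3-1.0.0.1:"
```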
GPUDirect
NVIDIA GPUDirect® is a family of technologies to enhance data movement and access for GPUs. The following components are made available on Baskerville GPU nodes:
Mellanox OFED GPUDirect RDMA
Current release: 1.1
Mellanox OFED GPUDirect is an addition to MOFED that provides a direct peer-to-peer path between GPU memory and the Mellanox InfiniBand adapter.
Note that each compute tray has a single HCA. Whilst the GPUs are meshed using NVLink, some GPUs may not be directly accessible from the socket attached to the HCA. Details of the architecture are linked in the Lenovo documentation.
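The placement of each GPU relative to the HCA can be inspected on a GPU node with `nvidia-smi topo -m`, which prints a matrix of link codes between devices (typically including network adapters). As a rough aid for reading that matrix, the sketch below maps the codes from the `nvidia-smi` legend to short descriptions. `describe_link` is a hypothetical helper written for illustration, and the actual matrix for Baskerville nodes is not reproduced here.

```shell
# Hypothetical helper: describe a link code from the `nvidia-smi topo -m` legend.
describe_link() {
    case "$1" in
        X)    echo "self" ;;
        NV*)  echo "bonded set of NVLinks" ;;
        PIX)  echo "at most a single PCIe bridge" ;;
        PXB)  echo "multiple PCIe bridges, no PCIe host bridge" ;;
        PHB)  echo "PCIe host bridge (typically the CPU)" ;;
        NODE) echo "PCIe plus interconnect between host bridges within a NUMA node" ;;
        SYS)  echo "PCIe plus SMP interconnect between NUMA nodes" ;;
        *)    echo "unknown" ;;
    esac
}

# On a GPU node, print the real matrix with: nvidia-smi topo -m
# Then look up any code that appears between a GPU and the HCA, for example:
describe_link SYS
describe_link PIX
```

A GPU whose path to the HCA is reported as `SYS` crosses the inter-socket link, which is the case the note above warns about.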
GPUDirect Storage
Current release: nvidia-gds-0.95.1
GPUDirect Storage (Magnum IO) enables a direct data path between storage and GPU memory utilising RDMA for data transfers.
Open Beta
GPUDirect Storage is currently in Open Beta from NVIDIA. IBM Spectrum Scale support was added in the 0.95.1 release. Full testing of GPUDirect Storage has not been completed on Baskerville.
Checking that the GPUDirect Storage kernel module is loaded:
$ lsmod | grep nvidia_fs
nvidia_fs
GDRCopy
Current release: 2.2
GDRCopy is a low-latency GPU memory copy library based on NVIDIA GPUDirect RDMA technology.
Checking that the GDRCopy kernel module is loaded:
$ lsmod | grep gdr
gdrdrv
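The `nvidia_fs` and `gdrdrv` checks can also be scripted, for example at the start of a job, so that a missing module is reported before any work begins. The following is a minimal sketch: `module_loaded` is a hypothetical helper, and reading `/proc/modules` instead of calling `lsmod` is an assumption made here to keep the check dependency-free.

```shell
# Hypothetical helper: succeed if the named kernel module appears in a
# modules listing.
# $1: module name; $2: listing to read (assumed default: /proc/modules)
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

# Report the state of the two kernel modules mentioned on this page.
for mod in nvidia_fs gdrdrv; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```

Anchoring the pattern at the start of the line and requiring a trailing space avoids matching modules that merely list `gdrdrv` or `nvidia_fs` as a dependency.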
Testing that GDRCopy is working:
$ module load bask-apps/live GDRCopy/2.1-GCCcore-10.2.0-CUDA-11.1.1
$ GDRCOPY_ENABLE_LOGGING=1 GDRCOPY_LOG_LEVEL=0 copylat
GPU id:0; name: A100-SXM4-40GB; Bus id: 0000:31:00
GPU id:1; name: A100-SXM4-40GB; Bus id: 0000:4b:00
GPU id:2; name: A100-SXM4-40GB; Bus id: 0000:ca:00
GPU id:3; name: A100-SXM4-40GB; Bus id: 0000:e3:00
selecting device 0
device ptr: 0x7f267e000000
allocated size: 16777216
DBG: wc_mapping=1
map_d_ptr: 0x7f26999ef000
info.va: 7f267e000000
info.mapped_size: 16777216
info.page_size: 65536
info.mapped: 1
info.wc_mapping: 1
page offset: 0
user-space pointer: 0x7f26999ef000
gdr_copy_to_mapping num iters for each size: 10000
WARNING: Measuring the issue overhead as observed by the CPU. Data might not be ordered all the way to the GPU internal visibility.
Test Size(B) Avg.Time(us)
DBG: sse4_1=1 avx=1 sse=1 sse2=1
DBG: using AVX implementation of gdr_copy_to_bar
gdr_copy_to_mapping 1 0.3448
gdr_copy_to_mapping 2 0.3433
gdr_copy_to_mapping 4 0.3433
gdr_copy_to_mapping 8 0.3429
gdr_copy_to_mapping 16 0.3434
gdr_copy_to_mapping 32 0.3433
gdr_copy_to_mapping 64 0.3419
gdr_copy_to_mapping 128 0.3621
gdr_copy_to_mapping 256 0.3683
gdr_copy_to_mapping 512 0.3815
gdr_copy_to_mapping 1024 0.4214
gdr_copy_to_mapping 2048 0.5269
gdr_copy_to_mapping 4096 0.7187
gdr_copy_to_mapping 8192 1.0892
gdr_copy_to_mapping 16384 1.1099
gdr_copy_to_mapping 32768 1.7705
gdr_copy_to_mapping 65536 3.6171
gdr_copy_to_mapping 131072 7.0432
gdr_copy_to_mapping 262144 13.9050
gdr_copy_to_mapping 524288 27.6336
gdr_copy_to_mapping 1048576 55.1509
gdr_copy_to_mapping 2097152 108.0527
gdr_copy_to_mapping 4194304 219.2105
gdr_copy_to_mapping 8388608 439.5790
gdr_copy_to_mapping 16777216 878.3777
gdr_copy_from_mapping num iters for each size: 100
Test Size(B) Avg.Time(us)
DBG: using SSE4_1 implementation of gdr_copy_from_bar
gdr_copy_from_mapping 1 1.2106
gdr_copy_from_mapping 2 1.6792
gdr_copy_from_mapping 4 1.6770
gdr_copy_from_mapping 8 1.6771
gdr_copy_from_mapping 16 0.8376
gdr_copy_from_mapping 32 1.2744
gdr_copy_from_mapping 64 1.5003
gdr_copy_from_mapping 128 1.6954
gdr_copy_from_mapping 256 1.6985
gdr_copy_from_mapping 512 1.6938
gdr_copy_from_mapping 1024 2.5872
gdr_copy_from_mapping 2048 3.6041
gdr_copy_from_mapping 4096 6.9445
gdr_copy_from_mapping 8192 12.0247
gdr_copy_from_mapping 16384 23.7481
gdr_copy_from_mapping 32768 46.9334
gdr_copy_from_mapping 65536 91.3752
gdr_copy_from_mapping 131072 191.5826
gdr_copy_from_mapping 262144 403.2905
gdr_copy_from_mapping 524288 813.1163
gdr_copy_from_mapping 1048576 1614.5567
gdr_copy_from_mapping 2097152 3231.4384
gdr_copy_from_mapping 4194304 6765.2056
gdr_copy_from_mapping 8388608 14080.6063
gdr_copy_from_mapping 16777216 28266.7212
unmapping buffer
unpinning buffer
closing gdrdrv
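The average times above can be converted into an approximate throughput figure, since bytes per microsecond is numerically equal to MB/s (10^6 bytes per second). The following is a minimal sketch, assuming the three-column layout (test name, size in bytes, average time in microseconds) shown above; `copylat_bandwidth` is a hypothetical helper and not part of GDRCopy. As the benchmark's own warning notes, the write timings reflect issue overhead as observed by the CPU, so the results should be treated as indicative only.

```shell
# Hypothetical helper: read copylat output on stdin and print, for each
# result row, the test name, size in bytes, and approximate MB/s.
# Header, debug, and "num iters" lines are skipped because their second
# field is not a plain integer.
copylat_bandwidth() {
    awk '$1 ~ /^gdr_copy_(to|from)_mapping$/ && $2 ~ /^[0-9]+$/ && $3 > 0 {
        # bytes / microseconds == MB/s (10^6 bytes per second)
        printf "%s %d %.1f\n", $1, $2, $2 / $3
    }'
}

# Example using one row from the output above.
# On a GPU node with the GDRCopy module loaded: copylat | copylat_bandwidth
printf 'gdr_copy_to_mapping 4096 0.7187\n' | copylat_bandwidth
```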