Baskerville System

The Baskerville system is a Lenovo® cluster solution supplied by OCF. Advanced Research Computing at the University of Birmingham deployed and integrated the system and now manages it.

Compute nodes

The system comprises:

  • 46 SD650-N V2 liquid cooled compute trays
    • 2x Intel® Xeon® Platinum 8360Y CPUs, each with 36 cores at 2.4GHz (with boost to 3.5GHz)
    • 512GB RAM (16x 32GB DDR4)
    • 1TB NVMe M.2 device (used for OS and available as /scratch-local)
    • 1x 25GbE NVIDIA® Mellanox® (on-planar ConnectX-4 port)
    • 1x HDR (200Gbps) NVIDIA Mellanox Infiniband port (ConnectX-6 PCIe gen4 adapter)
    • NVIDIA HGX-100 GPU planar
      • 4x NVIDIA A100 40GB GPGPUs

The GPUs in each tray are meshed using NVIDIA NVLink. Full details of the architecture are provided in the Lenovo documentation.

The compute trays are all direct liquid cooled using Lenovo Neptune™ to provide dense computing.
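As a rough illustration, the per-tray figures listed above can be multiplied out to give system-wide totals. This is a minimal sketch: the constant and function names are made up for this example, and the totals are simple products of the listed figures, not numbers published for the system.

```python
# Per-tray figures taken from the compute node list above.
TRAYS = 46
CPUS_PER_TRAY = 2
CORES_PER_CPU = 36
RAM_GB_PER_TRAY = 512
GPUS_PER_TRAY = 4
GPU_MEMORY_GB = 40


def cluster_totals(trays: int = TRAYS) -> dict:
    """Return aggregate compute resources for the given number of trays."""
    gpus = trays * GPUS_PER_TRAY
    return {
        "cpu_cores": trays * CPUS_PER_TRAY * CORES_PER_CPU,
        "ram_gb": trays * RAM_GB_PER_TRAY,
        "gpus": gpus,
        "gpu_memory_gb": gpus * GPU_MEMORY_GB,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

These are hardware totals only. What a job can actually request depends on scheduler configuration, which is not described here.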

Global storage

The system is equipped with Lenovo DSS-G storage systems running IBM® Spectrum Scale™:

  • 1x DSS-G250 equipped with 418x 16TB HDD
  • 1x DSS-G204 equipped with 48x 7.68TB SSD
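The drive counts above give the raw capacity of each storage system. The sketch below only multiplies drive count by drive size; the names are illustrative, and the usable capacity after redundancy and file-system overhead will be lower and is not stated here.

```python
# Drive counts and sizes from the list above.
# Sizes are held in GB so the arithmetic stays in integers.
STORAGE_SYSTEMS = {
    "DSS-G250 (HDD)": (418, 16_000),
    "DSS-G204 (SSD)": (48, 7_680),
}


def raw_capacity_tb(drive_count: int, drive_size_gb: int) -> float:
    """Raw (pre-redundancy) capacity in decimal TB."""
    return drive_count * drive_size_gb / 1000


if __name__ == "__main__":
    for name, (count, size_gb) in STORAGE_SYSTEMS.items():
        print(f"{name}: {raw_capacity_tb(count, size_gb):.2f} TB raw")
```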

Two file systems are deployed:

  • /bask - general storage for home directories and bulk project data
  • /scratch-global - transient storage on SSD enclosures, available on all compute systems

Quota

User home directories have a 20GB hard limit. Home directory space is intended for login scripts and similar configuration files; all code and data are expected to be placed in a project space.

Project space is allocated at 1TB by default. Additional project quota can be requested and is allocated based on justified need.
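To get a rough idea of how close a directory is to the 20GB home limit, the file sizes beneath it can be summed and compared against that figure. This is a minimal sketch, not the system's quota tooling: the function names are hypothetical, the limit is assumed to be decimal gigabytes, and the file system's own quota accounting may count space differently from the apparent file sizes used here.

```python
import os

# Assumption: the 20GB limit is decimal gigabytes (the source does not say).
HOME_LIMIT_BYTES = 20 * 10**9


def directory_usage_bytes(path: str) -> int:
    """Sum the apparent sizes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                # lstat counts a symlink as the link itself, not its target.
                total += os.lstat(file_path).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def usage_fraction(path: str, limit_bytes: int = HOME_LIMIT_BYTES) -> float:
    """Return usage of path as a fraction of limit_bytes."""
    return directory_usage_bytes(path) / limit_bytes
```

For example, `usage_fraction(os.path.expanduser("~"))` returns the share of the limit taken up by the current user's home directory; a value approaching 1.0 suggests moving data into project space.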

Network

Baskerville uses three networks:

  • isolated 1GbE management network (not user accessible)
  • 25GbE high speed Ethernet network
  • HDR fat tree Infiniband network

Infiniband network

The HDR Infiniband network is built from NVIDIA Mellanox Quantum HDR switches (QM8790) arranged in a fully non-blocking fat tree topology.

All compute systems use ConnectX-6 PCIe gen4 adapters, which provide a full 200Gbps network connection. Architecturally, the adapter is connected to Socket 1 on the system planar.

Login, management and storage systems are PCIe gen3 attached and provide HDR-100 connectivity to the fabric. Each storage node uses multiple HDR-100 ports, which are enabled through the Spectrum Scale verbsPorts option. RDMA is also enabled to the Spectrum Scale storage systems.
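The pairing of PCIe gen4 with full HDR and PCIe gen3 with HDR-100 follows from the bandwidth of the host slot. The sketch below uses the published PCIe per-lane signalling rates and 128b/130b encoding. The x16 slot width is an assumption (the text does not state lane counts), the helper names are illustrative, and PCIe protocol overhead is ignored, so real throughput is somewhat lower.

```python
# PCIe per-lane signalling rates in GT/s; gen3 and gen4 both use 128b/130b encoding.
PCIE_GT_PER_LANE = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130

# Infiniband port rates named in the text, in Gbps.
IB_RATE_GBPS = {"HDR": 200, "HDR-100": 100}


def pcie_bandwidth_gbps(generation: int, lanes: int = 16) -> float:
    """Peak one-direction PCIe bandwidth in Gbps (x16 assumed by default)."""
    return PCIE_GT_PER_LANE[generation] * lanes * ENCODING_EFFICIENCY


def slot_can_feed(generation: int, ib_rate: str, lanes: int = 16) -> bool:
    """True if the PCIe slot's peak bandwidth covers the Infiniband port rate."""
    return pcie_bandwidth_gbps(generation, lanes) >= IB_RATE_GBPS[ib_rate]


if __name__ == "__main__":
    for gen in (3, 4):
        print(f"PCIe gen{gen} x16: {pcie_bandwidth_gbps(gen):.2f} Gbps")
        for rate in IB_RATE_GBPS:
            print(f"  can feed {rate}: {slot_can_feed(gen, rate)}")
```

Under these assumptions a gen3 x16 slot peaks at about 126 Gbps, enough for HDR-100 but not for a 200Gbps HDR port, while a gen4 x16 slot peaks at about 252 Gbps.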

IPoIB is also deployed on the Infiniband network.

25GbE high speed Ethernet network

The high speed network is built using NVIDIA Mellanox Spectrum®-2 switches running NVIDIA Cumulus® Linux.

Trademarks and registered trademarks are owned by their respective companies.


Last update: September 29, 2021