This page provides a general description of the components of the cluster. However, as new nodes are added, their hardware details will vary, so we archive the specifications of nodes purchased during each academic year here.
The cluster's building blocks are chassis. Four CPU compute nodes fit into a 2U chassis, while each GPU compute node occupies its own 5U chassis, so a GPU node takes up ten times the rack space of a CPU compute node. Currently (September 2018) the cluster has 1 head node, 60 CPU compute nodes, 12 GPU nodes, and 3 file servers (with 488 TB of usable storage).
CPU Compute Nodes: Each compute node has a motherboard with 2 sockets. Each socket holds one CPU with 8 cores and 16 threads, so each compute node has 16 cores and 32 threads. Each compute node also has 64 GB of RAM: if you run all 32 threads, each one can use 2 GB; running only 16 processes gives you 4 GB of RAM per process and approximately 90% of the CPU performance of running 32 processes. Each compute node also has 1 TB of hard drive space, with about 900 GB available as /scratch. The CPU nodes and file servers are connected via FDR (Fourteen Data Rate) InfiniBand with a 2:1 backplane.
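As a rough illustration of the 16-process recommendation above, here is a minimal Python sketch that sizes a worker pool to the 16 physical cores of a CPU compute node rather than its 32 hardware threads and writes intermediate output to node-local /scratch. The worker function, task count, and file names are hypothetical placeholders, not part of the cluster configuration.

```python
# Minimal sketch: one worker per physical core (16) instead of one per hardware
# thread (32), so each process can use roughly 64 GB / 16 = 4 GB of RAM.
# Assumes the user can write under /scratch on the node; the worker task is a
# hypothetical stand-in for a real CPU-bound computation.
import os
import multiprocessing as mp

PHYSICAL_CORES = 16          # 2 sockets x 8 cores on a CPU compute node
SCRATCH_DIR = "/scratch"     # ~900 GB of node-local scratch space

def worker(task_id):
    """Placeholder CPU-bound task that writes a small result file to /scratch."""
    result = sum(i * i for i in range(1_000_000))   # stand-in computation
    out_path = os.path.join(SCRATCH_DIR, f"result_{task_id}.txt")
    # Writing to node-local /scratch avoids load on the shared file servers.
    with open(out_path, "w") as f:
        f.write(str(result) + "\n")
    return task_id

if __name__ == "__main__":
    with mp.Pool(processes=PHYSICAL_CORES) as pool:
        finished = pool.map(worker, range(64))      # 64 hypothetical tasks
    print(f"completed {len(finished)} tasks using {PHYSICAL_CORES} processes")
```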
GPU Compute Nodes: Each GPU node is installed with eight NVIDIA GTX 980 Ti or GTX 1080 Ti GPUs. The CPUs, memory, and hard disk are the same as for a CPU compute node, except that GPU nodes have 128 GB of RAM. The operating system is CentOS 7.0 for compatibility with applications. The GPU nodes are networked with gigabit Ethernet. Each GPU node chassis is full-width and 5U high.
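A quick way to confirm the eight GPUs on a node is to list them with nvidia-smi. The short Python sketch below simply wraps that command as a convenience; it assumes the NVIDIA driver (and therefore nvidia-smi) is present on the node and is not part of the cluster's software stack.

```python
# Minimal sketch: list the GPUs visible on a GPU node by wrapping `nvidia-smi -L`.
# Assumes nvidia-smi is installed and on the PATH.
import subprocess

def list_gpus():
    """Return one line per detected GPU, e.g. 'GPU 0: GeForce GTX 1080 Ti (UUID: ...)'."""
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True)
    return out.stdout.strip().splitlines()

if __name__ == "__main__":
    gpus = list_gpus()
    print(f"{len(gpus)} GPUs detected:")   # expect 8 on a GPU node
    for line in gpus:
        print(" ", line)
```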
File Servers: The base cluster includes three file servers containing 488 TB of usable storage, provided as partitions built from four-disk RAID 5 arrays. The older partitions are 22 TB, and the newer partitions are 28 TB.
Head Node: The head node is similar to a CPU compute node except that it has an independent 1U chassis, a 10 GbE connection to the Internet, two disk drives, and 256 GB of RAM.
Switches: The initial cluster configuration had three leaf switches (36-port FDR InfiniBand switches) and one spine switch, supporting 72 nodes with a 2:1 backplane.
Interconnect: The compute nodes are connected to each other and to the file servers by 56 Gb/s FDR InfiniBand through the spine and leaf switches.
- Previous Node Purchases: For reference, the specifications of the original nodes purchased in 2015-2016, as well as specifications from later purchases, are archived here.
- Original Purchase: Production jobs are typically submitted using SLURM, which handles queuing and resource management (a minimal sketch of a SLURM-aware script follows below).
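Because jobs go through SLURM, a script can read the environment variables the scheduler exports to size itself to the resources it was actually granted. The Python sketch below is a hedged illustration, not an official template for this cluster: it assumes the job was submitted with --cpus-per-task and --mem-per-cpu so that the corresponding variables are set, and falls back to small defaults otherwise.

```python
# Minimal sketch of a SLURM-aware script: read the resources the scheduler
# granted instead of hard-coding them. SLURM_CPUS_PER_TASK and SLURM_MEM_PER_CPU
# are only set if the job requested --cpus-per-task / --mem-per-cpu, so the
# defaults below (1 CPU, 2000 MB) are fallbacks for running outside SLURM.
import os

cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
mem_per_cpu_mb = int(os.environ.get("SLURM_MEM_PER_CPU", "2000"))
job_id = os.environ.get("SLURM_JOB_ID", "interactive")

print(f"job {job_id}: {cpus} CPUs, about {cpus * mem_per_cpu_mb} MB of memory in total")
# A real job would size its worker pool and per-process buffers from these values.
```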