Contributing to the Cluster

Any PI, group of PIs, or department in MPS can contribute to the cluster. Please see Standard Operations and Approximate Costs for more information on how compute nodes can be added and the associated costs.

Approximate Costs

For most users, the costs of interest are CPU compute nodes, GPU compute nodes, storage, cabling, and PDUs (power distribution units). Approximate costs for each component are listed below, with a rough worked example after the list. For more detailed information, see the Technical Specifications.
● CPU compute node: approximately $3,300.
Each CPU compute node has 16 physical CPU cores, 64 GB of memory, and 1 TB of local disk space.
● GPU compute node: approximately $10,200.
GPU compute nodes have the same components as a CPU node plus 8 GPUs (see below for details).
● Storage: approximately $3,000 per 22 TB partition.
● Power Supply: $735 ea.
● InfiniBand cabling: $50–$72 ea. for FDR InfiniBand QSFP passive copper cables (0.5–3.0 meters; ~13 needed per node).
● PDU (Power Distribution Units): approximately $530 (needed for GPU nodes).
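
As a rough illustration of how these component prices combine, the sketch below totals a hypothetical contribution using the approximate figures above. The assumption of one power supply per node and the use of the high-end cable price are for illustration only; an actual quote from cluster support will differ.

    # Rough cost sketch using the approximate prices listed above.
    # Assumes one power supply per node and ~13 InfiniBand cables per
    # node at the high-end cable price; actual quotes will differ.
    CPU_NODE, GPU_NODE = 3300, 10200   # per compute node
    STORAGE = 3000                     # per 22 TB partition
    POWER_SUPPLY, CABLE, PDU = 735, 72, 530

    def estimate(cpu_nodes=0, gpu_nodes=0, storage_partitions=0):
        nodes = cpu_nodes + gpu_nodes
        return (cpu_nodes * CPU_NODE
                + gpu_nodes * GPU_NODE
                + storage_partitions * STORAGE
                + nodes * POWER_SUPPLY
                + nodes * 13 * CABLE
                + gpu_nodes * PDU)     # PDUs only needed for GPU nodes

    # Example: two CPU nodes plus one 22 TB storage partition
    print(estimate(cpu_nodes=2, storage_partitions=1))  # 12942, i.e. ~$12,900

Note that this covers only the contributor's share; shared infrastructure (e.g., leaf switches, shared cabling) is paid from the Dean's contribution, as described below.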

Once the current capacity of the cluster is reached (as of May 2016, 72 of 144 node slots are occupied), other costs will become important when considering how to expand the cluster further. It is not known when cluster capacity will be reached, as this depends on the type of faculty hired and on how many current faculty acquire funds to add to the cluster.

The cost of shared infrastructure (e.g., leaf switches, shared cabling) will be paid from the Dean’s contribution. Details on the cost of infrastructure components can be found here.

Support from the Dean/Division

Below is the standard text included in offer letters to new faculty hires in MPS:

MPS operates a centralized high performance computing (HPC) cluster to meet the HPC needs of new faculty members. Your access to and use of the MPS HPC cluster will be funded by the MPS dean’s office for the first five years of your appointment. We suggest you consult with Bill Broadley, the Lead HPC IT Architect, should you be interested in expanding your access or building more computer cluster capacity. This will help avoid unnecessary duplication and costs and allow us to determine what is needed to augment our college-wide cluster (e.g., more nodes, more memory capacity, etc.).

Note, you will be asked to contribute to the sustainability of the cluster by requesting computer services support in your research grants. These funds will be used to provide partial support for the IT personnel maintaining the cluster. This in turn allows the Dean to provide funds for the shared infrastructure of the cluster.

Timeline: From Quote Request to Logging In

The typical timeline from requesting a quote to accessing your nodes on the cluster is about 10 weeks. To begin this process, email a request to cluster support (see Requesting a Quote for Contributing to the Cluster below).

Duration (weeks)   Activity
2                  Cluster support considers the request, gathers more information, and makes a quote request
1                  Quote returned from potential vendor
1                  Department creates purchase request and forwards it to campus
1                  Campus reviews the purchase request and sends a purchase order to the vendor
4                  Vendor manufactures nodes, tests (burn-in), and ships to campus
1                  Installation and testing on campus
10                 Total duration

To achieve this 10-week timeline, the PI will need to diligently track the progress of the purchase request. In addition, larger purchases or combined purchases across departments can take longer and need a caretaker to shepherd them through the process; for example, they may require a bid process that can lead to substantial delays. For an illustration of how the process can move more slowly when a large purchase is made, please see the timeline for the original purchase of the cluster, which points out some of the potential bottlenecks.

Requesting a Quote for Contributing to the Cluster

Email help@cse.ucdavis.edu with your needs. Contributions under $50k can use any existing node type; larger contributions can have nodes customized for their research needs. When making a request, please include the following information (an example request follows the list):
● Type of nodes (CPU or GPU)
● Number of nodes
● Storage requirements
● Expected time of purchase (e.g., is request for a proposal or to make a purchase now)
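
For example, a request might read as follows. The quantities and timing here are purely illustrative:

    To: help@cse.ucdavis.edu
    Subject: Quote request for cluster contribution

    We would like to contribute two CPU nodes (standard configuration)
    and one 22 TB storage partition to the cluster. This is for a
    purchase we plan to make this quarter from existing grant funds,
    not for a proposal budget.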

Useful Text for Proposals to External Granting Agencies

● The Condo model amortizes infrastructure (head node, console, PDUs, racks, and switches) over a larger number of nodes, reducing the per-node cost.
● The Condo model allows users to budget for average workloads instead of the worst-case workloads required for independent clusters.
● Administrative cost is most strongly correlated with number of clusters, thus shared clusters have lower administrative costs than individual clusters.
● The Condo model grows in step with demand. This avoids potential problems with overbuilding a cluster before demand warrants it.
● Individual clusters typically cycle between underutilization, when cycles are wasted, and overutilization, when available cycles become the bottleneck. The Condo model helps in both cases.

Facilities Description

Peloton is housed in the campus data center, a secure facility staffed 24/7 by operators. This facility provides the power, cooling, and communications infrastructure required to operate the cluster.

The MPS cluster is maintained by IT staff with specialized expertise in high performance computing. The Lead HPC IT Architect, Bill Broadley, has more than 20 years of experience. The HPC IT team supports several clusters run by different colleges and other entities on the UC Davis campus. They ensure the proper operation and security of the cluster; assist in the design, purchase, and installation of new nodes; and assist in installing, troubleshooting, and optimizing software on the cluster.

Requesting a Letter of Support from the Dean

To request a letter of support for a grant proposal from the MPS Dean, please contact the Assistant Dean. The letter of support will:
● State current divisional support for the cluster (varies).
● State that as faculty in the MPS division you are allowed to contribute to the cluster.
● State that there is space to add the requested nodes to the cluster.