Standard Operations
The cluster is operated using a standard resource-sharing model, often referred to as the Condo Model, together with the Fairshare Queuing Algorithm [add link to webpage describing this] to determine user priority for shared resources. The assignment of jobs also depends on the queue that is chosen.
Condo Model: The Condo Model characterizes users by their contribution to the cluster. For the MPS cluster there are 3 tiers of users:
Tier 1: Faculty who contribute large amounts (funds for more than about 10 CPU or 4 GPU nodes plus non-shared infrastructure, e.g., cables for fast networking) will be considered Major Users. Disk storage must be purchased separately as needed; a unit of 22 TB of usable disk space costs about $3.0k. These will typically be new faculty with startup funds or existing faculty who are awarded major grants for computing.
Tier 2: This tier requires a minimum contribution of funds for 4 CPU nodes or 1 GPU node plus non-shared infrastructure (e.g., cables for fast networking). Disk storage must be purchased separately as needed; a unit of 22 TB of usable disk space costs about $3.0k. A Tier 2 user can be one faculty member, a group of faculty members, or an entire department within MPS.
Tier 3: MPS faculty who do not contribute financially will be Affiliate users, and their use of the MPS Cluster will be based on funds contributed by the Dean. At the start of the cluster, the Dean is providing 4 nodes and 34 TB of usable disk storage. A Tier 3 user, who is not contributing any nodes, can still purchase personal disk storage as described above (minimum 22 TB unit). Starting out as an Affiliate user allows faculty and researchers to test their code in the cluster computing environment; based on their experience and the availability of resources, they may then consider becoming a Tier 1 or Tier 2 user.
All users will have access to unused computing cycles in the cluster. This is one of the main reasons that the MPS cluster will be attractive to faculty. Tier 1 and 2 users will be guaranteed a minimum usage (their contributed share, see below) plus access to additional unused cycles. Affiliate users will have access to the nodes made available by the Dean plus access to additional unused cycles.
Tier 1 and 2 users are guaranteed access to their nodes in less than 1 minute. If other users are running jobs on resources belonging to, and requested by, a Tier 1 or 2 user, then the running job is suspended (high/medium queue) or killed (low queue; see below) until these nodes become available again.
Determining Priority for Shared Resources
The Fairshare Queuing Algorithm is a standard method for allocating shared resources based on previous usage. The algorithm considers both how many cycles have been used and the time since those resources were used. The effect is that if you use a lot of resources on day 1, your priority will be lower on day 2, but it will then increase again with time. Your access to shared nodes in the low or medium queue (see below) therefore depends on your usage and on the time since you used the resources. Similarly, when group members all submit jobs using the high queue, priority among them depends on each individual's priority according to the queuing algorithm.
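As a rough illustration only (not the scheduler's actual implementation), the sketch below shows how a fairshare-style priority can be computed from decayed historical usage: recent heavy usage lowers priority, and priority recovers as that usage ages. The half-life value and the mapping to a 0–1 score are assumed for the example.

```python
def fairshare_priority(usage_history, now, half_life_days=7.0):
    """Illustrative fairshare priority: heavier recent usage lowers priority.

    usage_history:  list of (timestamp_in_days, cpu_hours) records for a user.
    now:            current time in days.
    half_life_days: how quickly past usage is "forgiven" (assumed value).

    Returns a number in (0, 1]; 1.0 means no recent usage (highest priority).
    """
    decayed_usage = 0.0
    for timestamp, cpu_hours in usage_history:
        age = now - timestamp
        # Older usage counts less: after one half-life it counts half as much.
        decayed_usage += cpu_hours * 0.5 ** (age / half_life_days)
    # Map decayed usage onto (0, 1]: more recent usage -> lower priority.
    return 1.0 / (1.0 + decayed_usage)

# Example: heavy usage on day 1 lowers priority on day 2,
# but priority recovers as that usage decays over time.
history = [(1.0, 500.0)]                      # 500 CPU-hours used on day 1
print(fairshare_priority(history, now=2.0))   # low priority on day 2
print(fairshare_priority(history, now=30.0))  # much closer to 1.0 by day 30
```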
Job Queues
Jobs are submitted through queues, which determine both the priority for that job to access Tier 1 or Tier 2 resources and how a job running on shared resources is treated when those resources are requested by the resource owner. The behavior of each queue when an owner reclaims nodes is summarized in the sketch after the list below.
● High priority: jobs have access only to your own nodes and are guaranteed access to those nodes within 1 minute.
● Medium priority: jobs have access to both your own nodes and shared nodes (or only shared nodes, for Affiliate users). A job submitted to this queue and using shared resources will be suspended until those nodes become available again. The suspended job's RAM is swapped out so the run stays in memory while the other job is running, and this does not significantly affect run times. A good choice for fewer, longer jobs that you do not want to resubmit.
● Low priority: these jobs will be killed and returned to the queue if their resources are requested by a high-priority job. However, the job will be restarted as soon as an appropriate number of nodes become available. A good choice for many short jobs.
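The sketch below summarizes the preemption behavior described above in simplified form. It is not the scheduler's actual code; the names `Job` and `handle_owner_request` are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Job:
    """Minimal stand-in for a running job (illustrative only)."""
    job_id: int
    queue: str            # "high", "medium", or "low"
    on_owned_nodes: bool  # True if running on the submitter's own nodes

def handle_owner_request(job: Job) -> str:
    """Sketch of what happens to a running job when the node owner
    submits a high-priority job that needs the nodes it occupies."""
    if job.on_owned_nodes or job.queue == "high":
        # High-priority jobs only run on the submitter's own nodes,
        # so they are not preempted.
        return "keep running"
    if job.queue == "medium":
        # Medium-priority jobs on shared nodes are suspended (RAM swapped
        # out) and resumed when the nodes free up again.
        return "suspend, resume later"
    if job.queue == "low":
        # Low-priority jobs are killed and returned to the queue; they
        # restart as soon as enough nodes become available.
        return "kill and requeue"
    raise ValueError(f"unknown queue: {job.queue}")

# Example: a low-priority job on shared nodes is killed and requeued.
print(handle_owner_request(Job(job_id=42, queue="low", on_owned_nodes=False)))
```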
Scheduled Downtimes
The cluster has two scheduled downtimes each year, corresponding to the downtimes that the Data Center has for scheduled maintenance. These downtimes typically occur in April and October and last 1 to 3 days, depending on how much work needs to be done. The exact schedule is emailed to all users before each downtime.
Addition of New Nodes to the Cluster
We will schedule large upgrades and additions to the cluster during the scheduled downtimes in order to minimize the impact on cluster users. However, smaller additions (e.g., adding a single compute node) can be done at other times if they can be completed without affecting other cluster users.