Slurm

Slurm is a free, open source workload manager originally designed for the demanding requirements of national laboratories. Its functionality, scalability, performance and modular architecture have resulted in widespread adoption, with active development occurring at numerous organizations.

Simple configurations can be installed in only a few minutes, while use of optional plugins can support sophisticated scheduling and reporting requirements with easy extensibility. Slurm currently manages the workload on many of the world’s most powerful computers.

Scalability

Slurm was designed to manage heterogeneous clusters with up to millions of processors. It is the workload manager on Lawrence Livermore National Laboratory’s Sequoia supercomputer with 98,304 compute nodes, and has managed emulated systems over 20 times larger. Daemons and commands are extensively multi-threaded. Fault-tolerant, hierarchical communications are used for high performance.

Performance

Architectural changes in Slurm version 2.5 have dramatically increased performance: up to 1,000 job submissions per second and execution throughput of up to 600 jobs per second are possible.

Scheduling policies

Slurm supports sophisticated and highly flexible scheduling policies including advanced reservations. A highly configurable Quality of Service (QoS) mechanism is available to satisfy Service Level Agreements (SLAs), for example to preempt lower priority jobs on demand. Resource limits can be applied to hierarchical accounts down to the level of individual users.
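
A minimal sketch of how such a policy might be configured, assuming preemption by QOS (the QOS names and priorities here are illustrative, not defaults): preemption is enabled in slurm.conf and the QOS records themselves are created with sacctmgr.

    # slurm.conf: allow jobs in a higher-priority QOS to preempt lower ones
    PreemptType=preempt/qos
    PreemptMode=REQUEUE

    # Create the QOS records with sacctmgr (names and priorities illustrative)
    sacctmgr add qos standard Priority=10
    sacctmgr add qos high Priority=100 Preempt=standard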

Multi-factor prioritization of jobs

Slurm considers many configurable factors when determining a job’s priority. When used with native accounting, Slurm can also apply hierarchical fair-share as a contributing factor. Coordinators can be delegated authority over their sub-trees and allotted limits in the hierarchy, lessening the load on system administrators.
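
A sketch of the relevant slurm.conf settings (the weights are illustrative; each site tunes its own):

    # slurm.conf: enable multi-factor priority and weight each factor
    PriorityType=priority/multifactor
    PriorityDecayHalfLife=7-0      # fair-share usage decays with a 7-day half-life
    PriorityWeightFairshare=10000
    PriorityWeightAge=1000
    PriorityWeightJobSize=1000
    PriorityWeightPartition=1000
    PriorityWeightQOS=2000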

Topology optimizations

Resource allocations are optimized with respect to the topology on a node (NUMA, sockets, cores and threads) with task binding. Resource allocations spanning multiple nodes will also be optimized with respect to the network topology between nodes. For example, the number of leaf switches used can be minimized or an allocation’s locality can be optimized with respect to a 3-dimensional interconnect.
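
For the switch-tree case, a minimal sketch (switch and node names are illustrative): the tree topology plugin is selected in slurm.conf and the wiring is described in topology.conf.

    # slurm.conf
    TopologyPlugin=topology/tree

    # topology.conf: two leaf switches under one spine (names illustrative)
    SwitchName=leaf1 Nodes=tux[0-15]
    SwitchName=leaf2 Nodes=tux[16-31]
    SwitchName=spine Switches=leaf[1-2]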

Consumable resources

Generic consumable resources can be managed, including GPUs.
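
A sketch of GPU management as a generic resource (node names, device files and counts are illustrative):

    # slurm.conf: declare the GRES type and attach it to nodes
    GresTypes=gpu
    NodeName=tux[0-7] Gres=gpu:2 CPUs=16 RealMemory=65536

    # gres.conf on each node: map the resource to device files
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1

    # Request two GPUs at submission time
    sbatch --gres=gpu:2 job.sh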

Size and time ranges for jobs

Job size and time limits can be specified as a range. Jobs may be granted less than the maximum size and/or time specification if doing so results in earlier job initiation.
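
For example, the following submission (script name illustrative) accepts anywhere from two to four nodes, and a time limit as low as 30 minutes, if that allows the job to start sooner:

    # Request 2-4 nodes and a 1 hour limit, but accept as little as
    # 30 minutes of run time in exchange for earlier initiation
    sbatch --nodes=2-4 --time=01:00:00 --time-min=00:30:00 job.sh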

Resizable jobs

Jobs can grow and shrink on demand.
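
A sketch of the shrink case, plus the first step of growing a job (job ID and script name are illustrative; growing is a multi-step procedure covered in Slurm's documentation):

    # Shrink job 1234 to two nodes; the freed nodes return to the pool
    scontrol update JobId=1234 NumNodes=2

    # Grow: submit a new allocation earmarked for job 1234; further
    # scontrol steps then merge its resources into the original job
    sbatch --dependency=expand:1234 --nodes=2 expand.sh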

Flexibility

Simple configurations can be operational in a couple of minutes, while optional plugins provide all of the functionality required at major HPC sites. These plugins support a wide variety of architectures and configurations with easy extensibility. For example, plugins for IBM Blue Gene and Cray systems provide a system-specific interface to those systems in one place while preserving a common code kernel. New plugins have been written by customers for site-specific requirements, for example to optimize use of green energy.

Multi-cluster support

Slurm can be configured to operate across multiple clusters at a site. Workload information is available across clusters, and jobs can be submitted from one cluster to another.
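
For example (cluster names illustrative):

    # Submit to whichever listed cluster is expected to start the job first
    sbatch --clusters=cluster1,cluster2 job.sh

    # View the queues of every cluster in the database
    squeue --clusters=all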

Graphical User Interface

Slurm’s sview tool provides a rich topology-aware system interface.

Event triggers

Arbitrary scripts can be executed when events occur. For example, system administrators can configure a program to notify them when nodes fail.
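
A sketch using the strigger command (the notification script path is illustrative):

    # Run a site-supplied script whenever a node enters the DOWN state
    strigger --set --node --down --program=/usr/sbin/notify_admins

    # List the triggers currently in effect
    strigger --get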

Power management

Jobs can specify their desired CPU frequency, and power use by job is recorded. Idle resources can be powered down until needed to reduce energy consumption and cost.
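
A sketch of both pieces (the frequency choice and script paths are illustrative):

    # Run the job's tasks at a requested CPU frequency
    srun --cpu-freq=high ./app

    # slurm.conf: power down nodes idle for 300 seconds, using
    # site-supplied suspend and resume scripts
    SuspendTime=300
    SuspendProgram=/usr/sbin/node_poweroff
    ResumeProgram=/usr/sbin/node_poweron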

Usage accounting

Accounting records for jobs, job steps and node events can be stored in a database, with various tools available to generate accounting reports. These records can also be used to control scheduling, imposing usage limits or a dynamic fair-share policy.
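
For example, sacct reports on individual jobs and sreport aggregates usage (job ID and dates illustrative):

    # Per-job accounting record for job 1234
    sacct --jobs=1234 --format=JobID,JobName,Elapsed,MaxRSS,State

    # Cluster-wide utilization over a date range
    sreport cluster Utilization Start=01/01 End=02/01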

Status of running jobs

Slurm can query jobs as they run, gathering information at the level of individual tasks to help identify load imbalance and other anomalies.
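
For example, sstat reports on the steps of a running job (job and step IDs illustrative):

    # Per-task statistics for step 0 of running job 1234
    sstat --jobs=1234.0 --format=JobID,NTasks,AveCPU,MaxRSS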

Web-based configuration tools

A web-based interface is available for building slurm.conf configuration files.

Teaching tool

Because Slurm is open source and very modular, it can be used to test different scheduling algorithms or other constructs related to workload management without requiring the student to deal with the other components of a workload manager.

No single point of failure

Each of the Slurm daemons has an optional backup to increase reliability and decrease down time.
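
A sketch using the classic slurm.conf parameter names (hostnames illustrative; newer releases express the same pairing with SlurmctldHost entries):

    # slurm.conf: designate primary and backup controllers
    ControlMachine=ctl-primary
    BackupController=ctl-backup
    # State must live on storage both controllers can reach, so the
    # backup can pick up where the primary left off
    StateSaveLocation=/shared/slurm/state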

Free and open source

Slurm is available under the GNU General Public License v2.
