Slurm cluster

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained.

As a cluster workload manager, Slurm has three key functions:

  • Allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for a limited period of time so they can perform work.
  • Provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes.
  • Arbitrates contention for resources by managing a queue of pending work.
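
To make the three functions concrete, here is a minimal toy model in Python. It is not Slurm's code or its scheduling algorithm, and every name in it (Job, ToyCluster, submit, step) is hypothetical. It assumes a fixed pool of identical nodes, whole-node allocations, integer time steps, and a strict first-in, first-out queue; real schedulers are considerably more elaborate.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    nodes_needed: int
    duration: int  # number of time steps the allocation lasts (toy unit)


@dataclass
class ToyCluster:
    free_nodes: int
    pending: deque = field(default_factory=deque)  # queue of waiting jobs
    running: list = field(default_factory=list)    # [job, remaining_steps] pairs

    def submit(self, job):
        """Function 3: queue the job until enough nodes are free."""
        self.pending.append(job)

    def _start_pending(self):
        # Strict FIFO (an assumption of this sketch): only the job at the
        # head of the queue may start, even if a later job would fit.
        while self.pending and self.pending[0].nodes_needed <= self.free_nodes:
            job = self.pending.popleft()
            self.free_nodes -= job.nodes_needed       # function 1: allocate
            self.running.append([job, job.duration])  # function 2: start, track

    def step(self):
        """Advance one time step; return the names of jobs running during it."""
        self._start_pending()
        active = [job.name for job, _ in self.running]
        still_running = []
        for entry in self.running:
            entry[1] -= 1
            if entry[1] <= 0:
                self.free_nodes += entry[0].nodes_needed  # release allocation
            else:
                still_running.append(entry)
        self.running = still_running
        return active


cluster = ToyCluster(free_nodes=4)
cluster.submit(Job("a", nodes_needed=3, duration=2))
cluster.submit(Job("b", nodes_needed=2, duration=1))
cluster.submit(Job("c", nodes_needed=1, duration=1))

timeline = [cluster.step() for _ in range(4)]
print(timeline)
```

In this run, job "a" holds three of the four nodes for two steps, so "b" waits in the queue, and "c" waits behind it because the sketch never lets a later job jump ahead. Once "a" releases its nodes, "b" and "c" run side by side.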