Job runtime

Runtime, signified by -t or --time in the job script, defines the maximum length of time a job is allowed to run. Jobs which exceed the requested runtime will be automatically killed by the scheduler with an exit state of TIMEOUT.
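As an illustration, a minimal job script that sets a runtime limit might look like the sketch below. It assumes the scheduler is Slurm (suggested by the -t/--time options and the TIMEOUT state); the job name, task count and workload function are placeholders rather than site recommendations.

```shell
#!/bin/bash
#SBATCH --job-name=runtime-demo    # hypothetical job name
#SBATCH --partition=computeshort   # short partition named in this page
#SBATCH --ntasks=1                 # placeholder task count
#SBATCH --time=01:00:00            # maximum runtime: 1 hour (hours:minutes:seconds)

# Placeholder workload; replace with your own commands.
run_workload() {
    echo "workload running"
}

run_workload
```

If the workload were still running after one hour, the scheduler would kill the job. On a standard Slurm installation, `sacct -j <jobid> --format=JobID,State,Elapsed,Timelimit` can be used afterwards to check whether the job ended in the TIMEOUT state.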

Queueing time depends mostly on tasks (cores) and RAM requested, not runtime. Since jobs exceeding the requested runtime will be killed, most users should request either:

1 hour (to use the computeshort and gpushort partitions), or
240 hours (10 days, the maximum allowed on all other partitions)
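The two recommended requests can be written in Slurm's days-hours:minutes:seconds time format. The sketch below uses a hypothetical helper, `hours_to_slurm_time`, to convert whole hours into that format; the helper and the `job.sh` file name are illustrative assumptions, not part of the cluster's tooling.

```shell
#!/bin/bash
# Hypothetical helper: convert a whole number of hours into
# Slurm's days-hours:minutes:seconds time format.
hours_to_slurm_time() {
    local hours=$1
    printf '%d-%02d:00:00\n' $((hours / 24)) $((hours % 24))
}

hours_to_slurm_time 1     # prints 0-01:00:00
hours_to_slurm_time 240   # prints 10-00:00:00

# Example submissions (job.sh is a placeholder script name):
#   sbatch --partition=computeshort --time="$(hours_to_slurm_time 1)" job.sh
#   sbatch --time="$(hours_to_slurm_time 240)" job.sh
```

Options given on the `sbatch` command line take precedence over `#SBATCH` directives in the script, so the same script can be submitted with either limit.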

Requesting the maximum runtime helps prevent jobs from ending prematurely when they could have completed within the 10-day limit.

In some edge cases, a job requesting less than 10 days may be queued ahead of a 10-day job, but these usually arise when we have reserved resources at a future date (e.g. maintenance periods).

The 240-hour limit is a global setting and cannot be changed for individual jobs or users. If you are submitting long-running jobs, you should consider:

Attempting to parallelise the job
Checking whether the job can be broken into smaller parts
Profiling the code to check for bottlenecks
Implementing checkpointing (a method of regularly dumping the job's state so that it can be restarted - check if your application supports this)
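The checkpointing idea in the last item can be sketched in shell as follows. This is a minimal illustration that assumes the work divides into numbered steps; the checkpoint file name, the step count and the `run_with_checkpoints` function are hypothetical, and a real application would save its own state (or use its built-in checkpoint support) instead of a step counter.

```shell
#!/bin/bash
# Hypothetical checkpoint file and step count, chosen for illustration only.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.txt}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

run_with_checkpoints() {
    local step=0
    # Resume from the last completed step if a checkpoint exists.
    if [ -f "$CHECKPOINT_FILE" ]; then
        step=$(cat "$CHECKPOINT_FILE")
    fi
    while [ "$step" -lt "$TOTAL_STEPS" ]; do
        step=$((step + 1))
        # Placeholder for one unit of real work.
        echo "completed step $step"
        # Write to a temporary file first, then rename it, so an
        # interrupted write does not leave a corrupt checkpoint.
        echo "$step" > "$CHECKPOINT_FILE.tmp"
        mv "$CHECKPOINT_FILE.tmp" "$CHECKPOINT_FILE"
    done
}

run_with_checkpoints
```

If the job is killed part-way through, resubmitting the same script picks up after the last step recorded in the checkpoint file rather than starting from the beginning.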