Commit aed6c23

Update executor retry config docs (#3001)
Signed-off-by: Paolo Di Tommaso <[email protected]>
1 parent b11d0f1 commit aed6c23

2 files changed (+19 -15 lines)

docs/azure.rst

Lines changed: 4 additions & 0 deletions
@@ -382,4 +382,8 @@ azure.batch.pools.<name>.runAs Specify the username under which
 azure.registry.server Specify the container registry from which to pull the Docker images (default: ``docker.io``, requires ``[email protected]``).
 azure.registry.userName Specify the username to connect to a private container registry (requires ``[email protected]``).
 azure.registry.password Specify the password to connect to a private container registry (requires ``[email protected]``).
+azure.retryPolicy.delay Delay when retrying failed API requests (default: ``500ms``).
+azure.retryPolicy.maxDelay Max delay when retrying failed API requests (default: ``60s``).
+azure.retryPolicy.jitter Jitter value when retrying failed API requests (default: ``0.25``).
+azure.retryPolicy.maxAttempts Max attempts when retrying failed API requests (default: ``10``).
 ============================================== =================
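
A minimal ``nextflow.config`` sketch of the new ``azure.retryPolicy`` settings is shown below. The values are illustrative only, not recommendations; any setting left out keeps the default listed in the table::

    // Illustrative values only; omitted settings keep their defaults
    azure {
        retryPolicy {
            delay = '250ms'     // initial wait before retrying a failed API request
            maxDelay = '90s'    // upper bound on the wait between retries
            jitter = 0.5        // random variation applied to each wait
            maxAttempts = 5     // stop retrying after this many attempts
        }
    }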

docs/config.rst

Lines changed: 15 additions & 15 deletions
@@ -342,24 +342,24 @@ The ``executor`` configuration scope allows you to set the optional executor set
 ===================== =====================
 Name Description
 ===================== =====================
-name The name of the executor to be used e.g. ``local``, ``sge``, etc.
+name The name of the executor to be used (default: ``local``).
 queueSize The number of tasks the executor will handle in a parallel manner (default: ``100``).
-pollInterval Determines how often a poll occurs to check for a process termination.
-dumpInterval Determines how often the executor status is written in the application log file (default: ``5min``).
-queueStatInterval Determines how often the queue status is fetched from the cluster system. This setting is used only by grid executors (default: ``1min``).
-exitReadTimeout Determines how long the executor waits before to an error status when a process is terminated but the ``.exitcode`` file does not exist or is empty. This setting is used only by grid executors (default: ``270 sec``).
-killBatchSize Determines the number of jobs that can be `killed` in a single command execution (default: ``100``).
-submitRateLimit Determines the max rate of job submission per time unit, for example ``'10sec'`` eg. max 10 jobs per second or ``'50/2min'`` i.e. 50 job submissions every 2 minutes (default: `unlimited`).
+submitRateLimit Determines the max rate of job submission per time unit, for example ``'10sec'`` (10 jobs per second) or ``'50/2min'`` (50 jobs every 2 minutes) (default: unlimited).
+pollInterval Determines how often to check for process termination. Default varies for each executor.
+dumpInterval Determines how often to log the executor status (default: ``5min``).
+queueStatInterval Determines how often to fetch the queue status from the scheduler (default: ``1min``). Used only by grid executors.
+exitReadTimeout Determines how long to wait before returning an error status when a process is terminated but the ``.exitcode`` file does not exist or is empty (default: ``270 sec``). Used only by grid executors.
+killBatchSize Determines the number of jobs that can be killed in a single command execution (default: ``100``).
 perJobMemLimit Specifies Platform LSF *per-job* memory limit mode. See :ref:`lsf-executor`.
 perTaskReserve Specifies Platform LSF *per-task* memory reserve mode. See :ref:`lsf-executor`.
-jobName Determines the name of jobs submitted to the underlying cluster executor e.g. ``executor.jobName = { "$task.name - $task.hash" }`` Note: when using this option you need to make sure the resulting job name matches the validation constraints of the underlying batch scheduler.
-cpus The maximum number of CPUs made available by the underlying system (only used by the ``local`` executor).
-memory The maximum amount of memory made available by the underlying system (only used by the ``local`` executor).
-retry.delay Delay when re-retying failed submit operations (default: ``500ms``, only used by grid based executors e.g. ``slurm``, requires version ``22.03.0-edge`` or later).
-retry.maxDelay Max delay when re-retying failed submit operations (default: ``30s``, only used by grid based executors e.g. ``slurm``, requires version ``22.03.0-edge`` or later).
-retry.jitter Jitter value when re-retying failed submit operations (default: ``0.25``, only used by grid based executors e.g. ``slurm``, requires version ``22.03.0-edge`` or later)
-retry.maxAttempts Max attempts when re-retying failed submit operations (default: ``3``, only used by grid based executors e.g. ``slurm``, requires version ``22.03.0-edge`` or later)
-retry.reason Regex pattern that when verified cause a failed submit operation to be re-tried (default: ``Socket timed out``, only used by grid based executors e.g. ``slurm``, requires version ``22.03.0-edge`` or later)
+jobName Determines the name of jobs submitted to the underlying cluster executor e.g. ``executor.jobName = { "$task.name - $task.hash" }``. Make sure the resulting job name matches the validation constraints of the underlying batch scheduler.
+cpus The maximum number of CPUs made available by the underlying system. Used only by the ``local`` executor.
+memory The maximum amount of memory made available by the underlying system. Used only by the ``local`` executor.
+retry.delay Delay when retrying failed job submissions (default: ``500ms``). NOTE: used only by grid executors (requires ``22.03.0-edge`` or later).
+retry.maxDelay Max delay when retrying failed job submissions (default: ``30s``). NOTE: used only by grid executors (requires ``22.03.0-edge`` or later).
+retry.jitter Jitter value when retrying failed job submissions (default: ``0.25``). NOTE: used only by grid executors (requires ``22.03.0-edge`` or later).
+retry.maxAttempts Max attempts when retrying failed job submissions (default: ``3``). NOTE: used only by grid executors (requires ``22.03.0-edge`` or later).
+retry.reason Regex pattern that, when matched, causes a failed job submission to be retried (default: ``Socket timed out``). NOTE: used only by grid executors (requires ``22.03.0-edge`` or later).
 ===================== =====================

 The executor settings can be defined as shown below::
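
A hedged ``nextflow.config`` sketch combining several of these executor settings, assuming a Slurm cluster and Nextflow ``22.03.0-edge`` or later (required by the ``retry`` options). The values are illustrative only::

    // Illustrative values only; assumes a Slurm cluster
    executor {
        name = 'slurm'                  // a grid executor, so the retry options apply
        queueSize = 50                  // handle at most 50 tasks in parallel
        submitRateLimit = '50/2min'     // at most 50 job submissions every 2 minutes
        retry {
            delay = '1s'                // initial wait after a failed submission
            maxDelay = '60s'            // upper bound on the wait between attempts
            jitter = 0.25               // same as the default
            maxAttempts = 5             // stop after 5 attempts
            reason = 'Socket timed out' // regex; same as the default
        }
    }

If the retry policy uses exponential backoff that doubles the wait after each failure (an assumption; the table does not say), and ``maxAttempts`` counts the first submission, then the defaults (``500ms``, ``30s``, ``3``) would allow two retries after waits of roughly 0.5 s and 1 s, each randomly varied by up to 25% through ``jitter``.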
