Key Default Type Description
job.autoscaler.backlog-processing.lag-threshold
5 min Duration Lag threshold that prevents unnecessary scalings while the pending messages responsible for the lag are being worked off.
job.autoscaler.catch-up.duration
30 min Duration The target duration for fully processing any backlog after a scaling operation. Set to 0 to disable backlog based scaling.
job.autoscaler.enabled
false Boolean Enable job autoscaler module.
job.autoscaler.excluded.periods
List<String> A semicolon-separated list of expressions indicating excluded periods during which autoscaling execution is forbidden. Each expression consists of up to two optional subexpressions concatenated with &&: a cron expression in Quartz format (6 or 7 positions) and a daily expression. For example, * * 9-11,14-16 * * ? excludes 9:00:00am to 11:59:59am and 2:00:00pm to 4:59:59pm every day, and * * * ? * 2-6 excludes every weekday; see http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html for cron expression usage. Caution: in most cases a cron expression is enough. The daily expression is introduced because cron can only represent whole-hour periods, without a minutes-and-seconds suffix; its format is startTime-endTime, e.g. 9:30:30-10:50:20. To exclude 9:30:30-10:50:20 on Mondays and Thursdays, write 9:30:30-10:50:20 && * * * ? * 2,5.
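As a hedged sketch (the FlinkDeployment wrapper is an assumption, not part of this table), the excluded-periods expressions above would be set through the job's flinkConfiguration:

```yaml
# Hypothetical FlinkDeployment fragment; only the job.autoscaler.* keys
# come from this reference table, the surrounding spec is assumed.
spec:
  flinkConfiguration:
    job.autoscaler.enabled: "true"
    # Forbid autoscaling 9:00:00-11:59:59 and 14:00:00-16:59:59 every day,
    # and also 9:30:30-10:50:20 on Mondays and Thursdays:
    job.autoscaler.excluded.periods: >-
      * * 9-11,14-16 * * ?;9:30:30-10:50:20 && * * * ? * 2,5
```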
job.autoscaler.flink.rest-client.timeout
10 s Duration The timeout for waiting for the Flink REST client to return.
job.autoscaler.history.max.age
1 d Duration Maximum age for past scaling decisions to retain.
job.autoscaler.history.max.count
3 Integer Maximum number of past scaling decisions to retain per vertex.
job.autoscaler.memory.gc-pressure.threshold
1.0 Double Max allowed GC pressure (percentage spent garbage collecting) during scaling operations. Autoscaling will be paused if the GC pressure exceeds this limit.
job.autoscaler.memory.heap-usage.threshold
1.0 Double Max allowed percentage of heap usage during scaling operations. Autoscaling will be paused if the heap usage exceeds this threshold.
job.autoscaler.memory.tuning.enabled
false Boolean If enabled, the initial amount of memory specified for TaskManagers will be reduced/increased according to the observed needs.
job.autoscaler.memory.tuning.maximize-managed-memory
false Boolean If enabled and managed memory is used (e.g. RocksDB turned on), any reduction of heap, network, or metaspace memory will increase the managed memory.
job.autoscaler.memory.tuning.overhead
0.2 Double Overhead to add to tuning decisions (0-1). This ensures spare capacity and allows the memory to grow beyond the dynamically computed limits, but never beyond the original memory limits.
job.autoscaler.memory.tuning.scale-down-compensation.enabled
true Boolean If this option is enabled and memory tuning is enabled, TaskManager memory will be increased when scaling down. This ensures that after applying memory tuning there is sufficient memory when running with fewer TaskManagers.
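Taken together, the memory tuning options above might be enabled like this (a sketch with illustrative values; the flinkConfiguration placement assumes an operator-managed deployment):

```yaml
flinkConfiguration:
  job.autoscaler.memory.tuning.enabled: "true"
  # Keep 20% spare capacity on top of the dynamically computed limits:
  job.autoscaler.memory.tuning.overhead: "0.2"
  # With managed memory in use (e.g. RocksDB), fold any reduction of
  # heap/network/metaspace memory into managed memory:
  job.autoscaler.memory.tuning.maximize-managed-memory: "true"
  # Increase TaskManager memory when scaling down to fewer TaskManagers:
  job.autoscaler.memory.tuning.scale-down-compensation.enabled: "true"
```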
job.autoscaler.metrics.busy-time.aggregator
MAX Enum Metric aggregator to use for busyTime metrics. This affects how the true processing/output rate will be computed. Using max allows us to handle jobs with data skew more robustly, while avg may provide better stability when we know that the load distribution is even.

Possible values:
  • "AVG"
  • "MAX"
  • "MIN"
job.autoscaler.metrics.window
15 min Duration Scaling metrics aggregation window size.
job.autoscaler.observed-scalability.coefficient-min
0.5 Double Minimum allowed value for the observed scalability coefficient. Prevents aggressive scaling by clamping low coefficient estimates. If the estimated coefficient falls below this value, it is capped at the configured minimum.
job.autoscaler.observed-scalability.enabled
false Boolean Enables the use of an observed scalability coefficient when computing target parallelism. If enabled, the system will estimate the scalability coefficient based on historical scaling data instead of assuming perfect linear scaling. This helps account for real-world inefficiencies such as network overhead and coordination costs.
job.autoscaler.observed-scalability.min-observations
3 Integer Defines the minimum number of historical scaling observations required to estimate the scalability coefficient. If the number of available observations is below this threshold, the system falls back to assuming linear scaling. Note: To effectively use a higher minimum observation count, you need to increase job.autoscaler.history.max.count. Avoid setting job.autoscaler.history.max.count to a very high value, as the number of retained data points is limited by the size of the state store, particularly when using a Kubernetes-based state store.
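For example, requiring five observations before trusting the coefficient estimate might look like this (illustrative values; note that history.max.count bounds how many observations can accumulate):

```yaml
flinkConfiguration:
  job.autoscaler.observed-scalability.enabled: "true"
  # Clamp pessimistic coefficient estimates at the default minimum:
  job.autoscaler.observed-scalability.coefficient-min: "0.5"
  # Require 5 past scalings before estimating the coefficient...
  job.autoscaler.observed-scalability.min-observations: "5"
  # ...which only works if at least that many decisions are retained:
  job.autoscaler.history.max.count: "8"
```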
job.autoscaler.observed-true-processing-rate.lag-threshold
30 s Duration Lag threshold for enabling observed true processing rate measurements.
job.autoscaler.observed-true-processing-rate.min-observations
2 Integer Minimum number of observations used when estimating / switching to the observed true processing rate.
job.autoscaler.observed-true-processing-rate.switch-threshold
0.15 Double Percentage threshold for switching to observed from busy time based true processing rate if the measurement is off by at least the configured fraction. For example 0.15 means we switch to observed if the busy time based computation is at least 15% higher during catchup.
job.autoscaler.quota.cpu
(none) Double Quota for the CPU count. When scaling would exceed this number, the scaling is not performed.
job.autoscaler.quota.memory
(none) MemorySize Quota for the memory size. When scaling would exceed this number, the scaling is not performed.
job.autoscaler.restart.time
5 min Duration Expected restart time to be used until the operator can determine it reliably from history.
job.autoscaler.restart.time-tracking.enabled
false Boolean Whether to use the actual observed rescaling restart times instead of the fixed 'job.autoscaler.restart.time' configuration. If set to true, the maximum restart duration over a number of samples will be used. The value of 'job.autoscaler.restart.time-tracking.limit' will act as an upper bound, and the value of 'job.autoscaler.restart.time' will still be used when there are no rescale samples.
job.autoscaler.restart.time-tracking.limit
15 min Duration Maximum cap for the observed restart time when 'job.autoscaler.restart.time-tracking.enabled' is set to true.
job.autoscaler.scale-down.interval
1 h Duration The delay before a scale down is executed. If greater than 0, scale downs are delayed; multiple scale downs within `scale-down.interval` can then be merged into a single scale down, reducing the number of rescales. Restarting the job less frequently improves job availability. If less than or equal to 0, scale downs are executed immediately.
job.autoscaler.scale-down.max-factor
0.6 Double Max scale down factor. 1 means no limit on scale down; 0.6 means the job's parallelism can be reduced by at most 60% of the original parallelism in one scaling step.
job.autoscaler.scale-up.max-factor
100000.0 Double Max scale up factor. For example, 2.0 means the job's parallelism can be increased by at most 200% of the current parallelism in one scaling step.
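To bound how far a single rescale can move parallelism in either direction, both factors can be set together (illustrative values):

```yaml
flinkConfiguration:
  # A single scale down may reduce parallelism by at most 50%:
  job.autoscaler.scale-down.max-factor: "0.5"
  # A single scale up may add at most 100% of the current parallelism:
  job.autoscaler.scale-up.max-factor: "1.0"
```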
job.autoscaler.scaling.effectiveness.detection.enabled
false Boolean Whether to enable detection of ineffective scaling operations, allowing the autoscaler to block further scale ups.
job.autoscaler.scaling.effectiveness.threshold
0.1 Double Processing rate increase threshold for detecting ineffective scaling. 0.1 means that if we do not accomplish at least 10% of the desired capacity increase with scaling, the action is marked ineffective.
job.autoscaler.scaling.enabled
true Boolean Enable vertex scaling execution by the autoscaler. If disabled, the autoscaler will only collect metrics and evaluate the suggested parallelism for each vertex but will not upgrade the jobs.
job.autoscaler.scaling.event.interval
30 min Duration Time interval after which an identical scaling event is resent.
job.autoscaler.scaling.key-group.partitions.adjust.mode
EVENLY_SPREAD Enum How to adjust the parallelism of a source vertex, or of a vertex whose upstream shuffle is keyBy.

Possible values:
  • "EVENLY_SPREAD": This mode ensures that the parallelism adjustment attempts to evenly distribute data across subtasks. It is particularly effective for source vertices that are aware of partition counts or vertices after 'keyBy' operation. The goal is to have the number of key groups or partitions be divisible by the set parallelism, ensuring even data distribution and reducing data skew.
  • "MAXIMIZE_UTILISATION": This mode maximizes resource utilization. It attempts to set a parallelism that meets the current consumption rate requirements, without enforcing that the number of key groups or partitions is divisible by the parallelism.
job.autoscaler.stabilization.interval
5 min Duration Stabilization period during which no new scaling is executed.
job.autoscaler.utilization.max
(none) Double Max vertex utilization
job.autoscaler.utilization.min
(none) Double Min vertex utilization
job.autoscaler.utilization.target
0.7 Double Target vertex utilization
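The three utilization options work as a target with an optional band around it; a setup that tolerates some drift before rescaling might look like this (illustrative values; min and max have no defaults):

```yaml
flinkConfiguration:
  # Aim for 70% busy time per vertex:
  job.autoscaler.utilization.target: "0.7"
  # Bound the acceptable utilization range around the target:
  job.autoscaler.utilization.min: "0.3"
  job.autoscaler.utilization.max: "0.9"
```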
job.autoscaler.vertex.exclude.ids
List<String> A (semicolon-separated) list of vertex ids in hexstring for which to disable scaling. Caution: For non-sink vertices this will still scale their downstream operators until https://issues.apache.org/jira/browse/FLINK-31215 is implemented.
job.autoscaler.vertex.max-parallelism
200 Integer The maximum parallelism the autoscaler can use. Note that this limit will be ignored if it is higher than the max parallelism configured in the Flink config or directly on each operator.
job.autoscaler.vertex.min-parallelism
1 Integer The minimum parallelism the autoscaler can use.