The module provides several independent controls for cluster scaling and lifecycle management. You can use managed scaling to automatically right-size capacity, auto-termination to shut down idle clusters, and step concurrency to parallelize workloads.

Managed scaling policy

EMR Managed Scaling automatically adjusts the number of instances in your cluster based on workload demand. Set the managed_scaling_policy variable to enable it.
managed_scaling_policy
object
When set, the module creates an aws_emr_managed_scaling_policy resource attached to the cluster.
managed_scaling_policy.minimum_capacity_units
number
required
Minimum number of capacity units the cluster can scale down to. The unit is determined by unit_type.
managed_scaling_policy.maximum_capacity_units
number
required
Maximum total capacity units the cluster can scale up to.
managed_scaling_policy.maximum_core_capacity_units
number
Maximum capacity units allowed for core nodes only. Must be less than or equal to maximum_capacity_units.
managed_scaling_policy.maximum_ondemand_capacity_units
number
Maximum capacity units that can be On-Demand instances. Useful for capping spend while allowing Spot to fill the rest.
managed_scaling_policy.unit_type
string
required
Unit of capacity measurement. Valid values:
  • "Instances" — each EC2 instance counts as one unit.
  • "VCPU" — capacity is measured in vCPUs.
  • "InstanceFleetUnits" — capacity is measured in instance fleet weighted units.
managed_scaling_policy.scaling_strategy
string
Scaling strategy for managed scaling. Valid values are "DEFAULT" and "ADVANCED"; the "ADVANCED" strategy lets you tune scaling behavior via utilization_performance_index.
managed_scaling_policy.utilization_performance_index
number
Target utilization value (1–100) that EMR uses when computing scaling decisions. A higher value optimizes for performance; a lower value optimizes for resource conservation. Applies only when scaling_strategy is "ADVANCED".

Example

module "emr" {
  source = "terraform-aws-modules/emr/aws"

  # ...

  managed_scaling_policy = {
    minimum_capacity_units          = 2
    maximum_capacity_units          = 20
    maximum_core_capacity_units     = 10
    maximum_ondemand_capacity_units = 5
    unit_type                       = "Instances"
    scaling_strategy                = "ADVANCED"
    utilization_performance_index   = 80
  }
}

Auto-termination policy

Auto-termination shuts the cluster down after it has been idle for a configurable period. This is useful for transient clusters that process a batch of work and then sit idle.
auto_termination_policy
object
When set, the module configures an auto-termination policy on the cluster.
auto_termination_policy.idle_timeout
number
Number of seconds the cluster can be idle before EMR terminates it automatically. Valid range: 60 to 604800 (7 days).
auto_termination_policy = {
  idle_timeout = 3600 # terminate after 1 hour of inactivity
}
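In context, the policy is passed as a block on the module call, the same way as managed_scaling_policy. A sketch, assuming the same module source as the earlier example:

module "emr" {
  source = "terraform-aws-modules/emr/aws"

  # ...

  auto_termination_policy = {
    idle_timeout = 3600 # seconds idle before EMR terminates the cluster
  }
}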

Scale-down behavior

scale_down_behavior
string
default:"TERMINATE_AT_TASK_COMPLETION"
Controls how EC2 instances are terminated during a scale-in event or instance group resize.
  • "TERMINATE_AT_TASK_COMPLETION" — waits for in-progress tasks to complete before terminating the instance. This reduces job failures at the cost of slower scale-in.
  • "TERMINATE_AT_INSTANCE_HOUR" — terminates instances at the next full instance-hour boundary, which can reduce EC2 costs for On-Demand instances.
scale_down_behavior = "TERMINATE_AT_TASK_COMPLETION"

Step concurrency

step_concurrency_level
number
Number of steps that EMR executes concurrently. Valid range: 1–256. Defaults to 1 when not set. Requires EMR release 5.28.0 or later.
Increasing step_concurrency_level lets you submit multiple jobs to a single cluster and have them run in parallel, reducing the overhead of provisioning separate clusters per job.
step_concurrency_level = 3

Cluster lifecycle flags

keep_job_flow_alive_when_no_steps
bool
When true (the Terraform provider's default), the cluster stays running after all steps complete. Set to false to automatically terminate the cluster when the step queue is empty.
unhealthy_node_replacement
bool
default:"true"
When true, EMR gracefully replaces unhealthy core nodes instead of leaving them in a degraded state.
keep_job_flow_alive_when_no_steps = true
unhealthy_node_replacement        = true
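Taken together, the lifecycle flags can describe a transient batch cluster that tears itself down once its work is done. A sketch, assuming the same module source as the managed-scaling example:

module "emr" {
  source = "terraform-aws-modules/emr/aws"

  # ...

  # Terminate automatically once the step queue is empty.
  keep_job_flow_alive_when_no_steps = false

  # Wait for in-progress tasks before removing instances on scale-in.
  scale_down_behavior = "TERMINATE_AT_TASK_COMPLETION"

  # Replace degraded core nodes rather than leaving them unhealthy.
  unhealthy_node_replacement = true
}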