Instance groups provision a fixed number of identical EC2 instances for each node role. They offer a simpler configuration model than instance fleets and are a good choice when your workload requires predictable, steady-state capacity.
Instance groups and instance fleets are mutually exclusive: a cluster uses one model or the other. Setting any *_instance_group variable together with the corresponding *_instance_fleet variable on the same cluster causes a conflict.
Instance groups only support a single subnet and Availability Zone. Use ec2_attributes.subnet_id (not subnet_ids) when configuring an instance-group cluster.
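As a minimal sketch of the single-subnet constraint, an instance-group cluster sets one subnet ID in ec2_attributes. The subnet ID is a placeholder, and the fragment is meant to sit inside a module block for this module.

```hcl
# Fragment: goes inside the module block of an instance-group cluster
ec2_attributes = {
  # One subnet, and therefore one Availability Zone, for the whole cluster
  subnet_id = "subnet-abcde012" # placeholder ID
}
```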

Group types

The module exposes three group variables, one per node role:
| Variable | Node role | Resource |
| --- | --- | --- |
| master_instance_group | Master | Inline in aws_emr_cluster |
| core_instance_group | Core | Inline in aws_emr_cluster |
| task_instance_group | Task | Separate aws_emr_instance_group resource |
The master group controls the primary node that coordinates the cluster.
master_instance_group.instance_type
string
required
EC2 instance type for the master node, for example "m5.xlarge".
master_instance_group.instance_count
number
Number of master instances to launch. EMR accepts 1 or 3; use 3 for a high-availability master configuration.
master_instance_group.name
string
Display name for the group.
master_instance_group.bid_price
string
Maximum Spot price in USD per instance-hour. When set, EMR launches master nodes as Spot instances.
master_instance_group.ebs_config
list(object)
EBS volumes to attach. See EBS config for nested attributes.
master_instance_group = {
  name           = "master-group"
  instance_count = 1
  instance_type  = "m5.xlarge"
}
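For a high-availability cluster, the same block might look like the sketch below, with instance_count raised to 3. The name and instance type are illustrative. As far as the AWS provider documentation describes it, a multi-master cluster also needs a core instance group to be configured, so treat this fragment as one part of a larger configuration.

```hcl
# Fragment: high-availability variant of the master group
master_instance_group = {
  name           = "master-group"
  instance_count = 3 # three master nodes instead of one
  instance_type  = "m5.xlarge"
}
```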
The core group stores data in HDFS and runs compute tasks.
core_instance_group.instance_type
string
required
EC2 instance type for core nodes.
core_instance_group.instance_count
number
Number of core instances to launch.
core_instance_group.name
string
Display name for the group.
core_instance_group.bid_price
string
Maximum Spot price in USD per instance-hour.
core_instance_group.autoscaling_policy
string
JSON string containing an EMR autoscaling policy document. When provided, EMR uses this policy to automatically scale the core group.
core_instance_group.ebs_config
list(object)
EBS volumes to attach. See EBS config for nested attributes.
core_instance_group = {
  name           = "core-group"
  instance_count = 2
  instance_type  = "c4.large"
}
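The autoscaling_policy attribute takes a JSON string, so one way to build it is with Terraform's jsonencode function. The sketch below is a hypothetical policy that adds one core node when available YARN memory drops below 15 percent. The capacity limits, rule name, and threshold are assumptions chosen for illustration, and EMR additionally requires an Auto Scaling IAM role on the cluster, which is not shown here.

```hcl
core_instance_group = {
  name           = "core-group"
  instance_count = 2
  instance_type  = "c4.large"

  # Hypothetical policy document; adjust limits and thresholds to your workload
  autoscaling_policy = jsonencode({
    Constraints = {
      MinCapacity = 2
      MaxCapacity = 6
    }
    Rules = [{
      Name        = "ScaleOutMemoryPercentage"
      Description = "Scale out if YARNMemoryAvailablePercentage is less than 15"
      Action = {
        SimpleScalingPolicyConfiguration = {
          AdjustmentType    = "CHANGE_IN_CAPACITY"
          ScalingAdjustment = 1   # add one instance per trigger
          CoolDown          = 300 # seconds to wait between scaling activities
        }
      }
      Trigger = {
        CloudWatchAlarmDefinition = {
          ComparisonOperator = "LESS_THAN"
          EvaluationPeriods  = 1
          MetricName         = "YARNMemoryAvailablePercentage"
          Namespace          = "AWS/ElasticMapReduce"
          Period             = 300
          Statistic          = "AVERAGE"
          Threshold          = 15.0
          Unit               = "PERCENT"
        }
      }
    }]
  })
}
```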
The task group adds compute-only capacity. Task nodes do not store HDFS data.
task_instance_group.instance_type
string
required
EC2 instance type for task nodes.
task_instance_group.instance_count
number
Number of task instances to launch.
task_instance_group.name
string
Display name for the group.
task_instance_group.bid_price
string
Maximum Spot price in USD per instance-hour. Task nodes are good candidates for Spot because they hold no HDFS data, so a Spot interruption costs only in-flight work.
task_instance_group.autoscaling_policy
string
JSON string containing an EMR autoscaling policy document.
task_instance_group.configurations_json
string
JSON string for per-group application configuration overrides.
task_instance_group.ebs_config
list(object)
EBS volumes to attach. See EBS config for nested attributes.
task_instance_group.ebs_optimized
bool
default:"true"
Whether EBS optimization is enabled for the instance type. Defaults to true.
task_instance_group = {
  name           = "task-group"
  instance_count = 2
  instance_type  = "c5.xlarge"
  bid_price      = "0.1"

  ebs_config = [{
    size                 = 256
    type                 = "gp3"
    volumes_per_instance = 1
  }]
  ebs_optimized = true
}
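The configurations_json attribute is not used in the example above. A minimal sketch of a per-group override, assuming a Spark workload, could look like the following. The classification and property value are illustrative assumptions, not recommendations.

```hcl
task_instance_group = {
  name           = "task-group"
  instance_count = 2
  instance_type  = "c5.xlarge"

  # Illustrative override applied to this group only
  configurations_json = jsonencode([
    {
      Classification = "spark-defaults"
      Properties = {
        "spark.executor.memory" = "4g"
      }
    }
  ])
}
```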

EBS config

All three group types accept an ebs_config list to attach additional EBS volumes to each instance.
ebs_config[*].size
number
default:"256"
Volume size in GiB.
ebs_config[*].type
string
default:"gp3"
EBS volume type, for example "gp3" or "io2".
ebs_config[*].iops
number
Provisioned IOPS. Only valid for io1 and io2 volume types.
ebs_config[*].throughput
number
Throughput in MiB/s. Only valid for gp3 volumes.
ebs_config[*].volumes_per_instance
number
Number of volumes of this configuration to attach per instance.
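To show how the type-specific attributes combine, the sketch below attaches one gp3 volume with explicit throughput and two io1 volumes with provisioned IOPS to each instance. The sizes, throughput, and IOPS figures are assumptions for illustration, and the fragment assumes the module treats omitted attributes as optional.

```hcl
# Fragment: goes inside any of the three *_instance_group blocks
ebs_config = [
  {
    size                 = 256
    type                 = "gp3"
    throughput           = 250 # MiB/s, gp3 only
    volumes_per_instance = 1
  },
  {
    size                 = 100
    type                 = "io1"
    iops                 = 3000 # provisioned IOPS
    volumes_per_instance = 2
  }
]
```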

Complete example

The following example creates a private cluster with all three group types configured:
module "emr" {
  source = "terraform-aws-modules/emr/aws"

  name = "example-instance-group"

  release_label = "emr-7.9.0"
  applications  = ["spark", "trino"]
  auto_termination_policy = {
    idle_timeout = 3600
  }

  master_instance_group = {
    name           = "master-group"
    instance_count = 1
    instance_type  = "m5.xlarge"
  }

  core_instance_group = {
    name           = "core-group"
    instance_count = 2
    instance_type  = "c4.large"
  }

  task_instance_group = {
    name           = "task-group"
    instance_count = 2
    instance_type  = "c5.xlarge"
    bid_price      = "0.1"

    ebs_config = [{
      size                 = 256
      type                 = "gp3"
      volumes_per_instance = 1
    }]
    ebs_optimized = true
  }

  ebs_root_volume_size = 64
  ec2_attributes = {
    # Instance groups only support one Subnet/AZ
    # Subnets should be private subnets and tagged with
    # { "for-use-with-amazon-emr-managed-policies" = true }
    subnet_id = "subnet-abcde012"
  }
  vpc_id = "vpc-1234556abcdef"

  scale_down_behavior    = "TERMINATE_AT_TASK_COMPLETION"
  step_concurrency_level = 3
  termination_protection = false
  visible_to_all_users   = true

  tags = {
    Terraform   = "true"
    Environment = "dev"
  }
}