Instance groups

Instance groups provision identical EC2 instances for each node role, using a single instance type and a target instance count per group. They offer a simpler configuration model than instance fleets and are a good choice when your workload requires predictable, steady-state capacity.

Instance groups and instance fleets are mutually exclusive. Do not set a *_instance_group variable together with the corresponding *_instance_fleet variable on the same cluster; the two conflict.

Instance groups only support a single subnet and Availability Zone. Use ec2_attributes.subnet_id (not subnet_ids) when configuring an instance-group cluster.

Group types

The module exposes three group variables, one per node role:

Variable	Node role	Resource
`master_instance_group`	Master	Inline in `aws_emr_cluster`
`core_instance_group`	Core	Inline in `aws_emr_cluster`
`task_instance_group`	Task	Separate `aws_emr_instance_group` resource

master_instance_group

The master group controls the primary node that coordinates the cluster.

master_instance_group.instance_type

string

required

EC2 instance type for the master node, for example "m5.xlarge".

master_instance_group.instance_count

number

Number of master instances to launch. EMR accepts 1 or 3; use 3 for a high-availability configuration with multiple master nodes.

master_instance_group.name

string

Display name for the group.

master_instance_group.bid_price

string

Maximum Spot price in USD per instance-hour. When set, EMR launches master nodes as Spot instances.

master_instance_group.ebs_config

list(object)

EBS volumes to attach. See EBS config for nested attributes.

master_instance_group = {
  name           = "master-group"
  instance_count = 1
  instance_type  = "m5.xlarge"
}
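
For a high-availability variant, the master group can be sized to three nodes. The fragment below is a minimal sketch rather than a tested configuration. It assumes an EMR release that supports multiple master nodes (5.23.0 or later) and a core_instance_group defined on the same cluster; the group name is illustrative.

```hcl
# Sketch: three master nodes for high availability.
master_instance_group = {
  name           = "master-group-ha" # illustrative name
  instance_count = 3                 # 3 requests a multi-master cluster; 1 is the single-node setup
  instance_type  = "m5.xlarge"
}
```

As far as the AWS provider documentation describes it, termination protection is enabled automatically on multi-master clusters, so termination_protection = false may need to be applied before such a cluster can be destroyed.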

core_instance_group

The core group stores data in HDFS and runs compute tasks.

core_instance_group.instance_type

string

required

EC2 instance type for core nodes.

core_instance_group.instance_count

number

Number of core instances to launch.

core_instance_group.name

string

Display name for the group.

core_instance_group.bid_price

string

Maximum Spot price in USD per instance-hour.

core_instance_group.autoscaling_policy

string

JSON string containing an EMR autoscaling policy document. When provided, EMR uses this policy to automatically scale the core group.

core_instance_group.ebs_config

list(object)

EBS volumes to attach. See EBS config for nested attributes.

core_instance_group = {
  name           = "core-group"
  instance_count = 2
  instance_type  = "c4.large"
}
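
The autoscaling_policy attribute expects a JSON string, so one convenient way to build it is with Terraform's jsonencode function. The sketch below follows the shape of an EMR automatic scaling policy document (Constraints plus a list of Rules). The capacity limits, rule name, and threshold are assumptions chosen for illustration, and the cluster is assumed to have an autoscaling IAM role available.

```hcl
core_instance_group = {
  name           = "core-group"
  instance_count = 2
  instance_type  = "c4.large"

  # Assumed policy: add one core node when available YARN memory drops below 15%.
  autoscaling_policy = jsonencode({
    Constraints = {
      MinCapacity = 2
      MaxCapacity = 4
    }
    Rules = [{
      Name        = "ScaleOutMemoryPercentage" # illustrative rule name
      Description = "Scale out if YARNMemoryAvailablePercentage is less than 15"
      Action = {
        SimpleScalingPolicyConfiguration = {
          AdjustmentType    = "CHANGE_IN_CAPACITY"
          ScalingAdjustment = 1
          CoolDown          = 300 # seconds between scaling activities
        }
      }
      Trigger = {
        CloudWatchAlarmDefinition = {
          ComparisonOperator = "LESS_THAN"
          EvaluationPeriods  = 1
          MetricName         = "YARNMemoryAvailablePercentage"
          Namespace          = "AWS/ElasticMapReduce"
          Period             = 300
          Statistic          = "AVERAGE"
          Threshold          = 15.0
          Unit               = "PERCENT"
        }
      }
    }]
  })
}
```

Because core nodes hold HDFS data, a scale-in rule on this group deserves more caution than the equivalent rule on a task group.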

task_instance_group

The task group adds compute-only capacity. Task nodes do not store HDFS data.

task_instance_group.instance_type

string

required

EC2 instance type for task nodes.

task_instance_group.instance_count

number

Number of task instances to launch.

task_instance_group.name

string

Display name for the group.

task_instance_group.bid_price

string

Maximum Spot price in USD per instance-hour. Task nodes are good candidates for Spot because they hold no HDFS data.

task_instance_group.autoscaling_policy

string

JSON string containing an EMR autoscaling policy document.

task_instance_group.configurations_json

string

JSON string for per-group application configuration overrides.

task_instance_group.ebs_config

list(object)

EBS volumes to attach. See EBS config for nested attributes.

task_instance_group.ebs_optimized

bool

default:"true"

Whether to enable EBS optimization on task instances.

task_instance_group = {
  name           = "task-group"
  instance_count = 2
  instance_type  = "c5.xlarge"
  bid_price      = "0.1"

  ebs_config = [{
    size                 = 256
    type                 = "gp3"
    volumes_per_instance = 1
  }]
  ebs_optimized = true
}
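
The configurations_json attribute is not shown in the example above. A minimal sketch of how it might be set follows, using jsonencode to produce the JSON string. The classification ("yarn-site") and the property value are assumptions picked for illustration, not recommended settings.

```hcl
task_instance_group = {
  name           = "task-group"
  instance_count = 2
  instance_type  = "c5.xlarge"

  # Illustrative per-group override: classification and value are assumptions.
  configurations_json = jsonencode([
    {
      Classification = "yarn-site"
      Properties = {
        "yarn.nodemanager.resource.memory-mb" = "6144"
      }
    }
  ])
}
```

The override applies only to instances in the task group; cluster-wide application configuration is set separately.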

EBS config

All three group types accept an ebs_config list to attach additional EBS volumes to each instance.

ebs_config[*].size

number

default:"256"

Volume size in GiB.

ebs_config[*].type

string

default:"gp3"

EBS volume type, for example "gp3" or "io2".

ebs_config[*].iops

number

Provisioned IOPS. Only valid for volume types that support provisioned IOPS, such as io1, io2, and gp3.

ebs_config[*].throughput

number

Throughput in MiB/s. Only valid for gp3 volumes.

ebs_config[*].volumes_per_instance

number

Number of volumes of this configuration to attach per instance.
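
To illustrate how the attributes above combine, the following sketch attaches two different volume configurations to each instance. The sizes, throughput, and IOPS figures are assumptions for illustration, and the sketch assumes the module treats iops and throughput as optional so that each entry can set only the attribute relevant to its volume type.

```hcl
ebs_config = [
  {
    # General-purpose volume with explicit throughput (gp3 only)
    size                 = 256
    type                 = "gp3"
    throughput           = 250 # MiB/s
    volumes_per_instance = 1
  },
  {
    # Provisioned-IOPS volumes
    size                 = 100
    type                 = "io1"
    iops                 = 3000
    volumes_per_instance = 2
  }
]
```

With this list, each instance in the group receives three additional volumes: one gp3 volume and two io1 volumes.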

Complete example

The following example creates a private cluster with all three group types configured:

module "emr" {
  source = "terraform-aws-modules/emr/aws"

  name = "example-instance-group"

  release_label = "emr-7.9.0"
  applications  = ["spark", "trino"]
  auto_termination_policy = {
    idle_timeout = 3600
  }

  master_instance_group = {
    name           = "master-group"
    instance_count = 1
    instance_type  = "m5.xlarge"
  }

  core_instance_group = {
    name           = "core-group"
    instance_count = 2
    instance_type  = "c4.large"
  }

  task_instance_group = {
    name           = "task-group"
    instance_count = 2
    instance_type  = "c5.xlarge"
    bid_price      = "0.1"

    ebs_config = [{
      size                 = 256
      type                 = "gp3"
      volumes_per_instance = 1
    }]
    ebs_optimized = true
  }

  ebs_root_volume_size = 64
  ec2_attributes = {
    # Instance groups only support one Subnet/AZ
    # Subnets should be private subnets and tagged with
    # { "for-use-with-amazon-emr-managed-policies" = true }
    subnet_id = "subnet-abcde012"
  }
  vpc_id = "vpc-1234556abcdef"

  scale_down_behavior    = "TERMINATE_AT_TASK_COMPLETION"
  step_concurrency_level = 3
  termination_protection = false
  visible_to_all_users   = true

  tags = {
    Terraform   = "true"
    Environment = "dev"
  }
}
