Your private subnets must be tagged with "for-use-with-amazon-emr-managed-policies" = true for the managed IAM policy AmazonEMRServicePolicy_v2 to function correctly. See the EMR managed IAM policies documentation for details.
VPC endpoint requirements
To avoid routing cluster traffic through a NAT gateway, create Interface VPC endpoints for elasticmapreduce and sts, and a Gateway endpoint for s3. The example below includes all three:
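The following is a minimal sketch in plain Terraform, not the module's own example. The resource names, CIDR ranges, security group, and the `region` variable are hypothetical. It assumes the AWS provider and also shows the subnet tag described earlier.

```hcl
# Hypothetical region input; set it to the region your cluster runs in.
variable "region" {
  type    = string
  default = "us-east-1"
}

resource "aws_vpc" "this" {
  cidr_block           = "10.0.0.0/16" # assumed range
  enable_dns_support   = true
  enable_dns_hostnames = true # needed for private DNS on interface endpoints
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.this.id
  cidr_block = "10.0.1.0/24" # assumed range

  # Tag expected by AmazonEMRServicePolicy_v2
  tags = {
    "for-use-with-amazon-emr-managed-policies" = "true"
  }
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.this.id
}

resource "aws_route_table_association" "private" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private.id
}

# Allows HTTPS from inside the VPC to the interface endpoints.
resource "aws_security_group" "endpoints" {
  name_prefix = "emr-vpc-endpoints-"
  vpc_id      = aws_vpc.this.id

  ingress {
    description = "HTTPS from inside the VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.this.cidr_block]
  }
}

# Interface endpoints for the EMR and STS APIs.
resource "aws_vpc_endpoint" "interface" {
  for_each = toset(["elasticmapreduce", "sts"])

  vpc_id              = aws_vpc.this.id
  service_name        = "com.amazonaws.${var.region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private.id]
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}

# Gateway endpoint for S3, attached to the private route table.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.this.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
```

Gateway endpoints attach to route tables rather than subnets, which is why the S3 endpoint references the route table while the two interface endpoints reference the subnet and a security group.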
Configuration
Choose between instance fleets and instance groups depending on your workload requirements. Instance fleets let you mix instance types and combine On-Demand and Spot capacity. Instance groups give you a fixed instance type per node group.
- Instance fleet
- Instance group
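As a rough illustration of the instance fleet option, the sketch below uses the AWS provider's `aws_emr_cluster` resource directly rather than this module's inputs, which are not shown here. The variable names, release label, application list, the three core instance types, and the capacity targets are all assumptions chosen for illustration.

```hcl
# Hypothetical inputs: roles and a private subnet that already exist in your account.
variable "service_role_arn" {
  type = string
}

variable "instance_profile_arn" {
  type = string
}

variable "private_subnet_id" {
  type = string
}

resource "aws_emr_cluster" "fleet_example" {
  name          = "private-fleet-example"
  release_label = "emr-6.15.0" # assumed release
  applications  = ["Spark"]    # assumed application
  service_role  = var.service_role_arn

  ec2_attributes {
    subnet_id        = var.private_subnet_id
    instance_profile = var.instance_profile_arn
  }

  # Single On-Demand m5.xlarge for the master fleet.
  master_instance_fleet {
    name                      = "master"
    target_on_demand_capacity = 1

    instance_type_configs {
      instance_type = "m5.xlarge"
    }
  }

  # Core fleet: three candidate instance types, mixed On-Demand and Spot.
  core_instance_fleet {
    name                      = "core"
    target_on_demand_capacity = 1 # assumed target
    target_spot_capacity      = 2 # assumed target

    instance_type_configs {
      instance_type     = "m5.xlarge"
      weighted_capacity = 1
    }

    instance_type_configs {
      instance_type     = "m5a.xlarge" # assumed type
      weighted_capacity = 1
    }

    instance_type_configs {
      instance_type     = "m4.xlarge" # assumed type
      weighted_capacity = 1
    }

    launch_specifications {
      spot_specification {
        allocation_strategy      = "capacity-optimized"
        timeout_action           = "SWITCH_TO_ON_DEMAND" # fall back to On-Demand
        timeout_duration_minutes = 5
      }
    }
  }
}
```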
Instance fleets support multiple instance types per node group and mixed On-Demand and Spot capacity. The master fleet uses a single On-Demand m5.xlarge. The core fleet mixes three instance types with a combination of On-Demand and Spot capacity, falling back to On-Demand if no Spot capacity is available within five minutes.
Supporting resources
The complete working example at examples/private-cluster/main.tf includes a VPC with private subnets and a NAT gateway, VPC endpoints for S3, EMR, and STS, and an encrypted S3 bucket for logs.
Security considerations
- Nodes have no public IP addresses, reducing the attack surface from the internet.
- Use VPC endpoints to keep traffic between cluster nodes and the S3 and EMR APIs on the AWS network and to avoid NAT gateway data transfer charges.
- The module creates a dedicated service access security group for private clusters, enabling EMR to communicate with cluster nodes over port 8443.
- Restrict the autoscaling role trust policy to your account and region using aws:SourceAccount and aws:SourceArn conditions, as shown in the full example.
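A trust policy restricted in this way might look like the sketch below. It is not copied from the full example: the resource names, the `region` variable, the list of service principals, and the ARN pattern are assumptions, so compare them against the module's own policy before relying on them.

```hcl
# Hypothetical region input; set it to the region your cluster runs in.
variable "region" {
  type    = string
  default = "us-east-1"
}

data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}

data "aws_iam_policy_document" "autoscaling_assume" {
  statement {
    sid     = "EMRAutoscalingAssumeRole"
    actions = ["sts:AssumeRole"]

    principals {
      type = "Service"
      # Assumed principals for an EMR autoscaling role.
      identifiers = [
        "elasticmapreduce.amazonaws.com",
        "application-autoscaling.amazonaws.com",
      ]
    }

    # Only allow calls made on behalf of this account.
    condition {
      test     = "StringEquals"
      variable = "aws:SourceAccount"
      values   = [data.aws_caller_identity.current.account_id]
    }

    # Only allow calls originating from EMR resources in this account and region.
    condition {
      test     = "ArnLike"
      variable = "aws:SourceArn"
      values = [
        "arn:${data.aws_partition.current.partition}:elasticmapreduce:${var.region}:${data.aws_caller_identity.current.account_id}:*",
      ]
    }
  }
}

resource "aws_iam_role" "autoscaling" {
  name_prefix        = "emr-autoscaling-"
  assume_role_policy = data.aws_iam_policy_document.autoscaling_assume.json
}
```

Pairing the two conditions guards against the confused deputy problem: the service can assume the role only when acting for resources in the stated account and region.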