An EMR virtual cluster maps an EMR environment to a Kubernetes namespace on an existing Amazon EKS cluster. Jobs submitted to the virtual cluster run as pods inside that namespace, letting you share the same EKS infrastructure between EMR workloads and other Kubernetes applications. Use the terraform-aws-modules/emr/aws//modules/virtual-cluster submodule to create a virtual cluster.
You need an existing EKS cluster before creating a virtual cluster. The module also requires the OIDC provider ARN for the cluster, which is used to create the job execution IAM role with the correct trust policy. The Kubernetes provider in your Terraform configuration must be pointed at the EKS cluster.
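A minimal sketch of pointing the Kubernetes provider at an existing EKS cluster; the cluster name "example" is an assumption, and other authentication methods (such as `exec` with `aws eks get-token`) work equally well:

```hcl
data "aws_eks_cluster" "this" {
  name = "example" # assumed cluster name
}

data "aws_eks_cluster_auth" "this" {
  name = "example"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}
```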

How it works

The module creates:
  • An aws_emrcontainers_virtual_cluster resource bound to your EKS cluster and namespace
  • A Kubernetes namespace (optional, controlled by create_namespace)
  • A Kubernetes Role and RoleBinding granting EMR the permissions it needs in the namespace
  • A job execution IAM role with an OIDC trust policy scoped to the EKS cluster
  • An IAM policy granting S3 access for the buckets you specify
  • A CloudWatch log group for cluster logs
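All of the resources above can be created from a minimal module call that accepts the defaults. This sketch assumes the EKS cluster was created with the terraform-aws-modules/eks module, whose `cluster_name` and `oidc_provider_arn` outputs are referenced here:

```hcl
module "emr_virtual_cluster" {
  source = "terraform-aws-modules/emr/aws//modules/virtual-cluster"

  # Outputs assumed from a terraform-aws-modules/eks module instance
  eks_cluster_name      = module.eks.cluster_name
  eks_oidc_provider_arn = module.eks.oidc_provider_arn
}
```

With no `name` supplied, the module derives names from its defaults and creates the `emr-containers` namespace.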

Key variables

| Variable | Description | Default |
|---|---|---|
| `eks_cluster_name` | Name of the existing EKS cluster | `""` |
| `eks_oidc_provider_arn` | OIDC provider ARN for the EKS cluster | `""` |
| `name` | Name of the EMR virtual cluster | `""` |
| `namespace` | Kubernetes namespace for EMR on EKS | `"emr-containers"` |
| `create_namespace` | Create the Kubernetes namespace | `true` |
| `create_iam_role` | Create the job execution IAM role | `true` |
| `create_kubernetes_role` | Create the Kubernetes Role and RoleBinding | `true` |
| `s3_bucket_arns` | S3 bucket ARNs the job execution role can read/write | `[]` |
| `role_name` | Name to use for the IAM role and Kubernetes RBAC role | `null` |
| `iam_role_additional_policies` | Additional IAM policies to attach to the job execution role | `{}` |

Examples

This example creates a virtual cluster with explicit names and a dedicated namespace, and fully configures the job execution IAM role:
module "emr_virtual_cluster" {
  source = "terraform-aws-modules/emr/aws//modules/virtual-cluster"

  eks_cluster_name      = "example"
  eks_oidc_provider_arn = "arn:aws:iam::012345678901:oidc-provider/eks-example"

  name             = "emr-custom"
  create_namespace = true
  namespace        = "emr-custom"

  create_iam_role = true
  s3_bucket_arns = [
    "arn:aws:s3:::my-elasticmapreduce-bucket",
    "arn:aws:s3:::my-elasticmapreduce-bucket/*",
  ]
  role_name                     = "emr-custom-role"
  iam_role_use_name_prefix      = false
  iam_role_path                 = "/"
  iam_role_description          = "EMR custom Role"
  iam_role_permissions_boundary = null
  iam_role_additional_policies  = {}

  tags = {
    Terraform   = "true"
    Environment = "dev"
  }
}

CloudWatch logging

The module creates a CloudWatch log group by default using the name pattern /emr-on-eks-logs/emr-workload/<NAMESPACE> and retains logs for 7 days. You can adjust retention, supply a KMS key for encryption, or disable creation entirely if you manage the log group externally:
| Variable | Description | Default |
|---|---|---|
| `create_cloudwatch_log_group` | Create the CloudWatch log group | `true` |
| `cloudwatch_log_group_name` | Override the default log group name | `null` |
| `cloudwatch_log_group_retention_in_days` | Log retention period in days | `7` |
| `cloudwatch_log_group_kms_key_id` | KMS key ARN to encrypt the log group | `null` |
| `cloudwatch_log_group_class` | Log class: `STANDARD` or `INFREQUENT_ACCESS` | `null` |
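For example, extending the module call to retain logs longer and encrypt them with a customer-managed key might look like this sketch; the `aws_kms_key.logs` resource is an assumption, not created by the module:

```hcl
module "emr_virtual_cluster" {
  source = "terraform-aws-modules/emr/aws//modules/virtual-cluster"

  eks_cluster_name      = "example"
  eks_oidc_provider_arn = "arn:aws:iam::012345678901:oidc-provider/eks-example"

  # Keep logs for 30 days and encrypt with an assumed pre-existing KMS key
  cloudwatch_log_group_retention_in_days = 30
  cloudwatch_log_group_kms_key_id        = aws_kms_key.logs.arn
}
```

Set `create_cloudwatch_log_group = false` instead if the log group is managed outside this configuration.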