EMR Studio is a web-based IDE for running Spark, PySpark, and SparkSQL workloads via Jupyter notebooks. It supports two authentication modes: IAM Identity Center (SSO) and IAM. The studio submodule is at modules/studio. Source it as terraform-aws-modules/emr/aws//modules/studio.

VPC and subnet requirements

EMR Studio requires a VPC and at least one subnet. The studio itself is not deployed inside a VPC, but it uses the VPC to attach workspace and engine security groups for connectivity to EMR clusters and Spark applications. You must provide vpc_id and subnet_ids. For high availability, supply subnets across multiple availability zones.
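
The examples on this page read vpc_id and subnet_ids from a module.vpc that is not shown here. As a minimal sketch, assuming the community terraform-aws-modules/vpc/aws module is used, a VPC with private subnets in several availability zones could look like the following. The name, region, CIDR ranges, and NAT settings are placeholder assumptions, not values taken from the EMR module.

```hcl
# Hypothetical VPC for EMR Studio; all names and ranges are placeholders.
# Pin a module version in real use.
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "example-emr-studio"
  cidr = "10.0.0.0/16"

  # Subnets spread across three AZs, matching the high-availability advice above
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  # A single NAT gateway gives the private subnets outbound internet access
  enable_nat_gateway = true
  single_nat_gateway = true
}
```

With this in place, module.vpc.vpc_id, module.vpc.private_subnets, and module.vpc.vpc_cidr_block resolve to the values that the studio examples expect.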

Authentication

SSO mode integrates with AWS IAM Identity Center. You map IAM Identity Center groups or users to the studio via session_mappings. Each mapping grants the identity a specific EMR Studio session policy. This example looks up an existing IAM Identity Center group named AWSControlTowerAdmins and maps it to the studio:
data "aws_ssoadmin_instances" "this" {}

data "aws_identitystore_group" "this" {
  identity_store_id = one(data.aws_ssoadmin_instances.this.identity_store_ids)

  alternate_identifier {
    unique_attribute {
      attribute_path  = "DisplayName"
      attribute_value = "AWSControlTowerAdmins"
    }
  }
}

module "emr_studio_sso" {
  source = "terraform-aws-modules/emr/aws//modules/studio"

  name                = "example-sso"
  description         = "EMR Studio using SSO authentication"
  auth_mode           = "SSO"
  default_s3_location = "s3://${module.s3_bucket.s3_bucket_id}/example"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  session_mappings = {
    admin_group = {
      identity_type = "GROUP"
      identity_id   = data.aws_identitystore_group.this.group_id
    }
  }

  tags = local.tags
}
The following example configures a fully customized SSO studio with explicit service and user role settings, security group rules, and S3 bucket scoping:
module "emr_studio_complete" {
  source = "terraform-aws-modules/emr/aws//modules/studio"

  name                = "example-complete"
  description         = "EMR Studio using SSO authentication"
  auth_mode           = "SSO"
  default_s3_location = "s3://${module.s3_bucket.s3_bucket_id}/complete"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  session_mappings = {
    admin_group = {
      identity_type = "GROUP"
      identity_id   = data.aws_identitystore_group.this.group_id
    }
  }

  # Service role
  service_role_name        = "example-complete-service"
  service_role_path        = "/complete/"
  service_role_description = "EMR Studio complete service role"
  service_role_tags        = { service = true }
  service_role_s3_bucket_arns = [
    module.s3_bucket.s3_bucket_arn,
    "${module.s3_bucket.s3_bucket_arn}/complete/*"
  ]

  # User role
  user_role_name        = "example-complete-user"
  user_role_path        = "/complete/"
  user_role_description = "EMR Studio complete user role"
  user_role_tags        = { user = true }
  user_role_s3_bucket_arns = [
    module.s3_bucket.s3_bucket_arn,
    "${module.s3_bucket.s3_bucket_arn}/complete/*"
  ]

  # Security groups
  security_group_name = "example-complete"
  security_group_tags = { complete = true }

  engine_security_group_description = "EMR Studio complete engine security group"
  engine_security_group_egress_rules = {
    example = {
      description = "Egress to VPC network"
      from_port   = 443
      to_port     = 443
      ip_protocol = "tcp"
      cidr_ipv4   = module.vpc.vpc_cidr_block
    }
  }

  workspace_security_group_description = "EMR Studio complete workspace security group"
  workspace_security_group_egress_rules = {
    example = {
      description = "Egress to internet"
      from_port   = 443
      to_port     = 443
      ip_protocol = "tcp"
      cidr_ipv4   = "0.0.0.0/0"
    }
  }

  tags = local.tags
}
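
The examples above cover SSO mode only. For IAM mode, a minimal sketch might look like the following, assuming the submodule accepts auth_mode = "IAM" with the same required arguments as the SSO example. The studio name and S3 prefix are illustrative, and session_mappings is left out because it does not apply to IAM authentication.

```hcl
# Sketch of an IAM-authenticated studio; the name and S3 prefix are placeholders.
module "emr_studio_iam" {
  source = "terraform-aws-modules/emr/aws//modules/studio"

  name                = "example-iam"
  description         = "EMR Studio using IAM authentication"
  auth_mode           = "IAM"
  default_s3_location = "s3://${module.s3_bucket.s3_bucket_id}/example"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # No session_mappings here: they are only used when auth_mode = "SSO"

  tags = local.tags
}
```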

Workspace storage

The default_s3_location sets the default S3 path where EMR Studio stores workspace files and notebook outputs. Provide a path that includes a prefix, for example "s3://my-bucket/studio-workspaces". The S3 bucket should block public access and use server-side encryption. The full example creates a suitable bucket:
module "s3_bucket" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "~> 5.0"

  bucket_prefix = "${local.name}-"
  force_destroy = true

  attach_deny_insecure_transport_policy = true
  attach_require_latest_tls_policy      = true

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true

  server_side_encryption_configuration = {
    rule = {
      apply_server_side_encryption_by_default = {
        sse_algorithm = "AES256"
      }
    }
  }
}

Session mappings

Session mappings control which IAM Identity Center users or groups can access the studio and what permissions they have. Each mapping entry specifies an identity_type (USER or GROUP) and the corresponding identity_id from the IAM Identity Center identity store:
session_mappings = {
  admin_group = {
    identity_type = "GROUP"
    identity_id   = data.aws_identitystore_group.this.group_id
  }
  # Add additional user or group mappings as needed
}
Session mappings are only applicable when auth_mode = "SSO". They are ignored for IAM authentication mode.
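
The examples on this page map a group. To illustrate the USER identity type, the sketch below looks up a single IAM Identity Center user with the AWS provider's aws_identitystore_user data source and maps that user to the studio. The user name, the mapping key analyst_user, and the studio name are hypothetical.

```hcl
# Already declared in the SSO example; repeated so this sketch stands alone.
data "aws_ssoadmin_instances" "this" {}

# Look up one user by user name (hypothetical value).
data "aws_identitystore_user" "analyst" {
  identity_store_id = one(data.aws_ssoadmin_instances.this.identity_store_ids)

  alternate_identifier {
    unique_attribute {
      attribute_path  = "UserName"
      attribute_value = "jane.doe@example.com"
    }
  }
}

module "emr_studio_user_mapping" {
  source = "terraform-aws-modules/emr/aws//modules/studio"

  name                = "example-user-mapping"
  auth_mode           = "SSO"
  default_s3_location = "s3://${module.s3_bucket.s3_bucket_id}/example"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  session_mappings = {
    # USER mapping: identity_id is the user's ID in the identity store
    analyst_user = {
      identity_type = "USER"
      identity_id   = data.aws_identitystore_user.analyst.user_id
    }
  }
}
```

User and group entries can sit side by side in the same session_mappings map, each under its own key.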