Secure access to S3 buckets using instance profiles

An IAM role is an AWS identity with permission policies that determine what the identity can and cannot do in AWS. An instance profile is a container for an IAM role that you can use to pass the role information to an EC2 instance when the instance starts.

In order to access AWS resources securely, you can launch Databricks clusters with instance profiles that allow you to access your data from Databricks clusters without having to embed your AWS keys in notebooks. This article explains how to set up instance profiles and use them in Databricks to access S3 buckets securely.

Note

An alternative to using instance profiles for access to S3 buckets from Databricks clusters is IAM credential passthrough, which passes an individual user's IAM role to Databricks and uses that IAM role to determine access to data in S3. This allows multiple users with different data access policies to share a Databricks cluster. Instance profiles, by contrast, are associated with only one IAM role, which requires that all users of a Databricks cluster share that role and its data access policies. For more information, see Access S3 buckets using IAM credential passthrough with Databricks SCIM.

Requirements

  • AWS administrator access to IAM roles and policies in the AWS account of the Databricks deployment and the AWS account of the S3 bucket.

  • Target S3 bucket. This bucket must belong to the same AWS account as the Databricks deployment or there must be a cross-account bucket policy that allows access to this bucket from the AWS account of the Databricks deployment.

  • If you intend to enable encryption for the S3 bucket, you must add the IAM role as a Key User for the KMS key provided in the configuration. See Configure KMS encryption for s3a:// paths.

Step 1: Create an instance profile to access an S3 bucket

  1. In the AWS console, go to the IAM service.

  2. Click the Roles tab in the sidebar.

  3. Click Create role.

    1. Under Select type of trusted entity, select AWS service.

    2. Under Choose the service that will use this role, select EC2.

      Select service

    3. Click Next: Permissions, Next: Tags, and Next: Review.

    4. In the Role name field, type a role name.

    5. Click Create role. The list of roles displays.

  4. In the role list, click the role.

  5. Add an inline policy to the role. This policy grants access to the S3 bucket.

    1. In the Permissions tab, click Add inline policy.

    2. Click the JSON tab.

    3. Copy this policy and set <s3-bucket-name> to the name of your bucket.

       {
         "Version": "2012-10-17",
         "Statement": [
           {
             "Effect": "Allow",
             "Action": ["s3:ListBucket"],
             "Resource": ["arn:aws:s3:::<s3-bucket-name>"]
           },
           {
             "Effect": "Allow",
             "Action": [
               "s3:PutObject",
               "s3:GetObject",
               "s3:DeleteObject",
               "s3:PutObjectAcl"
             ],
             "Resource": ["arn:aws:s3:::<s3-bucket-name>/*"]
           }
         ]
       }
    4. Click Review policy.

    5. In the Name field, type a policy name.

    6. Click Create policy.

  6. In the role summary, copy the Instance Profile ARN.

    Instance profile ARN
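If you manage many buckets, you may prefer to generate the Step 1 inline policy programmatically rather than hand-edit the JSON for each bucket. A minimal Python sketch (the `s3_access_policy` helper name is our own, not part of any AWS SDK):

```python
import json

def s3_access_policy(bucket_name):
    """Build the Step 1 inline IAM policy for a given S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # List the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Read/write/delete objects inside the bucket
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObjectAcl",
                ],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }

print(json.dumps(s3_access_policy("my-bucket"), indent=2))
```

Note how bucket-level actions (ListBucket) target the bucket ARN while object-level actions target `<bucket>/*`; mixing these up is a common cause of AccessDenied errors.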

Step 2: Create a bucket policy for the target S3 bucket

At a minimum, the S3 policy must include the ListBucket and GetObject actions.

Important

The s3:PutObjectAcl permission is required if you perform Step 7: Update cross-account S3 object ACLs to configure the bucket owner to have access to all of the data in the bucket.

Bucket policy

  1. Paste in a policy. A sample cross-account bucket IAM policy could be the following, replacing <aws-account-id-databricks> with the AWS account ID where the Databricks environment is deployed, <iam-role-for-s3-access> with the role you created in Step 1, and <s3-bucket-name> with the bucket name.

     {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Sid": "Example permissions",
           "Effect": "Allow",
           "Principal": {
             "AWS": "arn:aws:iam::<aws-account-id-databricks>:role/<iam-role-for-s3-access>"
           },
           "Action": [
             "s3:GetBucketLocation",
             "s3:ListBucket"
           ],
           "Resource": "arn:aws:s3:::<s3-bucket-name>"
         },
         {
           "Effect": "Allow",
           "Principal": {
             "AWS": "arn:aws:iam::<aws-account-id-databricks>:role/<iam-role-for-s3-access>"
           },
           "Action": [
             "s3:PutObject",
             "s3:GetObject",
             "s3:DeleteObject",
             "s3:PutObjectAcl"
           ],
           "Resource": "arn:aws:s3:::<s3-bucket-name>/*"
         }
       ]
     }
  2. Click Save.
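The bucket policy above can likewise be built from its three inputs, which makes the account-ID and role-name substitutions harder to get wrong. A sketch (the `cross_account_bucket_policy` helper is our own; the generated document could then be applied outside the console with boto3's `put_bucket_policy`):

```python
import json

def cross_account_bucket_policy(databricks_account_id, role_name, bucket_name):
    """Build the Step 2 cross-account bucket policy for the given inputs."""
    principal_arn = f"arn:aws:iam::{databricks_account_id}:role/{role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Example permissions",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObjectAcl",
                ],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }

# Applying it would look roughly like:
#   boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
print(json.dumps(cross_account_bucket_policy("123456789012", "my-s3-role", "my-bucket"), indent=2))
```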

Step 3: Note the IAM role used to create the Databricks deployment

This IAM role is the role you used when you set up the Databricks account.

If you are on an E2 account:

  1. As the account owner or an account admin, log in to the account console.

  2. Go to Workspaces and click your workspace name.

  3. In the Credentials box, note the role name at the end of the Role ARN.

    For example, in the Role ARN arn:aws:iam::123456789123:role/finance-prod , finance-prod is the role name.

If you are not on an E2 account:

  1. As the account owner, log in to the account console.

  2. Click the AWS Account tab.

  3. Note the role name at the end of the Role ARN, here testco-role.

    IAM role
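The role name is simply the segment of the Role ARN after the final `/`, so it can be extracted mechanically; a small sketch using the document's own example ARN (the `role_name_from_arn` helper name is ours):

```python
def role_name_from_arn(role_arn):
    """Return the role-name portion of an IAM Role ARN.

    ARNs look like arn:aws:iam::123456789123:role/finance-prod;
    rsplit also handles roles that include a path before the name.
    """
    return role_arn.rsplit("/", 1)[-1]

print(role_name_from_arn("arn:aws:iam::123456789123:role/finance-prod"))  # finance-prod
```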

Step 4: Add the S3 IAM role to the EC2 policy

  1. In the AWS console, go to the IAM service.

  2. Click the Roles tab in the sidebar.

  3. Click the role you noted in Step 3.

  4. On the Permissions tab, click the policy.

  5. Click Edit Policy.

  6. Modify the policy to allow Databricks to pass the IAM role you created in Step 1 to the EC2 instances for the Spark clusters. Here is an example of what the new policy should look like. Replace <iam-role-for-s3-access> with the role you created in Step 1:

     {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Sid": "Stmt1403287045000",
           "Effect": "Allow",
           "Action": [
             "ec2:AssociateDhcpOptions",
             "ec2:AssociateIamInstanceProfile",
             "ec2:AssociateRouteTable",
             "ec2:AttachInternetGateway",
             "ec2:AttachVolume",
             "ec2:AuthorizeSecurityGroupEgress",
             "ec2:AuthorizeSecurityGroupIngress",
             "ec2:CancelSpotInstanceRequests",
             "ec2:CreateDhcpOptions",
             "ec2:CreateInternetGateway",
             "ec2:CreateKeyPair",
             "ec2:CreateRoute",
             "ec2:CreateSecurityGroup",
             "ec2:CreateSubnet",
             "ec2:CreateTags",
             "ec2:CreateVolume",
             "ec2:CreateVpc",
             "ec2:CreateVpcPeeringConnection",
             "ec2:DeleteInternetGateway",
             "ec2:DeleteKeyPair",
             "ec2:DeleteRoute",
             "ec2:DeleteRouteTable",
             "ec2:DeleteSecurityGroup",
             "ec2:DeleteSubnet",
             "ec2:DeleteTags",
             "ec2:DeleteVolume",
             "ec2:DeleteVpc",
             "ec2:DescribeAvailabilityZones",
             "ec2:DescribeIamInstanceProfileAssociations",
             "ec2:DescribeInstanceStatus",
             "ec2:DescribeInstances",
             "ec2:DescribePrefixLists",
             "ec2:DescribeReservedInstancesOfferings",
             "ec2:DescribeRouteTables",
             "ec2:DescribeSecurityGroups",
             "ec2:DescribeSpotInstanceRequests",
             "ec2:DescribeSpotPriceHistory",
             "ec2:DescribeSubnets",
             "ec2:DescribeVolumes",
             "ec2:DescribeVpcs",
             "ec2:DetachInternetGateway",
             "ec2:DisassociateIamInstanceProfile",
             "ec2:ModifyVpcAttribute",
             "ec2:ReplaceIamInstanceProfileAssociation",
             "ec2:RequestSpotInstances",
             "ec2:RevokeSecurityGroupEgress",
             "ec2:RevokeSecurityGroupIngress",
             "ec2:RunInstances",
             "ec2:TerminateInstances"
           ],
           "Resource": ["*"]
         },
         {
           "Effect": "Allow",
           "Action": "iam:PassRole",
           "Resource": "arn:aws:iam::<aws-account-id-databricks>:role/<iam-role-for-s3-access>"
         },
         {
           "Effect": "Allow",
           "Action": [
             "iam:CreateServiceLinkedRole",
             "iam:PutRolePolicy"
           ],
           "Resource": "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
           "Condition": {
             "StringLike": {
               "iam:AWSServiceName": "spot.amazonaws.com"
             }
           }
         }
       ]
     }

    Note

    If your account is on the E2 version of the Databricks platform, you can omit ec2:CreateKeyPair and ec2:DeleteKeyPair . If you are not sure of your account's version, contact your Databricks representative.

  7. Click Review policy.

  8. Click Save changes.
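The critical addition in Step 4 is the iam:PassRole statement scoped to the Step 1 role; without it, cluster launches with the instance profile fail. A small sketch for sanity-checking a policy document before saving it (the `allows_pass_role` helper is ours, and the check is deliberately simple: it looks for an exact resource match, ignoring wildcards):

```python
def allows_pass_role(policy, role_arn):
    """Return True if some Allow statement grants iam:PassRole on role_arn."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        # "Action" and "Resource" may each be a string or a list
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if "iam:PassRole" in actions and role_arn in resources:
            return True
    return False

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::123456789012:role/my-s3-role",
        }
    ],
}
print(allows_pass_role(policy, "arn:aws:iam::123456789012:role/my-s3-role"))  # True
```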

Step 5: Add the instance profile to Databricks

  1. Go to the Admin Console.

  2. Click the Instance Profiles tab.

  3. Click the Add Instance Profile button. A dialog appears.

  4. Paste in the instance profile ARN from Step 1.

    Add instance profile ARN

    You select the Meta Instance Profile property only when you are setting up IAM credential passthrough.

    Databricks validates that the instance profile ARN is both syntactically and semantically correct. To validate semantic correctness, Databricks does a dry run by launching a cluster with this instance profile. Any failure in this dry run produces a validation error in the UI. Validation of the instance profile can fail if the instance profile contains the tag-enforcement policy, preventing you from adding a legitimate instance profile. If the validation fails and you still want to add the instance profile, select the Skip Validation checkbox.

  5. Click Add.

  6. Optionally specify the users who can launch clusters with the instance profile.

    Add users
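Instance profiles can also be registered without the UI through the workspace's Instance Profiles REST API. A minimal sketch of the request body (the endpoint path and field names below reflect our reading of that API and should be verified against your workspace's REST API reference; the `add_instance_profile_payload` helper and the ARN are illustrative):

```python
import json

def add_instance_profile_payload(instance_profile_arn, skip_validation=False):
    """Request body for registering an instance profile via the REST API."""
    return {
        "instance_profile_arn": instance_profile_arn,
        # Mirrors the Skip Validation checkbox described above
        "skip_validation": skip_validation,
    }

# A real call would look roughly like (requires a workspace URL and token):
#   requests.post(f"{host}/api/2.0/instance-profiles/add",
#                 headers={"Authorization": f"Bearer {token}"},
#                 json=add_instance_profile_payload(arn))
print(json.dumps(add_instance_profile_payload(
    "arn:aws:iam::123456789012:instance-profile/my-profile", skip_validation=True)))
```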

Step 6: Launch a cluster with the instance profile

  1. Select or create a cluster.

  2. Open the Advanced Options section.

  3. On the Instances tab, select the instance profile from the Instance Profile drop-down list. This drop-down includes all of the instance profiles that are available for the cluster.

    Select instance profile

  4. Verify that you can access the S3 bucket, using the following command:

       dbutils.fs.ls("s3a://<s3-bucket-name>/")

    If the command succeeds, go to Step 7.

Warning

Once a cluster launches with an instance profile, anyone who has attach permission to the cluster can access the underlying resources controlled by this role. To limit unwanted access, you can use cluster ACLs to restrict attach permissions.

Step 7: Update cross-account S3 object ACLs

If you are writing to another S3 bucket within the same AWS account, you can stop here.

When you write to a file in a cross-account S3 bucket, the default setting allows only you to access that file. The assumption is that you will write files to your own buckets, and this default setting protects your data. To allow the bucket owner to have access to all of the objects in the bucket, you must add the BucketOwnerFullControl ACL to the objects written by Databricks.

  1. On the Spark tab on the cluster detail page, set the following properties:

                                        spark.hadoop.fs.s3a.acl.default BucketOwnerFullControl                                  
  2. Verify that you can write data to the S3 bucket, and check that the permissions enable other tools and users to access the contents written by Databricks.

Automated configuration using Terraform

You can use the Databricks Terraform provider to automatically configure AWS IAM roles and their cluster attachment. Here is a sample configuration, with other resources omitted for brevity:

    resource "aws_iam_role" "data_role" {
      name               = "${var.prefix}-first-ec2s3"
      description        = "(${var.prefix}) EC2 Assume Role role for S3 access"
      assume_role_policy = data.aws_iam_policy_document.assume_role_for_ec2.json
      tags               = var.tags
    }

    resource "aws_iam_instance_profile" "this" {
      name = "${var.prefix}-first-profile"
      role = aws_iam_role.data_role.name
    }

    resource "databricks_instance_profile" "ds" {
      instance_profile_arn = aws_iam_instance_profile.this.arn
    }

    data "databricks_node_type" "smallest" {
      local_disk = true
    }

    data "databricks_spark_version" "latest_lts" {
      long_term_support = true
    }

    resource "databricks_cluster" "shared_autoscaling" {
      cluster_name            = "Shared Autoscaling"
      spark_version           = data.databricks_spark_version.latest_lts.id
      node_type_id            = data.databricks_node_type.smallest.id
      autotermination_minutes = 20

      autoscale {
        min_workers = 1
        max_workers = 50
      }

      aws_attributes {
        instance_profile_arn = databricks_instance_profile.ds.id
      }
    }

Oft asked questions (FAQ)

I don't see any instance profiles configured for my access when I create a cluster.

If you are an admin, go to the Admin Console and follow the instructions in this article to add an instance profile. Otherwise, contact your admin, who can add an instance profile using the instructions in this article.

I am using mount points to store credentials. How do mount points work on clusters with instance profiles?

Existing mount points work as they do on clusters that don't use instance profiles. When you launch a cluster with an instance profile, you can also mount an S3 bucket without passing credentials, using:

    dbutils.fs.mount("s3a://${pathtobucket}", "/mnt/${MountPointName}")