Access Resources From Cloud
Accessing Cloud Provider Services (S3, Google Cloud Storage, Amazon CloudWatch, etc.)
Anyscale natively runs clusters with a specific identity. This identity can be used to grant clusters access to specific resources. The exact approach depends on where your Anyscale clusters are running.
Running on AWS [EC2]
On EC2, Anyscale clusters run with the following role:
arn:aws:iam::<your_aws_account_id>:instance-profile/ray-autoscaler-v1
To ensure that all nodes run with this role, add the following to the Advanced Configuration of a Compute Configuration. Without it, only the head node runs with this role.
{
  "IamInstanceProfile": {
    "Arn": "arn:aws:iam::<aws_account_id>:instance-profile/ray-autoscaler-v1"
  }
}
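If you manage compute configurations for several AWS accounts, the snippet above can be generated programmatically. A minimal sketch — the helper name and example account ID are ours, not part of any Anyscale API:

```python
import json

def instance_profile_config(aws_account_id: str) -> dict:
    """Build the Advanced Configuration entry that makes every node
    (not just the head node) run with the ray-autoscaler-v1 instance profile."""
    return {
        "IamInstanceProfile": {
            "Arn": f"arn:aws:iam::{aws_account_id}:instance-profile/ray-autoscaler-v1"
        }
    }

# Paste the printed JSON into the Advanced Configuration field.
print(json.dumps(instance_profile_config("123456789012"), indent=2))
```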
Next, configure the target resource to grant this identity access. As an example, here is a bucket policy granting Read/Write access to an S3 bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowClusterOnAnyscaleS3Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<aws_account_id>:role/ray-autoscaler-v1"
      },
      "Action": "s3:*Object*",
      "Resource": "arn:aws:s3:::BucketName/*"
    },
    {
      "Sid": "AllowClusterOnAnyscaleS3ListBucket",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<aws_account_id>:role/ray-autoscaler-v1"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::BucketName"
    }
  ]
}
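The same policy shape works for any role ARN and bucket name, so it can be templated rather than hand-edited. A minimal sketch — the helper name is ours:

```python
import json

def bucket_policy(role_arn: str, bucket: str) -> str:
    """Bucket policy granting a role object-level get/put/delete
    (s3:*Object*) plus s3:ListBucket on the bucket itself."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowClusterOnAnyscaleS3Access",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:*Object*",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "AllowClusterOnAnyscaleS3ListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

Note that object actions apply to `arn:aws:s3:::<bucket>/*` while `s3:ListBucket` applies to the bucket ARN itself; attaching both statements to the wrong resource is a common source of AccessDenied errors.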
Running on Anyscale [AWS]
By default, clusters run with the following role, which has full S3 access for external accounts. Ask your Anyscale contact for the account_id to use:
arn:aws:sts::<Anyscale's AWS account_id>:assumed-role/<cloud_id>-cluster_node_role/
To grant this role access to list, get, and put objects in a bucket named "BucketName", attach a bucket policy like the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowClusterOnAnyscaleS3ModifyObjects",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<Anyscale's AWS account_id>:role/<cloud_id>-cluster_node_role"
      },
      "Action": "s3:*Object*",
      "Resource": "arn:aws:s3:::BucketName/*"
    },
    {
      "Sid": "AllowClusterOnAnyscaleS3ListBucket",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<Anyscale's AWS account_id>:role/<cloud_id>-cluster_node_role"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::BucketName"
    }
  ]
}
Running on GCP
On GCP, the underlying Service Account changes between generations of the underlying infrastructure. It will be of the form gke-service-account@<project_id>, where project_id is the ID shown in the GCP console. To determine the current service account, you can run the following snippet:
python -c "import google.auth.transport.requests; c,_=google.auth.default(); c.refresh(google.auth.transport.requests.Request()); print(c.service_account_email)"
To install the Google Cloud SDK (which includes gsutil) on a head node:
wget -qO- https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-359.0.0-linux-x86_64.tar.gz | tar xvz
You can then use gsutil, for example:
./google-cloud-sdk/bin/gsutil cp <file> gs://<bucket>
To grant this Service Account access to a Google Cloud Storage bucket:
1. Go to the "Permissions" tab of the bucket.
2. Click "Add".
3. Enter the Service Account email as a "New principal".
4. Select the roles to grant ("Storage Object Admin" and "Storage Object Viewer" together provide full read/write/list access).
5. Click "Save".
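Under the hood, these console steps add IAM role bindings on the bucket. A minimal sketch of the bindings they produce — the helper name and example email are illustrative; actually applying them would go through the console as above, the google-cloud-storage client, or gsutil:

```python
def storage_bindings(service_account_email: str) -> list:
    """IAM bindings equivalent to the console steps above:
    Storage Object Admin + Storage Object Viewer for one service account."""
    member = f"serviceAccount:{service_account_email}"
    return [
        {"role": "roles/storage.objectAdmin", "members": [member]},
        {"role": "roles/storage.objectViewer", "members": [member]},
    ]
```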
If you install gsutil via pip, you may need to add the following to ~/.boto:
[GoogleCompute]
service_account = default
This can be done by running:
printf "[GoogleCompute]\nservice_account = default\n" > /home/ray/.boto