Access Resources From Cloud
Accessing Cloud Provider Services (S3, Google Cloud Storage, Amazon CloudWatch, etc.)
Anyscale natively runs clusters with a specific identity. This identity can be used to grant clusters access to specific resources. The exact approach depends on where your Anyscale clusters are running.
Running on AWS [EC2]
On EC2, Anyscale clusters run with the following role:
arn:aws:iam::<your_aws_account_id>:instance-profile/ray-autoscaler-v1
To ensure that all nodes run with this role, add the following to the Advanced Configuration of a Compute Configuration. Without it, only the head node runs with this role.
{
  "IamInstanceProfile": {
    "Arn": "arn:aws:iam::<aws_account_id>:instance-profile/ray-autoscaler-v1"
  }
}
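If you manage compute configurations for several AWS accounts, the snippet above can be generated programmatically. A minimal sketch — the helper name and example account ID are ours, not part of any Anyscale API:

```python
import json

def instance_profile_config(aws_account_id: str) -> dict:
    """Build the Advanced Configuration entry that makes every node
    (not just the head node) run with the ray-autoscaler-v1 instance profile."""
    return {
        "IamInstanceProfile": {
            "Arn": f"arn:aws:iam::{aws_account_id}:instance-profile/ray-autoscaler-v1"
        }
    }

# Paste the printed JSON into the Advanced Configuration field.
print(json.dumps(instance_profile_config("123456789012"), indent=2))
```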
Next, configure the target resource to grant this identity access. As an example, here is a bucket policy granting Read/Write access to an S3 bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowClusterOnAnyscaleS3Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<aws_account_id>:role/ray-autoscaler-v1"
      },
      "Action": "s3:*Object*",
      "Resource": "arn:aws:s3:::BucketName/*"
    },
    {
      "Sid": "AllowClusterOnAnyscaleS3ListBucket",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<aws_account_id>:role/ray-autoscaler-v1"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::BucketName"
    }
  ]
}
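The same policy shape works for any role ARN and bucket name, so it can be templated rather than hand-edited. A minimal sketch — the helper name is ours:

```python
import json

def bucket_policy(role_arn: str, bucket: str) -> str:
    """Bucket policy granting a role object-level get/put/delete
    (s3:*Object*) plus s3:ListBucket on the bucket itself."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowClusterOnAnyscaleS3Access",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:*Object*",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "AllowClusterOnAnyscaleS3ListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

Note that object actions apply to `arn:aws:s3:::<bucket>/*` while `s3:ListBucket` applies to the bucket ARN itself; attaching both statements to the wrong resource is a common source of AccessDenied errors.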
Running on Anyscale [AWS]
By default, clusters run with the following role, which has full S3 access for external accounts. Ask your Anyscale contact for the account_id to use:
arn:aws:sts::<Anyscale's AWS account_id>:assumed-role/<cloud_id>-cluster_node_role/
To grant this role access to list, get, and put objects in a bucket named "BucketName", attach a bucket policy like the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowClusterOnAnyscaleS3ModifyObjects",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<Anyscale's AWS account_id>:role/<cloud_id>-cluster_node_role"
      },
      "Action": "s3:*Object*",
      "Resource": "arn:aws:s3:::BucketName/*"
    },
    {
      "Sid": "AllowClusterOnAnyscaleS3ListBucket",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<Anyscale's AWS account_id>:role/<cloud_id>-cluster_node_role"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::BucketName"
    }
  ]
}
Running on GCP
On GCP, the underlying Service Account changes between generations of the underlying infrastructure. It will be of the form gke-service-account@<project_id>, where project_id is the ID shown in the GCP console. To determine the current service account, you can run the following snippet:
python -c "import google.auth.transport.requests; c,_=google.auth.default(); c.refresh(google.auth.transport.requests.Request()); print(c.service_account_email)"
To install the Google Cloud SDK (which includes gsutil) on a head node:
wget -qO- https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-359.0.0-linux-x86_64.tar.gz | tar xvz
You can then use gsutil, for example:
./google-cloud-sdk/bin/gsutil cp <file> gs://<bucket>
To grant this Service Account access to a Google Cloud Storage bucket:
1. Go to the "Permissions" tab of the bucket.
2. Click "Add".
3. Enter the Service Account email as a "New principal".
4. Select the roles to grant ("Storage Object Admin" and "Storage Object Viewer" together provide full read/write/list access).
5. Click "Save".
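Under the hood, these console steps add IAM role bindings on the bucket. A minimal sketch of the bindings they produce — the helper name and example email are illustrative; actually applying them would go through the console as above, the google-cloud-storage client, or gsutil:

```python
def storage_bindings(service_account_email: str) -> list:
    """IAM bindings equivalent to the console steps above:
    Storage Object Admin + Storage Object Viewer for one service account."""
    member = f"serviceAccount:{service_account_email}"
    return [
        {"role": "roles/storage.objectAdmin", "members": [member]},
        {"role": "roles/storage.objectViewer", "members": [member]},
    ]
```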
If you install gsutil via pip, you may need to add the following to ~/.boto:
[GoogleCompute]
service_account = default
This can be done by running:
printf "[GoogleCompute]\nservice_account = default\n" > /home/ray/.boto