Migrate from OSS
Migrating your Ray projects onto Anyscale
When moving your Ray project onto Anyscale, only a few small changes are needed.
Prerequisites
You have onboarded to the Anyscale platform.
You or an admin in your organization has already configured a cloud.
Migration steps
Create an Anyscale project
From within the directory containing your project files, run
anyscale init
and give your project a name. This directory is now associated with the Anyscale project you created.
Convert your application dependencies
If you require Debian packages, create a cluster environment. You can do this in the UI, API, or SDK. Make note of the cluster environment build name that gets created, for example:
my_cluster_env:1
Your pip and conda dependencies can be declared in a runtime environment, just as in OSS Ray. If they already are, you do not need to make any changes.
Your project directory is automatically uploaded to the cluster, so you can connect to the cluster and import any of your local modules.
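As a concrete illustration (the package name here is only an example), a runtime environment declaring pip dependencies looks the same on Anyscale as in open-source Ray:

```python
# A runtime environment with pip dependencies, unchanged from OSS Ray.
# The package list is illustrative.
runtime_env = {
    "pip": ["chess"],  # installed on the cluster when you connect
}

# Pass it to ray.init as usual (the cluster name is a placeholder):
# ray.init("anyscale://my-cluster", runtime_env=runtime_env)
```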
Convert your compute configuration
You can create a cluster compute configuration using the UI, API, or SDK. This configuration defines the machines your cluster runs on.
Take the provider config from your cluster config and set those values in the cluster compute config.
Take the available_node_types config and set those values in the cluster compute config.
Advanced options for AWS can be placed in the aws field in the cluster compute config, or in the "Advanced configurations" box when using the UI.
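As a sketch of the mapping (field names mirror the SDK example later on this page; the cloud ID is a placeholder), the provider and node-type values from an OSS cluster.yaml land in the cluster compute config roughly like this:

```python
# Sketch: where OSS cluster.yaml values go in a cluster compute config.
# cloud_id is a placeholder; the other values mirror the example in this doc.
cluster_compute = {
    "cloud_id": "cld_PLACEHOLDER",
    "region": "us-west-2",                 # from provider.region
    "allowed_azs": ["us-west-2a"],         # from provider.availability_zone
    "head_node_type": {
        "name": "head_node_type",
        "instance_type": "m4.16xlarge",    # from head_node.InstanceType
    },
    "worker_node_types": [
        {
            "name": "worker_node_1",
            "instance_type": "m4.16xlarge",  # from worker_nodes.InstanceType
            "use_spot": True,                # from InstanceMarketOptions MarketType: spot
            "min_workers": 2,                # from top-level min_workers / max_workers
            "max_workers": 2,
        }
    ],
    # Advanced AWS options pass through in the "aws" field:
    "aws": {
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 50}},
        ],
    },
}
```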
Connect to an Anyscale cluster
Now, you just need to connect to an Anyscale cluster instead of a cluster that you manage yourself. There are two ways to do it:
No code change: set your
RAY_ADDRESS
environment variable to anyscale://my-cluster?cluster_env=my_cluster_env:1&cluster_compute=my_compute_config
RAY_ADDRESS=111.222.101.202 python your_script.py
# change to...
RAY_ADDRESS=anyscale://my-cluster python your_script.py
1-line code change: in your source code, replace the address in your Ray connect call with
anyscale://
and pass in the environments created in the previous steps by name. For example:
ray.client("anyscale://my-cluster") \
    .cluster_env("my_cluster_env:1") \
    .cluster_compute("my_compute_config") \
    .connect()
import ray
ray.init(address="anyscale://my-cluster")
If you wish to re-use an existing cluster, or want to deploy a long-running service that you will modify in the future, you can provide a cluster name in the address: for example, set RAY_ADDRESS=anyscale://my-cluster
in your environment, or provide it explicitly in code with ray.init("anyscale://my-cluster").
Your script will then create a new cluster with that name, or connect to it if it already exists.
Concepts
In Anyscale, the monolithic cluster config has been split into two configurations: a cluster environment and a cluster compute configuration. Cluster environments are automatically built into an image for quick and easy re-use across clusters.
Anyscale will manage the cluster's lifecycle for you. It will launch clusters when needed and shut them down if they have not been used for a while (this timeout is configurable, down to the second). You can also manually modify the cluster using the UI, API, or SDK.
Anyscale's API is focused on your code: the primary entry point is to "connect" via the Python SDK and then execute a job or deploy a service. The UI is organized around the same concepts, and gives you the ability to monitor your clusters for debugging, cost tracking, and advanced use cases.
Example
Before
# cluster.yaml
cluster_name: default
min_workers: 2
max_workers: 2
upscaling_speed: 1.0
idle_timeout_minutes: 5
provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2a
auth:
    ssh_user: ubuntu
head_node:
    InstanceType: m4.16xlarge
    ImageId: ami-0def3275
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 50
worker_nodes:
    InstanceType: m4.16xlarge
    ImageId: ami-0def3275
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 50
    InstanceMarketOptions:
        MarketType: spot
setup_commands:
    - sudo apt-get update
    - sudo apt-get install -y build-essential curl unzip
head_setup_commands: []
worker_setup_commands: []
head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml
worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379
# deploy.py
import ray
from app import do_work

runtime_env = {
    "pip": [
        "chess"
    ]
}

ray.init("anyscale://", runtime_env=runtime_env)
do_work.remote()
After
# setup.py
from anyscale.sdk.anyscale_client.sdk import AnyscaleSDK

anyscale_sdk = AnyscaleSDK("sss_authtoken")

# Create cluster environment
anyscale_sdk.create_app_config({
    "name": "my_cluster_env",
    "config_json": {
        "base_image": "anyscale/ray-ml:1.4.0-py37",
        "debian_packages": ["build-essential", "curl", "unzip"],
    }
})

# Create cluster compute config
anyscale_sdk.create_compute_template({
    "name": "my_cluster_compute_config",
    "config": {
        "cloud_id": "cld_4F7k8814aZzGG8TNUGPKnc",
        "region": "us-west-2",
        "allowed_azs": ["us-west-2a"],
        "head_node_type": {
            "name": "head_node_type",
            "instance_type": "m4.16xlarge"
        },
        "worker_node_types": [
            {
                "name": "worker_node_1",
                "instance_type": "m4.16xlarge",
                "use_spot": True,
                "min_workers": 2,
                "max_workers": 2,
            }
        ],
        "aws": {
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "VolumeSize": 50,
                    },
                },
            ],
        }
    }
})
No changes to the deploy.py file, just changes to how you run it:
RAY_ADDRESS="anyscale://my_cluster?cluster_env=my_cluster_env:1&cluster_compute=my_cluster_compute_config" python deploy.py