Migrate from OSS

Migrating your Ray projects onto Anyscale

When moving your Ray project onto Anyscale, you just need to make a few small changes.

Prerequisites

Migration steps

Create an Anyscale project

  1. From within the directory containing your project files, run anyscale init and give your project a name. This directory is now associated with the Anyscale project you created.

Convert your application dependencies

  1. If you require Debian packages, create a cluster environment. You can do this through the UI, API, or SDK. Make note of the cluster environment build name that gets created, for example my_cluster_env:1.

  2. Your pip and conda dependencies can be declared in a runtime environment, just as in OSS Ray. If they are already declared this way, you do not need to make any changes.

  3. Your project directory is automatically uploaded to the cluster, so you can connect to the cluster and import any of your local modules.

You can also specify a working directory using the working_dir parameter in a runtime environment. This lets you customize which files are synced to your cluster.
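For example, here is a minimal sketch of declaring pip dependencies and a working directory in a runtime environment. The package list, paths, and cluster name are illustrative, and connecting with an anyscale:// address is covered in the next section:

    import ray

    runtime_env = {
        "working_dir": ".",   # sync the current project directory to the cluster
        "pip": ["chess"],     # pip dependencies, declared the same way as in OSS Ray
    }

    # Local modules from working_dir can now be imported in remote tasks.
    ray.init(address="anyscale://my-cluster", runtime_env=runtime_env)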

Convert your compute configuration

You can create a cluster compute configuration using the UI, API, or SDK. This configuration defines the cloud resources your cluster runs on and replaces the provider and node type sections of your OSS cluster config; see the sketch after these steps.

  1. Take the provider config from your cluster config and set those values in the cluster compute config.

  2. Take the available_node_types config and set those values in the cluster compute config.

  3. Advanced options for AWS can be placed in the aws field in the cluster compute config or in the "Advanced configurations" box when using the UI.
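As a sketch of how steps 1 through 3 map onto a cluster compute config created with the Python SDK (the names and the cloud_id value are placeholders; the full end-to-end version appears in the Example at the end of this page):

    from anyscale.sdk.anyscale_client.sdk import AnyscaleSDK

    sdk = AnyscaleSDK("sss_authtoken")  # your Anyscale auth token

    sdk.create_compute_template({
        "name": "my_compute_config",
        "config": {
            "cloud_id": "cld_123",  # placeholder; use the ID of your Anyscale cloud
            # Step 1: values taken from the provider section of cluster.yaml
            "region": "us-west-2",
            "allowed_azs": ["us-west-2a"],
            # Step 2: values taken from available_node_types
            "head_node_type": {"name": "head_node_type", "instance_type": "m4.16xlarge"},
            "worker_node_types": [
                {
                    "name": "worker_node_1",
                    "instance_type": "m4.16xlarge",
                    "min_workers": 2,
                    "max_workers": 2,
                    "use_spot": True,
                },
            ],
            # Step 3: advanced AWS options
            "aws": {
                "BlockDeviceMappings": [
                    {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 50}},
                ],
            },
        },
    })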

Connect to an Anyscale cluster

Now, you just need to connect to an Anyscale cluster instead of a cluster that you manage yourself. There are two ways to do this:

  • No code change: set your RAY_ADDRESS environment variable to anyscale://my-cluster?cluster_env=my_cluster_env:1&cluster_compute=my_compute_config

    RAY_ADDRESS=111.222.101.202 python your_script.py
    # change to...
    RAY_ADDRESS=anyscale://my_cluster python your_script.py
  • 1 line code change: In your source code, pass anyscale:// as the address in your Ray connect call to connect to Anyscale. Also, pass in the cluster environment and cluster compute config created in the previous steps by using their names, for example: ray.client("anyscale://my-cluster").cluster_env("my_cluster_env:1").cluster_compute("my_compute_config").connect()

    import ray
    ray.init(address="anyscale://my_cluster")

If you wish to reuse an existing cluster, or want to deploy a long-running service that you will modify in the future, you can provide a cluster name in the address. For example, by setting RAY_ADDRESS=anyscale://my-cluster in your environment, or by providing it explicitly in code with ray.init("anyscale://my-cluster"), your script will either create a new cluster with that name or connect to it if it already exists.
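As a minimal sketch (the cluster name is illustrative), the same call both creates and reconnects:

    import ray

    # First run: starts a cluster named "my-cluster".
    # Later runs: reconnect to "my-cluster" if it is still running.
    ray.init("anyscale://my-cluster")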

Concepts

  • In Anyscale, the monolithic cluster config has been split into two configurations: a cluster environment and a cluster compute configuration. Cluster environments are automatically built into an image for quick and easy reuse across different clusters.

  • Anyscale manages the cluster's lifecycle for you. It launches clusters when needed and shuts them down when they have been idle for a while (this timeout is configurable, down to the second). You can also manually modify the cluster using the UI, API, or SDK.

  • Anyscale's API is centered on your code. That is why the primary entry point into Anyscale is to "connect" via the Python SDK and then execute a job or deploy a service. The UI is organized around the same concepts, and also gives you the ability to monitor your clusters for debugging, cost tracking, and advanced uses.

Example

Before

# cluster.yaml
cluster_name: default
min_workers: 2
max_workers: 2
upscaling_speed: 1.0
idle_timeout_minutes: 5
provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2a
auth:
    ssh_user: ubuntu
head_node:
    InstanceType: m4.16xlarge
    ImageId: ami-0def3275
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 50
worker_nodes:
    InstanceType: m4.16xlarge
    ImageId: ami-0def3275
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 50
    InstanceMarketOptions:
        MarketType: spot
setup_commands:
    - sudo apt-get update
    - sudo apt-get install -y build-essential curl unzip
head_setup_commands: []
worker_setup_commands: []
head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml
worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379

# deploy.py
import ray
from app import do_work

runtime_env = {
  "pip": [
    "chess"
  ]
}

ray.init("anyscale://", runtime_env=runtime_env)

do_work.remote()

After

# setup.py
from anyscale.sdk.anyscale_client.sdk import AnyscaleSDK

anyscale_sdk = AnyscaleSDK("sss_authtoken")

# Create Cluster environment
anyscale_sdk.create_app_config({
    "name": "my_cluster_env",
    "config_json": {
        "base_image": "anyscale/ray-ml:1.4.0-py37",
        "debian_packages": ["build-essential", "curl", "unzip"],
    }
})

# Create Cluster compute config
anyscale_sdk.create_compute_template({
  "name": "my_cluster_compute_config",
  "config": {
    "cloud_id": "cld_4F7k8814aZzGG8TNUGPKnc",
    "region": "us-west-2",
    "allowed_azs": ["us-west-2a"],
    "head_node_type": {
      "name": "head_node_type",
      "instance_type": "m4.16xlarge"
    },
    "worker_node_types": [
      {
        "name": "worker_node_1",
        "instance_type": "m4.16xlarge",
        "use_spot": True,
        "min_workers": 2,
        "max_workers": 2,
      }
    ],
    "aws": {
      "BlockDeviceMappings": [
        {
          "DeviceName": "/dev/sda1",
          "Ebs": {
            "VolumeSize": 50,
          },
        },
      ],
    }
  }
})

No changes are needed to the deploy.py file; only the way you run it changes:

RAY_ADDRESS="anyscale://my_cluster?cluster_env=my_cluster_env:1&cluster_compute=my_cluster_compute_config" python deploy.py
