Production Services

Going to production with Ray Serve on Anyscale.

Anyscale supports production services, which you submit as a standalone package containing a Ray Serve application. The platform fully manages the service, providing fault tolerance and declarative upgrades.

A service consists of a single, long-running Ray cluster with a persistent DNS name. The cluster may be restarted over time.

Defining and creating a service

To create a production service, you must provide the following:

  • A compute config and cluster environment for the cluster the service will run on.

  • An optional runtime environment containing your application code and dependencies.

    • Note: the working_dir option of the runtime environment must be a remote URI that points to a zip file, for example in an S3 or Google Cloud Storage bucket or at a GitHub URL. The cluster running the service must have permission to download from that URI.

  • The Ray Serve deployments that will be created on the cluster.

    • Note: you can also provide a deployment script that creates Ray Serve deployments manually, but that is not the recommended workflow.

  • A health check that the platform will use to determine if your service is healthy. This should be an HTTP GET endpoint that returns status 200 if the service is healthy.
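As a rough illustration of the health check contract described above, the following sketch uses only the Python standard library. It shows a tiny HTTP handler that answers GET requests with status 200, plus a helper that treats any other outcome as unhealthy. The path `/healthz` and the names `HealthHandler`, `is_healthy`, and `start_health_server` are assumptions made for this example. They are not part of Anyscale or Ray Serve, and in a real service the endpoint would normally be served by your Ray Serve application.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

HEALTH_PATH = "/healthz"  # hypothetical path; use whatever your service exposes


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET on HEALTH_PATH with 200 and every other path with 404."""

    def do_GET(self):
        status = 200 if self.path == HEALTH_PATH else 404
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def is_healthy(url, timeout=5.0):
    """Return True only if an HTTP GET to `url` returns status 200.

    Only plain http:// URLs are handled in this sketch.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(
        parts.hostname, parts.port or 80, timeout=timeout
    )
    try:
        conn.request("GET", parts.path or "/")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Connection errors and malformed responses count as unhealthy.
        return False
    finally:
        conn.close()


def start_health_server(port=0):
    """Serve HealthHandler on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Returning `False` on connection errors as well as on non-200 responses mirrors the rule that only a 200 response indicates a healthy service.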

These options can be specified in a configuration file.

All of these options together define your service, which can then be submitted to Anyscale using the CLI, Python SDK, or HTTP API.
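To make the shape of such a definition concrete, here is a minimal sketch of the options listed above as a Python dictionary, with a small validator for the stated requirements. Every field name (`compute_config`, `cluster_env`, `runtime_env`, `deployments`, `healthcheck_url`) and every value is a hypothetical placeholder chosen for illustration. The actual Anyscale configuration schema may differ, so check the CLI or SDK reference before relying on any of these names.

```python
from urllib.parse import urlparse

# Assumed set of schemes that count as "remote" for this sketch.
REMOTE_SCHEMES = {"s3", "gs", "https"}

# Hypothetical service definition; all names and values are placeholders.
service_config = {
    "name": "my-service",
    "compute_config": "my-compute-config",
    "cluster_env": "my-cluster-env",
    "runtime_env": {  # optional
        "working_dir": "s3://my-bucket/my-app.zip",
    },
    "deployments": [
        {"name": "model", "import_path": "app.model:Model", "num_replicas": 2},
    ],
    "healthcheck_url": "/healthz",
}


def validate_service_config(config):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    required = ("compute_config", "cluster_env", "deployments", "healthcheck_url")
    for key in required:
        if not config.get(key):
            problems.append(f"missing required field: {key}")

    # The runtime environment is optional, but if working_dir is set it
    # must be a remote URI that points to a zip file.
    working_dir = (config.get("runtime_env") or {}).get("working_dir")
    if working_dir is not None:
        parsed = urlparse(working_dir)
        if parsed.scheme not in REMOTE_SCHEMES:
            problems.append("working_dir must be a remote URI")
        elif not parsed.path.endswith(".zip"):
            problems.append("working_dir must point to a zip file")
    return problems
```

Running `validate_service_config(service_config)` before submitting can catch a local `working_dir` early, before the cluster tries to download it.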

Upgrading a service

To upgrade a service, update its deployments in the service configuration and re-deploy. This triggers Ray Serve to perform a rolling update of the deployments without creating a new Ray cluster.

  • Note: updating cluster-level configuration such as the compute config or cluster environment is not currently supported. To update these, you must create a new service with the new cluster configuration.
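The distinction between the two kinds of change can be sketched as a small helper that compares the current and proposed service definitions. This is only an illustration of the rule stated above, not how Anyscale implements it. The key names `compute_config`, `cluster_env`, and `deployments`, and the function `plan_upgrade`, are assumptions made for this example.

```python
# Hypothetical key names for the cluster-level parts of a service definition.
CLUSTER_LEVEL_KEYS = ("compute_config", "cluster_env")


def plan_upgrade(current, proposed):
    """Classify a proposed change to a service definition.

    Returns one of:
      "new_service_required": a cluster-level setting changed
      "rolling_update": only the deployments changed
      "no_change": nothing relevant changed
    """
    changed_cluster_keys = [
        key for key in CLUSTER_LEVEL_KEYS if current.get(key) != proposed.get(key)
    ]
    if changed_cluster_keys:
        # Cluster-level changes cannot be applied in place.
        return "new_service_required"
    if current.get("deployments") != proposed.get("deployments"):
        return "rolling_update"
    return "no_change"
```

For example, changing only `num_replicas` on a deployment would be classified as `"rolling_update"`, while switching `cluster_env` would be classified as `"new_service_required"`.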

Monitoring a service

In addition to querying for status using the CLI or SDK, you can also view the status of your service in the Anyscale UI.

Information about the deployments in your service can be found on the detail page of the cluster or service that they are running in.

For each deployment, Anyscale will generate a dashboard in which you can monitor the health and activity of your deployment. The following graphs are automatically generated for you:

  • CPU Utilization (Cluster-wide)

  • Memory Utilization (Cluster-wide)

  • p95 Latency: the p95 processing latency for queries to your deployment

  • Exceptions: the exceptions per second for your deployment

  • QPS: the queries per second for your deployment
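For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from raw request data collected over a time window. It is an assumption for illustration only: the source does not say how Anyscale computes its graphs, and the nearest-rank percentile method and the function names `p95_latency` and `per_second` are choices made for this example.

```python
def p95_latency(latencies_ms):
    """Nearest-rank 95th percentile: the smallest observed latency such
    that at least 95% of the observations are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def per_second(count, window_seconds):
    """Average rate over a window; used here for both QPS and exceptions."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return count / window_seconds
```

With 120 queries and 3 exceptions observed in a 60-second window, `per_second(120, 60)` gives the QPS and `per_second(3, 60)` gives the exception rate for that window.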
