Production Services
Going to production with Ray Serve on Anyscale.
Anyscale supports production services, which are submitted as a standalone package containing a Ray Serve application. The platform fully manages the service, providing fault tolerance and declarative upgrades.
A service consists of a single, long-running Ray cluster behind a persistent DNS name. The cluster may be restarted over time, but the DNS name stays the same.
Defining and creating a service
To create a production service, you must provide the following:
A compute config and cluster environment for the cluster the service will run on.
An optional runtime environment containing your application code and dependencies.
Note: the working_dir option of the runtime environment must be a remote URI pointing to a zip file, for example in an S3 or Google Cloud Storage bucket or at a GitHub URL. The cluster running the service must have permission to download from that URI.
The Ray Serve deployments that will be created on the cluster.
Note: you can also provide a deployment script that creates Ray Serve deployments manually, but that is not the recommended workflow.
A health check that the platform will use to determine if your service is healthy. This should be an HTTP GET endpoint that returns status 200 if the service is healthy.
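The health check contract described above (an HTTP GET that returns status 200 when healthy) can be exercised from the client side before you deploy. The following is a minimal sketch using only the Python standard library; the function name `is_healthy` and the timeout value are illustrative assumptions, not part of the Anyscale platform, and the platform's own prober may behave differently.

```python
import http.client
from urllib.parse import urlsplit


def is_healthy(url: str, timeout_s: float = 5.0) -> bool:
    """Return True only if an HTTP GET to `url` answers with status 200.

    Hypothetical helper for local testing; it does not follow redirects.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        return False
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout_s)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Connection failures, timeouts, and malformed responses
        # all count as unhealthy.
        return False
    finally:
        conn.close()
```

Treating every failure mode as "unhealthy" keeps the check conservative: a service that cannot be reached is indistinguishable, from the caller's side, from one that is down.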
These options can be specified in a configuration file:
name: my_production_service
compute_config: my_cluster_compute_config
cluster_env: "cluster-env-name:1"
runtime_env:
  working_dir: "s3://my_bucket/my_service_files.zip"
deployments:
  - name: model_a
    import_path: "my_models.ModelA"
    num_replicas: 10
  - name: model_b
    import_path: "my_models.ModelB"
    num_replicas: 5
  - name: composed_model
    import_path: "my_models.ComposedModel"
    num_replicas: 1
    route_prefix: "/composed"
health_check: "/composed/check_health"
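Before submitting a configuration like the one above, it can help to sanity-check the working_dir value against the requirement that it be a remote URI to a zip file. The sketch below is a hypothetical local check, not an Anyscale feature. The set of accepted schemes (`s3`, `gs`, `https`) is an assumption drawn from the storage options mentioned earlier and should be adjusted to whatever your cluster can actually download from.

```python
from typing import List
from urllib.parse import urlsplit

# Assumed set of remote schemes (S3, Google Cloud Storage, HTTPS/GitHub).
REMOTE_SCHEMES = {"s3", "gs", "https"}


def check_working_dir(uri: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    parts = urlsplit(uri)
    if parts.scheme not in REMOTE_SCHEMES:
        problems.append(
            "working_dir must be a remote URI, got scheme %r" % parts.scheme
        )
    if not parts.path.lower().endswith(".zip"):
        problems.append("working_dir must point to a .zip file")
    return problems


if __name__ == "__main__":
    for problem in check_working_dir("s3://my_bucket/my_service_files.zip"):
        print(problem)
```

This only checks the shape of the URI. It cannot confirm that the cluster has permission to download the file.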
All of these options together define your service, which can then be submitted to Anyscale using the CLI, Python SDK, or HTTP API.
# Create the service defined above using the CLI.
anyscale service deploy example_service.yaml
Service my_production_service created, id: svc_2xR6uT6t7jJuu1aCwWMsle
Monitor your service in the Anyscale UI at:
https://console.anyscale.com/o/anyscale-internal/projects/prj_2xR6uT6t7jJuu1aCwWMsle/services/svc_2xR6uT6t7jJuu1aCwWMsle
# Query the status of the service using the ID returned above.
anyscale service status svc_2xR6uT6t7jJuu1aCwWMsle
status: CLUSTER_CREATING
# ... wait for cluster to start up ...
anyscale service status svc_2xR6uT6t7jJuu1aCwWMsle
status: UPDATING
# ... wait for service to finish deploying ...
anyscale service status svc_2xR6uT6t7jJuu1aCwWMsle
status: RUNNING
# Stop the service.
anyscale service stop svc_2xR6uT6t7jJuu1aCwWMsle
# Get the status again. Now it should be STOPPED.
anyscale service status svc_2xR6uT6t7jJuu1aCwWMsle
status: STOPPED
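The manual "query, wait, query again" loop shown above can be automated. The following sketch assumes that `anyscale service status <id>` prints a line of the form `status: RUNNING` on standard output, as in the transcript; the helper names, timeout, and polling interval are all hypothetical choices rather than anything provided by the Anyscale CLI or SDK.

```python
import subprocess
import time
from typing import Callable


def parse_status(cli_output: str) -> str:
    """Extract the value from a line of the form 'status: RUNNING'."""
    for line in cli_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip()
    raise ValueError("no 'status:' line found in CLI output")


def cli_status(service_id: str) -> str:
    """Run `anyscale service status <id>` and return the parsed status.

    Assumes the output format shown in the transcript above.
    """
    result = subprocess.run(
        ["anyscale", "service", "status", service_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_status(result.stdout)


def wait_for_status(
    get_status: Callable[[], str],
    target: str,
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
) -> str:
    """Poll get_status() until it returns `target` or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "status is still %r after %ss, wanted %r"
                % (status, timeout_s, target)
            )
        time.sleep(interval_s)
```

A typical call would be `wait_for_status(lambda: cli_status("svc_2xR6uT6t7jJuu1aCwWMsle"), "RUNNING")`. Passing the status source as a callable keeps the polling logic testable without the CLI installed.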
Upgrading a service
To upgrade the deployments in a service, edit them in the service configuration and redeploy. This triggers Ray Serve to perform a rolling update of the deployments without creating a new Ray cluster.
Note: updating cluster-level configuration such as the compute config or cluster environment is not currently supported. To update these, you must create a new service with the new cluster configuration.
# We have a service already running.
anyscale service status svc_2xR6uT6t7jJuu1aCwWMsle
status: RUNNING
# Modify the deployments in example_service.yaml, then redeploy.
anyscale service deploy example_service.yaml
Updating service my_production_service with id: svc_2xR6uT6t7jJuu1aCwWMsle
Monitor your service in the Anyscale UI at:
https://console.anyscale.com/o/anyscale-internal/projects/prj_2xR6uT6t7jJuu1aCwWMsle/services/svc_2xR6uT6t7jJuu1aCwWMsle
# Query the status of the service using the ID returned above.
anyscale service status svc_2xR6uT6t7jJuu1aCwWMsle
status: UPDATING
# ... wait for service to finish deploying ...
anyscale service status svc_2xR6uT6t7jJuu1aCwWMsle
status: RUNNING
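Because only deployment-level changes can be applied in place, it may be useful to compare the old and new configurations before redeploying. The sketch below assumes both configurations have already been loaded into dictionaries with the same keys as the example configuration file. The function names are hypothetical, and the check covers only the two cluster-level options named in the note above.

```python
from typing import Any, Dict, List

# Cluster-level keys that, per the note in this section, cannot be
# updated on an existing service.
CLUSTER_LEVEL_KEYS = ("compute_config", "cluster_env")


def cluster_level_changes(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Return the cluster-level keys whose values differ between configs."""
    return [key for key in CLUSTER_LEVEL_KEYS if old.get(key) != new.get(key)]


def can_upgrade_in_place(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    """True if redeploying `new` changes no cluster-level option."""
    return not cluster_level_changes(old, new)


if __name__ == "__main__":
    old = {
        "compute_config": "my_cluster_compute_config",
        "cluster_env": "cluster-env-name:1",
        "deployments": [{"name": "model_a", "num_replicas": 10}],
    }
    new = dict(old, deployments=[{"name": "model_a", "num_replicas": 20}])
    print(can_upgrade_in_place(old, new))
```

If `cluster_level_changes` returns a non-empty list, the change requires creating a new service rather than redeploying the existing one.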
Monitoring a service
In addition to querying for status using the CLI or SDK, you can also view the status of your service in the Anyscale UI.
Information about the deployments in your service can be found on the detail page of the cluster or service that they are running in.

For each deployment, Anyscale will generate a dashboard in which you can monitor the health and activity of your deployment. The following graphs are automatically generated for you:
CPU Utilization (Cluster-wide)
Memory Utilization (Cluster-wide)
p95 Latency: the p95 processing latency for queries to your deployment
Exceptions: the exceptions per second for your deployment
QPS: the queries per second for your deployment
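To make the p95 latency and QPS metrics concrete, the sketch below computes both from raw samples. It uses the nearest-rank percentile definition, which is one common convention; how Anyscale aggregates these metrics for its dashboards is not specified here, so treat this as an illustration of the concepts rather than the platform's method.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def qps(request_count: int, window_s: float) -> float:
    """Average queries per second over a window of window_s seconds."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return request_count / window_s


if __name__ == "__main__":
    latencies_ms = [12.0, 15.0, 11.0, 240.0, 14.0, 13.0, 16.0, 12.5]
    print("p95 latency (ms):", percentile(latencies_ms, 95))
    print("QPS:", qps(len(latencies_ms), 2.0))
```

A p95 latency is more sensitive to slow outliers than an average, which is why it is a common choice for monitoring request-serving systems.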
