Anyscale supports production jobs, which are submitted as standalone packages and managed by the platform. These jobs are best suited for production workflows where you want Anyscale to start the cluster and handle failures automatically.
After you submit the job definition, Anyscale automatically creates a cluster, runs the job on it, and monitors the job until it succeeds. If the job fails, Anyscale restarts it automatically, up to a configurable number of retries.
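The retry behavior can be pictured with a small shell sketch. This is a conceptual illustration only, not Anyscale's implementation: the function `run_with_retries` is hypothetical. It reruns a command until the command succeeds or the retry budget is used up, so in this sketch a budget of 3 retries allows at most 4 attempts.

```bash
#!/usr/bin/env bash
# Conceptual sketch of "restart up to N retries" semantics.
# NOT Anyscale's implementation; run_with_retries is a hypothetical helper.
run_with_retries() {
  local max_retries=$1
  shift
  local retries=0
  # Run the command; on failure, retry until the retry budget is exhausted.
  until "$@"; do
    if (( retries >= max_retries )); then
      echo "FAILED after $((retries + 1)) attempts"
      return 1
    fi
    retries=$((retries + 1))
    echo "retry $retries of $max_retries"
  done
  echo "SUCCEEDED after $((retries + 1)) attempt(s)"
}
```

For example, `run_with_retries 3 false` prints three `retry` lines, then `FAILED after 4 attempts`, and returns a non-zero status.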
Defining and submitting a job
When you submit a production job, you provide the following:
- An optional runtime environment containing your application code and dependencies.
  Note: the working_dir option of the runtime environment must be a remote URI pointing to a zip file, such as an object in an S3 or Google Cloud Storage bucket, or a GitHub URL. The cluster running the job must have permission to download from that URI.
- The entrypoint command that runs on the cluster to execute the job.
- Configuration options for the job, such as a name or the number of times the job can be retried before it is marked "failed".
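The working_dir requirement above can be checked locally before you submit a job. The sketch below is a hypothetical pre-submit helper, `is_remote_zip_uri`, not part of the Anyscale CLI. The accepted schemes (`s3://`, `gs://`, `https://`) are an assumption based on the storage options named above, and Anyscale's own validation may differ.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submit check: is working_dir a remote URI to a .zip file?
# Accepted schemes are an assumption; Anyscale's own validation may differ.
is_remote_zip_uri() {
  local uri=$1
  case "$uri" in
    # In case patterns, * also matches "/", so nested paths are accepted.
    s3://*.zip | gs://*.zip | https://*.zip) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this helper, `is_remote_zip_uri "s3://my-bucket/my_job_files.zip"` succeeds, and a local path such as `./my_project` is rejected. Both values are placeholders.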
You can specify the runtime environment, entrypoint, and job options in a YAML configuration file and submit that file with the Anyscale CLI:
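The following is a minimal sketch of such a file, written with a shell heredoc so it can be created from the command line as `example_job.yaml`. Apart from `working_dir`, which this page names, the keys (`name`, `runtime_env`, `entrypoint`, `max_retries`) are assumptions based on the options described above. All values, including the bucket path and script name, are placeholders. Check the Anyscale job configuration reference for the actual schema, which may include additional fields.

```bash
#!/usr/bin/env bash
# Sketch: write a hypothetical example_job.yaml.
# Keys other than working_dir are assumptions; all values are placeholders.
cat > example_job.yaml <<'EOF'
name: my_production_job
# Remote zip of the application code (placeholder URI).
runtime_env:
  working_dir: "s3://my-bucket/my_job_files.zip"
# Command run on the cluster to execute the job (placeholder script).
entrypoint: "python my_job_script.py"
# Retries allowed before the job is marked failed (assumed key name).
max_retries: 3
EOF
```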
```bash
# Kick off the job defined in example_job.yaml using the CLI.
$ anyscale job submit example_job.yaml
Job my_production_job created, id: job_2xR6uT6t7jJuu1aCwWMsle
Monitor your job in the Anyscale UI at:
https://console.anyscale.com/o/anyscale-internal/projects/prj_2xR6uT6t7jJuu1aCwWMsle/jobs/job_2xR6uT6t7jJuu1aCwWMsle

# Query the status of the job using the ID returned above.
$ anyscale job status job_2xR6uT6t7jJuu1aCwWMsle
status: RUNNING
num_retries: 0
retries_remaining: 3

# Stop the job.
$ anyscale job stop job_2xR6uT6t7jJuu1aCwWMsle

# Get the status again. It should now be STOPPED.
$ anyscale job status job_2xR6uT6t7jJuu1aCwWMsle
status: STOPPED
num_retries: 0
retries_remaining: 0
```
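To wait for a job in a script instead of checking it by hand, you can poll `anyscale job status` in a loop. The sketch below assumes the `status: <VALUE>` output format shown above. The function name `wait_for_job` and the `POLL_SECONDS` variable (default 30 seconds) are hypothetical choices, not part of the CLI.

```bash
#!/usr/bin/env bash
# Sketch: poll `anyscale job status` until the job reports one of the given statuses.
# Assumes the "status: <VALUE>" output format; wait_for_job and POLL_SECONDS are hypothetical.
wait_for_job() {
  local job_id=$1
  shift
  local status target
  while true; do
    # Extract the value from the "status: ..." line of the CLI output.
    status=$(anyscale job status "$job_id" | awk '/^status:/ {print $2}')
    for target in "$@"; do
      if [ "$status" = "$target" ]; then
        echo "$status"
        return 0
      fi
    done
    sleep "${POLL_SECONDS:-30}"
  done
}
```

For example, `wait_for_job job_2xR6uT6t7jJuu1aCwWMsle STOPPED` returns after the status output reports STOPPED. This page shows only the RUNNING and STOPPED statuses, so pass whichever terminal statuses your Anyscale version reports. A production script would also cap the total wait time, because this loop never exits if no listed status appears.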