Orchestration with Airflow
Make ML pipelines a breeze with Airflow and Anyscale
[Diagram: An Airflow DAG using the Ray Provider]
The ability to build a flow of interdependent tasks is a critical piece of infrastructure that most ML or data platforms within an organization will need.
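Before bringing Ray into the picture, here is a minimal sketch of what a flow of dependent tasks looks like with Airflow's TaskFlow API (the DAG and task names here are hypothetical):

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule_interval=None, start_date=datetime(2021, 1, 1), catchup=False)
def minimal_dependency_example():
    @task
    def extract() -> list:
        # Stand-in for pulling rows from your storage layer.
        return [1, 2, 3]

    @task
    def transform(values: list) -> int:
        return sum(values)

    # Passing extract()'s output into transform() creates the dependency edge:
    # Airflow will not start transform until extract has succeeded.
    transform(extract())

minimal_dependency_dag = minimal_dependency_example()
```

Airflow infers the edges from the data flow, so no explicit `set_downstream` wiring is needed.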
In Airflow parlance, the recommended integration point is the Airflow-Ray Provider. The diagram below shows how it fits together alongside other operators, tasks, and your storage layer:

Writing a DAG that uses the Ray provider looks like this:
```python
import anyscale

anyscale.connect()
# ... now everything below runs on your Anyscale Ray cloud!

import pandas as pd
from airflow.decorators import dag
from ray_provider.decorators import ray_task
from ray_provider.xcom.ray_backend import RayBackend

default_args = {
    # RayBackend callbacks hook task success/failure into the Ray XCom backend.
    'on_success_callback': RayBackend.on_success_callback,
    'on_failure_callback': RayBackend.on_failure_callback,
    # ...
}

# The Airflow connection that points at your Ray cluster.
task_args = {"ray_conn_id": "ray_cluster_connection"}

# ...

@dag(
    default_args=default_args,
    # ...
)
def ray_example_dag():
    @ray_task(**task_args)
    def sum_cols(df: pd.DataFrame) -> pd.DataFrame:
        # Runs on the Ray cluster: sums each column and returns a one-row frame.
        return pd.DataFrame(df.sum()).T
```
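In a full DAG, these decorated functions chain together like ordinary TaskFlow tasks. Below is a hedged sketch of that wiring; the load_dataframe task and the DAG instantiation at the end are illustrative assumptions, not part of the provider's API:

```python
from datetime import datetime

import pandas as pd
from airflow.decorators import dag
from ray_provider.decorators import ray_task

task_args = {"ray_conn_id": "ray_cluster_connection"}

@dag(schedule_interval=None, start_date=datetime(2021, 1, 1), catchup=False)
def ray_chained_dag():
    @ray_task(**task_args)
    def load_dataframe() -> pd.DataFrame:
        # Illustrative upstream task producing the frame that sum_cols consumes.
        return pd.DataFrame({"a": [1, 2], "b": [3, 4]})

    @ray_task(**task_args)
    def sum_cols(df: pd.DataFrame) -> pd.DataFrame:
        return pd.DataFrame(df.sum()).T

    # Feeding one Ray task's return value into another creates the dependency
    # edge; with the Ray XCom backend configured, the intermediate DataFrame
    # can stay in the Ray object store instead of Airflow's metadata database.
    sum_cols(load_dataframe())

ray_chained = ray_chained_dag()
```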
See a full example on GitHub that you can pull down and try out here.