Known Issues

Known issues in Anyscale and Ray

pip packages fail to install within a cluster environment

If you see this message in Cluster Environment build logs:

[INFO] 6/23/2021, 4:20:16 PM:     Running setup.py install for <SOME PIP PACKAGE>: finished with status 'error'
[ERROR] 6/23/2021, 4:20:16 PM:     ERROR: Command errored out with exit status 1:
...
ImportError: Something is wrong with the numpy installation. While importing we detected an older version of numpy in ['/home/ray/anaconda3/lib/python3.8/site-packages/numpy']. One method of fixing this is to repeatedly uninstall numpy until none is found, then reinstall this version.

Workaround

  • Remove <SOME PIP PACKAGE> from the pip section.

  • Add the following to the post build commands:

/home/ray/anaconda3/bin/python -m pip uninstall -y numpy
rm -rf /home/ray/anaconda3/lib/python3.<7 OR 8>/site-packages/numpy
/home/ray/anaconda3/bin/pip install numpy
/home/ray/anaconda3/bin/pip install --upgrade --no-cache-dir <SOME PIP PACKAGE>
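
As a sketch, the two placeholders above can be filled in by a small helper that prints the four lines to paste into the post build commands. The function name post_build_commands and the package name example-package are hypothetical, and the helper assumes the /home/ray/anaconda3 prefix shown in the build log.

```shell
# Hypothetical helper: prints the post build commands with placeholders filled in.
# $1 = Python minor version (7 or 8), $2 = the pip package that failed to build.
post_build_commands() {
  case "$1" in
    7|8) ;;
    *) echo "python minor version must be 7 or 8" >&2; return 1 ;;
  esac
  if [ -z "$2" ]; then
    echo "package name is required" >&2
    return 1
  fi
  prefix=/home/ray/anaconda3   # install prefix taken from the build log
  echo "$prefix/bin/python -m pip uninstall -y numpy"
  echo "rm -rf $prefix/lib/python3.$1/site-packages/numpy"
  echo "$prefix/bin/pip install numpy"
  echo "$prefix/bin/pip install --upgrade --no-cache-dir $2"
}

# example-package is a placeholder name for the package that failed.
post_build_commands 8 example-package
```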

TensorBoard support requires port forwarding

Problem

Future versions of Anyscale will include TensorBoard support out of the box. Until then, use the workaround below.

Workaround

  • After you run training on your cluster, the log files are on the cluster at /home/ray/ray_results.

  • Use the Anyscale CLI to SSH into your cluster, including a port forwarding option for TensorBoard:

anyscale ssh -o -L6006:localhost:6006

  • In the resulting SSH session on the cluster, launch TensorBoard.

  • Open a browser on your local machine to http://localhost:6006 and use the TensorBoard UI.
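
The launch step above can be sketched as a small shell function to define in the SSH session. --logdir and --port are standard TensorBoard options; the function name start_tensorboard is hypothetical, and the sketch assumes TensorBoard is already installed in the cluster environment.

```shell
# Hypothetical wrapper around the TensorBoard CLI, meant to run on the cluster.
# $1 = log directory (defaults to the Ray results directory named in the steps)
# $2 = port (defaults to 6006, the port forwarded by the anyscale ssh command)
start_tensorboard() {
  tensorboard --logdir "${1:-/home/ray/ray_results}" --port "${2:-6006}"
}
```

Calling start_tensorboard with no arguments serves /home/ray/ray_results on port 6006, which the forwarded port then exposes at http://localhost:6006 on your local machine.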

push -a does not update worker nodes

Problem

The anyscale push -a command is expected to copy code from the working directory to all nodes in the cluster. However, it only copies to the cluster’s head node.

Note that anyscale push is deprecated. Using ray.init("anyscale://") to interact with Ray ensures that your code is distributed to each node.

Resolution

This issue is mitigated in Ray 1.4 with runtime environments, which automatically upload your code and sync it to all nodes in a cluster.

The deprecated -a option will not be available in future releases of Anyscale. Ray 1.4 supports file mounts in Ray core, which is the supported method for keeping files synchronized among Anyscale head and worker nodes.

Workarounds

To ensure your code gets to all of the nodes, use one of the following strategies:

Use anyscale up

This command restarts your cluster and ensures the code is distributed to all nodes.

Leverage anyscale connect to update clusters

anyscale connect ensures that all of your local code is shipped to the cluster.

Use push and copy method

This more intricate method is provided as a workaround for updating code in running clusters. It is not recommended.

  • Use anyscale push to send your local files to the head node.

  • Use anyscale ssh to connect to the cluster interactively.

  • Use scp to copy files among the hosts in the cluster.
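
The scp step can be sketched as a loop over worker addresses, run on the head node. This assumes the head node can reach each worker over SSH as the same user and that you already know the worker addresses; the function name copy_to_workers is hypothetical.

```shell
# Hypothetical helper, run on the head node after anyscale push.
# $1 = directory to copy; remaining arguments = worker host names or IPs.
copy_to_workers() {
  src="${1%/}"   # drop a trailing slash so the directory itself is copied
  shift
  dest_parent="$(dirname "$src")"
  failed=0
  for host in "$@"; do
    # Copy the directory into the same parent path on the worker.
    if ! scp -r "$src" "$host:$dest_parent/"; then
      echo "copy to $host failed" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, copy_to_workers /home/ray/my_project 10.0.0.11 10.0.0.12 would copy the directory into /home/ray/ on both workers; the path and addresses here are placeholders.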
