pip packages fail to install within a cluster environment
You may see a message like this in the cluster environment build logs:
[INFO] 6/23/2021, 4:20:16 PM: Running setup.py install for <SOME PIP PACKAGE>: finished with status 'error'
[ERROR] 6/23/2021, 4:20:16 PM: ERROR: Command errored out with exit status 1:
...
ImportError: Something is wrong with the numpy installation. While importing we detected an older version of numpy in ['/home/ray/anaconda3/lib/python3.8/site-packages/numpy']. One method of fixing this is to repeatedly uninstall numpy until none is found, then reinstall this version.
Future versions of Anyscale will include support for TensorBoard out of the box. Here is how to use it today.
Workaround
After you run training on your cluster, the log files are on the cluster at /home/ray/ray_results.
Use the Anyscale CLI to SSH into your cluster, including a port-forwarding option for TensorBoard:
anyscale ssh -o -L6006:localhost:6006
In the resulting SSH session on the cluster, launch TensorBoard.
Open a browser on your local machine to http://localhost:6006 and use the TensorBoard UI.
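The launch step can be sketched as follows. This is a minimal sketch, not an official Anyscale helper: the function name tensorboard_command is hypothetical, and it assumes TensorBoard is installed on the cluster and that the port matches the one forwarded by the anyscale ssh command above. It only assembles and prints the command to run in the SSH session.

```python
import shlex
from typing import List


def tensorboard_command(logdir: str = "/home/ray/ray_results",
                        port: int = 6006) -> List[str]:
    """Return the TensorBoard command line as an argument list.

    logdir defaults to the results directory named in this section.
    port must match the port forwarded over SSH (6006 here).
    """
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]


# Print the command so it can be pasted into the SSH session on the cluster.
print(shlex.join(tensorboard_command()))
# prints: tensorboard --logdir /home/ray/ray_results --port 6006
```

If a different local port is forwarded, pass the matching value as port so that the browser URL and the TensorBoard server agree.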
push -a does not update worker nodes
Problem
The anyscale push -a command is expected to copy code from the working directory to all nodes in the cluster. However, it only copies to the cluster’s head node.
Note that anyscale push is deprecated. Using ray.init("anyscale://") to interact with Ray ensures that your code is distributed to each node.
Resolution
This issue is mitigated in Ray 1.4 with runtime environments, which automatically upload and sync your code to all nodes in a cluster.
The deprecated -a option will not be available in future releases of Anyscale. Ray 1.4 supports file mounts in Ray Core, which is the supported method for keeping files synchronized across Anyscale head and worker nodes.
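As a rough illustration of the runtime environment approach, the sketch below connects with ray.init("anyscale://") and asks Ray to upload a working directory. It assumes a Ray release in which ray.init accepts a runtime_env keyword; older releases passed the dictionary differently, so check the Ray documentation for your version. The helper names make_runtime_env and connect are hypothetical.

```python
from typing import Any, Dict


def make_runtime_env(working_dir: str = ".") -> Dict[str, Any]:
    """Build a runtime environment that uploads working_dir to the cluster."""
    return {"working_dir": working_dir}


def connect(working_dir: str = "."):
    """Connect to Anyscale and sync working_dir to the cluster nodes."""
    # Imported lazily so this file can be loaded where Ray is not installed.
    import ray

    # Assumption: this Ray version accepts runtime_env as a keyword argument.
    return ray.init("anyscale://", runtime_env=make_runtime_env(working_dir))
```

Calling connect() from the project root would then replace a manual push, since the upload happens as part of connecting.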
Workarounds
To ensure your code gets to all of the nodes, use one of the following strategies:
Use anyscale up
This command will restart your cluster and ensure the code is distributed to all nodes.
Leverage anyscale connect to update clusters
The anyscale connect command ensures that all of your local code is shipped to the cluster.
Use push and copy method
This more intricate method is provided as a workaround for updating code in running clusters. It is not recommended.
Use anyscale push to send your local files to the head node.
Use anyscale ssh to connect to the cluster interactively.
Use scp to copy files among the hosts in the cluster.
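The scp step above can be scripted from the head node. The following is a minimal sketch under stated assumptions: the worker addresses, the ray user name, and the function names are hypothetical, and the head node is assumed to have SSH access to each worker. By default it only prints the commands (a dry run) so they can be reviewed first.

```python
import shlex
import subprocess
from typing import List


def build_scp_commands(source_dir: str, worker_hosts: List[str],
                       dest_parent: str, user: str = "ray") -> List[List[str]]:
    """Build one recursive scp command per worker host.

    If dest_parent already exists on the worker, scp -r creates a copy of
    source_dir inside it.
    """
    return [
        ["scp", "-r", source_dir, f"{user}@{host}:{dest_parent}"]
        for host in worker_hosts
    ]


def copy_to_workers(source_dir: str, worker_hosts: List[str],
                    dest_parent: str, dry_run: bool = True) -> List[List[str]]:
    """Print the scp commands, or run them when dry_run is False."""
    commands = build_scp_commands(source_dir, worker_hosts, dest_parent)
    for command in commands:
        if dry_run:
            print(shlex.join(command))
        else:
            # check=True raises CalledProcessError if a copy fails.
            subprocess.run(command, check=True)
    return commands
```

For example, copy_to_workers("/home/ray/project", ["10.0.0.2", "10.0.0.3"], "/home/ray") prints two scp commands, one per placeholder address, without copying anything.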