Open
Labels
enhancement (New feature or request)
Description
Currently, NeMo Curator can be used on Anyscale, but some work is needed to support it better.
Gotchas
- Set num_cpus / num_gpus on head node to zero
- Set higher idle node timeout
- Ensure no stage has num_cpus <= 0; otherwise its tasks can be scheduled on the head node (see "Add ability to ignore_head_node for RayDataExecutor and RayActorPoolExecutor" #1209 (comment))
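As a rough illustration of the last gotcha, the check below flags any stage whose CPU request is zero or negative before a pipeline is submitted. It is only a sketch: `StageResources` and `stages_schedulable_on_head` are hypothetical names made up for this example, not NeMo Curator or Ray APIs, and the stage names are invented.

```python
from dataclasses import dataclass


@dataclass
class StageResources:
    """Hypothetical stand-in for a stage's resource request (not a NeMo Curator class)."""

    name: str
    num_cpus: float = 1.0
    num_gpus: float = 0.0


def stages_schedulable_on_head(stages):
    """Return the names of stages whose CPU request is <= 0.

    Such stages ask for no CPU, so a head node advertising zero CPUs
    can still satisfy the request and end up running their tasks.
    """
    return [stage.name for stage in stages if stage.num_cpus <= 0]


# Invented example stages; only "tiny_filter" requests zero CPUs.
stages = [
    StageResources("reader", num_cpus=1.0),
    StageResources("tiny_filter", num_cpus=0.0),
    StageResources("embedder", num_cpus=4.0, num_gpus=1.0),
]

offenders = stages_schedulable_on_head(stages)
if offenders:
    print(f"Stages that may land on the head node: {offenders}")
```

A check like this could run as a pre-flight step until the executors can ignore the head node themselves.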
TODO
- Allow executors to not schedule on head node / ignore head node (see Anyscale docs)
  - Ray Data and Ray Actor Pool: Add ability to ignore_head_node for RayDataExecutor and RayActorPoolExecutor #1209
  - Xenna
- Improve Cloud I/O Support
  - [ ] pandas read_parquet fails for cloud reads on list of files #1214
  - [ ] pandas read_parquet on a directory might give error on cloud files #1217
  - [ ] Semantic Dedup Pairwise IO fails to list files for remote cloud path #1213
  - [ ] Investigate whether Anyscale supports mounting cloud paths so that all paths are "local"; check whether notebooks and submitted Ray Jobs behave differently
- Have some form of QA
- Have dedicated documentation for Anyscale