Initialization time is too long in scale-out run #1463

@youngeunkwon0405

Description

Initialization takes too long for scale-out runs (more than 2000 GPUs) to be practical.

When I tried a Qwen3 235B scale-out run on 2048 GPUs, it took 1 hour before the following setup-complete log appeared.

============================================================
                  SETUP COMPLETE
============================================================

This wastes 2048 GPU-hours (2048 GPUs × 1 hour) on initialization alone.
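For reference, the cost is simply the number of GPUs multiplied by the time they sit idle during setup. Below is a minimal sketch of that arithmetic. The function names and the 4-hour total run length in the second call are hypothetical, chosen only for illustration, and are not taken from the actual run.

```python
def wasted_gpu_hours(num_gpus: int, init_hours: float) -> float:
    """GPU-hours consumed while every allocated GPU waits for setup to finish."""
    return num_gpus * init_hours


def init_overhead_fraction(init_hours: float, total_hours: float) -> float:
    """Share of the job's wall-clock time spent in initialization."""
    return init_hours / total_hours


# Numbers from this report: 2048 GPUs, 1 hour until SETUP COMPLETE.
print(wasted_gpu_hours(2048, 1.0))

# Hypothetical example: if the whole job ran for 4 hours,
# a quarter of the allocation would go to setup.
print(init_overhead_fraction(1.0, 4.0))
```

Because the cost scales linearly with GPU count, the same 1-hour setup gets proportionally more expensive at every larger scale.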

Reducing this latency is crucial before any further scale-out performance studies can start.

Labels

enhancement (New feature or request)