[examples/llama3] Add FP4 option #123

Open
matthiasdiener wants to merge 7 commits into sudhu/Megatron-IFU-core_0.16.0_sync from mdiener/nvfp4-llama
matthiasdiener commented Apr 9, 2026

Motivation

Technical Details

Test Plan

Tested with ROCm/TransformerEngine#518

Test Result

Submission Checklist

sudhu2k and others added 7 commits March 3, 2026 07:05
…e asset handling

- Added `uvicorn` to the Dockerfile dependencies.
- Replaced the mamba installation method with `mamba-ssm`.
- Included cloning and installation of the Emerging Optimizers repository in the Dockerfile.
- Enhanced the `download_and_extract_asset` function to ensure the assets directory is created if it doesn't exist.
- Updated unit tests to conditionally import and test features from Emerging Optimizers, skipping tests if the package is not available.
- Added pytest markers to ensure compatibility checks for specific versions in various tests.
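Two of the changes listed above (creating the assets directory on demand, and detecting an optional package before testing it) can be sketched as follows. This is a minimal illustration, not the code from the commit: `ensure_assets_dir` and `is_package_available` are hypothetical helper names, and the real `download_and_extract_asset` signature is not shown in this PR page.

```python
import importlib.util
import os
import tempfile


def ensure_assets_dir(assets_dir: str) -> str:
    """Create assets_dir (and any missing parents); do nothing if it exists."""
    # exist_ok=True makes the call idempotent, so repeated downloads are safe.
    os.makedirs(assets_dir, exist_ok=True)
    return assets_dir


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "assets")
    ensure_assets_dir(target)
    print(os.path.isdir(target))
    # Hypothetical module name, used only to show the check.
    print(is_package_available("emerging_optimizers"))
```

In a test suite, the result of such a check would typically feed a skip condition (for example `pytest.mark.skipif`), so that tests for the optional package are skipped rather than failing when it is not installed.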
…orBoard

- Introduced `cpu_offloading_num_layers` argument in the network size configuration.
- Added `log_batch_size_to_tensorboard` option in LoggerConfig to enable batch size logging.
- Fixed variable name in `download_and_extract_asset` function for clarity.
- Updated the logging condition in `HyperCommGrid` to check if the distributed environment is initialized before logging.
- Modified `cuda_graphs.py` to conditionally import `make_weak_ref` and set a flag for its availability.
- Adjusted tests for experimental logging to utilize a new utility function for rank checking, ensuring accurate log assertions across distributed ranks.
- Refactored CPU offloading arguments from the argument parser.
- Added a condition to check if `num_experts` is not None before asserting the need for sequence parallelism.
Co-authored-by: sudhu2k <Sudharshan.Govindan@amd.com>
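The conditional import of `make_weak_ref` and the `num_experts` guard described in this commit follow two common patterns, sketched below under stated assumptions. `optional_import` and `check_sequence_parallel` are hypothetical names, the module path passed to `optional_import` is a placeholder, and the exact condition used in the repository (here: tensor parallel size greater than 1) is an assumption rather than something the commit message states.

```python
import importlib


def optional_import(module_name: str, attr: str):
    """Return (object, True) if module_name.attr can be imported, else (None, False)."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr), True
    except (ImportError, AttributeError):
        # Missing module or missing symbol: report unavailability via the flag.
        return None, False


# Placeholder module path; the real location of make_weak_ref is not given here.
make_weak_ref, HAVE_MAKE_WEAK_REF = optional_import("placeholder_backend.utils", "make_weak_ref")


def check_sequence_parallel(num_experts, tensor_parallel_size, sequence_parallel):
    """Only require sequence parallelism when experts are actually configured."""
    # Assumed condition: the assertion is skipped entirely for dense models
    # (num_experts is None), which is the behavior the commit describes.
    if num_experts is not None and tensor_parallel_size > 1:
        assert sequence_parallel, "MoE with tensor parallelism requires sequence parallelism"
```

Callers can then branch on `HAVE_MAKE_WEAK_REF` instead of failing at import time when the optional symbol is absent.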