[GPT-OSS-120B] Reference implementation #2395
Open
v-shobhit wants to merge 172 commits into mlcommons:master from v-shobhit:gptoss-loadgen
Changes from all commits (172 commits):
- `9b59d2e` [Automated Commit] Format Codebase (mlcommons-bot)
- `fe51c12` Merge branch 'mlcommons:master' into master (v-shobhit)
- `1f2666c` [Automated Commit] Format Codebase (github-actions[bot])
- `f9c4f61` initial (v-shobhit)
- `db9d25e` [Automated Commit] Format Codebase (github-actions[bot])
- `9daa72c` json fixes (v-shobhit)
- `2d0a179` updates, tokenizer (v-shobhit)
- `c8e679d` fix padding (v-shobhit)
- `50453c6` concurrent requests (v-shobhit)
- `e76b68d` increase timeout (v-shobhit)
- `b6d5671` [Automated Commit] Format Codebase (github-actions[bot])
- `4fd4f56` add refactor changes (v-shobhit)
- `1df0885` [Automated Commit] Format Codebase (github-actions[bot])
- `eb2f48c` rm truncation (v-shobhit)
- `4f35b8a` rm truncation, wait for server ready (v-shobhit)
- `354cb62` left padding (v-shobhit)
- `75f4307` [Automated Commit] Format Codebase (github-actions[bot])
- `065bf7c` fixes (v-shobhit)
- `36aa581` add failure check (v-shobhit)
- `c75e629` change opts (v-shobhit)
- `040e986` organize files (v-shobhit)
- `991ff7e` rm submodule (v-shobhit)
- `492847a` add infer stuff (v-shobhit)
- `f3a3282` add harmonize-tokens.py (v-shobhit)
- `8596b13` move things (v-shobhit)
- `1db2f96` [Automated Commit] Format Codebase (github-actions[bot])
- `8c08778` add README (v-shobhit)
- `168f210` fix name (v-shobhit)
- `13775b1` add commands (v-shobhit)
- `37c646d` update README (v-shobhit)
- `9cd0c61` accepts output_ids and detokenize (v-shobhit)
- `6abf8fa` [Automated Commit] Format Codebase (github-actions[bot])
- `8fe8712` add plotter (v-shobhit)
- `889db8c` fix sampling (v-shobhit)
- `381cf60` add reasoning effort option (v-shobhit)
- `209e923` [Automated Commit] Format Codebase (github-actions[bot])
- `fdc518d` doc option (v-shobhit)
- `f73da58` draw input histograms (v-shobhit)
- `4219a77` [Automated Commit] Format Codebase (github-actions[bot])
- `782f066` updates (v-shobhit)
- `326a5fa` [Automated Commit] Format Codebase (github-actions[bot])
- `1b99263` add more opts (v-shobhit)
- `579ef1c` move (v-shobhit)
- `ec8d12e` [Automated Commit] Format Codebase (github-actions[bot])
- `a9de6f4` refactor (v-shobhit)
- `387779c` updates (v-shobhit)
- `bde219f` rename opt (v-shobhit)
- `24ef1e0` updates (v-shobhit)
- `141dd8d` [Automated Commit] Format Codebase (github-actions[bot])
- `684e27a` add healthbench prompt creation (v-shobhit)
- `f5b04db` [Automated Commit] Format Codebase (github-actions[bot])
- `d44256e` add healthbench eval (v-shobhit)
- `ff9133b` [Automated Commit] Format Codebase (github-actions[bot])
- `5f1fd8e` add scripts to fetch datasets (v-shobhit)
- `ca654c8` [Automated Commit] Format Codebase (github-actions[bot])
- `8c59f03` update requirements (v-shobhit)
- `108ea9d` add setup enroot script (v-shobhit)
- `f302c7c` add changes (v-shobhit)
- `534390c` [Automated Commit] Format Codebase (github-actions[bot])
- `78bf971` add symlinks to gitmodules (v-shobhit)
- `115f498` add fetch_lcb.py (v-shobhit)
- `9571c11` [Automated Commit] Format Codebase (github-actions[bot])
- `f8244ee` updates (v-shobhit)
- `531c37c` add pass@k; add spec decode option (v-shobhit)
- `5e86d65` add openai client; add pass@k (v-shobhit)
- `8c02839` [Automated Commit] Format Codebase (github-actions[bot])
- `be4109e` restrict lcb to v5 (v-shobhit)
- `c0b9ef3` [Automated Commit] Format Codebase (github-actions[bot])
- `dc54e98` lcb optimizations (v-shobhit)
- `41cac5a` remove openai client (v-shobhit)
- `06a5387` [Automated Commit] Format Codebase (github-actions[bot])
- `d927fb3` rename (v-shobhit)
- `c2fda5e` remove mmlu, healthbench (v-shobhit)
- `c5c389b` add fetch_all.py (v-shobhit)
- `341c750` updates (v-shobhit)
- `ef142a6` add preprocess (v-shobhit)
- `834175f` update README (v-shobhit)
- `e447081` [Automated Commit] Format Codebase (github-actions[bot])
- `a1e668a` add top-p option (v-shobhit)
- `3db5bc2` add summarize_eval (v-shobhit)
- `3ca2cc1` [Automated Commit] Format Codebase (github-actions[bot])
- `09cd3f7` add trtllm infer script (v-shobhit)
- `1aad396` fixes (v-shobhit)
- `a6e05f2` add round-robin for multi-dp (v-shobhit)
- `a2936a6` fix timeout issues (v-shobhit)
- `188411c` [Automated Commit] Format Codebase (github-actions[bot])
- `b324d7d` add anthropic (v-shobhit)
- `f8a9f43` add stuff (v-shobhit)
- `8429244` optimize lcb multi-pass (v-shobhit)
- `d1f2794` [Automated Commit] Format Codebase (github-actions[bot])
- `ee33969` rm healthbench (v-shobhit)
- `20f8916` [Automated Commit] Format Codebase (github-actions[bot])
- `1e67e0c` lcb bug fixes (v-shobhit)
- `96c90e1` [Automated Commit] Format Codebase (github-actions[bot])
- `e6d9c67` omit top-k if 0 (v-shobhit)
- `ff892bb` [Automated Commit] Format Codebase (github-actions[bot])
- `e4043d2` add changes and plotting scripts (v-shobhit)
- `1714c09` [Automated Commit] Format Codebase (github-actions[bot])
- `19fcc80` add overall number (v-shobhit)
- `6cb7698` [Automated Commit] Format Codebase (github-actions[bot])
- `b281727` add glob matching (v-shobhit)
- `9a0c45a` rename (v-shobhit)
- `35ea0e4` add pubmed tokenization (v-shobhit)
- `1c73423` updates (v-shobhit)
- `9a9194f` add tentative gpt-oss fields (v-shobhit)
- `1812c04` remove data dir (v-shobhit)
- `16af1bf` create preprocess module (v-shobhit)
- `9b4a84c` move things to archive (v-shobhit)
- `9450d46` [Automated Commit] Format Codebase (github-actions[bot])
- `319e5f7` rm unused scripts (v-shobhit)
- `4d89b98` rm unused (v-shobhit)
- `be02519` mv things (v-shobhit)
- `18a8444` add mlperf artifacts (v-shobhit)
- `199f476` add mlperf artifacts (v-shobhit)
- `425ce75` [Automated Commit] Format Codebase (github-actions[bot])
- `7cdc7cb` add utils, gitignore (v-shobhit)
- `ab90695` [Automated Commit] Format Codebase (github-actions[bot])
- `62e4d47` update README (v-shobhit)
- `292c49d` fix request pool size (v-shobhit)
- `98585b8` [Automated Commit] Format Codebase (github-actions[bot])
- `21a8034` add setup (v-shobhit)
- `734d8f4` updates (v-shobhit)
- `e3e22b8` server scenario fix; gpt-oss -> gpt-oss-120b (v-shobhit)
- `e40a7da` add fixes (v-shobhit)
- `382fc9e` add accuracy eval script for mlperf (v-shobhit)
- `31f435a` finishing touches (v-shobhit)
- `63592a3` [Automated Commit] Format Codebase (github-actions[bot])
- `a41f882` refactor mode -> scenario (v-shobhit)
- `b2bc9e0` Merge branch 'mlcommons:master' into gptoss-loadgen (v-shobhit)
- `81f6ca5` add eval_perf script (v-shobhit)
- `f780189` [Automated Commit] Format Codebase (github-actions[bot])
- `7f47e5e` add pass@k to acc eval (v-shobhit)
- `d3a7b58` add repeats_per_sample option to loadgen (v-shobhit)
- `60976f2` Merge branch 'loadgen-repeat-samples' into gptoss-loadgen (v-shobhit)
- `50051f2` [Automated Commit] Format Codebase (github-actions[bot])
- `bee73b2` fix harmonize tokens -> text (v-shobhit)
- `5039fd6` [Automated Commit] Format Codebase (github-actions[bot])
- `db4d290` remove file (v-shobhit)
- `dbb0fd9` fix prompt of summarization (v-shobhit)
- `57c6dae` move stuff to sglang (v-shobhit)
- `da35468` allow use of parquet (v-shobhit)
- `72cd475` [Automated Commit] Format Codebase (github-actions[bot])
- `724502b` Merge branch 'master' into gptoss-loadgen (anandhu-eng)
- `7923249` fix scores for pass@1 with k repeats (v-shobhit)
- `957c53d` add extra-args option (v-shobhit)
- `44f662b` [Automated Commit] Format Codebase (github-actions[bot])
- `a45344b` Update user.conf (v-shobhit)
- `100903b` updates to use v4 (v-shobhit)
- `a6041d8` [Automated Commit] Format Codebase (github-actions[bot])
- `dc699a2` Merge branch 'mlcommons:master' into gptoss-loadgen (v-shobhit)
- `bebb328` remove loadgen changes for repeats (v-shobhit)
- `373d57d` gpt-oss -> gpt-oss-120b (v-shobhit)
- `1a5fda6` [Automated Commit] Format Codebase (github-actions[bot])
- `999fc89` update README (v-shobhit)
- `4983645` remove archive (v-shobhit)
- `34a8c74` update frozen requirements (v-shobhit)
- `02206f7` rm harmonize script + fix score calculation (v-shobhit)
- `de831b3` add percentage (v-shobhit)
- `65b71ad` [Automated Commit] Format Codebase (github-actions[bot])
- `8bebe02` empty commit to trigger CLA (v-shobhit)
- `6346113` remove comments (v-shobhit)
- `71f2a83` add gptoss placeholder values (v-shobhit)
- `8e83c68` rm gpt-oss fields (v-shobhit)
- `9dfb9a5` update user.conf (v-shobhit)
- `715f063` add generation_config.json (v-shobhit)
- `7319e82` add docker command (v-shobhit)
- `0a95e40` add better parsing and check for harmony tokens (v-shobhit)
- `9a4414e` Merge pull request #4 from v-shobhit/gptoss-fix-eval (v-shobhit)
- `6fa49dc` [Automated Commit] Format Codebase (github-actions[bot])
- `1950b13` add exact_match log for submission_checker (v-shobhit)
- `06b681a` Merge branch 'master' into gptoss-loadgen (arjunsuresh)
- `b286618` empty commit to trigger test (v-shobhit)
New file:
```
*venv*
*.pkl
*.csv
```
# MLPerf Inference reference implementation for GPT-OSS-120B

This is the reference implementation for GPT-OSS-120B. It is a proposal and a work in progress.

## Model and Dataset download

#### TODO: Replace this with mlc download link when available

* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**

Datasets are provided in **Parquet format** (recommended), which loads faster and produces files roughly 50% smaller than pickle. Pickle format is still supported for backward compatibility.
## Environment setup

Work on the reference implementation is done using the SGLang containers at [https://hub.docker.com/r/lmsysorg/sglang/tags](https://hub.docker.com/r/lmsysorg/sglang/tags). For enroot setup, a script is provided under [`setup_enroot.sh`](./setup_enroot.sh). All sections below assume this environment has been instantiated.

Once in the environment, install additional requirements using [`setup.sh`](./setup.sh):
```bash
./setup.sh
```

## Running the reference implementation: SGLang

Use [`./sglang/run_server.sh`](./sglang/run_server.sh) to launch an SGLang server hosting `gpt-oss-120b`.

### Run the server
```bash
./run_server.sh \
    --model_path path/to/gpt-oss-120b/model \
    --dp N \
    --stream_interval 100 \
    --eagle_path optional/path/to/eagle/head
```
The script uses `python3 -m sglang.launch_server` to instantiate the model, with `tp=pp=ep=1` and `dp` as specified.
You may also use Docker:
```bash
docker run --runtime nvidia --gpus all --net host \
    -v ${HF_HOME}:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    --ipc=host lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path ${MODEL_NAME} \
    --host 0.0.0.0 --port 3000 --data-parallel-size=1 --max-running-requests 512 \
    --mem-fraction-static 0.85 --chunked-prefill-size 16384 --ep-size=1 \
    --enable-metrics --stream-interval 500
```

Then, run a benchmark script that uses the client to send and receive requests.

### Run the inference

**Note:** All scripts support both Parquet (`.parquet`) and Pickle (`.pkl`) dataset files. Parquet is recommended as it offers:
- 50% smaller file size
- Faster loading times
- Cross-language compatibility
- Type-safe schema preservation
Example usage:
```bash
# first, install loadgen
pip install $(git rev-parse --show-toplevel)/loadgen

# Using Parquet format (recommended)
python3 run_mlperf.py \
    --scenario offline \
    --input-file /path/to/dataset.parquet \
    --accuracy

# Using Pickle format (backward compatible)
python3 run_mlperf.py \
    --scenario offline \
    --input-file /path/to/dataset.pkl \
    --accuracy
```
Full command-line options:
```bash
python3 run_mlperf.py --help
usage: run_mlperf.py [-h] [--scenario {offline,server}] --input-file INPUT_FILE [--max-samples MAX_SAMPLES] [--mlperf-conf MLPERF_CONF]
                     [--user-conf USER_CONF] [--accuracy] [--output-dir OUTPUT_DIR] [--backend {sglang}] [--server-url SERVER_URL]
                     [--generation-config GENERATION_CONFIG] [--max-new-tokens MAX_NEW_TOKENS] [--num-workers NUM_WORKERS]
                     [--max-concurrency MAX_CONCURRENCY]

Run MLPerf inference benchmarks for gpt-oss

options:
  -h, --help            show this help message and exit
  --scenario {offline,server}
                        MLPerf scenario mode
  --input-file INPUT_FILE
                        Path to tokenized dataset (parquet or pickle file)
  --max-samples MAX_SAMPLES
                        Maximum number of samples to use (None for all)
  --mlperf-conf MLPERF_CONF
                        Path to MLPerf configuration file
  --user-conf USER_CONF
                        Path to user configuration file
  --accuracy            Run accuracy mode instead of performance
  --output-dir OUTPUT_DIR
                        Directory for MLPerf output logs
  --backend {sglang}    Backend to use for inference
  --server-url SERVER_URL
                        Server URL for backend (SGLang)
  --generation-config GENERATION_CONFIG
                        Path to generation configuration JSON file
  --max-new-tokens MAX_NEW_TOKENS
                        Override max_new_tokens from generation config (default: use value from config)
  --num-workers NUM_WORKERS
                        Number of worker threads (for server scenario)
  --max-concurrency MAX_CONCURRENCY
                        Maximum concurrent requests to backend (SGLang handles batching internally)
```
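The `--user-conf` flag points LoadGen at a user configuration file. As a hedged illustration only (the keys follow LoadGen's `Model.Scenario.key = value` convention, but every value below is a placeholder, not an official setting for this benchmark), such a file might look like:

```
# Illustrative user.conf sketch -- values are placeholders, not official targets
gpt-oss-120b.Offline.target_qps = 1.0
gpt-oss-120b.Server.target_qps = 0.5
gpt-oss-120b.*.min_duration = 600000
```

Settings in `user.conf` override the defaults from `mlperf.conf` for the named model and scenario.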
### Evaluate the accuracy
Run `run_mlperf.py` with `--accuracy`, then use the generated `mlperf_log_accuracy.json` to evaluate the accuracy of the run.

Example usage:
```bash
# Using Parquet format (recommended)
python3 eval_mlperf_accuracy.py \
    --mlperf-log mlperf_results/offline/accuracy/mlperf_log_accuracy.json \
    --reference-data /path/to/acc_eval_inputs.parquet \
    --tokenizer openai/gpt-oss-120b

# Using Pickle format (backward compatible)
python3 eval_mlperf_accuracy.py \
    --mlperf-log mlperf_results/offline/accuracy/mlperf_log_accuracy.json \
    --reference-data /path/to/acc_eval_inputs.pkl \
    --tokenizer openai/gpt-oss-120b
```

Full command-line options:
```bash
python3 eval_mlperf_accuracy.py --help
usage: eval_mlperf_accuracy.py [-h] --mlperf-log MLPERF_LOG --reference-data REFERENCE_DATA [--tokenizer TOKENIZER] [--output-file OUTPUT_FILE]
                               [--save-outputs SAVE_OUTPUTS] [--num-lcb-workers NUM_LCB_WORKERS] [--verbose]

Evaluate MLPerf accuracy logs for gpt-oss-120b

options:
  -h, --help            show this help message and exit
  --mlperf-log MLPERF_LOG
                        Path to mlperf_log_accuracy.json
  --reference-data REFERENCE_DATA
                        Path to reference parquet or pickle file (DataFrame with dataset, ground_truth, etc.)
  --tokenizer TOKENIZER
                        HuggingFace tokenizer name or path
  --output-file OUTPUT_FILE
                        Output JSON file for results (optional)
  --save-outputs SAVE_OUTPUTS
                        Save detokenized outputs to pickle file (ordered by qsl_idx) for debugging
  --num-lcb-workers NUM_LCB_WORKERS
                        Number of parallel workers for LiveCodeBench evaluation (default: 64)
  --verbose             Verbose logging
```
New file:
```python
#!/usr/bin/env python3
"""Backend implementations for gpt-oss inference."""

from .base_backend import BaseBackend
from .sglang_backend import SGLangBackend

__all__ = [
    "BaseBackend",
    "SGLangBackend",
]
```
New file:
```python
#!/usr/bin/env python3
"""Base backend class for gpt-oss inference."""

import abc
import logging
from typing import List, Dict, Any, Optional

logger = logging.getLogger(__name__)


class BaseBackend(abc.ABC):
    """Abstract base class for inference backends.

    All backends must implement this interface to work with the MLPerf SUT.
    """

    def __init__(self, config: Optional[Dict[str, Any]] = None):
        """Initialize the backend.

        Args:
            config: Optional configuration dictionary
        """
        self.config = config or {}
        self.initialized = False
        logger.info(f"Initializing {self.__class__.__name__}")

    @abc.abstractmethod
    def initialize(self) -> None:
        """Initialize the backend (load model, connect to server, etc.)."""
        raise NotImplementedError("Subclasses must implement initialize()")

    @abc.abstractmethod
    def generate(
        self,
        prompts: List[List[int]],
        max_tokens: int = 100,
        temperature: float = 0.001,
        top_k: int = 1,
        top_p: float = 1.0,
        **kwargs
    ) -> List[Dict[str, Any]]:
        """Generate responses for a batch of prompts.

        Args:
            prompts: List of token ID sequences
            max_tokens: Maximum tokens to generate per prompt
            temperature: Sampling temperature
            top_k: Top-k sampling parameter
            top_p: Top-p (nucleus) sampling parameter
            **kwargs: Additional backend-specific parameters

        Returns:
            List of response dictionaries with keys:
                - output_ids: List of generated token IDs
                - output_text: Generated text (optional)
                - metadata: Additional metadata (latencies, etc.)
        """
        raise NotImplementedError("Subclasses must implement generate()")

    @abc.abstractmethod
    def cleanup(self) -> None:
        """Clean up backend resources."""
        raise NotImplementedError("Subclasses must implement cleanup()")

    def __enter__(self):
        """Context manager entry."""
        self.initialize()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager exit."""
        self.cleanup()

    @property
    def is_initialized(self) -> bool:
        """Check if backend is initialized."""
        return self.initialized
```
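To show how the `BaseBackend` interface is meant to be consumed, here is a self-contained sketch with a trimmed copy of the abstract class and a hypothetical `EchoBackend` that returns each prompt unchanged; it is useful for exercising SUT wiring without a model server and is not part of this PR:

```python
# Hypothetical dummy backend implementing the BaseBackend interface,
# with a trimmed inline copy of the ABC so this sketch runs standalone.
import abc
from typing import Any, Dict, List, Optional


class BaseBackend(abc.ABC):
    def __init__(self, config: Optional[Dict[str, Any]] = None):
        self.config = config or {}
        self.initialized = False

    @abc.abstractmethod
    def initialize(self) -> None: ...

    @abc.abstractmethod
    def generate(self, prompts: List[List[int]], **kwargs) -> List[Dict[str, Any]]: ...

    @abc.abstractmethod
    def cleanup(self) -> None: ...

    def __enter__(self):
        self.initialize()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.cleanup()


class EchoBackend(BaseBackend):
    """Echoes each prompt's token IDs back (truncated to max_tokens)."""

    def initialize(self) -> None:
        self.initialized = True

    def generate(self, prompts, max_tokens: int = 100, **kwargs):
        return [
            {"output_ids": p[:max_tokens], "output_text": None, "metadata": {}}
            for p in prompts
        ]

    def cleanup(self) -> None:
        self.initialized = False


# Context-manager usage, as supported by __enter__/__exit__:
with EchoBackend() as backend:
    results = backend.generate([[1, 2, 3]], max_tokens=2)
```

The context-manager protocol guarantees `cleanup()` runs even if `generate()` raises, which is the main reason the base class defines `__enter__`/`__exit__`.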