[DLRMv3] Reference Implementation #2410
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
@tanvi-mlcommons @pgmpablo157321 to take a look (can ignore the whole HSTU submodules) and help resolve the CLA issues for Linjian.
Hi @mrmhodak, could you help review this PR? All comments from @nvzhihanj have been addressed.
```
python accuracy.py --path path/to/mlperf_log_accuracy.json
```
We use normalized entropy (NE), accuracy, and AUC as the metrics to evaluate model quality. The accuracy results for the reference implementation, evaluated on 34,996 requests across 10 inference timestamps, are listed below:
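As an illustration of how these three metrics relate (this is not the reference `accuracy.py`; the function names and sample data below are hypothetical), NE is the model's average log loss normalized by the entropy of the base rate:

```python
import math

def normalized_entropy(labels, preds):
    # NE = model's average log loss / entropy of the base positive rate
    ll = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
              for y, p in zip(labels, preds)) / len(labels)
    rate = sum(labels) / len(labels)
    base = -(rate * math.log(rate) + (1 - rate) * math.log(1 - rate))
    return ll / base

def auc(labels, preds):
    # Probability that a random positive is ranked above a random negative
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    pairs = [(a > b) + 0.5 * (a == b) for a in pos for b in neg]
    return sum(pairs) / len(pairs)

labels = [0, 1, 1, 0, 1]          # toy data, purely illustrative
preds = [0.2, 0.8, 0.7, 0.3, 0.9]
print(round(normalized_entropy(labels, preds), 3))  # 0.376 (lower is better)
print(auc(labels, preds))                           # 1.0 (perfect ranking here)
# Accuracy at a 0.5 decision threshold
print(sum((p > 0.5) == y for y, p in zip(labels, preds)) / len(labels))  # 1.0
```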
Are we using all 3 scores for 99%, or just the AUC? (If it's just AUC, please point it out.)
Also, it seems the loadgen/mlperf.conf is missing (for the latency threshold and sample count).
Yes, we need all 3 scores to be within the 99% threshold; I will clarify that in the README. I will also add the latency threshold to mlperf.conf.
Regarding the sample count, could you explain when we will use it? I'm a bit confused, as I think performance_sample_count_override already covers it (mlperf.conf, lines 6 to 30 in 8999c4d):
```
# Set performance_sample_count for each model.
# User can optionally set this to higher values in user.conf.
resnet50.*.performance_sample_count_override = 1024
ssd-mobilenet.*.performance_sample_count_override = 256
retinanet.*.performance_sample_count_override = 64
bert.*.performance_sample_count_override = 10833
dlrm.*.performance_sample_count_override = 204800
dlrm-v2.*.performance_sample_count_override = 204800
rnnt.*.performance_sample_count_override = 2513
gptj.*.performance_sample_count_override = 13368
mixtral-8x7b.*.performance_sample_count_override = 15000
llama2-70b.*.performance_sample_count_override = 24576
llama2-70b-interactive.*.performance_sample_count_override = 24576
llama3_1-405b.*.performance_sample_count_override = 8313
llama3_1-405b-interactive.*.performance_sample_count_override = 8313
llama3_1-8b.*.performance_sample_count_override = 13368
llama3_1-8b-edge.*.performance_sample_count_override = 5000
llama3_1-8b-interactive.*.performance_sample_count_override = 13368
stable-diffusion-xl.*.performance_sample_count_override = 5000
rgat.*.performance_sample_count_override = 788379
pointpainting.*.performance_sample_count_override = 1024
deepseek-r1.*.performance_sample_count_override = 4388
whisper.*.performance_sample_count_override = 1633
# set to 0 to let entire sample set to be performance sample
3d-unet.*.performance_sample_count_override = 0
```
My understanding is that submitters are allowed to change the sample count, and as long as the minimum duration requirement for the benchmark is met, this should be acceptable.
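For instance, a submitter wanting a larger performance sample set could add an override line in user.conf following the same `model.scenario.key = value` pattern as mlperf.conf; the model name and value below are purely illustrative, not a recommendation:

```
# user.conf (hypothetical override)
dlrm-v2.Server.performance_sample_count_override = 409600
```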
```
DLRMv3SyntheticStreamingDataset,
{
    "ratings_file_prefix": os.path.join(
        new_path_prefix, "data/streaming-100b/sampled_data/"
```
Suggest changing it to "sampled_data/": the "--dataset-path-prefix" already points to the location that contains sampled_data, so this data/streaming-100b/ is no longer a valid path unless the user modifies it themselves.
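A minimal sketch of the suggested change, using a hypothetical value for the --dataset-path-prefix argument:

```python
import os

# Hypothetical value of --dataset-path-prefix; it already contains sampled_data/
new_path_prefix = "/my/dataset/root"

# Current code: hard-codes data/streaming-100b/, which does not exist
# under the user-supplied prefix without manual modification
before = os.path.join(new_path_prefix, "data/streaming-100b/sampled_data/")

# Suggested change: join only the sampled_data/ component
after = os.path.join(new_path_prefix, "sampled_data/")
print(after)  # /my/dataset/root/sampled_data/
```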
```
    "--scenario-name", default="Server", choices={"Server", "Offline"}, help="inference benchmark scenario"
)
parser.add_argument(
    "--batchsize", default=10, help="batch size used in the benchmark"
```
I did a test run of this reference implementation, and it throws an error because there is no type enforcement in the argument parser: Python parses all of the values as strings instead of their actual datatypes. Can you add the types?
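A minimal sketch of the fix: argparse returns command-line values as strings unless a `type` is given, so numeric arguments like --batchsize need `type=int` to be usable in arithmetic:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--batchsize", type=int, default=10,  # type=int enforces int instead of str
    help="batch size used in the benchmark",
)
args = parser.parse_args(["--batchsize", "32"])
print(args.batchsize + 1)  # 33 -- without type=int this would raise TypeError
```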
Besides this, the accuracy was successfully reproduced. Thanks!
This PR introduces the reference implementation for DLRMv3.
Instructions for running the benchmark are provided in the README. Please note: The dataset setup is still pending. The README will be updated with download instructions once the dataset becomes available.