
Conversation

@LinjianMa

This PR introduces the reference implementation for DLRMv3.
Instructions for running the benchmark are provided in the README. Please note that the dataset setup is still pending; the README will be updated with download instructions once the dataset becomes available.

@LinjianMa requested a review from a team as a code owner December 12, 2025 00:55
@github-actions
Contributor

github-actions bot commented Dec 12, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@nvzhihanj
Contributor

@tanvi-mlcommons @pgmpablo157321, please take a look (you can ignore the HSTU submodules) and help resolve the CLA issues for Linjian.
@zihaok, please help review as well.

@LinjianMa
Author

Hi @mrmhodak, could you help review this PR? All of @nvzhihanj's comments have been addressed.

```
python accuracy.py --path path/to/mlperf_log_accuracy.json
```
We use normalized entropy (NE), accuracy, and AUC as the metrics to evaluate model quality. The results for the reference implementation, evaluated on 34,996 requests across 10 inference timestamps, are listed below:
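For illustration only (this is not the accuracy.py from this PR), a minimal sketch of how NE, accuracy, and AUC can be computed from per-request predicted probabilities and binary labels; the function and variable names here are hypothetical:

```python
# Hypothetical sketch; the accuracy.py in this PR is the source of truth.
import numpy as np
from sklearn.metrics import roc_auc_score

def compute_quality_metrics(probs: np.ndarray, labels: np.ndarray) -> dict:
    """Compute NE, accuracy, and AUC for binary predictions."""
    eps = 1e-7
    probs = np.clip(probs, eps, 1 - eps)
    # Mean log loss of the model's predictions.
    logloss = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    # Normalized entropy: log loss divided by the entropy of the label base rate,
    # so a model that only predicts the base rate scores 1.0.
    p = labels.mean()
    base_entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return {
        "NE": logloss / base_entropy,
        "accuracy": float(np.mean((probs > 0.5) == labels)),
        "AUC": roc_auc_score(labels, probs),
    }
```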
Contributor

@nvzhihanj Dec 17, 2025

Are we using all 3 scores for the 99% target, or just AUC? (If it's just AUC, please point that out.)
Also, it seems that loadgen/mlperf.conf is missing (for the latency threshold and sample count).

Author

Yes, we need all 3 scores to be within 99% of the reference; I will clarify that in the README. I will also add the latency threshold to mlperf.conf.

Regarding the sample count, could you explain when we would use it? I'm a bit confused, as I think the performance_sample_count_override entries

# Set performance_sample_count for each model.
# User can optionally set this to higher values in user.conf.
resnet50.*.performance_sample_count_override = 1024
ssd-mobilenet.*.performance_sample_count_override = 256
retinanet.*.performance_sample_count_override = 64
bert.*.performance_sample_count_override = 10833
dlrm.*.performance_sample_count_override = 204800
dlrm-v2.*.performance_sample_count_override = 204800
rnnt.*.performance_sample_count_override = 2513
gptj.*.performance_sample_count_override = 13368
mixtral-8x7b.*.performance_sample_count_override = 15000
llama2-70b.*.performance_sample_count_override = 24576
llama2-70b-interactive.*.performance_sample_count_override = 24576
llama3_1-405b.*.performance_sample_count_override = 8313
llama3_1-405b-interactive.*.performance_sample_count_override = 8313
llama3_1-8b.*.performance_sample_count_override = 13368
llama3_1-8b-edge.*.performance_sample_count_override = 5000
llama3_1-8b-interactive.*.performance_sample_count_override = 13368
stable-diffusion-xl.*.performance_sample_count_override = 5000
rgat.*.performance_sample_count_override = 788379
pointpainting.*.performance_sample_count_override = 1024
deepseek-r1.*.performance_sample_count_override = 4388
whisper.*.performance_sample_count_override = 1633
# set to 0 to let entire sample set to be performance sample
3d-unet.*.performance_sample_count_override = 0
are never used. In particular, performance_sample_count at https://github.com/mlcommons/inference/blob/8999c4d686f6e4a180da14597c97063fce7c9f33/loadgen/test_settings_internal.cc#L123C3-L123C27 is only used when performance_issue_unique is True, and I don't see a case in the existing benchmarks (or in DLRMv3) where this flag needs to be True.
My understanding is that submitters are allowed to change the sample count, and as long as the minimum duration requirement for the benchmark is met, this should be acceptable.
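For context on where a performance sample count does get consumed, here is a minimal, hypothetical LoadGen harness fragment (not code from this PR); the counts and callback names are illustrative:

```python
# Hypothetical harness fragment; counts and callbacks are placeholders.
import mlperf_loadgen as lg

def load_samples_to_ram(sample_indices):
    pass  # load the requested samples into memory

def unload_samples_from_ram(sample_indices):
    pass  # release them

total_sample_count = 34996        # size of the full query sample library (illustrative)
performance_sample_count = 34996  # samples LoadGen may keep resident during the run

# The harness reports its performance sample count to LoadGen when the QSL is
# constructed; the performance run draws queries from this loaded set.
qsl = lg.ConstructQSL(
    total_sample_count,
    performance_sample_count,
    load_samples_to_ram,
    unload_samples_from_ram,
)
```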

DLRMv3SyntheticStreamingDataset,
{
    "ratings_file_prefix": os.path.join(
        new_path_prefix, "data/streaming-100b/sampled_data/"

Suggest changing this to "sampled_data/": --dataset-path-prefix already points to the location that contains sampled_data, so data/streaming-100b/ is no longer a valid path unless the user modifies it by hand.

"--scenario-name", default="Server", choices={"Server", "Offline"}, help="inference benchmark scenario"
)
parser.add_argument(
"--batchsize", default=10, help="batch size used in the benchmark"

I did a test run of this reference implementation, and it throws an error because there is no type enforcement in the argument parser; Python parses all of these values as strings instead of their actual data types. Can you add the types?
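A sketch of the kind of fix being asked for, with an explicit type= argument so numeric flags are not parsed as strings (the argument set shown is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--scenario-name", default="Server", choices={"Server", "Offline"},
    help="inference benchmark scenario",
)
# Without type=int, a value passed on the command line comes back as a string.
parser.add_argument(
    "--batchsize", type=int, default=10,
    help="batch size used in the benchmark",
)
args = parser.parse_args()
```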


Besides this, the accuracy results were successfully reproduced. Thanks!
