
Commit 38850a2

dlrm_v3: squash all changes into one diff
1 parent 8999c4d commit 38850a2


52 files changed, +21985 -0 lines changed

recommendation/dlrm_v3/README.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# MLPerf Inference reference implementation for DLRMv3

## Install dependencies and build loadgen

The reference implementation has been tested on a single host with x86_64 CPUs and 8 NVIDIA H100/B200 GPUs. Dependencies can be installed as follows:
```
sh setup.sh
```

## Dataset download

DLRMv3 uses a synthetic dataset specifically designed to match the model and system characteristics of large-scale sequential recommendation (a large item set and a long average sequence length per request). To generate the dataset used for both training and inference, run
```
python streaming_synthetic_data.py
```
The generated dataset is 2TB in size and contains 5 million users interacting with a billion items over 100 timestamps.

Only 1% of the dataset is used in the inference benchmark. The sampled DLRMv3 dataset and the trained checkpoint are available at https://inference.mlcommons-storage.org/.

Script to download the sampled dataset used in the inference benchmark:
```
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://inference.mlcommons-storage.org/metadata/dlrm-v3-dataset.uri
```
Script to download the 1TB trained checkpoint:
```
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://inference.mlcommons-storage.org/metadata/dlrm-v3-checkpoint.uri
```

## Inference benchmark

```
WORLD_SIZE=8 python main.py --dataset sampled-streaming-100b
```

`WORLD_SIZE` is the number of GPUs used in the inference benchmark.

```
usage: main.py [-h] [--dataset {streaming-100b,sampled-streaming-100b}] [--model-path MODEL_PATH] [--scenario-name {Server,Offline}] [--batchsize BATCHSIZE]
               [--output-trace OUTPUT_TRACE] [--data-producer-threads DATA_PRODUCER_THREADS] [--compute-eval COMPUTE_EVAL] [--find-peak-performance FIND_PEAK_PERFORMANCE]
               [--dataset-path-prefix DATASET_PATH_PREFIX] [--warmup-ratio WARMUP_RATIO] [--num-queries NUM_QUERIES] [--target-qps TARGET_QPS] [--numpy-rand-seed NUMPY_RAND_SEED]
               [--sparse-quant SPARSE_QUANT] [--dataset-percentage DATASET_PERCENTAGE]

options:
  -h, --help            show this help message and exit
  --dataset {streaming-100b,sampled-streaming-100b}
                        name of the dataset
  --model-path MODEL_PATH
                        path to the model checkpoint. Example: /home/username/ckpts/streaming_100b/89/
  --scenario-name {Server,Offline}
                        inference benchmark scenario
  --batchsize BATCHSIZE
                        batch size used in the benchmark
  --output-trace OUTPUT_TRACE
                        Whether to output trace
  --data-producer-threads DATA_PRODUCER_THREADS
                        Number of threads used in data producer
  --compute-eval COMPUTE_EVAL
                        If true, will run AccuracyOnly mode and output both predictions and labels for accuracy calculations
  --find-peak-performance FIND_PEAK_PERFORMANCE
                        Whether to find peak performance in the benchmark
  --dataset-path-prefix DATASET_PATH_PREFIX
                        Prefix to the dataset path. Example: /home/username/
  --warmup-ratio WARMUP_RATIO
                        The ratio of the dataset used to warm up the SUT
  --num-queries NUM_QUERIES
                        Number of queries to run in the benchmark
  --target-qps TARGET_QPS
                        Benchmark target QPS. Needs to be tuned for different implementations to balance latency and throughput
  --numpy-rand-seed NUMPY_RAND_SEED
                        Numpy random seed
  --sparse-quant SPARSE_QUANT
                        Whether to quantize the sparse arch
  --dataset-percentage DATASET_PERCENTAGE
                        Percentage of the dataset to run in the benchmark
```
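
For a Server-scenario run, additional flags from the usage above can be supplied. The command below is an illustrative sketch rather than a tuned configuration: the checkpoint and dataset paths mirror the examples in the help text, and the batch size and target QPS are placeholder values that must be tuned per system.

```
WORLD_SIZE=8 python main.py \
    --dataset sampled-streaming-100b \
    --scenario-name Server \
    --model-path /home/username/ckpts/streaming_100b/89/ \
    --dataset-path-prefix /home/username/ \
    --batchsize 16 \
    --target-qps 10
```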

## Accuracy test

Setting `run.compute_eval` will run the accuracy test and dump prediction outputs in `mlperf_log_accuracy.json`. To check the accuracy, run

```
python accuracy.py --path path/to/mlperf_log_accuracy.json
```
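
To produce `mlperf_log_accuracy.json` in the first place, the benchmark is run in accuracy mode via the `--compute-eval` flag documented above. The line below is a sketch; `True` is an assumed value and the exact boolean syntax depends on how main.py parses the flag.

```
WORLD_SIZE=8 python main.py --dataset sampled-streaming-100b --compute-eval True
```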

We use normalized entropy (NE), accuracy, and AUC as the metrics to evaluate model quality. The accuracy of the reference implementation, evaluated on 34,996 requests across 10 inference timestamps, is listed below:
```
NE: 86.687%
Accuracy: 69.651%
AUC: 78.663%
```
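
For readers unfamiliar with NE, the sketch below shows how normalized entropy is commonly computed: the average log loss of the predictions normalized by the entropy of the empirical base rate, so lower is better and 100% corresponds to always predicting the base rate. This is a standalone illustration assuming binary labels and predicted probabilities; the `MetricsLogger` used by the reference implementation is the authoritative metric code.

```
import numpy as np

def normalized_entropy(preds: np.ndarray, labels: np.ndarray) -> float:
    """Average log loss normalized by the entropy of the empirical base rate."""
    eps = 1e-7
    preds = np.clip(preds, eps, 1.0 - eps)
    logloss = -np.mean(labels * np.log(preds) + (1.0 - labels) * np.log(1.0 - preds))
    p = labels.mean()  # empirical positive rate
    base_entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return float(logloss / base_entropy)

# Toy example: better-calibrated, more informative predictions give lower NE.
preds = np.array([0.9, 0.2, 0.7, 0.1])
labels = np.array([1.0, 0.0, 1.0, 0.0])
print(normalized_entropy(preds, labels))
```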

recommendation/dlrm_v3/accuracy.py

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# pyre-strict
"""
Tool to calculate accuracy for loadgen accuracy output found in mlperf_log_accuracy.json
"""

import argparse
import json
import logging

import numpy as np
import torch
from configs import get_hstu_configs
from utils import MetricsLogger

logger: logging.Logger = logging.getLogger("main")


def get_args() -> argparse.Namespace:
    """Parse commandline."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--path",
        required=True,
        help="path to mlperf_log_accuracy.json",
    )
    args = parser.parse_args()
    return args


def main() -> None:
    """
    Main function to calculate accuracy metrics from loadgen output.

    Reads the mlperf_log_accuracy.json file, parses the results, and computes
    accuracy metrics using the MetricsLogger. Each result entry contains
    predictions, labels, and weights packed as float32 numpy arrays.
    """
    args = get_args()
    logger.warning("Parsing loadgen accuracy log...")
    with open(args.path, "r") as f:
        results = json.load(f)
    hstu_config = get_hstu_configs(dataset="sampled-streaming-100b")
    metrics = MetricsLogger(
        multitask_configs=hstu_config.multitask_configs,
        batch_size=1,
        window_size=3000,
        device=torch.device("cpu"),
        rank=0,
    )
    logger.warning(f"results have {len(results)} entries")
    for result in results:
        data = np.frombuffer(bytes.fromhex(result["data"]), np.float32)
        num_candidates = data[-1].astype(int)
        assert len(data) == 1 + num_candidates * 3
        mt_target_preds = torch.from_numpy(data[0:num_candidates])
        mt_target_labels = torch.from_numpy(data[num_candidates : num_candidates * 2])
        mt_target_weights = torch.from_numpy(
            data[num_candidates * 2 : num_candidates * 3]
        )
        num_candidates = torch.tensor([num_candidates])
        metrics.update(
            predictions=mt_target_preds.view(1, -1),
            labels=mt_target_labels.view(1, -1),
            weights=mt_target_weights.view(1, -1),
            num_candidates=num_candidates,
        )
    for k, v in metrics.compute().items():
        logger.warning(f"{k}: {v}")


if __name__ == "__main__":
    main()
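
For reference, the sketch below shows the packing convention that this script decodes. `pack_accuracy_payload` is a hypothetical helper, not part of the reference SUT; it only illustrates the [predictions | labels | weights | num_candidates] float32 layout implied by the parsing code above.

```
import numpy as np

def pack_accuracy_payload(
    preds: np.ndarray, labels: np.ndarray, weights: np.ndarray
) -> bytes:
    # Hypothetical helper: concatenate predictions, labels, weights, and the
    # candidate count into one float32 buffer, matching the layout that
    # accuracy.py unpacks (data[-1] holds num_candidates).
    assert len(preds) == len(labels) == len(weights)
    num_candidates = np.array([len(preds)], dtype=np.float32)
    payload = np.concatenate([preds, labels, weights, num_candidates]).astype(np.float32)
    return payload.tobytes()
```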
