Commit 4b2ad51

DCPerf mini (first version of Feedsim mini): Add graph storage/loading optimization and eliminate per-thread graph building (#201)
Summary:

Pull Request resolved: #201

This diff enhances the DCPerf Feedsim benchmark by adding graph storage and loading optimization capabilities, eliminating redundant graph building across multiple thread runs, and replacing the fixed sleep time with a check for server readiness.

**Key changes:**

1. **Shell script enhancements** (`run-feedsim-multi.sh`, `run.sh`):
   * Added `-S` flag to store generated graphs to a file for reuse across instances
   * Added `-L` flag to load pre-generated graphs from a file instead of rebuilding per thread
   * Added `-I` flag to enable instrumenting graph generation
   * Enhanced help documentation to explain the new optimization options
   * Updated command line parsing to handle the new flags and pass them through to the underlying executables
2. **Command line options** (`LeafNodeRankCmdline.ggo`):
   * Added `store_graph` option to enable saving generated graphs to a specified file
   * Added `load_graph` option to enable loading graphs from a specified file instead of generating new ones
   * Added `instrument_graph` option to enable measuring the time for graph generation
3. **Performance optimizations:**
   * **Eliminates per-thread graph building overhead**: Instead of each parallel instance building its own graph, one instance can build and store the graph while others load the pre-built version. This also optimizes memory and CPU usage by avoiding redundant graph generation across parallel threads
   * Reduces benchmark initialization time by replacing the fixed sleep time with checking for server readiness

Reviewed By: excelle08

Differential Revision: D80288337

fbshipit-source-id: 9b1fc935d3c3106e44dd8ef3238b78f953e1e58a
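The store-once, load-afterwards workflow described in the summary could be driven by a small wrapper. The following is only a sketch: `graph_flags` is a hypothetical helper that is not part of this commit, and it assumes that a non-empty file at the given path is a valid stored graph. Only the `-S` and `-L` flags themselves come from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): pick between the -S and -L
# flags introduced here, based on whether a stored graph file already exists.
# Assumption: any non-empty file at the path is a usable stored graph.
graph_flags() {
  local path="$1"
  if [ -s "$path" ]; then
    # A stored graph is present: ask the benchmark to load it.
    printf -- '-L %s\n' "$path"
  else
    # No stored graph yet: ask the benchmark to build one and store it.
    printf -- '-S %s\n' "$path"
  fi
}
```

A first invocation such as `run.sh $(graph_flags /tmp/feedsim_graph.bin)` would then store the graph, and later invocations would load it; the path is illustrative. Note that `run-feedsim-multi.sh` forwards the same flags to every instance, so storing from a single-instance run first is probably the safer order.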
1 parent d16095a commit 4b2ad51

7 files changed: +377 -9 lines changed


benchpress/config/jobs.yml

Lines changed: 62 additions & 0 deletions
@@ -461,6 +461,36 @@
         - 'benchmarks/feedsim/feedsim-multi-inst-*.log'
         - 'benchmarks/feedsim/src/perf.data'
 
+- name: feedsim_autoscale_mini
+  benchmark: feedsim_autoscale
+  description: >
+    Aggregator like workload. Latency sensitive.
+    The feedsim_autoscale mini benchmark jobs
+    are configured with a fixed QPS.
+  args:
+    - '-n {num_instances}'
+    - '-q {fixed_qps}'
+    - '-d {fixed_qps_duration}'
+    - '-w {warmup_time}'
+    - '-S {graph_store_path}'
+    - '-L {graph_load_path}'
+    - '{extra_args}'
+  vars:
+    - 'num_instances=-1'
+    - 'fixed_qps=100'
+    - 'fixed_qps_duration=300'
+    - 'warmup_time=120'
+    - 'graph_store_path=default_do_not_store'
+    - 'graph_load_path=default_do_not_load'
+    - 'extra_args='
+  hooks:
+    - hook: copymove
+      options:
+        is_move: true
+        after:
+          - 'benchmarks/feedsim/feedsim_results*.txt'
+          - 'benchmarks/feedsim/feedsim-multi-inst-*.log'
+          - 'benchmarks/feedsim/src/perf.data'
 
 - name: feedsim_autoscale_arm
   benchmark: feedsim_autoscale
@@ -492,6 +522,38 @@
         - 'benchmarks/feedsim/feedsim-multi-inst-*.log'
         - 'benchmarks/feedsim/src/perf.data'
 
+- name: feedsim_autoscale_arm_mini
+  benchmark: feedsim_autoscale
+  description: >
+    Aggregator like workload. Latency sensitive.
+    The feedsim_autoscale mini benchmark jobs
+    are configured with a fixed QPS.
+    Parameters tuned for arm.
+  args:
+    - '-n {num_instances}'
+    - '-i {icache_iterations}'
+    - '-q {fixed_qps}'
+    - '-d {fixed_qps_duration}'
+    - '-w {warmup_time}'
+    - '{extra_args}'
+  vars:
+    - 'num_instances=-1'
+    - 'icache_iterations=400000'
+    - 'fixed_qps=100'
+    - 'fixed_qps_duration=300'
+    - 'warmup_time=120'
+    - 'graph_store_path=default_do_not_store'
+    - 'graph_load_path=default_do_not_load'
+    - 'extra_args='
+  hooks:
+    - hook: copymove
+      options:
+        is_move: true
+        after:
+          - 'benchmarks/feedsim/feedsim_results*.txt'
+          - 'benchmarks/feedsim/feedsim-multi-inst-*.log'
+          - 'benchmarks/feedsim/src/perf.data'
+
 
 - benchmark: spark_standalone
   name: spark_standalone_local
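The job definitions above default `graph_store_path` and `graph_load_path` to the sentinel strings `default_do_not_store` and `default_do_not_load`, and `run.sh` in this commit only emits `--store_graph` or `--load_graph` when the value differs from its sentinel. A minimal sketch of that mapping follows; `graph_option` is a hypothetical helper name, and the extra empty-value check is an addition not present in the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the sentinel checks in run.sh.
#   $1: long option name (store_graph or load_graph)
#   $2: value received from the job configuration
#   $3: sentinel string meaning "feature disabled"
# Prints "--<name>=<value>" when enabled, and nothing otherwise.
graph_option() {
  local name="$1" value="$2" sentinel="$3"
  # The empty-value guard is an assumption added for this sketch.
  if [ -n "$value" ] && [ "$value" != "$sentinel" ]; then
    printf -- '--%s=%s\n' "$name" "$value"
  fi
}
```

For example, `graph_option load_graph /tmp/feedsim_graph.bin default_do_not_load` prints `--load_graph=/tmp/feedsim_graph.bin` (the path is illustrative), while passing the sentinel itself prints nothing.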

packages/feedsim/run-feedsim-multi.sh

Lines changed: 33 additions & 3 deletions
@@ -21,9 +21,30 @@ NUM_INSTANCES="$(( ( NCPU + 99 ) / 100 ))"
 
 NUM_ICACHE_ITERATIONS="1600000"
 
+show_help() {
+cat <<EOF
+Usage: ${0##*/} [OPTION]...
+
+    -h Display this help and exit
+    -n Number of parallel instances to run. Default: $(( ( NCPU + 99 ) / 100 ))
+    -i Number of icache iterations to use. Default: 1600000
+    -S Store the generated graph to a file (requires a file path)
+    -L Load a graph from a file instead of generating one (requires a file path)
+    -I Enable timing instrumentation for graph operations (build, store, load)
+
+Any remaining arguments are passed to run.sh
+
+EOF
+}
+
 SCRIPT_NAME="$(basename "$0")"
 echo "${SCRIPT_NAME}: DCPERF_PERF_RECORD=${DCPERF_PERF_RECORD}"
 
+# Initialize variables for graph storage and loading
+STORE_GRAPH=""
+LOAD_GRAPH=""
+INSTRUMENT_GRAPH=""
+
 while [ $# -ne 0 ]; do
     case $1 in
         -n)
@@ -34,6 +55,15 @@ while [ $# -ne 0 ]; do
         -i)
             NUM_ICACHE_ITERATIONS="$2"
             ;;
+        -S)
+            STORE_GRAPH="-S $2"
+            ;;
+        -L)
+            LOAD_GRAPH="-L $2"
+            ;;
+        -I)
+            INSTRUMENT_GRAPH="-I"
+            ;;
         -h|--help)
             show_help >&2
             exit 1
@@ -43,7 +73,7 @@ while [ $# -ne 0 ]; do
     esac
 
     case $1 in
-        -n|-i)
+        -n|-i|-S|-L)
             if [ -z "$2" ]; then
                 echo "Invalid option: '$1' requires an argument" 1>&2
                 exit 1
@@ -99,10 +129,10 @@ echo > $BREPS_LFILE
 # shellcheck disable=SC2086
 for i in $(seq 1 ${NUM_INSTANCES}); do
     CORE_RANGE="$(get_cpu_range "${NUM_INSTANCES}" "$((i - 1))")"
-    CMD="IS_AUTOSCALE_RUN=${NUM_INSTANCES} taskset --cpu-list ${CORE_RANGE} ${FEEDSIM_ROOT}/run.sh -p ${PORT} -i ${NUM_ICACHE_ITERATIONS} -o feedsim_results_${FIXQPS_SUFFIX}${i}.txt $*"
+    CMD="IS_AUTOSCALE_RUN=${NUM_INSTANCES} taskset --cpu-list ${CORE_RANGE} ${FEEDSIM_ROOT}/run.sh -p ${PORT} -i ${NUM_ICACHE_ITERATIONS} -o feedsim_results_${FIXQPS_SUFFIX}${i}.txt ${STORE_GRAPH} ${LOAD_GRAPH} ${INSTRUMENT_GRAPH} $*"
     echo "$CMD" > "${FEEDSIM_LOG_PREFIX}${i}.log"
     # shellcheck disable=SC2068,SC2069
-    IS_AUTOSCALE_RUN=${NUM_INSTANCES} stdbuf -i0 -o0 -e0 taskset --cpu-list "${CORE_RANGE}" "${FEEDSIM_ROOT}"/run.sh -p "${PORT}" -i "${NUM_ICACHE_ITERATIONS}" -o "feedsim_results_${FIXQPS_SUFFIX}${i}.txt" $@ 2>&1 > "${FEEDSIM_LOG_PREFIX}${i}.log" &
+    IS_AUTOSCALE_RUN=${NUM_INSTANCES} stdbuf -i0 -o0 -e0 taskset --cpu-list "${CORE_RANGE}" "${FEEDSIM_ROOT}"/run.sh -p "${PORT}" -i "${NUM_ICACHE_ITERATIONS}" -o "feedsim_results_${FIXQPS_SUFFIX}${i}.txt" ${STORE_GRAPH} ${LOAD_GRAPH} ${INSTRUMENT_GRAPH} $@ 2>&1 > "${FEEDSIM_LOG_PREFIX}${i}.log" &
     PIDS+=("$!")
     PHY_CORE_ID=$((PHY_CORE_ID + CORES_PER_INST))
     SMT_ID=$((SMT_ID + CORES_PER_INST))
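The script above forwards the optional flags as unquoted strings (`${STORE_GRAPH} ${LOAD_GRAPH} ${INSTRUMENT_GRAPH}`), which relies on word splitting and would break a path containing spaces. One possible alternative, shown here as a sketch rather than as what the commit does, is to collect the flags in a bash array. `build_graph_args` and `GRAPH_ARGS` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to string-based flag forwarding.
# Fills the global array GRAPH_ARGS with only the options that are set, so
# unset options contribute no arguments and each path stays one argument.
#   $1: store path (empty to skip -S)
#   $2: load path (empty to skip -L)
#   $3: "1" to add -I, anything else to skip it
build_graph_args() {
  local store_path="$1" load_path="$2" instrument="$3"
  GRAPH_ARGS=()
  if [ -n "$store_path" ]; then GRAPH_ARGS+=(-S "$store_path"); fi
  if [ -n "$load_path" ]; then GRAPH_ARGS+=(-L "$load_path"); fi
  if [ "$instrument" = "1" ]; then GRAPH_ARGS+=(-I); fi
  return 0
}
```

The array would then be expanded as `"${GRAPH_ARGS[@]}"` on the `run.sh` command line in place of the three unquoted variables.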

packages/feedsim/run.sh

Lines changed: 48 additions & 5 deletions
@@ -65,6 +65,9 @@ Usage: ${0##*/} [OPTION]...
     -d Duration of each load testing experiment, in seconds. Default: 300
     -p Port to use by the LeafNodeRank server and the load drivers. Default: 11222
     -o Result output file name. Default: "feedsim_results.txt"
+    -S Store the generated graph to a file (requires a file path)
+    -L Load a graph from a file instead of generating one (requires a file path)
+    -I Enable timing instrumentation for graph operations (build, store, load)
 EOF
 }
 
@@ -122,6 +125,17 @@ main() {
     local icache_iterations
     icache_iterations="1600000"
 
+    # Graph storage and loading options
+    local store_graph
+    store_graph=""
+
+    local load_graph
+    load_graph=""
+
+    local instrument_graph
+    instrument_graph=""
+
+
     if [ -z "$IS_AUTOSCALE_RUN" ]; then
         echo > $BREPS_LFILE
     fi
@@ -162,6 +176,19 @@ main() {
             -i)
                 icache_iterations="$2"
                 ;;
+            -S)
+                if [ "$2" != "default_do_not_store" ]; then
+                    store_graph="--store_graph=$2"
+                fi
+                ;;
+            -L)
+                if [ "$2" != "default_do_not_load" ]; then
+                    load_graph="--load_graph=$2"
+                fi
+                ;;
+            -I)
+                instrument_graph="--instrument_graph"
+                ;;
             -h|--help)
                 show_help >&2
                 exit 1
@@ -172,7 +199,7 @@ main() {
         esac
 
         case $1 in
-            -t|-c|-s|-d|-p|-q|-o|-w|-i|-l)
+            -t|-c|-s|-d|-p|-q|-o|-w|-i|-l|-S|-L)
                 if [ -z "$2" ]; then
                     echo "Invalid option: '$1' requires an argument" 1>&2
                     exit 1
@@ -208,13 +235,29 @@ main() {
         --num_objects=2000 \
         --graph_max_iters=1 \
         --noaffinity \
-        --min_icache_iterations="$icache_iterations" &
+        --min_icache_iterations="$icache_iterations" \
+        "$store_graph" \
+        "$load_graph" \
+        "$instrument_graph" >> $BREPS_LFILE 2>&1 &
 
     LEAF_PID=$!
 
-    # FIXME(cltorres)
-    # Remove sleep, expose an endpoint or print a message to notify service is ready
-    sleep 30
+    # Wait for server to be fully ready using monitoring endpoint
+    echo "Waiting for LeafNodeRank server to be ready on monitor port $monitor_port..."
+    max_attempts=30
+    attempt=0
+    while [ $attempt -lt $max_attempts ]; do
+        if curl -f -s "http://localhost:$monitor_port/topology" > /dev/null 2>&1; then
+            echo "LeafNodeRank server is ready (monitor port responding)"
+            break
+        fi
+        attempt=$((attempt + 1))
+        if [ $attempt -eq $max_attempts ]; then
+            echo "ERROR: Server failed to become ready within $max_attempts seconds"
+            exit 1
+        fi
+        sleep 1
+    done
 
     # FIXME(cltorres)
     # Skip ParentNode for now, and talk directly to LeafNode
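The readiness check that replaces `sleep 30` is a bounded polling loop: probe, stop on success, give up after a fixed number of attempts. The same pattern could be factored into a reusable function, sketched below. `wait_until` is a hypothetical name and is not part of the commit; the probe command is supplied by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical generalization of the readiness loop in run.sh.
# Usage: wait_until MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most MAX_ATTEMPTS times, sleeping
# INTERVAL_SECONDS between attempts.
# Returns 0 as soon as COMMAND succeeds, or 1 if every attempt fails.
wait_until() {
  local max_attempts="$1" interval="$2"
  shift 2
  local attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # Discard the probe's output; only its exit status matters.
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Skip the sleep after the final failed attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

With this helper, the loop in the diff would correspond roughly to `wait_until 30 1 curl -f -s "http://localhost:$monitor_port/topology" || exit 1`. Compared with a fixed sleep, startup proceeds as soon as the endpoint answers, and a server that never comes up fails the run instead of being load-tested.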

packages/feedsim/third_party/src/workloads/ranking/LeafNodeRank.cc

Lines changed: 41 additions & 1 deletion
@@ -75,6 +75,9 @@ struct ThreadData {
   std::string random_string;
 };
 
+// Global graph that will be shared across threads
+CSRGraph<int32_t> g_shared_graph;
+
 void ThreadStartup(
     oldisim::NodeThread& thread,
     std::vector<ThreadData>& thread_data,
@@ -85,7 +88,8 @@ void ThreadStartup(
     const std::shared_ptr<folly::IOThreadPoolExecutor>& ioThreadPool,
     const std::shared_ptr<ranking::TimekeeperPool>& timekeeperPool) {
   auto& this_thread = thread_data[thread.get_thread_num()];
-  auto graph = params.buildGraph();
+  // auto graph = params.buildGraph();
+  auto graph = params.makeGraphCopy(g_shared_graph);
   this_thread.cpuThreadPool = cpuThreadPool;
   this_thread.srvCPUThreadPool = srvCPUThreadPool;
   this_thread.srvIOThreadPool = srvIOThreadPool;
@@ -307,6 +311,42 @@ int main(int argc, char** argv) {
   std::vector<ThreadData> thread_data(args.threads_arg);
   ranking::dwarfs::PageRankParams params{
       args.graph_scale_arg, args.graph_degree_arg};
+
+  // create or load a graph
+
+  if (args.load_graph_given) {
+    if (args.instrument_graph_given) {
+      auto start_load = std::chrono::steady_clock::now();
+      g_shared_graph = params.loadGraphFromFile(args.load_graph_arg);
+      auto end_load = std::chrono::steady_clock::now();
+      auto load_duration = std::chrono::duration_cast<std::chrono::milliseconds>(end_load - start_load).count();
+      std::cout << "Graph loading time: " << load_duration << " ms" << std::endl;
+    } else {
+      g_shared_graph = params.loadGraphFromFile(args.load_graph_arg);
+    }
+  } else {
+    if (args.instrument_graph_given) {
+      auto start_build = std::chrono::steady_clock::now();
+      g_shared_graph = params.buildGraph();
+      auto end_build = std::chrono::steady_clock::now();
+      auto build_duration = std::chrono::duration_cast<std::chrono::milliseconds>(end_build - start_build).count();
+      std::cout << "Graph building time: " << build_duration << " ms" << std::endl;
+
+      if (args.store_graph_given) {
+        auto start_store = std::chrono::steady_clock::now();
+        params.storeGraphToFile(g_shared_graph, args.store_graph_arg);
+        auto end_store = std::chrono::steady_clock::now();
+        auto store_duration = std::chrono::duration_cast<std::chrono::milliseconds>(end_store - start_store).count();
+        std::cout << "Graph storing time: " << store_duration << " ms" << std::endl;
+      }
+    } else {
+      g_shared_graph = params.buildGraph();
+      if (args.store_graph_given) {
+        params.storeGraphToFile(g_shared_graph, args.store_graph_arg);
+      }
+    }
+  }
+
   oldisim::LeafNodeServer server(args.port_arg);
   server.SetThreadStartupCallback([&](auto&& thread) {
     return ThreadStartup(
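With `--instrument_graph` set, the code above prints lines of the form `Graph building time: <n> ms`, `Graph storing time: <n> ms`, and `Graph loading time: <n> ms` to standard output, which `run.sh` appends to `$BREPS_LFILE`. A small sketch for pulling those numbers out of a log file follows. `graph_timings` is a hypothetical helper, and it assumes each message appears alone on its own line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<operation> <milliseconds>" for every graph
# timing line in the given log file. The matched message text is taken from
# the std::cout statements in LeafNodeRank.cc in this commit.
graph_timings() {
  awk '/^Graph (building|storing|loading) time: [0-9]+ ms$/ { print $2, $4 }' "$1"
}
```

Calling `graph_timings "$BREPS_LFILE"` after a run should then list one line per instrumented operation, which makes it straightforward to compare build time against load time across runs.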

packages/feedsim/third_party/src/workloads/ranking/LeafNodeRankCmdline.ggo

Lines changed: 3 additions & 0 deletions
@@ -28,6 +28,9 @@ option "graph_max_iters" - "Perform at most 'graph_max_iters' iterations during
 option "graph_subset" - "Perform partial PageRank over these numbers of nodes. 0 indicates all nodes." int default="3145728"
 option "num_objects" - "Number of objects to serialize." int default="40"
 option "random_data_size" - "Number of bytes of string random data." int default="3145728"
+option "store_graph" - "Enable storing the generated graph to a file." string typestr="filename" optional
+option "load_graph" - "Enable loading a graph from a file instead of generating one." string typestr="filename" optional
+option "instrument_graph" - "Enable timing instrumentation for graph operations (build, store, load)."
 option "max_response_size" - "Maximum response size in bytes returned by the leaf server." int default="131072"
 option "compression_data_size" - "Number of bytes to compress per request." int default="131072"
 option "rank_trials_per_thread" - "Number of iterations each CPU thread executes of rank work." int default="1"
