Profile callprofiler with different testcases #829
ThreeMonth03
left a comment
@yungyuc Please take a look. This pull request is quite long, so I wonder whether there is a better way to generate the code and to profile the benchmark on different platforms.
- name: make cprof
  if: runner.os == 'Linux'
  run: |
    make cprof
Profile the call profiler only on Linux.
    return nullptr;
}

bool run_named_case(std::string_view label, std::size_t size, std::size_t repeat_count)
Run each type of workload function with different parameters (size and repeat count).
std::cout << "RESULT workload=" << label
          << " operations=" << operation_count
          << " repeats=" << repeat_count
          << " workload_seconds=" << elapsed.count()
          << '\n';
This file prints the wall time of each benchmark, because wall time cannot be obtained from gprof.
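As an illustration of the wall-time measurement described above, here is a minimal sketch that times a workload with `std::chrono::steady_clock` and formats a `RESULT` line in the same key=value layout. The helper names `time_workload` and `format_result` are hypothetical and are not taken from this pull request.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper: run `workload` `repeat_count` times and return the
// elapsed wall time in seconds, measured with a monotonic clock.
template <typename Workload>
std::chrono::duration<double> time_workload(Workload && workload, std::size_t repeat_count)
{
    auto const start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < repeat_count; ++i)
    {
        workload();
    }
    return std::chrono::steady_clock::now() - start;
}

// Hypothetical helper: build one result line in the key=value layout used by
// the benchmark output shown in the excerpt.
std::string format_result(std::string_view label,
                          std::size_t operation_count,
                          std::size_t repeat_count,
                          std::chrono::duration<double> elapsed)
{
    std::ostringstream out;
    out << "RESULT workload=" << label
        << " operations=" << operation_count
        << " repeats=" << repeat_count
        << " workload_seconds=" << elapsed.count();
    return out.str();
}
```

A caller could pass a lambda wrapping one benchmark case to `time_workload` and print the string returned by `format_result`, so that a driver script can parse the wall time next to the gprof output.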
void configure_large_stack()
{
#if defined(__linux__)
    rlimit limit{};
    if (getrlimit(RLIMIT_STACK, &limit) == 0)
    {
        if (RLIM_INFINITY == limit.rlim_max || limit.rlim_cur < limit.rlim_max)
        {
            limit.rlim_cur = limit.rlim_max;
            static_cast<void>(setrlimit(RLIMIT_STACK, &limit));
        }
    }
#endif
}
Raise the stack size limit up front, because the call depth may reach 50000.
std::array<case_definition, 4> const case_definitions{{
    {"wide_siblings", &workload::run_wide_siblings},
    {"deep_chain", &workload::run_deep_chain},
    {"balanced_tree", &workload::run_balanced_tree},
    {"hot_name_reuse", &workload::run_hot_name_reuse},
}};
There are 4 kinds of benchmarks. They are generated by Python scripts.
add_custom_command(
    OUTPUT ${CPROF_GENERATED_SOURCES}
    COMMAND "${PYTHON_EXECUTABLE}" "${CPROF_GENERATOR}"
            --output-dir "${CPROF_GENERATED_DIR}"
            --shards "${CPROF_SHARD_COUNT}"
    DEPENDS "${CPROF_GENERATOR}"
    VERBATIM
)
Generate the benchmarks.
I tried generating the cpp files with macros, but generating 50000 functions that way is too slow.
if __name__ == "__main__":
    main()
Script to run the executable built from profiling/cprof/callprofiler_gprof.cpp.
I'm not sure whether the cpp files should be placed in /profiling.
To solve issue #831, this pull request profiles callprofiler with different test cases using gprof.
Because gprof is integrated with g++, the scripts in this pull request currently support only Linux. The following data were measured on WSL2 with an Intel 13700 CPU.
As for the benchmarks, there are 4 types of functions, and we assume that the number of operations is 200:
To obtain precise profiling results, this pull request also repeats the workload and resets the profiler when the number of operations is small, because gprof is a sampling-based profiler.
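The repeat-and-reset idea can be sketched as follows. This is only an illustration under stated assumptions: `ToyProfiler` is a hypothetical stand-in and not the modmesh `CallProfiler` API, and `run_repeated` is a hypothetical helper name. The point is that a small workload is run many times so the process lives long enough to collect samples, while the profiler state is cleared before each run so that every repetition starts from the same empty state.

```cpp
#include <cstddef>

// Hypothetical stand-in for a call profiler (NOT the modmesh API): it only
// counts recorded calls and how often it has been reset.
struct ToyProfiler
{
    std::size_t recorded_calls = 0;
    std::size_t reset_count = 0;

    void record_call() { ++recorded_calls; }

    void reset()
    {
        recorded_calls = 0;
        ++reset_count;
    }
};

// Hypothetical helper: run `workload` `repeat_count` times, resetting the
// profiler before every run. Returns the total number of calls recorded
// across all repetitions.
template <typename Workload>
std::size_t run_repeated(ToyProfiler & profiler, Workload && workload, std::size_t repeat_count)
{
    std::size_t total_calls = 0;
    for (std::size_t i = 0; i < repeat_count; ++i)
    {
        profiler.reset(); // every repetition starts from an empty profiler
        workload(profiler);
        total_calls += profiler.recorded_calls;
    }
    return total_calls;
}
```

With 100 operations and 10000 repeats, for example, the measured process performs one million profiled calls in total, while each individual run still exercises a call tree built from only 100 operations.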
As for the results, modmesh::CallProfiler::start_caller() is clearly the hotspot, because it searches for the target child node in linear time. We might optimize this hotspot later.
CallProfiler gprof
wide_siblings
gprof top 5: operations 100, repeats 10000
gprof top 5: operations 1000, repeats 1000
gprof top 5: operations 10000, repeats 5
gprof top 5: operations 50000, repeats 1
deep_chain
gprof top 5: operations 100, repeats 10000
gprof top 5: operations 1000, repeats 1000
gprof top 5: operations 10000, repeats 5
gprof top 5: operations 50000, repeats 1
balanced_tree
gprof top 5: operations 100, repeats 10000
gprof top 5: operations 1000, repeats 1000
gprof top 5: operations 10000, repeats 5
gprof top 5: operations 50000, repeats 1
hot_name_reuse
gprof top 5: operations 100, repeats 10000
gprof top 5: operations 1000, repeats 1000
gprof top 5: operations 10000, repeats 5
gprof top 5: operations 50000, repeats 1
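
To make the linear-time child lookup mentioned in the description concrete, here is a minimal sketch comparing a linear search over child nodes with a hashed lookup. `LinearNode`, `HashedNode`, and `count_linear_comparisons` are hypothetical names for illustration; they are not the actual modmesh data structures, and the hashed variant is just one possible direction for a later optimization, not the planned fix.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical call-tree node whose children are searched linearly by name.
struct LinearNode
{
    std::string name;
    std::vector<std::unique_ptr<LinearNode>> children;

    // Return the child called `child_name`, creating it when missing.
    // `comparisons` counts how many name comparisons the search performed.
    LinearNode * find_or_add(std::string const & child_name, std::size_t & comparisons)
    {
        for (auto const & child : children)
        {
            ++comparisons;
            if (child->name == child_name)
            {
                return child.get();
            }
        }
        children.push_back(std::make_unique<LinearNode>());
        children.back()->name = child_name;
        return children.back().get();
    }
};

// Hypothetical alternative: children keyed by name in a hash map, so a
// lookup takes expected constant time instead of scanning all siblings.
struct HashedNode
{
    std::string name;
    std::unordered_map<std::string, std::unique_ptr<HashedNode>> children;

    HashedNode * find_or_add(std::string const & child_name)
    {
        auto & slot = children[child_name];
        if (!slot)
        {
            slot = std::make_unique<HashedNode>();
            slot->name = child_name;
        }
        return slot.get();
    }
};

// Count the comparisons needed to register `sibling_count` distinct children
// under one parent with the linear search: 0 + 1 + ... + (n - 1).
std::size_t count_linear_comparisons(std::size_t sibling_count)
{
    LinearNode root;
    std::size_t comparisons = 0;
    for (std::size_t i = 0; i < sibling_count; ++i)
    {
        root.find_or_add("f" + std::to_string(i), comparisons);
    }
    return comparisons;
}
```

Under these assumptions, registering n distinct siblings with the linear search costs n(n-1)/2 comparisons, which suggests why a workload with many children under one parent would stress this lookup the most.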