Profile callprofiler with different testcases #829

Open
ThreeMonth03 wants to merge 1 commit into solvcon:master from ThreeMonth03:profile-callprofiler-cases
Collaborator

@ThreeMonth03 ThreeMonth03 commented May 25, 2026

To address issue #831, this pull request profiles CallProfiler with gprof across several test cases.
Because gprof is tied to the g++ toolchain, the scripts in this pull request currently support only Linux. The data below were measured on WSL2 with an Intel 13700 CPU.

The benchmarks cover 4 types of functions. The diagrams below assume that the number of operations is 200:

  1. wide_siblings: All subfunctions are at the same layer.
root
├─ f0
├─ f1
├─ ...
└─ f200
  2. deep_chain: Each subfunction is one layer deeper than the previous one.
root
└─ f0
   └─ f1
      └─ ...
         └─ f200
  3. balanced_tree: The subfunction calls form a balanced tree.
root
└─ f100
   ├─ ...
   │  ├─ f2
   │  └─ f3
   └─ ...
      ├─ f1
      └─ f0
  4. hot_name_reuse: All subfunctions are at the same layer, but the same name is reused for 100 consecutive subfunctions before switching to the next name.
root
├─ f0
├─ f0
├─ ... (f0 repeated 100 times in total)
├─ f1
└─ ...
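
As a rough illustration of the shapes listed above, the sketch below maps the i-th operation to a probe name and a depth under root. `Shape`, `Call`, and `describe` are hypothetical names made up for this example and are not taken from the pull request. balanced_tree is left out because the description does not pin down its exact numbering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names; the generated benchmark code may be organized differently.
enum class Shape { WideSiblings, DeepChain, HotNameReuse };

struct Call
{
    std::string name;  // probe name, e.g. "f3"
    std::size_t depth; // 1 means a direct child of root
};

// Describe the i-th operation (0-based) of a workload shape.
// `reuse` is how many consecutive operations share one name in HotNameReuse.
inline Call describe(Shape shape, std::size_t i, std::size_t reuse = 100)
{
    assert(reuse > 0);
    switch (shape)
    {
    case Shape::WideSiblings:
        // Every operation is a distinct child of root.
        return {"f" + std::to_string(i), 1};
    case Shape::DeepChain:
        // Every operation is nested inside the previous one.
        return {"f" + std::to_string(i), i + 1};
    case Shape::HotNameReuse:
        // Children of root, but the name changes only every `reuse` operations.
        return {"f" + std::to_string(i / reuse), 1};
    }
    return {"", 0}; // not reached for valid enum values
}
```

Under these assumptions, wide_siblings gives root one child per operation, deep_chain gives every node a single child, and hot_name_reuse gives root one child per 100 operations.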

Because gprof is a sampling-based profiler, a short run yields too few samples for a precise result. When the number of operations is small, this pull request therefore repeats the workload and resets the profiler between repeats.
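
A minimal sketch of the repeat-and-reset idea, assuming a stand-in `ToyProfiler` with a `reset()` method (both names are hypothetical, not modmesh's API). The times in the gprof tables below move in 0.01-second steps, so a single small run would often register no samples at all. Repeating it accumulates enough run time, while the reset keeps the call tree at the intended size for each repeat.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for the profiler under test; only reset() matters here.
struct ToyProfiler
{
    std::size_t recorded = 0; // calls recorded since the last reset
    void record() { ++recorded; }
    void reset() { recorded = 0; }
};

// Run `workload` `repeats` times, resetting the profiler before each run so that
// every repeat starts from an empty state. Returns the total number of calls
// recorded across all repeats.
inline std::size_t run_repeated(ToyProfiler & profiler,
                                std::size_t repeats,
                                std::function<void(ToyProfiler &)> const & workload)
{
    assert(workload); // an empty std::function cannot be called
    std::size_t total = 0;
    for (std::size_t r = 0; r < repeats; ++r)
    {
        profiler.reset();
        workload(profiler);
        total += profiler.recorded;
    }
    return total;
}
```

With 100 operations and 10000 repeats this executes 1,000,000 recorded calls in total, which matches the 1000000 `CallProfilerProbe` constructor calls that gprof reports for that configuration.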

The results show that modmesh::CallProfiler::start_caller() is the hotspot in wide_siblings, where it takes 55.56% to 98.69% of the sampled time, because it searches for the target child node in linear time. We might optimize this hotspot later.
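
To make the cost of a linear child search concrete, here is a small sketch that contrasts it with a hashed lookup. `LinearNode`, `HashedNode`, and `linear_insert_cost` are hypothetical types and helpers written for this comment; they are not modmesh's `RadixTreeNode`, and the hashed variant is only one possible direction, not a decided fix.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical node that finds children with a linear scan.
struct LinearNode
{
    std::string name;
    std::vector<std::unique_ptr<LinearNode>> children;

    // Find or create a child by name; `comparisons` counts the names compared.
    LinearNode * get_child(std::string const & key, std::size_t & comparisons)
    {
        assert(!key.empty());
        for (auto const & child : children)
        {
            ++comparisons;
            if (child->name == key) { return child.get(); }
        }
        children.push_back(std::make_unique<LinearNode>());
        children.back()->name = key;
        return children.back().get();
    }
};

// Hypothetical node that finds children through a hash map (average O(1)).
struct HashedNode
{
    std::string name;
    std::unordered_map<std::string, std::unique_ptr<HashedNode>> children;

    HashedNode * get_child(std::string const & key)
    {
        assert(!key.empty());
        auto & slot = children[key]; // inserts an empty pointer if the key is new
        if (!slot)
        {
            slot = std::make_unique<HashedNode>();
            slot->name = key;
        }
        return slot.get();
    }
};

// Count the comparisons needed to add n distinct siblings under one LinearNode.
inline std::size_t linear_insert_cost(std::size_t n)
{
    LinearNode root;
    std::size_t comparisons = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        root.get_child("f" + std::to_string(i), comparisons);
    }
    return comparisons;
}
```

In this sketch, adding n distinct siblings with the linear scan costs n(n-1)/2 comparisons (499500 for n = 1000), so the work grows quadratically with the number of siblings. That is in line with wide_siblings being the only workload where the run time climbs steeply, since the other shapes keep few children under each node.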

CallProfiler gprof

wide_siblings

| operations | repeats | workload seconds |
|---|---|---|
| 100 | 10000 | 2.293520E-01 |
| 1000 | 1000 | 2.107130E+00 |
| 10000 | 5 | 8.072800E-01 |
| 50000 | 1 | 6.167120E+00 |

gprof top 5: operations 100, repeats 10000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 55.56      0.05     0.05  1010000     0.05     0.05  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 22.22      0.07     0.02    59617     0.34     0.34  modmesh::profiling::detail::profile_empty_049983(unsigned long, unsigned long)
 11.11      0.08     0.01  1000000     0.01     0.06  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
 11.11      0.09     0.01                             modmesh::profiling::detail::profile_empty_000090(unsigned long, unsigned long)
  0.00      0.09     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)

gprof top 5: operations 1000, repeats 1000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 92.43      1.71     1.71  1001000     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  2.16      1.75     0.04  1001000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  1.08      1.77     0.02  1001000     0.00     0.00  modmesh::CallProfiler::end_caller()
  1.08      1.79     0.02     8727     0.00     0.00  modmesh::profiling::detail::profile_empty_049983(unsigned long, unsigned long)
  0.54      1.83     0.01                             modmesh::profiling::detail::profile_empty_000495(unsigned long, unsigned long)

gprof top 5: operations 10000, repeats 5

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 96.00      0.72     0.72    50005     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  1.33      0.73     0.01   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  1.33      0.74     0.01    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  1.33      0.75     0.01                             modmesh::profiling::detail::profile_empty_003445(unsigned long, unsigned long)
  0.00      0.75     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()

gprof top 5: operations 50000, repeats 1

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 98.69      6.03     6.03    50001     0.12     0.12  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.33      6.05     0.02    50001     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.16      6.06     0.01    50000     0.00     0.12  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
  0.16      6.07     0.01        1    10.00    10.00  void std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_assign<std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > >(std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > const&)
  0.16      6.08     0.01                             modmesh::profiling::detail::profile_empty_011490(unsigned long, unsigned long)

deep_chain

| operations | repeats | workload seconds |
|---|---|---|
| 100 | 10000 | 1.706320E-01 |
| 1000 | 1000 | 2.533460E-01 |
| 10000 | 5 | 3.774810E-02 |
| 50000 | 1 | 6.268040E-02 |

gprof top 5: operations 100, repeats 10000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 33.33      0.01     0.01  1010000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 33.33      0.02     0.01  1000000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
 33.33      0.03     0.01    10000     0.00     0.00  modmesh::RadixTreeNode<modmesh::CallerProfile>::~RadixTreeNode()
  0.00      0.03     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.03     0.00  1010000     0.00     0.00  modmesh::CallProfiler::end_caller()

gprof top 5: operations 1000, repeats 1000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 33.33      0.01     0.01  1001000     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 33.33      0.02     0.01  1001000     0.01     0.01  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 33.33      0.03     0.01     1000    10.00    10.00  void std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_assign<std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > >(std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > const&)
  0.00      0.03     0.00  3003000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.03     0.00  1001000     0.00     0.02  modmesh::CallProfiler::end_caller()

gprof top 5: operations 10000, repeats 5

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.00     0.00    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.00     0.00    50000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 50000, repeats 1

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 25.00      0.01     0.01    50001     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 25.00      0.02     0.01       13     0.77     0.77  std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_rehash(unsigned long, unsigned long const&)
 25.00      0.03     0.01        1    10.00    10.00  void std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_assign<std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > >(std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > const&)
 25.00      0.04     0.01                             modmesh::profiling::detail::profile_empty_043209(unsigned long, unsigned long)
  0.00      0.04     0.00   150003     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)

balanced_tree

| operations | repeats | workload seconds |
|---|---|---|
| 100 | 10000 | 1.841660E-01 |
| 1000 | 1000 | 2.679980E-01 |
| 10000 | 5 | 3.871620E-02 |
| 50000 | 1 | 5.947280E-02 |

gprof top 5: operations 100, repeats 10000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 60.00      0.03     0.03  1010000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 20.00      0.04     0.01  1010000     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.05     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.05     0.00  1010000     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.05     0.00  1000000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 1000, repeats 1000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 83.33      0.05     0.05  1001000     0.05     0.05  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 16.67      0.06     0.01  1001000     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.06     0.00  3003000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.06     0.00  1001000     0.00     0.05  modmesh::CallProfiler::end_caller()
  0.00      0.06     0.00  1000000     0.00     0.01  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 10000, repeats 5

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
100.00      0.01     0.01                             modmesh::profiling::detail::profile_empty_000026(unsigned long, unsigned long)
  0.00      0.01     0.00   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.01     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.01     0.00    50005     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.01     0.00    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)

gprof top 5: operations 50000, repeats 1

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
100.00      0.01     0.01        2     5.00     5.00  modmesh::RadixTreeNode<modmesh::CallerProfile>::~RadixTreeNode()
  0.00      0.01     0.00   150003     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.01     0.00    50001     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.01     0.00    50001     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.01     0.00    50001     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)

hot_name_reuse

| operations | repeats | workload seconds |
|---|---|---|
| 100 | 10000 | 8.356190E-02 |
| 1000 | 1000 | 8.568760E-02 |
| 10000 | 5 | 7.355400E-03 |
| 50000 | 1 | 2.580690E-02 |

gprof top 5: operations 100, repeats 10000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.00     0.00  1010000     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.00     0.00  1010000     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.00     0.00  1010000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.00     0.00  1000000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 1000, repeats 1000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 50.00      0.01     0.01  1001000     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 50.00      0.02     0.01  1000000     0.01     0.02  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
  0.00      0.02     0.00  3003000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.02     0.00  1001000     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.02     0.00  1001000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)

gprof top 5: operations 10000, repeats 5

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.00     0.00    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.00     0.00    50000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 50000, repeats 1

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
100.00      0.01     0.01    50001   200.00   200.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.01     0.00   150003     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.01     0.00    50001     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.01     0.00    50001     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.01     0.00    50000     0.00   200.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

@ThreeMonth03 ThreeMonth03 marked this pull request as draft May 25, 2026 17:55
@ThreeMonth03 ThreeMonth03 force-pushed the profile-callprofiler-cases branch 2 times, most recently from 22b55de to 65972c4 Compare May 26, 2026 06:03
@ThreeMonth03 ThreeMonth03 marked this pull request as ready for review May 26, 2026 07:05
@ThreeMonth03 ThreeMonth03 left a comment

@yungyuc Please take a look. This pull request is quite long, so I'm wondering whether there is a better way to generate the code and to profile the benchmarks on other platforms.

Comment on lines +77 to +81
- name: make cprof
  if: runner.os == 'Linux'
  run: |
    make cprof

Profile the profiler only on Linux.

return nullptr;
}

bool run_named_case(std::string_view label, std::size_t size, std::size_t repeat_count)

Run each type of workload with different parameters (operation count and repeat count).

Comment on lines +75 to +79
std::cout << "RESULT workload=" << label
          << " operations=" << operation_count
          << " repeats=" << repeat_count
          << " workload_seconds=" << elapsed.count()
          << '\n';

This file prints the wall time of each benchmark, because we cannot obtain wall time from gprof.
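
A minimal sketch of how such a wall-time measurement could look, assuming `std::chrono::steady_clock` is used as the timer. `measure_seconds` and `format_result` are hypothetical helpers and not the functions in this pull request; only the `RESULT` key=value layout is taken from the snippet above.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>

// Time `repeat_count` runs of `workload` with a monotonic clock.
// Hypothetical helper; the pull request may structure this differently.
double measure_seconds(std::function<void()> const & workload, std::size_t repeat_count)
{
    auto const begin = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < repeat_count; ++i)
    {
        workload();
    }
    std::chrono::duration<double> const elapsed = std::chrono::steady_clock::now() - begin;
    return elapsed.count();
}

// Build one line in the same key=value layout as the RESULT output.
std::string format_result(std::string const & label,
                          std::size_t operation_count,
                          std::size_t repeat_count,
                          double seconds)
{
    std::ostringstream out;
    out << "RESULT workload=" << label
        << " operations=" << operation_count
        << " repeats=" << repeat_count
        << " workload_seconds=" << seconds;
    return out.str();
}
```

A caller could pass a lambda that runs one workload, for example `measure_seconds([] { /* run workload */ }, 1000)`, and print the string returned by `format_result`. A monotonic clock is chosen here so that system clock adjustments do not distort the interval.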

Comment on lines +43 to +56
void configure_large_stack()
{
#if defined(__linux__)
    rlimit limit{};
    if (getrlimit(RLIMIT_STACK, &limit) == 0)
    {
        if (RLIM_INFINITY == limit.rlim_max || limit.rlim_cur < limit.rlim_max)
        {
            limit.rlim_cur = limit.rlim_max;
            static_cast<void>(setrlimit(RLIMIT_STACK, &limit));
        }
    }
#endif
}

Raise the stack size limit first, because the call depth may reach 50000.

Comment on lines +36 to +41
std::array<case_definition, 4> const case_definitions{{
    {"wide_siblings", &workload::run_wide_siblings},
    {"deep_chain", &workload::run_deep_chain},
    {"balanced_tree", &workload::run_balanced_tree},
    {"hot_name_reuse", &workload::run_hot_name_reuse},
}};

There are four kinds of benchmarks. They are generated by Python scripts.
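
A minimal sketch of how a name-to-function table like `case_definitions` can be used to dispatch a workload by label. Everything prefixed with `toy_` is hypothetical, and the workload signature (`std::size_t (*)(std::size_t)`) is an assumption, since the real `run_*` signatures are not shown here.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical workload signature; the real run_* functions may differ.
using toy_workload_fn = std::size_t (*)(std::size_t);

struct toy_case
{
    std::string_view name;
    toy_workload_fn run;
};

// Placeholder workloads that only return a value derived from the size.
std::size_t toy_wide(std::size_t size) { return size; }
std::size_t toy_deep(std::size_t size) { return size * 2; }

std::array<toy_case, 2> const toy_cases{{
    {"wide_siblings", &toy_wide},
    {"deep_chain", &toy_deep},
}};

// Return the matching entry, or nullptr when the label is unknown.
toy_case const * find_toy_case(std::string_view label)
{
    for (auto const & entry : toy_cases)
    {
        if (entry.name == label)
        {
            return &entry;
        }
    }
    return nullptr;
}

// Run the workload named by `label`; report failure for unknown labels.
bool run_toy_case(std::string_view label, std::size_t size, std::size_t & result)
{
    toy_case const * entry = find_toy_case(label);
    if (entry == nullptr)
    {
        return false;
    }
    result = entry->run(size);
    return true;
}
```

Keeping the cases in one table means that a new workload only needs one extra row, and an unknown label is reported through the return value instead of being ignored.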

Comment on lines +18 to +25
add_custom_command(
    OUTPUT ${CPROF_GENERATED_SOURCES}
    COMMAND "${PYTHON_EXECUTABLE}" "${CPROF_GENERATOR}"
            --output-dir "${CPROF_GENERATED_DIR}"
            --shards "${CPROF_SHARD_COUNT}"
    DEPENDS "${CPROF_GENERATOR}"
    VERBATIM
)

Generate the benchmark sources.

I tried generating the C++ files with macros, but that is too slow for 50000 functions.

Comment thread profiling/cprof/run.py


if __name__ == "__main__":
    main()

Script that runs the executable built from profiling/cprof/callprofiler_gprof.cpp.

Comment thread CMakeLists.txt

I'm not sure whether the C++ files belong in /profiling.

@ThreeMonth03 ThreeMonth03 force-pushed the profile-callprofiler-cases branch from 65972c4 to 45dc6fb Compare May 26, 2026 08:07