riak_sysmon is an Erlang/OTP application that manages the event
messages that can be generated by the Erlang virtual machine's
system_monitor BIF (Built-In Function). These messages can notify a
central data-gathering process about the following events:
- Processes whose private heaps grow beyond a certain size.
- Processes whose private heap garbage collections take too long.
- Ports that are busy, e.g., blocking file & socket I/O.
- Network distribution ports that are busy, e.g., lots of communication with a slow peer Erlang node.
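The four event types above correspond to the long_gc, large_heap, busy_port and busy_dist_port options of erlang:system_monitor/2. As a rough illustration only (the module name and both thresholds below are hypothetical, and this is not how riak_sysmon itself is configured), enabling all four for the calling process might look like this:

```erlang
-module(sysmon_demo).
-export([start/0]).

%% Hypothetical thresholds, chosen only for illustration.
-define(GC_MS_LIMIT, 50).            % report garbage collections longer than 50 ms
-define(HEAP_WORD_LIMIT, 10000000).  % report heaps larger than 10M words

%% Make the calling process the receiver of system_monitor messages.
%% Returns the previous setting: undefined | {OldPid, OldOptions}.
start() ->
    erlang:system_monitor(self(),
                          [{long_gc, ?GC_MS_LIMIT},
                           {large_heap, ?HEAP_WORD_LIMIT},
                           busy_port,
                           busy_dist_port]).
```

Note that calling this replaces whatever monitor process was previously set, so it should not be run in a VM where riak_sysmon is already active.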
The problem with system_monitor events is that there isn't a
mechanism within the Erlang virtual machine that limits the rate at
which the events are generated. A busy VM can easily create many
hundreds of these messages per second. Some kind of rate-limiting
filter is required to avoid further overloading a system that may
already be overloaded.
This app will use two processes for system_monitor message handling.
- A gen_server process to provide a rate-limiting filter.
- A gen_event server to allow flexible, user-defined functions to respond to system_monitor events that pass through the first stage filter.
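To make the idea of a per-second rate-limiting filter concrete, here is a minimal sketch of the bookkeeping such a filter could do. It is an assumption for illustration, not the actual riak_sysmon_filter implementation; the module name, function names and state layout are all hypothetical.

```erlang
-module(rate_limit_sketch).
-export([new/1, event/1, tick/1]).

%% State is {Limit, Sent, Dropped}:
%%   Limit   - maximum events to forward per second
%%   Sent    - events forwarded so far in the current second
%%   Dropped - events suppressed so far in the current second
new(Limit) when is_integer(Limit), Limit >= 0 ->
    {Limit, 0, 0}.

%% Decide what to do with one incoming event.
event({Limit, Sent, Dropped}) when Sent < Limit ->
    {forward, {Limit, Sent + 1, Dropped}};
event({Limit, Sent, Dropped}) ->
    {suppress, {Limit, Sent, Dropped + 1}}.

%% Called once per second (e.g. from a timer message): returns the
%% number of events suppressed in the last second and resets the counters.
tick({Limit, _Sent, Dropped}) ->
    {Dropped, {Limit, 0, 0}}.
```

A gen_server wrapping this state would call event/1 for each system_monitor message it receives, pass the message to the gen_event server only on forward, and call tick/1 from a one-second timer to learn how many events were dropped.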
(Silly reference to The Highlander omitted....)
The Erlang/OTP documentation is pretty clear on this point: only one
process can receive system_monitor messages. But using the
riak_sysmon OTP app, if multiple parties are interested in receiving
system_monitor events, each party can add an event handler to the
riak_sysmon_handler event handler process.
The event handler process in this application uses the registered name
riak_sysmon_handler. To add your handler, use something like:
gen_event:add_sup_handler(riak_sysmon_handler, yourModuleName, YourInitialArgs).
See the gen_event documentation for add_sup_handler/3
for API details. See the example event handler module in the source
repository, src/riak_sysmon_example_handler.erl, for example usage.
The following events can be sent from the riak_sysmon
filtering/rate-limiting process (a.k.a. riak_sysmon_filter) to the
event handler process (a.k.a. riak_sysmon_handler).
- {monitor, pid(), atom(), term()} ... These are system_monitor messages as they are received verbatim by the riak_sysmon_filter process. See the reference documentation for erlang:system_monitor/2 for details.
- {suppressed, proc_events | port_events, Num::integer()} ... These messages inform your event handler that Num events of a certain type (proc_events or port_events) were suppressed in the last second (i.e. their arrival rate exceeded the configured rate limit).
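As a sketch of how a handler might consume these two event shapes, the following gen_event callback module simply counts them. The module name my_sysmon_handler and the get_counts call are hypothetical and are not part of riak_sysmon; the example handler shipped in the source repository may be structured differently.

```erlang
-module(my_sysmon_handler).
-behaviour(gen_event).

-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% State is a map holding two simple counters.
init(_Args) ->
    {ok, #{monitor => 0, suppressed => 0}}.

%% A system_monitor message that passed through the rate-limiting filter.
handle_event({monitor, _Pid, _Type, _Info}, State = #{monitor := N}) ->
    {ok, State#{monitor := N + 1}};
%% Notification that Num events of the given kind were suppressed.
handle_event({suppressed, Kind, Num}, State = #{suppressed := S})
  when Kind =:= proc_events; Kind =:= port_events ->
    {ok, State#{suppressed := S + Num}};
%% Ignore anything else.
handle_event(_Other, State) ->
    {ok, State}.

%% Hypothetical query: return the current counters.
handle_call(get_counts, State) ->
    {ok, State, State};
handle_call(_Request, State) ->
    {ok, {error, unknown_call}, State}.

handle_info(_Info, State) -> {ok, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

With riak_sysmon running, such a module could be attached with gen_event:add_sup_handler(riak_sysmon_handler, my_sysmon_handler, []) and queried with gen_event:call(riak_sysmon_handler, my_sysmon_handler, get_counts).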