Nonblocking, autoscaling, self-contained Kubernetes proxy.
Suppose you have one or more components trying to send HTTP requests to
services. In cases where those requests are "events", the services should
accept the event and return a 202 immediately, since events are typically
one-way messages. However, not all receivers will do so. Many will hold on
to the incoming HTTP connection until they have completely finished
processing the event. This causes back-pressure on the sender, because any
one sender can only support a limited number of outbound connections at a
time. As the number of lingering outbound connections grows, the number of
events the sender can send decreases, reducing its overall throughput.
One option is to scale the senders up as the number of lingering connections grows. This presents a challenge in environments where scaling the sender might interrupt the flow of event processing, and there are cases where the sender simply can't, or doesn't want to, scale itself.
In a situation where neither the sender nor the receiver can be radically changed, this proxy proves useful.
This proxy receives requests from a sender, opens a connection to the
final destination, and waits a certain amount of time (e.g. 200ms). If no
response is received by then, it sends back a 202 to the sender but keeps
the outbound connection to the service intact. This allows the sender to
assume there will always be a well-behaving receiver (one that doesn't
block) and thus keep its overall throughput at the desired level.
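To make that flow concrete, here is a minimal Go sketch of the forwarding pattern. It is not the actual `proxy/` implementation; the `destination` constant, the header handling, and the hard-coded 200ms timeout are assumptions for illustration.

```go
package main

import (
	"bytes"
	"io"
	"net/http"
	"time"
)

// proxyTimeout mirrors the proxyTimeout annotation (illustrative value).
const proxyTimeout = 200 * time.Millisecond

// destination stands in for however the real proxy learns the final
// target; this URL is an assumption for the sketch.
const destination = "http://recipient.default.svc.cluster.local/"

func proxyHandler(w http.ResponseWriter, r *http.Request) {
	// Buffer the body so the outbound request can outlive this handler.
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}

	done := make(chan *http.Response, 1)
	go func() {
		out, err := http.NewRequest(r.Method, destination, bytes.NewReader(body))
		if err != nil {
			return
		}
		out.Header = r.Header.Clone()
		// This call stays open until the recipient finally answers.
		resp, err := http.DefaultClient.Do(out)
		if err != nil {
			return
		}
		done <- resp
	}()

	select {
	case resp := <-done:
		// Recipient answered within the window: relay its response.
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode)
		io.Copy(w, resp.Body)
	case <-time.After(proxyTimeout):
		// No response yet: release the sender with a 202 while the
		// goroutine above keeps the outbound connection intact.
		w.WriteHeader(http.StatusAccepted)
	}
}

func main() {
	http.HandleFunc("/", proxyHandler)
	http.ListenAndServe(":8080", nil)
}
```

The key point is that the 202 frees the sender's connection while the background goroutine keeps waiting on the slow recipient.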
In addition, this proxy autoscales and maintains itself without any metrics hub, so it is both easily portable and scalable.
The proxy is split into two packages:
- `client/` - Client HTTP library that communicates with the proxies
- `proxy/` - The actual K8s StatefulSet proxy
There is also a sample:
- `sample/recipient` - Recipient that doesn't respond instantly
- `sample/sender` - Sender utilizing the proxy HTTP library
The proxy itself is just a StatefulSet that can be deployed normally; see `proxy/`.
Proxy configuration can be changed in the StatefulSet's annotations:
- `minProxies` is the minimum number of proxies to be running at any given time.
- `maxProxies` is the maximum number of proxies to be running at any given time.
- `maxRequests` is the maximum number of active requests any proxy can be processing.
- `maxLoadFactor` is a scaling factor representing the target number of active requests a proxy should process before scaling.
  - For example, a `maxLoadFactor` of 0.5 with 100 `maxRequests` means that when the last proxy exceeds 50 requests, it will attempt to scale up. The effect of `maxLoadFactor` is to create a buffer region of requests that covers the time it takes K8s to spin up a new proxy while new requests keep arriving.
- `proxyTimeout` is the time in milliseconds a proxy should wait for a response from the recipient. If this timeout is reached without a response, the proxy returns a `202` to the sender.
- `idleTimeout` is the time in seconds the last proxy should wait before scaling itself down due to inactivity.
- `debugLevel` is the debug verbosity level.
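For illustration, the annotated StatefulSet might look like the following sketch. The exact annotation keys, their quoting, and the metadata shown here are assumptions; consult the manifest in `proxy/` for the authoritative version.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: proxy
  annotations:
    # Assumed keys mirroring the settings above; see proxy/ for the real ones.
    minProxies: "1"
    maxProxies: "10"
    maxRequests: "100"
    maxLoadFactor: "0.5"  # last proxy scales up past 50 active requests
    proxyTimeout: "200"   # milliseconds to wait before returning a 202
    idleTimeout: "30"     # seconds of inactivity before the last proxy scales down
    debugLevel: "1"
spec:
  # ... remainder of the StatefulSet spec (see proxy/)
```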
The annotations can be changed in real time, meaning one can run
`kubectl edit <STATEFULSET>`, change one of these configs, and the proxies
will immediately reflect the changes.
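For example, a single setting can also be updated non-interactively with `kubectl annotate` (the StatefulSet name `proxy` is assumed here):

```sh
# Raise the proxy ceiling in place; the proxies pick it up immediately.
kubectl annotate statefulset proxy maxProxies="10" --overwrite
```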
- A proxy will return a `202` if it can connect to the destination but no response is returned within the `proxyTimeout` value (in milliseconds).
- A proxy will return a `429` if it cannot process an incoming request because it has reached its maximum number of outbound connections (`maxRequests`). While ideally this should not happen, it takes a few seconds for Kubernetes to create another proxy, so during a sudden burst of incoming requests a proxy may not be able to handle the load. This is why `maxLoadFactor` should be tuned to create an optimal buffer region.
  - The sender can also warn the proxies of the burst (through the client's `Ensure` function), so the proxies can scale up in preparation.
- Scaling up is not based on the maximum number of outbound requests per proxy, but rather on the maximum multiplied by the `maxLoadFactor` percentage, to create a buffer region.
- The client library will choose the least busy proxy instance, but will avoid the most recently created proxy when possible (see the sketch after this list). This allows that last proxy to scale itself down via the idle timeout when it is not required to sustain throughput.
- The client library bases its routing decisions on statistics returned from the proxies on each response or from a "ping". Pings are only sent if no requests have been sent to a proxy for a certain amount of time, so the client stays aware of any down-scaling.
- The proxies themselves do not have any retry logic. Any failure, from the final destination or from within the proxy, will be returned to the client. However, the client library has retry logic by default in case a proxy has terminated.
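As a sketch of the selection policy described above (least busy, but avoid the newest proxy), here is one way it could look in Go. The `proxyStat` type, its fields, and `pickProxy` are hypothetical stand-ins rather than the actual `client/` API.

```go
package main

import (
	"fmt"
	"time"
)

// proxyStat is a hypothetical snapshot of the statistics a proxy
// reports on each response or ping.
type proxyStat struct {
	addr           string
	activeRequests int
	maxRequests    int
	createdAt      time.Time
}

// pickProxy chooses the least busy proxy while avoiding the most
// recently created one, leaving it free to idle and scale down.
func pickProxy(stats []proxyStat) string {
	newest := 0
	for i := range stats {
		if stats[i].createdAt.After(stats[newest].createdAt) {
			newest = i
		}
	}

	best := -1
	for i := range stats {
		if i == newest && len(stats) > 1 {
			continue // spare the newest proxy so idleTimeout can reap it
		}
		if stats[i].activeRequests >= stats[i].maxRequests {
			continue // saturated: sending here would earn a 429
		}
		if best == -1 || stats[i].activeRequests < stats[best].activeRequests {
			best = i
		}
	}
	if best == -1 {
		return "" // all proxies saturated; the caller could retry or scale up
	}
	return stats[best].addr
}

func main() {
	stats := []proxyStat{
		{"proxy-0:8080", 40, 100, time.Now().Add(-time.Hour)},
		{"proxy-1:8080", 10, 100, time.Now()}, // newest, so avoided
	}
	fmt.Println(pickProxy(stats)) // prints proxy-0:8080 despite its higher load
}
```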