-
Notifications
You must be signed in to change notification settings - Fork 603
feat(loadbalancer): Add LoadBalancerType Client Side Weighted Round Robin #7407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(loadbalancer): Add LoadBalancerType Client Side Weighted Round Robin #7407
Conversation
9a814bd to
573a483
Compare
|
@jukie I have added the implementation and also tested it on my local setup PS: the repo is so easy to contribute everything just works with the docs given on the site :) |
|
Also I wanted to know should I include slow start in client wrr? So the thing is that I have submitted the proposal in grpc-xds grpc/proposal#498 and also in envoy I have got the proto updated. It is not implemented yet, but I am trying to pick it up this month if my time allows |
|
Also I am unsure of how to test this e2e, so I have just included an AI generated e2e test suite. The challenge here is that we need multiple replicas with each server respond with a specific header containing rps and cpu_utilisation and then the traffic is distributed by calculating the weight (rps / cpu) I don't know what the current e2e tests allow and if this type of test case is feasible to write |
api/v1alpha1/loadbalancer_types.go
Outdated
| // The multiplier used to adjust endpoint weights with the error rate calculated as eps/qps. | ||
| // Must be non-negative. Default is 1.0. | ||
| // +kubebuilder:validation:Minimum=0 | ||
| // +optional | ||
| ErrorUtilizationPenalty *float32 `json:"errorUtilizationPenalty,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try and avoid floats in the API layer. Could you adjust this to something like uint32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make it, but won't that make this configuration more aggresive? This config might usually be used for smaller numbers like may be 1, 2, 5, 10, since this is a multipler of how aggressive are we observing errors.
I would say mostly it can just be in range of 0 to 2 from practical point of view. Having a uint32 value just restricts the user to use this configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By that I meant adjusted to be a percentage based int such that 100=1.0 or something else reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any other example that I can refer, because I guess percentage wise also this is a value which goes above 100
So from my understanding I see the configuration value as if my rps is 1000 and my application is very sensistive to errors then I would usually reduce the traffic by a factor of 1.2 or 1.5, so not sure if making it 150%, 120% would be more understandble.
If the nature of my application is as such that the errors are bound to happen, may be because the traffic is coming from an external source then I would to keep it lower like 0.3 - 0.5 such that only when it crosses this boundary I would say that there is some issue with the system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preconnect Policy is one example where float is used for the Envoy proto but Envoy Gateway uses an int -
gateway/api/v1alpha1/connection_types.go
Line 97 in 5d11530
| PerEndpointPercent *uint32 `json:"perEndpointPercent,omitempty"` |
I wasn't proposing 100 as the maximum allowed value but was using that as an example of what the default would translate to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jukie would you be able to check the changes if you get sometime?
5f3d64b to
ff2a353
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7407 +/- ##
==========================================
- Coverage 72.36% 72.34% -0.02%
==========================================
Files 231 231
Lines 34042 34100 +58
==========================================
+ Hits 24633 24669 +36
- Misses 7633 7655 +22
Partials 1776 1776 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I have started the implementation of slow_start_config and locality lb config with WRR as well, if possible I would also want to include them in the gatway implementation. |
|
We wait to add features here until they've made it into a full envoy release. The flow would be getting this lb support added for 1.7 and if your envoy changes get merged we can add that support to gateway in 1.8. Let's keep the scope of this PR to what's currently available and we can always include additional features in a follow-up. Are you able to make the suggested changes or can you join the contributors call next week to discuss? |
|
Sure, I just paused because the other changes were also approved, but I understand I will make the respective changes as suggested. Will try to complete them by today / tomorrow @jukie |
f926f90 to
c7ecca3
Compare
|
@jukie I have made the respective changes |
…Gateway CRDs, ensuring configurable parameters and validation rules are integrated. Includes e2e test for validation. Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
…eway CRDs and related configurations. Update associated test data and documentation. Signed-off-by: anurag.ag <[email protected]>
…entSideWeightedRoundRobin configuration, update affected tests and CRDs. Signed-off-by: anurag.ag <[email protected]>
…cross Gateway CRDs, configuration files, and related tests. Adjust documentation to reflect percentage-based representation. Signed-off-by: anurag.ag <[email protected]>
a0471e0 to
dd5d285
Compare
| BlackoutPeriod *metav1.Duration `json:"blackoutPeriod,omitempty" yaml:"blackoutPeriod,omitempty"` | ||
| WeightExpirationPeriod *metav1.Duration `json:"weightExpirationPeriod,omitempty" yaml:"weightExpirationPeriod,omitempty"` | ||
| WeightUpdatePeriod *metav1.Duration `json:"weightUpdatePeriod,omitempty" yaml:"weightUpdatePeriod,omitempty"` | ||
| ErrorUtilizationPenalty *float32 `json:"errorUtilizationPenalty,omitempty" yaml:"errorUtilizationPenalty,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also use a non-float value and use similar conversion logic as you made during IR translation but instead during xds translation.
| // ClientSideWeightedRoundRobin defines configuration for Envoy's Client-Side Weighted Round Robin policy. | ||
| // See Envoy proto: envoy.extensions.load_balancing_policies.client_side_weighted_round_robin.v3.ClientSideWeightedRoundRobin | ||
| // Note: SlowStart is not supported for this policy in Envoy Gateway at this time. | ||
| type ClientSideWeightedRoundRobin struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your pr adding slow_start was merged in as well so feel free to include that now if you'd like and we'll get this into v1.7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the existing slow_start field be reused ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep it uses extensions.load_balancing_policies.common.v3.SlowStartConfig which is the same as used in other lb policies.
| BlackoutPeriod: ptr.To(metav1.Duration{Duration: 30 * time.Second}), | ||
| WeightExpirationPeriod: ptr.To(metav1.Duration{Duration: 10 * time.Second}), | ||
| WeightUpdatePeriod: ptr.To(metav1.Duration{Duration: 1 * time.Second}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you use MetaV1DurationPtr(time.Duration(${your-dur})) here similar to https://github.com/envoyproxy/gateway/blob/14d1d596af7083901f24c3f1eaf22848dc8f7637/internal/ir/xds_test.go#L1474C26-L1474C43
|
Overall looks good, just a few more comments @anuragagarwal561994! Thanks for adding this and make sure to run Sorry for the delayed review on this. I'll prioritize helping you with this next week. |
| // ClientSideWeightedRoundRobin defines configuration for Envoy's Client-Side Weighted Round Robin policy. | ||
| // See Envoy proto: envoy.extensions.load_balancing_policies.client_side_weighted_round_robin.v3.ClientSideWeightedRoundRobin | ||
| // Note: SlowStart is not supported for this policy in Envoy Gateway at this time. | ||
| type ClientSideWeightedRoundRobin struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the first iteration, do we need to add all these fields, or can be rely on the defaults that are preset ?
can you share the motivation for adding these fields
| } | ||
|
|
||
| // ClientSideWeightedRoundRobin defines configuration for Envoy's Client-Side Weighted Round Robin policy. | ||
| // See Envoy proto: envoy.extensions.load_balancing_policies.client_side_weighted_round_robin.v3.ClientSideWeightedRoundRobin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/client_side_weighted_round_robin/v3/client_side_weighted_round_robin.proto is not descriptive enough, and doesnt mention how this needs to be instrumented in the upstream, are there other links than can be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the docs aren't great... for example out-of-band reporting isn't supported at all yet reading the docs would indicate otherwise.
We could contribute some docs changes upstream but agreed that including a reference to how endpoint-load-metrics and endpoint-load-metrics-bin are instrumented would be useful here.
| // RoundRobinLoadBalancerType load balancer policy. | ||
| RoundRobinLoadBalancerType LoadBalancerType = "RoundRobin" | ||
| // ClientSideWeightedRoundRobinLoadBalancerType load balancer policy. | ||
| ClientSideWeightedRoundRobinLoadBalancerType LoadBalancerType = "ClientSideWeightedRoundRobin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts on other names involving BackendMetrics or Utilization or ORCA ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If length is your concern ClientSideWRR is an abbreviation I've seen used in a few places.
I agree the name is confusing but it's hard to rename this as the exclusive ORCA or Utilization lb type because similar lb policies exist such as WrrLocality.
What type of PR is this?
What this PR does / why we need it:
This PR provides addition of new load balancer type client side weighted round robin. This is a new load balancing extension introduced since envoy 1.32
https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/client_side_weighted_round_robin/v3/client_side_weighted_round_robin.proto
Which issue(s) this PR fixes:
Fixes #7305
Release Notes: Yes/No