-
Notifications
You must be signed in to change notification settings - Fork 355
feat: add Memory Protection Middleware #2691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2691 +/- ##
==========================================
- Coverage 77.01% 77.01% -0.00%
==========================================
Files 462 464 +2
Lines 49134 49184 +50
==========================================
+ Hits 37837 37872 +35
- Misses 8509 8520 +11
- Partials 2788 2792 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
5ec48e4 to
4ddac59
Compare
| if c.MemoryProtectionEnabled { | ||
| return memoryprotection.NewRealTimeMemoryUsageProvider() | ||
| } | ||
| return &memoryprotection.HarcodedMemoryLimitProvider{AcceptAllRequests: true} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would've expected that if the feature was disabled we would just not install the middleware at all, instead of installing it with a no-op limiter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering now why do we have the MemoryProtectionEnabled flag at all 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be in favor of having it be always-on, and then we just don't install the middleware in tests (if that's possible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at our existing middlewares to see if we have an example of one that is conditionally installed, and it doesn't exist... if a middleware isn't enabled, it still runs, but it just no-ops. Example:
spicedb/internal/middleware/perfinsights/perfinsights.go
Lines 157 to 160 in d4ff660
| func (r *reporter) ServerReporter(ctx context.Context, callMeta interceptors.CallMeta) (interceptors.Reporter, context.Context) { | |
| if !r.isEnabled { | |
| return &interceptors.NoopReporter{}, ctx | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Captured in #2723
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think I'm going to split the difference and not expose it as a user-facing flag, but leave the ability to configure it. My thought is that if (e.g.) our steelthread tests start hitting OOM conditions that trigger this middleware, it'll be good to disable the middleware so that we get failures instead of ResourceExhausteds
| type GoRealTimeMemoryLimiter struct{} | ||
|
|
||
| func (g GoRealTimeMemoryLimiter) IsMemLimitReached() bool { | ||
| return rtml.IsMemLimitReached() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we had a way to set a threshold here. Since this will get called both for dispatch and primary api calls, it will start rejecting them both equally, when really we want to reject primary api before we start rejecting dispatch.
I don't think this is worth blocking on, but I do wonder if we should contribute a threshold % to the upstream library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Opened odigos-io/go-rtml#3
| "-failfast", | ||
| "-count=1", | ||
| "-race", | ||
| "-timeout=20m", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why increasing from 10m to 20m?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, that was a leftover from a different PR. The answer is https://github.com/authzed/spicedb/pull/2674/files#r2505759338
Tl;dr instead of putting timeouts in the test code itself, i think it's better to put it in go test -timeout
pkg/cmd/server/defaults.go
Outdated
| WithInterceptor(grpcauth.UnaryServerInterceptor(opts.AuthFunc)). | ||
| EnsureAlreadyExecuted(DefaultMiddlewareGRPCProm). // so that prom middleware reports auth failures | ||
| EnsureAlreadyExecuted(DefaultMiddlewareGRPCProm). // so that prom middleware reports auth failures | ||
| EnsureAlreadyExecuted(DefaultMiddlewareMemoryProtection). // so that memory protection happens before auth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it have to run before auth?
The rationale for EnsureAlreadyExecuted is to declare a dependency between middlewares explicitly: "I need this to run before me because I depend on its side effects". So why does the auth middleware depend on memory protection?
If not, you can remove it.
If anything, I'd argue that memory protection depends on all the previous middlewares, because we want observability to take place before we kick memory protection, otherwise we would be running blind when ResourceExhausted is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion here, I just thought that the earlier the memory protection could run the better.
4ddac59 to
269a1c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments, otherwise looks good.
| acceptedStr := strconv.FormatBool(accepted) | ||
|
|
||
| RequestsProcessed.WithLabelValues(endpointType, acceptedStr).Inc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, we're doing it with a label? Hmm...
112d59b to
491398d
Compare
491398d to
6844c39
Compare
tstirrat15
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be keen to see what @ecordell has to say, and I'm down with the idea of making this memory middleware non-optional.
| if c.MemoryProtectionEnabled { | ||
| return memoryprotection.NewRealTimeMemoryUsageProvider() | ||
| } | ||
| return &memoryprotection.HarcodedMemoryLimitProvider{AcceptAllRequests: true} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be in favor of having it be always-on, and then we just don't install the middleware in tests (if that's possible).
10b5284 to
8f5d0b9
Compare
8f5d0b9 to
3d26690
Compare
173b7ce to
27e46d0
Compare
tstirrat15
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it should be good.
| return sh.RunWithV( | ||
| map[string]string{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No vars, so don't need With
27e46d0 to
05c7766
Compare
tstirrat15
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
83189c4 to
967b1b9
Compare
Pre-reqs
Description
This PR introduces a new memory protection middleware to help prevent out-of-memory conditions in SpiceDB by implementing admission control based on current memory usage.
This is not a perfect solution (doesn't prevent non-traffic-related sources of OOM) and is meant to support other future improvements to resource sharing in a single SpiceDB node.
The middleware is installed both in the main API and in Dispatch APIs, and it uses this lib: https://github.com/odigos-io/go-rtml
References
Alternative implementation: #2646.
Upon testing both implementations, it was found that:
mem_limitwas hit.