-
Notifications
You must be signed in to change notification settings - Fork 355
feat: add Memory Protection Middleware #2646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2646 +/- ##
==========================================
- Coverage 79.46% 77.08% -2.37%
==========================================
Files 455 461 +6
Lines 47161 48940 +1779
==========================================
+ Hits 37470 37722 +252
- Misses 6945 8449 +1504
- Partials 2746 2769 +23 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
d65a7c0 to
7afdbc9
Compare
| - job_name: "spicedb" | ||
| static_configs: | ||
| - targets: ["spicedb:9090"] | ||
| - targets: ["spicedb-1:9090"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI so that we can verify the new metrics in Grafana
7d7fb92 to
dd062d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI i can move the mock generation to a different PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine.
| @@ -0,0 +1,4 @@ | |||
| internal/mocks/*.go linguist-generated=true | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI so that when changing these, the "View PR" view on Github says that the files are automatically generated and people can just mark "viewed" on them
dd062d7 to
9ff3096
Compare
pkg/cmd/serve.go
Outdated
|
|
||
| // Memory Protection flags | ||
| apiFlags.BoolVar(&config.MemoryProtectionEnabled, "memory-protection-enabled", true, "enables a memory-based middleware that rejects requests when memory usage is too high") | ||
| apiFlags.IntVar(&config.MemoryProtectionAPIThresholdPercent, "memory-protection-api-threshold", 90, "memory usage threshold percentage for regular API requests (0-100)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't like that we have the default 90 here and also in the struct itself 😭 i'd love a unified approach in the future
pkg/cmd/serve.go
Outdated
|
|
||
| // Memory Protection flags | ||
| apiFlags.BoolVar(&config.MemoryProtectionEnabled, "memory-protection-enabled", true, "enables a memory-based middleware that rejects requests when memory usage is too high") | ||
| apiFlags.IntVar(&config.MemoryProtectionAPIThresholdPercent, "memory-protection-api-threshold", 90, "memory usage threshold percentage for regular API requests (0-100)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some percent-based flags that are defined as [0...1] floats. Worth having a look and deciding which approach to commit to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it similar to this
--datastore-revision-quantization-max-staleness-percent float float percentage (where 1 = 100%)
i don't love it but I prefer being consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny, I just found that you can pass a large number to that flag and the server accepts it 😆
- "SPICEDB_DATASTORE_REVISION_QUANTIZATION_MAX_STALENESS_PERCENT=10000"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like it's missing some validation :D
77c4458 to
e73f76c
Compare
The commit introduces a new memory protection middleware to help prevent out-of-memory conditions in SpiceDB by implementing admission control based on current memory usage. This is not a perfect solution (doesn't prevent non-traffic-related sources of OOM) and is meant to support other future improvements to resource sharing in a single SpiceDB node. The middleware is installed both in the main api and in dispatch, but at different thresholds. Memory usage is polled in the background, and if in-flight memory rises above the threshold, backpressure is placed on incoming requests. The API threshold is higher than the dispatch threshold to preserve already admitted traffic as much as possible.
e73f76c to
4a317d3
Compare
tstirrat15
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments
| _ MemoryLimitProvider = (*DefaultMemoryLimitProvider)(nil) | ||
| _ MemoryLimitProvider = (*HardCodedMemoryLimitProvider)(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this syntax as opposed to constructing one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYM? This is standard go code to assert that a struct satisfies an interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine.
| synctest.Wait() | ||
| t.Log("When we get here, the sampling is guaranteed to have run") | ||
|
|
||
| require.True(t, sampler.GetTimestampLastMemorySample().After(now)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in a goroutine :P
|
We had a discussion on slack where we looked at some implementation details of |
vroldanbet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I defer to y'all looking into an alternative implementation to retrieve runtime stats with lower overhead
772f37b to
24b9025
Compare
24b9025 to
8b47b79
Compare
|
@ecordell @vroldanbet please see #2691 |
Description
This introduces a new memory protection middleware to help prevent out-of-memory conditions in SpiceDB by implementing admission control based on current memory usage.
This is not a perfect solution (doesn't prevent non-traffic-related sources of OOM) and is meant to support other future improvements to resource sharing in a single SpiceDB node.
The middleware is installed both in the main api and in dispatch, but at different thresholds. Memory usage is polled in the background, and if in-flight memory rises above the threshold, backpressure is placed on incoming requests.
The dispatch threshold is higher than the API threshold to preserve already admitted traffic as much as possible.
Testing
mem_limit: "300mb"to thespicedbcontainersdocker-compose up --buildyou should see logs such as:
and this graph in Grafana: