---
title: forwarder-to-s3
authors:
  - "@jcantrill"
reviewers:
  - "@apahim"
  - "@alanconway"
  - "@cahartma"
  - "@cuppett"
  - "@xperimental"
approvers:
  - "@alanconway"
api-approvers:
  - "@alanconway"
creation-date: 2025-09-08
last-updated: 2025-09-08
tracking-link:
  - https://issues.redhat.com/browse/OBSDA-1099
  - https://issues.redhat.com/browse/LOG-7680
see-also: []
replaces: []
superseded-by: []
---

# Log Forward to S3 Endpoint

## Summary

This feature adds support for collecting logs using the Red Hat Logging Operator and forwarding them
to a configured S3 endpoint. The enhancements to **ClusterLogForwarder** include API changes that allow
administrators to use the "assume role" authentication functionality provided by the underlying platform,
and to rely upon sane defaults for organizing records in an S3 bucket.

## Motivation

The primary motivation for this proposal is to satisfy functionality requests from Red Hat managed services teams
that provide managed clusters for customers. These teams need to collect, forward, and store logs
from both the hosted control plane and the management clusters, using credentials from multiple organizations, in a
cost-efficient manner.

### User Stories

* As an administrator, I want to forward logs to an S3 endpoint
  so that I can store infrequently accessed logs (e.g. audit logs) and
  retain them for longer periods at a lower cost than CloudWatch
* As an administrator, I want to forward logs that might otherwise exceed
  the size limits of CloudWatch to an S3 endpoint

### Goals

* A simple API for specifying log forwarding to an S3 output
* A set of sane defaults for organizing log streams written to the specified S3 bucket
* The capability to define how log streams are organized when written to the specified S3 bucket
* Reuse of the existing AWS authentication features provided by the CloudWatch output

### Non-Goals

* To provide an API that exposes all the configuration points of the underlying collector implementation

## Proposal

This enhancement proposes to:

* Enhance the **ClusterLogForwarder** API to add an S3 output
  * Define a default schema for writing log records to an S3 bucket that is based
    upon the log type and source in order to be consistent with other output types
  * Allow the schema for writing log records to be modified by the administrator
  * Reuse the authentication mechanisms that are available with the CloudWatch output
* Add a generator to support generating collector configuration based upon the spec defined by the **ClusterLogForwarder** API

### Workflow Description

**Cluster administrator** is a human responsible for administering the **cluster-logging-operator**
and **ClusterLogForwarders**.

1. The cluster administrator creates an S3 bucket on their host platform (e.g. AWS)
1. The cluster administrator grants a platform role (e.g. an IAM role) the permissions to write to the S3 bucket
1. The cluster administrator deploys the cluster-logging-operator if it is not already deployed
1. The cluster administrator edits or creates a **ClusterLogForwarder** and defines an S3 output
1. The cluster administrator references the S3 output in a pipeline
1. The cluster-logging-operator reconciles the **ClusterLogForwarder**, generates a new collector configuration,
   and updates the collector deployment
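
For illustration, steps 4 and 5 might produce a **ClusterLogForwarder** like the minimal sketch below, which sends audit logs to a bucket using the proposed `iamRole` authentication. It assumes the secret-reference (`secretName`/`key`) and bearer-token (`from: serviceAccount`) shapes of the existing CloudWatch output carry over to S3; the resource, service account, bucket, and secret names are hypothetical.

```yaml
apiVersion: "observability.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: s3-forwarder              # hypothetical name
  namespace: openshift-logging
spec:
  serviceAccount:
    name: log-collector           # hypothetical service account with collection permissions
  outputs:
  - name: s3-audit
    type: s3
    s3:
      region: us-east-1
      bucket: my-audit-logs       # hypothetical bucket created in step 1
      authentication:
        type: iamRole
        iamRole:
          roleARN:                # assumed secret-reference shape, as in the CloudWatch output
            secretName: s3-role   # hypothetical secret holding the role granted in step 2
            key: role_arn
          token:
            from: serviceAccount  # assumed bearer-token shape, as in the CloudWatch output
  pipelines:
  - name: audit-to-s3
    inputRefs:
    - audit
    outputRefs:
    - s3-audit                    # step 5: reference the S3 output in a pipeline
```

Because `keyPrefix` is omitted here, records would be written using the default key prefixes described in the API extensions.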

### API Extensions

#### ClusterLogForwarder API

```yaml
apiVersion: "observability.openshift.io/v1"
kind: ClusterLogForwarder
spec:
  outputs:
  - name:
    type: s3 # add s3 to the enum
    s3:
      url: # (optional) string that is an alternate to the well-known AWS endpoints
      region: # (optional) string that is different from the configured service default
      bucket: # string for the S3 bucket, absent a leading 's3://' or trailing '/', and
              # truncated to 63 characters to meet length restrictions
      keyPrefix: # (optional) templated string (see note 1)
      authentication:
        type: # enum: awsAccessKey, iamRole
        awsAccessKey:
          assumeRole: # (optional)
            roleARN: # secret reference
            externalID: # (optional) secret reference
        iamRole:
          roleARN: # secret reference
          token: # bearer token
          assumeRole: # (optional)
            roleARN: # secret reference
            externalID: # (optional) string
      tuning:
        deliveryMode: # (optional) enum: atLeastOnce, atMostOnce
        maxWrite: # (optional) quantity (e.g. 500k)
        compression: # (optional) enum: none, gzip, zstd, snappy, zlib
        minRetryDuration: # (optional) duration
        maxRetryDuration: # (optional) duration
```
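
The fragment below sketches how the optional fields might combine for the multi-organization case described under Motivation: the collector authenticates with its own IAM role and then assumes a role owned by another organization before writing to that organization's bucket. Field names follow the schema above; the secret-reference and bearer-token shapes are assumed to match the existing CloudWatch output, the duration format is assumed, and every name and value is a placeholder.

```yaml
# Fragment of spec.outputs; all names and values are placeholders
- name: s3-customer
  type: s3
  s3:
    region: eu-west-1
    bucket: customer-audit-logs      # no leading 's3://' or trailing '/'
    authentication:
      type: iamRole
      iamRole:
        roleARN:                     # the collector's own role (assumed secret-reference shape)
          secretName: s3-role
          key: role_arn
        token:
          from: serviceAccount       # assumed bearer-token shape
        assumeRole:                  # role owned by the other organization
          roleARN:
            secretName: customer-role
            key: role_arn
          externalID: my-external-id # a plain string for iamRole, per the schema
    tuning:
      deliveryMode: atLeastOnce
      maxWrite: 500k
      compression: gzip
      minRetryDuration: 1s           # duration format assumed
      maxRetryDuration: 30s
```

Since `url` is omitted, the well-known AWS endpoint for `region` would be used; setting `url` would target an alternate endpoint instead.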

**Note 1:** A templated string that combines static text with date formatters and dynamic values, each enclosed in braces. A dynamic value is a field path followed by `||` and another field path or a static value, and these fallbacks can be chained (e.g. `foo.{"%Y-%m-%d"}/{.bar.baz||.qux.quux.corge||.grault||"nil"}-waldo.fred{.plugh||"none"}`).

Date formatters are specified using one or more of the following subset of [chrono](https://docs.rs/chrono/latest/chrono/format/strftime/index.html#specifiers)
specifiers to format the `.timestamp` field value:

| Spec | Example | Description |
|------|---------|-------------|
| `%F` | 2001-07-08 | Year-month-day format (ISO 8601). Same as `%Y-%m-%d`. |
| `%Y` | 2001 | The full proleptic Gregorian year, zero-padded to 4 digits. |
| `%m` | 07 | Month number (01–12), zero-padded to 2 digits. |
| `%d` | 08 | Day number (01–31), zero-padded to 2 digits. |
| `%H` | 00 | Hour number (00–23), zero-padded to 2 digits. |
| `%M` | 34 | Minute number (00–59), zero-padded to 2 digits. |
| `%S` | 60 | Second number (00–60), zero-padded to 2 digits. |
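
To make the template syntax concrete, the hypothetical `keyPrefix` below groups records by day and then by namespace, falling back to the host name and finally to a static value. The field paths `.kubernetes.namespace_name` and `.hostname` are assumed names from the collector's log data model, and the resolved keys in the comments reflect our reading of Note 1 rather than confirmed collector behavior.

```yaml
s3:
  bucket: my-logs   # hypothetical bucket
  # {"%Y-%m-%d"} formats .timestamp; {a||b||"x"} uses the first field that is set, else "x"
  keyPrefix: 'logs/{"%Y-%m-%d"}/{.kubernetes.namespace_name||.hostname||"unknown"}/'
  # .timestamp 2025-09-08, .kubernetes.namespace_name "payments"  -> logs/2025-09-08/payments/
  # .timestamp 2025-09-08, no namespace, .hostname "worker-0"     -> logs/2025-09-08/worker-0/
  # .timestamp 2025-09-08, neither field set                      -> logs/2025-09-08/unknown/
```

Per the table above, `{"%F"}` would yield the same date segment as `{"%Y-%m-%d"}`.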

When `keyPrefix` is not defined in the **ClusterLogForwarder** spec, the collector writes logs to the S3 bucket using a default key prefix constructed from attributes of each log entry, as follows:

| Log type | Log source | Key prefix |
| --- | --- | --- |
| Application | container | `<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/<namespace_name>/<pod_name>/<container_name>/` |
| Infrastructure | container | `<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/<namespace_name>/<pod_name>/<container_name>/` |
| Infrastructure | node (Journal) | `<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/<host_name>/` |
| Audit | auditd | `<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/<host_name>/` |
| Audit | kubeAPI | `<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/` |
| Audit | openshiftAPI | `<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/` |
| Audit | ovn | `<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/` |
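
Because `keyPrefix` is set on an output, a custom prefix applies to every record routed to that output, whereas the defaults above vary by log type and source. One way to customize the layout while keeping it per type is to route each log type to its own output. The sketch below mirrors the application container layout under an extra top-level folder; the field paths (`.openshift.cluster_id`, `.log_type`, `.log_source`, `.kubernetes.*`) are assumed data-model names whose values are assumed to match the table entries, the `||"none"` fallbacks are added defensively, and all names are hypothetical.

```yaml
# Fragment of spec; names are hypothetical and authentication is omitted for brevity
outputs:
- name: s3-app
  type: s3
  s3:
    bucket: my-logs
    # Mirrors the default application layout under an extra "apps/" folder
    keyPrefix: 'apps/{.openshift.cluster_id||"none"}/{"%Y-%m-%d"}/{.log_type||"none"}/{.log_source||"none"}/{.kubernetes.namespace_name||"none"}/{.kubernetes.pod_name||"none"}/{.kubernetes.container_name||"none"}/'
pipelines:
- name: app-to-s3
  inputRefs:
  - application
  outputRefs:
  - s3-app
```

Infrastructure and audit logs could then be routed to separate outputs, each with a prefix suited to its source (for example, keyed on `.hostname` for node logs).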

**Note 2:** The collector will encode events as [JSON](https://www.rfc-editor.org/rfc/rfc8259).

### Topology Considerations

#### Hypershift / Hosted Control Planes

#### Standalone Clusters

#### Single-node Deployments or MicroShift

### Implementation Details/Notes/Constraints

Implementation includes:

* `ClusterLogForwarder` API updates
* Log collector configuration generator updates with S3 configuration template additions

### Risks and Mitigations

This feature is being requested by HCP with a very short deadline for providing a deliverable. This change
depends upon another change, not yet completed, that introduces "assumeRole" functionality. The
risk to the Logging team is that HCP may choose to use an alternate product if these changes cannot be realized
within their time constraints.

### Drawbacks

The drawback of this change is that we may be providing users with an alternative to the product's LokiStack
offering, which may delay its adoption. The two receivers address separate use cases, but
this choice may be construed as a "cheap" or "simple" alternative.

Additionally, this output may be interpreted as a "reliable" delivery mechanism for forwarding logs, which
would be misleading. The OpenShift logging product is not a guaranteed log collection and storage system, and this
output will remain subject to the same set of limitations as all other outputs.

Lastly, this output provides no mechanism for querying log records comparable to those offered by other outputs (e.g. LokiStack). The available "metadata" depends upon the definition of the "keyPrefix" used when the logs are written to S3. If the "keyPrefix" does not provide a useful way to organize the data, then retrieval of that data will be challenging.

## Alternatives (Not Implemented)

## Open Questions [optional]

1. Do we need to support `filename_time_format` to address the key prefix functionality proposed by the draft [PR](https://github.com/openshift/cluster-logging-operator/pull/3096)?
   * All indications are that users need a way to inject a formatted date into the "keyPrefix" field in order to logically organize the records written to the bucket
2. Is there a need to introduce this feature as tech preview with a `v2beta1` API to allow "soak" time for the API and additional testing?

## Test Plan

In addition to the usual testing by Logging QE, the intent is to deploy builds, potentially early release candidates, to the HCP environment in order to exercise their S3 Lambda design.

## Graduation Criteria

### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers
- Enumerate service level indicators (SLIs), expose SLIs as metrics
- Write symptoms-based alerts for the component(s)

### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- Backhaul SLI telemetry
- Document SLOs for the component
- Conduct load testing
- User facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/)

**For non-optional features moving to GA, the graduation criteria must include
end to end tests.**

### Removing a deprecated feature

## Upgrade / Downgrade Strategy

## Version Skew Strategy

## Operational Aspects of API Extensions

## Support Procedures

## Infrastructure Needed [optional]

HCP deployment