---
title: forwarder-to-s3
authors:
- "@jcantrill"
reviewers:
- "@apahim"
- "@alanconway"
- "@cahartma"
- "@cuppett"
- "@xperimental"
approvers:
- "@alanconway"
api-approvers:
- "@alanconway"
creation-date: 2025-09-08
last-updated: 2025-09-08
tracking-link:
- https://issues.redhat.com/browse/OBSDA-1099
- https://issues.redhat.com/browse/LOG-7680
see-also: []
replaces: []
superseded-by: []
---

# Log Forwarding to an S3 Endpoint

## Summary

This feature adds support for collecting logs using the Red Hat Logging Operator and forwarding them
to a configured S3 endpoint. The enhancements to **ClusterLogForwarder** include API changes that allow
administrators to utilize the "assume role" authentication functionality provided by the underlying platform,
and that rely upon sane defaults for organizing records in an S3 bucket.

## Motivation

The primary motivation for this proposal is to satisfy functionality requests from Red Hat managed services teams
that provide managed clusters for customers. They have requirements to collect, forward, and store logs
from both the hosted control planes and the management clusters, utilizing credentials from multiple organizations, in a
cost-efficient manner.

### User Stories

* As an administrator, I want to forward logs to an S3 endpoint
so that I can store low-access logs (e.g. audit logs) and
retain them for longer periods at reduced cost when compared to CloudWatch
* As an administrator, I want to forward logs to an S3 endpoint when they might
otherwise exceed the size limits of CloudWatch

### Goals

* A simple API for specifying log forwarding to an S3 output
* A set of sane defaults for organizing log streams written to the specified S3 bucket
* The capability to define how log streams are organized when written to the specified S3 bucket
* Re-use of the existing AWS authentication features provided by the CloudWatch output

### Non-Goals

* To provide an API that exposes all the configuration points of the underlying collector implementation

## Proposal

This enhancement proposes to:

* Enhance the **ClusterLogForwarder** API to add an S3 output
* Define a default schema for writing log records to an S3 bucket that is based
upon the log type and source, in order to be consistent with other output types
* Allow the schema for writing log records to be modified by the administrator
* Reuse the authorization mechanisms that are available with the CloudWatch output
* Add a generator to support generating collector configuration based upon the spec defined by the **ClusterLogForwarder** API


### Workflow Description

**Cluster administrator** is a human responsible for administering the **cluster-logging-operator**
and **ClusterLogForwarders**.

1. The cluster administrator creates an S3 bucket on their host platform (i.e. AWS)
1. The cluster administrator grants a platform role (i.e. IAM Role) the permissions to write to the S3 bucket
1. The cluster administrator deploys the cluster-logging-operator if it is not already deployed
1. The cluster administrator edits or creates a **ClusterLogForwarder** and defines an S3 output
1. The cluster administrator references the S3 output in a pipeline
1. The cluster-logging-operator reconciles the **ClusterLogForwarder**, generates a new collector configuration,
and updates the collector deployment

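As an illustration of the second step, a minimal IAM policy granting a role write access to the bucket might resemble the following. The `Sid` and the bucket name `my-audit-logs` are hypothetical placeholders, not part of this proposal:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowLogForwarderWrites",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-audit-logs/*"
    }
  ]
}
```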
### API Extensions

#### ClusterLogForwarder API

```yaml
apiVersion: "observability.openshift.io/v1"
kind: ClusterLogForwarder
spec:
  outputs:
  - name:
    type: s3 # add s3 to the enum
    s3:
      url: # (optional) string that is an alternative to the well-known AWS endpoints
      region: # (optional) string that is different from the configured service default
      bucket: # string for the S3 bucket absent leading 's3://' or trailing '/' and
              # truncated to 63 characters to meet length restrictions
      keyPrefix: # (optional) templated string (see note 1)
      authentication:
        type: # enum: awsAccessKey, iamRole
        awsAccessKey:
          assumeRole: # (optional)
            roleARN: # secret reference
            externalID: # (optional) secret reference
        iamRole:
          roleARN: # secret reference
          token: # bearer token
          assumeRole: # (optional)
            roleARN: # secret reference
            externalID: # (optional) string
      tuning:
        deliveryMode: # (optional) enum: atLeastOnce, atMostOnce
        maxWrite: # (optional) quantity (e.g. 500k)
        compression: # (optional) none, gzip, zstd, snappy, zlib
        minRetryDuration: # (optional) duration
        maxRetryDuration: # (optional) duration
```

**Note 1:** A combination of date formatters and static or dynamic values, where a dynamic value consists of a field path
optionally followed by "||" and another field path or a static fallback value
(e.g. `foo.{"%Y-%m-%d"}/{.bar.baz||.qux.quux.corge||.grault||"nil"}-waldo.fred{.plugh||"none"}`)

Date formatters are specified using one or more of the following subset of [chrono](https://docs.rs/chrono/latest/chrono/format/strftime/index.html#specifiers)
specifiers to format the `.timestamp` field value:

| Spec | Example | Description |
|------|---------|-------------|
| %F | 2001-07-08 | Year-month-day format (ISO 8601). Same as %Y-%m-%d. |
| %Y | 2001 | The full proleptic Gregorian year, zero-padded to 4 digits. |
| %m | 07 | Month number (01–12), zero-padded to 2 digits. |
| %d | 08 | Day number (01–31), zero-padded to 2 digits. |
| %H | 00 | Hour number (00–23), zero-padded to 2 digits. |
| %M | 34 | Minute number (00–59), zero-padded to 2 digits. |
| %S | 60 | Second number (00–60), zero-padded to 2 digits. |

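The exact template-resolution semantics will be defined by the collector implementation; the sketch below only illustrates how the grammar in Note 1 could behave. The helper `resolve_key_prefix`, its simplified quoting rules, and the flat `timestamp` record key are hypothetical, not product code:

```python
import re
from datetime import datetime

def resolve_key_prefix(template: str, record: dict) -> str:
    """Illustrative resolution of a keyPrefix template against a log record.

    Brace segments hold either a quoted chrono-style date format (applied to
    the record timestamp) or "||"-separated field-path / static fallbacks;
    everything outside braces is copied verbatim.
    """
    def lookup(path):
        # Walk a dotted field path like .bar.baz through nested dicts.
        value = record
        for part in path.lstrip(".").split("."):
            if not isinstance(value, dict) or part not in value:
                return None
            value = value[part]
        return value

    def expand(match):
        for alt in match.group(1).split("||"):
            alt = alt.strip()
            if alt.startswith('"') and alt.endswith('"'):
                literal = alt[1:-1]
                if "%" in literal:  # date formatter, e.g. "%Y-%m-%d"
                    ts = datetime.fromisoformat(record["timestamp"])
                    return ts.strftime(literal)
                return literal      # static fallback, e.g. "none"
            value = lookup(alt)     # dynamic field path, e.g. .log_type
            if value is not None:
                return str(value)
        return ""

    return re.sub(r"\{([^}]*)\}", expand, template)

record = {"timestamp": "2001-07-08T00:34:00+00:00", "log_type": "audit"}
prefix = resolve_key_prefix(
    '{"%Y-%m-%d"}/{.log_type}/{.kubernetes.namespace_name||"none"}/', record)
print(prefix)  # 2001-07-08/audit/none/
```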
When `keyPrefix` is not defined in the **ClusterLogForwarder** spec, the collector will write logs to the S3 bucket using a default key prefix constructed from attributes of the log entries as follows:

| log type | log source | key prefix |
| --- | --- | --- |
| Application | container |`<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/<namespace_name>/<pod_name>/<container_name>/`|
| Infrastructure | container |`<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/<namespace_name>/<pod_name>/<container_name>/`|
| Infrastructure | node (Journal) |`<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/<host_name>/`|
| Audit | auditd |`<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/<host_name>/`|
| Audit | kubeAPI |`<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/`|
| Audit | openshiftAPI |`<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/`|
| Audit | ovn |`<cluster_id>/<yyyy-mm-dd>/<log_type>/<log_source>/`|

**Note 2:** The collector will encode events as [JSON](https://www.rfc-editor.org/rfc/rfc8259).
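For illustration, a concrete **ClusterLogForwarder** using the proposed output might look as follows. The metadata names, secret name, bucket, and pipeline wiring are hypothetical, and the `serviceAccount` and secret-reference shapes are assumed to follow the existing `observability.openshift.io/v1` conventions:

```yaml
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: collector                      # hypothetical name
  namespace: openshift-logging
spec:
  serviceAccount:
    name: logcollector                 # hypothetical; needs permission to collect audit logs
  outputs:
  - name: audit-s3
    type: s3
    s3:
      bucket: my-audit-logs            # hypothetical bucket
      region: us-east-1
      keyPrefix: '{"%Y-%m-%d"}/audit/{.log_source||"unknown"}/'
      authentication:
        type: iamRole
        iamRole:
          roleARN:
            secretName: s3-credentials # hypothetical secret holding the role ARN
            key: role_arn
          token:
            from: serviceAccount
  pipelines:
  - name: audit-to-s3
    inputRefs: [audit]
    outputRefs: [audit-s3]
```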

### Topology Considerations

#### Hypershift / Hosted Control Planes

#### Standalone Clusters

#### Single-node Deployments or MicroShift

### Implementation Details/Notes/Constraints

Implementation includes:

* `ClusterLogForwarder` API updates
* Log collector config generator updates with S3 config template additions

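Assuming the underlying collector is Vector (consistent with the chrono specifiers and `filename_time_format` references in this proposal), the generator might emit an `aws_s3` sink along these lines. The component names, bucket, and role ARN are hypothetical, and the option set is a sketch rather than the final generated configuration:

```yaml
# Hypothetical Vector sink generated for an output named "audit-s3"
sinks:
  output_audit_s3:
    type: aws_s3
    inputs: ["pipeline_audit_to_s3"]   # hypothetical upstream transform
    bucket: my-audit-logs
    region: us-east-1
    key_prefix: "{{ cluster_id }}/%F/audit/auditd/{{ hostname }}/"
    compression: gzip
    encoding:
      codec: json
    auth:
      assume_role: "arn:aws:iam::123456789012:role/hypothetical-role"
```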
### Risks and Mitigations

This feature is being requested by HCP with a very short deadline for providing a deliverable. This change
is dependent upon another change that introduces "assumeRole" functionality, which has not been completed. The
risk to the Logging team is that HCP may choose to utilize an alternate product if these changes cannot be realized
within their time constraints.

### Drawbacks

The drawback to this change is that we may be providing users with an alternative to the product's LokiStack
offering, which may delay its adoption. The feature sets of the two offerings address separate use cases, but
this choice may be construed as a "cheap" or "simple" alternative.

Additionally, this change may be interpreted as a "reliable" delivery mechanism for forwarding logs, which
would be misleading. The OpenShift logging product is not a guaranteed log collection and storage system, and this
output will remain subject to the same set of limitations as all other outputs.

Lastly, this output provides no mechanism to query log records in the useful manner offered by other outputs (i.e. LokiStack). The available "metadata" is dependent upon the definition of the `keyPrefix` when the logs are written to S3. If the `keyPrefix` does not provide a useful way to organize the data, then retrieval of that data will be challenging.

## Alternatives (Not Implemented)

## Open Questions [optional]

1. Do we need to support `filename_time_format` to address the key prefix functionality proposed by the draft [PR](https://github.com/openshift/cluster-logging-operator/pull/3096)?
   * All indicators are that we need some way for users to inject a formatted date into the `keyPrefix` field in order to provide logical organization of the records when written to the bucket
2. Is there a need to introduce this feature as tech-preview with a `v2beta1` API to allow "soak" time for the API and additional testing?

## Test Plan

Aside from the usual testing by logging QE, the intent is to deploy builds, potentially early candidate releases, to the HCP environment in order to exercise their S3 lambda design.

## Graduation Criteria

### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers
- Enumerate service level indicators (SLIs), expose SLIs as metrics
- Write symptoms-based alerts for the component(s)

### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- Backhaul SLI telemetry
- Document SLOs for the component
- Conduct load testing
- User facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/)

**For non-optional features moving to GA, the graduation criteria must include
end to end tests.**

### Removing a deprecated feature

## Upgrade / Downgrade Strategy

## Version Skew Strategy

## Operational Aspects of API Extensions

## Support Procedures

## Infrastructure Needed [optional]

HCP deployment