@mjudeikis
Contributor

Summary

This PR adds detailed documentation on deploying kcp in a production configuration.

What Type of PR Is This?

/kind documentation

Related Issue(s)

Fixes #

Release Notes

Production deployment documentation

@kcp-ci-bot kcp-ci-bot added kind/documentation Categorizes issue or PR as related to documentation. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has signed the DCO. labels Nov 11, 2025
@kcp-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from mjudeikis. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 11, 2025
Comment on lines +40 to +43
garbageCollectionPolicy: Exponential
garbageCollectionPeriod: 43200s
deltaSnapshotPeriod: 300s
deltaSnapshotMemoryLimit: 1Gi
Member

Not very important, since this is about backups, but it might be useful to add a comment explaining why these values were chosen (e.g. whether they're defaults, or were changed because they work better in prod).

Contributor Author

This was taken from the etcd-druid docs. It would be good if they could comment on this. I never tested this myself :/
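For anyone weighing these values, here is a small sketch that converts the two period settings into hours and minutes. The helper name `humanize_seconds` is hypothetical, and the output says nothing about whether these are etcd-druid defaults, only what the numbers amount to.

```bash
# Hypothetical helper: print a "<n>s" duration as hours, minutes, and seconds.
humanize_seconds() {
  s=${1%s}   # strip the trailing "s" unit, if present
  printf '%dh%dm%ds\n' $(( s / 3600 )) $(( (s % 3600) / 60 )) $(( s % 60 ))
}

humanize_seconds 43200s   # garbageCollectionPeriod
humanize_seconds 300s     # deltaSnapshotPeriod
```

This prints `12h0m0s` and `0h5m0s`, so garbage collection would run every 12 hours and delta snapshots every 5 minutes.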


The core `kcp-dev/kcp` repository is a monorepo containing the kcp core and some close to the core libraries.
- See the [monorepo document](./monorepo/) for more details.
+ See the [monorepo document](./monorepo.md) for more details.
Member

Why do we need .md extension here? The web page URL doesn't have it, e.g. https://docs.kcp.io/kcp/main/contributing/monorepo/

Contributor Author

The mkdocs logs were complaining about this:

Doc file 'contributing/index.md' contains an unrecognized relative link './monorepo', it was left as is. Did you mean 'monorepo.md'?

Contributor Author

I think this is one of those cases: it works due to backwards compatibility, but mkdocs gives a warning.
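To find other links that would trigger the same warning, something like the following could be run against the docs tree. This is only a sketch: the helper name and the `docs/` path in the usage line are assumptions, and the pattern only catches links of the form `./name` and `./name/`.

```bash
# Hypothetical helper: list relative Markdown links such as ](./monorepo/) or
# ](./monorepo) that lack a .md extension, with file name and line number.
find_extensionless_links() {
  grep -rnE --include='*.md' '\]\(\./[A-Za-z0-9_-]+/?\)' "$1"
}

# Usage (path is an assumption):
#   find_extensionless_links docs/
```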

@kcp-ci-bot
Contributor

@mjudeikis: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kcp-test-e2e-sharded | 269d3b4 | link | true | `/test pull-kcp-test-e2e-sharded` |

Full PR test history

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


This directory contains assets and configuration files for production deployment of kcp.

!!! Note: We, understand that maintaining static assets in the repository can be challenging. If you have noticed any discrepancies between these assets and the latest version of the kcp - please open an issue or submit a pull request to help us keep them up to date.
Member

Suggested change
- !!! Note: We, understand that maintaining static assets in the repository can be challenging. If you have noticed any discrepancies between these assets and the latest version of the kcp - please open an issue or submit a pull request to help us keep them up to date.
+ !!! Note: We understand that maintaining static assets in the repository can be challenging. If you have noticed any discrepancies between these assets and the latest version of the kcp - please open an issue or submit a pull request to help us keep them up to date.

Member

Sharding and High Availability use the term workload. I have a feeling this is kind of wrong, since it is associated with Pods, containers, etc. Maybe we should try to find a better term to avoid confusion. (This was present prior to this PR; I just had a chance to read this doc as a whole.)


**CloudFlare CA Certificate**: Download the CloudFlare edge certificate for extended trust:

**Important**: Verify this URL with CloudFlare documentation before production use.
Member

Nit: I'd put this in a note box.

- "*.alpha-peer.kcp-comer.svc"
- "*.alpha-peer.kcp-comer.svc.cluster.local"
issuerRef:
name: etcd-ca
Member

Why is this not using etcd-ca from the kcp-comer namespace, similar to etcd-tls-ca?

secretName: etcd-client-tls
commonName: etcd-client
issuerRef:
name: etcd-ca
Member

Same here, why is this not using etcd-ca from the kcp-comer namespace, similar to etcd-tls-ca?

Member

Does it make sense to put comments such as # CHANGE ME on fields that should be changed (e.g. dnsNames)? That way people can find them more easily, and the docs can tell them to search for CHANGE ME and change those values.
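If such markers were added, the docs could point readers at a search along these lines. This is a sketch of the idea, not something the PR currently contains: the helper name is hypothetical, and the `contrib/production/` path is taken from the commands quoted elsewhere in this review.

```bash
# Hypothetical helper: list every "CHANGE ME" marker in the YAML manifests
# under a directory, with file name and line number.
list_change_me() {
  grep -rn --include='*.yaml' 'CHANGE ME' "$1"
}

# Usage:
#   list_change_me contrib/production/
```

Each hit is printed as `file:line:content`, so the remaining placeholders can be worked through one by one.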

apiVersion: operator.kcp.io/v1alpha1
kind: FrontProxy
metadata:
name: frontproxy-internal
Member

Out of curiosity, why is this called frontproxy-internal if it's exposed to the internet?

- https://root-client.kcp-comer.svc.cluster.local:2379
tlsConfig:
secretRef:
name: etcd-ca-tls
Member

Out of curiosity, shouldn't this have client TLS? Just CA is not enough to do things in etcd AFAIK.
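To illustrate the concern: if the etcd cluster enforces client certificate authentication, a client has to present a certificate and key in addition to trusting the CA. A minimal sketch using `etcdctl`, where the endpoint comes from the snippet above and the function name and certificate file names are assumptions:

```bash
# Check etcd health over mutual TLS.
# $1 = directory holding ca.crt, tls.crt and tls.key (file names are assumptions).
etcd_health_mtls() {
  etcdctl \
    --endpoints=https://root-client.kcp-comer.svc.cluster.local:2379 \
    --cacert="$1/ca.crt" \
    --cert="$1/tls.crt" \
    --key="$1/tls.key" \
    endpoint health
}
```

With only `--cacert`, the client can verify the server but cannot authenticate itself, which is the gap the question points at.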

username: kcp-admin
groups:
- system:kcp:admin
validity: 8766h
Member

Just so it's clear:

Suggested change
- validity: 8766h
+ validity: 8766h # 1 year
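As a sanity check on the suggested comment, 8766 hours works out to 365.25 days, i.e. one year averaged over the leap cycle. A quick way to confirm, assuming `awk` is available (the helper name is hypothetical):

```bash
# Convert a validity given in hours to days.
hours_to_days() {
  awk -v h="$1" 'BEGIN { printf "%.2f\n", h / 24 }'
}

hours_to_days 8766   # prints 365.25
```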

username: kcp-admin
groups:
- system:kcp:admin
validity: 8766h
Member

Same here:

Suggested change
- validity: 8766h
+ validity: 8766h # 1 year

kubectl apply -f contrib/production/kcp-comer/kcp-front-proxy-internal.yaml
```

**Get the LoadBalancer IP**:
Member

I'm not sure I'm a fan of bolding these; I think it would be better to have a numbered list, like 1, 2...

Deploy kcp with dual front-proxy and CDN integration for enterprise environments requiring edge acceleration and advanced networking.
---

# kcp-comer: Dual Front-Proxy with CDN Integration
Member

I have a hard time understanding what you gain by using a CDN. The power of a CDN is in caching assets: think of images, files... What exactly would be cached here?


### 5. Configure DNS for Front-Proxy

1. **Get the LoadBalancer IP**:
Member

This is personally the way I like it, with numbered steps, but I still think too much bold can be annoying.


```bash
# macOS
brew install int128/kubelogin/kubelogin
Member

I'd also mention krew as an option.
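For the krew route, the kubelogin plugin is published under the name `oidc-login`. A minimal sketch, assuming `kubectl` and the krew plugin manager are already installed; the wrapper function name is hypothetical:

```bash
# Install the kubelogin plugin (krew name: oidc-login) through krew.
# Assumes kubectl and krew are already on PATH.
install_oidc_login_with_krew() {
  kubectl krew install oidc-login
}
```

Unlike the Homebrew command, this works the same way on macOS, Linux, and Windows, which may be why it is worth listing as an alternative.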

root https://root-kcp.kcp-dekker.svc.cluster.local:6443 https://api.dekker.example.com:6443 11m
```

### Install kubectl OIDC Plugin
Member

Shouldn't we have this for kcp-comer as well?


## Deployment Steps

### 1. Configure DNS Records
Member

This is confusing to me: how can you create these records when you don't know the IP addresses yet?


Because we use Let's Encrypt, and since kubectl needs explicit CA configuration, we need to deploy kcp components with extended CA bundle trust. This might be different in your environment.

```bash
Member

This line is duplicated

Comment on lines +143 to +145
```bash
curl -k https://api.vespucci.example.com:6443/healthz
```
Member

Nit: I don't think this should be indented.

The main API endpoint for clients to access kcp. This is the main entry point for all external consumer clients.

**Communication patterns:**
- Configured in both shards and front-proxy
Member

This line doesn't read well to me, what did you mean exactly here?

- Set `--externalHostname` or `spec.external.hostname` in front-proxy or shard configurations

### Shards
Individual kcp shards that host workspaces. Shards can be exposed publicly or kept private.
Member

It might be useful to explain why you might want a public or a private shard.

### Shards
Individual kcp shards that host workspaces. Shards can be exposed publicly or kept private.

**Configuration:**
Member

How do shards register with the front proxy? Is it done automatically? Is there anything that needs to be configured? It might be useful to clarify that.

### Virtual Workspaces
kcp supports running virtual workspaces outside shards, but the recommended approach is to run virtual workspaces inside shards.

**Configuration:**
Member

@xmudrii, Nov 12, 2025

Suggested change
- **Configuration:**
+ **Configuration for external virtual workspaces:**

(I assume you don't need to do anything for internal VWs)

Member

What did you use to generate those diagrams? Are there sources anywhere?

https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.26/releases/cnpg-1.26.0.yaml
```

### 4.2. Deploy PostgreSQL Database
Member

@xmudrii, Nov 12, 2025

Why can't we just use etcd or even sqlite for Dex? PostgreSQL is quite an overhead, both operationally and in resource usage.

-f contrib/production/oidc-dex/values.yaml
```

### 5. DNS Configuration
Member

Isn't this done along the way in each of these setups?

issuer: https://auth.example.com

logger:
level: "debug"
Member

Is debug level really needed in prod? I'd guess something like info should be fine and safer.
