
Conversation

@flpanbin
Contributor

Related issue #22

Background

Following the meeting on September 10th, 2025, we discussed contributing our production-tested sandbox design to the Kubernetes community. This design has been successfully used in production environments.

Overview

This PR introduces the initial draft CRD design for the Sandbox custom resource, which aims to provide a declarative, standardized API for managing isolated, stateful, singleton workloads - particularly ideal for AI agent runtimes and development environments.
We welcome community feedback and will iterate on the CRD design based on comments and suggestions.

Call for Community Feedback

We're actively seeking input from the community on:
API Design: Are the field names and structure intuitive?
Missing Features: What additional capabilities should we consider?
Use Cases: How does this align with your specific requirements?
Compatibility: Any concerns with existing Kubernetes patterns?
Please share your thoughts, suggestions, and concerns in the comments below!

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 15, 2025
@k8s-ci-robot
Contributor

Welcome @flpanbin!

It looks like this is your first PR to kubernetes-sigs/agent-sandbox 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/agent-sandbox has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 15, 2025
@flpanbin flpanbin marked this pull request as draft September 15, 2025 05:33
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 15, 2025
@flpanbin flpanbin changed the title [Draft] Refine sandbox CRD design Refine sandbox CRD design Sep 15, 2025
@flpanbin flpanbin marked this pull request as ready for review September 16, 2025 06:12
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 16, 2025
@lengrongfu

@justinsb @janetkuo Based on our discussion last week, we have renamed some fields and removed obsolete ones. The Sandbox CRD has been refined to align with the new design. We welcome your feedback on the changes in this PR.

@justinsb
Contributor

Thanks for sharing @flpanbin ! There's a lot going on here, and I think one of the principles of the Sandbox CRD as a kubernetes project is that it should follow the Kubernetes design patterns and be relatively unopinionated. So right now we are using PodTemplate, because that's what Deployment, StatefulSet, and DaemonSet do. We're consuming the whole thing, even though maybe some fields are less relevant, because we want to enable people to build more opinionated layers on top. (For example, lots of companies build their own BigcoDeployment on top of Deployment, with just the features they want.)

To move forward, I can think of two ways:

  • Have your CRD create a Sandbox type, instead of creating a Pod/Deployment/whatever-it-is-currently-creating. Because Sandbox exposes the whole PodTemplate, this should be possible today, and where it isn't we want to add fields to Sandbox. Your CRD could live in your own repo or in our examples/ folder. If it is in our examples/ folder it would be nice to have a README describing the key fields you don't want to expose to end-users (or other reasons for creating an abstraction on top) - there are many good reasons, it just helps to motivate your CRD.
  • If you think your users could use Sandbox directly, except that Sandbox is missing some fields/features, then let's figure out what those are and add them more individually. I know status.conditions has come up before and @barney-s is adding them in active PRs, but I think we could use more use-cases for status.conditions to know what is important (for example, do you just want a Ready condition, or do you want more granular information).
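To make the granularity question concrete, here is a sketch of what a coarse Ready condition versus a finer-grained one might look like in a Sandbox status. The type/status/reason/message/lastTransitionTime shape is the standard Kubernetes metav1.Condition schema; the specific condition types shown are illustrative, not agreed API:

```yaml
# Hypothetical Sandbox status. The condition schema is the standard
# metav1.Condition shape; the condition types here are examples only.
status:
  conditions:
    - type: Ready                # coarse-grained: "is the sandbox usable?"
      status: "True"
      reason: PodReady
      message: Sandbox pod is running and passing readiness checks
      lastTransitionTime: "2025-09-17T08:00:00Z"
    - type: ServiceReady         # finer-grained example condition
      status: "True"
      reason: EndpointsAvailable
      message: Headless service has ready endpoints
      lastTransitionTime: "2025-09-17T08:00:05Z"
```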

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 17, 2025
@flpanbin flpanbin closed this Sep 17, 2025
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 17, 2025
@flpanbin flpanbin reopened this Sep 17, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: flpanbin
Once this PR has been reviewed and has the lgtm label, please assign justinsb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 17, 2025
@flpanbin
Contributor Author

flpanbin commented Sep 17, 2025

@justinsb Thanks for the thoughtful feedback. I fully understand your viewpoint on following Kubernetes design patterns. First, let me explain why we chose custom fields instead of directly using PodTemplateSpec.

  • Simplified User Experience for Development Environment Use Case: Most users only need to specify image, resources, networking, and storage. Exposing the full PodTemplateSpec pushes them into container probes, security contexts, and other advanced settings that are hard to set correctly for this use case.
  • Security considerations: We intentionally limit configurable fields to avoid risky or confusing configurations (e.g., not exposing the full securityContext). This allows safer defaults while still enabling a functional environment.
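For illustration, the kind of simplified, opinionated spec we had in mind looks roughly like the sketch below. The point is that users set only image, resources, storage, and networking, while probes, security contexts, and similar advanced settings are defaulted by the controller. The CRD name and all field names here are hypothetical:

```yaml
# Hypothetical sketch of a simplified, opinionated layer -- not a
# proposal for this project. Group, kind, and field names are invented
# for illustration.
apiVersion: example.dev/v1alpha1
kind: DevSandbox
metadata:
  name: my-dev-env
spec:
  image: ghcr.io/example/dev-env:latest
  resources:
    cpu: "2"
    memory: 4Gi
  storage:
    size: 20Gi
  networking:
    ports:
      - name: ide
        port: 8080
```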

I fully agree with exposing PodTemplateSpec directly. The sandbox project’s scope is a general, Kubernetes-native abstraction for single-instance, stateful workloads; “developer environment” is just one use case built on top of that. To improve the core Sandbox, I’d like to propose adding a few generic fields.

Proposal: Add a few fields to Sandbox

  • networking: Provide optional Service-level exposure and/or references to externally managed routes, without binding Sandbox to a specific ingress/gateway stack.
    • Service exposure and ports:
      • Optional spec.networking.service block; if omitted, the controller does not create an external Service (headless Service for discovery can remain internal).
      • spec.networking.service.type: ClusterIP | NodePort | LoadBalancer
      • spec.networking.service.ports[] with:
        • name (string) — aligns with container port name for mapping
        • port (int32) — Service port
        • targetPort (int32|string, optional) — defaults to the named container port if omitted
        • protocol (TCP|UDP, default TCP)
    • Optionally spec.networking.routeRefs[] to reference externally managed HTTPRoute/TCPRoute (Gateway API) or Ingress. Sandbox does not create these; it only references them and can surface reachability in status.
  • schedule: Allow an RFC3339 shutdownTime for automatic stop. This encodes a common lifecycle action for single-instance, stateful workloads.
  • pause: A boolean to explicitly stop the runtime (delete the Pod while preserving the object and persistent state), and resume when false. This aligns with the project’s focus on long-running, stateful, singleton workloads.
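Put together, a Sandbox using the proposed fields might look like the following sketch. Everything under networking, schedule, and pause is a proposal from this comment, not existing API, and the group/version is assumed for illustration:

```yaml
# Sketch of the proposed additions. All fields under networking,
# schedule, and pause are proposals, not part of the current Sandbox CRD;
# the apiVersion is assumed for illustration.
apiVersion: agents.x-k8s.io/v1alpha1
kind: Sandbox
metadata:
  name: demo
spec:
  podTemplate:
    spec:
      containers:
        - name: runtime
          image: ghcr.io/example/agent-runtime:latest
          ports:
            - name: http
              containerPort: 8080
  networking:
    service:
      type: ClusterIP
      ports:
        - name: http
          port: 80
          targetPort: http     # defaults to the named container port if omitted
          protocol: TCP
    routeRefs:                  # references only; Sandbox does not create these
      - kind: HTTPRoute
        name: demo-route
  schedule:
    shutdownTime: "2025-09-20T18:00:00Z"   # RFC3339 automatic stop
  pause: false                  # true deletes the Pod while preserving state
```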

Status implications

  • If networking is added, surface reachability in status (e.g., URL/IP/ports/ready).
  • If schedule/pause are added, reflect lifecycle states via conditions (e.g., Stopping, Resuming, Scheduled, Ready) so callers can reason about transitions.
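As a sketch, the corresponding status surface might look like this (all field names hypothetical; the conditions follow the standard metav1.Condition shape):

```yaml
# Hypothetical status sketch for the proposed fields.
status:
  networking:
    serviceIP: 10.96.120.7     # reachability surfaced from the Service
    ports: [80]
    ready: true
  conditions:
    - type: Ready
      status: "False"
      reason: Stopping
      message: Sandbox is stopping because spec.pause=true
      lastTransitionTime: "2025-09-17T09:00:00Z"
```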

On alternative of creating a higher-level CRD example

  • If it is helpful to the user experience, we can add an example controller/CRD under examples/ in the future.

We’re open to adjusting details based on community feedback.

@janetkuo
Member

@flpanbin Thanks for breaking down the features. It's good to discuss the APIs first and then add each feature one by one. In general we prefer smaller PRs: they can be reviewed more quickly, are less likely to introduce bugs, and are easier to roll back if needed.

spec.networking

With #9, sandbox creates a headless service automatically. Adding a networking field could be good for users to customize the network layer further.

spec.schedule

There's a related design proposal for TTL #21 which handles a similar case but in a different way. Let's discuss which we prefer (or do we need both?)
In terms of field name, I'd call it something more explicit, perhaps shutdownTime. The term "schedule" could be confused with node scheduling, and is less clear about what the schedule is for.

spec.pause

This looks interesting. Would you provide more details on how the state is saved when pausing and restored when resuming? How is it different from the shutdownTime/TTL (pausing saves state, and shutdown doesn't)? This change is likely bigger and might require multiple PRs.

@flpanbin
Contributor Author

@janetkuo Thanks for the feedback! You're absolutely right about smaller, focused PRs being easier to review and less risky. I completely agree with that approach. I will close the PR and submit issues and PRs separately for each feature.
