
Conversation

@DerekFrank
Contributor

Issue #, if available:

Description of changes:

This change introduces the tasks necessary to install and leverage Karpenter to scale a cluster. This was tested in a dev cluster.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@DerekFrank force-pushed the main branch 2 times, most recently from 1f0c547 to e7800e3 on September 16, 2025 23:48
kind: Pipeline
apiVersion: tekton.dev/v1
metadata:
  name: derekff-karpenter-testing
Contributor

Let's use generic naming please.

Contributor

Also, can we name the pipeline.yaml file karpenter-titan-pipeline or something that actually says what the file is?

Contributor Author

I was mostly including this as an example file. Where do you actually want the pipeline? Also, we can't put the pipeline in OSS, because it has to contain secrets that I've manually removed, like the Slack hook URL.

Contributor

We check in pipelines here: https://github.com/awslabs/kubernetes-iteration-toolkit/tree/main/tests/tekton-resources/pipelines/eks

For secrets and the like, you don't have to define the param value in the pipeline; in Tekton, you have the option of not specifying a default for a parameter.
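For example, a minimal sketch of such a parameter (the slack-hook-url name is illustrative, not taken from this PR): with no default declared, the secret value has to be supplied by the PipelineRun rather than living in the checked-in pipeline.

spec:
  params:
    # Hypothetical parameter; no default on purpose, so the PipelineRun supplies the real value.
    - name: slack-hook-url
      type: string
      description: Slack webhook URL for notifications; must be provided at run time.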

operator: In
values:
- medium
- large
Contributor

Why do we need larges for the purposes of the Karpenter test?

Contributor Author

EC2 indicated that they want us to use a variety of instance types to prevent ICEing.

Contributor

IIRC it's the other way around: when we wanted to leverage larges, they raised a concern about ICEing, hence mediums in the mix; if we just go with mediums, they should be fine, IIUC. I was just questioning it from the cost perspective, given the large-scale test. We should avoid larges unless the tests absolutely demand it. We don't have to be blocked for this PR's purposes, but please follow up on this.

Contributor Author

I will remove larges and let you follow up with EC2 on the cost item.

--cluster-name $(params.cluster-name) \
--nodegroup-name karpenter-system-large \
--node-role arn:aws:iam::$(params.aws-account-id):role/$(params.cluster-name)-node-role \
--instance-types r5.24xlarge \
Contributor

Shouldn't this be a param? That way, depending on the scale of the cluster, we can put the Karpenter node on an instance type that makes sense proportionally.

Contributor Author

@DerekFrank Sep 17, 2025

It could be, but I don't think it needs to be until we have a use case. There's no harm in using a large instance for now and tuning it later. We'd also have to tune the Karpenter install requests, which we don't do right now.

Contributor

There are already plans to run this at a smaller scale as well, but I am OK with punting it for now if you want to follow up on it later.

Contributor Author

Let's circle back on that when we decide the scale we want to run it at.
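When we do pick this up, one rough sketch of the parameterization (the param name below is hypothetical, not from this PR) would be a parameter with a default that the nodegroup command references:

  params:
    # Hypothetical parameter for the nodegroup that hosts the Karpenter controller.
    - name: karpenter-nodegroup-instance-type
      type: string
      default: r5.24xlarge
      description: Instance type for the karpenter-system nodegroup.

and in the step script:

    --instance-types $(params.karpenter-nodegroup-instance-type) \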

--set settings.preferencePolicy=Ignore \
--set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::$(params.aws-account-id):role/KarpenterControllerRole-$(params.cluster-name)" \
--set controller.resources.requests.cpu=60 \
--set controller.resources.requests.memory=200Gi \
Contributor

Resource allocation to the controller should depend on the size of the cluster, right? Can we parameterize these?

Contributor Author

We can parameterize everything; let's not until we have a use case. It's an easy fast follow.

Contributor

We do have a use case to run it at a smaller scale, hence the question. We can punt it for later, as mentioned in the other comment.
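For the fast follow, a sketch of what the parameterized requests could look like (param names hypothetical, not from this PR): declare them alongside the other pipeline params and feed them into the helm flags.

  params:
    # Hypothetical params sizing the Karpenter controller for the target cluster scale.
    - name: karpenter-controller-cpu
      type: string
      default: "60"
    - name: karpenter-controller-memory
      type: string
      default: "200Gi"

and in the install step:

  --set controller.resources.requests.cpu=$(params.karpenter-controller-cpu) \
  --set controller.resources.requests.memory=$(params.karpenter-controller-memory) \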

aws iam delete-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-$(params.cluster-name)"
echo "Instance profile KarpenterNodeInstanceProfile-$(params.cluster-name) deleted successfully."
else
echo "Instance profile KarpenterNodeInstanceProfile-$(params.cluster-name) does not exist. Skipping deletion..."
Contributor

Just curious: have you tested whether this falls into the else branch when the instance profile couldn't be deleted successfully?

Contributor Author

It falls into the else branch when the instance profile doesn't exist, which is the desired behavior. It errors out when the instance profile exists but can't be deleted, which is also the desired behavior.
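For context, the full check reads roughly like this (a sketch reconstructed from the hunk above, not the verbatim task; it assumes the step script runs with set -e, so a failed delete aborts before the success message):

if aws iam get-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-$(params.cluster-name)" >/dev/null 2>&1; then
  # Profile exists: delete it; with set -e, a failed delete fails the step before the echo.
  aws iam delete-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-$(params.cluster-name)"
  echo "Instance profile KarpenterNodeInstanceProfile-$(params.cluster-name) deleted successfully."
else
  echo "Instance profile KarpenterNodeInstanceProfile-$(params.cluster-name) does not exist. Skipping deletion..."
fi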

hakuna-matatah previously approved these changes Sep 18, 2025
echo "Role $PIA_ROLE_NAME does not exist, no action needed."
fi
done
- name: delete-karpenter-role
Contributor

Let's have a separate teardown for ultra clusters to ensure it won't break existing pipelines when there is a bug.
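One way to keep that opt-in (the task and param names below are hypothetical, not from this PR) is to gate the new teardown task behind a when expression, so pipelines that never set the flag are untouched:

    - name: teardown-karpenter-ultra
      when:
        - input: "$(params.cluster-profile)"
          operator: in
          values: ["ultra"]
      taskRef:
        name: awscli-karpenter-teardown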

@hakuna-matatah merged commit f999043 into awslabs:main Sep 18, 2025
4 checks passed