-
Notifications
You must be signed in to change notification settings - Fork 47
Add Tekton tasks to install and scale Karpenter #538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tests/tekton-resources/tasks/generators/karpenter/kubectl-drift.yaml
Outdated
Show resolved
Hide resolved
tests/tekton-resources/tasks/generators/karpenter/kubectl-scale.yaml
Outdated
Show resolved
Hide resolved
tests/tekton-resources/tasks/generators/karpenter/kubectl-scale.yaml
Outdated
Show resolved
Hide resolved
tests/tekton-resources/tasks/generators/karpenter/kubectl-cluster-wait.yaml
Outdated
Show resolved
Hide resolved
tests/tekton-resources/tasks/setup/karpenter/awscli-instanceprofiles.yaml
Outdated
Show resolved
Hide resolved
tests/tekton-resources/tasks/setup/karpenter/awscli-node-role.yaml
Outdated
Show resolved
Hide resolved
tests/tekton-resources/tasks/setup/karpenter/awscli-node-role.yaml
Outdated
Show resolved
Hide resolved
tests/tekton-resources/tasks/setup/karpenter/helm-karpenter-install.yaml
Show resolved
Hide resolved
tests/tekton-resources/tasks/teardown/karpenter/awscli-instanceprofiles.yaml
Outdated
Show resolved
Hide resolved
tests/tekton-resources/tasks/generators/karpenter/kubectl-scale.yaml
Outdated
Show resolved
Hide resolved
1f0c547 to
e7800e3
Compare
| kind: Pipeline | ||
| apiVersion: tekton.dev/v1 | ||
| metadata: | ||
| name: derekff-karpenter-testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use generic naming please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also can we name the pipeline.yaml file to karpenter-titan-pipeline or something that actually says what the file is ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mostly including this as an example file. Where do you actually want the pipeline? Also, we can't put the pipeline in OSS because it has to contain secrets that I've manually removed like the slack hook url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We check in pipelines here - https://github.com/awslabs/kubernetes-iteration-toolkit/tree/main/tests/tekton-resources/pipelines/eks
For secrets and all you don't have to define the param value in the pipeline. In tekton, you have an option of not specifying defaults for a parameter.
tests/assets/karpenter/nodepool.yaml
Outdated
| operator: In | ||
| values: | ||
| - medium | ||
| - large No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need larges for the purposes of karpenter test ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EC2 indicated that they want us to use a variety of instances to prevent ICEing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC its the other way around, when we want to leverage larges, they raised concern about ICEing , hence medium in the mix, if we just go with Medium, they should be fine iiuc. I was just questioning from the cost perspective give the large scale test. We should avoid larges unless tests absolutely demands it. We don't have to be blocked for this PR purposes, but please follow back on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove larges and let you follow up with EC2 on the cost item
| --cluster-name $(params.cluster-name) \ | ||
| --nodegroup-name karpenter-system-large \ | ||
| --node-role arn:aws:iam::$(params.aws-account-id):role/$(params.cluster-name)-node-role \ | ||
| --instance-types r5.24xlarge \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be a a param ? so depending on the scale of the cluster, we can put Karp node proportionally on node type that makes sense ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be, I don't think it needs to be until we have a usecase. There's no harm in using a large instance for now and tuning it later. We'd also have to tune the karpenter install requests which we don't right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already plans to run this at smaller scale as well. But i am ok with punting it for now, if you want to follow it up later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets circle back on that when we decide the scale we want to run it at
| --set settings.preferencePolicy=Ignore \ | ||
| --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::$(params.aws-account-id):role/KarpenterControllerRole-$(params.cluster-name)" \ | ||
| --set controller.resources.requests.cpu=60 \ | ||
| --set controller.resources.requests.memory=200Gi \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resource allocation to controller, should be dependent on the size of the cluster ? can we parameterize these ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can parameterize everything. Lets not until we have a usecase. Its an easy fast follow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do have a use to run it at smaller scale, hence the question. We can punt it for later as mentioned in other comment.
| aws iam delete-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-$(params.cluster-name)" | ||
| echo "Instance profile KarpenterNodeInstanceProfile-$(params.cluster-name) deleted successfully." | ||
| else | ||
| echo "Instance profile KarpenterNodeInstanceProfile-$(params.cluster-name) does not exist. Skipping deletion..." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious if you have tested to see if this would fall into this else when instance profile couldn't be deleted successfully ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It falls into the else when the instance profile doesn't exist, which is the desired state. It errs when the instance profile can't be deleted, which is the desired state
| echo "Role $PIA_ROLE_NAME does not exist, no action needed." | ||
| fi | ||
| done | ||
| - name: delete-karpenter-role |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets have a separate tear down for ultra clusters to ensure it won't break exisitng pipelines when there is a bug.
Issue #, if available:
Description of changes:
This change introduces the tasks necessary to install and leverage Karpenter to scale a cluster. This was tested in a dev cluster
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.