Service Account Token for Image Pulls doesn't work #1280

@jicowan

What happened:
I enabled the KubeletServiceAccountTokenForCredentialProviders feature gate on the kubelet and created the following CredentialProviderConfig:

{
  "apiVersion": "kubelet.config.k8s.io/v1",
  "kind": "CredentialProviderConfig",
  "providers": [
    {
      "name": "ecr-credential-provider",
      "matchImages": [
        "${ACCOUNT}.dkr.ecr.*.amazonaws.com",
        "${ACCOUNT}.dkr.ecr-*.*.amazonaws.com"
      ],
      "defaultCacheDuration": "12h",
      "apiVersion": "credentialprovider.kubelet.k8s.io/v1",
      "args": ["get-credentials"],
      "env": [
        {
          "name": "AWS_REGION",
          "value": "${REGION}"
        }
      ],
      "tokenAttributes": {
        "serviceAccountTokenAudience": "sts.amazonaws.com",
        "cacheType": "Token",
        "requireServiceAccount": false,
        "optionalServiceAccountAnnotationKeys": ["eks.amazonaws.com/role-arn"]
      }
    }
  ]
}

With this configuration, the provider appears to look for the projected service account token for ALL pods, not only for pods that pull images from my private registry. I couldn't find a way to tell the provider to use the EC2 instance profile to pull the images. I believe it should do that when no token is found, i.e. when the pod does not have a service account that is mapped to an IAM role using IRSA. The following appears repeatedly in the kubelet logs:

Oct 14 16:42:44 ip-192-168-40-236.us-west-2.compute.internal kubelet[3037]: E1014 16:42:44.210176    3037 pod_workers.go:1324] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"aws-vpc-cni-init\" with ImagePullBackOff: \"Back-off pulling image \\\"602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.20.1-eksbuild.3\\\": ErrImagePull: failed to pull and unpack image \\\"602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.20.1-eksbuild.3\\\": failed to resolve reference \\\"602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.20.1-eksbuild.3\\\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials\"" pod="kube-system/aws-node-f7sf8" podUID="226e4610-501f-485d-a173-b96d596e5285"
Oct 14 16:42:47 ip-192-168-40-236.us-west-2.compute.internal kubelet[3037]: E1014 16:42:47.645317    3037 kubelet.go:3021] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
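
One thing that may be worth ruling out: the image in the log above is hosted in account 602401143452, while the matchImages entries written by the userdata script further down only name 820537372947. As far as the kubelet documentation describes it, each glob in matchImages covers a single dot-separated host segment. The sketch below is a simplified bash approximation of that rule, not the kubelet's actual matcher: it ignores ports and paths, the function names are made up for illustration, and the `example:latest` repository is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified sketch of per-segment glob matching for matchImages host patterns.
# Assumption: only the registry host is compared; ports and paths are ignored.

# host_matches PATTERN HOST -> returns 0 if every dot-separated segment matches.
host_matches() {
  local pattern=$1 host=$2
  local -a pat_parts host_parts
  IFS='.' read -r -a pat_parts <<< "$pattern"
  IFS='.' read -r -a host_parts <<< "$host"
  # A glob only spans one segment, so the segment counts must be equal.
  (( ${#pat_parts[@]} == ${#host_parts[@]} )) || return 1
  local i
  for i in "${!pat_parts[@]}"; do
    # The unquoted right-hand side makes [[ ]] treat it as a glob pattern.
    [[ ${host_parts[i]} == ${pat_parts[i]} ]] || return 1
  done
  return 0
}

# image_host IMAGE -> prints the registry host part of an image reference.
image_host() {
  printf '%s\n' "${1%%/*}"
}

# check_image IMAGE PATTERN... -> prints "match" or "no match".
check_image() {
  local host pattern
  host=$(image_host "$1")
  shift
  for pattern in "$@"; do
    if host_matches "$pattern" "$host"; then
      echo "match"
      return 0
    fi
  done
  echo "no match"
}

# Patterns copied from the userdata script in this issue.
patterns=(
  "820537372947.dkr.ecr.*.amazonaws.com"
  "820537372947.dkr.ecr-*.*.amazonaws.com"
)

# Image from the kubelet log, then a hypothetical image in the private account.
check_image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.20.1-eksbuild.3" "${patterns[@]}"
check_image "820537372947.dkr.ecr.us-west-2.amazonaws.com/example:latest" "${patterns[@]}"
```

Run as-is, this prints `no match` for the image from the log and `match` for the hypothetical private image. If the real matcher behaves the same way, the kubelet would not call the provider at all for the 602401143452 image, which would be consistent with the "no basic auth credentials" error regardless of how the token fallback works.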

If I try running the ecr-credential-provider manually with a pod that uses IRSA, I am able to get an ECR login.

What you expected to happen:
I expect the provider to fall back to the instance profile of the node if the token is not present.
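
For anyone who wants to compare the two paths by hand, here is a rough sketch of building a request with and without a token and piping it to the plugin. Several things here are assumptions rather than confirmed details: the binary path, the `serviceAccountToken` and `serviceAccountAnnotations` field names (as I understand the v1 CredentialProviderRequest), the `build_request` helper, and the `example:latest` repository. It does not show what the provider actually does when the token is absent.

```shell
#!/usr/bin/env bash
# Sketch: build a CredentialProviderRequest and optionally feed it to the plugin.

# build_request IMAGE [TOKEN] [ROLE_ARN] -> prints the request JSON on stdout.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
build_request() {
  local image=$1 token=${2:-} role_arn=${3:-}
  printf '{"apiVersion":"credentialprovider.kubelet.k8s.io/v1",'
  printf '"kind":"CredentialProviderRequest",'
  printf '"image":"%s"' "$image"
  if [[ -n $token ]]; then
    # Assumed field name for the pod's service account token.
    printf ',"serviceAccountToken":"%s"' "$token"
  fi
  if [[ -n $role_arn ]]; then
    # Assumed field name for the annotations passed through from the config.
    printf ',"serviceAccountAnnotations":{"eks.amazonaws.com/role-arn":"%s"}' "$role_arn"
  fi
  printf '}\n'
}

# Assumed plugin location; adjust to wherever the binary lives on the node.
PROVIDER_BIN=${PROVIDER_BIN:-/etc/eks/image-credential-provider/ecr-credential-provider}

# Request without a token, i.e. the case where a fallback would be needed.
request=$(build_request "820537372947.dkr.ecr.us-west-2.amazonaws.com/example:latest")
echo "$request"

# Only call the plugin when it is actually present and executable.
if [[ -x $PROVIDER_BIN ]]; then
  echo "$request" | AWS_REGION=${AWS_REGION:-us-west-2} "$PROVIDER_BIN" get-credentials
fi
```

Passing a token and role ARN as the second and third arguments to `build_request` produces the IRSA-style request, so the two responses can be compared side by side on the node.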

Anything else we need to know?:
Here's the userdata I'm using:

#!/bin/bash
set -ex

# Disable the automatic nodeadm service since we'll run it manually
systemctl disable nodeadm-config.service || true

# Get cluster details from EC2 tags and EKS API
REGION=$(ec2-metadata --availability-zone | sed 's/placement: \(.*\).$/\1/')
INSTANCE_ID=$(ec2-metadata --instance-id | cut -d " " -f 2)
CLUSTER_NAME=$(aws ec2 describe-tags --region ${REGION} --filters "Name=resource-id,Values=${INSTANCE_ID}" "Name=key,Values=eks:cluster-name" --query 'Tags[0].Value' --output text)

# Get cluster details from EKS
API_SERVER=$(aws eks describe-cluster --region ${REGION} --name ${CLUSTER_NAME} --query 'cluster.endpoint' --output text)
CA_CERT=$(aws eks describe-cluster --region ${REGION} --name ${CLUSTER_NAME} --query 'cluster.certificateAuthority.data' --output text)
CIDR=$(aws eks describe-cluster --region ${REGION} --name ${CLUSTER_NAME} --query 'cluster.kubernetesNetworkConfig.serviceIpv4Cidr' --output text)
DNS_IP=$(echo ${CIDR} | awk -F'.' '{print $1"."$2"."$3".10"}')

# Generate NodeConfig and run nodeadm
cat > /tmp/nodeadm-config.yaml <<EOF
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: ${CLUSTER_NAME}
    apiServerEndpoint: ${API_SERVER}
    certificateAuthority: ${CA_CERT}
    cidr: ${CIDR}
  kubelet:
    config:
      clusterDNS:
      - ${DNS_IP}
      featureGates:
        ContainerCheckpoint: true
        KubeletServiceAccountTokenForCredentialProviders: true
EOF

# Run nodeadm with the generated config
# nodeadm will generate the default credential provider config
/usr/bin/nodeadm init --config-source file:///tmp/nodeadm-config.yaml

# Now overwrite the ECR credential provider config to include tokenAttributes
# This happens AFTER nodeadm has finished initialization
cat > /etc/eks/image-credential-provider/config.json <<EOF
{
  "apiVersion": "kubelet.config.k8s.io/v1",
  "kind": "CredentialProviderConfig",
  "providers": [
    {
      "name": "ecr-credential-provider",
      "matchImages": [
        "820537372947.dkr.ecr.*.amazonaws.com",
        "820537372947.dkr.ecr-*.*.amazonaws.com"
      ],
      "defaultCacheDuration": "12h",
      "apiVersion": "credentialprovider.kubelet.k8s.io/v1",
      "args": ["get-credentials"],
      "env": [
        {
          "name": "AWS_REGION",
          "value": "${REGION}"
        }
      ],
      "tokenAttributes": {
        "serviceAccountTokenAudience": "sts.amazonaws.com",
        "cacheType": "Token",
        "requireServiceAccount": false,
        "optionalServiceAccountAnnotationKeys": ["eks.amazonaws.com/role-arn"]
      }
    }
  ]
}
EOF

# Restart kubelet to pick up the new credential provider config
systemctl restart kubelet

Environment:

  • Kubernetes version (use kubectl version): 1.34
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): EKS Optimized AMI with custom launch template
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

/kind bug

Labels: kind/bug, needs-triage
