From e14dc5cefd7244eac0d1a324014334064f7c42b8 Mon Sep 17 00:00:00 2001
From: ryojsb
Date: Thu, 10 Jul 2025 15:13:57 +0900
Subject: [PATCH 1/3] added new README file and added the line to README.md

Signed-off-by: ryojsb
---
 quickstart/README-step-by-step.md | 346 ++++++++++++++++++++++++++++++
 quickstart/README.md              |   2 +
 2 files changed, 348 insertions(+)
 create mode 100644 quickstart/README-step-by-step.md

diff --git a/quickstart/README-step-by-step.md b/quickstart/README-step-by-step.md
new file mode 100644
index 0000000..37647b0
--- /dev/null
+++ b/quickstart/README-step-by-step.md
@@ -0,0 +1,346 @@
+# llm-d Quick Start - Step by step
+
+Getting Started with llm-d through a step-by-step procedure.
+
+This guide will walk you through the steps to install and deploy llm-d on a Kubernetes cluster, showing where each step can be customized.
+
+## Client Configuration
+
+### Required tools
+
+The following prerequisites are required for the installer to work.
+
+- [Helm – quick-start install](https://helm.sh/docs/intro/install/)
+- [kubectl – install & setup](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
+
+
+### Required credentials and configuration
+
+- [HuggingFace HF_TOKEN](https://huggingface.co/docs/hub/en/security-tokens)
+
+> Depending on which model you use, you may have to visit Hugging Face and
+> accept its usage terms before you can pull it with your HF token.
+
+### Target Platform
+
+Since the llm-d-deployer is based on Helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, this installation procedure will be updated to cover them.
+
+## llm-d Installation
+
+This document walks you through the following four steps to deploy llm-d:
+
+1. Installing GAIE Kubernetes infrastructure
+2. Installing Network stack
+3. Creating HF token secret
+4. Installing llm-d
+
+Before proceeding with the installation, ensure you have completed the prerequisites and are able to issue kubectl commands to your cluster by configuring your `~/.kube/config` file.
+
+### 1. Installing GAIE Kubernetes infrastructure
+
+Apply CRDs for Gateway API.
+
+```bash
+kubectl apply -k https://github.com/llm-d/llm-d-inference-scheduler/deploy/components/crds-gateway-api
+```
+
+Then, apply CRDs for Gateway API Inference Extention.
+
+```bash
+kubectl apply -k https://github.com/llm-d/llm-d-inference-scheduler/deploy/components/crds-gie
+```
+
+### 2. Installing Network stack
+
+Currently, you can choose either [istio](https://istio.io/) or [kgateway](https://kgateway.dev/) as the network stack.
+
+Select the appropriate option for your environment.
+
+#### Installing istio
+
+To beging with, export the environment variables.
+
+Before doing this, please check the appropriate hub and tag from the link below.
+
+https://github.com/llm-d/llm-d-deployer/blob/main/chart-dependencies/istio/install.sh
+
+```bash
+export TAG=1.27-alpha.0551127f00634403cddd4634567e65a8ecc499a7
+export HUB=gcr.io/istio-testing
+```
+
+Then deploy istio-base.
+
+```bash
+helm upgrade -i istio-base oci://$HUB/charts/base --version $TAG -n istio-system --create-namespace
+```
+
+After that, deploy istiod.
+
+```bash
+helm upgrade -i istiod oci://$HUB/charts/istiod --version $TAG -n istio-system --set tag=$TAG --set hub=$HUB --wait
+```
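+
+If you want to confirm that both charts were installed before checking pods, you can list the Helm releases in the namespace (an optional sanity check, not part of the original procedure):
+
+```bash
+# Optional: confirm the istio-base and istiod releases are deployed
+helm list -n istio-system
+```
+
+The resources are created as follows. 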
+
+```bash
+kubectl get pods,svc -n istio-system
+```
+
+```bash
+NAME                         READY   STATUS    RESTARTS   AGE
+pod/istiod-774dfd9b6-xxngd   1/1     Running   0          41s
+
+NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                 AGE
+service/istiod   ClusterIP   [Cluster IP]   <none>        15010/TCP,15012/TCP,443/TCP,15014/TCP   41s
+```
+
+#### Installing kgateway
+
+Apply the kgateway CRDs.
+
+```bash
+helm upgrade -i \
+  --namespace kgateway-system \
+  --create-namespace \
+  --version v2.0.3 \
+  kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
+```
+
+After that, deploy kgateway.
+
+```bash
+helm upgrade -i \
+  --namespace kgateway-system \
+  --create-namespace \
+  --version v2.0.3 \
+  --set inferenceExtension.enabled=true \
+  --set securityContext.allowPrivilegeEscalation=false \
+  --set securityContext.capabilities.drop={ALL} \
+  --set podSecurityContext.seccompProfile.type=RuntimeDefault \
+  --set podSecurityContext.runAsNonRoot=true \
+  kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway
+```
+
+The resources are created as follows. 
+
+```bash
+kubectl get pods,svc -n kgateway-system
+NAME                           READY   STATUS    RESTARTS   AGE
+pod/kgateway-ddbb7668c-8vjz8   1/1     Running   0          114s
+
+NAME               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
+service/kgateway   ClusterIP   [Cluster IP]   <none>        9977/TCP   114s
+```
+
+### 3. Creating HF token secret
+
+Create a namespace to deploy llm-d.
+
+```bash
+export NAMESPACE="llm-d"
+kubectl create ns "${NAMESPACE}"
+```
+
+Then create a secret holding your token, which is used to pull models from Hugging Face.
+
+```bash
+export HF_TOKEN="" # set this to your Hugging Face token
+kubectl create secret generic llm-d-hf-token \
+  --namespace "${NAMESPACE}" \
+  --from-literal=HF_TOKEN="${HF_TOKEN}" \
+  --dry-run=client -o yaml | kubectl apply -n "${NAMESPACE}" -f -
+```
+
+### 4. Installing llm-d
+
+Apply the ModelService CRD.
+
+```bash
+kubectl apply -f https://raw.githubusercontent.com/llm-d/llm-d-deployer/refs/heads/main/charts/llm-d/crds/modelservice-crd.yaml
+```
+
+
+Clone the llm-d-deployer repository and change directory.
+
+```bash
+git clone https://github.com/llm-d/llm-d-deployer.git
+cd llm-d-deployer/charts/llm-d
+```
+
+Resolve the Helm chart's dependencies.
+
+```bash
+helm repo add bitnami https://charts.bitnami.com/bitnami
+helm dependency build .
+```
+
+We have everything we need to deploy llm-d.
+
+Important: The installation command and its options differ depending on the Network Stack selected in step 2.
+
+
+#### with istio
+
+```bash
+helm upgrade -i llm-d . --namespace "${NAMESPACE}" \
+--set gateway.gatewayClassName=istio \
+--set gateway.kGatewayParameters.proxyUID=0 \
+--set ingress.clusterRouterBase="" \
+--set modelservice.metrics.enabled=false \
+--set modelservice.epp.metrics.enabled=false \
+--set modelservice.vllm.metrics.enabled=false \
+--set sampleApplication.enabled=false
+```
+
+If you've already deployed kube-prometheus-stack, you can deploy llm-d with the `modelservice.metrics.enabled=true` option to create ServiceMonitor resources.
+
+```bash
+helm upgrade -i llm-d . --namespace "${NAMESPACE}" \
+--set gateway.gatewayClassName=istio \
+--set gateway.kGatewayParameters.proxyUID=0 \
+--set ingress.clusterRouterBase="" \
+--set modelservice.metrics.enabled=true \
+--set sampleApplication.enabled=false
+```
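+
+Before listing the created resources, you can check the status of the Helm release itself (an optional sanity check; the release name `llm-d` matches the install command above):
+
+```bash
+# Optional: show the status of the llm-d Helm release
+helm status llm-d -n "${NAMESPACE}"
+```
+
+llm-d resources are created as follows.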
+
+```bash
+kubectl get pods,svc,gateway -n llm-d
+```
+
+```bash
+NAME                                                 READY   STATUS    RESTARTS   AGE
+pod/llm-d-inference-gateway-istio-69cbf58fb4-ckzkw   1/1     Running   0          58s
+pod/llm-d-modelservice-574d4f76b8-98qpv              1/1     Running   0          59s
+pod/llm-d-redis-master-5f77dd4bf9-4s5sp              1/1     Running   0          59s
+
+NAME                                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
+service/llm-d-inference-gateway-istio   ClusterIP   [Cluster IP]   <none>        15021/TCP,80/TCP   58s
+service/llm-d-modelservice              ClusterIP   [Cluster IP]   <none>        8443/TCP           59s
+service/llm-d-redis-headless            ClusterIP   None           <none>        8100/TCP           59s
+service/llm-d-redis-master              ClusterIP   [Cluster IP]   <none>        8100/TCP           59s
+
+NAME                                                        CLASS   ADDRESS                                                 PROGRAMMED   AGE
+gateway.gateway.networking.k8s.io/llm-d-inference-gateway   istio   llm-d-inference-gateway-istio.llm-d.svc.cluster.local   True         59s
+```
+
+#### with kgateway
+
+```bash
+helm upgrade -i llm-d . --namespace "${NAMESPACE}" \
+--set gateway.gatewayClassName=kgateway \
+--set gateway.kGatewayParameters.proxyUID=0 \
+--set ingress.clusterRouterBase="" \
+--set modelservice.metrics.enabled=false \
+--set modelservice.epp.metrics.enabled=false \
+--set modelservice.vllm.metrics.enabled=false \
+--set sampleApplication.enabled=false
+```
+
+If you've already deployed kube-prometheus-stack, you can deploy llm-d with the `modelservice.metrics.enabled=true` option to create ServiceMonitor resources.
+
+```bash
+helm upgrade -i llm-d . --namespace "${NAMESPACE}" \
+--set gateway.gatewayClassName=kgateway \
+--set gateway.kGatewayParameters.proxyUID=0 \
+--set ingress.clusterRouterBase="" \
+--set modelservice.metrics.enabled=true \
+--set sampleApplication.enabled=false
+```
+
+llm-d resources are created as follows.
+
+```bash
+kubectl get pods,svc,gateway -n llm-d
+```
+
+```bash
+NAME                                           READY   STATUS    RESTARTS   AGE
+pod/llm-d-inference-gateway-6c5786bf77-rtpgq   1/1     Running   0          102s
+pod/llm-d-modelservice-57d64db5c8-cqfkn        1/1     Running   0          102s
+pod/llm-d-redis-master-5f85898675-wgq4d        1/1     Running   0          102s
+
+NAME                              TYPE           CLUSTER-IP     EXTERNAL-IP         PORT(S)        AGE
+service/llm-d-inference-gateway   LoadBalancer   [Cluster IP]   [LoadBalancer IP]   80:31924/TCP   103s
+service/llm-d-modelservice        ClusterIP      [Cluster IP]   <none>              8443/TCP       103s
+service/llm-d-redis-headless      ClusterIP      None           <none>              8100/TCP       103s
+service/llm-d-redis-master        ClusterIP      [Cluster IP]   <none>              8100/TCP       103s
+
+NAME                                                        CLASS      ADDRESS             PROGRAMMED   AGE
+gateway.gateway.networking.k8s.io/llm-d-inference-gateway   kgateway   [LoadBalancer IP]   True         103s
+```
+
+## Validation
+
+Currently, you deploy an inference service by applying a ModelService resource.
+
+Here is an example ModelService CR.
+
+```yaml
+apiVersion: llm-d.ai/v1alpha1
+kind: ModelService
+metadata:
+  name: meta-llama-llama-3-2-3b-instruct
+  namespace: llm-d
+spec:
+  baseConfigMapRef:
+    name: basic-gpu-with-nixl-and-redis-lookup-preset
+  modelArtifacts:
+    uri: hf://meta-llama/Llama-3.2-3B-Instruct
+  prefill:
+    containers:
+    - args:
+      - --served-model-name
+      - meta-llama/Llama-3.2-3B-Instruct
+      env:
+      - name: HF_TOKEN
+        valueFrom:
+          secretKeyRef:
+            key: HF_TOKEN
+            name: llm-d-hf-token
+      name: vllm
+      resources:
+        limits:
+          nvidia.com/gpu: "1"
+    replicas: 1
+  decode:
+    containers:
+    - args:
+      - --served-model-name
+      - meta-llama/Llama-3.2-3B-Instruct
+      env:
+      - name: HF_TOKEN
+        valueFrom:
+          secretKeyRef:
+            key: HF_TOKEN
+            name: llm-d-hf-token
+      name: vllm
+      resources:
+        limits:
+          nvidia.com/gpu: "1"
+    replicas: 1
+  endpointPicker:
+    containers:
+    - name: epp
+    replicas: 1
+  routing:
+    modelName: meta-llama/Llama-3.2-3B-Instruct
+  decoupleScaling: false
+```
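+
+The manifest is not applied automatically; save it to a file and apply it with kubectl (the filename `modelservice.yaml` is arbitrary):
+
+```bash
+# Save the example ModelService CR above as modelservice.yaml, then apply it
+kubectl apply -f modelservice.yaml
+```
+
+ModelService resources are created. 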
+
+```bash
+kubectl get pods -n llm-d
+```
+```bash
+NAME                                                       READY   STATUS    RESTARTS   AGE
+llm-d-inference-gateway-istio-69cbf58fb4-ckzkw             1/1     Running   0          19m
+llm-d-modelservice-574d4f76b8-98qpv                        1/1     Running   0          19m
+llm-d-redis-master-5f77dd4bf9-4s5sp                        1/1     Running   0          19m
+meta-llama-llama-3-2-3b-instruct-decode-6f5c75fc45-rbndl   2/2     Running   0          32s
+meta-llama-llama-3-2-3b-instruct-epp-6f5556dddd-x99s5      1/1     Running   0          32s
+meta-llama-llama-3-2-3b-instruct-prefill-d85997579-f7mts   1/1     Running   0          32s
+```
\ No newline at end of file
diff --git a/quickstart/README.md b/quickstart/README.md
index 253371f..7aa4663 100644
--- a/quickstart/README.md
+++ b/quickstart/README.md
@@ -2,6 +2,8 @@
 
 Getting Started with llm-d on Kubernetes. For specific instructions on how to install llm-d on minikube, see the [README-minikube.md](README-minikube.md) instructions.
 
+If you want to deploy llm-d and related tools step by step, see the [README-step-by-step.md](README-step-by-step.md) instructions.
+
 For more information on llm-d, see the llm-d git repository [here](https://github.com/llm-d/llm-d) and website [here](https://llm-d.ai).
 
 ## Overview

From 2672c3038ec588a299272cdfbb0a5fdc2f8114ba Mon Sep 17 00:00:00 2001
From: ryojsb
Date: Wed, 16 Jul 2025 01:16:59 +0900
Subject: [PATCH 2/3] modified the document to pass the markdown lint

---
 quickstart/README-step-by-step.md | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/quickstart/README-step-by-step.md b/quickstart/README-step-by-step.md
index 37647b0..9a912a8 100644
--- a/quickstart/README-step-by-step.md
+++ b/quickstart/README-step-by-step.md
@@ -13,7 +13,6 @@ The following prerequisites are required for the installer to work.
 - [Helm – quick-start install](https://helm.sh/docs/intro/install/)
 - [kubectl – install & setup](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
-
 
 ### Required credentials and configuration
 
 - [HuggingFace HF_TOKEN](https://huggingface.co/docs/hub/en/security-tokens)
@@ -58,11 +57,9 @@ Select the appropriate option for your environment.
 #### Installing istio
 
-To beging with, export the environment variables.
-
-Before doing this, please check the appropriate hub and tag from the link below.
+To begin with, export the environment variables.
 
-https://github.com/llm-d/llm-d-deployer/blob/main/chart-dependencies/istio/install.sh
+Before doing this, please check the appropriate hub and tag from [the istio installation script](https://github.com/llm-d/llm-d-deployer/blob/main/chart-dependencies/istio/install.sh).
 
 ```bash
 export TAG=1.27-alpha.0551127f00634403cddd4634567e65a8ecc499a7
 export HUB=gcr.io/istio-testing
@@ -160,7 +157,6 @@ Apply the ModelService CRD.
 kubectl apply -f https://raw.githubusercontent.com/llm-d/llm-d-deployer/refs/heads/main/charts/llm-d/crds/modelservice-crd.yaml
 ```
-
 
 Clone the llm-d-deployer repository and change directory.
 
 ```bash
@@ -179,7 +175,6 @@ We have everything we need to deploy llm-d.
 
 Important: The installation command and its options differ depending on the Network Stack selected in step 2.
 
-
 #### with istio
 
 ```bash
@@ -335,6 +330,7 @@ ModelService resources are created.
 
 ```bash
 kubectl get pods -n llm-d
 ```
+
 ```bash
 NAME                                             READY   STATUS    RESTARTS   AGE

From 9104caba734a9b8ebc0f6d5c60b507ff9fe997f0 Mon Sep 17 00:00:00 2001
From: ryojsb
Date: Wed, 16 Jul 2025 03:21:16 +0000
Subject: [PATCH 3/3] modified typos

---
 quickstart/README-step-by-step.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/quickstart/README-step-by-step.md b/quickstart/README-step-by-step.md
index 9a912a8..eb93184 100644
--- a/quickstart/README-step-by-step.md
+++ b/quickstart/README-step-by-step.md
@@ -43,7 +43,7 @@ Apply CRDs for Gateway API.
 kubectl apply -k https://github.com/llm-d/llm-d-inference-scheduler/deploy/components/crds-gateway-api
 ```
 
-Then, apply CRDs for Gateway API Inference Extention.
+Then, apply CRDs for Gateway API Inference Extension.
 
 ```bash
 kubectl apply -k https://github.com/llm-d/llm-d-inference-scheduler/deploy/components/crds-gie
@@ -78,7 +78,7 @@ After that, deploy istiod.
 helm upgrade -i istiod oci://$HUB/charts/istiod --version $TAG -n istio-system --set tag=$TAG --set hub=$HUB --wait
 ```
 
-The resources are created as follows. 
+The resources are created as follows.
 
 ```bash
 kubectl get pods,svc -n istio-system
@@ -119,7 +119,7 @@ helm upgrade -i \
   kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway
 ```
 
-The resources are created as follows. 
+The resources are created as follows.
 
 ```bash
 kubectl get pods,svc -n kgateway-system
@@ -325,7 +325,7 @@ spec:
     replicas: 1
   routing:
     modelName: meta-llama/Llama-3.2-3B-Instruct
   decoupleScaling: false
 ```
 
-ModelService resources are created. 
+ModelService resources are created.
 
 ```bash
 kubectl get pods -n llm-d