logicalclocks · SirOibaf · Sep 2, 2025 · Aug 28, 2025 · Aug 28, 2025 · Aug 28, 2025
diff --git a/docs/assets/images/guides/mlops/serving/deployment_endpoints.png b/docs/assets/images/guides/mlops/serving/deployment_endpoints.png
diff --git a/docs/user_guides/mlops/serving/index.md b/docs/user_guides/mlops/serving/index.md
@@ -24,6 +24,10 @@ Configure the predictor to batch inference requests, see the [Inference Batcher
 
 Configure the predictor to log inference requests and predictions, see the [Inference Logger Guide](inference-logger.md).
 
+### REST API
+
+Send inference requests to deployed models through the REST API, see the [REST API Guide](rest-api.md).
+
 ### Troubleshooting
 
 Inspect the model server logs to troubleshoot your model deployments, see the [Troubleshooting Guide](troubleshooting.md).

diff --git a/docs/user_guides/mlops/serving/rest-api.md b/docs/user_guides/mlops/serving/rest-api.md
@@ -0,0 +1,89 @@
+# Hopsworks Model Serving REST API
+
+## Introduction
+
+Hopsworks provides model serving capabilities by leveraging [KServe](https://kserve.github.io/website/) as the model serving platform and [Istio](https://istio.io/) as the ingress gateway to the model deployments. 
+
+This guide explains how to send requests to a model deployment through its REST API.
+
+## Base URL
+
+Deployed models are accessible through the Istio ingress gateway. The URL to interact with a model deployment is provided on the model deployment page in the Hopsworks UI. 
+
+The URL follows the format `http://<ISTIO_GATEWAY_IP>/<RESOURCE_PATH>`, where `RESOURCE_PATH` depends on the [model server](https://kserve.github.io/website/docs/intro#supported-model-frameworks) (e.g. vLLM, TensorFlow Serving, SKLearn ModelServer).
+
+<p align="center">
+  <figure>
+    <img  style="max-width: 100%; margin: 0 auto" src="../../../../assets/images/guides/mlops/serving/deployment_endpoints.png" alt="Endpoints">
+    <figcaption>Deployment Endpoints</figcaption>
+  </figure>
+</p>
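As a rough illustration, the URL can be assembled from the gateway address and the model name, assuming a server that speaks the KServe V1 protocol (`/v1/models/<name>:predict`) or the V2 Open Inference Protocol (`/v2/models/<name>/infer`). `predict_url` is a hypothetical helper, not part of any Hopsworks library. Servers such as vLLM expose different paths, so the URL shown in the UI remains the authoritative one.

```python
from urllib.parse import quote


def predict_url(gateway: str, model_name: str, protocol: str = "v1") -> str:
    """Build a KServe inference URL (hypothetical helper, sketch only).

    gateway is the bare Istio gateway IP or hostname, without a scheme.
    V1 protocol: POST /v1/models/<name>:predict
    V2 protocol: POST /v2/models/<name>/infer
    """
    host = gateway.rstrip("/")           # tolerate a trailing slash
    name = quote(model_name, safe="")    # percent-encode unusual characters
    if protocol == "v1":
        return f"http://{host}/v1/models/{name}:predict"
    if protocol == "v2":
        return f"http://{host}/v2/models/{name}/infer"
    raise ValueError(f"unsupported protocol: {protocol!r}")


print(predict_url("10.87.42.108", "fraud"))
# → http://10.87.42.108/v1/models/fraud:predict
```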
+
+
+## Authentication
+
+All requests must include an API key for authentication. You can create an API key by following the [API key guide](../../projects/api_key/create_api_key.md).
+
+Include the key in the `Authorization` header, prefixed with `ApiKey`:
+
+```text
+Authorization: ApiKey <API_KEY_VALUE>
+```
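To keep the key out of scripts and notebooks, it can be read from an environment variable when the header is built. A minimal sketch, assuming the key is exported under `HOPSWORKS_API_KEY`, a hypothetical variable name chosen for this example:

```python
import os


def authorization_header(env_var: str = "HOPSWORKS_API_KEY") -> dict:
    """Build the Authorization header from an API key stored in the environment.

    HOPSWORKS_API_KEY is a hypothetical variable name used for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail fast rather than send a request that is likely to be rejected.
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {"Authorization": f"ApiKey {api_key}"}
```

Keeping the key out of source files avoids leaking it through version control or shared notebooks.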
+
+## Headers
+
+| Header          | Description                                  | Example Value             |
+| --------------- | -------------------------------------------- | ------------------------- |
+| `Host`          | Model's hostname, shown in the Hopsworks UI. | `fraud.test.hopsworks.ai` |
+| `Authorization` | API key for authentication.                  | `ApiKey <API_KEY_VALUE>`  |
+| `Content-Type`  | Request payload type (always JSON).          | `application/json`        |
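The three headers in the table can be assembled in one place so every request carries the same set. A minimal sketch; `build_headers` is a hypothetical helper, and the host and key values would come from the deployment page and your API key:

```python
def build_headers(host: str, api_key: str) -> dict:
    """Assemble the Host, Authorization and Content-Type headers (sketch only)."""
    if not host or not api_key:
        raise ValueError("host and api_key are both required")
    return {
        # The ingress gateway uses this hostname to route to the deployment.
        "Host": host,
        "Authorization": f"ApiKey {api_key}",
        "Content-Type": "application/json",
    }
```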
+
+## Request Format
+
+The request body must be a JSON object containing either an `inputs` or an `instances` field. For details, see the [KServe V1 protocol request format](https://kserve.github.io/website/docs/concepts/architecture/data-plane/v1-protocol#request-format).
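For tabular input, the body is typically a list of rows wrapped in one of those two fields. A minimal sketch, assuming row-oriented input; `make_payload` is a hypothetical helper, and how a given server interprets `inputs` versus `instances` depends on that server:

```python
import json
from typing import Any, Sequence


def make_payload(rows: Sequence[Sequence[Any]], field: str = "instances") -> str:
    """Serialize rows of input values into a JSON request body (sketch only)."""
    if field not in ("instances", "inputs"):
        raise ValueError("field must be 'instances' or 'inputs'")
    # Each row becomes a JSON array; the outer list holds all rows.
    return json.dumps({field: [list(row) for row in rows]})


print(make_payload([[4641025220953719, 4920355418495856]], field="inputs"))
# → {"inputs": [[4641025220953719, 4920355418495856]]}
```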
+
+=== "Python"
+    ```python
+    import requests
+
+    # Request body: one row of input values
+    data = {
+        "inputs": [
+            [
+                4641025220953719,
+                4920355418495856
+            ]
+        ]
+    }
+
+    headers = {
+        "Host": "fraud.test.hopsworks.ai",  # hostname from the Hopsworks UI
+        "Authorization": "ApiKey <API_KEY_VALUE>",  # replace with your API key
+        "Content-Type": "application/json"
+    }
+
+    response = requests.post(
+        "http://10.87.42.108/v1/models/fraud:predict",
+        headers=headers,
+        json=data,
+        timeout=30,  # seconds; avoid hanging on an unreachable gateway
+    )
+    response.raise_for_status()  # raise on HTTP errors such as 401 or 404
+    print(response.json())
+    ```
+=== "Curl"
+    ```bash
+    curl -X POST "http://10.87.42.108/v1/models/fraud:predict" \
+          -H "Host: fraud.test.hopsworks.ai" \
+          -H "Authorization: ApiKey <API_KEY_VALUE>" \
+          -H "Content-Type: application/json" \
+          -d '{
+                "inputs": [
+                  [
+                    4641025220953719,
+                    4920355418495856
+                  ]
+                ]
+              }'
+    ```
+
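If installing the `requests` package is not an option, the same request can be prepared with Python's standard library. The sketch below only builds a `urllib.request.Request` object, so it can be inspected without network access. `build_predict_request` is a hypothetical helper and the key is a placeholder.

```python
import json
import urllib.request


def build_predict_request(url: str, host: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare, but do not send, an inference POST request (sketch only)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # JSON body as bytes
        headers={
            "Host": host,  # overrides the Host derived from the URL
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_predict_request(
    "http://10.87.42.108/v1/models/fraud:predict",
    "fraud.test.hopsworks.ai",
    "<API_KEY_VALUE>",
    {"inputs": [[4641025220953719, 4920355418495856]]},
)
```

Sending it is then a call to `urllib.request.urlopen(req, timeout=30)`, which raises `urllib.error.HTTPError` for error status codes such as 401.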
+## Response
+
+The model server returns its predictions as a JSON object whose structure depends on the model server implementation. For example, servers that follow the KServe V1 protocol return results in a `predictions` field. For details on specific model servers, see the [KServe documentation](https://kserve.github.io/website/docs/intro).
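Because the body format varies, client code usually needs to know which protocol the server speaks. A minimal sketch, assuming either the KServe V1 protocol (`predictions`) or the V2 Open Inference Protocol (`outputs`, one entry per output tensor with its values in `data`). `extract_predictions` is hypothetical, and other servers, such as vLLM, return different structures.

```python
import json
from typing import Any


def extract_predictions(body: str) -> Any:
    """Pull predictions out of a V1 or V2 response body (sketch only)."""
    doc = json.loads(body)
    if "predictions" in doc:  # KServe V1 protocol
        return doc["predictions"]
    if "outputs" in doc:  # V2 protocol: one entry per output tensor
        return [output["data"] for output in doc["outputs"]]
    raise KeyError(f"unrecognized response keys: {sorted(doc)}")


print(extract_predictions('{"predictions": [0]}'))
# → [0]
```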
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -203,6 +203,7 @@ nav:
               - Inference Logger: user_guides/mlops/serving/inference-logger.md
               - Inference Batcher: user_guides/mlops/serving/inference-batcher.md
               - API Protocol: user_guides/mlops/serving/api-protocol.md
+              - REST API: user_guides/mlops/serving/rest-api.md
               - Troubleshooting: user_guides/mlops/serving/troubleshooting.md
               - External Access: user_guides/mlops/serving/external-access.md
           - Vector Database: user_guides/mlops/vector_database/index.md