Merged

Changes from 2 commits
4 changes: 4 additions & 0 deletions docs/user_guides/mlops/serving/index.md
@@ -24,6 +24,10 @@ Configure the predictor to batch inference requests, see the [Inference Batcher

Configure the predictor to log inference requests and predictions, see the [Inference Logger Guide](inference-logger.md).

### REST API

Send inference requests to the deployed models using the REST API, see the [REST API Guide](rest-api.md).

### Troubleshooting

Inspect the model server logs to troubleshoot your model deployments, see the [Troubleshooting Guide](troubleshooting.md).
103 changes: 103 additions & 0 deletions docs/user_guides/mlops/serving/rest-api.md
@@ -0,0 +1,103 @@
# Hopsworks Model Serving REST API

## Introduction

Hopsworks provides model serving using [KServe](https://kserve.github.io/website/) as the deployment framework and [Istio](https://istio.io/) as the ingress gateway to the Kubernetes cluster.

This document explains how to interact with a deployed model endpoint via REST.

## Base URL

The deployed model is accessible through the Istio ingress gateway. The URL to access the model is provided on the deployment page inside the Hopsworks UI.

The URL follows this format:
```text
http://<ISTIO_GATEWAY_IP>/<RESOURCE_PATH>
```

- `<ISTIO_GATEWAY_IP>`: External IP address of the Istio ingress gateway.
- `<RESOURCE_PATH>`: Inference path, which depends on the model server behind the deployment (e.g., vLLM, TensorFlow Serving, or KServe sklearnserver). For example, Python deployments expose `/v1/models/<DEPLOYMENT_NAME>:predict`, where `<DEPLOYMENT_NAME>` is the name of the deployment.

**Contributor:**
> This is not 100% correct. The URL format depends on the model server. For example, `/v1/models/<DEPLOYMENT_NAME>:predict` works for Python deployments, but not for TensorFlow or LLMs. I would suggest something like `http://<ISTIO_GATEWAY_IP>/<RESOURCE_PATH>`, where `RESOURCE_PATH` depends on the model server (e.g., vLLM, TensorFlow Serving, KServe sklearnserver).

**Contributor Author:**
> I updated it based on your suggestion.

<p align="center">
<figure>
<img style="max-width: 100%; margin: 0 auto" src="../../../../assets/images/guides/mlops/serving/deployment_endpoints.png" alt="Endpoints">
<figcaption>Deployment Endpoints</figcaption>
</figure>
</p>
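
For illustration, here is a minimal sketch of how the inference URL is assembled for a Python deployment. The gateway IP and deployment name are the example values used later in this guide; other model servers expose different resource paths.

```python
# Minimal sketch: assemble the inference URL for a Python deployment.
# The gateway IP and deployment name are illustrative values; other model
# servers (e.g., vLLM, TensorFlow Serving) expose different resource paths.
ISTIO_GATEWAY_IP = "10.87.42.108"  # shown on the deployment page in the Hopsworks UI
DEPLOYMENT_NAME = "fraud"

resource_path = f"/v1/models/{DEPLOYMENT_NAME}:predict"  # Python (sklearn-style) deployments
url = f"http://{ISTIO_GATEWAY_IP}{resource_path}"
print(url)  # -> http://10.87.42.108/v1/models/fraud:predict
```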


## Authentication

All requests must include an API key for authentication. You can create an API key by following this [guide](../../projects/api_key/create_api_key.md).

Include the key in the Authorization header:
```text
Authorization: ApiKey <API_KEY_VALUE>
```
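
As a small sketch, the key can be read from an environment variable instead of being hardcoded. The variable name `HOPSWORKS_API_KEY` is an assumption for illustration:

```python
import os

# Assumes the key is exported as HOPSWORKS_API_KEY (illustrative name),
# e.g. `export HOPSWORKS_API_KEY=...` in your shell.
api_key = os.environ["HOPSWORKS_API_KEY"]

# Attach it to every inference request via the Authorization header.
headers = {"Authorization": f"ApiKey {api_key}"}
```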

## Headers

| Header | Description | Example Value |
| --------------- | ------------------------------------------- | ------------------------------------ |
| `Host`          | Model’s hostname, provided in the Hopsworks UI. | `fraud.test.hopsworks.ai`         |
| `Authorization` | API key for authentication. | `ApiKey <your_api_key>` |
| `Content-Type` | Request payload type (always JSON). | `application/json` |

## Request Format

The request payload format depends on the model server. For deployments that use the KServe V1 protocol, the request is sent as a JSON object containing an `inputs` or `instances` field. You can find more information on the request format [here](https://kserve.github.io/website/docs/concepts/architecture/data-plane/v1-protocol#request-format).

=== "Python"
```python
import requests

data = {
"inputs": [
[
4641025220953719,
4920355418495856
]
]
}

headers = {
"Host": "fraud.test.hopsworks.ai",
"Authorization": "ApiKey 8kDOlnRlJU4kiV1Y.RmFNJY3XKAUSqmJZ03kbUbXKMQSHveSBgMIGT84qrM5qXMjLib7hdlfGeg8fBQZp",
"Content-Type": "application/json"
}

response = requests.post(
"http://10.87.42.108/v1/models/fraud:predict",
headers=headers,
json=data
)
print(response.json())

```
=== "Curl"
```bash
curl -X POST "http://10.87.42.108/v1/models/fraud:predict" \
-H "Host: fraud.test.hopsworks.ai" \
-H "Authorization: ApiKey 8kDOlnRlJU4kiV1Y.RmFNJY3XKAUSqmJZ03kbUbXKMQSHveSBgMIGT84qrM5qXMjLib7hdlfGeg8fBQZp" \
-H "Content-Type: application/json" \
-d '{
"inputs": [
[
4641025220953719,
4920355418495856
]
]
}'
```

## Example Response

The response format depends on the model server implementation. For example, scikit-learn and XGBoost deployments return predictions in a JSON object with a `predictions` field, while TensorFlow Serving and vLLM use different response formats. You can find more information in the [KServe documentation](https://kserve.github.io/website/docs/concepts/architecture/data-plane/v1-protocol#response-format).
**Contributor:**
> The responses also depend on the model server implementation :)
> The `{ "predictions": [] }` format applies to sklearn/xgboost deployments, but TensorFlow Serving or vLLM returns a different format than the one specified in the link.

**Contributor Author:**
> Aah yes, I see your point. I removed the example I had and updated the text to say that the response also depends on the model server.
> I could not get a link to their model serving page, so I pointed to the KServe docs to mention that readers can refer there for more information regarding any model server.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -203,6 +203,7 @@ nav:
- Inference Logger: user_guides/mlops/serving/inference-logger.md
- Inference Batcher: user_guides/mlops/serving/inference-batcher.md
- API Protocol: user_guides/mlops/serving/api-protocol.md
- REST API: user_guides/mlops/serving/rest-api.md
- Troubleshooting: user_guides/mlops/serving/troubleshooting.md
- External Access: user_guides/mlops/serving/external-access.md
- Vector Database: user_guides/mlops/vector_database/index.md