
Conversation

@ErikKaum (Member) commented Nov 8, 2025

Issue reported by a user. Calling text classification like so:

        result = client.text_classification(
            text="I love this product! It works great and exceeded my expectations",
            model=model,
            top_k=2,
        )

returns only 1 result despite setting top_k to 2. I think top_k is sent in correctly but the result is truncated to return only the first result.

I notice that a few other tasks had the same pattern so I omitted the [0] as well.

Note that I'm not 100% sure that this is the correct fix, especially since the async client is auto generated. Maybe you would prefer not to edit it directly?

Lemme know 🙌
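
For illustration, here's roughly what's happening from the caller's side (a minimal sketch, assuming the endpoint returns a flat list of label/score dicts; this is not the actual huggingface_hub client code):

# Sketch only: assume the endpoint replies with a flat list of dicts.
response = [
    {"label": "joy", "score": 0.98},
    {"label": "surprise", "score": 0.01},
]

# If the client indexes with [0] (expecting a list of lists), it keeps only
# the first element, which looks like top_k being ignored:
print(response[0])  # {'label': 'joy', 'score': 0.98}

# Without the [0], the caller would get all top_k results:
print(response)     # both dicts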

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hanouticelina (Contributor) left a comment

@ErikKaum Which model were you using?
I tried:

import os

from huggingface_hub import InferenceClient


client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

result = client.text_classification(
    "I like you. I love you",
    model="tabularisai/multilingual-sentiment-analysis",
    top_k=3,
)
print(result)
# [TextClassificationOutputElement(label='Very Positive', score=0.6660197973251343), TextClassificationOutputElement(label='Positive', score=0.23012897372245789), TextClassificationOutputElement(label='Neutral', score=0.061766646802425385)]

The output isn't truncated: the API returns a list of lists, so we use [0] to get the inner list, where each element is a TextClassificationOutputElement.

Same for summarization and translation (we don't expect a list as output for these tasks).
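
For reference, a minimal sketch of the shape the hf-inference route returns for a single input (hand-written example values, not actual API output):

# Sketch only: list of lists, one inner list per input text.
response = [
    [
        {"label": "Very Positive", "score": 0.67},
        {"label": "Positive", "score": 0.23},
        {"label": "Neutral", "score": 0.06},
    ]
]

# Taking [0] unwraps the outer list and still yields all top_k results:
print(response[0])  # all three dicts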

@ErikKaum (Member, Author)

Ah, that's gnarly; in that case it might be that the API serves it in the wrong format 😓

I used this model: emotion-english-distilroberta-base, with the default container in Inference Endpoints.

Did you use TEI or something else to run the model?

@hanouticelina (Contributor)

Did you use TEI or something else to run the model?

No, it's calling https://endpoints.huggingface.co/hf-inference/endpoints/auto-multilingual-sentiment-ana directly, which uses the registry.internal.huggingface.tech/hf-endpoints/inference-pytorch-cpu:api-inference-6.5.0 container (which is the default, I think?).

Btw, even with emotion-english-distilroberta-base, I'm getting the expected output (i.e. a list of TextClassificationOutputElement):

import os

from huggingface_hub import InferenceClient


client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

result = client.text_classification(
    "I like you. I love you",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=3,
)
print(result)

# [TextClassificationOutputElement(label='joy', score=0.9762780666351318), TextClassificationOutputElement(label='sadness', score=0.006413786672055721), TextClassificationOutputElement(label='neutral', score=0.0055558281019330025)]

@ErikKaum (Member, Author)

Okay, this is kinda weird: even with the registry.internal.huggingface.tech/hf-endpoints/inference-pytorch-cpu:api-inference-6.5.0 container, when I deploy it on Inference Endpoints I'm getting the truncated output 🤔

This is what I'm using:

from huggingface_hub import InferenceClient

def main():
    client = InferenceClient(
        token="token"  # placeholder token
    )
    url = "..."  # Inference Endpoint URL (redacted)
    test_text = "I love this product! It works great and exceeded my expectations."

    result = client.text_classification(
        text=test_text,
        model=url,
        top_k=3,
    )

    print(result)

if __name__ == "__main__":
    main()

outputs:

TextClassificationOutputElement(label='joy', score=0.9758541584014893)

Either there's something weird in my client, or the inference endpoint that's served through the API has some added trick to it 🤔

@hanouticelina (Contributor)

@ErikKaum yes indeed, I just managed to reproduce that. Pinging @oOraph in case you have an idea: do we use specific pipelines for the endpoints served through HF Inference?

(TL;DR for @oOraph: with HF Inference, the API returns a list of lists of TextClassificationOutputElement, while Inference Endpoints returns just a flat list of TextClassificationOutputElement. So the same model yields different output shapes.)

@Wauplin (Contributor) commented Nov 13, 2025

(It might be good to have the same repro example in curl / raw requests instead of InferenceClient, to isolate the problem during the investigation,

i.e.

curl -X POST \
  -H "authorization: Bearer $HF_TOKEN" \
  -H "content-type: application/json" \
  -d '{
    "inputs": "I love this product! It works great and exceeded my expectations.",
    "parameters": {
      "top_k": 3
    }
  }' \
  https://router.huggingface.co/hf-inference/models/j-hartmann/emotion-english-distilroberta-base

and the same with an Inference Endpoints URL.)

@hanouticelina (Contributor)

^ Yes, sorry, here is the Python repro I used:

import os

import requests


def query(payload):
    endpoint_url = ...  # Inference Endpoint URL (redacted)
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {os.getenv('HF_TOKEN')}",
        "Content-Type": "application/json",
    }
    response = requests.post(endpoint_url, headers=headers, json=payload)
    return response.json()


output = query({"inputs": "I like you. I love you", "parameters": {}})

print(output)
print(type(output))
print(type(output[0]))

@oOraph (Contributor) commented Nov 14, 2025

I'll dig asap. I did not read everything yet, so I might be totally wrong, but I guess the problem comes from here:
https://github.com/huggingface/huggingface-inference-toolkit/blob/54d2596560ac237b1292972259d97689ef27aecc/src/huggingface_inference_toolkit/handler.py#L134

-> I had to add this at some point for the Hub widgets to work correctly on text-classification. Honestly, I don't remember why anymore and it might be irrelevant now (at the time, the widgets were in the middle of being reworked to use the huggingface.js lib, if I'm not mistaken, so the issue may not be there anymore :))

@oOraph (Contributor) commented Nov 14, 2025

OK so I looked.

  • The HF Endpoints default image (or registry.internal.huggingface.tech/hf-endpoints/inference-pytorch-cpu:api-inference-6.5.0 with the API_INFERENCE_COMPAT=false env var, which is the same) returns the raw transformers.pipeline output.

  • The problem: the type of this output varies and can be a list or a list of lists depending on the input type (single str vs list of str).

  • The Hub widget always expects a list of lists as output, no matter what the input is (imo it's legitimate to always expect the same return type, but both points of view are acceptable).

Hence the output tweak mentioned above in registry.internal.huggingface.tech/hf-endpoints/inference-pytorch-cpu:api-inference-6.5.0 with the API_INFERENCE_COMPAT=true env var, to bridge between the pipeline output and the widget:

https://github.com/huggingface/huggingface-inference-toolkit/blob/54d2596560ac237b1292972259d97689ef27aecc/src/huggingface_inference_toolkit/handler.py#L134
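
A minimal sketch of what that bridge does (an approximation only, not the actual handler code, which lives at the link above):

# Sketch only: approximate the API_INFERENCE_COMPAT output normalization.
def normalize_text_classification(inputs, pipeline_output):
    # Always return a list of lists: one inner list of {label, score} dicts
    # per input, even when "inputs" was a single string.
    if isinstance(inputs, str):
        # Single-string input: transformers returns a flat list of dicts,
        # so wrap it to match the batched (list-of-lists) shape.
        return [pipeline_output]
    return pipeline_output

print(normalize_text_classification("I love this", [{"label": "joy", "score": 0.98}]))
# [[{'label': 'joy', 'score': 0.98}]]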

More details:

Difference between the classical Endpoints output and the "tweaked" HF Inference output:

  1. HF Endpoints:

Returns the raw pipeline output from transformers. But the shape depends on the body:

  • input is a single str: answer is a list of top_k dicts
    Example, as mentioned above by @Wauplin:
$ curl -X POST   -H "authorization: Bearer $HF_TOKEN"   -H "content-type: application/json"   -d '{
    "inputs": "I love this product! It works great and exceeded my expectations.",
    "parameters": {
      "top_k": 5
    }                                                                    
  }'   https://xiwea4wb90ys69b2.us-east-1.aws.endpoints.huggingface.cloud
[{"label":"joy","score":0.9758541584014893},{"label":"surprise","score":0.010024362243711948},{"label":"neutral","score":0.007699983660131693},{"label":"anger","score":0.003294413909316063},{"label":"sadness","score":0.001413077348843217}]
  • input is a list of n str: output is a list of n lists of top_k dicts
$ curl -X POST   -H "authorization: Bearer $HF_TOKEN"   -H "content-type: application/json"   -d '{
    "inputs": ["I love this product! It works great and exceeded my expectations.", "I hate this"],
    "parameters": {
      "top_k": 5
    }                                                                    
  }'   https://xiwea4wb90ys69b2.us-east-1.aws.endpoints.huggingface.cloud
[[{"label":"joy","score":0.9758541584014893},{"label":"surprise","score":0.010024362243711948},{"label":"neutral","score":0.007699983660131693},{"label":"anger","score":0.003294413909316063},{"label":"sadness","score":0.001413077348843217}],[{"label":"anger","score":0.6354441046714783},{"label":"disgust","score":0.2896914482116699},{"label":"sadness","score":0.04677379131317139},{"label":"neutral","score":0.01836824044585228},{"label":"fear","score":0.004434023052453995}]]
  2. Tweak for HF Inference: always return the output formatted like in the second case, no matter what the input is, to make it widget-compatible.

-> I just ran a test to check whether the output tweak is still needed: it is, otherwise we hit the following:

[Screenshot from 2025-11-14 15-26-55]

(Side note: a script to see how the output varies depending on the input:

from transformers import pipeline
model = pipeline(model="j-hartmann/emotion-english-distilroberta-base", task="text-classification")
output = model("I love this", top_k=3)
# [{'label': 'joy', 'score': 0.9845667481422424}, {'label': 'surprise', 'score': 0.004927210509777069}, {'label': 'sadness', 'score': 0.004531434271484613}]
print(output)
output = model(["I love this", "I hate this"], top_k=3)
# [[{'label': 'joy', 'score': 0.9845667481422424}, {'label': 'surprise', 'score': 0.004927210509777069}, {'label': 'sadness', 'score': 0.004531434271484613}], [{'label': 'anger', 'score': 0.6354441046714783}, {'label': 'disgust', 'score': 0.2896914482116699}, {'label': 'sadness', 'score': 0.04677379131317139}]]
print(output)

)

@ErikKaum (Member, Author)

Okay super nice detective work 😄

So to make sure I understood:

  • the HF Python (and JS, for that matter) clients always expect the output to be list[list], no matter the input?
  • in hf-inference there was a tweak to make this work well with the clients
  • when people deploy the model on Inference Endpoints, they don't have this tweak --> they see it as top_k not being respected

Honestly, I personally prefer that the output shape never changes based on the input, so I'd be all for that option here. I think on the Inference Endpoints side we'd just need to make sure that the new hf-serve (successor to the toolkit) uses this output type, and then we're good, I'd say 👍

Does that make sense?
