
Conversation

@ErikKaum (Member) commented Nov 8, 2025

Issue reported by a user. Calling text classification like so:

        result = client.text_classification(
            text="I love this product! It works great and exceeded my expectations",
            model=model,
            top_k=2,
        )

returns only 1 result despite setting top_k to 2. I think top_k is sent in correctly but the result is truncated to return only the first result.

I notice that a few other tasks had the same pattern so I omitted the [0] as well.

Note that I'm not 100% sure that this is the correct fix, especially since the async client is auto generated. Maybe you would prefer not to edit it directly?

Lemme know 🙌
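
For illustration, here's roughly what's happening from the caller's side (a minimal sketch, assuming the endpoint returns a flat list of label/score dicts; this is not the actual huggingface_hub client code):

# Sketch only: assume the endpoint replies with a flat list of dicts.
response = [
    {"label": "joy", "score": 0.98},
    {"label": "surprise", "score": 0.01},
]

# If the client indexes with [0] (expecting a list of lists), it keeps only
# the first element, which looks like top_k being ignored:
print(response[0])  # {'label': 'joy', 'score': 0.98}

# Without the [0], the caller would get all top_k results:
print(response)     # both dicts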

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hanouticelina (Contributor) left a comment

@ErikKaum Which model were you using?
I tried:

import os

from huggingface_hub import InferenceClient


client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

result = client.text_classification(
    "I like you. I love you",
    model="tabularisai/multilingual-sentiment-analysis",
    top_k=3,
)
print(result)
# [TextClassificationOutputElement(label='Very Positive', score=0.6660197973251343), TextClassificationOutputElement(label='Positive', score=0.23012897372245789), TextClassificationOutputElement(label='Neutral', score=0.061766646802425385)]

The output isn't truncated: the API returns a list of lists, so we use [0] to get the inner list, where each element is a TextClassificationOutputElement.

Same for summarization and translation (we don't expect a list as output for these tasks).
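
For reference, a minimal sketch of the shape the hf-inference route returns for a single input (hand-written example values, not actual API output):

# Sketch only: list of lists, one inner list per input text.
response = [
    [
        {"label": "Very Positive", "score": 0.67},
        {"label": "Positive", "score": 0.23},
        {"label": "Neutral", "score": 0.06},
    ]
]

# Taking [0] unwraps the outer list and still yields all top_k results:
print(response[0])  # all three dicts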

@ErikKaum (Member, Author)

Ah, that's gnarly; in that case it might be that the API serves it in the wrong format 😓

I used this model: emotion-english-distilroberta-base, with the default container in Inference Endpoints.

Did you use TEI or something else to run the model?

@hanouticelina (Contributor)

Did you use TEI or something else to run the model?

No, it's calling https://endpoints.huggingface.co/hf-inference/endpoints/auto-multilingual-sentiment-ana directly, which uses the registry.internal.huggingface.tech/hf-endpoints/inference-pytorch-cpu:api-inference-6.5.0 container (which is the default, I think?).

Btw, even with emotion-english-distilroberta-base, I'm getting the expected output (i.e. a list of TextClassificationOutputElement):

import os

from huggingface_hub import InferenceClient


client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

result = client.text_classification(
    "I like you. I love you",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=3,
)
print(result)

# [TextClassificationOutputElement(label='joy', score=0.9762780666351318), TextClassificationOutputElement(label='sadness', score=0.006413786672055721), TextClassificationOutputElement(label='neutral', score=0.0055558281019330025)]

@ErikKaum (Member, Author)

Okay, this is kinda weird: even with the registry.internal.huggingface.tech/hf-endpoints/inference-pytorch-cpu:api-inference-6.5.0 container, when I deploy it on Inference Endpoints I'm getting the truncated output 🤔

This is what I'm using:

from huggingface_hub import InferenceClient

def main():
    client = InferenceClient(
        token="token"  # placeholder token
    )
    url = "..."  # Inference Endpoint URL (redacted)
    test_text = "I love this product! It works great and exceeded my expectations."

    result = client.text_classification(
        text=test_text,
        model=url,
        top_k=3,
    )

    print(result)

if __name__ == "__main__":
    main()

outputs:

TextClassificationOutputElement(label='joy', score=0.9758541584014893)

Either there's something weird in my client, or the inference endpoint that's served through the API has some added trick to it 🤔

@hanouticelina (Contributor)

@ErikKaum yes indeed, I just managed to reproduce that. Pinging @oOraph in case you have an idea: do we use specific pipelines for the endpoints served through HF Inference?

(TL;DR for @oOraph: with HF Inference, the API returns a list of lists of TextClassificationOutputElement, while Inference Endpoints returns just a flat list of TextClassificationOutputElement. So the same model yields different output shapes.)

@Wauplin (Contributor) commented Nov 13, 2025

(It might be good to have the same repro example in curl / raw requests instead of InferenceClient, to isolate the problem during the investigation,

i.e.

curl -X POST \
  -H "authorization: Bearer $HF_TOKEN" \
  -H "content-type: application/json" \
  -d '{
    "inputs": "I love this product! It works great and exceeded my expectations.",
    "parameters": {
      "top_k": 3
    }
  }' \
  https://router.huggingface.co/hf-inference/models/j-hartmann/emotion-english-distilroberta-base

and the same with an Inference Endpoints URL.)

@hanouticelina (Contributor)

^ Yes, sorry, here is the Python repro I used:

import os

import requests


def query(payload):
    endpoint_url = ...  # Inference Endpoint URL (redacted)
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {os.getenv('HF_TOKEN')}",
        "Content-Type": "application/json",
    }
    response = requests.post(endpoint_url, headers=headers, json=payload)
    return response.json()


output = query({"inputs": "I like you. I love you", "parameters": {}})

print(output)
print(type(output))
print(type(output[0]))

@oOraph (Contributor) commented Nov 14, 2025

I'll dig asap. I did not read everything yet, so I might be totally wrong, but I guess the problem comes from here:
https://github.com/huggingface/huggingface-inference-toolkit/blob/54d2596560ac237b1292972259d97689ef27aecc/src/huggingface_inference_toolkit/handler.py#L134

-> I had to add this at some point for the Hub widgets to work correctly on text-classification. Honestly, I don't remember why anymore and it might be irrelevant now (at the time, the widgets were in the middle of being reworked to use the huggingface.js lib, if I'm not mistaken, so the issue may not be there anymore :))

@oOraph (Contributor) commented Nov 14, 2025

OK so I looked.

  • The HF Endpoints default image (or registry.internal.huggingface.tech/hf-endpoints/inference-pytorch-cpu:api-inference-6.5.0 with the API_INFERENCE_COMPAT=false env var, which is the same) returns the raw transformers.pipeline output.

  • The problem: the type of this output varies and can be a list or a list of lists depending on the input type (single str vs list of str).

  • The Hub widget always expects a list of lists as output, no matter what the input is (imo it's legitimate to always expect the same return type, but both points of view are acceptable).

Hence the output tweak mentioned above in registry.internal.huggingface.tech/hf-endpoints/inference-pytorch-cpu:api-inference-6.5.0 with the API_INFERENCE_COMPAT=true env var, to bridge between the pipeline output and the widget:

https://github.com/huggingface/huggingface-inference-toolkit/blob/54d2596560ac237b1292972259d97689ef27aecc/src/huggingface_inference_toolkit/handler.py#L134
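
A minimal sketch of what that bridge does (an approximation only, not the actual handler code, which lives at the link above):

# Sketch only: approximate the API_INFERENCE_COMPAT output normalization.
def normalize_text_classification(inputs, pipeline_output):
    # Always return a list of lists: one inner list of {label, score} dicts
    # per input, even when "inputs" was a single string.
    if isinstance(inputs, str):
        # Single-string input: transformers returns a flat list of dicts,
        # so wrap it to match the batched (list-of-lists) shape.
        return [pipeline_output]
    return pipeline_output

print(normalize_text_classification("I love this", [{"label": "joy", "score": 0.98}]))
# [[{'label': 'joy', 'score': 0.98}]]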

More details:

Difference between the classical Endpoints output and the "tweaked" HF Inference output:

  1. HF Endpoints:

Returns the raw pipeline output from transformers. But the shape depends on the body:

  • input is a single str: answer is a list of top_k dicts
    Example, as mentioned above by @Wauplin:
$ curl -X POST   -H "authorization: Bearer $HF_TOKEN"   -H "content-type: application/json"   -d '{
    "inputs": "I love this product! It works great and exceeded my expectations.",
    "parameters": {
      "top_k": 5
    }                                                                    
  }'   https://xiwea4wb90ys69b2.us-east-1.aws.endpoints.huggingface.cloud
[{"label":"joy","score":0.9758541584014893},{"label":"surprise","score":0.010024362243711948},{"label":"neutral","score":0.007699983660131693},{"label":"anger","score":0.003294413909316063},{"label":"sadness","score":0.001413077348843217}]
  • input is a list of n str: output is a list of n lists of top_k dicts
$ curl -X POST   -H "authorization: Bearer $HF_TOKEN"   -H "content-type: application/json"   -d '{
    "inputs": ["I love this product! It works great and exceeded my expectations.", "I hate this"],
    "parameters": {
      "top_k": 5
    }                                                                    
  }'   https://xiwea4wb90ys69b2.us-east-1.aws.endpoints.huggingface.cloud
[[{"label":"joy","score":0.9758541584014893},{"label":"surprise","score":0.010024362243711948},{"label":"neutral","score":0.007699983660131693},{"label":"anger","score":0.003294413909316063},{"label":"sadness","score":0.001413077348843217}],[{"label":"anger","score":0.6354441046714783},{"label":"disgust","score":0.2896914482116699},{"label":"sadness","score":0.04677379131317139},{"label":"neutral","score":0.01836824044585228},{"label":"fear","score":0.004434023052453995}]]
  2. Tweak for HF Inference: always return the output formatted like in the second case, no matter what the input is, to make it widget-compatible.

-> I just ran a test to check whether the output tweak is still needed: it is, otherwise we hit the following:

[Screenshot from 2025-11-14 15-26-55]

(Side note: a script to see how the output varies depending on the input:

from transformers import pipeline
model = pipeline(model="j-hartmann/emotion-english-distilroberta-base", task="text-classification")
output = model("I love this", top_k=3)
# [{'label': 'joy', 'score': 0.9845667481422424}, {'label': 'surprise', 'score': 0.004927210509777069}, {'label': 'sadness', 'score': 0.004531434271484613}]
print(output)
output = model(["I love this", "I hate this"], top_k=3)
# [[{'label': 'joy', 'score': 0.9845667481422424}, {'label': 'surprise', 'score': 0.004927210509777069}, {'label': 'sadness', 'score': 0.004531434271484613}], [{'label': 'anger', 'score': 0.6354441046714783}, {'label': 'disgust', 'score': 0.2896914482116699}, {'label': 'sadness', 'score': 0.04677379131317139}]]
print(output)

)

@ErikKaum (Member, Author)

Okay super nice detective work 😄

So to make sure I understood:

  • the HF Python (and JS, for that matter) clients always expect the output to be list[list], no matter the input?
  • in hf-inference there was a tweak to make this work well with the clients
  • when people deploy the model on Inference Endpoints, they don't have this tweak --> they see it as top_k not being respected

Honestly, I personally prefer that the output shape never changes based on the input, so I'd be all for that option here. I think on the Inference Endpoints side we'd just need to make sure that the new hf-serve (successor to the toolkit) uses this output type, and then we're good, I'd say 👍

Does that make sense?
