
[Bug]: In 0.11.0rc0, using guided parameters in an OpenAI-format request to the inference service crashes vLLM #4006

@Chyokoei

Description


Your current environment

vllm-ascend version: 0.11.0rc0
Model: qwq-32b-w8a8 (tested; other models show the same problem)
Hardware: Atlas 800I A2 inference server

🐛 Describe the bug

After inference completes, sending a request that uses the extra_body parameter from the "Structured Outputs" guide causes the vLLM server to raise an error, after which the service goes down.
```python
from enum import Enum

from openai import OpenAI
from pydantic import BaseModel


class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


client = OpenAI(
    base_url="http://172.0.163.2:8011/v1",
    api_key="-",
)
json_schema = CarDescription.model_json_schema()

completion = client.chat.completions.create(
    model="qwq-32b-w8a8",
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    extra_body={"structured_outputs": {"json": json_schema}},
)
print(completion.choices[0].message.content)
```
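For reference, a minimal sketch of the two `extra_body` payload shapes involved (assumptions: the flat `guided_json` key is the older spelling of this parameter in earlier vLLM releases, and the JSON schema below is hand-written to mirror the pydantic model above, so field details may differ from `model_json_schema()` output):

```python
# Hand-written JSON schema matching the CarDescription model above
# (assumption: simplified relative to pydantic's generated schema).
json_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "model": {"type": "string"},
        "car_type": {"type": "string", "enum": ["sedan", "SUV", "Truck", "Coupe"]},
    },
    "required": ["brand", "model", "car_type"],
}

# Newer nested form, as used in the crashing request above.
new_style = {"structured_outputs": {"json": json_schema}}

# Older flat form (assumption: pre-0.11 releases accepted this key).
old_style = {"guided_json": json_schema}

print(sorted(new_style["structured_outputs"]["json"]["required"]))
# → ['brand', 'car_type', 'model']
```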

With the 0.10.2rc1 image, the script above runs normally: the server reports no error and returns a correct response. With 0.11.0rc0, the service crashes outright.
The error log is attached:

结构化输出报错.txt
