Description
Your current environment
vllm-ascend version: 0.11.0rc0
Model: qwq-32b-w8a8 (tested; other models show the same problem)
Hardware: Atlas 800I A2 inference server
🐛 Describe the bug
After inference completes, sending a request with the extra_body parameter from the "Structured Outputs" guide causes the vLLM server to raise an error, after which the service goes down.
```python
from enum import Enum

from openai import OpenAI
from pydantic import BaseModel


class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


client = OpenAI(
    base_url="http://172.0.163.2:8011/v1",
    api_key="-",
)

json_schema = CarDescription.model_json_schema()

completion = client.chat.completions.create(
    model="qwq-32b-w8a8",
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    extra_body={"structured_outputs": {"json": json_schema}},
)
print(completion.choices[0].message.content)
```
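For reference, a stdlib-only sketch of what the structured-output contract looks like from the client side: a simplified hand-written schema (the real one comes from `model_json_schema()`) and a sanity check of a hypothetical server response against it. The schema literal and `response_text` below are illustrative assumptions, not actual server output.

```python
import json

# Simplified stand-in for the schema pydantic emits for CarDescription
# (hypothetical, for illustration only)
json_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "model": {"type": "string"},
        "car_type": {"enum": ["sedan", "SUV", "Truck", "Coupe"]},
    },
    "required": ["brand", "model", "car_type"],
}

# Hypothetical structured-output response body from the server
response_text = '{"brand": "Toyota", "model": "Supra", "car_type": "Coupe"}'
data = json.loads(response_text)

# Check required keys are present and the enum value is allowed
assert set(json_schema["required"]) <= set(data)
assert data["car_type"] in json_schema["properties"]["car_type"]["enum"]
print("response matches schema constraints")
```

When structured outputs work, the server is expected to constrain decoding so every response parses and validates this way; the crash reported here happens before any response is returned.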
Running the script above against the 0.10.2rc1 image works fine: the server reports no errors and returns a correct result. With 0.11.0rc0 the service crashes outright.
The error output is: