Description
Your current environment
vllm-ascend version: 0.11.0rc0
Model: qwq-32b-w8a8 (tested; other models show the same problem)
Hardware: Atlas 800I A2 inference server
🐛 Describe the bug
After inference completes, sending a request with the extra_body parameter from the "Structured Outputs" guide causes the vLLM server to raise an error, after which the service goes down.
```python
from enum import Enum

from openai import OpenAI
from pydantic import BaseModel


class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


client = OpenAI(
    base_url="http://172.0.163.2:8011/v1",
    api_key="-",
)

json_schema = CarDescription.model_json_schema()

completion = client.chat.completions.create(
    model="qwq-32b-w8a8",
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    extra_body={"structured_outputs": {"json": json_schema}},
)
print(completion.choices[0].message.content)
```
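For reference, a stdlib-only sketch of what the structured-output contract looks like from the client side: a simplified hand-written schema (the real one comes from `model_json_schema()`) and a sanity check of a hypothetical server response against it. The schema literal and `response_text` below are illustrative assumptions, not actual server output.

```python
import json

# Simplified stand-in for the schema pydantic emits for CarDescription
# (hypothetical, for illustration only)
json_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "model": {"type": "string"},
        "car_type": {"enum": ["sedan", "SUV", "Truck", "Coupe"]},
    },
    "required": ["brand", "model", "car_type"],
}

# Hypothetical structured-output response body from the server
response_text = '{"brand": "Toyota", "model": "Supra", "car_type": "Coupe"}'
data = json.loads(response_text)

# Check required keys are present and the enum value is allowed
assert set(json_schema["required"]) <= set(data)
assert data["car_type"] in json_schema["properties"]["car_type"]["enum"]
print("response matches schema constraints")
```

When structured outputs work, the server is expected to constrain decoding so every response parses and validates this way; the crash reported here happens before any response is returned.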
Running the script above against the 0.10.2rc1 image works fine: the server reports no errors and returns a correct result. With 0.11.0rc0 the service crashes outright.
The error output is: