[BUG] PagedAttentionPrefill: non-contiguous stride on the KV cache head_dim dimension causes an error in the underlying operator #1148

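## Problem Description

The operator underlying `PagedAttentionPrefill` requires the last dimension (head_dim) of the KV cache to have a stride of 1. The current implementation does not check the contiguity of the incoming `k_cache` / `v_cache`, so a non-contiguous tensor is passed straight to the underlying operator and triggers a `Bad Tensor Strides` error. The problem is general and is not limited to a specific backend.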

## Steps to Reproduce

```bash
python test/infinicore/ops/paged_attention_prefill.py --metax
```

Result: 56 of 60 test cases failed.

## Error Message

```
Bad Tensor Strides
k_cache_desc->stride(3) == 1
```

The incoming tensor has strides like `(4096, 1, 512, 4)`: the last dimension has a stride of 4, which violates the `stride(3) == 1` requirement.
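As an illustration of how such strides can arise, the sketch below computes row-major element strides in plain Python and applies a dimension permutation. The base shape `(2, 8, 128, 4)` and the permutation order `(0, 3, 1, 2)` are assumptions chosen only because they reproduce the reported strides; the issue does not state the actual layout or how the view was created.

```python
def row_major_strides(shape):
    """Element strides of a dense row-major tensor with the given shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def permute(values, order):
    """Reorder a per-dimension tuple the way a permuted view reorders its dims."""
    return tuple(values[i] for i in order)


# Hypothetical dense base layout (not taken from the issue).
base_shape = (2, 8, 128, 4)
base_strides = row_major_strides(base_shape)

# A permuted view shares storage with the base, so only the strides are reordered.
order = (0, 3, 1, 2)
view_shape = permute(base_shape, order)
view_strides = permute(base_strides, order)

# The check the underlying operator enforces on the last dimension.
last_dim_is_dense = view_strides[-1] == 1
print(view_shape, view_strides, last_dim_is_dense)
```

Under these assumptions the view's last dimension steps over 4 elements at a time, so the `stride(3) == 1` check fails even though no data has been corrupted.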

## Suggested Fix

Add a contiguity guard to the Python wrapper in `python/infinicore/ops/paged_attention_prefill.py`: when the last dimension of `k_cache` / `v_cache` is not contiguous, automatically call `.contiguous()` to convert it.

A standard vLLM KV cache view already has a head-dim stride of 1, so the normal vLLM path does not incur an extra copy.
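A minimal sketch of such a guard follows. It assumes the tensor object exposes a PyTorch-style `stride()` returning a tuple of element strides and a `.contiguous()` method; whether the infinicore tensor has exactly this interface is an assumption. `ensure_dense_last_dim` and `StubTensor` are hypothetical names, and the stub only models layout so the example runs without any tensor library.

```python
def row_major_strides(shape):
    """Element strides of a dense row-major tensor with the given shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


class StubTensor:
    """Hypothetical stand-in exposing only shape, stride() and contiguous()."""

    def __init__(self, shape, strides):
        self.shape = tuple(shape)
        self._strides = tuple(strides)

    def stride(self):
        return self._strides

    def contiguous(self):
        # A real tensor would copy its data here; the stub only models the layout.
        return StubTensor(self.shape, row_major_strides(self.shape))


def ensure_dense_last_dim(tensor):
    """Return the tensor as-is when its last-dim stride is 1, else a contiguous copy."""
    if tensor.stride()[-1] == 1:
        return tensor  # common path: no copy
    return tensor.contiguous()


# Layout from the error report (shape is an assumed example).
bad = StubTensor((2, 4, 8, 128), (4096, 1, 512, 4))
fixed = ensure_dense_last_dim(bad)

# Already dense input passes through untouched.
good = StubTensor((2, 4, 8, 128), row_major_strides((2, 4, 8, 128)))
untouched = ensure_dense_last_dim(good)

print(fixed.stride(), untouched is good)
```

In the wrapper, the guard would be applied to both `k_cache` and `v_cache` before the call into the underlying operator. Checking only the last-dim stride, rather than full contiguity, keeps strided views that are already acceptable to the operator from being copied.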
