Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
kt-kernel 0.4.1, Ubuntu 24.04.
This server previously ran ktransformers 0.3.2 successfully.
I created a new conda env to run kt-kernel 0.4.1.
In the new env, converting CPU weights for Qwen-30B and running it with kt-kernel 0.4.1 worked.
But converting DeepSeek R1 0528 to CPU weights fails.
Reproduction
python scripts/convert_cpu_weights.py \
--input-path /path/to/model \
--input-type bf16 \
--output /path/to/output \
--quant-method int4
When it reaches layer 55, the script crashes and prints:
Processing layer 55 (53/59)...
Converting layer 55 with 256 experts via online quantization...
Loaded weights shapes:
gate_proj: torch.Size([256, 2048, 7168])
up_proj: torch.Size([256, 2048, 7168])
down_proj: torch.Size([256, 7168, 2048])
TP MOE layer 55, pool: 0x4019aca0, expert num: 256, num_experts_per_tok: 8
Creating AMX_MOE_TP 1 at numa 0
Creating AMX_MOE_TP 0 at numa 0
Creating "/opt/ai-models/r1/DeepSeek-R1-0528-CPU/_layer_55/_numa_1"Creating
"/opt/ai-models/r1/DeepSeek-R1-0528-CPU/_layer_55/_numa_0"
alloc 1 from other numa for 7160d0052660
From BF16
段错误 (核心已转储)  [Segmentation fault (core dumped)]
The error message is very sparse. Does this script have any logging/verbosity settings?
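I don't know of a verbosity flag in convert_cpu_weights.py itself, so as a generic diagnostic I tried to at least capture the Python-level stack at the moment of the native segfault using the standard faulthandler module. A hypothetical wrapper (arguments copied from the reproduction above):

import faulthandler
import runpy
import sys

# Print the Python traceback if the native (C++/AMX) code raises SIGSEGV.
faulthandler.enable()

# Re-run the conversion script with the same arguments as in the reproduction.
sys.argv = [
    "convert_cpu_weights.py",
    "--input-path", "/path/to/model",
    "--input-type", "bf16",
    "--output", "/path/to/output",
    "--quant-method", "int4",
]
runpy.run_path("scripts/convert_cpu_weights.py", run_name="__main__")

This only shows which Python call was active when the crash happened; a full native backtrace would still need a core dump or gdb.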
Others
I watched the memory usage keep growing during conversion, so the crash looks like an OOM. This server has 768 GB of RAM; is that enough?
How much memory is needed to convert DeepSeek R1 671B?
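Some back-of-envelope arithmetic from the shapes printed in the log above, assuming BF16 (2 bytes per element) and roughly 58 MoE layers; it ignores the converter's quantization buffers and any per-NUMA copies, so it is only a lower bound:

# Per-layer expert tensors, shapes taken from the log above (BF16 = 2 bytes).
experts = 256
gate = experts * 2048 * 7168      # gate_proj elements
up   = experts * 2048 * 7168      # up_proj elements
down = experts * 7168 * 2048      # down_proj elements

bytes_per_layer = (gate + up + down) * 2
print(f"BF16 expert weights per MoE layer: {bytes_per_layer / 1e9:.1f} GB")   # ~22.5 GB

moe_layers = 58  # assumed number of MoE layers in DeepSeek R1 671B
print(f"All expert weights in BF16: {bytes_per_layer * moe_layers / 1e12:.2f} TB")  # ~1.3 TB

If that arithmetic is right, one layer's BF16 experts are only about 22 GB, so 768 GB looks sufficient for per-layer processing; the steadily growing usage would then point at buffers accumulating across layers rather than a single layer being too big for the machine.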