Conversation

@franklevasseur
Member
I ran a small benchmark on a 5.09 MB file (1.02 M tokens), and this is what I got:

┌───────────────────┬──────┬──────┬─────────┐
│ (index)           │ slow │ fast │ speedup │
├───────────────────┼──────┼──────┼─────────┤
│ first 100 tokens  │ 1349 │ 814  │ '1.66'  │
│ first 1k tokens   │ 1547 │ 892  │ '1.73'  │
│ last 100 tokens   │ 1432 │ 819  │ '1.75'  │
│ last 1k tokens    │ 1417 │ 824  │ '1.72'  │
│ middle 100 tokens │ 1385 │ 820  │ '1.69'  │
│ middle 1k tokens  │ 1399 │ 810  │ '1.73'  │
│ edge 100 tokens   │ 1389 │ 826  │ '1.68'  │
│ edge 1k tokens    │ 1343 │ 825  │ '1.63'  │
│ random 100 tokens │ 1496 │ 847  │ '1.77'  │
│ random 1k tokens  │ 1392 │ 874  │ '1.59'  │
└───────────────────┴──────┴──────┴─────────┘
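The speedup column is the slow/fast ratio formatted to two decimals (e.g. 1349 / 814 ≈ 1.66), which is why console.table prints it quoted as a string. A minimal sketch of how a table like this could be assembled (the timings below are copied from two rows above; the actual harness is not shown in this PR):

```typescript
// Two rows of the benchmark above; the real harness producing them is not shown here.
type Timing = { slow: number; fast: number };

const results: Record<string, Timing> = {
  "first 100 tokens": { slow: 1349, fast: 814 },
  "random 1k tokens": { slow: 1392, fast: 874 },
};

// speedup = slow / fast, formatted to two decimals; toFixed returns a string,
// which console.table renders in quotes.
const withSpeedup = Object.fromEntries(
  Object.entries(results).map(([name, { slow, fast }]) => [
    name,
    { slow, fast, speedup: (slow / fast).toFixed(2) },
  ])
);

console.table(withSpeedup);
```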

I could have offered a simpler API, like:

type TokenSlice = {
  preserve: "start" | "end" | "both",
  tokenCount: number
}

But I think I prefer letting LLMz decide how to slice. If the slicing algorithm changes in LLMz, the thicktoken lib can remain unchanged.
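For comparison, here is a sketch of what that simpler API could have looked like over a plain token array. The function name, generic token representation, and "both" budget-splitting rule are all assumptions for illustration, not the actual thicktoken interface:

```typescript
type TokenSlice = {
  preserve: "start" | "end" | "both";
  tokenCount: number;
};

// Hypothetical helper: keep `tokenCount` tokens from the requested side(s).
// For "both", the budget is split between the head and the tail
// (head gets the extra token when tokenCount is odd).
function sliceTokens<T>(tokens: T[], slice: TokenSlice): T[] {
  const n = Math.min(slice.tokenCount, tokens.length);
  switch (slice.preserve) {
    case "start":
      return tokens.slice(0, n);
    case "end":
      return tokens.slice(tokens.length - n);
    case "both": {
      const head = Math.ceil(n / 2);
      const tail = n - head;
      return [...tokens.slice(0, head), ...tokens.slice(tokens.length - tail)];
    }
  }
}
```

The drawback of baking this in is that every new truncation strategy would mean a lib change, whereas exposing lower-level slicing keeps the policy in LLMz.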

@franklevasseur franklevasseur requested a review from a team as a code owner January 27, 2026 16:11