Conversation

@LoganDark

This one is faster

Benchmark 12700 tokens...
Encode 0.361 MB/s
Decode 12700000.0 MB/s
Encode 2.292 MB/s
Decode 12700000.0 MB/s
Encode 4.277 MB/s
Decode 12700000.0 MB/s
Unit test...
All OK

Benchmark 317500 tokens...
Encode 0.359 MB/s
Decode 17.167 MB/s
Encode 2.143 MB/s
Decode 17.687 MB/s
Encode 3.477 MB/s
Decode 26.242 MB/s
Unit test...
All OK

Not bad for Python, I guess.
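For reference, a minimal sketch of how throughput figures like these could be measured; the `tokenizer` object and its `encode`/`decode` methods here are hypothetical stand-ins, not this PR's actual API. Note that when a run completes in a time near the timer's resolution, the computed rate blows up, which may explain implausible figures like the 12700000.0 MB/s decodes in the 12700-token run.

```python
import time

def benchmark(tokenizer, text: str) -> None:
    """Measure encode/decode throughput in MB/s over a single run."""
    mb = len(text.encode('utf-8')) / 1e6

    start = time.perf_counter()
    tokens = tokenizer.encode(text)
    print(f'Encode {mb / (time.perf_counter() - start):.3f} MB/s')

    start = time.perf_counter()
    tokenizer.decode(tokens)
    # If decode finishes in a time close to the clock's resolution,
    # this division yields huge, meaningless rates.
    print(f'Decode {mb / (time.perf_counter() - start):.3f} MB/s')
```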

LoganDark added 2 commits June 4, 2023 16:01
This one is faster
Sacrifice a little runtime performance (~10%) for much faster loading (~50%).
LoganDark force-pushed the fast-tokenizer branch 2 times, most recently from 20f0c2b to c0cce90 on June 5, 2023 at 00:57
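The second commit message describes a classic load-time/runtime trade-off: do less preprocessing when the vocabulary is loaded and pay a small cost later. A hypothetical illustration of that idea, not the PR's actual code, is deferring a reverse lookup table until it is first needed:

```python
class LazyVocab:
    """Hypothetical sketch: skip heavy preprocessing at load time."""

    def __init__(self, pairs):
        # Loading stores only the raw token_id -> bytes mapping,
        # keeping startup fast.
        self._tokens = dict(pairs)
        self._reverse = None  # built lazily on first use

    def id_for(self, b: bytes) -> int:
        # The first lookup pays to build the reverse index; later lookups
        # are plain dict hits. Overall runtime is slightly slower than
        # precomputing everything up front, but loading is much faster.
        if self._reverse is None:
            self._reverse = {v: k for k, v in self._tokens.items()}
        return self._reverse[b]
```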