Conversation

@LoganDark

This one is faster

Benchmark 12700 tokens...
Encode 0.361 MB/s
Decode 12700000.0 MB/s
Encode 2.292 MB/s
Decode 12700000.0 MB/s
Encode 4.277 MB/s
Decode 12700000.0 MB/s
Unit test...
All OK

Benchmark 317500 tokens...
Encode 0.359 MB/s
Decode 17.167 MB/s
Encode 2.143 MB/s
Decode 17.687 MB/s
Encode 3.477 MB/s
Decode 26.242 MB/s
Unit test...
All OK

Not bad for Python, I guess.
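For reference, a minimal sketch of how throughput figures like these could be measured; the `tokenizer` object and its `encode`/`decode` methods here are hypothetical stand-ins, not this PR's actual API. Note that when a run completes in a time near the timer's resolution, the computed rate blows up, which may explain implausible figures like the 12700000.0 MB/s decodes in the 12700-token run.

```python
import time

def benchmark(tokenizer, text: str) -> None:
    """Measure encode/decode throughput in MB/s over a single run."""
    mb = len(text.encode('utf-8')) / 1e6

    start = time.perf_counter()
    tokens = tokenizer.encode(text)
    print(f'Encode {mb / (time.perf_counter() - start):.3f} MB/s')

    start = time.perf_counter()
    tokenizer.decode(tokens)
    # If decode finishes in a time close to the clock's resolution,
    # this division yields huge, meaningless rates.
    print(f'Decode {mb / (time.perf_counter() - start):.3f} MB/s')
```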

LoganDark added 2 commits June 4, 2023 16:01
This one is faster
Sacrifice a little runtime performance (~10%) for much faster loading (~50%).
LoganDark force-pushed the fast-tokenizer branch 2 times, most recently from 20f0c2b to c0cce90 on June 5, 2023 at 00:57
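The second commit message describes a classic load-time/runtime trade-off: do less preprocessing when the vocabulary is loaded and pay a small cost later. A hypothetical illustration of that idea, not the PR's actual code, is deferring a reverse lookup table until it is first needed:

```python
class LazyVocab:
    """Hypothetical sketch: skip heavy preprocessing at load time."""

    def __init__(self, pairs):
        # Loading stores only the raw token_id -> bytes mapping,
        # keeping startup fast.
        self._tokens = dict(pairs)
        self._reverse = None  # built lazily on first use

    def id_for(self, b: bytes) -> int:
        # The first lookup pays to build the reverse index; later lookups
        # are plain dict hits. Overall runtime is slightly slower than
        # precomputing everything up front, but loading is much faster.
        if self._reverse is None:
            self._reverse = {v: k for k, v in self._tokens.items()}
        return self._reverse[b]
```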