feat: add MTP to ds-r1 ref. impl #2403
base: master
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
@mrmhodak @hanyunfan @tanvi-mlcommons had to create a new MR due to some artifacts.
tanvi-mlcommons left a comment:
Thanks, LGTM
viraatc left a comment:
ready for merge
| "api_key": None, | ||
| "tensor_parallel_size": 8, | ||
| # NOTE(vir): sg-lang crash without +2 additional | ||
| "context_length": MAX_ISL + MAX_OSL + MAX_TEMPLATE_TOKS + 2, |
I see a warning regarding max-sequence-length when MTP is enabled.
Possible SGLang bug:
Skipping prompt 2738 due to error: Error code: 400 - {'object': 'error', 'message': "Requested token count exceeds the model's maximum context length of 23142 tokens. You requested a total of 23144 tokens: 3144 tokens from the input messages and 20000 tokens for the completion. Please reduce the number of tokens in the input messages or the completion to fit within the limit.", 'type': 'BadRequestError', 'param': None, 'code': 400}
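A quick sanity check of the numbers in that 400 response (using only the figures already in the log): the request needs two more tokens than the server's configured context length.

```python
# Figures copied from the HTTP 400 error above.
server_context_length = 23142  # maximum context length reported by the sglang server
input_tokens = 3144            # prompt tokens as counted by the server (with MTP enabled)
completion_budget = 20000      # max_tokens requested for the completion

requested_total = input_tokens + completion_budget   # 23144
overflow = requested_total - server_context_length   # 2 tokens over the limit
print(requested_total, overflow)                     # prints: 23144 2
```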
Could it be the last MTP step emitting the +2 extra tokens 🤔? But good to know.
- Actual token length (after applying the chat template) of prompt #2738 is 3140, which is also the max ISL in the ds-r1 dataset.
- The sglang server returns HTTP 400 (invalid request), calculating an input length of 3144 (with MTP) and 3142 (without MTP); I suspect the application of the chat template / related arguments (sglang auto-applies the chat template on `v1/chat/completions`).
- For now, the WAR is to update the `+2` to `+5` so it also accounts for MTP (`+4` was not sufficient); max output length is anyway bounded by the sampling parameter `max_tokens`, so outputs do not change.
- I've started a thread in the SGLang Slack channel asking for clarifications.
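For reference, a minimal sketch of what the workaround amounts to in the server config shown in the diff above. The `MAX_*` values below are illustrative assumptions only (3140 is the max ISL quoted in this thread, 20000 matches the completion budget in the error log, and `MAX_TEMPLATE_TOKS` is a placeholder), not the actual reference-implementation values.

```python
# Sketch of the WAR, not the actual reference-implementation constants.
MAX_ISL = 3140           # assumed: max input length in the ds-r1 dataset (per the comment above)
MAX_OSL = 20000          # assumed: completion budget (max_tokens) used by the harness
MAX_TEMPLATE_TOKS = 0    # assumed: placeholder allowance for chat-template tokens

# Padding bumped from +2 to +5 so sglang's extra per-request accounting
# (chat template + MTP speculative tokens) still fits in the context window.
MTP_CONTEXT_PAD = 5

server_args = {
    "api_key": None,
    "tensor_parallel_size": 8,
    "context_length": MAX_ISL + MAX_OSL + MAX_TEMPLATE_TOKS + MTP_CONTEXT_PAD,
}
print(server_args["context_length"])  # 23145 with these illustrative numbers
```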
viraatc left a comment:
submission-checker changes:
@tanvi-mlcommons @mrmhodak we need an approval for this PR.
@nvzhihanj: There is a failed check.
Seems transient?
@viraatc please fix the merge conflicts.
No description provided.