
Conversation

viraatc (Contributor) commented Dec 4, 2025

No description provided.

viraatc requested a review from a team as a code owner · December 4, 2025 01:07

github-actions bot (Contributor) commented Dec 4, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

viraatc (Contributor, Author) commented Dec 4, 2025

@mrmhodak @hanyunfan @tanvi-mlcommons
@nvzhihanj

Had to create a new MR due to some artifacts.

tanvi-mlcommons previously approved these changes Dec 4, 2025

tanvi-mlcommons left a comment:

Thanks, LGTM

viraatc (Contributor, Author) left a comment:

ready for merge

"api_key": None,
"tensor_parallel_size": 8,
# NOTE(vir): sg-lang crash without +2 additional
"context_length": MAX_ISL + MAX_OSL + MAX_TEMPLATE_TOKS + 2,

viraatc (Contributor, Author) commented:

I see a warning regarding max-sequence-length when MTP is enabled. Possible SGLang bug:

Skipping prompt 2738 due to error: Error code: 400 - {'object': 'error', 'message': "Requested token count exceeds the model's maximum context length of 23142 tokens. You requested a total of 23144 tokens: 3144 tokens from the input messages and 20000 tokens for the completion. Please reduce the number of tokens in the input messages or the completion to fit within the limit.", 'type': 'BadRequestError', 'param': None, 'code': 400}
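
For context, a minimal sketch of the arithmetic behind that 400, using only the numbers from the log above (the variable names are mine, not sglang's):

# Numbers taken from the error message above.
input_tokens = 3144        # input length as counted by the server with MTP enabled
completion_tokens = 20000  # completion budget requested via max_tokens
context_length = 23142     # model's configured maximum context length

# The request is rejected because the total exceeds the configured maximum.
assert input_tokens + completion_tokens > context_length  # 23144 > 23142 -> HTTP 400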

A contributor replied:

Could it be the last MTP emitting +2 tokens 🤔. But good to know.

viraatc (Contributor, Author) commented Dec 17, 2025

The actual token length of prompt #2738 (after applying the chat template) is 3140, which is also the max ISL in the ds-r1 dataset.

  • The sglang server returns HTTP 400 (invalid request), calculating an input length of 3144 (with MTP) and 3142 (without MTP).

  • I suspect the chat-template application or related arguments (sglang auto-applies the chat template on v1/chat/completions).

  • For now, the WAR is to bump the +2 to +5 so it also accounts for MTP (+4 was not sufficient); the max output length is bounded by the sampling parameter max_tokens anyway, so outputs do not change. See the sketch below.

I've started a thread in the sglang Slack channel asking for clarification.
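
A minimal sketch of that workaround, mirroring the snippet under review; the dict name server_args and the MAX_* values are illustrative placeholders, not the benchmark's real constants:

# Illustrative placeholders; the real MAX_ISL / MAX_OSL / MAX_TEMPLATE_TOKS
# come from the benchmark configuration, not from this sketch.
MAX_ISL = 3140            # longest input in ds-r1 after the chat template
MAX_OSL = 20000           # completion budget, enforced separately via max_tokens
MAX_TEMPLATE_TOKS = 16    # hypothetical allowance for chat-template overhead

server_args = {
    "api_key": None,
    "tensor_parallel_size": 8,
    # Margin raised from +2 to +5 so the server-side length check also
    # passes with MTP enabled (+4 was observed to be insufficient).
    "context_length": MAX_ISL + MAX_OSL + MAX_TEMPLATE_TOKS + 5,
}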

viraatc (Contributor, Author) left a comment:

submission-checker changes:

nvzhihanj (Contributor) commented:

@tanvi-mlcommons @mrmhodak we need an approval for this PR.

mrmhodak (Contributor) commented:

@nvzhihanj: There is a failed check.

viraatc (Contributor, Author) commented Dec 17, 2025

mlc.script_action.ScriptExecutionError: Script run execution failed. Error : 504 Server Error: Gateway Time-out for url: https://zenodo.org/record/6617879/files/resnext50_32x4d_fpn.onnx

Seems transient? Pushed an empty commit to redo the checks.

arjunsuresh (Contributor) commented:

@viraatc please fix the merge conflicts
