XkunW commented Nov 28, 2024

PR Type

[Release]
v0.4.0

Short Description

  • Onboarded various new models and two new model types: text embedding models and reward reasoning models.
  • Added a metrics command that streams performance metrics for the inference server.
  • Enabled more launch command options: --max-num-seqs, --model-weights-parent-dir, --pipeline-parallelism, --enforce-eager.
  • Improved support for launching custom models.
  • Improved command response time.
  • Improved the visual output of the list command.
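
For readers unfamiliar with the new launch options, here is a minimal sketch of how a launch-style command could accept them. Only the four flag names come from the list above. The parser itself, the function name build_launch_parser, the positional model_name argument, and the value types (an integer for --max-num-seqs, on/off switches for --pipeline-parallelism and --enforce-eager) are assumptions made for illustration and may not match this project's actual CLI.

```python
import argparse


def build_launch_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the four flag names come from the release notes."""
    parser = argparse.ArgumentParser(prog="launch")
    # Assumed positional argument naming the model to launch.
    parser.add_argument("model_name")
    # Assumed to be an integer cap on concurrently scheduled sequences.
    parser.add_argument("--max-num-seqs", type=int, default=None)
    # Assumed to be a directory path containing model weight folders.
    parser.add_argument("--model-weights-parent-dir", default=None)
    # Assumed to be simple on/off switches.
    parser.add_argument("--pipeline-parallelism", action="store_true")
    parser.add_argument("--enforce-eager", action="store_true")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_launch_parser().parse_args(
    ["my-model", "--max-num-seqs", "256", "--enforce-eager"]
)
print(args.max_num_seqs, args.enforce_eager)  # prints "256 True"
```

Options that are not passed keep their defaults, so in this sketch a plain launch with only a model name leaves all four settings unset.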

XkunW added 28 commits October 29, 2024 10:27
… models, added Llama 3.2 and Llama 3.1 Nemotron
…d list command based on model type, updated READMEs, removed debugging code
…ling for when errors that don't affect server launching show up in err logs
@XkunW XkunW merged commit d221dae into main Nov 28, 2024
1 check failed

2 participants