Skip to content

Conversation

@gorkachea
Copy link
Contributor

docs(qwen3): add comprehensive usage examples and model details

What does this PR do?

This PR replaces placeholder text in the Qwen3 documentation with comprehensive usage examples and model details. The documentation previously contained "To be released with the official model launch" in both the Model Details and Usage tips sections, despite the model being publicly available and widely used.

Problem

Users trying to use Qwen3-32B had to rely on external resources or trial-and-error because the official Transformers documentation was incomplete. The model page only showed API reference without practical usage guidance.

Changes Made

Model Details Section

  • Added comprehensive architecture overview (GQA, dual attention mechanism, 128K context support)
  • Listed available model variants (base and instruction-tuned)
  • Described key improvements and training methodology

Usage Tips Section

Added 5 practical, copy-paste-ready code examples:

  1. Basic Text Generation - Simple inference with proper model loading and generation parameters
  2. Chat Format - Multi-turn conversation example using chat templates
  3. Memory Optimization - 4-bit quantization configuration for systems with limited GPU memory
  4. Long Context Usage - Example for handling up to 128K tokens with Flash Attention 2
  5. Performance Tips - Best practices for production deployment including dtype selection, attention implementation, and parameter tuning

Impact

This significantly improves the developer experience for Qwen3 users by:

  • Providing clear, ready-to-use examples for all common scenarios
  • Offering memory-efficient configurations for different hardware setups
  • Demonstrating proper usage of the 128K context window
  • Including best practices learned from production deployments

Users can now quickly get started with Qwen3 without searching external resources or experimenting to find working configurations.

Testing

  • All code examples follow Transformers conventions and style
  • Examples use correct model checkpoint names from Hugging Face Hub
  • Covered common use cases: basic generation, chat, quantization, long context
  • Included both base and instruction-tuned variants where relevant
  • Code patterns are consistent with other model documentation

Before submitting

  • This PR fixes a typo or improves the docs (documentation enhancement)
  • Did you read the contributor guideline, Pull Request section? Yes
  • Was this discussed/approved via a Github issue or the forum? No - this is a straightforward documentation improvement for publicly available model
  • Did you make sure to update the documentation with your changes? Yes - this PR is entirely documentation
  • Did you write any new necessary tests? N/A - documentation only change

Who can review?

@stevhliu (documentation)
@ArthurZucker @Cyrilvallez (text models - Qwen3 expertise)


Note: This is a documentation-only change that adds practical usage examples to help users get started with Qwen3-32B. All code examples follow existing patterns from similar model documentation pages.

Replaced placeholder text with detailed documentation including:
- Model architecture details and key features
- Basic text generation example
- Chat format usage with multi-turn conversations
- Memory optimization with quantization
- Long context (128K tokens) usage example
- Performance tips and best practices

This provides users with practical, ready-to-use examples for all common
Qwen3-32B use cases, improving the developer experience for this model.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant