
Conversation

@patrocinio (Contributor)

Fixes #3643

Description

Checklist

  • The issue being fixed is referenced in the description (see "Fixes #ISSUE_NUMBER" above)
  • Only one issue is addressed in this pull request
  • Labels from the issue this PR fixes are added to this pull request
  • No unrelated issues are included in this pull request

@pytorch-bot (bot) commented Nov 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3662

Note: Links to docs will display an error until the docs builds have been completed.


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the cla signed label Nov 25, 2025
- Add a dedicated PAD_token (index 2) for proper padding (see the sketch after this list)
- Use pack_padded_sequence in the encoder to handle variable-length sequences (see the encoder sketch further below)
- Ensure the encoder's final hidden state represents actual content, not padding
- Pass ignore_index=PAD_token to the loss function to exclude padding from the gradients
- Set padding_idx on all embedding layers
- Add documentation explaining padding-handling best practices
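
The token and loss changes in the list above amount to a few lines. Here is a minimal sketch of how they fit together; SOS_token and EOS_token follow the tutorial's existing convention, while hidden_size and vocab_size are hypothetical placeholders, not values taken from this PR's diff.

```python
import torch.nn as nn

# Token indices: SOS/EOS follow the tutorial's existing convention;
# PAD_token = 2 is the dedicated padding index this PR introduces.
SOS_token = 0
EOS_token = 1
PAD_token = 2

hidden_size = 128   # hypothetical value, for illustration only
vocab_size = 4000   # hypothetical value, for illustration only

# padding_idx pins the PAD embedding to a zero vector and keeps it
# out of gradient updates.
embedding = nn.Embedding(vocab_size, hidden_size, padding_idx=PAD_token)

# ignore_index skips PAD positions when computing the loss, so padded
# target steps contribute no gradient.
criterion = nn.NLLLoss(ignore_index=PAD_token)
```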

Fixes issues where:
1. the GRU's final hidden state could come from PAD tokens rather than real input
2. the loss was computed on PAD tokens, skewing training
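
The key mechanism behind both fixes is packing. Below is a minimal sketch of a packed-sequence encoder; it mirrors the tutorial's EncoderRNN in spirit, but the forward signature taking explicit lengths and the layer settings here are illustrative assumptions, not the PR's actual code.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

PAD_token = 2  # dedicated padding index introduced by this PR

class EncoderRNN(nn.Module):
    # Sketch only: the explicit `lengths` argument is an assumption
    # made for illustration.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(input_size, hidden_size,
                                      padding_idx=PAD_token)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, input_ids, lengths):
        # input_ids: (batch, max_len), padded with PAD_token
        # lengths:   true sequence lengths, so the GRU only sees real content
        embedded = self.embedding(input_ids)
        packed = pack_padded_sequence(embedded, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        packed_output, hidden = self.gru(packed)
        # `hidden` is read at each sequence's last real step, never at a PAD
        # position, so the state handed to the decoder reflects actual content.
        output, _ = pad_packed_sequence(packed_output, batch_first=True)
        return output, hidden
```

With enforce_sorted=False, batches do not need to be pre-sorted by length, so the surrounding data pipeline can stay unchanged.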


Development

Successfully merging this pull request may close this issue:

Feedback about NLP From Scratch: Translation with a Sequence to Sequence Network and Attention
