Merged
Changes from 1 commit
5 changes: 3 additions & 2 deletions .gitignore
@@ -12,8 +12,9 @@ appendix-D/01_main-chapter-code/3.pdf
appendix-E/01_main-chapter-code/loss-plot.pdf

ch04/04_gqa/kv_bytes_vs_context_length.pdf
-ch05/05_mla/kv_bytes_vs_context_length.pdf
-ch06/06_swa/kv_bytes_vs_context_length.pdf
+ch04/05_mla/kv_bytes_vs_context_length.pdf
+ch04/06_swa/kv_bytes_vs_context_length.pdf
+ch04/07_moe/ffn_vs_moe.pdf

ch05/01_main-chapter-code/loss-plot.pdf
ch05/01_main-chapter-code/temperature-plot.pdf
1 change: 1 addition & 0 deletions README.md
@@ -172,6 +172,7 @@ Several folders contain optional materials as a bonus for interested readers:
- [Grouped-Query Attention](ch04/04_gqa)
- [Multi-Head Latent Attention](ch04/05_mla)
- [Sliding Window Attention](ch04/06_swa)
+- [Mixture-of-Experts (MoE)](ch04/07_moe)
- **Chapter 5: Pretraining on unlabeled data:**
- [Alternative Weight Loading Methods](ch05/02_alternative_weight_loading/)
- [Pretraining GPT on the Project Gutenberg Dataset](ch05/03_bonus_pretraining_on_gutenberg)
6 changes: 3 additions & 3 deletions ch04/06_swa/README.md
@@ -71,14 +71,14 @@ The savings when using SWA over MHA are further shown in the plot below for diff

 

-<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/swa-memory/4.webp?2" alt="SWA" width="=800px" />
+<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/swa-memory/4.webp?2" alt="SWA" width="800px" />

&nbsp;

-You can reproduce these plots via:
+You can reproduce the plots via:

```bash
-plot_memory_estimates_swa.py \
+uv run plot_memory_estimates_swa.py \
--emb_dim 4096 --n_heads 48 --n_layers 36 \
--batch_size 1 --dtype bf16 \
--sliding_window_size 2048 --swa_ratio "5:1"
123 changes: 0 additions & 123 deletions ch04/06_swa/memory_estimator_mla.py

This file was deleted.

90 changes: 0 additions & 90 deletions ch04/06_swa/plot_memory_estimates_mla.py

This file was deleted.
