AGENTS.md: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ Working notes for future agents hacking on `tinker-cookbook`. Additional docs ca
 - Launch scripts define a CLI-facing `CLIConfig` (parsed by `chz`) that instantiates the richer training `Config`. This gives every recipe a consistent `python -m ... key=value` interface.
 - Env builders compose like `RLDatasetBuilder → EnvGroupBuilder → Env`. Groups let us share metadata (tags, pairwise comparisons) and center rewards across related rollouts.
 - **Completers:** algorithms interact with the `TokenCompleter` interface. `TinkerTokenCompleter` (wrapping a `SamplingClient`) is the default implementation, but evaluators may accept any `TokenCompleter` or `MessageCompleter`.
-- **Renderers & tokenizer utils:** pick the renderer that matches your tokenizer/model pair (e.g., `role_colon`, `llama3`, `qwen3`). `TrainOnWhat` controls which tokens get weight=1 in SFT. Tokenizers are cached via `tokenizer_utils.get_tokenizer`, with Llama-3 names remapped to `baseten/Meta-Llama-3-tokenizer` to bypass HF gating.
+- **Renderers & tokenizer utils:** pick the renderer that matches your tokenizer/model pair (e.g., `role_colon`, `llama3`, `qwen3`). `TrainOnWhat` controls which tokens get weight=1 in SFT. Tokenizers are cached via `tokenizer_utils.get_tokenizer`, with Llama-3 names remapped to `thinkingmachineslabinc/meta-llama-3-tokenizer` to bypass HF gating.
 - **Loss plumbing:** every `tinker.Datum` bundles a `model_input` plus `loss_fn_inputs` (`TensorData`). Use helpers such as `conversation_to_datum`, `datum_from_tokens_weights`, and `_remove_mask` instead of constructing dicts manually. Built-in losses: `cross_entropy`, `importance_sampling`, `ppo`; `forward_backward_custom` covers bespoke differentiable objectives.

 ## Conventions & Notation (from CONTRIBUTING)
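To make the `CLIConfig` → `Config` pattern in this hunk concrete, here is a minimal illustrative sketch. It deliberately uses plain dataclasses instead of `chz` (whose API is not shown in this diff), and every field name and default below is a placeholder rather than a real recipe option.

```python
# Illustrative stand-in for the CLIConfig -> Config pattern described above.
# The real recipes parse `key=value` CLI args with `chz`; this sketch uses
# dataclasses, and all field names/defaults are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class Config:
    """Richer training config consumed by the training loop."""
    model_name: str
    learning_rate: float
    log_path: str


@dataclass
class CLIConfig:
    """Thin CLI-facing layer: one field per `key=value` flag."""
    model_name: str = "meta-llama/Llama-3.1-8B"  # placeholder default
    learning_rate: float = 1e-4
    log_path: str = "/tmp/run"

    def build(self) -> Config:
        # Expand the flat CLI fields into the full training Config.
        return Config(self.model_name, self.learning_rate, self.log_path)
```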
@@ -59,7 +59,7 @@ Working notes for future agents hacking on `tinker-cookbook`. Additional docs ca
 ### Evaluations & Sampling
 - Inline evaluators implement either `TrainingClientEvaluator` or `SamplingClientEvaluator`. Training loops accept builder lists (`evaluator_builders`, `infrequent_evaluator_builders`). Inspect AI integration is in `eval/inspect_evaluators.py` and `eval/run_inspect_evals.py`.
-- Sampling clients come from `training_client.save_weights_and_get_sampling_client(name=...)`. To export weights, use `RestClient.download_checkpoint_archive_from_tinker_path`.
+- Sampling clients come from `training_client.save_weights_and_get_sampling_client(name=...)`. To export weights, use `RestClient.get_checkpoint_archive_url_from_tinker_path`.

 ## Async & Performance
 - Worker pools advance in ~10s clock cycles. Submit `forward_backward_async` and `optim_step_async` back-to-back, then await both futures to keep them on the same cycle.
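To illustrate the back-to-back submission pattern in the `## Async & Performance` bullet, here is a hedged sketch. It assumes the `*_async` methods return awaitable futures and that `batch` and `adam_params` are already constructed; the exact signatures and the `loss_fn="cross_entropy"` argument are assumptions based on names mentioned elsewhere in this diff.

```python
# Hedged sketch: queue forward_backward and optim_step before awaiting either,
# so both requests land on the same ~10s worker cycle.
# Signatures and return types are assumed, not confirmed by this diff.
async def train_step(training_client, batch, adam_params):
    fwd_bwd_future = await training_client.forward_backward_async(batch, loss_fn="cross_entropy")
    optim_future = await training_client.optim_step_async(adam_params)

    # Only await the results after both requests have been submitted.
    fwd_bwd_result = await fwd_bwd_future
    optim_result = await optim_future
    return fwd_bwd_result, optim_result
```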
llms-full.txt: 44 additions & 9 deletions
@@ -607,11 +607,12 @@ We'll start with a couple of general pages that'll be relevant to almost all of
 # Saving and loading weights and optimizer state

-During training, you'll need to save checkpoints for two main purposes: *sampling* (to test your model) and *resuming training* (to continue from where you left off). The `TrainingClient` provides three methods to handle these cases:
+During training, you'll need to save checkpoints for two main purposes: *sampling* (to test your model) and *resuming training* (to continue from where you left off). The `TrainingClient` provides the following methods to handle these cases:

 1. `save_weights_for_sampler()`: saves a copy of the model weights that can be used for sampling.
 2. `save_state()`: saves the weights and the optimizer state. You can fully resume training from this checkpoint.
-3. `load_state()`: load the weights and the optimizer state. You can fully resume training from this checkpoint.
+3. `load_state()`: loads the model weights only (without optimizer state). Use this when you want to start fresh training from a checkpoint, e.g., starting DPO training from an SFT checkpoint.
+4. `load_state_with_optimizer()`: loads the model weights and optimizer state. Use this when resuming interrupted training, as it restores the full training state including optimizer momentum.

 Note that (1) is faster and requires less storage space than (2).
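A short sketch of how the methods listed above might fit together; the `name=` keyword and the `.result().path` accessor used to read back the checkpoint path are assumptions for illustration, not confirmed by this diff.

```python
# Hedged sketch of the checkpoint round-trip described above.
# The name= kwarg and the .result().path accessor are assumptions.
save_future = training_client.save_state(name="step_0100")
checkpoint_path = save_future.result().path

# Resume interrupted training with the full optimizer state:
training_client.load_state_with_optimizer(checkpoint_path)

# Or start a new phase (e.g. DPO from an SFT checkpoint) with a fresh optimizer:
training_client.load_state(checkpoint_path)
```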
@@ ... @@ ### When to use `save_state()` and `load_state()`:
+Async versions are also available: `load_state_with_optimizer_async()`.

+### Example: Starting fresh from a checkpoint

-- Multi-step training pipelines (e.g. supervised learning followed by reinforcement learning)
-- Adjusting hyperparameters or data mid-run
-- Recovery from interruptions or failures
+Use `load_state()` when you want to start a new training phase from saved weights (e.g., starting DPO from an SFT checkpoint):
+
+```python
+# Load weights only, starting with fresh optimizer state
+training_client.load_state(sft_checkpoint_path)
+```
+
+### When to use `load_state_with_optimizer()`:
+
+- Recovery from interruptions or failures (resume training exactly where you left off)
 - Any scenario where you need to preserve exact optimizer state (momentum, learning rate schedules, etc.)

+### When to use `load_state()`:
+
+- Multi-step training pipelines (e.g., starting DPO training from an SFT checkpoint)
+- Starting fresh training from pretrained weights with a new optimizer
+
+### ServiceClient methods for loading checkpoints
+
+The `ServiceClient` also provides methods to create a new `TrainingClient` directly from a saved checkpoint:
+
+- `create_training_client_from_state(path)`: Creates a `TrainingClient` with weights loaded from the checkpoint (no optimizer state). Use this when starting a new training phase from saved weights.
+- `create_training_client_from_state_with_optimizer(path)`: Creates a `TrainingClient` with both weights and optimizer state loaded. Use this when resuming interrupted training.
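For illustration, a minimal sketch of the two `ServiceClient` entry points named above; the `tinker` import, the no-argument constructor, and the checkpoint paths are placeholders/assumptions.

```python
# Hedged sketch of the ServiceClient checkpoint-loading methods named above.
# The import, constructor arguments, and checkpoint paths are assumptions.
import tinker

service_client = tinker.ServiceClient()

# Resume interrupted training: weights + optimizer state.
resumed_client = service_client.create_training_client_from_state_with_optimizer(
    "tinker://my-run/checkpoints/step_0100"  # placeholder path
)

# Start a new phase (fresh optimizer) from saved weights only.
fresh_client = service_client.create_training_client_from_state(
    "tinker://my-run/checkpoints/sft_final"  # placeholder path
)
```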
@@ ... @@
-<|begin_of_sentence|><|User|>What can you help me with?<|Assistant|><think>Thinking...</think>I can help you with...<|end_of_centence|>
+<|begin_of_sentence|><|User|>What can you help me with?<|Assistant|><think>Thinking...</think>I can help you with...<|end_of_sentence|>
 For no-think, just use <|Assistant|></think>
+Deepseek renderer does not support the system role out of the box. You can set system_role_as_user to True to automatically convert the system role to the user role.
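To show where `system_role_as_user` would plug in, here is a hypothetical sketch; the `get_renderer` factory, its module path, its keyword arguments, and the renderer/tokenizer names are all assumptions, since the diff only names the flag itself.

```python
# Hypothetical sketch: the factory, module paths, and names below are assumptions;
# only the system_role_as_user behaviour comes from the doc text above.
from tinker_cookbook import renderers
from tinker_cookbook.tokenizer_utils import get_tokenizer  # helper named in AGENTS.md above

tokenizer = get_tokenizer("deepseek-ai/DeepSeek-V3")  # placeholder model name
renderer = renderers.get_renderer(
    "deepseekv3",                # assumed renderer name
    tokenizer=tokenizer,
    system_role_as_user=True,    # convert system messages to user messages
)
```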