Olmo3 support #1170

Merged
jlarson4 merged 86 commits into TransformerLensOrg:dev-3.x from etomoscow:olmo3-support
Feb 13, 2026

Conversation

@etomoscow

Description

This PR adds support for the OLMO 3/3.1 family of models from AllenAI, complementing the existing OLMO 1, OLMO 2, and OLMoE implementations.

OLMO 3/3.1 uses a different architecture from earlier OLMO models, so I added a new weight conversion function. It supports grouped-query attention (GQA), QK normalization, and the models' actual layer norm weights (rather than substituting torch.ones()).
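
A minimal sketch of what such a conversion can look like, assuming illustrative names (hf_layer, cfg) and the underscore-prefixed _W_K/_W_V keys TransformerLens uses for GQA; this is not the exact PR code:

```python
import einops
import torch

def convert_olmo3_attention(hf_layer, cfg):
    """Illustrative GQA-aware conversion for one attention block (sketch)."""
    W_Q = hf_layer.self_attn.q_proj.weight
    W_K = hf_layer.self_attn.k_proj.weight
    W_V = hf_layer.self_attn.v_proj.weight

    # Split fused projections into per-head weights:
    # [n_heads * d_head, d_model] -> [n_heads, d_model, d_head].
    W_Q = einops.rearrange(W_Q, "(n h) m -> n m h", n=cfg.n_heads)
    # Under GQA, K/V have fewer heads (n_key_value_heads < n_heads).
    W_K = einops.rearrange(W_K, "(n h) m -> n m h", n=cfg.n_key_value_heads)
    W_V = einops.rearrange(W_V, "(n h) m -> n m h", n=cfg.n_key_value_heads)

    # Create any missing biases as zeros on the same device as the
    # weights (the "device-consistent tensor creation" point below).
    b_Q = torch.zeros(cfg.n_heads, cfg.d_head, device=W_Q.device)
    b_K = torch.zeros(cfg.n_key_value_heads, cfg.d_head, device=W_K.device)
    b_V = torch.zeros(cfg.n_key_value_heads, cfg.d_head, device=W_V.device)

    return {
        "attn.W_Q": W_Q, "attn._W_K": W_K, "attn._W_V": W_V,
        "attn.b_Q": b_Q, "attn._b_K": b_K, "attn._b_V": b_V,
        # Copy the real QK-norm weights through instead of torch.ones().
        "attn.q_norm.w": hf_layer.self_attn.q_norm.weight,
        "attn.k_norm.w": hf_layer.self_attn.k_norm.weight,
    }
```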

The config conversion loads the HuggingFace config via AutoConfig. It also handles the layer_types attribute for models that mix sliding-window and full attention layers, mapping sliding_attention to local and full_attention to global.
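
For illustration, the layer_types handling can be sketched as below (the helper name is hypothetical; layer_types comes from the HuggingFace config as described above):

```python
from transformers import AutoConfig

def olmo3_attention_types(model_name: str) -> list[str]:
    """Map HF per-layer attention labels to TransformerLens names (sketch)."""
    hf_cfg = AutoConfig.from_pretrained(model_name)
    # OLMo 3 tags each layer "sliding_attention" or "full_attention";
    # TransformerLens calls these "local" and "global" respectively.
    mapping = {"sliding_attention": "local", "full_attention": "global"}
    return [mapping[t] for t in hf_cfg.layer_types]
```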

Models Added

  • allenai/Olmo-3-7B-Think
  • allenai/Olmo-3-32B-Think
  • allenai/Olmo-3.1-32B-Think
  • allenai/Olmo-3-7B-Instruct
  • allenai/Olmo-3.1-32B-Instruct
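
Once these land in a 3.x release, loading should follow the usual TransformerLens pattern (a sketch, not verified against the merged code):

```python
from transformer_lens import HookedTransformer

# Any of the model names listed above should work here.
model = HookedTransformer.from_pretrained("allenai/Olmo-3-7B-Instruct")
logits = model("OLMo 3 support in TransformerLens.", return_type="logits")
```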

Type of change

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

jlarson4 and others added 21 commits January 16, 2026 19:27
This PR adds support for the OLMO 3/3.1 family of models from AllenAI,
complementing the existing OLMO 1/2 and OLMoE implementations.

Key improvements over existing OLMO weight conversion:
- Proper GQA detection (n_key_value_heads < n_heads), storing K/V weights under underscore-prefixed keys
- Q/K normalization support (q_norm.w, k_norm.w)
- Uses actual layer norm weights instead of torch.ones()
- Device-consistent tensor creation (device=W.device)
- Complete attention bias support (b_Q, b_K, b_V)

Models added:
- allenai/Olmo-3-7B-Think
- allenai/Olmo-3-32B-Think
- allenai/Olmo-3.1-32B-Think
- allenai/Olmo-3-7B-Instruct
- allenai/Olmo-3.1-32B-Instruct

Test output: 11/11 OLMO 3 tests passed
@jlarson4 jlarson4 changed the base branch from main to dev February 13, 2026 00:17
@jlarson4 jlarson4 changed the base branch from dev to dev-3.x February 13, 2026 00:27
# Conflicts:
#	demos/Attribution_Patching_Demo.ipynb
#	demos/BERT.ipynb
#	demos/Colab_Compatibility.ipynb
#	demos/Exploratory_Analysis_Demo.ipynb
#	demos/Head_Detector_Demo.ipynb
#	demos/Main_Demo.ipynb
#	demos/Othello_GPT.ipynb
#	poetry.lock
#	pyproject.toml
#	tests/integration/test_head_detector.py
#	tests/unit/test_svd_interpreter.py
#	transformer_lens/HookedEncoder.py
#	transformer_lens/HookedTransformer.py
#	transformer_lens/components/abstract_attention.py
#	transformer_lens/loading_from_pretrained.py
#	transformer_lens/pretrained/weight_conversions/__init__.py
#	transformer_lens/pretrained/weight_conversions/olmo.py
#	transformer_lens/pretrained/weight_conversions/olmo2.py
#	transformer_lens/pretrained/weight_conversions/olmoe.py
#	transformer_lens/utils.py
@jlarson4
Collaborator

Hello @etomoscow! As I mentioned in PR #1081, the OLMo HookedTransformer implementation has been bumped to 3.x in order to maintain Python 3.9 support in version 2.x (#1081 deprecated Python 3.9 support). I have forward-ported this branch to 3.x, and these models will be included in HookedTransformer in the next 3.x release.

@jlarson4 jlarson4 merged commit 98e4837 into TransformerLensOrg:dev-3.x Feb 13, 2026
15 checks passed
