Olmo3 support by etomoscow · Pull Request #1170 · TransformerLensOrg/TransformerLens

etomoscow · 2026-02-11T19:49:48Z

Description

This PR adds support for the OLMo 3/3.1 family of models from AllenAI, complementing the existing OLMo 1, OLMo 2, and OLMoE implementations.

OLMo 3/3.1 uses a different architecture from earlier OLMo models, so I added a new weight conversion function. It supports grouped-query attention (GQA) and QK normalization, and it loads the model's actual layer norm weights.
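Loading the real norm weights matters because RMS normalization multiplies each normalized element by a learned scale. The following is a minimal pure-Python sketch of RMSNorm with a learned weight, the kind of norm OLMo 2-style models use for QK normalization. It is not the TransformerLens or Hugging Face implementation; the query vector, the weights, and the eps value are made up for illustration.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """RMS-normalize x, then scale each element by a learned weight."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Root mean square over the vector; eps guards against division by zero.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


# Toy query vector and a hypothetical learned q_norm weight.
q = [3.0, 4.0, 0.0, 0.0]
learned_w = [0.5, 2.0, 1.0, 1.0]

with_real_weight = rms_norm(q, learned_w)
# What a torch.ones()-style placeholder weight would produce instead.
with_ones = rms_norm(q, [1.0] * len(q))
```

Here the RMS of q is 2.5, so with_ones is roughly [1.2, 1.6, 0.0, 0.0] while with_real_weight is roughly [0.6, 3.2, 0.0, 0.0]. Substituting ones for the learned scale changes every normalized activation.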

The config conversion reads the Hugging Face config via AutoConfig. It also handles the layer_types attribute for models that mix sliding-window and full-attention layers, converting sliding_attention → local and full_attention → global.
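The layer_types mapping described above can be sketched as a small lookup. This is a hypothetical helper, not the PR's code. It assumes the result would feed a per-layer list of "local"/"global" attention types (TransformerLens uses an attn_types field of that shape for models such as Gemma 2, but the exact target field in this PR is an assumption). The config below is a plain dict for illustration; with a real AutoConfig object you would use getattr(config, "layer_types", None) instead.

```python
from typing import Dict, List, Optional, Sequence

# Mapping stated in the PR: Hugging Face layer types -> local/global attention types.
LAYER_TYPE_TO_ATTN_TYPE: Dict[str, str] = {
    "sliding_attention": "local",
    "full_attention": "global",
}


def convert_layer_types(layer_types: Optional[Sequence[str]]) -> Optional[List[str]]:
    """Translate a Hugging Face layer_types list into local/global attention types.

    Returns None when the config has no layer_types, leaving attention types unset.
    """
    if layer_types is None:
        return None
    try:
        return [LAYER_TYPE_TO_ATTN_TYPE[t] for t in layer_types]
    except KeyError as err:
        raise ValueError(f"Unsupported layer type: {err.args[0]!r}") from None


# Hypothetical 4-layer config fragment (not taken from a real checkpoint).
hf_config = {"layer_types": ["sliding_attention"] * 3 + ["full_attention"]}
attn_types = convert_layer_types(hf_config.get("layer_types"))
# attn_types == ["local", "local", "local", "global"]
```

Failing loudly on an unknown layer type is a deliberate choice in this sketch: silently defaulting to global attention would hide a config the conversion does not understand.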

Models Added

allenai/Olmo-3-7B-Think
allenai/Olmo-3-32B-Think
allenai/Olmo-3.1-32B-Think
allenai/Olmo-3-7B-Instruct
allenai/Olmo-3.1-32B-Instruct

Type of change

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility


This PR adds support for the OLMO 3/3.1 family of models from AllenAI, complementing the existing OLMO 1/2 and OLMoE implementations.

Key improvements over existing OLMO weight conversion:
- Proper GQA detection (n_key_value_heads < n_heads) with underscore prefix
- Q/K normalization support (q_norm.w, k_norm.w)
- Uses actual layer norm weights instead of torch.ones()
- Device-consistent tensor creation (device=W.device)
- Complete attention bias support (b_Q, b_K, b_V)

Models added:
- allenai/Olmo-3-7B-Think
- allenai/Olmo-3-32B-Think
- allenai/Olmo-3.1-32B-Think
- allenai/Olmo-3-7B-Instruct
- allenai/Olmo-3.1-32B-Instruct

Test output: 11/11 OLMO 3 tests passed
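The GQA naming convention mentioned in the commit message above can be illustrated without torch. The sketch below is a hypothetical, pure-Python approximation: the helper names (attn_key_prefix, split_heads, attn_state_dict_keys) are invented for illustration and are not the PR's code. It assumes the TransformerLens convention that grouped-query key/value parameters carry an underscore prefix (_W_K, _b_K, ...), and that Hugging Face projection weights of shape [n_heads * d_head, d_model] are rearranged to [n_heads, d_model, d_head] (einops pattern "(n h) m -> n m h", as TransformerLens conversions commonly do).

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]


def attn_key_prefix(n_heads: int, n_key_value_heads: Optional[int]) -> str:
    """Return "_" for grouped-query attention, "" for standard multi-head attention."""
    is_gqa = n_key_value_heads is not None and n_key_value_heads < n_heads
    return "_" if is_gqa else ""


def split_heads(w: Matrix, n_heads: int, d_head: int) -> List[Matrix]:
    """Rearrange a projection weight [n_heads * d_head, d_model]
    into per-head blocks [n_heads, d_model, d_head]."""
    if len(w) != n_heads * d_head:
        raise ValueError("row count must equal n_heads * d_head")
    d_model = len(w[0])
    # Output[n][m][h] takes row (n * d_head + h), column m of the input.
    return [
        [[w[n * d_head + h][m] for h in range(d_head)] for m in range(d_model)]
        for n in range(n_heads)
    ]


def attn_state_dict_keys(
    layer: int, n_heads: int, n_key_value_heads: Optional[int]
) -> Dict[str, str]:
    """Hypothetical helper: TransformerLens-style names for one layer's attention params."""
    p = attn_key_prefix(n_heads, n_key_value_heads)
    return {
        "W_Q": f"blocks.{layer}.attn.W_Q",
        "W_K": f"blocks.{layer}.attn.{p}W_K",
        "W_V": f"blocks.{layer}.attn.{p}W_V",
        "b_Q": f"blocks.{layer}.attn.b_Q",
        "b_K": f"blocks.{layer}.attn.{p}b_K",
        "b_V": f"blocks.{layer}.attn.{p}b_V",
    }


# Example: 32 query heads sharing 8 key/value heads triggers the GQA prefix.
keys = attn_state_dict_keys(0, n_heads=32, n_key_value_heads=8)
# keys["W_K"] == "blocks.0.attn._W_K"

# Tiny 2-head, d_head=2, d_model=2 weight to show the rearrangement.
per_head = split_heads([[0, 1], [2, 3], [4, 5], [6, 7]], n_heads=2, d_head=2)
# per_head == [[[0, 2], [1, 3]], [[4, 6], [5, 7]]]
```

Only the query-side names stay unprefixed, because GQA shares fewer key/value heads across the query heads and stores them separately.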

# Conflicts:
# demos/Attribution_Patching_Demo.ipynb
# demos/BERT.ipynb
# demos/Colab_Compatibility.ipynb
# demos/Exploratory_Analysis_Demo.ipynb
# demos/Head_Detector_Demo.ipynb
# demos/Main_Demo.ipynb
# demos/Othello_GPT.ipynb
# poetry.lock
# pyproject.toml
# tests/integration/test_head_detector.py
# tests/unit/test_svd_interpreter.py
# transformer_lens/HookedEncoder.py
# transformer_lens/HookedTransformer.py
# transformer_lens/components/abstract_attention.py
# transformer_lens/loading_from_pretrained.py
# transformer_lens/pretrained/weight_conversions/__init__.py
# transformer_lens/pretrained/weight_conversions/olmo.py
# transformer_lens/pretrained/weight_conversions/olmo2.py
# transformer_lens/pretrained/weight_conversions/olmoe.py
# transformer_lens/utils.py

jlarson4 · 2026-02-13T14:34:15Z

Hello @etomoscow! As I mentioned in PR #1081, the OLMo HookedTransformer implementation has been moved to 3.x so that version 2.x can keep supporting Python 3.9 (#1081 included a deprecation of Python 3.9 support). I have forward-ported this branch to 3.x, and these models will be included in HookedTransformer in the next 3.x release.

jonasrohw and others added 30 commits December 12, 2024 09:58

added and tested: OLMo-1B,OLMo-7B

1fe4d04

fixed: numpy do not do a major upgrade!

0f3e3b3

fixed: dimensions of 7b to be correct

3a101f4

tested: Loading checkpoints & model variations

1b34ccd

Reimplement OLMoE changes.

f0a0a68

Originally from TransformerLensOrg#718.

Implement TODO (norm_topk_prob)

8c094e5

Disable bos token for OLMoE.

7565c06

Add q and k norm.

04cd309

Correct normalization type for OLMoE.

68d6961

Merge pull request TransformerLensOrg#1 from joelburget/olmoe

9afd032

Add OLMoE

Merge branch 'dev' into OLMo

96c1fbb

ran formatting

72fb903

Merge branch 'dev' into OLMo

9d3a85e

Merge branch 'dev' into OLMo

d4519b2

tmp update for olmo2

064310f

Fix: Olmo2 uses normalization after the attention/mlp

b1fd04b

Merge branch 'dev' into OLMo

871ba03

ran format

7939e8d

fixed some type issues

97fd1e7

Merge branch 'dev' into OLMo

9032fe7

OLMo 2 RMS

39703c4

OLMo 2 RMS

1c283c1

Tested Instruct models

688a421

Merge pull request TransformerLensOrg#3 from jleechung/OLMo

9febc5c

Fix to OLMo 2 normalization

fix: Olmo2DecoderLayer type issues

86b1fce

fix type assertions for attention

fa5c885

chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility

148df46

Merge dev and regenerate lock file

1c60345

fix: sort imports in olmo2.py

7aa3a91

docs: update Colab notebook for OLMo models

c8d443b

jlarson4 and others added 21 commits January 16, 2026 19:27

Fix type issues

5ccbf68

Fix type issues

6d3c870

Fix format issues

7151270

Fix format issues again

1d0aebb

Fix format issues for black

4316e8b

another attempt at black formatting

040b19b

Fix format issues for black again

fb259ce

Retyping the blocks in HookedTransformer and HookedEncoder

72521c7

undo modulelist typing

f0ddc0e

Improve type checking in test_detect_head_with_invalid_head_name

bdbd649

removing unused import

0ec06b9

Fixing Patchscopes_Generation_Demo.ipynb

09a9bdd

Fixing the rest of the notebooks

7933afc

Fixing the more notebooks

61347e0

run_line_magic

d4e986a

BERT ipynb fix

4fed25a

Trying to fix the BERT set_grad cell

7b22ce4

more set_grad cell fixes

0899f3c

Revert python change

da6d1e3

Revert python change

da8698b

jlarson4 changed the base branch from main to dev February 13, 2026 00:17

jlarson4 changed the base branch from dev to dev-3.x February 13, 2026 00:27

jlarson4 added 4 commits February 12, 2026 22:45

fixed import

69bc817

sorted models

49fb4fb

Updated rope yarn

36d84ce

Fixed format issue

b9cf7ba

jlarson4 merged commit 98e4837 into TransformerLensOrg:dev-3.x Feb 13, 2026
15 checks passed

jlarson4 merged 86 commits into TransformerLensOrg:dev-3.x from etomoscow:olmo3-support

8 participants
