Commit c083b17

Update sentencepiece IO test (#406)
- Switch to tiny testing model to reduce memory usage
- Use slow tokenizer to test sentencepiece requirement
- Add sentencepiece extra to dev requirements
1 parent f0b475d commit c083b17

2 files changed: +3 -2 lines changed

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 spacy>=3.5.0,<4.0.0
 numpy>=1.15.0
-transformers>=3.4.0,<4.35.0
+transformers[sentencepiece]>=3.4.0,<4.35.0
 torch>=1.8.0
 srsly>=2.4.0,<3.0.0
 dataclasses>=0.6,<1.0; python_version < "3.7"
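
The `[sentencepiece]` extra is expected to install the `sentencepiece` package alongside `transformers`. A minimal sketch, assuming only the standard library, of how a dev environment could be checked for that module before running the test. The helper name `is_installed` is hypothetical and not part of this commit.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the slow tokenizer needs the top-level module "sentencepiece".
    print("sentencepiece available:", is_installed("sentencepiece"))
```

A check like this only reports whether the module can be found; it does not verify the installed version.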

spacy_transformers/tests/test_pipeline_component.py

Lines changed: 2 additions & 1 deletion
@@ -238,7 +238,8 @@ def test_transformer_pipeline_tagger_senter_listener():
 def test_transformer_sentencepiece_IO():
     """Test that a transformer using sentencepiece trains + IO goes OK"""
     orig_config = Config().from_str(cfg_string)
-    orig_config["components"]["transformer"]["model"]["name"] = "camembert-base"
+    orig_config["components"]["transformer"]["model"]["name"] = "hf-internal-testing/tiny-xlm-roberta"
+    orig_config["components"]["transformer"]["model"]["tokenizer_config"] = {"use_fast": False}
     nlp = util.load_model_from_config(orig_config, auto_fill=True, validate=True)
     tagger = nlp.get_pipe("tagger")
     tagger_trf = tagger.model.get_ref("tok2vec").layers[0]
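
The two config overrides in this hunk can be illustrated on a plain nested dictionary. This is a sketch only: `base` is a hypothetical stand-in for the parsed `cfg_string`, and the helper `with_tiny_sentencepiece_model` does not exist in the repository. The actual test assigns the keys directly on the `Config` object, which supports the same nested key assignment.

```python
from copy import deepcopy


def with_tiny_sentencepiece_model(config: dict) -> dict:
    """Return a copy of config that uses the tiny model and the slow tokenizer."""
    new_config = deepcopy(config)
    model_cfg = new_config["components"]["transformer"]["model"]
    # Tiny testing model, as in the diff, to reduce memory usage.
    model_cfg["name"] = "hf-internal-testing/tiny-xlm-roberta"
    # use_fast=False requests the slow tokenizer, which exercises sentencepiece.
    model_cfg["tokenizer_config"] = {"use_fast": False}
    return new_config


# Hypothetical minimal config; the real one comes from cfg_string in the test module.
base = {"components": {"transformer": {"model": {"name": "camembert-base"}}}}
updated = with_tiny_sentencepiece_model(base)
```

Copying before modifying keeps the original config intact, so other tests that share it are unaffected.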
