Skip to content

Conversation

@Samoed
Copy link
Member

@Samoed Samoed commented Nov 11, 2025

If you add a model or a dataset, please add the corresponding checklist:

@KennethEnevoldsen KennethEnevoldsen changed the title feat: add typecheck fix: add typecheck Nov 12, 2025
return type(self).model_construct(model_results=new_model_results)

def join_revisions(self) -> Self:
def join_revisions(self) -> "BenchmarkResults":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it needs the "

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change to this, but would require __future__

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer __future__


def __iter__(self) -> Iterator[ModelResult]:
@override
def __iter__(self) -> Iterator[ModelResult]: # type: ignore[override]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I get the type error here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseModel have it's own iterator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, but it still gives you a type error even with the decorator (seems to me like we are specifying it twice)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll remove it

"""
sentences = [text for batch in inputs for text in batch["text"]]
instruction = self.get_task_instruction(task_metadata, prompt_type)
instruction: str | None = self.get_task_instruction(task_metadata, prompt_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be annotated by the function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get task instruction will always return string, but later we have instructions = None and this cause conflict. Probably later can be changed to empty string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I would probably do like this:

Suggested change
instruction: str | None = self.get_task_instruction(task_metadata, prompt_type)
instruction: str | None # further up
instruction = self.get_task_instruction(task_metadata, prompt_type)

otherwise we are suggesting that the function could also return None even though it can't

exclude_aggregate: bool = False,
exclude_private: bool = False,
) -> list[type[AbsTask]]: ...
T = TypeVar("T", AbsTask, type[AbsTask])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this guarantee that if I provide AbsTask, then I won't get an type[AbsTask] out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how does this influence the docs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this guarantee that if I provide AbsTask, then I won't get an type[AbsTask] out?

Yes

Also how does this influence the docs?

I'll check

def evaluate(
model: ModelMeta | MTEBModels | SentenceTransformer | CrossEncoder,
tasks: AbsTask | Iterable[AbsTask],
tasks: AbsTask | Benchmark | Iterable[AbsTask | Benchmark],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be an iterable of benchmarks? Benchmark should be an iterable of AbsTask

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it seems that we don't support this, because I mistakenly looked to deprecated evaluator

def test_run_using_benchmark(model: mteb.EncoderProtocol, tmp_path: Path):

Probably we need to add a test for new evaluator too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add support for it, but I am leaning toward not doing it. Keeps it cleaner and it is not like a for loop across benchmarks is hard to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

) -> tuple[MTEBModels | ModelMeta, ModelMeta, ModelName, Revision]:
from sentence_transformers import CrossEncoder, SentenceTransformer

wrapped: MTEBModels | ModelMeta
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wrapped: MTEBModels | ModelMeta
wrapped_model: MTEBModels | ModelMeta

Comment on lines -73 to -76
self.tasks = list(tasks)
if len(self.tasks) > 0 and isinstance(self.tasks[0], Benchmark):
if isinstance(tasks, list) and all(
isinstance(task, Benchmark) for task in tasks
):
self.benchmarks = tasks
self.tasks = list(chain.from_iterable(self.tasks))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsure why this was changed?

def generate_model_card(
model_name: str,
tasks: list[AbsTask] | None = None,
tasks: Sequence[AbsTask] | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterable or sequence?

Comment on lines +63 to +67
class AbsMetrics(TypedDict):
"""The abstract class for the metrics returned by the tasks"""

...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure why this is added?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to standartize, because dict is not compatible with mappting, but I will remove this I think and change evaluate subset to Mapping, which can handle both

@github-actions
Copy link
Contributor

This pull request has been automatically marked as stale due to inactivity.

@github-actions github-actions bot added the stale label Nov 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants