@rohansen856
Contributor

Stacked PR, depends on #1576

This PR adds Studies v2 migration.

A question:
The pre-commit hook would not let me define a function with six arguments, so I worked around that with this instead:
openml\_api\resources\studies.py (lines 10-15)

        limit = kwargs.get("limit")
        offset = kwargs.get("offset")
        status = kwargs.get("status")
        main_entity_type = kwargs.get("main_entity_type")
        uploader = kwargs.get("uploader")
        benchmark_suite = kwargs.get("benchmark_suite")

I would like to confirm whether this approach is acceptable. Raising a draft PR for now.
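
One trade-off of the kwargs workaround can be sketched with two hypothetical stand-in functions (neither is the code in this PR): a misspelled filter name is silently dropped by the `**kwargs` version, while an explicit keyword-only signature rejects it with a `TypeError`.

```python
from __future__ import annotations

from typing import Any

FILTER_NAMES = ("limit", "offset", "status", "main_entity_type", "uploader", "benchmark_suite")


def list_with_kwargs(**kwargs: Any) -> dict[str, Any]:
    """Stand-in for the kwargs workaround: unknown keys are never looked at."""
    return {name: kwargs.get(name) for name in FILTER_NAMES}


def list_explicit(  # noqa: PLR0913
    *,
    limit: int | None = None,
    offset: int | None = None,
    status: str | None = None,
    main_entity_type: str | None = None,
    uploader: list[int] | None = None,
    benchmark_suite: int | None = None,
) -> dict[str, Any]:
    """Stand-in with an explicit keyword-only signature: unknown keys raise TypeError."""
    return {
        "limit": limit,
        "offset": offset,
        "status": status,
        "main_entity_type": main_entity_type,
        "uploader": uploader,
        "benchmark_suite": benchmark_suite,
    }


# The typo "limt" is silently ignored by the kwargs version ...
filters = list_with_kwargs(limt=10)

# ... but rejected by the explicit signature.
try:
    list_explicit(limt=10)  # type: ignore[call-arg]
    typo_rejected = False
except TypeError:
    typo_rejected = True
```

The explicit form also keeps the parameter names and types visible to type checkers and editors, which the kwargs form hides.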

@codecov-commenter

codecov-commenter commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.26%. Comparing base (645ef01) to head (9170edc).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| openml/_api/resources/studies.py | 21.05% | 30 Missing ⚠️ |
| openml/_api/http/client.py | 82.60% | 12 Missing ⚠️ |
| openml/_api/resources/tasks.py | 87.23% | 6 Missing ⚠️ |
| openml/_api/runtime/fallback.py | 0.00% | 6 Missing ⚠️ |
| openml/_api/runtime/core.py | 82.14% | 5 Missing ⚠️ |
| openml/_api/resources/datasets.py | 77.77% | 2 Missing ⚠️ |
| openml/_api/__init__.py | 75.00% | 1 Missing ⚠️ |
| openml/_api/config.py | 96.87% | 1 Missing ⚠️ |
| openml/study/functions.py | 75.00% | 1 Missing ⚠️ |
| openml/tasks/functions.py | 87.50% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1610      +/-   ##
==========================================
+ Coverage   52.78%   54.26%   +1.48%     
==========================================
  Files          36       47      +11     
  Lines        4331     4559     +228     
==========================================
+ Hits         2286     2474     +188     
- Misses       2045     2085      +40     

@geetu040 geetu040 mentioned this pull request Jan 9, 2026
25 tasks
@rohansen856
Contributor Author

I switched to a noqa comment instead of the kwargs workaround, following this example from openml\testing.py:

    def _check_fold_timing_evaluations(  # noqa: PLR0913
        self,
        fold_evaluations: dict[str, dict[int, dict[int, float]]],
        num_repeats: int,
        num_folds: int,
        *,
        max_time_allowed: float = 60000.0,
        task_type: TaskType = TaskType.SUPERVISED_CLASSIFICATION,
        check_scores: bool = True,
    ) -> None:

Final function signature:

    def list(  # noqa: PLR0913
        self,
        limit: int | None = None,
        offset: int | None = None,
        status: str | None = None,
        main_entity_type: str | None = None,
        uploader: list[int] | None = None,
        benchmark_suite: int | None = None,
    ) -> Any:
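
For illustration, here is a minimal sketch of how the six filters in this signature might be turned into a request path. The helper name `build_study_list_path`, the `study/list` prefix, the `/name/value` layout, and the comma-joined uploader IDs are all assumptions made for this example, not code from this PR.

```python
from __future__ import annotations


def build_study_list_path(  # noqa: PLR0913
    limit: int | None = None,
    offset: int | None = None,
    status: str | None = None,
    main_entity_type: str | None = None,
    uploader: list[int] | None = None,
    benchmark_suite: int | None = None,
) -> str:
    """Hypothetical helper: append every filter that was set as /name/value."""
    filters = {
        "limit": limit,
        "offset": offset,
        "status": status,
        "main_entity_type": main_entity_type,
        "uploader": uploader,
        "benchmark_suite": benchmark_suite,
    }
    path = "study/list"  # assumed endpoint prefix
    for name, value in filters.items():
        if value is None:
            continue  # unset filters are left out of the path
        if isinstance(value, list):
            # assumed encoding for multi-valued filters
            value = ",".join(str(item) for item in value)
        path += f"/{name}/{value}"
    return path


print(build_study_list_path(limit=10, status="active"))
```

Because every parameter defaults to `None`, calling the helper with no arguments yields the bare prefix, and filters appear in the order they are declared in the signature.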

@rohansen856 rohansen856 marked this pull request as ready for review January 13, 2026 07:21
Contributor

@geetu040 geetu040 left a comment

Good work. Just use the listing approach suggested in #1575 (comment), which is already similar to what you have done.

@rohansen856
Contributor Author

@geetu040 I reviewed the specific changes needed and have a small question about the pandas implementation.
As I understand it, I need to use a pandas DataFrame instead of Any as the return type in openml\_api\resources\base.py, like this:

    class StudiesAPI(ResourceAPI, ABC):
        @abstractmethod
        def list(  # noqa: PLR0913
            self,
            limit: int | None = None,
            offset: int | None = None,
            status: str | None = None,
            main_entity_type: str | None = None,
            uploader: list[int] | None = None,
            benchmark_suite: int | None = None,
        ) -> pd.DataFrame: ...

Similarly, I have to change the return value in openml\_api\resources\studies.py from this:

        return response.text

to this:

        xml_string = response.text

        # Parse XML and convert to DataFrame
        study_dict = xmltodict.parse(xml_string, force_list=("oml:study",))

        # Minimalistic check if the XML is useful
        assert isinstance(study_dict["oml:study_list"]["oml:study"], list), type(
            study_dict["oml:study_list"]["oml:study"],
        )
        assert (
            study_dict["oml:study_list"]["@xmlns:oml"] == "http://openml.org/openml"
        ), study_dict["oml:study_list"]["@xmlns:oml"]

        studies = {}
        for study_ in study_dict["oml:study_list"]["oml:study"]:
            # maps from xml name to a tuple of (dict name, casting fn)
            expected_fields = {
                "oml:id": ("id", int),
                "oml:alias": ("alias", str),
                "oml:main_entity_type": ("main_entity_type", str),
                "oml:benchmark_suite": ("benchmark_suite", int),
                "oml:name": ("name", str),
                "oml:status": ("status", str),
                "oml:creation_date": ("creation_date", str),
                "oml:creator": ("creator", int),
            }
            study_id = int(study_["oml:id"])
            current_study = {}
            for oml_field_name, (real_field_name, cast_fn) in expected_fields.items():
                if oml_field_name in study_:
                    current_study[real_field_name] = cast_fn(study_[oml_field_name])
            current_study["id"] = int(current_study["id"])
            studies[study_id] = current_study

        return pd.DataFrame.from_dict(studies, orient="index")
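
The per-study loop above can be tried out without a server. The sketch below is a standard-library variant that swaps in `xml.etree.ElementTree` for `xmltodict` so it runs without extra dependencies; the function name `parse_study_list` and the sample XML are invented for this example and are not real server output.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

OML_NS = "http://openml.org/openml"

# Field name -> casting function, mirroring expected_fields in the snippet above.
EXPECTED_FIELDS = {
    "id": int,
    "alias": str,
    "main_entity_type": str,
    "benchmark_suite": int,
    "name": str,
    "status": str,
    "creation_date": str,
    "creator": int,
}

# Made-up sample payload; the second study omits several optional fields.
SAMPLE_XML = """<oml:study_list xmlns:oml="http://openml.org/openml">
  <oml:study>
    <oml:id>1</oml:id>
    <oml:alias>demo-suite</oml:alias>
    <oml:main_entity_type>task</oml:main_entity_type>
    <oml:name>Demo suite</oml:name>
    <oml:status>active</oml:status>
    <oml:creator>7</oml:creator>
  </oml:study>
  <oml:study>
    <oml:id>2</oml:id>
    <oml:main_entity_type>run</oml:main_entity_type>
    <oml:name>Demo study</oml:name>
    <oml:status>in_preparation</oml:status>
  </oml:study>
</oml:study_list>"""


def parse_study_list(xml_string: str) -> dict[int, dict[str, object]]:
    """Return {study_id: {field: value}}, skipping fields that are absent."""
    root = ET.fromstring(xml_string)
    if root.tag != f"{{{OML_NS}}}study_list":
        raise ValueError(f"unexpected root element: {root.tag}")

    studies: dict[int, dict[str, object]] = {}
    for study in root.findall(f"{{{OML_NS}}}study"):
        record: dict[str, object] = {}
        for field, cast in EXPECTED_FIELDS.items():
            text = study.findtext(f"{{{OML_NS}}}{field}")
            if text is not None:
                record[field] = cast(text)
        studies[int(record["id"])] = record
    return studies


studies = parse_study_list(SAMPLE_XML)
```

The resulting dict has the same shape as `studies` in the snippet above, so `pd.DataFrame.from_dict(studies, orient="index")` would apply to it unchanged; fields missing from a study would show up as NaN in the frame.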

Three files would be affected in total: openml\_api\resources\base.py, openml\_api\resources\studies.py and openml\study\functions.py

Could you please confirm this approach? After that I will update the PR.

@geetu040
Contributor

@rohansen856 yes sounds right

@rohansen856
Contributor Author

Updated! Ready for review.
