feat: Introduce Taskfile-based workflow by vishnya · Pull Request #3 · lean-dojo/LeanAgent

vishnya · 2025-06-15T23:03:18Z

PR: `feat`: Introduce Task-based workflow for all project operations

This PR introduces Taskfile as the new, unified entry point for all developer and user-facing operations, such as setup, testing, and running experiments. The motivation is to replace a collection of standalone scripts and manual command sequences with a single, self-documenting, and reproducible workflow. This simplifies onboarding and ensures consistency across all environments.

Summary of Notable Changes (File by File)

| File | Change | Rationale |
| --- | --- | --- |
| `Taskfile.yaml` | Added | The new heart of the workflow. Defines a set of clean, high-level tasks (`setup`, `test`, `run`, `run_fisher`, etc.) that orchestrate all necessary environment setup, downloads, and script executions. All configuration variables are documented with inline comments. |
| `run_compute_fisher.sh` | Deleted | This script's logic (environment setup, file cleanup, Ray management, and Python execution) has been fully absorbed into the `run_fisher` task in `Taskfile.yaml`, removing redundancy. |
| `replace_files.sh` | Modified | The script no longer leaves the temporary `ld_path.txt` and `pl_path.txt` files behind; it now cleans them up upon completion. This keeps the project directory clean without needing `.gitignore` entries. |
| `tests/test_taskfile.py` | Added | A new test file that validates `Taskfile.yaml`. It ensures that critical tasks are defined and that the file does not contain unresolved template placeholders. |
| `pytest.ini` | Added | Configures `pytest` to look for tests exclusively within the `tests/` directory. This prevents it from discovering and running tests from downloaded dependency repositories (e.g., in `data/raid/repos_new`). |
| `README.md` | Modified | Updated to reflect the new `Taskfile`-based workflow, instructing users to run `task <command>` instead of using individual scripts. |
| `.gitignore` | Modified | Removed ignore patterns for temporary files that are now cleaned up automatically by their generating scripts. |
| `requirements.txt` | No Change | The final requirements are identical to `main`. |
| `dynamic_database.py` | Modified | Fixed misplaced comment blocks that were breaking runs. |
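
For readers who want a feel for what a check like `tests/test_taskfile.py` might do, here is a minimal sketch. It is not the test from this PR: the inlined Taskfile text, the helper names (`defined_tasks`, `missing_tasks`, `find_placeholders`), the two-space indentation, and the `<UPPER_CASE>` placeholder style are all assumptions made for illustration. It uses only the standard library so it runs without a YAML parser.

```python
import re

# Hypothetical Taskfile content, inlined so the sketch is self-contained.
# A real test would read Taskfile.yaml from the repository root instead.
SAMPLE_TASKFILE = """\
version: '3'

tasks:
  setup:
    cmds:
      - echo "setup"
  test:
    cmds:
      - echo "test"
  run:
    cmds:
      - echo "run"
  run_fisher:
    cmds:
      - echo "run_fisher"
"""

# Task names taken from the PR description.
REQUIRED_TASKS = {"setup", "test", "run", "run_fisher"}

# Assumed placeholder style: angle-bracketed upper-case tokens such as <REPO_URL>.
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9_]*>")


def defined_tasks(text):
    """Return the keys nested directly under the top-level `tasks:` key.

    Assumes two-space indentation; this is a line scanner, not a YAML parser.
    """
    names = set()
    in_tasks = False
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if not line.startswith(" "):
            # A new top-level key starts or ends the tasks section.
            in_tasks = line.rstrip() == "tasks:"
            continue
        if in_tasks:
            match = re.match(r"  ([\w.-]+):", line)
            if match:
                names.add(match.group(1))
    return names


def missing_tasks(text, required=REQUIRED_TASKS):
    """Return the required task names that the Taskfile does not define."""
    return set(required) - defined_tasks(text)


def find_placeholders(text):
    """Return every unresolved placeholder token found in the text."""
    return PLACEHOLDER.findall(text)


def test_taskfile_is_valid():
    assert missing_tasks(SAMPLE_TASKFILE) == set()
    assert find_placeholders(SAMPLE_TASKFILE) == []
```

Under `pytest`, `test_taskfile_is_valid` would be collected automatically; a real version would likely parse the file with a YAML library rather than scanning lines.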

How to Test This PR

Reviewers can optionally validate these changes by checking out the branch and running the primary workflows, which now feel much cleaner:

# 1. Complete one-time project setup
task setup

# 2. Run the test suite
task test

# 3. Launch the main training and proving process
task run

# 4. Run the Elastic Weight Consolidation (EWC) workflow
task run_fisher

Known Issues & Next Steps

- Google Drive quota: The `download_checkpoint_data` task, part of the setup workflow, relies on `gdown` to fetch large files from Google Drive. During heavy testing, it's possible to hit a download quota, which appears to last up to 24 hours. A future improvement would be to host these artifacts on a more robust platform (e.g., Hugging Face Hub, AWS S3).
- Refactor `replace_files.sh`: The file-patching mechanism in `replace_files.sh` is effective but somewhat brittle. A more robust, Python-based solution for applying these patches would be a valuable next step.
- Create a config file that scripts can read from (as in the TODOs).
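
One way to soften the Google Drive quota issue, sketched here purely as an illustration, is to try several artifact sources in order and fall back when one fails. Nothing below is part of this PR: `download_with_fallback` and the two stand-in downloaders are hypothetical, and the real `gdown` or mirror calls are deliberately left out so the example stays self-contained.

```python
from typing import Callable, List, Sequence, Tuple

# A downloader takes a destination path and writes the artifact there,
# raising an exception on failure (for example, when a quota is exceeded).
Downloader = Callable[[str], None]


def download_with_fallback(dest: str, sources: Sequence[Tuple[str, Downloader]]) -> str:
    """Try each (name, downloader) pair in order; return the name that succeeded.

    Raises RuntimeError listing every failure if no source works.
    """
    errors: List[str] = []
    for name, fetch in sources:
        try:
            fetch(dest)
            return name
        except Exception as exc:  # any failure means: try the next source
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Hypothetical stand-ins for real downloaders, used only to show the flow.
def fake_gdrive(dest: str) -> None:
    raise RuntimeError("quota exceeded")


def fake_mirror(dest: str) -> None:
    # A real implementation would fetch the file and write it to `dest`.
    pass


if __name__ == "__main__":
    used = download_with_fallback(
        "checkpoint.bin",
        [("gdrive", fake_gdrive), ("mirror", fake_mirror)],
    )
    print(f"downloaded from: {used}")
```

Passing the downloaders in as callables keeps the retry logic independent of any particular hosting platform, so a mirror could be added later without touching the fallback code.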

vishnya · 2025-06-15T23:07:05Z

@Adarsh321123 @motiwari please review!

motiwari · 2025-06-16T14:59:53Z

This looks way, way better than having the user follow many different instructions (with many different ways things could go wrong) -- and I'm learning how to use Taskfiles effectively for the first time. Thank you for doing this!

I reviewed the code changes and they look good.

The only question I have is whether these changes preserve correctness. Does the code in this PR still produce the same results from the original paper?

vishnya · 2025-06-16T17:32:38Z

Hi @motiwari! That's a great point. For me the runs take a very long time, so it's hard to tell. Do you happen to have a toy benchmark dataset that we can use for mocking? If not, we should create one.

Adarsh321123 · 2025-06-18T16:08:48Z

Hi @vishnya. Thanks for this contribution! The Taskfile-based workflow is a huge improvement for developer experience and onboarding. To sanity-check correctness, we can simply run the new `task run` workflow on a small set of repos (like just Compfiles and MIL) and compare key metrics/outputs against those in the paper. Moreover, to check that the entire workflow works, we can use a separate blank repo. You can do both quickly by following the `README.md` and then hardcoding those repositories in `leanagent.py`.

vishnya · 2025-06-21T20:52:59Z

Hi folks! I was too busy with work last week and haven't had a chance to test. Running the new `task run` on a small set of repos, and separately on a blank repo, makes sense, although it seems fairer to compare the results against running the old flow on the same repos, rather than against the paper results.

vishnya · 2025-06-23T12:40:46Z

@Adarsh321123 @motiwari

1/ Want to confirm that we want to test the following way, and whether you think the test will be straightforward to implement (i.e. you've done something similar before):
For (a) a blank repo and (b) 1-2 repos, compare the results of a run using:

- the existing code, with
- the proposed Taskfile code.

2/ Which metrics should we focus on, and what range of values/margin of error is acceptable?

Adarsh321123 · 2025-06-23T17:29:08Z

@vishnya

1/ Yes, testing that way for (a) and (b) is straightforward.
2/ It would be easiest to focus on LeanAgent's accuracy during lifelong learning for PFR and compare that to the 2.7% reported in Table 5 in the paper. This is somewhat stochastic and may deserve a few runs.
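
A minimal sketch of the comparison discussed in this thread: run each workflow a few times, average the accuracy, and check whether the two averages agree within a tolerance. The function name `compare_runs`, the sample numbers, and the tolerance are all illustrative assumptions; the thread does not settle on an acceptable margin, and the numbers are not measurements.

```python
from statistics import mean


def compare_runs(old_scores, new_scores, tolerance):
    """Compare mean accuracy (in percent) of two workflows over several runs.

    Returns a dict with both means, their absolute difference, and whether
    that difference falls within `tolerance` percentage points.
    """
    old_mean = mean(old_scores)
    new_mean = mean(new_scores)
    diff = abs(old_mean - new_mean)
    return {
        "old_mean": old_mean,
        "new_mean": new_mean,
        "diff": diff,
        "within_tolerance": diff <= tolerance,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not real results.
    old_flow = [2.5, 2.9, 2.7]
    taskfile_flow = [2.6, 2.8, 2.7]
    report = compare_runs(old_flow, taskfile_flow, tolerance=0.3)
    print(report)
```

Because the metric is described as somewhat stochastic, averaging over several runs per workflow gives a steadier comparison than a single run of each.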

motiwari · 2025-06-25T23:33:20Z

Hi @vishnya, my apologies for the delay in getting back to you after our 1:1 discussion.

The steps @Adarsh321123 mentioned seem good. Let us know if you need more details on how to run everything. @Adarsh321123 and I are also discussing setting up a lighter testing framework in #4 and #5.

vishnya force-pushed the rp/taskification branch 2 times, most recently from 8ac5ec0 to ee8399a · June 16, 2025 01:51

feat: adopt Taskfile workflow – developers and users can now run all …

1b6d85c

…operations via Taskfile pattern

vishnya force-pushed the rp/taskification branch from ee8399a to 1b6d85c · June 16, 2025 01:56

vishnya wants to merge 1 commit into lean-dojo:main from vishnya:rp/taskification
