@Adarsh321123 @motiwari please review! |
|
This looks way, way better than having the user follow many different instructions (with many different ways things could go wrong) -- and I'm learning how to use Taskfiles effectively for the first time. Thank you for doing this! I reviewed the code changes and they look good. The only question I have is whether these changes preserve correctness. Does the code in this PR still reproduce the results from the original paper? |
|
Hi @motiwari! That’s a great point. For me the runs take a very long time, so it’s hard to tell. Do you happen to have a toy benchmark dataset that we can use for mocking? If not, we should create one. |
|
Hi @vishnya. Thanks for this contribution! The Taskfile-based workflow is a huge improvement for developer experience and onboarding. For sanity checking correctness, we can simply run the new task run workflow on a small set of repos (like just Compfiles and MIL) and compare key metrics/outputs against those in the paper. Moreover, to check that the entire workflow works, we can use a separate blank repo. You can quickly do these by following the |
|
Hi folks! I was too busy with work last week and haven't had a chance to test. Running the new task run on a small set of repos, and separately on a blank repo, makes sense, although it seems fairer to compare the results against running the old flow on the same repos, rather than against the paper results. |
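The old-flow versus new-flow comparison suggested here could be sketched roughly as below. This is only an illustration: the metric names and values are hypothetical placeholders, and it assumes each flow's key metrics can be collected into a flat dictionary of numbers. The tolerance is also an assumption, since runs may not be bit-for-bit reproducible.

```python
import math


def compare_metrics(old, new, rel_tol=1e-6):
    """Return human-readable mismatches between two metric dictionaries."""
    problems = []
    for key in sorted(set(old) | set(new)):
        if key not in old:
            problems.append(f"{key}: missing from old flow")
        elif key not in new:
            problems.append(f"{key}: missing from new flow")
        elif not math.isclose(old[key], new[key], rel_tol=rel_tol):
            problems.append(f"{key}: old={old[key]} new={new[key]}")
    return problems


# Hypothetical placeholder values, for illustration only (not real results).
old_run = {"Compfiles": 0.5, "MIL": 0.25}
new_run = {"Compfiles": 0.5, "MIL": 0.26}

mismatches = compare_metrics(old_run, new_run)
for line in mismatches:
    print(line)
```

An empty list would mean the two flows agree on every metric within the chosen tolerance; any entry points at a metric worth investigating before merging.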
|
1/ Want to confirm that we want to test the following way, and whether or not you think the test will be straightforward to implement (i.e. you've done something similar before):
|
|
1/ Yes, testing that way for (a) and (b) is straightforward. |
|
Hi @vishnya, my apologies for the delay in getting back to you after our 1:1 discussion. The steps @Adarsh321123 mentioned seem good. Let us know if you need more details on how to run everything. @Adarsh321123 and I are also discussing setting up a lighter testing framework in #4 and #5.
PR:
feat: Introduce Task-based workflow for all project operations
This PR introduces Taskfile as the new, unified entry point for all developer and user-facing operations, such as setup, testing, and running experiments. The motivation is to replace a collection of standalone scripts and manual command sequences with a single, self-documenting, and reproducible workflow. This simplifies onboarding and ensures consistency across all environments.
Summary of Notable Changes (File by File)
- Taskfile.yaml: defines tasks (setup, test, run, run_fisher, etc.) that orchestrate all necessary environment setup, downloads, and script executions. All configuration variables are documented with inline comments.
- run_compute_fisher.sh: its logic is now handled by the run_fisher task in Taskfile.yaml, removing redundancy.
- replace_files.sh: no longer leaves behind the ld_path.txt and pl_path.txt files, as it now cleans them up upon completion. This keeps the project directory clean without needing .gitignore entries.
- tests/test_taskfile.py: validates Taskfile.yaml. It ensures that critical tasks are defined and that the file does not contain unresolved template placeholders.
- pytest.ini: configures pytest to look for tests exclusively within the tests/ directory. This prevents it from discovering and running tests from downloaded dependency repositories (e.g., in data/raid/repos_new).
- README.md: describes the Taskfile-based workflow, instructing users to run task <command> instead of using individual scripts.
- .gitignore, requirements.txt, main.dynamic_database.py
How to Test This PR
Reviewers can optionally validate these changes by checking out the branch and running the primary workflows, which now feel much cleaner:
Known Issues & Next Steps
- The download_checkpoint_data task, part of the setup workflow, relies on gdown to fetch large files from Google Drive. During heavy testing, it's possible to hit a download quota, which appears to last up to 24 hours. A future improvement would be to host these artifacts on a more robust platform (e.g., Hugging Face Hub, AWS S3).
- replace_files.sh: The file-patching mechanism in replace_files.sh is effective but somewhat brittle. A more robust, Python-based solution for applying these patches would be a valuable next step.
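For readers unfamiliar with the kind of structural check described for tests/test_taskfile.py in the file summary, a minimal sketch might look like the following. It is not the code from this PR. It assumes task names are keys indented by exactly two spaces under a top-level tasks: key, and it assumes an unresolved placeholder looks like <UPPER_CASE>. Both conventions, and the sample Taskfile text, are hypothetical.

```python
import re

# Task names mentioned in the PR description.
REQUIRED_TASKS = {"setup", "test", "run", "run_fisher"}

# Hypothetical placeholder convention, e.g. <YOUR_TOKEN>.
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9_]*>")


def find_task_names(taskfile_text):
    """Collect keys indented two spaces under the top-level 'tasks:' key."""
    names = set()
    in_tasks = False
    for line in taskfile_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if re.match(r"^\S", line):
            # Any top-level key either opens or closes the tasks section.
            in_tasks = line.rstrip().startswith("tasks:")
            continue
        match = re.match(r"^  ([\w-]+):\s*(#.*)?$", line)
        if in_tasks and match:
            names.add(match.group(1))
    return names


def validate_taskfile(taskfile_text):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    missing = REQUIRED_TASKS - find_task_names(taskfile_text)
    if missing:
        problems.append("missing tasks: " + ", ".join(sorted(missing)))
    leftovers = sorted(set(PLACEHOLDER.findall(taskfile_text)))
    if leftovers:
        problems.append("unresolved placeholders: " + ", ".join(leftovers))
    return problems


# Hypothetical Taskfile content, for illustration only.
SAMPLE = """\
version: '3'

vars:
  DATA_DIR: data

tasks:
  setup:
    cmds:
      - echo setup
  test:
    cmds:
      - pytest
  run:
    cmds:
      - echo run
  run_fisher:
    cmds:
      - echo fisher
"""

print(validate_taskfile(SAMPLE))
```

In a pytest test, the same function could be applied to the text of the repository's Taskfile.yaml and the result asserted to be empty. A line-based scan avoids a YAML dependency but is stricter about indentation than a real parser would be.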