Tool artifact for the ASE 2026 Tool-Track submission "MergeSE: Post-hoc Model Merging for Software Engineering Tasks Without Retraining"
The underlying model-merging approach comes from our research-track submission (under review): "A Unified Model for Cross-Domain Clone Detection via Model Merging"
MergeSE merges fine-tuned HuggingFace encoder checkpoints (CodeBERT, GraphCodeBERT, UniXcoder, CodeT5-encoder, …) into a single model without any training data, and evaluates the result on standard software-engineering benchmarks. It ships as a single-file CLI and a web tool that share the same engine.
It implements:
- TIES-Merging (Yadav et al., NeurIPS 2023)
- DARE-TIES (Yu et al., 2024)
- Task-vector averaging (Ilharco et al., 2022)
It also provides end-to-end evaluation on nine built-in SE classification tasks (clone detection, vulnerability detection, defect prediction, code-smell detection, commit classification, code-review acceptability, comment-code consistency, exception-type prediction, and type inference) plus any custom CSV, and one-command export to HuggingFace / ONNX / TorchScript.
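For readers new to model merging, the following is a minimal sketch of the math behind two of the methods listed above: task-vector averaging and TIES-Merging (trim, elect sign, disjoint mean). It operates on flat Python lists rather than real checkpoints, and the function names and the `density` / `scale` parameters are illustrative assumptions, not MergeSE's actual API or defaults.

```python
def task_vector(finetuned, base):
    """Task vector = fine-tuned weights minus base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def average_merge(base, finetuned_models, scale=1.0):
    """Add the (scaled) mean of all task vectors back onto the base."""
    vectors = [task_vector(m, base) for m in finetuned_models]
    n = len(vectors)
    return [b + scale * sum(v[i] for v in vectors) / n for i, b in enumerate(base)]


def trim(vector, density):
    """Keep only the largest-magnitude entries; zero the rest.

    Ties at the threshold are all kept, so slightly more than
    `density` of the entries can survive.
    """
    k = max(1, round(len(vector) * density))
    threshold = sorted((abs(x) for x in vector), reverse=True)[k - 1]
    return [x if abs(x) >= threshold else 0.0 for x in vector]


def ties_merge(base, finetuned_models, density=0.2, scale=1.0):
    """TIES-style merge: trim, elect a sign per parameter, average agreeing values."""
    trimmed = [trim(task_vector(m, base), density) for m in finetuned_models]
    merged = []
    for i, b in enumerate(base):
        column = [v[i] for v in trimmed]
        # Elect the sign carrying the larger total magnitude.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Disjoint mean: average only the entries that agree with that sign.
        agreeing = [x for x in column if x * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * delta)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    model_a = [1.0, 0.25, -2.0, 0.0]
    model_b = [-1.5, 0.5, -1.0, 3.0]
    print(average_merge(base, [model_a, model_b]))            # [-0.25, 0.375, -1.5, 1.5]
    print(ties_merge(base, [model_a, model_b], density=0.5))  # [-1.5, 0.0, -2.0, 3.0]
```

In the toy example the first parameter has conflicting signs across the two models: plain averaging lets them partly cancel, while the TIES-style merge keeps only the value that agrees with the elected sign. DARE-TIES additionally drops and rescales task-vector entries at random before this step, which is omitted here.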
Pick whichever fits your environment — all three sit on top of the same merging engine, so results are identical.
| # | Path | Best for |
|---|---|---|
| 1 | Web tool via Docker | Easiest setup. One command brings up the UI and REST API. |
| 2 | Web tool without Docker | Same UI, but you'd rather run Flask directly in a venv. |
| 3 | CLI tool | Scripting, headless servers, reproducible runs, paper-grade evaluation. |
Full references: docs/WEB.md and docs/CLI.md.
git clone https://github.com/srlabUsask/MergeSE.git
cd MergeSE
docker compose up -d --build # → http://localhost:8765
Open http://localhost:8765 in your browser. Stop with `docker compose down`.
Common first-run issues:
- `permission denied … /var/run/docker.sock`: your user isn't in the `docker` group. One-time fix: `sudo usermod -aG docker $USER`, then log out and back in (or run `newgrp docker`) so the new group membership takes effect. Or prefix the one-off command with `sudo`.
- `address already in use … 8765`: something else is bound to port 8765. Either stop it (`sudo ss -ltnp | grep :8765` to find the PID), or remap the host port in `docker-compose.yml` (e.g. `"127.0.0.1:8766:8765"`) and use http://localhost:8766 instead.
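If you would rather check the port before starting the container, a small standard-library probe along these lines can tell you whether 8765 is taken and suggest the next free host port to remap to. The helper names `port_is_free` and `first_free_port` are hypothetical and not part of MergeSE.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind host:port, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=8765, attempts=20, host="127.0.0.1"):
    """Scan upward from `start` and return the first bindable port."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


if __name__ == "__main__":
    port = first_free_port()
    if port == 8765:
        print("Port 8765 is free; the default mapping should work.")
    else:
        print(f'Port 8765 is busy; try "127.0.0.1:{port}:8765" in docker-compose.yml.')
```

The probe only reports the state at the moment it runs, so another process can still claim the port before `docker compose up` does.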
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install --index-url https://download.pytorch.org/whl/cpu torch
pip install ".[server,datasets]"
python server/app.py # → http://localhost:8765
For production, swap `python server/app.py` for `gunicorn -c deploy/gunicorn.conf.py server.app:app`.
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install --index-url https://download.pytorch.org/whl/cpu torch
pip install .
mergese inspect ./model_a ./model_b --base microsoft/codebert-base
mergese merge ./model_a ./model_b --base microsoft/codebert-base \
--method ties --output ./merged
mergese evaluate ./merged --task clone_detection --test-file ./test.csv
mergese export ./merged --format onnx --output ./merged.onnx
`pip install .` installs a `mergese` console script on your `$PATH`. To run without installing, use `python mergese.py …` after `pip install -r requirements.txt`.
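For scripted or reproducible runs, the four commands above can be chained from Python. The sketch below uses only the subcommands and flags shown in this README; the wrapper functions `build_pipeline` and `run_pipeline` are hypothetical helpers, and it assumes the `mergese` console script is on your `$PATH`.

```python
import shlex
import subprocess


def build_pipeline(models, base, merged_dir, test_file,
                   method="ties", task="clone_detection"):
    """Return the inspect -> merge -> evaluate -> export commands as argv lists."""
    return [
        ["mergese", "inspect", *models, "--base", base],
        ["mergese", "merge", *models, "--base", base,
         "--method", method, "--output", merged_dir],
        ["mergese", "evaluate", merged_dir, "--task", task,
         "--test-file", test_file],
        ["mergese", "export", merged_dir, "--format", "onnx",
         "--output", merged_dir + ".onnx"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it unless dry_run is set."""
    for argv in commands:
        print("$ " + shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    commands = build_pipeline(
        models=["./model_a", "./model_b"],
        base="microsoft/codebert-base",
        merged_dir="./merged",
        test_file="./test.csv",
    )
    run_pipeline(commands, dry_run=True)  # set dry_run=False to actually run
```

Passing argv lists rather than a shell string avoids quoting problems when model paths contain spaces.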
| Task | Input | Metric | Known benchmarks |
|---|---|---|---|
| Clone detection | pair | binary F1 | BigCloneBench, CLCDSA, GPTCloneBench, POJ-104 |
| Vulnerability detection | single | binary F1 | Devign, ReVeal, Big-Vul, D2A, Draper |
| Defect / bug prediction | single | binary F1 | Defects4J, PROMISE, CodeXGLUE-Defect |
| Code-smell detection | single | binary F1 | MLCQ, Qualitas |
| Commit classification | single | macro F1 | CommitBench |
| Code-review acceptability | pair | binary F1 | CodeReview |
| Comment-code consistency | pair | binary F1 | comment-consistency datasets |
| Exception-type prediction | single | macro F1 | CodeXGLUE-Exception |
| Type inference (closed) | single | macro F1 | Typilus, Type4Py, ManyTypes4Py |
| Custom (any CSV) | auto | auto | — |
mergese tasks (CLI) or GET /api/tasks (web) returns the same list.
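The Metric column distinguishes binary F1 (score of the positive class only) from macro F1 (unweighted mean of per-class F1). As a reference for interpreting the numbers, here is a dependency-free sketch of both; it follows the standard definitions and is not taken from MergeSE's evaluation code, and the choice of `1` as the positive label is an assumption.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 of one class, treating `label` as positive and everything else as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0 when the class is never present nor predicted.
    return 2 * tp / denom if denom else 0.0


def binary_f1(y_true, y_pred, positive=1):
    """Binary F1: only the positive class counts."""
    return f1_for_class(y_true, y_pred, positive)


def macro_f1(y_true, y_pred):
    """Macro F1: every class seen in labels or predictions counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    print(round(binary_f1(y_true, y_pred), 4))  # 0.6667
    print(round(macro_f1(y_true, y_pred), 4))   # 0.7333
```

Macro F1 weights rare classes as heavily as frequent ones, which is why it is the usual choice for the multi-class tasks in the table.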
When models have differently-shaped classifier heads (e.g. a 2-class clone
detector + a 10-class commit classifier), MergeSE auto-detects the mismatch
and runs an encoder-only merge. The base's head is preserved so you can
attach a fresh task-specific head downstream. Force this with
--encoder-only, or override with --include-heads.
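To make the head-mismatch rule concrete, the sketch below compares parameter shapes across checkpoints and falls back to an encoder-only plan when any head parameter disagrees. It is an illustration of the idea only: the `plan_merge` helper, the `classifier.` prefix, and the dict-of-shapes input are assumptions, not MergeSE's actual detection logic.

```python
def plan_merge(checkpoints, head_prefix="classifier."):
    """Decide which parameters to merge.

    checkpoints: list of {parameter_name: shape_tuple} dicts, one per model.
    head_prefix: assumed name prefix of classifier-head parameters.
    """
    names = set().union(*checkpoints)
    # A parameter conflicts if its shape differs or it is missing somewhere.
    conflicts = {n for n in names if len({c.get(n) for c in checkpoints}) > 1}
    # Any conflicting head parameter triggers an encoder-only merge.
    encoder_only = any(n.startswith(head_prefix) for n in conflicts)
    mergeable = sorted(
        n for n in names
        if n not in conflicts
        and not (encoder_only and n.startswith(head_prefix))
    )
    return {
        "encoder_only": encoder_only,
        "merge": mergeable,
        "skipped": sorted(names - set(mergeable)),
    }


if __name__ == "__main__":
    clone_detector = {
        "encoder.layer.0.weight": (768, 768),
        "classifier.weight": (2, 768),
        "classifier.bias": (2,),
    }
    commit_classifier = {
        "encoder.layer.0.weight": (768, 768),
        "classifier.weight": (10, 768),
        "classifier.bias": (10,),
    }
    plan = plan_merge([clone_detector, commit_classifier])
    print(plan["encoder_only"])  # True
    print(plan["merge"])         # ['encoder.layer.0.weight']
```

With a real checkpoint, the shape map could be built from a loaded state dict, for example `{name: tuple(t.shape) for name, t in state_dict.items()}`.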
MergeSE/
├── mergese.py # the entire CLI (single file)
├── mergese_tasks.py # task registry
├── server/
│ ├── app.py # Flask backend
│ └── presets.json # example workflows
├── frontend/
│ ├── index.html
│ ├── styles.css
│ ├── app.js
│ └── favicon.svg
├── deploy/
│ ├── nginx.conf # reverse-proxy site
│ ├── mergese.service # systemd unit
│ └── gunicorn.conf.py
├── data/
│ └── benchmarks/ # 200-row bundled samples + index.json
├── tests/
│ ├── test_merge_math.py
│ └── test_tasks_and_heads.py
├── docs/
│ ├── CLI.md # full CLI reference
│ └── WEB.md # full web-tool reference
├── pyproject.toml
├── requirements.txt
├── Dockerfile
└── docker-compose.yml
| Name | Rows | Task | Source |
|---|---|---|---|
| bundled://bigclonebench | 200 (100/100) | clone detection (Java) | CodeXGLUE / BigCloneBench |
| bundled://clcdsa | 200 (100/100) | cross-language clones (Java↔Python) | CLCDSA Source Codes |
| bundled://gptclonebench | 200 (100/100) | semantic clones (Java) | GPTCloneBench standalone |
These are sampled from the original benchmarks for smoke-testing only. For
paper-grade numbers, point --test-file at the full dataset.
If you use MergeSE itself, please cite the tool paper:
@inproceedings{roy2026mergese,
author = {Palash R. Roy and Banani Roy and Chanchal K. Roy and Kevin A. Schneider},
title = {MergeSE: Post-hoc Model Merging for Software Engineering Tasks Without Retraining},
booktitle = {Proc. ASE Tool Track},
year = {2026}
}
If you use the merging methodology MergeSE packages, please also cite our research-track paper:
@inproceedings{roy2026unified,
author = {Palash R. Roy and Banani Roy and Chanchal K. Roy and Kevin A. Schneider},
title = {A Unified Model for Cross-Domain Clone Detection via Model Merging},
booktitle = {Proc. ASE},
year = {2026}
}
Apache-2.0. See LICENSE.