This is a replication package for our work on the evaluation of code generation metrics.
- The library for computing code generation metrics is available on PyPI: `pip install codegen-metrics` (a quick smoke test follows this list).
- The pre-print is available on arXiv.
- The article has been published in the Journal of Systems and Software (see the citation below).
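As a quick smoke test of the installation, you can query the installed distribution's version. This uses only the Python standard library, so it does not depend on the package's public API:

```python
# Smoke test: confirm the package installed from PyPI is visible.
# Uses the distribution name from `pip install`, not the import name.
from importlib.metadata import version

print(version("codegen-metrics"))
```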
We use Poetry to manage the environment and dependency versions.
- You can find Poetry's installation manual here.
- Run `poetry install` to set up the environment.
To run the grading scripts, you will also need to install Tkinter.
- For Linux users: `sudo apt-get install python3-tk`
- For macOS users: `brew install [email protected]`
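To verify the Tkinter installation, you can run a minimal check (it needs a display, since it briefly creates a window):

```python
# Sanity check: Tkinter is importable and can create a root window.
import tkinter

root = tkinter.Tk()
print("Tk version:", root.tk.call("info", "patchlevel"))
root.destroy()
```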
To run metric computations, you will also need tree-sitter.
- To use it, run `git clone https://github.com/tree-sitter/tree-sitter-python.git build/tree-sitter-python`.
- To make sure that you use the right version of the grammar, check out the specific commit: `cd build/tree-sitter-python && git checkout 9e53981`
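For reference, here is a minimal sketch of how the cloned grammar can be compiled and used from Python. It assumes the py-tree-sitter bindings at a version below 0.22, where `Language.build_library` is still available; the actual loading code used by the metrics lives in this repository, see `02-compute-metrics.ipynb`:

```python
# Sketch, not the package's own loader: compile the cloned grammar into a
# shared library and parse a snippet. Assumes `pip install "tree_sitter<0.22"`.
from tree_sitter import Language, Parser

# Paths follow the `git clone` command above.
Language.build_library("build/python.so", ["build/tree-sitter-python"])
PY_LANGUAGE = Language("build/python.so", "python")

parser = Parser()
parser.set_language(PY_LANGUAGE)
tree = parser.parse(b"def add(a, b):\n    return a + b\n")
print(tree.root_node.sexp())  # prints the S-expression of the parse tree
```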
We expect all scripts to be run from the root directory of this repository.
- `metrics_evaluation/grading` contains Python scripts that run a simple GUI for grading the HS and CoNaLa datasets.
- `metrics_evaluation/metrics` contains code to run all the metrics studied in our work. For usage examples, refer to `02-compute-metrics.ipynb`.
- `metrics_evaluation/analysis` contains code for bootstrapping and analysis. It is further used in `03-bootstrap.ipynb`.
- The `data` directory contains all the data: intentions, generations from all models, human grades, etc.
@article{evtikhiev2023metrics,
title = {Out of the BLEU: How should we assess quality of the Code Generation models?},
journal = {Journal of Systems and Software},
pages = {111741},
year = {2023},
issn = {0164-1212},
doi = {10.1016/j.jss.2023.111741},
url = {https://www.sciencedirect.com/science/article/pii/S016412122300136X},
author = {Mikhail Evtikhiev and Egor Bogomolov and Yaroslav Sokolov and Timofey Bryksin},
keywords = {Code generation, Metrics, Neural networks, Code similarity},
}