📌 Reproducing results? Just copy-paste this to get started with deepsql.
```bash
# Clone the repo
git clone https://github.com/RaphaelMouravieff/deep_sql.git deepsql
cd deepsql

# Set up the environment
conda create -n deepsql python=3.11.11 -y
conda activate deepsql

# Install dependencies
pip install -r requirements.txt
```

ROBUT tests Table QA robustness with 10 human-crafted perturbations:
- Header: synonym, abbreviation
- Content: row/col shuffle, extension, masking, adding
- Question: word/sentence paraphrase
- Mixed: combined perturbations
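As a concrete illustration, the "row shuffle" content perturbation can be sketched in a few lines of shell on a toy CSV (the table and filenames here are made up; ROBUT's perturbations are human-crafted, not generated this way):

```bash
# Toy "row shuffle" content perturbation: permute the data rows, keep the header.
printf 'city,pop\nParis,2.1\nLyon,0.5\nNice,0.3\n' > table.csv
{ head -n 1 table.csv; tail -n +2 table.csv | shuf; } > shuffled.csv
```

A robust Table QA model should answer questions identically on `table.csv` and `shuffled.csv`, since the table's content is unchanged.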
Connect to the GPU Node

```bash
ssh gpu
```
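`ssh gpu` relies on an alias in your SSH client config; a minimal entry looks like the following (hostname and username are placeholders, not the project's real values):

```
# ~/.ssh/config
Host gpu
    HostName gpu.cluster.example.org
    User your_login
```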
Request a GPU with Slurm
```bash
srun -p hard --gpus-per-node=1 --constraint=A6000 --pty bash
```

Activate the Conda Environment
```bash
conda activate [env]
```

Start Ollama
```bash
ollama serve &
```

Run the code agents: create the data

```bash
cd deep_sql/scripts
bash Agents/run.sh
```

Clean/extend the generated dataset: clean the generated data and merge the files
```bash
cd deep_sql/scripts
bash Datasets/prepare.sh
```

Pre-train on the clean dataset: pre-train the model on the generated data
```bash
cd deep_sql/scripts
bash Train/ptrain.sh
```

Fine-tune the model: fine-tune on WikiTableQuestions
```bash
cd deep_sql/scripts
bash Train/fine_tuned.sh
```

Unit test, library: check that multi-chunk saving works (one common vector store plus multiple .json files)
```bash
cd deep_sql
python -m uni_test.library_multi_chunk
```

Unit test, answer check: check the vector store content (step1 = 96766)
```bash
cd deep_sql
python -m uni_test.vectore_store_content --vector_store_path data/library/vector_store_step_copy
```

Unit test, likelihood: compute the likelihood threshold
```bash
cd deep_sql
python -m uni_test.find_likelihood_threeshold
```
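The threshold-finding step is project-specific, but the general idea of picking a likelihood cutoff from a sample of scores can be sketched in shell. The toy scores and the 10th-percentile rule below are illustrative assumptions, not the actual logic of `uni_test.find_likelihood_threeshold`:

```bash
# Toy example: choose a likelihood threshold as roughly the 10th percentile
# of a sample of per-example likelihood scores.
printf '0.91\n0.75\n0.60\n0.99\n0.40\n' > scores.txt
threshold=$(sort -n scores.txt | awk '{a[NR]=$1} END{print a[int(NR*0.1)+1]}')
echo "$threshold"  # → 0.40
```

Examples scoring below the chosen threshold would then be filtered out (or flagged) by the unit test.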