A project setup and workflow designed for academic research collaboration, centered around Git and optimized for AI assistance.
Key Features:
- Git repo + Dropbox share, symlinked into one folder
- Git-centric: essential when working with AI, because AI edits can break things and Git lets you roll them back
- Compatible with traditional workflows and no-Git coauthors
- Fine-tuned skills, MCPs, and agents useful for academic research
See ProjectExample/ for structure reference and Setup for automated setup.
- Academic Research Project Template
Projects use a two-folder structure:
- MyProject/ - Git repository containing code, final figures/tables, and LaTeX documents
- MyProject-Share/ - Dropbox-synced folder with data, notes, and intermediate outputs
Folders from MyProject-Share/ are symlinked into MyProject/, so you work in one place with access to everything.
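As a rough illustration of the linking step, here is a minimal Python sketch that builds the two folders in a temporary directory and symlinks the shared ones into the repository folder. The choice of `Data`, `Notes`, and `Output` as the linked folders and the helper name `link_shared_folders` are assumptions for illustration; the template's own setup script may do this differently.

```python
import tempfile
from pathlib import Path

# Assumption: these are the folders that live in MyProject-Share/.
SHARED_FOLDERS = ["Data", "Notes", "Output"]


def link_shared_folders(repo: Path, share: Path, names=SHARED_FOLDERS) -> list[Path]:
    """Create repo/<name> -> share/<name> symlinks and return the link paths."""
    links = []
    for name in names:
        target = share / name
        target.mkdir(parents=True, exist_ok=True)  # make sure the shared folder exists
        link = repo / name
        # Leave anything that already exists (real folder or earlier link) alone.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(target, target_is_directory=True)
        links.append(link)
    return links


# Demo in a throwaway directory so nothing on disk is touched.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "MyProject"
    share = Path(tmp) / "MyProject-Share"
    repo.mkdir()
    for link in link_shared_folders(repo, share):
        print(f"{link.name}/ -> {link.resolve()}")
```

Files written through `MyProject/Data/` end up in `MyProject-Share/Data/`, which is what lets Dropbox sync them while Git ignores them.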
Why two folders? Solves the Git vs. Dropbox dilemma: Dropbox lacks proper version control and handles conflicts poorly, while Git struggles with large files. By linking folders together, you get Git's version control + Dropbox's file sharing while working seamlessly in one place.
Working with non-Git users: you can also clone the repo into the MyProject-Share folder, so they can work just as usual. Because it is shared via Dropbox, you can access all the code and handle Git on their behalf.
- Code/ - All analysis scripts and implementation
  - The subfolders are organized around different tasks, e.g., DataCleaning.
- Figures/ - Final presentable charts, plots, and visualizations whose versions we want to track with Git
- Tables/ - Final presentable result tables and summary statistics that we want to track with Git
- Paper/ - The LaTeX folder containing the draft
- Slides/ - The LaTeX folder containing slides
- Notes/ - Research notes and documentation
- Data/ - Raw and processed datasets. Typically read-only.
- Output/ - Generated results and intermediate files
  - This folder is organized with subfolders that have the same names as folders under Code.
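The Code-to-Output naming convention can be expressed as a small helper. This is a sketch under assumptions: the function name `output_dir_for` and the script name `clean_sample.py` are hypothetical and not part of the template.

```python
import tempfile
from pathlib import Path


def output_dir_for(script: Path, project_root: Path) -> Path:
    """Map Code/<Task>/.../script.py to Output/<Task>/, creating it if needed."""
    code_root = (project_root / "Code").resolve()
    relative = script.resolve().relative_to(code_root)  # raises if script is not under Code/
    out = project_root / "Output"
    if len(relative.parts) > 1:
        # First folder under Code/, e.g. "DataCleaning".
        out = out / relative.parts[0]
    out.mkdir(parents=True, exist_ok=True)
    return out


# Demo with a hypothetical script path inside a temporary project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "MyProject"
    script = root / "Code" / "DataCleaning" / "clean_sample.py"
    script.parent.mkdir(parents=True)
    script.write_text("# placeholder\n")
    print(output_dir_for(script, root).relative_to(root))
```

A script can call this with its own path and the project root to decide where intermediate results go, which keeps `Output/` aligned with `Code/` without anyone creating folders by hand.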
- Clone and create project:
    git clone https://github.com/FuZhiyu/ResearchProjectTemplate.git
    cd ResearchProjectTemplate
    ./create_project.sh YourProjectNameOrPath
- Share with coauthors:
  - Share YourProjectName-Share/ via Dropbox
  - Push to GitHub: cd YourProjectName && git remote add origin <url> && git push -u origin main
  - Coauthors: clone repo and run ./setup_mac.sh
We use Git for version control and GitHub for collaboration. Git helps us track changes, work simultaneously without conflicts, and maintain a complete history of our research progress. Tons of tutorials on Git can be easily found online, so here we briefly explain two key concepts, commit and pull request, and focus more on best practices in academic research.
Why isn't Dropbox/Overleaf version history enough? Version control isn't just "save every copy"—it's about organizing changes meaningfully. Thousands of timestamped versions don't help you understand what changed or easily recover specific states.
A commit is a snapshot of your project at a specific point in time. Each commit has:
- A message describing what changed
- The author and timestamp
- A complete copy of all files at that moment
- A unique identifier (hash)
When you make changes to files, Git tracks what's different from the last commit. You can then "commit" these changes to create a new snapshot. This allows you to see exactly what changed between different versions.
1. Commit very often - Essential before AI edits. AI can mess things up, but frequent commits let you experiment safely knowing everything can be recovered.
2. Descriptive messages - "Fix typo in table 3" not "fix stuff" for easier change tracking.
3. Rule 1 >> Rule 2 - Rules 1 and 2 conflict since detailed messages add commit burden. Prioritize frequent commits over detailed messages. Better to have cryptic messages than no checkpoints.
   Tips for lazy messaging:
   - Ask AI: "Summarize the staged changes and write a commit message"
   - Use PR descriptions to provide context later
4. One topic per commit - Keep commits focused on single changes rather than bundling multiple topics.
- Data files (.csv, .xlsx, .parquet)
- Personal files, IDE settings
- Sensitive info (API keys, passwords)
- Large files (>100MB)
- Auxiliary files (.aux, .log, .tmp)
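Part of this list can be checked mechanically before staging files. The sketch below flags paths by extension and size only; it cannot detect personal files or secrets, and the helper name `flag_file` is hypothetical. In practice a `.gitignore` is the usual way to enforce these rules, so treat this as a complementary sanity check.

```python
import tempfile
from pathlib import Path

DATA_SUFFIXES = {".csv", ".xlsx", ".parquet"}
AUX_SUFFIXES = {".aux", ".log", ".tmp"}
MAX_BYTES = 100 * 1024 * 1024  # the 100MB threshold from the list above


def flag_file(path: Path, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return the reasons (possibly none) why `path` should stay out of Git."""
    reasons = []
    suffix = path.suffix.lower()
    if suffix in DATA_SUFFIXES:
        reasons.append("data file")
    if suffix in AUX_SUFFIXES:
        reasons.append("auxiliary file")
    # Size can only be checked for files that actually exist on disk.
    if path.is_file() and path.stat().st_size > max_bytes:
        reasons.append("large file")
    return reasons


# Demo on a few throwaway files with hypothetical names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["panel.csv", "draft.aux", "clean.py"]:
        p = Path(tmp) / name
        p.write_text("placeholder\n")
        print(name, flag_file(p) or "ok to commit")
```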
The standard "commit code, not output" rule is less clear for academic research. Exercise discretion:
- Do commit: Final figures/tables that feed into LaTeX documents when reproducibility is critical
- Don't commit: Large binary files that slow Git and can't show meaningful diffs
Remember: MyProject-Share/ files aren't tracked by Git.
Use VSCode's Python Interactive Window for cleaner Git management—write .py files with cell-by-cell evaluation.
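For readers who have not used this style, a plain `.py` file is split into cells with `# %%` markers, and each cell can be run on its own in the Interactive Window. The example below is only a sketch: the numbers are made up and stand in for data that would normally be loaded from `Data/`.

```python
from statistics import mean, stdev

# %% Build a small sample (hypothetical numbers, stand-in for loading from Data/)
returns = [0.012, -0.004, 0.007, 0.015, -0.009]

# %% Summary statistics
avg = mean(returns)
vol = stdev(returns)

# %% Report
print(f"mean={avg:.4f}, sd={vol:.4f}")
```

Because the file is ordinary Python text, Git diffs show exactly which lines changed, with no embedded outputs or cell metadata.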
If using notebooks: clear outputs before committing, and save copies with outputs to the Output/ folder.
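One way to do this with only the standard library is sketched below. It assumes the notebook follows the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the function names are hypothetical, and dedicated tools such as nbstripout cover more edge cases.

```python
import json
from pathlib import Path


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    cleaned = json.loads(json.dumps(notebook))  # simple deep copy of JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def clean_notebook(src: Path, archive_dir: Path) -> None:
    """Save a copy with outputs to archive_dir, then clear outputs in src in place."""
    raw = src.read_text(encoding="utf-8")
    archive_dir.mkdir(parents=True, exist_ok=True)
    (archive_dir / src.name).write_text(raw, encoding="utf-8")  # copy keeps outputs
    cleaned = strip_outputs(json.loads(raw))
    src.write_text(json.dumps(cleaned, indent=1) + "\n", encoding="utf-8")
```

Calling `clean_notebook` with a notebook path and a folder under `Output/` leaves the version with results in the shared folder and a clean, diff-friendly version in the repository.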
We will use Pull Request (PR) workflows for collaboration. Here is an accessible guide on how the PR workflows work.
PRs solve co-editing conflicts: when multiple coauthors work simultaneously, how do we merge safely?
The solution: coauthors branch out, work independently, then merge back. Git handles non-conflicting changes automatically; conflicting edits are resolved during the PR process.
A typical PR workflow works as follows. The introduction here uses terminal commands, though all of these steps can be done with a GUI or by asking an AI assistant.
- Start new work:
    git checkout main
    git pull origin main  # Get latest changes
    git checkout -b feature/julie-regression-analysis  # branch out
- Do your work: Edit files, run analysis, create figures
- Save your progress (do this frequently):
    git add .
    git commit -m "Add regression tables for main specification"
- Push to GitHub:
    git push -u origin feature/julie-regression-analysis
- Create Pull Request: (On VSCode source control panel, the button that looks like merge can directly create a pull request)
  - Go to GitHub.com → our repository
  - Click "Compare & pull request" (appears after you push)
  - Write description of what you changed
  - Click "Create pull request"
- Review process:
  - Team members review your changes
  - Discuss any questions in PR comments
  - Make additional commits if needed
- Merge: Once approved, click "Merge pull request". We recommend choosing squash and merge, which combines all the updates in a single commit in main to keep the timeline clean.
- Clean up:
    git checkout main
    git pull origin main  # Get your merged changes
    git branch -d feature/julie-regression-analysis  # Delete old branch
Projects include CLAUDE.md and AGENTS.md (symlink for Codex compatibility) with AI coding principles:
- Write concise research code (not production-ready)
- Use interactive, line-by-line evaluation
- Save outputs to Output/, not Figures/ or Tables/
- Edit existing files instead of creating new ones
- Execute from project root
The template includes specialized Claude agents in .claude/agents/:
- code-reviewer - Reviews code changes for academic research quality, style consistency, and potential issues
- report-checker - Validates research reports and documentation for completeness and accuracy
- results-summarizer - Creates comprehensive summaries of analysis results after outputs have been generated
Usage: Agents are invoked automatically by Claude Code when appropriate, or can be requested explicitly.
The template includes specialized Claude skills in .claude/skills/:
- pdf - PDF processing toolkit for extracting text/tables, creating PDFs, merging/splitting documents, and handling fillable forms
- mistral-pdf-to-markdown - Convert PDFs (including scanned documents) to Markdown using Mistral OCR API with image extraction
- zotero-paper-reader - Read and analyze academic papers directly from your Zotero library, with automatic PDF-to-Markdown conversion
- work-summary - Create factual working journal entries in Notes/WorkingJournal/ after completing analysis work
Usage: Skills are automatically available in Claude Code. Example: "Use the zotero-paper-reader skill to read the paper about liquidity from my library"
The template configures Model Context Protocol (MCP) servers in .mcp.json:
- Zotero MCP - Direct integration with your Zotero library for searching papers, retrieving metadata, and downloading PDFs
Configuration:
- Edit Notes/.env and fill in your API keys
- Customize .mcp.json if needed (all env vars are read from Notes/.env)
Note: Notes/.env is in the shared folder (not tracked by Git), so your API keys are never committed to version control.
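Before starting a session it can help to confirm that the keys you expect are actually filled in. The sketch below parses simple `KEY=VALUE` lines with the standard library only, to make the file format explicit; the template already installs python-dotenv, which would normally handle this. The key name `EXAMPLE_API_KEY` is a placeholder, since the real variable names depend on your `.mcp.json`.

```python
import tempfile
from pathlib import Path


def read_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path: Path, required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty in the env file."""
    values = read_env(path) if path.exists() else {}
    return [key for key in required if not values.get(key)]


# Demo against a throwaway file; EXAMPLE_API_KEY is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("# API keys\nEXAMPLE_API_KEY=\n", encoding="utf-8")
    print("Missing:", missing_keys(env_file, ["EXAMPLE_API_KEY"]))
```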
The project uses uv for Python environment management, which is installed by the setup script.
Packages installed by default:
- Data analysis: jupyter, pandas, matplotlib, polars, pyarrow
- Claude skills: pypdf, reportlab, pdf2image, pillow, mistralai, python-dotenv, zotero-mcp
Quick uv commands:
- uv sync - Install all required dependencies from pyproject.toml
- uv run <command> - Run any command with project environment (e.g., uv run python script.py, uv run jupyter notebook, uv run pytest)
- uv add package - Add a new dependency. Instead of using pip, this is the more robust way to ensure dependencies are shared across the team.
- uv remove package - Remove a dependency
The setup script configures uv to place virtual environments in ~/.venvs/MyProject rather than within the project folder. This keeps the project directory clean and ensures consistent environment paths across different machines.
Technical note: The rationale for putting the .venvs folder outside of the project folder is that project folders are often synced via Dropbox across different machines. uv uses hard-link/clone for the Python environment, which will be broken by Dropbox sync, resulting in multiple copies of the same package across different projects (highly space inefficient). Moving it out of Dropbox solves this issue.
The setup script creates a .venv symlink in your project pointing to ~/.venvs/MyProject. VS Code and other tools automatically detect this symlink and use the correct environment—no manual configuration needed.
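If an environment ever seems to be living inside the Dropbox-synced folder, a quick check of where `.venv` points can confirm it. This is a minimal sketch; the function name `venv_is_outside_project` is hypothetical, and the demo uses a temporary directory rather than the real `~/.venvs/`.

```python
import tempfile
from pathlib import Path


def venv_is_outside_project(project: Path) -> bool:
    """True if project/.venv is a symlink whose target lies outside the project folder."""
    venv = project / ".venv"
    if not venv.is_symlink():
        return False  # missing, or a real folder inside the project
    target = venv.resolve()
    root = project.resolve()
    return target != root and root not in target.parents


# Demo: mimic the layout with a stand-in for ~/.venvs/MyProject.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "MyProject"
    external = Path(tmp) / "venvs" / "MyProject"
    project.mkdir()
    external.mkdir(parents=True)
    (project / ".venv").symlink_to(external, target_is_directory=True)
    print("venv outside project:", venv_is_outside_project(project))
```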