Open
93 commits
e1011bc
fix send message in the chat
BoBer78 Oct 16, 2025
8f7defa
Fix backend (streaming + DB), reasoning and annotations still broken.…
BoBer78 Oct 17, 2025
8f71bf2
fix 60% of frontend issues
BoBer78 Oct 17, 2025
d941a94
small ts fix
BoBer78 Oct 17, 2025
11fada0
update backend types to work with frontend
BoBer78 Oct 20, 2025
fd7001e
fix reasoning
BoBer78 Oct 20, 2025
ca17500
fix metadata streaming
BoBer78 Oct 20, 2025
a47574d
fix HIL
BoBer78 Oct 21, 2025
352577e
working parralel HIL, please help
BoBer78 Oct 21, 2025
d77ee7b
small fix
BoBer78 Oct 21, 2025
b447c75
fix hil sequential tool calls
BoBer78 Oct 22, 2025
011b274
Clean up
BoBer78 Oct 22, 2025
dc63dca
backend linting
BoBer78 Oct 22, 2025
55c1a63
fix FE linting
BoBer78 Oct 23, 2025
43a350d
fix tests backend
BoBer78 Oct 23, 2025
b3fd93a
fix backend test and add some for Agent's Routine
BoBer78 Oct 24, 2025
f1d4d2e
Merge branch 'main' into vercel_v5
BoBer78 Oct 24, 2025
2b8c6b5
working response API agent's routine (small details to fix)
BoBer78 Oct 30, 2025
59f6f7b
merge main
BoBer78 Oct 30, 2025
dff7e73
Merge branch 'main' into vercel_v5
BoBer78 Oct 31, 2025
36ad97e
samll cleanup
BoBer78 Oct 31, 2025
c239372
fix mypy
BoBer78 Oct 31, 2025
244177f
better astream and reasoning encryption added
BoBer78 Nov 4, 2025
4b90186
switch everything to response API
BoBer78 Nov 5, 2025
68a1f8c
merge main
BoBer78 Nov 6, 2025
55a5bfa
merge main
BoBer78 Nov 6, 2025
0c5cb92
fix first tests
BoBer78 Nov 6, 2025
22fe37c
fix test 1
BoBer78 Nov 7, 2025
e89959c
fix last tests
BoBer78 Nov 10, 2025
ad076fe
update deepeval code + run
BoBer78 Nov 10, 2025
5fe487e
re-run deepeval
BoBer78 Nov 10, 2025
4843a83
run all deepeval cases
BoBer78 Nov 10, 2025
632cc69
remove MD5 key for pagination and better stream stopping (content tok…
BoBer78 Nov 10, 2025
fe3ff8d
fix frontend tool selection and possible bug in OpenRouter's reponse …
BoBer78 Nov 10, 2025
5fe3096
remove breakpoint
BoBer78 Nov 10, 2025
1698b73
fix partial messages 1
BoBer78 Nov 10, 2025
3b12f13
fix stop
BoBer78 Nov 11, 2025
016125d
fix HIL
BoBer78 Nov 11, 2025
e7e5131
fix tests
BoBer78 Nov 11, 2025
df9a530
fix tool pre-selection
BoBer78 Nov 11, 2025
22aa7ed
fix weird bug in parse enpoint, Openrouter ??
BoBer78 Nov 11, 2025
1176a6e
fix test
BoBer78 Nov 11, 2025
5c26d98
fix autogen, merge utils funcitons, now pre-selector is working with …
BoBer78 Nov 12, 2025
0460af8
add test for util function
BoBer78 Nov 12, 2025
ab432bc
fix autogen
BoBer78 Nov 12, 2025
7fd0a0f
Merge branch 'main' into vercel_v5
BoBer78 Nov 12, 2025
627a1fc
fix stop
BoBer78 Nov 17, 2025
b7e29d8
fix pagination ?
BoBer78 Nov 17, 2025
b23e37c
merge main
BoBer78 Nov 19, 2025
7184738
fix test
BoBer78 Nov 19, 2025
a2139d7
small fix
BoBer78 Nov 20, 2025
5e769e5
merge main
BoBer78 Nov 20, 2025
a20303a
Merge branch 'main' into vercel_v5
BoBer78 Nov 21, 2025
cbb1622
New DB schema and first working Alembic migration (UP)
BoBer78 Nov 26, 2025
a50a017
temp_commit
BoBer78 Nov 26, 2025
ffba598
temp_commit
BoBer78 Nov 26, 2025
5f5ea0e
almost working
BoBer78 Nov 26, 2025
8f7ec9b
fix script content
BoBer78 Nov 27, 2025
c1c24d2
Fix weird tool arangements
BoBer78 Nov 27, 2025
cf6cd66
merge main
BoBer78 Nov 28, 2025
d370974
change ToolCalls to dict and fix loading messages 1
BoBer78 Dec 1, 2025
6585072
fix load messages
BoBer78 Dec 1, 2025
9c4807e
fix output of tools
BoBer78 Dec 1, 2025
4e6b9ef
add back validated collumn for HIL
BoBer78 Dec 1, 2025
cf569f5
fix the isComplete collumn in the DB when downgrading
BoBer78 Dec 2, 2025
0fda8cc
fix tool selection
BoBer78 Dec 2, 2025
733eb39
Fix obvious mypy
BoBer78 Dec 2, 2025
e0278cd
merge main
BoBer78 Dec 2, 2025
af6745e
partial work on agentsRoutine
BoBer78 Dec 2, 2025
08519ae
Fix super WEIRD old Entity bug + update search
BoBer78 Dec 3, 2025
a2fa088
small fix tool calls
BoBer78 Dec 3, 2025
4844705
fix search and suggestion endpoints
BoBer78 Dec 4, 2025
8595336
Reasoning added + small refactor of parts addition.
BoBer78 Dec 4, 2025
50d88fa
reasoning backwards compatible in migration script
BoBer78 Dec 4, 2025
4b58df4
small fix to downgrade
BoBer78 Dec 4, 2025
b78d852
Migrate to OpenTypes in AgentRoutine
BoBer78 Dec 5, 2025
49c654b
stopping working
BoBer78 Dec 5, 2025
a076468
HIL fixed, frontend still broken
BoBer78 Dec 5, 2025
cb96a44
Remove HIL arg change and fix backend metadata
BoBer78 Dec 5, 2025
7f5a3ac
fix tests
BoBer78 Dec 9, 2025
a5a70cd
add utils tests
BoBer78 Dec 9, 2025
fdfdbb6
fix frontend tests
BoBer78 Dec 9, 2025
075a1e3
fix autogen 2
BoBer78 Dec 9, 2025
60756fa
fix and run deepeval
BoBer78 Dec 9, 2025
0d7044d
switch back to tool_call in the backend to identify tools
BoBer78 Dec 9, 2025
e2c894b
Modify alembic script
BoBer78 Dec 9, 2025
91e50be
fix last id --> tool_call_id migration and cleanup
BoBer78 Dec 9, 2025
4e77d32
fix tests
BoBer78 Dec 9, 2025
4418f90
small fix
BoBer78 Dec 9, 2025
d13a24d
small fix 2
BoBer78 Dec 9, 2025
f6250e7
search fix
BoBer78 Dec 9, 2025
c2a95a3
fix tool selection
BoBer78 Dec 11, 2025
0e2c8a1
is_complete change in migration script
BoBer78 Dec 11, 2025
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -46,6 +46,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed
- Exa tools no longer MCP.
+ - Adapt everything to vercel v5.
+ - Adapt to Response API.


## [v0.11.5] - 6.11.2025

@@ -104,6 +107,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Try to enforce using metric tools rather than downloading assets.
- Rule to avoid overvalidating.


## [v0.10.0] - 2.10.2025

### Fixed
935 changes: 935 additions & 0 deletions backend/alembic/versions/25cefa8449c6_change_to_response_api.py

Large diffs are not rendered by default.

460 changes: 218 additions & 242 deletions backend/eval/output/detailed.json

Large diffs are not rendered by default.

96 changes: 48 additions & 48 deletions backend/eval/output/scores.json
@@ -3,244 +3,244 @@
"metrics_df": [
{
"test_name": "cerebellum_morphologies",
- "Correctness [GEval]": 0.7353869349174808,
+ "Correctness [GEval]": 0.5378027691066317,
"Tool Correctness": 1.0,
"Argument Correctness": 0.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "cerebellum_morphologies_descendants",
- "Correctness [GEval]": 0.7043457159658145,
+ "Correctness [GEval]": 0.7693448910375766,
"Tool Correctness": 1.0,
"Argument Correctness": 0.5,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "connectivity_metrics",
- "Correctness [GEval]": 0.8413822094291581,
+ "Correctness [GEval]": 0.8282548737644898,
"Tool Correctness": 1.0,
"Argument Correctness": 0.5,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "connectivity_metrics_extra_filters",
- "Correctness [GEval]": 0.8070269281059362,
+ "Correctness [GEval]": 0.8709177678372655,
"Tool Correctness": 1.0,
"Argument Correctness": 0.5,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "cortex_morphologies",
- "Correctness [GEval]": 0.4653298006903742,
- "Tool Correctness": 0.5,
+ "Correctness [GEval]": 0.5362164060624173,
+ "Tool Correctness": 1.0,
"Argument Correctness": 0.0,
- "Deterministic Argument Correctness": 0.0,
- "Overall Argument Correctness": 0.0
+ "Deterministic Argument Correctness": 1.0,
+ "Overall Argument Correctness": 1.0
},
{
"test_name": "get_specific_circuit",
- "Correctness [GEval]": 0.4548528662574201,
- "Tool Correctness": 0.0,
+ "Correctness [GEval]": 0.8622402310750299,
+ "Tool Correctness": 1.0,
"Argument Correctness": 1.0,
- "Deterministic Argument Correctness": 0.0,
+ "Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "hippocampus_morphologies",
- "Correctness [GEval]": 0.4451658056800826,
+ "Correctness [GEval]": 0.28530009631183456,
"Tool Correctness": 1.0,
"Argument Correctness": 0.0,
- "Deterministic Argument Correctness": 0.75,
- "Overall Argument Correctness": 0.75
+ "Deterministic Argument Correctness": 0.6666666666666666,
+ "Overall Argument Correctness": 0.6666666666666666
},
{
"test_name": "ion_channel",
- "Correctness [GEval]": 0.7807403647069058,
+ "Correctness [GEval]": 0.7820330992678872,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 0.5,
"Overall Argument Correctness": 1.0
},
{
"test_name": "ion_channel_recording",
- "Correctness [GEval]": 0.8749718614233014,
+ "Correctness [GEval]": 0.5556731982805367,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
- "Deterministic Argument Correctness": 0.5,
+ "Deterministic Argument Correctness": 0.625,
"Overall Argument Correctness": 1.0
},
{
"test_name": "literature_search",
- "Correctness [GEval]": 0.648942087564872,
+ "Correctness [GEval]": 0.9471472414785973,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 0.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "me_model_glossary",
- "Correctness [GEval]": 0.683306831536382,
- "Tool Correctness": 1.0,
+ "Correctness [GEval]": 0.5537974270369953,
+ "Tool Correctness": 0.0,
"Argument Correctness": 1.0,
- "Deterministic Argument Correctness": 1.0,
+ "Deterministic Argument Correctness": 0.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "morphology_studies",
- "Correctness [GEval]": 0.4840374574429811,
+ "Correctness [GEval]": 0.8922801826803916,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 0.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "neuroscientists_search",
- "Correctness [GEval]": 0.7001340077257124,
+ "Correctness [GEval]": 0.7275925796421102,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 0.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "off_topic_cooking",
- "Correctness [GEval]": 0.5945247609273225,
+ "Correctness [GEval]": 0.5978033804771575,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "off_topic_programming",
- "Correctness [GEval]": 0.7695570160957931,
+ "Correctness [GEval]": 0.8565314307651558,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "off_topic_sports",
- "Correctness [GEval]": 0.8057059705376057,
+ "Correctness [GEval]": 0.7509455475364661,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "off_topic_weather",
- "Correctness [GEval]": 0.6890006117592387,
+ "Correctness [GEval]": 0.5413464829505792,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "platform_explore",
- "Correctness [GEval]": 0.2853000957950641,
+ "Correctness [GEval]": 0.27294881512648356,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "platform_news",
- "Correctness [GEval]": 0.8330807047145509,
+ "Correctness [GEval]": 0.804995298303768,
"Tool Correctness": 1.0,
"Argument Correctness": 0.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "platform_ui_simulate",
- "Correctness [GEval]": 0.5975963651681446,
+ "Correctness [GEval]": 0.34272411774196826,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "platform_viewing",
- "Correctness [GEval]": 0.6731008652407333,
+ "Correctness [GEval]": 0.72747771401505,
"Tool Correctness": 1.0,
"Argument Correctness": 0.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "plotting",
- "Correctness [GEval]": 0.7366374977736997,
+ "Correctness [GEval]": 0.6127246801610241,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 0.5,
"Overall Argument Correctness": 1.0
},
{
"test_name": "read_paper",
- "Correctness [GEval]": 0.32354476156942,
- "Tool Correctness": 1.0,
- "Argument Correctness": 1.0,
- "Deterministic Argument Correctness": 1.0,
- "Overall Argument Correctness": 1.0
+ "Correctness [GEval]": 0.7122372263400929,
+ "Tool Correctness": 0.0,
+ "Argument Correctness": 0.0,
+ "Deterministic Argument Correctness": 0.0,
+ "Overall Argument Correctness": 0.0
},
{
"test_name": "sin_plot",
- "Correctness [GEval]": 0.43660922801944063,
+ "Correctness [GEval]": 0.7415472928297893,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 0.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "software_docs_entitysdk",
- "Correctness [GEval]": 0.7795970779907103,
+ "Correctness [GEval]": 0.6813881453094685,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
- "Deterministic Argument Correctness": 0.0,
+ "Deterministic Argument Correctness": 0.2,
"Overall Argument Correctness": 1.0
},
{
"test_name": "software_docs_obione",
- "Correctness [GEval]": 0.6725945299889496,
+ "Correctness [GEval]": 0.8310577292713972,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
- "Deterministic Argument Correctness": 0.0,
+ "Deterministic Argument Correctness": 0.2,
"Overall Argument Correctness": 1.0
},
{
"test_name": "species_list",
- "Correctness [GEval]": 0.6400358146726484,
+ "Correctness [GEval]": 0.6566399827580228,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 1.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "thalamus_id",
- "Correctness [GEval]": 0.7415867699349027,
+ "Correctness [GEval]": 0.659614380970777,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 0.5,
"Overall Argument Correctness": 1.0
},
{
"test_name": "warning_test",
- "Correctness [GEval]": 0.618138951957894,
+ "Correctness [GEval]": 0.7849753604657479,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 0.0,
"Overall Argument Correctness": 1.0
},
{
"test_name": "web_search",
- "Correctness [GEval]": 0.4164325021025331,
+ "Correctness [GEval]": 0.7085964582401003,
"Tool Correctness": 1.0,
"Argument Correctness": 1.0,
"Deterministic Argument Correctness": 0.0,
"Overall Argument Correctness": 1.0
}
],
- "created_at": "2025-11-28 16:33:30.267246"
- }
+ "created_at": "2025-12-09 11:28:17.260619"
+ }
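The scores.json diff above shows only raw before/after numbers for each test. The following minimal Python sketch compares two snapshots per test and flags regressions. It assumes each snapshot is a dict with a `metrics_df` list of per-test rows, as in this file, and it reproduces only three of the rows from the diff. The helper names (`metric_deltas`, `regressions`, `overall_matches_max`) are hypothetical and not part of the repository's eval code. The last helper tests an inference drawn from these numbers: "Overall Argument Correctness" appears to equal the larger of "Argument Correctness" and "Deterministic Argument Correctness". The PR does not state this.

```python
"""Sketch: compare two deepeval score snapshots shaped like scores.json.

Assumptions (not taken from the repository's eval code):
- each snapshot is a dict with a "metrics_df" list of per-test rows;
- the helper names below are hypothetical;
- only three rows from the diff are reproduced here.
"""

OLD = {
    "metrics_df": [
        {"test_name": "cortex_morphologies", "Correctness [GEval]": 0.4653298006903742,
         "Tool Correctness": 0.5, "Argument Correctness": 0.0,
         "Deterministic Argument Correctness": 0.0, "Overall Argument Correctness": 0.0},
        {"test_name": "get_specific_circuit", "Correctness [GEval]": 0.4548528662574201,
         "Tool Correctness": 0.0, "Argument Correctness": 1.0,
         "Deterministic Argument Correctness": 0.0, "Overall Argument Correctness": 1.0},
        {"test_name": "read_paper", "Correctness [GEval]": 0.32354476156942,
         "Tool Correctness": 1.0, "Argument Correctness": 1.0,
         "Deterministic Argument Correctness": 1.0, "Overall Argument Correctness": 1.0},
    ],
    "created_at": "2025-11-28 16:33:30.267246",
}

NEW = {
    "metrics_df": [
        {"test_name": "cortex_morphologies", "Correctness [GEval]": 0.5362164060624173,
         "Tool Correctness": 1.0, "Argument Correctness": 0.0,
         "Deterministic Argument Correctness": 1.0, "Overall Argument Correctness": 1.0},
        {"test_name": "get_specific_circuit", "Correctness [GEval]": 0.8622402310750299,
         "Tool Correctness": 1.0, "Argument Correctness": 1.0,
         "Deterministic Argument Correctness": 1.0, "Overall Argument Correctness": 1.0},
        {"test_name": "read_paper", "Correctness [GEval]": 0.7122372263400929,
         "Tool Correctness": 0.0, "Argument Correctness": 0.0,
         "Deterministic Argument Correctness": 0.0, "Overall Argument Correctness": 0.0},
    ],
    "created_at": "2025-12-09 11:28:17.260619",
}


def by_test(snapshot):
    """Index the rows of a snapshot by their test name."""
    return {row["test_name"]: row for row in snapshot["metrics_df"]}


def metric_deltas(old, new, metric):
    """Return {test_name: new - old} for tests present in both snapshots, sorted by name."""
    old_rows, new_rows = by_test(old), by_test(new)
    shared = sorted(old_rows.keys() & new_rows.keys())
    return {name: new_rows[name][metric] - old_rows[name][metric] for name in shared}


def regressions(old, new, metric, tolerance=0.0):
    """List tests whose metric dropped by more than `tolerance`."""
    return [name for name, delta in metric_deltas(old, new, metric).items()
            if delta < -tolerance]


def overall_matches_max(snapshot):
    """Hypothesis check (inferred, not documented): Overall == max(Argument, Deterministic)."""
    return all(
        row["Overall Argument Correctness"]
        == max(row["Argument Correctness"], row["Deterministic Argument Correctness"])
        for row in snapshot["metrics_df"]
    )


if __name__ == "__main__":
    for name, delta in metric_deltas(OLD, NEW, "Correctness [GEval]").items():
        print(f"{name}: {delta:+.3f}")
    print("Tool Correctness regressions:", regressions(OLD, NEW, "Tool Correctness"))
    print("Overall == max(...) in both runs:",
          overall_matches_max(OLD) and overall_matches_max(NEW))
```

Run on these three rows, the sketch flags `read_paper`, whose Tool Correctness fell from 1.0 to 0.0 in the newer run. The `tolerance` parameter exists because GEval is an LLM-judged metric, so a small run-to-run drop is not necessarily a real regression.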