23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,29 @@ adheres to [Semantic Versioning](https://semver.org/).

### Added

- **Asset-classification resolver** — `findata.resolver.resolve_asset()`,
`GET /resolver/resolve`, and the `resolve_asset` MCP tool. Turns any
Brazilian asset identifier (ticker/CNPJ/ISIN/name) into a classification
mapped to the consolidation taxonomy. `macro_class` is the asset class
(Renda Fixa, Renda Variável, Multimercado, Alternativos, Estruturados);
geography is the orthogonal `exposure` axis (Brasil/Internacional), so
IVVB11, BDRs, and a global-mandate FIA are all RV + Internacional. Also
returns `subclasse`, `underlying_nature`, debenture
Lei-12.431 facts with a **certainty status** (`lei_12431_status`:
confirmed/candidate/not_applicable; `isento_status`), `source`,
`confidence`, the `cascade` walked, and a structured `signals` trail
(which rule fired on what evidence). Deterministic and offline at its core
(a curated ETF/global-fund seed + structural rules), with an injectable
external-provider chain (Mais Retorno / CVM-B3 / restricted web search)
for low-confidence fallback. Classifies ETFs/funds by underlying (IFRA11
debêntures → RF; IVVB11 ações → RV), defends the COE-never-ETF and "Crédito
Estruturado" name-traps, and keeps a heuristic Lei-12.431 isento as a
`candidate` below the cascade short-circuit so a provider can confirm it by
ISIN. Hardened after cross-host adversarial review and CI review bots:
bare-token collisions (`IE`/`LC`/`LF`/`MACRO`/`ACOES`/`PARTICIPACOES`,
substring `LCI`) gated on fund context, public-bond subclasse derived from
the bond code (NTN-B → inflation), API length caps, `as_of` stamped in
America/Sao_Paulo.
- **ANBIMA Títulos Públicos (TPF) secondary market** — `get_tpf()`,
`GET /anbima/tpf`, and `findata anbima tpf`. Daily reference rates for
outstanding federal government bonds (LTN, LFT, NTN-B, NTN-C, NTN-F) from
11 changes: 6 additions & 5 deletions docs/MCP_SURFACE.md
@@ -49,14 +49,15 @@ safe. **The 95 REST routes that back the CLI and HTTP consumers never change.**

| | 1:1 (old) | curated (new) |
|---|---:|---:|
| MCP tools | 95 | **24** (25 with code mode) |
| `tools/list` size | ~85k chars (~21k tok) | **~29k chars (~7k tok)** |
| REST operations | 95 | **95 (unchanged)** |
| MCP tools | 95 | **25** (26 with code mode) |
| `tools/list` size | ~85k chars (~21k tok) | **~30k chars (~7k tok)** |
| REST operations | 95 | **96** |

## The 24 curated tools
## The 25 curated tools

```
```text
registry_lookup ← start here: CNPJ / ticker / code / name → entities
resolve_asset ← classify an asset: macro asset class + exposure

bcb_series bcb_ptax bcb_focus (BCB: 12 → 3)
cvm_company cvm_financials cvm_fund cvm_structured_fund (CVM: 22 → 4)
132 changes: 132 additions & 0 deletions docs/RESOLVER.md
@@ -0,0 +1,132 @@
# `resolve_asset`: asset classifier (Wealthuman taxonomy)

> Deliverable for the requester (Wealthuman / statement consolidation). Defines the
> contract the consolidator calls per asset (dozens per statement). Implemented
> in [`src/findata/resolver/`](../src/findata/resolver/), exposed via REST, MCP, and
> a Python library.

## Problem

Consolidation classifies each asset into the banker's macro taxonomy. The old
agent searched ANBIMA/debentures.com.br through Brave, which was slow and
error-prone: it guessed RV from the "11" suffix of a debenture ETF, missed global
mandates that lack "IE" in the name, and confused "Crédito Estruturado" with COE.
`resolve_asset` returns a classification that is **deterministic, cacheable, and
auditable**, already expressed in the client's taxonomy.

## How to call

Three surfaces, same core:

| Surface | Call |
|---|---|
| REST | `GET /resolver/resolve?ticker=IFRA11&name=FI%20ITAUINFRA` |
| MCP | tool `resolve_asset` (args `name`/`ticker`/`cnpj`/`isin`) |
| Python | `await findata.resolver.resolve_asset(ticker="IFRA11")` |
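
As a rough illustration of the REST row above, the query string can be assembled with the standard library. This is only a sketch: `BASE_URL` and the helper name `build_resolve_url` are assumptions made for the example, not part of the PR.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

# Hypothetical base URL; point it at wherever the API is actually deployed.
BASE_URL = "http://localhost:8000"


def build_resolve_url(**identifiers: str | None) -> str:
    """Build the GET /resolver/resolve URL, dropping identifiers that are None."""
    params = {key: value for key, value in identifiers.items() if value is not None}
    # quote_via=quote encodes spaces as %20 (urlencode alone would emit '+').
    return f"{BASE_URL}/resolver/resolve?{urlencode(params, quote_via=quote)}"


url = build_resolve_url(ticker="IFRA11", name="FI ITAUINFRA", cnpj=None, isin=None)
print(url)
```

Omitting `None` values keeps the request aligned with the "any subset of identifiers" input rule.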

**Input**: any subset of identifiers. The resolver normalizes them and promotes a
bare identifier passed in `name` (the statement sometimes carries only the
label):

```json
{ "name": "FI ITAUINFRA CI", "ticker": "IFRA11", "cnpj": null, "isin": null }
```

No PII: the resolver receives **only** asset identifiers, never client data.
Length caps are enforced at the boundary (`name` 256, `ticker` 16, `cnpj` 32, `isin` 16).
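
The promotion of a bare identifier can be pictured roughly as follows. This is a minimal sketch, not the code in `findata.resolver.normalize`: the regular expressions, the `Identifiers` dataclass, and the choice to clear `name` after promotion are all assumptions made for illustration; only the length caps come from the text above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, replace

# Length caps quoted in the document.
MAX_LEN = {"name": 256, "ticker": 16, "cnpj": 32, "isin": 16}

# Assumed shapes: B3-style ticker (e.g. PETR4, IFRA11) and a 12-character ISIN.
TICKER_RE = re.compile(r"[A-Z][A-Z0-9]{3}\d{1,2}")
ISIN_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{9}\d")


@dataclass(frozen=True)
class Identifiers:
    """Hypothetical container for the four inputs."""

    name: str | None = None
    ticker: str | None = None
    cnpj: str | None = None
    isin: str | None = None


def promote_bare_identifier(ids: Identifiers) -> Identifiers:
    """Enforce the caps, then move a ticker/CNPJ/ISIN found in `name` to its own field."""
    for field_name, cap in MAX_LEN.items():
        value = getattr(ids, field_name)
        if value is not None and len(value) > cap:
            raise ValueError(f"{field_name} exceeds {cap} characters")
    if ids.name is None:
        return ids
    label = ids.name.strip().upper()
    digits = re.sub(r"[.\-/\s]", "", label)  # strip the CNPJ mask, if any
    if ids.cnpj is None and len(digits) == 14 and digits.isdigit():
        return replace(ids, name=None, cnpj=digits)
    if ids.isin is None and ISIN_RE.fullmatch(label):
        return replace(ids, name=None, isin=label)
    if ids.ticker is None and TICKER_RE.fullmatch(label):
        return replace(ids, name=None, ticker=label)
    return ids  # a genuine label such as "FI ITAUINFRA CI" stays in `name`


print(promote_bare_identifier(Identifiers(name="ifra11")))
```

Checking the CNPJ and ISIN shapes before the ticker shape keeps the shortest pattern from shadowing the longer ones.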

## Output contract

```jsonc
{
  "identifier_resolved": { "cnpj": null, "ticker": "IFRA11", "isin": null, "name": "FI ITAUINFRA CI" },
  "kind": "etf",                       // fundo|acao|fii|etf|bdr|debenture|cra|cri|cdb|lci_lca|tesouro|coe|outro
  "cvm": { "classe": null, "anbima_categoria": null, "estrutura": "ETF" },
  "macro_class": "Renda Fixa",         // ASSET CLASS (see axis 1 below)
  "subclasse": "Indexada à Inflação",
  "exposure": "Brasil",                // GEOGRAPHY (see axis 2): Brasil|Internacional|null
  "underlying_nature": "debentures",   // acoes|debentures|credito|recebiveis|imoveis|multiativos|tesouro|cambio|private_equity|outro
  "debenture": {                       // only when a debenture is involved
    "incentivada_1243": true,
    "lei_12431_status": "confirmed",   // confirmed|candidate|not_applicable|unknown
    "indexador": "IPCA+",
    "vencimento": null
  },
  "tax": { "isento": true, "isento_status": "confirmed_exempt" },
  "source": "openfindata",             // openfindata|maisretorno|cvm|b3|web_search
  "confidence": 0.97,                  // 0..1; low => human-in-the-loop
  "as_of": "2026-06-29",               // stamped in America/Sao_Paulo
  "cascade": ["openfindata:curated"],  // trail of sources walked
  "signals": [                         // structured trail: which rule fired on what evidence
    { "rule": "curated_seed", "evidence": "ticker=IFRA11", "detail": null }
  ],
  "notes": "Curated: ETF de debêntures de infraestrutura (FI-Infra, Lei 12.431)…"
}
```

### Two orthogonal axes (modeling decision)

1. **`macro_class` = asset class**: `Renda Fixa`, `Renda Variável`,
   `Multimercado`, `Alternativos`, `Estruturados` (+ `Indefinido` when the
   resolver cannot decide). Geography is **not** a macro value.
2. **`exposure` = geography/strategy**: `Brasil` | `Internacional` | `null`. It is
   where the economic exposure lies, independent of the asset class. B3 is the
   asset's domicile, not its exposure. Hence:
   - **IVVB11** (S&P 500 ETF listed on B3) → `RV` + `exposure=Internacional`
   - **BDR** → `RV` + `exposure=Internacional` (FX/foreign risk)
   - **Global-mandate FIA** (ARBOR, WHG) → `RV` + `exposure=Internacional`
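
One way to picture the two axes is as two independent enums, so that geography can never leak into the asset class. The type names below (`MacroClass`, `Exposure`, `Classification`) are hypothetical and chosen for this sketch; the real models live in `findata.resolver.models` and may be shaped differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class MacroClass(str, Enum):
    """Axis 1: asset class only."""

    RENDA_FIXA = "Renda Fixa"
    RENDA_VARIAVEL = "Renda Variável"
    MULTIMERCADO = "Multimercado"
    ALTERNATIVOS = "Alternativos"
    ESTRUTURADOS = "Estruturados"
    INDEFINIDO = "Indefinido"


class Exposure(str, Enum):
    """Axis 2: where the economic exposure lies."""

    BRASIL = "Brasil"
    INTERNACIONAL = "Internacional"


@dataclass(frozen=True)
class Classification:
    macro_class: MacroClass
    exposure: Exposure | None  # None when geography does not apply or is unknown


# Values taken from the examples and test set in this document.
EXAMPLES = {
    "IVVB11": Classification(MacroClass.RENDA_VARIAVEL, Exposure.INTERNACIONAL),
    "HGLG11": Classification(MacroClass.RENDA_VARIAVEL, Exposure.BRASIL),
    "IFRA11": Classification(MacroClass.RENDA_FIXA, Exposure.BRASIL),
}

# Same asset class, different exposure: the axes vary independently.
same_class = EXAMPLES["IVVB11"].macro_class == EXAMPLES["HGLG11"].macro_class
different_exposure = EXAMPLES["IVVB11"].exposure != EXAMPLES["HGLG11"].exposure
print(same_class, different_exposure)
```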

### Fiscal certainty axis

The booleans `incentivada_1243`/`isento` answer "yes/no". The statuses carry the
**certainty** that the boolean cannot:

- `lei_12431_status`: `confirmed` (explicit infra / FI-Infra signal),
  `candidate` (issuer+IPCA heuristic, **confirm by ISIN** before treating it
  as exempt), `not_applicable` (a debenture, but not infrastructure), `unknown`.
- `isento_status`: `confirmed_exempt` (statutory: CRA/CRI, LCI/LCA, confirmed
  12.431), `candidate_exempt` (heuristic), `confirmed_taxable`, `unknown`.

A `confidence` below roughly 0.9, or any `candidate` status, is the hook for human review.
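
On the consumer side, the review hook could be checked along these lines. This is a sketch under assumptions: the function name `needs_human_review` is invented for the example, and the cutoff is fixed at exactly 0.9 even though the text only says "~0.9".

```python
from __future__ import annotations

from typing import Any

REVIEW_THRESHOLD = 0.9  # assumption: the document gives only an approximate cutoff


def needs_human_review(result: dict[str, Any]) -> bool:
    """Return True when confidence is low or a fiscal status is still a candidate."""
    if result.get("confidence", 0.0) < REVIEW_THRESHOLD:
        return True
    debenture = result.get("debenture") or {}
    tax = result.get("tax") or {}
    return (
        debenture.get("lei_12431_status") == "candidate"
        or tax.get("isento_status") == "candidate_exempt"
    )


confirmed = {
    "confidence": 0.97,
    "debenture": {"lei_12431_status": "confirmed"},
    "tax": {"isento_status": "confirmed_exempt"},
}
candidate = {
    "confidence": 0.95,
    "debenture": {"lei_12431_status": "candidate"},
    "tax": {"isento_status": "candidate_exempt"},
}
print(needs_human_review(confirmed), needs_human_review(candidate))
```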

## Source cascade (fallback)

1. **openfindata** (primary, offline): curated seed + structural rules. Resolves
   the test set without network access.
2. **Mais Retorno MCP** (Brazilian fund/CNPJ/CVM-class data).
3. **another provider** (CVM open data / B3).
4. **web_search restricted** to `maisretorno.com`, `b3.com.br`,
   `yahoofinance.com.br`, `debentures.com.br`.

Each step fills in what the previous one did not provide and **lowers the
confidence**; `source` reflects the final origin; `cascade` logs the path. Steps 2
to 4 are an injectable extension point (`AssetProvider`), consulted only when the
core result is weak. As of this PR, **only step 1 is wired up** (the external
providers are stubs to be connected at deploy time).
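
The fallback walk described above might look roughly like this. It is a sketch under loud assumptions: `StubProvider`, its `lookup` method, the `discount` multiplier, the `FILLABLE` field list, the `openfindata:rules` label, and the placeholder ticker are all hypothetical, and the real `AssetProvider` interface in `findata.resolver.engine` may differ.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Any

WEAK_BELOW = 0.9  # assumption: reuses the approximate review threshold
FILLABLE = ("macro_class", "exposure", "subclasse")  # assumption: fields a step may fill


@dataclass
class StubProvider:
    """Hypothetical stand-in for an external step (not the real AssetProvider)."""

    name: str
    discount: float  # multiplier below 1, applied when this step contributes
    payload: dict[str, Any]

    async def lookup(self, identifiers: dict[str, Any]) -> dict[str, Any]:
        return self.payload


async def run_cascade(core: dict[str, Any], providers: list[StubProvider]) -> dict[str, Any]:
    result = {**core, "cascade": list(core.get("cascade", []))}
    if result["confidence"] >= WEAK_BELOW:
        return result  # strong core result: external steps are never consulted
    for provider in providers:
        missing = [key for key in FILLABLE if result.get(key) is None]
        if not missing:
            break  # nothing left to fill
        extra = await provider.lookup(result["identifier_resolved"])
        result["cascade"].append(provider.name)  # log every step actually walked
        filled = [key for key in missing if extra.get(key) is not None]
        for key in filled:
            result[key] = extra[key]  # only fill gaps, never overwrite
        if filled:
            result["source"] = provider.name
            result["confidence"] = round(result["confidence"] * provider.discount, 2)
    return result


core = {
    "identifier_resolved": {"ticker": "XXXX11"},  # placeholder ticker
    "macro_class": None, "exposure": None, "subclasse": None,
    "source": "openfindata", "confidence": 0.5, "cascade": ["openfindata:rules"],
}
providers = [
    StubProvider("maisretorno", 0.8, {"macro_class": "Renda Variável"}),
    StubProvider("cvm", 0.8, {"exposure": "Brasil", "subclasse": "placeholder"}),
    StubProvider("web_search", 0.6, {"macro_class": "Multimercado"}),
]
resolved = asyncio.run(run_cascade(core, providers))
print(resolved["source"], resolved["confidence"], resolved["cascade"])
```

In this sketch the third provider is never called, because the first two already filled every gap; `cascade` records only the steps that were consulted.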

## Test set (100% passing, offline)

| Identifier | macro_class | exposure | note |
|---|---|---|---|
| IFRA11 / FI ITAUINFRA | Renda Fixa | Brasil | infrastructure-debenture ETF; "Indexada à Inflação"; confirmed exempt |
| ARBOR FIC FIA | Renda Variável | Internacional | global mandate without "IE" |
| WHG GLOBAL FIC FIA IE | Renda Variável | Internacional | IE structure |
| DEB PETROBRAS IPCA+ | Renda Fixa | Brasil | debenture; incentivized status is **candidate** (confirm by ISIN) |
| COE | Estruturados | (n/a) | `kind=coe`, **never** ETF |
| "Crédito Estruturado" (Warren/AMW) | Renda Fixa | Brasil | name-trap: it is credit, not Estruturados |
| IVVB11 | Renda Variável | Internacional | S&P 500 equity ETF |
| HGLG11 / MXRF11 | Renda Variável | Brasil | `subclasse` FII |

## Non-functional requirements

- **Deterministic + cacheable**: same identifier → same classification
  (except `as_of`); a CNPJ/ticker rarely changes class, so cache aggressively.
- **Low latency**: the core is offline, with no I/O.
- **Auditable**: always `source` + `as_of` + `cascade` + `signals`.
- **No PII**: only asset identifiers cross the boundary.
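
Because only `as_of` varies between calls, a caller-side cache can store everything else and restamp the date on each hit. The wrapper below is a sketch: `CachedResolver`, the injected `today` callable, and the `fake_resolve` stand-in are assumptions for illustration, and the real resolver stamps `as_of` in America/Sao_Paulo rather than using the local date.

```python
from __future__ import annotations

import asyncio
from datetime import date
from typing import Any, Awaitable, Callable

ResolveFn = Callable[..., Awaitable[dict[str, Any]]]


class CachedResolver:
    """Memoize classifications by normalized identifiers; restamp as_of per call."""

    def __init__(self, resolve: ResolveFn, today: Callable[[], str] | None = None) -> None:
        self._resolve = resolve
        # Assumption: defaults to the local date; production would use America/Sao_Paulo.
        self._today = today or (lambda: date.today().isoformat())
        self._cache: dict[tuple[str | None, ...], dict[str, Any]] = {}
        self.misses = 0

    async def __call__(self, *, name=None, ticker=None, cnpj=None, isin=None) -> dict[str, Any]:
        # Normalize so " ifra11 " and "IFRA11" share one cache entry.
        key = tuple(v.strip().upper() if v else None for v in (name, ticker, cnpj, isin))
        if key not in self._cache:
            self.misses += 1
            result = await self._resolve(name=name, ticker=ticker, cnpj=cnpj, isin=isin)
            self._cache[key] = {k: v for k, v in result.items() if k != "as_of"}
        return {**self._cache[key], "as_of": self._today()}


async def fake_resolve(**identifiers: Any) -> dict[str, Any]:
    """Hypothetical stand-in for findata.resolver.resolve_asset."""
    return {"macro_class": "Renda Fixa", "exposure": "Brasil", "as_of": "2026-06-29"}


async def demo() -> tuple[CachedResolver, dict[str, Any], dict[str, Any]]:
    cached = CachedResolver(fake_resolve, today=lambda: "2026-06-30")
    first = await cached(ticker="IFRA11")
    second = await cached(ticker=" ifra11 ")
    return cached, first, second


cached, first, second = asyncio.run(demo())
print(cached.misses, second["as_of"])
```

A plain `functools.lru_cache` is avoided here on purpose: applied to an `async` function it would cache the coroutine object, which cannot be awaited twice.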

## Pending before production

- Connect the real external providers (Mais Retorno MCP, restricted web search).
- ISIN-level confirmation of the incentivized status (12.431) via
  ANBIMA/debentures.com.br in the cascade step; today it stays `candidate`.
- Expand the curated ETF seed as new ETFs are listed on B3.
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -137,6 +137,9 @@ max-statements = 50
# Curated MCP layer: FastAPI Query() defaults (B008), wide consolidated tools
# (PLR0913), and intentional flat dataset-dispatch switches (C901/PLR0912/PLR0911).
"src/findata/api/mcp_app.py" = ["B008", "PLR0913", "C901", "PLR0912", "PLR0911"]
# Resolver engine: the classification cascade is an intentional flat
# rule-by-rule switch (one branch per instrument shape) — auditable by design.
"src/findata/resolver/engine.py" = ["C901", "PLR0912", "PLR0911"]
# CLI commands are naturally wide (many typer.Option flags).
"src/findata/cli.py" = ["PLR0913"]
# Banner uses rich + sys.stdout directly — not a print-statement debug.
2 changes: 2 additions & 0 deletions src/findata/api/app.py
@@ -26,6 +26,7 @@
openfinance,
receita,
registry,
resolver,
susep,
tesouro,
yahoo,
@@ -139,6 +140,7 @@ async def _value_error_handler(_: Request, exc: ValueError) -> JSONResponse:
app.include_router(aneel.router)
app.include_router(susep.router)
app.include_router(registry.router)
app.include_router(resolver.router)
app.include_router(yahoo.router)


32 changes: 32 additions & 0 deletions src/findata/api/mcp_app.py
@@ -39,6 +39,7 @@

from findata.api._b3_common import MAX_TICKERS, resolve_quotes
from findata.registry import lookup
from findata.resolver import resolve_asset
from findata.sources.anbima import indices as anbima_src
from findata.sources.aneel import leiloes
from findata.sources.b3 import cotahist, indices
@@ -97,6 +98,37 @@ async def registry_lookup(
    return await lookup(q, limit=limit)


@router.get(
    "/resolver/resolve",
    operation_id="resolve_asset",
    response_model=None,
    summary="Classify a Brazilian asset: asset class + Brasil/Internacional exposure",
)
async def resolve_asset_tool(
    name: str | None = Query(
        None, max_length=256, description="Asset name/label, e.g. 'FI ITAUINFRA CI'"
    ),
    ticker: str | None = Query(None, max_length=16, description="B3 ticker, e.g. IFRA11, PETR4"),
    cnpj: str | None = Query(None, max_length=32, description="Fund CNPJ (masked or not)"),
    isin: str | None = Query(None, max_length=16, description="ISIN, e.g. BR..."),
) -> Any:
    """Turn any asset identifier into a classification mapped to the
    consolidation taxonomy. ``macro_class`` is the asset class only: Renda Fixa,
    Renda Variável, Multimercado, Alternativos, Estruturados; geography is the
    separate ``exposure`` axis (Brasil/Internacional), so e.g. IVVB11 is RV +
    Internacional.

    Returns ``macro_class`` + ``exposure`` + ``subclasse`` + ``underlying_nature``
    (splits ETF-de-ações from ETF-de-debêntures), debenture/Lei-12.431 facts (with
    a confirmed/candidate certainty status), ``source``, ``confidence``, the
    ``cascade`` walked, and structured ``signals`` (which rule fired on what
    evidence). Deterministic and cacheable. Pass any subset of identifiers; a
    bare ticker/CNPJ given as ``name`` is auto-detected. Use this (not
    ``registry_lookup``) when you need the asset's class, not its registry entity.
    """
    return await resolve_asset(name=name, ticker=ticker, cnpj=cnpj, isin=isin)


# ── BCB: Banco Central ────────────────────────────────────────────


36 changes: 36 additions & 0 deletions src/findata/api/routers/resolver.py
@@ -0,0 +1,36 @@
"""Asset-classification resolver routes.

Wraps :func:`findata.resolver.resolve_asset` over HTTP. The consolidator calls
this per asset (dozens per statement), so the handler is a thin, cacheable pass
through the deterministic core. No PII: only an asset identifier crosses the
boundary.
"""

from __future__ import annotations

from fastapi import APIRouter, Query

from findata.resolver import AssetClassification, resolve_asset

router = APIRouter(prefix="/resolver", tags=["Resolver"])


@router.get("/resolve")
async def resolve(
    name: str | None = Query(
        None, max_length=256, description="Asset name/label (e.g. 'FI ITAUINFRA CI')"
    ),
    ticker: str | None = Query(None, max_length=16, description="B3 ticker (e.g. IFRA11, PETR4)"),
    cnpj: str | None = Query(None, max_length=32, description="Fund CNPJ (masked or unmasked)"),
    isin: str | None = Query(None, max_length=16, description="ISIN (e.g. BR...)"),
) -> AssetClassification:
    """Classify an asset into the Wealthuman macro taxonomy.

    Accepts any identifier (``name``/``ticker``/``cnpj``/``isin``) and
    returns ``macro_class`` (asset class: Renda Fixa, Renda Variável,
    Multimercado, Alternativos, Estruturados) + ``exposure`` (orthogonal
    geography axis: Brasil/Internacional) + subclasse, underlying, debenture/Lei
    12.431 facts, ``source``, ``confidence``, ``signals``, and the cascade walked.
    Deterministic and cacheable.
    """
    return await resolve_asset(name=name, ticker=ticker, cnpj=cnpj, isin=isin)
37 changes: 37 additions & 0 deletions src/findata/resolver/__init__.py
@@ -0,0 +1,37 @@
"""Wealthuman asset-classification resolver.

``resolve_asset(identifier)`` turns any Brazilian asset identifier (ticker,
CNPJ, ISIN, or bare name) into a classification mapped to the Wealthuman
taxonomy: ``macro_class`` is the asset class (Renda Fixa, Renda Variável,
Multimercado, Alternativos, Estruturados); geography is the orthogonal
``exposure`` axis (Brasil/Internacional). Plus subclasse, underlying nature,
debenture / Lei-12.431 facts (with a certainty status), source, confidence, an
audit cascade, and structured signals.

Deterministic, cacheable, auditable, no PII. See ``openfindata-mcp-spec.md``.
"""

from __future__ import annotations

from findata.resolver.engine import AssetProvider, classify, resolve_asset
from findata.resolver.models import (
    AssetClassification,
    CvmInfo,
    DebentureInfo,
    IdentifierResolved,
    TaxInfo,
)
from findata.resolver.normalize import NormalizedInput, normalize

__all__ = [
    "AssetClassification",
    "AssetProvider",
    "CvmInfo",
    "DebentureInfo",
    "IdentifierResolved",
    "NormalizedInput",
    "TaxInfo",
    "classify",
    "normalize",
    "resolve_asset",
]