feat(resolver): asset classifier mapping identifiers to specific taxonomy #33
Open: robertoecf wants to merge 7 commits into main from claude/kind-euler-06c857
7 commits
24e77d7 feat(resolver): asset classifier mapping identifiers to Wealthuman ta… (robertoecf)
d2e8ebc feat(resolver): add fiscal-certainty axis to debenture/tax classifica… (robertoecf)
7aec2ab fix(resolver): keep debenture=None for Tesouro-backed RF ETFs (robertoecf)
f67cb82 feat(resolver): structured signals audit trail (robertoecf)
70361eb fix(resolver): address review-bot findings + doc drift (robertoecf)
c802076 docs(resolver): client-facing contract for resolve_asset (Wealthuman … (robertoecf)
63b5b3f fix(resolver): address remaining CodeRabbit review threads (robertoecf)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,132 @@ | ||
| # `resolve_asset`: asset classifier (Wealthuman taxonomy) | ||
|
|
||
| > Deliverable for the requester (Wealthuman / statement consolidation). Defines the | ||
| > contract the consolidator calls once per asset (dozens per statement). Implemented | ||
| > in [`src/findata/resolver/`](../src/findata/resolver/), exposed via REST, MCP, and | ||
| > a Python library. | ||
|
|
||
| ## Problem | ||
|
|
||
| Consolidation classifies each asset into the banker's macro taxonomy. The | ||
| previous agent searched ANBIMA/debentures.com.br through Brave: it was slow and | ||
| error-prone (it guessed RV, i.e. Renda Variável, from the "11" suffix of a | ||
| debenture ETF, missed global mandates lacking the "IE" marker, and confused | ||
| "Crédito Estruturado" with COE). `resolve_asset` returns a **deterministic, | ||
| cacheable, and auditable** classification, already in the client's taxonomy. | ||
|
|
||
| ## How to call | ||
|
|
||
| Three surfaces, same core: | ||
|
|
||
| | Surface | Call | | ||
| |---|---| | ||
| | REST | `GET /resolver/resolve?ticker=IFRA11&name=FI%20ITAUINFRA` | | ||
| | MCP | tool `resolve_asset` (args `name`/`ticker`/`cnpj`/`isin`) | | ||
| | Python | `await findata.resolver.resolve_asset(ticker="IFRA11")` | | ||
|
|
||
| **Input**: any subset of identifiers; the resolver normalizes them and | ||
| promotes a bare identifier passed in `name` (the statement sometimes carries | ||
| only the label): | ||
|
|
||
| ```json | ||
| { "name": "FI ITAUINFRA CI", "ticker": "IFRA11", "cnpj": null, "isin": null } | ||
| ``` | ||
|
|
||
| No PII: the resolver receives **only** asset identifiers, never client data. | ||
| Length limits, in characters, are enforced at the boundary (`name` 256, `ticker` 16, `cnpj` 32, `isin` 16). | ||
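
As a reader-side illustration (not part of this PR), the REST call and the length limits above can be sketched as a small URL builder. `build_resolve_url` and `base_url` are hypothetical names; only the path, the four query parameters, and the limits come from the contract.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

# Length limits stated in the contract (characters per field).
MAX_LENGTHS = {"name": 256, "ticker": 16, "cnpj": 32, "isin": 16}


def build_resolve_url(base_url: str, **identifiers: str | None) -> str:
    """Build the GET /resolver/resolve URL from any subset of identifiers.

    Hypothetical client helper: it mirrors the boundary limits locally so an
    oversized value fails before the request is sent.
    """
    params: dict[str, str] = {}
    for field, value in identifiers.items():
        if field not in MAX_LENGTHS:
            raise ValueError(f"unknown identifier field: {field}")
        if value is None:
            continue  # omitted identifiers are simply not sent
        if len(value) > MAX_LENGTHS[field]:
            raise ValueError(f"{field} exceeds {MAX_LENGTHS[field]} characters")
        params[field] = value
    # quote_via=quote encodes spaces as %20, as in the REST example above.
    query = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/resolver/resolve?{query}"


print(build_resolve_url("http://localhost:8000", ticker="IFRA11", name="FI ITAUINFRA"))
```

The host and port are placeholders; the deployed base URL is not stated in the PR.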
|
|
||
| ## Output contract | ||
|
|
||
| ```jsonc | ||
| { | ||
|   "identifier_resolved": { "cnpj": null, "ticker": "IFRA11", "isin": null, "name": "FI ITAUINFRA CI" }, | ||
|   "kind": "etf", // fundo|acao|fii|etf|bdr|debenture|cra|cri|cdb|lci_lca|tesouro|coe|outro | ||
|   "cvm": { "classe": null, "anbima_categoria": null, "estrutura": "ETF" }, | ||
|   "macro_class": "Renda Fixa", // ASSET CLASS (see axis 1 below) | ||
|   "subclasse": "Indexada à Inflação", | ||
|   "exposure": "Brasil", // GEOGRAPHY (see axis 2): Brasil|Internacional|null | ||
|   "underlying_nature": "debentures", // acoes|debentures|credito|recebiveis|imoveis|multiativos|tesouro|cambio|private_equity|outro | ||
|   "debenture": { // only when a debenture is involved | ||
|     "incentivada_1243": true, | ||
|     "lei_12431_status": "confirmed", // confirmed|candidate|not_applicable|unknown | ||
|     "indexador": "IPCA+", | ||
|     "vencimento": null | ||
|   }, | ||
|   "tax": { "isento": true, "isento_status": "confirmed_exempt" }, | ||
|   "source": "openfindata", // openfindata|maisretorno|cvm|b3|web_search | ||
|   "confidence": 0.97, // 0..1; low => human-in-the-loop | ||
|   "as_of": "2026-06-29", // stamped in America/Sao_Paulo | ||
|   "cascade": ["openfindata:curated"], // trail of sources traversed | ||
|   "signals": [ // structured trail: which rule fired and with what evidence | ||
|     { "rule": "curated_seed", "evidence": "ticker=IFRA11", "detail": null } | ||
|   ], | ||
|   "notes": "Curated: ETF de debêntures de infraestrutura (FI-Infra, Lei 12.431)…" | ||
| } | ||
| ``` | ||
|
|
||
| ### Two orthogonal axes (modeling decision) | ||
|
|
||
| 1. **`macro_class` = asset class**: `Renda Fixa`, `Renda Variável`, | ||
| `Multimercado`, `Alternativos`, `Estruturados` (+ `Indefinido` when the | ||
| resolver cannot decide). Geography is **not** a macro value. | ||
| 2. **`exposure` = geography/strategy**: `Brasil` | `Internacional` | `null`. It is | ||
| where the economic exposure lies, regardless of asset class. B3 is the asset's | ||
| domicile, not its exposure. Hence: | ||
| - **IVVB11** (S&P 500 ETF listed on B3) → `RV` + `exposure=Internacional` | ||
| - **BDR** → `RV` + `exposure=Internacional` (FX/foreign risk) | ||
| - **Global-mandate FIA** (ARBOR, WHG) → `RV` + `exposure=Internacional` | ||
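
One way to picture the two axes is a pair of independent, validated fields. The sketch below is an illustration only, not the PR's model: `Axes` is a hypothetical class, and the allowed values are copied from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Allowed values as listed in the contract; geography never appears here.
MACRO_CLASSES = {
    "Renda Fixa",
    "Renda Variável",
    "Multimercado",
    "Alternativos",
    "Estruturados",
    "Indefinido",
}
EXPOSURES = {"Brasil", "Internacional", None}


@dataclass(frozen=True)
class Axes:
    """Hypothetical holder for the two orthogonal axes."""

    macro_class: str
    exposure: str | None

    def __post_init__(self) -> None:
        # Each axis is validated on its own; neither constrains the other.
        if self.macro_class not in MACRO_CLASSES:
            raise ValueError(f"not an asset class: {self.macro_class!r}")
        if self.exposure not in EXPOSURES:
            raise ValueError(f"not an exposure: {self.exposure!r}")


# IVVB11 is listed on B3, yet its economic exposure is abroad.
ivvb11 = Axes(macro_class="Renda Variável", exposure="Internacional")
print(ivvb11)
```

Under this shape, passing a geography such as `"Internacional"` as `macro_class` raises `ValueError`, which is the constraint stated in axis 1.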
|
|
||
| ### Fiscal-certainty axis | ||
|
|
||
| The booleans `incentivada_1243`/`isento` answer "yes/no". The status fields carry | ||
| the **certainty** that the boolean cannot: | ||
|
|
||
| - `lei_12431_status`: `confirmed` (explicit infrastructure / FI-Infra signal), | ||
| `candidate` (issuer+IPCA heuristic; **confirm by ISIN** before treating it | ||
| as tax-exempt), `not_applicable` (a debenture, but not infrastructure), `unknown`. | ||
| - `isento_status`: `confirmed_exempt` (statutory: CRA/CRI, LCI/LCA, confirmed | ||
| 12.431), `candidate_exempt` (heuristic), `confirmed_taxable`, `unknown`. | ||
|
|
||
| A `confidence` below roughly 0.9, or a `candidate` status, is the hook for human review. | ||
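
A consumer-side check for that review hook could look like the sketch below. It is an assumption-laden illustration: `needs_human_review` is a hypothetical helper, the 0.9 threshold is approximate in the contract, and treating `candidate_exempt` as a candidate status and a missing `confidence` as zero are both assumptions made here.

```python
from __future__ import annotations

# Approximate threshold; the contract only says "~0.9".
REVIEW_THRESHOLD = 0.9

# Assumption: both heuristic statuses count as "candidate".
CANDIDATE_STATUSES = {"candidate", "candidate_exempt"}


def needs_human_review(result: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when a classification should go to a human reviewer."""
    # Assumption: a missing confidence is treated as 0.0, i.e. reviewed.
    if result.get("confidence", 0.0) < threshold:
        return True
    # `debenture` is absent or null when no debenture is involved.
    debenture = result.get("debenture") or {}
    tax = result.get("tax") or {}
    statuses = {debenture.get("lei_12431_status"), tax.get("isento_status")}
    return bool(statuses & CANDIDATE_STATUSES)


ifra11 = {
    "confidence": 0.97,
    "debenture": {"incentivada_1243": True, "lei_12431_status": "confirmed"},
    "tax": {"isento": True, "isento_status": "confirmed_exempt"},
}
print(needs_human_review(ifra11))
```

For the IFRA11 payload from the output contract, both statuses are confirmed and the confidence is above the threshold, so no review is requested.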
|
|
||
| ## Source cascade (fallback) | ||
|
|
||
| 1. **openfindata** (primary, offline): curated seed + structural rules. Resolves | ||
| the test set without network access. | ||
| 2. **Mais Retorno MCP** (Brazilian fund/CNPJ/CVM-class data). | ||
| 3. **another provider** (CVM open data / B3). | ||
| 4. **web_search restricted** to `maisretorno.com`, `b3.com.br`, | ||
| `yahoofinance.com.br`, `debentures.com.br`. | ||
|
|
||
| Each step fills in what the previous one did not provide and **lowers the | ||
| confidence**; `source` reflects the final origin; `cascade` logs the path. Steps | ||
| 2 to 4 are an injectable extension point (`AssetProvider`), consulted only when | ||
| the core result is weak. In the current state of this PR, **only step 1 is | ||
| wired up** (the external ones are stubs to be connected at deploy time). | ||
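
The fallback behaviour described above can be sketched as follows. This is not the PR's engine: the real `AssetProvider` signature is not shown on this page, so `Provider`, `run_cascade`, the 0.9 weakness cut-off, and the 0.1 penalty are all hypothetical, and the sketch is synchronous for brevity.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical provider shape: identifier dict in, partial result (or None) out.
Provider = Callable[[dict], Optional[dict]]


def run_cascade(
    identifier: dict,
    core: Provider,
    externals: list[tuple[str, Provider]],
    weak_below: float = 0.9,  # illustrative cut-off for a "weak" core result
    penalty: float = 0.1,  # illustrative confidence drop per contributing step
) -> dict:
    """Run the offline core first, then fall back to externals only if weak."""
    result = dict(core(identifier) or {})
    result.setdefault("confidence", 0.0)
    result["cascade"] = [result.get("source", "core")]
    if result["confidence"] >= weak_below:
        return result  # strong core result: externals are never consulted
    for label, provider in externals:
        result["cascade"].append(label)  # log every step that was tried
        extra = provider(identifier) or {}
        filled = False
        for key, value in extra.items():
            # A later step only fills gaps; it never overwrites earlier values.
            if result.get(key) is None and value is not None:
                result[key] = value
                filled = True
        if filled:
            result["source"] = label  # source reflects the final origin
            result["confidence"] = round(max(0.0, result["confidence"] - penalty), 2)
    return result


weak_core = lambda _id: {"source": "openfindata", "confidence": 0.6, "subclasse": None}
stub = lambda _id: {"subclasse": "Indexada à Inflação"}
print(run_cascade({"ticker": "XXXX11"}, weak_core, [("maisretorno", stub)]))
```

The ticker and the provider outputs in the usage line are made-up inputs for the sketch, not results from the test set.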
|
|
||
| ## Test set (100% passing, offline) | ||
|
|
||
| | Identifier | macro_class | exposure | note | | ||
| |---|---|---|---| | ||
| | IFRA11 / FI ITAUINFRA | Renda Fixa | Brasil | infrastructure-debenture ETF; "Indexada à Inflação"; confirmed exempt | | ||
| | ARBOR FIC FIA | Renda Variável | Internacional | global mandate without "IE" | | ||
| | WHG GLOBAL FIC FIA IE | Renda Variável | Internacional | IE structure | | ||
| | DEB PETROBRAS IPCA+ | Renda Fixa | Brasil | debenture; incentivized status **candidate** (confirm ISIN) | | ||
| | COE | Estruturados | (n/a) | `kind=coe`, **never** ETF | | ||
| | "Crédito Estruturado" (Warren/AMW) | Renda Fixa | Brasil | name trap: it is credit, not Estruturados | | ||
| | IVVB11 | Renda Variável | Internacional | S&P 500 equity ETF | | ||
| | HGLG11 / MXRF11 | Renda Variável | Brasil | FII subclass | | ||
|
|
||
| ## Non-functional requirements | ||
|
|
||
| - **Deterministic + cacheable**: same identifier → same classification | ||
| (except `as_of`); a CNPJ/ticker rarely changes class, so cache aggressively. | ||
| - **Low latency**: the core is offline, with no I/O. | ||
| - **Auditable**: always `source` + `as_of` + `cascade` + `signals`. | ||
| - **No PII**: only asset identifiers cross the boundary. | ||
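
Because only `as_of` varies between calls, a caller-side cache can store the classification once and restamp the date on every read. The sketch below illustrates that idea; `ClassificationCache` and its trim-and-uppercase key are hypothetical and do not reproduce the resolver's own `normalize`.

```python
from __future__ import annotations

from typing import Callable


class ClassificationCache:
    """Hypothetical cache keyed on the identifier tuple, excluding `as_of`."""

    def __init__(self, classify: Callable[..., dict]) -> None:
        self._classify = classify
        self._store: dict[tuple, dict] = {}
        self.misses = 0

    @staticmethod
    def _key(name=None, ticker=None, cnpj=None, isin=None) -> tuple:
        # Assumed normalisation: trim and uppercase each identifier.
        fields = (name, ticker, cnpj, isin)
        return tuple(v.strip().upper() if v else None for v in fields)

    def resolve(self, *, as_of: str, **identifiers) -> dict:
        key = self._key(**identifiers)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._classify(**identifiers)
        # The cached body is reused; only the date stamp is fresh per call.
        return {**self._store[key], "as_of": as_of}


# Stand-in classifier for the demo; the real core is the offline resolver.
cache = ClassificationCache(lambda **ids: {"macro_class": "Renda Fixa"})
first = cache.resolve(as_of="2026-06-29", ticker="IFRA11")
second = cache.resolve(as_of="2026-06-30", ticker=" ifra11 ")
print(cache.misses, first, second)
```

The stand-in classifier returns a fixed value purely to exercise the cache; how long entries may live is a deployment decision the PR does not specify.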
|
|
||
| ## Pending before production | ||
|
|
||
| - Connect the real external providers (Mais Retorno MCP, restricted web search). | ||
| - ISIN-level confirmation of the incentivized status (12.431) via ANBIMA/debentures.com.br | ||
| in the cascade step; today it stays `candidate`. | ||
| - Expand the curated ETF seed as new ETFs are listed on B3. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| """Asset-classification resolver routes. | ||
|
|
||
| Wraps :func:`findata.resolver.resolve_asset` over HTTP. The consolidator calls | ||
| this per asset (dozens per statement), so the handler is a thin, cacheable pass | ||
| through the deterministic core. No PII: only an asset identifier crosses the | ||
| boundary. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from fastapi import APIRouter, Query | ||
|
|
||
| from findata.resolver import AssetClassification, resolve_asset | ||
|
|
||
| router = APIRouter(prefix="/resolver", tags=["Resolver"]) | ||
|
|
||
|
|
||
| @router.get("/resolve") | ||
| async def resolve( | ||
|     name: str | None = Query( | ||
|         None, max_length=256, description="Asset name/label (e.g. 'FI ITAUINFRA CI')" | ||
|     ), | ||
|     ticker: str | None = Query(None, max_length=16, description="B3 ticker (e.g. IFRA11, PETR4)"), | ||
|     cnpj: str | None = Query(None, max_length=32, description="Fund CNPJ (masked or unmasked)"), | ||
|     isin: str | None = Query(None, max_length=16, description="ISIN (e.g. BR...)"), | ||
| ) -> AssetClassification: | ||
|     """Classify an asset into the Wealthuman macro taxonomy. | ||
|
|
||
|     Accepts any identifier (``name``/``ticker``/``cnpj``/``isin``) and | ||
|     returns ``macro_class`` (asset class: Renda Fixa, Renda Variável, | ||
|     Multimercado, Alternativos, Estruturados) + ``exposure`` (orthogonal | ||
|     geography axis: Brasil/Internacional) + subclasse, underlying, debenture/Lei | ||
|     12.431, ``source``, ``confidence``, ``signals`` and the cascade traversed. | ||
|     Deterministic and cacheable. | ||
|     """ | ||
|     return await resolve_asset(name=name, ticker=ticker, cnpj=cnpj, isin=isin) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| """Wealthuman asset-classification resolver. | ||
|
|
||
| ``resolve_asset(identifier)`` turns any Brazilian asset identifier (ticker, | ||
| CNPJ, ISIN, or bare name) into a classification mapped to the Wealthuman | ||
| taxonomy: ``macro_class`` is the asset class (Renda Fixa, Renda Variável, | ||
| Multimercado, Alternativos, Estruturados); geography is the orthogonal | ||
| ``exposure`` axis (Brasil/Internacional). Plus subclasse, underlying nature, | ||
| debenture / Lei-12.431 facts (with a certainty status), source, confidence, an | ||
| audit cascade, and structured signals. | ||
|
|
||
| Deterministic, cacheable, auditable, no PII. See ``openfindata-mcp-spec.md``. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from findata.resolver.engine import AssetProvider, classify, resolve_asset | ||
| from findata.resolver.models import ( | ||
|     AssetClassification, | ||
|     CvmInfo, | ||
|     DebentureInfo, | ||
|     IdentifierResolved, | ||
|     TaxInfo, | ||
| ) | ||
| from findata.resolver.normalize import NormalizedInput, normalize | ||
|
|
||
| __all__ = [ | ||
|     "AssetClassification", | ||
|     "AssetProvider", | ||
|     "CvmInfo", | ||
|     "DebentureInfo", | ||
|     "IdentifierResolved", | ||
|     "NormalizedInput", | ||
|     "TaxInfo", | ||
|     "classify", | ||
|     "normalize", | ||
|     "resolve_asset", | ||
| ] |