Is my problem solvable with ExtractThinker? #355
Unanswered
Vitor-Lucas
asked this question in
Q&A
Hi!
I'm working on a project that extracts information from Brazilian normative documents, such as laws and manuals, and creates a RAG system that retrieves their information for an LLM to answer legal questions.
I want the extracted information to be kept up to date when needed, for example when a law article stops being in effect.
So far, I have been writing regular expressions (regex) to extract the information I'm looking for. For instance, in legal documents, I take each article and store its text, along with contextual information such as the chapter it belongs to, in a JSON object like the one below:
{
  "id": "c870c45a-399d-44f6-a5f5-230ebd040ed1",
  "text": "Art. 8º Cabe à ANAC adotar as medidas ...",
  "category": "Powers and Duties",
  "metadata": {
    "type_document": "Law",
    "law_number": "11182",
    "article": "Art. 8",
    "publication_date": "27-09-2005",
    "access_date": "10-09-2025",
    "effective_date_start": "",
    "effective_date_end": "",
    "context": {
      "title": "",
      "chapter": "Chapter I",
      "section": ""
    },
    "contains_table": false,
    "repeals_article": true
  }
}
But the problem I'm currently facing is that there are multiple ways to change a law's effective date range, including in other legal documents/files, making it almost impossible to detect with REGEX. So, with this new issue, I'm turning towards using SLMs or LLMs to try to interpret these changes and provide me with the exact range of dates this article has been effective.
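To make the setup concrete, here is a minimal Python sketch of the regex approach described above, plus the kind of date-range check the extracted records are meant to support. It is only an illustration under assumptions: the names `ArticleRecord`, `split_articles`, and `is_effective` are hypothetical, the pattern assumes every article starts a line with "Art. <number>", dates use the DD-MM-YYYY format from the JSON, and an empty date is treated as "no known bound". It does not use ExtractThinker.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional

DATE_FMT = "%d-%m-%Y"  # same DD-MM-YYYY style as "27-09-2005" in the JSON

# Assumption: each article begins at the start of a line with "Art. <number>".
ARTICLE_RE = re.compile(r"^Art\.\s*(\d+)", re.MULTILINE)


@dataclass
class ArticleRecord:
    """Hypothetical, trimmed-down version of the JSON record."""
    article: str
    text: str
    effective_date_start: str = ""  # empty string = unknown / open
    effective_date_end: str = ""    # empty string = unknown / still in effect


def split_articles(law_text: str) -> List[ArticleRecord]:
    """Cut the text at each 'Art. N' heading and build one record per article."""
    matches = list(ARTICLE_RE.finditer(law_text))
    records = []
    for i, m in enumerate(matches):
        # An article runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(law_text)
        records.append(
            ArticleRecord(article=f"Art. {m.group(1)}",
                          text=law_text[m.start():end].strip())
        )
    return records


def _parse(value: str) -> Optional[date]:
    """Parse a DD-MM-YYYY string; return None for an empty field."""
    return datetime.strptime(value, DATE_FMT).date() if value else None


def is_effective(record: ArticleRecord, on: date) -> bool:
    """True if `on` falls inside the record's range (bounds treated as inclusive)."""
    start = _parse(record.effective_date_start)
    end = _parse(record.effective_date_end)
    if start is not None and on < start:
        return False
    if end is not None and on > end:
        return False
    return True
```

The regex part can split a single file into articles, but nothing in it can fill `effective_date_start` or `effective_date_end` when the change is stated in a different document. That is the step an SLM or LLM would have to handle.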
While searching for alternatives, I came across this project, and I was really surprised to see how you separate the documents to fit the model's token limitations.
With that being said, I wanted to know if your project could help me solve any of these problems: