TokenTextSplitter.split() splits small text into multiple chunks if there is no ./?/!/\n at the end. #4981

@ilyakastsenevich

Spring AI version: 1.1.0

Steps to reproduce

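    import java.util.List;
    import org.springframework.ai.document.Document;
    import org.springframework.ai.transformer.splitter.TokenTextSplitter;

    // The chunk size (in tokens) is far larger than the input, so no split is expected.
    TokenTextSplitter splitter = TokenTextSplitter.builder()
        .withKeepSeparator(true)
        .withChunkSize(10000)
        .withMinChunkSizeChars(10)
        .build();

    // The text contains sentence-ending periods but does not end with one.
    Document testDoc = new Document("Hi. This is a small text without one of the ending chars. It is splitted into multiple chunks but shouldn't");
    List<Document> chunks = splitter.split(testDoc);
    for (Document doc : chunks) {
      System.out.println("content: " + doc.getText());
    }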

Actual console output:
content: Hi. This is a small text without one of the ending chars.
content: It is splitted into multiple chunks but shouldn't

Expected behavior
Only one chunk should be returned: the text is far smaller than the configured chunk size of 10000 tokens, so no split should occur.

Expected console output:
content: Hi. This is a small text without one of the ending chars. It is splitted into multiple chunks but shouldn't
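
To illustrate what might be going on, here is a minimal standalone sketch. It does not use Spring AI and is not taken from the library source: it assumes the splitter cuts each window at the last `.`, `?`, `!` or `\n` whenever that index exceeds the minimum chunk size in characters. Characters stand in for tokens, and the class `PunctuationCutSketch`, its helpers, and the `guardFinalWindow` flag are all hypothetical names made up for this example. The flag shows one possible guard (skip the cut when the window already covers all remaining text); it is only a suggestion, not a confirmed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of punctuation-based chunk truncation.
// Assumption: characters stand in for tokens; this is NOT Spring AI code.
class PunctuationCutSketch {

    // Index of the last '.', '?', '!' or '\n' in text, or -1 if none is present.
    static int lastSentenceEnd(String text) {
        return Math.max(Math.max(text.lastIndexOf('.'), text.lastIndexOf('?')),
                Math.max(text.lastIndexOf('!'), text.lastIndexOf('\n')));
    }

    // chunkSize must be positive. When guardFinalWindow is true, a window that
    // already contains all remaining text is emitted whole instead of being cut.
    static List<String> split(String text, int chunkSize, int minChunkSizeChars,
            boolean guardFinalWindow) {
        List<String> chunks = new ArrayList<>();
        String remaining = text;
        while (!remaining.isEmpty()) {
            String window = remaining.substring(0, Math.min(chunkSize, remaining.length()));
            boolean isFinalWindow = window.length() == remaining.length();
            int cut = lastSentenceEnd(window);
            String chunk = window;
            // Assumed trigger for the reported behavior: the cut is applied even
            // when the whole remaining text fits into a single window.
            if (cut > minChunkSizeChars && !(guardFinalWindow && isFinalWindow)) {
                chunk = window.substring(0, cut + 1);
            }
            remaining = remaining.substring(chunk.length());
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) {
                chunks.add(trimmed);
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Hi. This is a small text without one of the ending chars. "
                + "It is splitted into multiple chunks but shouldn't";
        // Without the guard: two chunks, matching the actual output in this report.
        split(text, 10000, 10, false).forEach(c -> System.out.println("content: " + c));
        System.out.println("---");
        // With the guard: one chunk, matching the expected output.
        split(text, 10000, 10, true).forEach(c -> System.out.println("content: " + c));
    }
}
```

If this assumption about the cause is right, it would also explain why the text is not split when it ends with one of those characters: the last punctuation index is then the final character, so the cut keeps the whole text.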

Labels: bug (Something isn't working)