Skip to content

Potential bug in UTF-8 validation when the UTF-8 sequence crossing frame boundary #85

@jangko

Description

@jangko

@jlokier pointed out to UTF-8 encoding in websocket spec.

https://datatracker.ietf.org/doc/html/rfc6455#section-5.6

Text
The "Payload data" is text data encoded as UTF-8. Note that a particular text frame might include a partial UTF-8 sequence; however, the whole message MUST contain valid UTF-8. Invalid UTF-8 in reassembled messages is handled as described in Section 8.1.

Autobahn have this kind of test cases, but surprisingly, we pass all of them.
Because we don't handle this case specifically, there is still potential for this bug to manifest itself.

The reason we pass autobahn test is we read using big enough buffer to assemble all arrived UTF-8 frames into complete message and thus satisfy the spec condition. but if we are using smaller buffer, behavior will be different.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions