No positional information for the first self-attention block #6

@congwang093

Description

Hi, thanks for your hard work. I read the paper, and if I understand correctly, the first transformer block doesn't have any positional information. Would this cause any issues for passing information on to the rest of the blocks, since the self-attention modules in the later blocks always come with some positional information? Have you tried using any other relative positional encoding method to fill in the gap for the first block?
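For concreteness, a minimal sketch of what I mean, assuming a learned absolute positional embedding is added only at the input of the first block (this is not code from this repo; `FirstBlockWithPositions`, `d_model`, `max_len` are placeholder names, and PyTorch's `nn.TransformerEncoderLayer` stands in for whatever block the paper actually uses):

```python
import torch
import torch.nn as nn

class FirstBlockWithPositions(nn.Module):
    """Illustrative only: inject positions before block 0 so it is not position-blind."""
    def __init__(self, d_model=512, max_len=1024, num_heads=8):
        super().__init__()
        # Learned absolute positions, added only at the input of the first block.
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        positions = torch.arange(x.size(1), device=x.device)
        x = x + self.pos_emb(positions)         # give block 0 positional information
        return self.block(x)
```

A relative positional encoding applied inside the first block's attention would be another option; the sketch above is just the simplest absolute-position variant of the idea.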
