No positional information for the first self-attention block #6

@congwang093

Description

Hi, thanks for your hard work. I read the paper, and if I understand correctly, the first transformer block doesn't have any positional information. Would this cause any issues for passing information on to the rest of the blocks, since the self-attention modules in the later blocks always come with some positional information? Have you tried using any other relative positional encoding method to fill in the gap for the first block?
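For concreteness, a minimal sketch of what I mean, assuming a learned absolute positional embedding is added only at the input of the first block (this is not code from this repo; `FirstBlockWithPositions`, `d_model`, `max_len` are placeholder names, and PyTorch's `nn.TransformerEncoderLayer` stands in for whatever block the paper actually uses):

```python
import torch
import torch.nn as nn

class FirstBlockWithPositions(nn.Module):
    """Illustrative only: inject positions before block 0 so it is not position-blind."""
    def __init__(self, d_model=512, max_len=1024, num_heads=8):
        super().__init__()
        # Learned absolute positions, added only at the input of the first block.
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        positions = torch.arange(x.size(1), device=x.device)
        x = x + self.pos_emb(positions)         # give block 0 positional information
        return self.block(x)
```

A relative positional encoding applied inside the first block's attention would be another option; the sketch above is just the simplest absolute-position variant of the idea.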
