Does torch.nn.MultiheadAttention contain a normalization layer and a feed-forward layer? Answer

【Question title】: Does torch.nn.MultiheadAttention contain a normalisation layer and a feed-forward layer?
【Posted】: 2022-01-06 22:25:24
【Question】:

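I tried to find the source code for multi-head attention but couldn't find any implementation details. I'm wondering whether this module contains only the attention part rather than the whole Transformer block, i.e. it does not contain the normalization layer, the residual connection, or the additional feed-forward neural network?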

【Question comments】:

Tags: python pytorch bert-language-model transformer attention-model

【Solution 1】:

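According to the source code, the answer is no. As expected, MultiheadAttention implements only the attention function: the query/key/value input projections, scaled dot-product attention across the heads, and the output projection. Layer normalization, residual connections, and the feed-forward network are added by higher-level modules such as nn.TransformerEncoderLayer and nn.TransformerDecoderLayer.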
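One way to check this empirically is to inspect the submodules of an nn.MultiheadAttention instance and compare them with nn.TransformerEncoderLayer. The sketch below assumes a recent PyTorch install and the default `batch_first=False` layout; `PostNormBlock` is a hypothetical, simplified post-norm encoder block written only to show where the residual connections, LayerNorm and feed-forward network would sit around the attention module. It is not PyTorch's own implementation.

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 16, 4
mha = nn.MultiheadAttention(embed_dim, num_heads)

# MultiheadAttention's only child module is its output projection (out_proj);
# the Q/K/V input projections are stored as plain parameters.
print([type(m).__name__ for m in mha.modules()])
has_layernorm = any(isinstance(m, nn.LayerNorm) for m in mha.modules())
print(has_layernorm)  # False

# nn.TransformerEncoderLayer wraps self-attention with residual connections,
# two LayerNorms and a two-layer feed-forward network.
layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads, dim_feedforward=32)
n_layernorms = sum(isinstance(m, nn.LayerNorm) for m in layer.modules())
print(n_layernorms)  # 2

# The forward pass returns only the attention output and the attention weights.
x = torch.randn(5, 2, embed_dim)  # (seq_len, batch, embed_dim)
out, weights = mha(x, x, x)
print(out.shape)      # torch.Size([5, 2, 16])
print(weights.shape)  # torch.Size([2, 5, 5]), averaged over heads by default


class PostNormBlock(nn.Module):
    """Hypothetical minimal post-norm Transformer encoder block (illustration only)."""

    def __init__(self, d_model, nhead, d_ff, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # Attention sublayer: residual connection, then LayerNorm.
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + self.drop(attn_out))
        # Feed-forward sublayer: residual connection, then LayerNorm.
        x = self.norm2(x + self.drop(self.ff(x)))
        return x


block = PostNormBlock(embed_dim, num_heads, d_ff=32)
y = block(x)
print(y.shape)  # torch.Size([5, 2, 16])
```

Everything beyond the attention call in `PostNormBlock` has to be added by the user (or supplied by nn.TransformerEncoderLayer), which is consistent with the answer above.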

【Comments】: