[Medical Image Processing] Multi-compound Transformer for Accurate Biomedical Image Segmentation

Title: Multi-compound Transformer for Accurate Biomedical Image Segmentation

Authors: Yuanfeng Ji, The University of Hong Kong; Ping Luo, SenseTime

Source: MICCAI 2021

Code: https://github.com/JiYuanFeng/MCTrans

Topics: Transformer; attention mechanism; medical image segmentation

1. Introduction

What problem does the paper aim to solve?

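Because convolution kernels are inherently local, traditional CNN-based segmentation models lack the ability to model long-range dependencies.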

📋 Due to the local property of the convolutional kernels, the traditional CNN-based segmentation models (e.g. FCN) lack the ability for modeling long-term dependencies.

What existing methods address this problem?

Several methods have been proposed to build powerful models of such relationships:

Spatial-pyramid-based methods

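Spatial-pyramid-based methods use convolution kernels of different sizes to aggregate contextual information from different ranges into a single layer.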
Zhao, Hengshuang, et al. “Pyramid scene parsing network.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
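The idea of aggregating several context ranges into one layer can be made concrete with a toy NumPy sketch. This is not PSPNet's actual design, which pools over pyramid bins and upsamples. The sketch replaces learned kernels with fixed average (box) filters of hypothetical sizes 1, 3 and 5, then stacks their outputs as channels. `box_filter` and `pyramid_context` are illustrative helpers.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_filter(x: np.ndarray, k: int) -> np.ndarray:
    """Average each position over its k x k neighbourhood (k odd), using edge padding."""
    pad = k // 2
    padded = np.pad(x, pad, mode="edge")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    return windows.mean(axis=(-2, -1))

def pyramid_context(x: np.ndarray, kernel_sizes=(1, 3, 5)) -> np.ndarray:
    """Stack context from several ranges (hypothetical kernel sizes) as channels of one layer."""
    return np.stack([box_filter(x, k) for k in kernel_sizes], axis=-1)

feat = np.arange(36, dtype=float).reshape(6, 6)  # toy single-channel feature map
ctx = pyramid_context(feat)
print(ctx.shape)  # (6, 6, 3)
```

Each output position now carries local (k=1) and wider (k=3, k=5) context side by side. A learned convolution would then fuse these channels.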

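UNet-based encoder-decoder networks
- UNet-based encoder-decoder networks use skip connections to combine coarse-grained deep features with fine-grained shallow features.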
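The following minimal NumPy sketch shows such a skip connection. It assumes nearest-neighbour upsampling, whereas UNet itself uses learned up-convolutions, and concatenation along the channel axis. The feature sizes are hypothetical.

```python
import numpy as np

def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (H, W, C) feature map."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def skip_merge(deep: np.ndarray, shallow: np.ndarray) -> np.ndarray:
    """Bring the coarse deep feature to the shallow resolution, then concatenate channels."""
    up = upsample_nearest(deep, shallow.shape[0] // deep.shape[0])
    if up.shape[:2] != shallow.shape[:2]:
        raise ValueError("spatial sizes must match after upsampling")
    return np.concatenate([up, shallow], axis=-1)

rng = np.random.default_rng(0)
deep = rng.standard_normal((8, 8, 64))       # coarse, semantically strong
shallow = rng.standard_normal((16, 16, 32))  # fine, spatially detailed
merged = skip_merge(deep, shallow)
print(merged.shape)  # (16, 16, 96)
```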
TransUNet
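- Chen, Jieneng, et al. “TransUNet: Transformers make strong encoders for medical image segmentation.” arXiv preprint arXiv:2102.04306 (2021).
- TransUNet applies the self-attention mechanism to the highest-level CNN features to compute global context, capturing dependencies of various ranges at that specific scale.
- However, this design is still sub-optimal for medical image segmentation. First, it uses self-attention for context modeling only at a single scale and ignores cross-scale dependency and consistency. Cross-scale consistency usually plays a critical role in segmenting lesions whose sizes vary dramatically. Second, it does not consider how to learn the correlation between different semantic categories or how to ensure feature consistency within regions of the same category. Both have become critical in the design of CNN-based segmentation schemes.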
  
  📋 However, such a design is still sub-optimal for medical image segmentation for the following reasons. First, it only uses the self-attention mechanism for context modeling on a single scale but ignores the cross-scale dependency and consistency. The latter usually plays a critical role in the segmentation of lesions with dramatic size changes. Second, beyond the context modeling, how to learn the correlation between different semantic categories and how to ensure the feature consistency of the same category region are still not taken into account. But both of them have become critical for CNN-based segmentation scheme design.

What are the main contributions of the paper?

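This paper proposes the Multi-Compound Transformer (MCTrans), which builds cross-scale contextual dependencies and mines semantic relationships for accurate biomedical image segmentation.
It introduces the Transformer-Self-Attention (TSA) module, which performs cross-scale pixel-level context modeling through the self-attention mechanism, yielding more comprehensive feature enhancement across scales.
It develops the Transformer-Cross-Attention (TCA) module, which introduces a learnable proxy embedding to automatically learn the semantic correspondence between different semantic categories. The proxy embedding then interacts with the feature representations through the cross-attention mechanism. An auxiliary loss on the updated proxy embedding effectively improves the feature correlation within the same category and the feature discriminability between different categories.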

📋 We propose the Multi-Compound Transformer (MCTrans), which incorporates rich context modeling and semantic relationship mining for accurate biomedical image segmentation. MCTrans overcomes the limitations of conventional vision transformers by: (1) introducing the Transformer-Self-Attention (TSA) module to achieve cross-scale pixel-level contextual modeling via the self-attention mechanisms, leading to a more comprehensive feature enhancement for different scales. (2) developing the Transformer-Cross-Attention (TCA) to automatically learn the semantic correspondence of different semantic categories by introducing the proxy embedding. We further use such proxy embedding to interact with the feature representations via the cross-attention mechanism. By introducing auxiliary loss for the updated proxy embedding, we find that it could effectively improve feature correlations of the same category and the feature discriminability between different classes.
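The proxy-embedding interaction can be sketched as a single-head cross-attention in NumPy. This is an illustration under stated assumptions, not the MCTrans implementation. It assumes the $M$ class proxies act as queries and the image tokens act as keys and values. It omits layer normalization, the FFN, multiple heads and the auxiliary loss. `proxy_cross_attention` and all sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def proxy_cross_attention(proxies, tokens, w_q, w_k, w_v):
    """Update M class proxies (M, C) by attending over L image tokens (L, C); single head."""
    q = proxies @ w_q                                  # (M, d) queries from the proxies
    k = tokens @ w_k                                   # (L, d) keys from the tokens
    v = tokens @ w_v                                   # (L, C) values from the tokens
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))     # (M, L): each proxy's weights over tokens
    return proxies + attn @ v, attn                    # residual update of the proxies

rng = np.random.default_rng(0)
M, L, C, d = 4, 50, 32, 16          # hypothetical: 4 classes, 50 tokens
proxies = rng.standard_normal((M, C))
tokens = rng.standard_normal((L, C))
w_q, w_k = rng.standard_normal((C, d)), rng.standard_normal((C, d))
w_v = rng.standard_normal((C, C))
updated, attn = proxy_cross_attention(proxies, tokens, w_q, w_k, w_v)
print(updated.shape, attn.shape)  # (4, 32) (4, 50)
```

Each row of `attn` shows which image tokens a class proxy draws from. This is the kind of per-category correspondence an auxiliary loss on the updated proxies could sharpen.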

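2. Method

2.1 MCTrans Network

As shown in the figure above, the paper incorporates the MCTransformer into the classic UNet encoder-decoder architecture. The MCTransformer consists of the Transformer-Self-Attention (TSA) module and the Transformer-Cross-Attention (TCA) module. The TSA module encodes the contextual information among multiple features, yielding rich and consistent pixel-level context. The TCA module introduces a learnable embedding to model semantic relationships and further enhance the feature representations.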

📋 The former is introduced to encode the contextual information between the multiple features, yielding rich and consistent pixel-level context. And the latter introduces learnable embedding for semantic relationship modeling and further enhances feature representations.

Given an image $I \in \mathbb{R}^{H \times W}$, a deep CNN extracts multi-level features at different scales, $\{X_i \in \mathbb{R}^{\frac{H}{2^i} \times \frac{W}{2^i} \times C_i}\}$. For level $i$, the feature map is unfolded into patches of size $P \times P$, where $P$ is set to 1 in this paper. Every position of the $i$-th feature map is therefore treated as a patch, giving $L_i = \frac{HW}{2^{2i} \times P^2}$ patches. The patches from each level are then fed into linear projection heads (i.e., $1 \times 1$ convolution layers) with the same output feature dimension $C$, producing embedded tokens $T_i \in \mathbb{R}^{L_i \times C}$. The tokens of levels $i = 2, 3, 4$ are concatenated to form the overall token sequence $T \in \mathbb{R}^{L \times C}$, where $L = \sum_{i=2}^{4} L_i$. To compensate for the lost positional information, a positional embedding $E_{pos} \in \mathbb{R}^{L \times C}$ is added to the tokens. This embedding provides information about the relative or absolute position of each feature in the sequence, so the tokens become $T = T + E_{pos}$.

Next, the tokens $T$ are fed into the TSA module for multi-scale context modeling. The enhanced output tokens are then passed to the TCA module, where they interact with the proxy embedding $E_{pro} \in \mathbb{R}^{M \times C}$, with $M$ the number of classes in the dataset. Finally, the encoded tokens are folded back into pyramid feature maps. These maps are merged in a bottom-up manner to obtain the final feature map for prediction.
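The token bookkeeping above can be checked with a short NumPy sketch. It makes the following assumptions:

- $H = W = 256$ and $P = 1$.
- The backbone widths $C_i$ and the embedding size $C = 128$ are hypothetical.
- Each $1 \times 1$ convolution is modeled as a matrix multiply.

`num_tokens`, `build_tokens` and `fold_back` are illustrative helpers. In the real model the tokens would pass through TSA and TCA before being folded back.

```python
import numpy as np

def num_tokens(h: int, w: int, level: int, patch: int = 1) -> int:
    """L_i = H * W / (2^(2i) * P^2): number of P x P patches at level i."""
    return (h * w) // (2 ** (2 * level) * patch ** 2)

def build_tokens(feats, projs, pos):
    """Flatten each level (P = 1), project to C (1x1 conv == matmul), concatenate, add E_pos."""
    tokens = [f.reshape(-1, f.shape[-1]) @ w for f, w in zip(feats, projs)]
    return np.concatenate(tokens, axis=0) + pos

def fold_back(tokens, shapes):
    """Split the (L, C) sequence at the level boundaries and reshape to (h_i, w_i, C)."""
    maps, start = [], 0
    for h, w in shapes:
        maps.append(tokens[start:start + h * w].reshape(h, w, -1))
        start += h * w
    return maps

H = W = 256
C = 128                       # hypothetical shared embedding dimension
levels = (2, 3, 4)
channels = (64, 128, 256)     # hypothetical backbone widths C_i
rng = np.random.default_rng(0)
feats = [rng.standard_normal((H // 2**i, W // 2**i, c)) for i, c in zip(levels, channels)]
projs = [rng.standard_normal((c, C)) for c in channels]  # one projection head per level
L = sum(num_tokens(H, W, i) for i in levels)
pos = rng.standard_normal((L, C))                          # stands in for the learned E_pos
T = build_tokens(feats, projs, pos)
maps = fold_back(T, [(H // 2**i, W // 2**i) for i in levels])
print([num_tokens(H, W, i) for i in levels], T.shape)  # [4096, 1024, 256] (5376, 128)
print([m.shape for m in maps])  # [(64, 64, 128), (32, 32, 128), (16, 16, 128)]
```

Level 2 dominates the sequence (4096 of 5376 tokens). Attention over the whole sequence therefore costs roughly $L^2 \approx 2.9 \times 10^7$ pairwise scores per head at this resolution.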

2.1.1 Transformer-Self-attention

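The TSA module takes the one-dimensional token sequence $T$ as input and learns pixel-level contextual dependencies among multi-scale features. As shown in the figure above, it consists of $K_s$ layers. Each layer contains a multi-head self-attention (MSA) block and a feed-forward network (FFN). Layer normalization (LN) is applied before each block, and a residual connection is applied after each block. The FFN contains two linear layers with a ReLU activation.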
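One TSA layer as described (LN before each block, MSA, FFN with ReLU, residual after each block) can be sketched in NumPy. This is a generic pre-norm Transformer layer under stated assumptions, not the MCTrans code. The head count of 8, the FFN width of $4C$, the omitted learnable LN scale and shift, and all helper names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token over channels (learnable scale/shift omitted)."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def msa(x, p, heads):
    """Multi-head self-attention over an (L, C) token sequence."""
    L, C = x.shape
    dh = C // heads
    # Project, then split channels into heads: each of q, k, v has shape (heads, L, dh).
    q, k, v = [(x @ p[n]).reshape(L, heads, dh).transpose(1, 0, 2) for n in ("wq", "wk", "wv")]
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, L, L)
    out = (attn @ v).transpose(1, 0, 2).reshape(L, C)        # merge heads back to (L, C)
    return out @ p["wo"]

def ffn(x, p):
    """Two linear layers with a ReLU in between."""
    return np.maximum(x @ p["w1"] + p["b1"], 0) @ p["w2"] + p["b2"]

def tsa_layer(x, p, heads=8):
    x = x + msa(layer_norm(x), p, heads)  # LN before the block, residual after it
    x = x + ffn(layer_norm(x), p)
    return x

def init_params(rng, C, hidden):
    s = 1 / np.sqrt(C)
    p = {n: rng.standard_normal((C, C)) * s for n in ("wq", "wk", "wv", "wo")}
    p.update(w1=rng.standard_normal((C, hidden)) * s, b1=np.zeros(hidden),
             w2=rng.standard_normal((hidden, C)) / np.sqrt(hidden), b2=np.zeros(C))
    return p

rng = np.random.default_rng(0)
C, L, K_s = 64, 100, 2  # hypothetical sizes
layers = [init_params(rng, C, 4 * C) for _ in range(K_s)]
T = rng.standard_normal((L, C))
for p in layers:
    T = tsa_layer(T, p)
print(T.shape)  # (100, 64)
```

With pre-norm, the residual path carries the unnormalized tokens straight through. This is why stacking $K_s$ such layers stays well behaved.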

For the $l$-th layer, the input to multi-head self-attention is the tuple **(query, key, value)** computed from the input $T^{l-1}$:

$$\text{query} = T^{l-1}\mathbf{W}_Q^l, \quad \text{key} = T^{l-1}\mathbf{W}_K^l, \quad \text{value} = T^{l-1}\mathbf{W}_V^l$$

where $\mathbf{W}_Q^l \in \mathbb{R}^{C \times d_q}$, $\mathbf{W}_K^l \in \mathbb{R}^{C \times d_k}$ and $\mathbf{W}_V^l \in \mathbb{R}^{C \times d_v}$ are the parameter matrices of the different linear projection heads in the $l$-th layer, and $d_q$, $d_k$, $d_v$ are the dimensions of the three inputs.

**Self-attention (SA)** can be expressed as:

$$\mathrm{SA}\left(T^{l-1}\right) = T^{l-1} + \operatorname{Softmax}\left(\frac{T^{l-1}\mathbf{W}_Q^l\left(T^{l-1}\mathbf{W}_K^l\right)^{\top}}{\sqrt{d_k}}\right)\left(T^{l-1}\mathbf{W}_V^l\right)$$
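The SA equation can be transcribed directly into NumPy for a single head. The formula implies two shape constraints. First, the product $QK^\top$ requires $d_q = d_k$. Second, adding the residual $T^{l-1}$ requires $d_v = C$. The sizes below are hypothetical, and `self_attention` is an illustrative helper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(t_prev, w_q, w_k, w_v):
    """SA(T^{l-1}) = T^{l-1} + Softmax(Q K^T / sqrt(d_k)) V for a single head."""
    q, k, v = t_prev @ w_q, t_prev @ w_k, t_prev @ w_v
    d_k = w_k.shape[1]
    weights = softmax(q @ k.T / np.sqrt(d_k))  # (L, L); each row sums to 1
    return t_prev + weights @ v, weights       # residual term needs d_v == C

rng = np.random.default_rng(0)
L, C, d_k = 10, 16, 8
t = rng.standard_normal((L, C))
w_q, w_k = rng.standard_normal((C, d_k)), rng.standard_normal((C, d_k))  # d_q = d_k
w_v = rng.standard_normal((C, C))  # d_v = C so the residual sum is well defined
out, weights = self_attention(t, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (10, 16) (10, 10)
```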

**Multi-head self-attention**
