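Paper: https://arxiv.org/pdf/2106.06963.pdf
Reference: https://blog.csdn.net/qq_45645521/article/details/123493075
Prior knowledge: these persimmons are red, so they must be ripe. Posterior knowledge: I just ate a persimmon, and it was fully ripe.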
Abstract
Posterior-and-Prior Knowledge Exploring-and-Distilling approach (PPKED)
- first examine the abnormal regions
- includes three modules:
  - Posterior Knowledge Explorer (PoKE)
    - explores the posterior knowledge
    - provides **explicit abnormal visual regions**
    - alleviates visual data bias
    - uses a bag of disease topics to explore the posterior knowledge and capture rare, diverse, and important abnormal regions
  - Prior Knowledge Explorer (PrKE)
    - explores the prior knowledge from the prior medical knowledge graph (prior medical knowledge, PrMK, $G_{\text{Pr}}$) and prior radiology reports (prior working experience, PrWE, $W_{\text{Pr}}$)
    - alleviates textual data bias
  - Multi-domain Knowledge Distiller (MKD)
    - distills the explored knowledge and generates the final reports
    - adaptive distilling attention (ADA): makes the model adaptively learn to distill correlated knowledge
Introduction
directly applying image captioning approaches to radiology images has problems:
- visual data deviation - unbalanced visual distribution
- textual data deviation - too many normal descriptions
Related Works
Image Captioning
encoder-decoder framework - translates the image into a single descriptive sentence
radiology report generation - aims to generate a long paragraph that consists of multiple sentences, each one focusing on a specific medical observation for a specific region in the radiology image
Image Paragraph Generation
- in a natural image paragraph: each sentence has equal importance
- in a radiology report: generating abnormalities should be emphasized more than generating normalities
Radiology Report Generation
explore and distill the posterior and prior knowledge for accurate radiology report generation
- for the network structure: explore the posterior knowledge of the input radiology image by explicitly extracting the abnormal regions
- leverage the retrieved reports and the medical knowledge graph to model the prior working experience and the prior medical knowledge
- retrieve a large number of similar reports
- treat the retrieved reports as latent guidance (using fixed templates instead would introduce inevitable errors)
Posterior-and-Prior Knowledge Exploring-and-Distilling (PPKED)
- PoKE: explores the posterior knowledge by extracting the explicit abnormal regions
- PrKE: explores the relevant prior knowledge for the input image
- MKD: distills accurate posterior and prior knowledge and adaptively merges them to generate accurate reports
Backgrounds
Problem Formulation
$$
\begin{aligned}
\text{PoKE}&:\{I,T\}\to I' \\
\text{PrKE}&:\{I',W_{\text{Pr}}\}\to W'_{\text{Pr}};\ \{I',G_{\text{Pr}}\}\to G'_{\text{Pr}} \\
\text{MKD}&:\{I',W'_{\text{Pr}},G'_{\text{Pr}}\}\to R
\end{aligned}
$$
Information Sources
- $I$: adopt ResNet-152 to extract 2048 $7\times 7$ image feature maps, which are further projected into 512 $7\times 7$ feature maps, resulting in $I=\{i_1,i_2,...,i_{N_1}\}\in \mathbb{R}^{N_1 \times d}$ ($N_1=49$, $d=512$)
- $T$: topic bag (common abnormalities)
  - $T=\{t_1,t_2,...,t_{N_T}\}\in \mathbb{R}^{N_T \times d}$
  - $t_i\in\mathbb{R}^d$: the word embedding of the $i^{th}$ topic
- $W_{\text{Pr}}$: the reports of the top-$N_K$ retrieved images are returned and encoded as $W_{\text{Pr}}=\{R_1,R_2,...,R_{N_K}\}\in\mathbb{R}^{N_K\times d}$
  - use a BERT encoder followed by a max-pooling layer over all output vectors as the report embedding module, which yields the embedding $R_i\in\mathbb{R}^d$ of the $i^{th}$ retrieved report
  - prior working experience: extract an image embedding from the last average-pooling layer of ResNet-152 for every image in the corpus; then, given an input image, find the 100 images in the corpus with the highest cosine similarity to it, and encode the reports of these 100 retrieved images with BERT followed by a max-pooling layer to obtain the working experience
- $G_{\text{Pr}}$:
  - build a universal graph $G_{\text{Uni}}=(V,E)$ that models the domain-specific prior knowledge structure
    - compose a graph that covers the most common abnormalities or findings
    - connect nodes with bidirectional edges
    - nodes $V$: the $N_T$ common topics in $T$
  - acquire a set of nodes $V'=\{v_1',v_2',...,v_{N_T}'\}\in \mathbb{R}^{N_T\times d}$ encoded by a graph embedding module
    - based on the graph convolution operation
  - prior medical knowledge: build a medical graph. The topics in the topic bag are set as nodes and grouped by the organ or body part they relate to; topics in the same group are connected by edges, and a graph convolutional network is used to extract the prior medical knowledge
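The information sources above are mostly about shapes, so a small NumPy sketch may help. It is only an illustration under stated assumptions, not the authors' code: random arrays stand in for the ResNet-152 and BERT outputs, the projection to $d=512$ is assumed to be a single linear map, the graph convolution uses the common symmetric normalization $D^{-1/2}(A+I)D^{-1/2}$ (the notes only say "graph convolution"), and all function names, the 1000-image corpus, and the 4-topic graph are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # shared embedding size used throughout the notes


def project_image_features(feature_map, w_proj):
    """(2048, 7, 7) CNN feature map -> I with shape (49, d)."""
    channels = feature_map.shape[0]
    regions = feature_map.reshape(channels, -1).T  # (49, 2048): one row per region
    return regions @ w_proj                        # assumed linear projection


def top_k_by_cosine(query, corpus, k):
    """Indices of the k corpus rows most cosine-similar to query, best first."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return np.argsort(-(c @ q))[:k]


def max_pool_report(token_vectors):
    """Max-pool (num_tokens, d) encoder outputs into one d-dim embedding R_i."""
    return token_vectors.max(axis=0)


def gcn_layer(node_feats, adj, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W) (assumed form)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(a_norm @ node_feats @ weight, 0.0)


# I: a random stand-in for a ResNet-152 feature map.
I = project_image_features(rng.normal(size=(2048, 7, 7)),
                           rng.normal(size=(2048, d)) * 0.01)

# W_Pr: retrieve the 100 most similar corpus images, then embed their reports.
corpus_img = rng.normal(size=(1000, 2048))         # stand-in pooled image embeddings
query_img = corpus_img[3] + 0.01 * rng.normal(size=2048)
retrieved = top_k_by_cosine(query_img, corpus_img, k=100)
# Random (20, d) arrays stand in for the BERT token outputs of each report.
W_pr = np.stack([max_pool_report(rng.normal(size=(20, d))) for _ in retrieved])

# G_Pr: 4 toy topics in two groups; bidirectional edges give a symmetric adjacency.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
V = rng.normal(size=(4, d))                        # stand-in topic word embeddings
V_prime = gcn_layer(V, adj, rng.normal(size=(d, d)) * 0.01)

print(I.shape, W_pr.shape, V_prime.shape)
```

With these stand-ins the three arrays have shapes `(49, 512)`, `(100, 512)`, and `(4, 512)`, matching $N_1\times d$, $N_K\times d$ with $N_K=100$, and $N_T\times d$ for the toy $N_T=4$.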
Basic Module
Multi-Head Attention (MHA)
The MHA consists of $n$ parallel heads and each head is defined as a scaled dot-product attention:

$$
\begin{aligned}
\text{Att}_i(X,Y)&=\text{softmax}\left(\frac{X\text{W}_i^\text{Q}(Y\text{W}_i^\text{K})^T}{\sqrt{d_n}}\right)Y\text{W}_i^\text{V} \\
\text{MHA}(X,Y)&=[\text{Att}_1(X,Y);...;\text{Att}_n(X,Y)]\text{W}^{\text{O}}
\end{aligned}
$$
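The MHA definition can be written out directly in NumPy. This is a minimal sketch under assumptions the notes do not state: the per-head width is taken as $d_n = d/n$, the per-head weights are stored as arrays of shape `(n, d, d_n)`, and $n=8$ and the random inputs are chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(X, Y, W_q, W_k, W_v, W_o):
    """MHA(X, Y) with X: (l_x, d) queries and Y: (l_y, d) keys/values.

    W_q, W_k, W_v: (n, d, d_n) per-head projections; W_o: (n * d_n, d).
    """
    d_n = W_q.shape[-1]
    heads = []
    for wq_i, wk_i, wv_i in zip(W_q, W_k, W_v):
        scores = (X @ wq_i) @ (Y @ wk_i).T / np.sqrt(d_n)    # (l_x, l_y)
        heads.append(softmax(scores, axis=-1) @ (Y @ wv_i))  # Att_i: (l_x, d_n)
    return np.concatenate(heads, axis=-1) @ W_o              # (l_x, d)


rng = np.random.default_rng(0)
d, n = 512, 8        # n = 8 heads is an illustrative choice
d_n = d // n         # assumed per-head width
X = rng.normal(size=(49, d))   # e.g. 49 query vectors
Y = rng.normal(size=(20, d))   # e.g. 20 key/value vectors
W_q, W_k, W_v = (rng.normal(size=(n, d, d_n)) * 0.02 for _ in range(3))
W_o = rng.normal(size=(n * d_n, d)) * 0.02

out = multi_head_attention(X, Y, W_q, W_k, W_v, W_o)
print(out.shape)  # (49, 512)
```

The output keeps the query length and the model width, so `MHA(X, Y)` can be read as re-expressing each row of `X` as a mixture of the rows of `Y`.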