Weakly-Supervised Localization of Thorax Diseases with ChestX-ray8 dataset

ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases ¹

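The paper builds a new dataset of 108,948 frontal-view X-ray images (ChestX-ray8 ²). Labels for 8 disease types are mined from the radiology reports with NLP; an image may show several diseases or none. (24,636 images contain one or more pathologies; the remaining 84,312 images are normal cases.)
It shows that these common thorax diseases can be detected, and even spatially localized, by a unified weakly-supervised multi-label image classification and pathology localization framework.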

For weakly-supervised object localization:

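One of AlexNet, GoogLeNet, VGGNet-16, or ResNet-50, pre-trained on ImageNet, is chosen as the backbone. Its final fully-connected and classification layers are removed, and a transition layer, a global pooling layer, a prediction layer, and a loss layer are added after the last convolutional layer.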
Multi-label Setup:
Each image's label is a $1\times8$ vector
$\mathbf{y}=[y_1,\ldots,y_c,\ldots,y_C],y_c\in\{0,1\},C=8$. A 1 means the corresponding disease is present and a 0 means it is absent; an all-zero vector is a normal image.
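As an illustration of this encoding, the sketch below builds the 8-dimensional label vector from a list of disease names. The eight names are the ChestX-ray8 classes; their order and the helper name `encode_labels` are assumptions made for this example.

```python
# The eight ChestX-ray8 classes; this particular ordering is an assumption.
CLASSES = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
           "Mass", "Nodule", "Pneumonia", "Pneumothorax"]


def encode_labels(findings):
    """Return the 1x8 binary label vector for a list of disease names."""
    present = set(findings)
    unknown = present - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return [1 if name in present else 0 for name in CLASSES]


print(encode_labels(["Effusion", "Mass"]))  # [0, 0, 1, 0, 1, 0, 0, 0]
print(encode_labels([]))                    # normal image: all zeros
```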
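Transition Layer: maps the feature map produced by the last convolutional layer of each pre-trained model to a uniform size $S\times S\times D,S\in\{8,16,32\}$, where $D$ is the feature dimension. This prepares for generating the heatmap of pathology locations later.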
Multi-label Classification Loss Layer
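Because the sample distribution in multi-label classification is imbalanced, positive instances (images with pathologies) are hard to learn, so positive/negative balancing factors $\beta_P,\beta_N$ are introduced to strengthen learning on the positive samples.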
Weighted Cross Entropy Loss
$L_{W-CEL}(f(\mathbf{x}),\mathbf{y})=\beta_P\sum\limits_{y_c=1}-\ln(f(x_c))+\beta_N\sum\limits_{y_c=0}-\ln(1-f(x_c))$
$\beta_P=\frac{|P|+|N|}{|P|},\beta_N=\frac{|P|+|N|}{|N|},$ where $|P|$ and $|N|$ are the numbers of 1s and 0s in the labels of a batch of images.
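A minimal sketch of this loss, in plain Python to stay dependency-free. It assumes `probs` already holds the sigmoid outputs $f(x_c)$ and sums the per-image loss over the whole batch. The function name, the `eps` clamp, and the handling of a batch without positives (or negatives) are assumptions of this example, not details from the paper.

```python
import math


def weighted_cross_entropy(probs, labels, eps=1e-7):
    """W-CEL summed over a batch.

    probs  : list of per-image lists of predicted probabilities in (0, 1)
    labels : list of per-image lists of 0/1 labels, same shape as probs
    """
    flat_p = [p for row in probs for p in row]
    flat_y = [y for row in labels for y in row]

    n_pos = sum(flat_y)              # |P|: number of 1s in the batch labels
    n_neg = len(flat_y) - n_pos      # |N|: number of 0s in the batch labels
    total = n_pos + n_neg

    # If a batch has no positives (or no negatives) the matching sum is
    # empty, so the weight is set to 0 to avoid dividing by zero (assumption).
    beta_p = total / n_pos if n_pos else 0.0
    beta_n = total / n_neg if n_neg else 0.0

    # eps keeps log() away from 0 when a probability saturates (assumption).
    pos_term = sum(-math.log(max(p, eps))
                   for p, y in zip(flat_p, flat_y) if y == 1)
    neg_term = sum(-math.log(max(1.0 - p, eps))
                   for p, y in zip(flat_p, flat_y) if y == 0)
    return beta_p * pos_term + beta_n * neg_term
```

With one positive and three negative labels in a batch, $\beta_P=4$ and $\beta_N=4/3$, so the single positive entry weighs as much as the three negative entries together.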

Global Pooling Layer
Log-Sum-Exp (LSE) pooling
The pooled value $x_p$ is defined as
$x_p=\frac{1}{r}\cdot\log[\frac{1}{S}\cdot\sum\limits_{(i,j)\in S}e^{(r\cdot x_{ij})}]$
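where $x_{ij}$ is the activation value at location $(i,j)$ in the pooling region $S$.
Note that the pooled value ranges from the maximum over $S$ ( $r\rightarrow\infty$ ) to the average ( $r\rightarrow0$ ).
To avoid overflow and underflow, the following form of LSE pooling is used: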
$x_p=x^*+\frac{1}{r}\cdot\log\Big[\frac{1}{S}\cdot\sum\limits_{(i,j)\in S}e^{(r\cdot (x_{ij}-x^*))}\Big]$
where $x^*=\max\{|x_{ij}|,(i,j)\in\mathbf{S}\}$
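The effect of $r$ and of the shift can be checked with a small sketch. Note one deliberate substitution: the code shifts by the plain maximum $\max x_{ij}$ instead of $\max|x_{ij}|$ as written above. Any shift gives the same $x_p$ mathematically, the two agree when the largest-magnitude activation is positive, and the plain maximum also avoids underflow when every activation is negative. This choice and the name `lse_pool` are assumptions of the example, not the paper's.

```python
import math


def lse_pool(values, r=10.0):
    """Log-Sum-Exp pooling of a flat list of activations.

    Shifts by max(values) (an assumption, see the note above) so that every
    exponent is <= 0 and exp() cannot overflow.
    """
    x_star = max(values)
    mean_exp = sum(math.exp(r * (v - x_star)) for v in values) / len(values)
    return x_star + math.log(mean_exp) / r


acts = [1.0, 2.0, 3.0]
print(lse_pool(acts, r=1000.0))  # close to the maximum, 3
print(lse_pool(acts, r=1e-6))    # close to the average, 2
print(lse_pool(acts, r=10.0))    # in between, nearer the maximum
```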
Prediction Layer
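As shown in the figure above, the weights of the prediction layer ( $D\times C$ ) are extracted; the $i$-th column, a $D\times 1$ vector, holds the weights of the $D$ feature maps for deciding whether disease $i$ is present. Once the prediction layer is trained, at test time the DCNN predicts which diseases an input image shows. To localize them, the prediction-layer weights are multiplied with the uniformly sized feature maps output by the transition layer, which yields the pathology likelihood map (heatmap) for that input.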
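The weighted sum described above can be sketched as follows, using nested lists so that no array library is needed. The function name `class_heatmap` and the data layout (features as $S\times S\times D$, weights as $D\times C$) are assumptions of this example.

```python
def class_heatmap(features, weights, c):
    """Heatmap for class c.

    features : S x S x D nested lists (transition-layer output)
    weights  : D x C nested lists (prediction-layer weights)
    Returns an S x S nested list where each entry is the dot product of the
    D-dimensional feature at that location with column c of the weights.
    """
    return [[sum(f * w_row[c] for f, w_row in zip(cell, weights))
             for cell in row]
            for row in features]


# Toy example with S = 2, D = 2, C = 2.
features = [[[1, 0], [0, 1]],
            [[1, 1], [0, 0]]]
weights = [[2, -1],
           [3, 4]]
print(class_heatmap(features, weights, 0))  # [[2, 3], [5, 0]]
```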
Bounding Box Generation
Normalize heatmaps to [0, 255]
Threshold heatmap by {60, 180} individually (ad-hoc)
B-Boxes are generated to cover the isolated regions in the resulting binary maps
In other words, every heatmap pixel is binarized with the threshold, and a B-Box is drawn around each isolated region.
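The steps above can be sketched in plain Python. Several details are assumptions of this example rather than facts from the paper: min-max normalization, `>=` as the threshold test, 4-connectivity for deciding what counts as one isolated region, and the box format `(top, left, bottom, right)` in heatmap pixel indices.

```python
from collections import deque


def normalize_to_255(heatmap):
    """Min-max scale a 2-D nested list to the range [0, 255]."""
    lo = min(min(row) for row in heatmap)
    hi = max(max(row) for row in heatmap)
    span = hi - lo
    if span == 0:  # flat heatmap: nothing stands out
        return [[0.0 for _ in row] for row in heatmap]
    return [[255.0 * (v - lo) / span for v in row] for row in heatmap]


def bounding_boxes(heatmap, threshold):
    """Return one (top, left, bottom, right) box per isolated region."""
    norm = normalize_to_255(heatmap)
    h, w = len(norm), len(norm[0])
    mask = [[v >= threshold for v in row] for row in norm]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(i, j)])
            seen[i][j] = True
            top = bottom = i
            left = right = j
            while queue:
                r, c = queue.popleft()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < h and 0 <= nc < w
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


heat = [[10, 10, 0, 0],
        [10, 10, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 10]]
for t in (60, 180):  # each threshold is applied on its own
    print(t, bounding_boxes(heat, t))
```

The boxes are in heatmap coordinates; to draw them on the X-ray they would still have to be scaled up to the input image size.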

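LSE pooling works best with r=10.
The ROC (Receiver Operating Characteristic) curve measures the quality of a binary classifier. AUC (Area Under Curve) is the area under the ROC curve; it is 0.5 for random guessing and 1 for a perfect classifier, so a useful classifier has $AUC\in[0.5,1]$. The classification metric in the table above is AUC.
ResNet-50 pre-trained on ImageNet gives the best classification results.
Weighted Cross Entropy Loss performs better than plain Cross Entropy Loss.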
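To make the AUC metric concrete: it equals the probability that a randomly chosen positive image gets a higher score than a randomly chosen negative one, with ties counted as one half. The sketch computes it by direct pairwise comparison for a single disease class; `roc_auc` is an illustrative name, and the scores are made-up toy values.

```python
def roc_auc(scores, labels):
    """AUC for one class via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count 1 for each positive ranked above a negative, 0.5 for a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for a toy check; a sort-based computation is the usual choice on a full test set.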

Although pathology localization needs no bounding-box ground truth, pathology classification is trained with full supervision on training data that is fully annotated with image-level labels.

¹ Wang X, Peng Y, Lu L, et al. ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2097-2106. https://arxiv.org/abs/1705.02315
² Download ChestX-ray8 dataset: https://nihcc.app.box.com/v/ChestXray-NIHCC