Variational Topic Inference for Chest X-Ray Report Generation

Image credit: Unsplash


Automating report generation for medical imaging promises to alleviate workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with discrepant expertise and experience. To tackle these challenges, we propose variational topic inference for automatic report generation. Specifically, we introduce a set of topics as latent variables to guide sentence generation by aligning image and language modalities in a latent space. The topics are inferred in a conditional variational inference framework, with each topic governing the generation of a sentence in the report. Further, we adopt a visual attention module that enables the model to attend to different local regions in the image to generate more informative descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference model can generate reports which are not mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria.

In International Conference on Medical Image Computing and Computer Assisted Intervention 2021
Click the Cite button above to demo the feature to enable visitors to import publication metadata into their reference management software.

Supplementary notes can be added here, including code and math.