Textual Entailment based Figure Summarization for Biomedical Articles

Authors: Naveen Saini, Sriparna Saha, Pushpak Bhattacharyya, Himanshu Tuteja

Abstract: The current paper proposes a novel unsupervised approach (FigSum++) for automatic figure summarization in biomedical scientific articles using a multi-objective evolutionary algorithm. The problem is treated as a binary optimization problem in which the sentences included in the summary of a given figure are selected based on various sentence scoring features (or objective functions): the textual entailment score between sentences in the summary and the figure's caption, the number of sentences referring to the figure, the semantic similarity between sentences and the figure's caption, the number of overlapping words between sentences and the figure's caption, etc. These features are optimized simultaneously using multi-objective binary differential evolution (MBDE). MBDE maintains a set of solutions, and each solution represents a subset of sentences to be selected for the summary. MBDE generally uses a single DE variant, but here an ensemble of two different DE variants, one targeting diversity among solutions and the other targeting convergence towards the global optimal solution, is employed for efficient search. In any summarization system, diversity amongst the sentences in the summary (called anti-redundancy) is a critical feature, and it is usually calculated in terms of a similarity measure (such as cosine similarity) among sentences. In this paper, a new way of measuring diversity in terms of textual entailment is proposed. To represent the sentences of the article as numeric vectors, BioBERT, a recently proposed language model pre-trained for biomedical text mining, is utilized. An ablation study is also presented to determine the importance of the different objective functions. For evaluation of the proposed technique, two benchmark biomedical datasets containing 91 and 84 figures, respectively, are considered. Our proposed system obtains 5% and 11% improvements in terms of F-measure on the two datasets, respectively, in comparison to the state-of-the-art unsupervised methods.
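To make the binary formulation concrete, here is a minimal sketch, not the authors' implementation. A solution is a 0/1 mask over the article's sentences. It is scored on several objectives of the kinds listed in the abstract, all maximized, and compared by Pareto dominance. The entailment scores (`caption_entail`, `pair_entail`) are assumed to be precomputed by some external textual-entailment model, and the vectors stand in for BioBERT embeddings. The DE trial generator is one simple binary DE/rand/1/bin adaptation chosen for illustration; it is not necessarily either of the two DE variants used in FigSum++. All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def objectives(mask: Sequence[int], sent_tokens: List[List[str]],
               sent_vecs: List[List[float]], caption_tokens: List[str],
               caption_vec: List[float], caption_entail: List[float],
               pair_entail: List[List[float]], fig_ref: str) -> Tuple[float, ...]:
    """Score one binary solution (hypothetical objectives, all maximized)."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return (0.0, 0.0, 0.0, 0.0, 0.0)
    cap = set(caption_tokens)
    # Mean (assumed precomputed) entailment score between chosen sentences and caption.
    entail_cap = sum(caption_entail[i] for i in chosen) / len(chosen)
    # Number of chosen sentences that mention the figure reference token.
    refs = float(sum(1 for i in chosen if fig_ref in sent_tokens[i]))
    # Mean embedding similarity between chosen sentences and the caption.
    sim = sum(cosine(sent_vecs[i], caption_vec) for i in chosen) / len(chosen)
    # Total count of distinct caption words shared by each chosen sentence.
    overlap = float(sum(len(set(sent_tokens[i]) & cap) for i in chosen))
    # Entailment-based anti-redundancy: 1 minus the mean pairwise entailment,
    # taking the larger of the two directions for each pair.
    pairs = [(a, b) for k, a in enumerate(chosen) for b in chosen[k + 1:]]
    red = (sum(max(pair_entail[a][b], pair_entail[b][a]) for a, b in pairs)
           / len(pairs)) if pairs else 0.0
    return (entail_cap, refs, sim, overlap, 1.0 - red)


def dominates(f: Sequence[float], g: Sequence[float]) -> bool:
    """Pareto dominance for maximization: f is no worse everywhere, better somewhere."""
    return all(a >= b for a, b in zip(f, g)) and any(a > b for a, b in zip(f, g))


def binary_de_trial(pop: List[List[int]], i: int, F: float, CR: float,
                    rng: random.Random) -> List[int]:
    """Illustrative binary DE/rand/1/bin trial vector for individual i (needs len(pop) >= 4)."""
    n = len(pop[i])
    r1, r2, r3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    jrand = rng.randrange(n)  # guarantees at least one crossed-over position
    trial = list(pop[i])
    for j in range(n):
        if rng.random() < CR or j == jrand:
            # "Difference" of two bits: flip x_r1's bit with probability F where x_r2 and x_r3 disagree.
            if pop[r2][j] != pop[r3][j] and rng.random() < F:
                trial[j] = 1 - pop[r1][j]
            else:
                trial[j] = pop[r1][j]
    return trial


# Toy data (made up): three sentences, two of which discuss the figure.
sents = [["figure", "1", "shows", "tumor", "growth"],
         ["tumor", "growth", "is", "shown", "in", "figure", "1"],
         ["methods", "were", "approved"]]
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
caption, cap_vec = ["tumor", "growth", "curves"], [1.0, 0.0]
cap_ent = [0.8, 0.7, 0.1]
pair_ent = [[1.0, 0.9, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]]

f_a = objectives([1, 0, 0], sents, vecs, caption, cap_vec, cap_ent, pair_ent, "figure")
f_b = objectives([1, 1, 0], sents, vecs, caption, cap_vec, cap_ent, pair_ent, "figure")
print(f_a, f_b, dominates(f_a, f_b), dominates(f_b, f_a))
```

In this toy run, neither mask dominates the other. Adding the second sentence raises the figure-reference and word-overlap objectives, but it is largely entailed by the first, so the anti-redundancy objective drops sharply. This is the kind of trade-off a multi-objective search keeps on its Pareto front instead of collapsing into a single weighted score.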

Publishing Date: August 2019

Published in: ACM Transactions on Multimedia Computing Communications