NLM DIR Seminar Schedule

UPCOMING SEMINARS

Sept. 9, 2025 Chih-Hsuan Wei
No Data Left Behind: FAIR-SMart Enables FAIR Access to Supplementary Materials for Research Transparency
Sept. 16, 2025 James Leaman JR.
TBD
Sept. 23, 2025 Martha Nelson
TBD
Sept. 30, 2025 Erez Persi
TBD
Oct. 7, 2025 Liana Yeganova
TBD

RECENT SEMINARS

July 15, 2025 Noam Rotenberg
Cell phenotypes in the biomedical literature: a systematic analysis and the NLM CellLink text mining corpus
July 3, 2025 Matthew Diller
Using Ontologies to Make Knowledge Computable
July 1, 2025 Yoshitaka Inoue
Graph-Aware Interpretable Drug Response Prediction and LLM-Driven Multi-Agent Drug-Target Interaction Prediction
June 10, 2025 Aleksandra Foerster
Interactions at pre-bonding distances and bond formation for open p-shell atoms: a step toward biomolecular interaction modeling using electrostatics
June 3, 2025 MG Hirsch
Interactions among subclones and immunity controls melanoma progression

Scheduled Seminars on Nov. 2, 2021

Speaker

Won Kim

PI/Lab

Time

11 a.m.

Presentation Title

Improving PubMed retrieval by integrating abstract and full text search.

Location

Building 38A - B2 NCBI Library

Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.

Abstract:

With the growing availability of full-text articles, integrating abstracts and full texts of documents into a unified representation becomes essential for performing comprehensive biomedical literature search. However, previous studies have shown that naïvely merging abstracts with full texts of articles does not consistently yield better performance.

In this work we study how to combine abstracts with available article full texts to improve the overall retrieval performance. For this purpose, we create an evaluation dataset and define a simulated environment that mimics the search environment of interest. The evaluation dataset consists of queries sampled from PubMed logs along with, for each query, a subset of retrieved and clicked documents that satisfy the requirements of the simulated search environment. To remove PubMed user bias, we only consider documents for which none of the query tokens appear in the title. Another known source of bias is clicks on the top-ranked result. We removed these, as they might simply represent a user's urge to click on something indiscriminately. In summary, to create our user click dataset, we collected only retrieved documents for which none of the query tokens appeared in the title and all of them appeared in the abstract, and we ignored clicks on the top rank. Using this dataset, we demonstrate that different sections of a full-text document are of different value in deciding relevance, and we propose a method to combine information coming from various parts of a full-text document by converting the information to log odds scores, which can be treated uniformly.
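The dataset filter described above can be sketched in a few lines of Python. This is an illustration only, not the speaker's implementation: the `Impression` record, the whitespace tokenizer, and the choice to treat a top-rank click as "no usable click" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    """One retrieved document shown for a query (hypothetical record layout)."""
    query: str
    title: str
    abstract: str
    rank: int      # 1-based position in the result list
    clicked: bool


def tokens(text):
    # Assumption: lowercase whitespace tokenization stands in for the real tokenizer.
    return set(text.lower().split())


def keep(imp):
    """Keep a document only if no query token is in the title
    and every query token is in the abstract."""
    q = tokens(imp.query)
    if not q:
        return False
    return q.isdisjoint(tokens(imp.title)) and q <= tokens(imp.abstract)


def click_label(imp):
    """Count a click as a relevance signal only when it is not on the top rank."""
    return imp.clicked and imp.rank > 1


def build_dataset(impressions):
    # Pairs of (impression, usable click label) for documents that pass the filter.
    return [(imp, click_label(imp)) for imp in impressions if keep(imp)]
```

With this sketch, a document whose title shares a token with the query is dropped entirely, while a clicked document at rank 1 is kept but contributes no positive label.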

Our experimental results suggest that even when all query tokens appear in the abstract, incorporating the body text from PMC improves PubMed search over abstract-only retrieval. Moreover, if one or more tokens are missing from the abstract, our proposed technique can lead to better retrieval results by scoring the body text of the document. Considering that about 15% of PubMed queries return no results due to one or more query tokens not being present in the abstract or title, the proposed approach can provide useful retrieval results for a significant number of those failed queries by supplementing abstracts with full-text information. Our approach may also be useful for queries that retrieve few documents; for these queries, abstract-only retrieval can be augmented by full-text search results.
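One way to picture the log odds idea from the abstract is the following minimal sketch. It assumes, purely for illustration, that each section receives a smoothed log odds of a click given a query match in that section, and that a document's score is the sum over its matching sections. The section names, the smoothing constant, and the additive combination are assumptions, not details taken from the talk.

```python
import math


def section_log_odds(examples, sections, smoothing=1.0):
    """Estimate a log odds weight per section from click data.

    examples: list of (matched_sections, clicked) pairs, where matched_sections
    is the set of section names in which the query matched.
    Returns {section: log((clicks + s) / (non_clicks + s))}.
    """
    weights = {}
    for section in sections:
        pos = sum(1 for matched, clicked in examples if clicked and section in matched)
        neg = sum(1 for matched, clicked in examples if not clicked and section in matched)
        weights[section] = math.log((pos + smoothing) / (neg + smoothing))
    return weights


def score(matched_sections, weights):
    # Because every section is on the same log odds scale, the evidence can
    # simply be added (an independence assumption made for this sketch).
    return sum(weights[s] for s in matched_sections if s in weights)


# Toy click data with hypothetical section names.
examples = [
    ({"abstract", "results"}, True),
    ({"abstract"}, False),
    ({"results"}, True),
    ({"references"}, False),
]
weights = section_log_odds(examples, ["abstract", "results", "references"])

# A document with no abstract match can still be scored from its body sections.
body_only_score = score({"results"}, weights)
```

In this toy data a match in the hypothetical "results" section earns a positive weight and a match in "references" a negative one, so a document whose abstract lacks the query tokens can still be ranked by where in the body the match occurs.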
