NLM DIR Seminar Schedule
UPCOMING SEMINARS
- Jan. 20, 2026 Anastasia Gulyaeva
  TBD
- Jan. 22, 2026 Mario Flores
  AI Pipeline for Characterization of the Tumor Microenvironment
- Jan. 27, 2026 Zhaohui Liang
  TBD
- Jan. 29, 2026 Mehdi Bagheri Hamaneh
  FastSpel: A simple peptide spectrum predictor that achieves deep learning-level performance at a fraction of the computational cost
- Feb. 3, 2026 Matthew Diller
  TBD
RECENT SEMINARS
- Jan. 8, 2026 Won Gyu Kim
  LitSense 2.0: AI-powered biomedical information retrieval with sentence and passage level knowledge discovery
- Dec. 16, 2025 Sarvesh Soni
  ArchEHR-QA: A Dataset and Shared Task for Grounded Question Answering from Electronic Health Records
- Dec. 2, 2025 Qingqing Zhu
  CT-Bench & CARE-CT: Building Reliable Multimodal AI for Lesion Analysis in Computed Tomography
- Nov. 25, 2025 Jing Wang
  MIMIC-EXT-TE: Millions Clinical Temporal Event Time-Series Dataset
- Oct. 21, 2025 Yifan Yang
  TBD
Scheduled Seminars on June 28, 2022
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
The first SARS-CoV-2 genome sequence was published in January 2020, less than a month after the virus was isolated. Since then, more than five million high-quality genomes have been deposited in GenBank and GISAID. Genome analysis became the key tool throughout the COVID-19 pandemic in the search for new variants of concern and in the investigation of virus evolution. To obtain a SARS-CoV-2 genomic sequence, total RNA is extracted from the specimen and then amplified, producing up to 100 overlapping short amplicons that are used to reconstruct the genomic sequence. This approach had previously been applied successfully to monitor influenza and other viruses. However, while most viruses have mechanisms for internal translation initiation, coronaviruses employ a unique mechanism of transcription that yields subgenomic (sg) RNAs of different lengths. While ORF1ab is translated from the genomic RNA, all other genes are translated from their own sgRNAs. Each sgRNA contains a leader sequence, starts at the beginning of one of the accessory genes, and ends at the 3’ end of the genome. While SARS-CoV-2 genomes are packed into capsids and are inherited, sgRNAs are, to the best of current knowledge, not transferred from host to host. Moreover, sgRNAs have been shown to be sometimes a thousand times more abundant than the genomic RNA during infection, which means that during sequencing, the 3’-terminal portion of the genome is mostly represented by sgRNAs. However, although a mix of genomic RNA and sgRNA is routinely sequenced, the contribution of sgRNAs to the observed genomic variants (if any) has not been investigated.
By analyzing the available public data, we show that the presence of sgRNAs affects the low-frequency variants observed in patients. Furthermore, the allele frequency in sgRNA can differ from that in the genome. We also found examples of high-frequency variants that make it into the genome consensus even though they are present only in sgRNAs and not in the genome. Taken together, these findings show that sgRNAs affect variant calling for SARS-CoV-2 genome sequences and imply that viral RNA from only a small number of infected cells is typically sequenced.
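To illustrate why abundant sgRNAs can dominate variant calls near the 3’ end, here is a minimal sketch of the arithmetic. It is not the speaker's method. It assumes, for illustration only, that reads at a site are sampled in proportion to molecule abundance (no amplification or mapping bias), and the function name, parameters, and the 60% sgRNA allele frequency are hypothetical. The 1000-fold abundance ratio echoes the figure mentioned in the abstract.

```python
def observed_allele_frequency(genomic_af, sgrna_af, sgrna_to_genomic_ratio):
    """Allele frequency expected in a pileup that mixes genomic RNA and sgRNA reads.

    Assumption (hypothetical model): reads are drawn in proportion to the
    abundance of each RNA species covering the site.

    genomic_af: variant allele frequency among genomic RNA molecules (0..1)
    sgrna_af: variant allele frequency among sgRNA molecules (0..1)
    sgrna_to_genomic_ratio: sgRNA molecules per genomic RNA molecule at the site
    """
    for name, value in (("genomic_af", genomic_af), ("sgrna_af", sgrna_af)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if sgrna_to_genomic_ratio < 0:
        raise ValueError("sgrna_to_genomic_ratio must be non-negative")

    # Fraction of reads at this site that come from genomic RNA.
    genomic_weight = 1.0 / (1.0 + sgrna_to_genomic_ratio)
    # The observed frequency is the abundance-weighted average of the two pools.
    return genomic_weight * genomic_af + (1.0 - genomic_weight) * sgrna_af


if __name__ == "__main__":
    # Variant absent from the genome, present in 60% of sgRNA (hypothetical),
    # with sgRNA 1000 times more abundant than genomic RNA.
    af = observed_allele_frequency(genomic_af=0.0, sgrna_af=0.6,
                                   sgrna_to_genomic_ratio=1000.0)
    print(f"observed allele frequency: {af:.4f}")
    print("would enter a majority-rule consensus:", af > 0.5)
```

Under these assumptions the observed frequency is almost entirely determined by the sgRNA pool, so a variant that is absent from the packaged genome could still exceed a 50% consensus threshold.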