NLM DIR Seminar Schedule
UPCOMING SEMINARS
- April 8, 2025 Jaya Srivastava
  Leveraging a deep learning model to assess the impact of regulatory variants on traits and diseases
- April 15, 2025 Pascal Mutz
  TBD
- April 18, 2025 Valentina Boeva, Department of Computer Science, ETH Zurich
  Decoding tumor heterogeneity: computational methods for scRNA-seq and spatial omics
- April 22, 2025 Stanley Liang
  TBD
- April 29, 2025 MG Hirsch
  TBD
RECENT SEMINARS
- April 1, 2025 Roman Kogay
  Horizontal transfer of bacterial operons into eukaryote genomes
- March 25, 2025 Yifan Yang
  Adversarial Manipulation and Data Memorization in Large Language Models for Medicine
- March 11, 2025 Sofya Garushyants
  Tmn – bacterial anti-phage defense system
- March 4, 2025 Sanasar Babajanyan
  Evolution of antivirus defense in prokaryotes depending on the environmental virus load
- Feb. 25, 2025 Zhizheng Wang
  GeneAgent: Self-verification Language Agent for Gene Set Analysis using Domain Databases
Scheduled Seminars on June 28, 2022
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
The first SARS-CoV-2 genome sequence was published in January 2020, less than a month after virus isolation. Since then, more than five million high-quality genomes have been deposited in GenBank and GISAID. Genome analysis became the key tool throughout the COVID-19 pandemic for detecting new variants of concern and investigating virus evolution. To obtain a SARS-CoV-2 genomic sequence, total RNA is extracted from the specimen and then amplified, producing up to 100 overlapping short amplicons that are used to reconstruct the genomic sequence. This approach had previously been applied successfully to monitor influenza and other viruses. However, while most viruses have mechanisms for internal translation initiation, coronaviruses employ a unique mechanism of transcription that yields subgenomic (sg) RNAs of different lengths. While ORF1ab is translated from the genomic RNA, all other genes are translated from their own sgRNAs. Each sgRNA contains a leader sequence, starts at the beginning of one of the accessory genes, and ends at the 3’-end of the genome. While SARS-CoV-2 genomes are packaged into capsids and inherited, sgRNAs are, to the best of current knowledge, not transferred from host to host. Moreover, sgRNAs have been shown to be sometimes a thousand times more abundant than the genomic RNA during infection, which means that during sequencing the 3’-terminal portion of the genome is mostly represented by sgRNAs. However, although a mix of genomic RNA and sgRNA is routinely sequenced, the contribution of sgRNAs to the observed genomic variants (if any) has not been investigated.
By analyzing the available public data, we show that the presence of sgRNAs affects the low-frequency variants observed in patients. Furthermore, the allele frequency in sgRNA can differ from that in the genome. We also found examples of high-frequency variants that make it into the genome consensus although they are present only in sgRNAs and not in the genome. Taken together, these findings show that sgRNAs affect variant calling for SARS-CoV-2 genome sequences and imply that viral RNA from only a small number of infected cells is typically sequenced.
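To illustrate why an sgRNA-only variant can reach the consensus, the sketch below treats the observed allele frequency at a 3’-terminal site as a read-weighted mix of genomic RNA and sgRNA. This is a simplified illustration, not the analysis used in the talk: the linear mixing model, the function names, the 0.5 consensus threshold, and all numeric values are assumptions chosen for the example.

```python
def observed_allele_frequency(af_genomic, af_sgrna, sgrna_to_genomic_ratio):
    """Allele frequency at a site covered by both genomic RNA and sgRNA reads.

    Assumes reads are sampled in proportion to RNA abundance (a simple
    linear mixing model, chosen here for illustration only).
    """
    if sgrna_to_genomic_ratio < 0:
        raise ValueError("abundance ratio must be non-negative")
    for af in (af_genomic, af_sgrna):
        if not 0.0 <= af <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # Fraction of reads at this site that come from sgRNA.
    sgrna_weight = sgrna_to_genomic_ratio / (1.0 + sgrna_to_genomic_ratio)
    genomic_weight = 1.0 - sgrna_weight
    return genomic_weight * af_genomic + sgrna_weight * af_sgrna


def in_consensus(allele_frequency, threshold=0.5):
    """True if the variant would be called in a majority-rule consensus.

    The 0.5 threshold is an assumption; pipelines differ.
    """
    return allele_frequency > threshold


# Hypothetical site: the variant is absent from genomic RNA, present in 60%
# of sgRNA molecules, and sgRNA is 1000 times more abundant than the genome.
mixed = observed_allele_frequency(af_genomic=0.0, af_sgrna=0.6,
                                  sgrna_to_genomic_ratio=1000)
print(round(mixed, 3), in_consensus(mixed))  # 0.599 True
```

Under these assumed numbers the variant enters the consensus even though no packaged genome carries it, because nearly all reads covering the site derive from sgRNA.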