NLM DIR Seminar Schedule
UPCOMING SEMINARS
- Nov. 13, 2025 Leslie Ronish
  TBD
- Nov. 18, 2025 Ryan Bell
  TBD
- Nov. 24, 2025 Mario Flores
  AI Pipeline for Characterization of the Tumor Microenvironment
- Nov. 25, 2025 Jing Wang
  TBD
- Dec. 2, 2025 Qingqing Zhu
  TBD
RECENT SEMINARS
Scheduled Seminars on June 28, 2022
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
The first SARS-CoV-2 genome sequence was published in January 2020, less than a month after virus isolation. Since then, more than five million high-quality genomes have been deposited in GenBank and GISAID. Genome analysis became the key tool throughout the COVID-19 pandemic, both in the search for new variants of concern and in the investigation of virus evolution. To obtain a SARS-CoV-2 genomic sequence, total RNA is extracted from the specimen and then amplified, producing up to 100 overlapping short amplicons that are used to reconstruct the genomic sequence. This approach had previously been applied successfully to monitor influenza and other viruses. However, while most viruses have mechanisms for internal translation initiation, coronaviruses employ a unique mechanism of transcription that yields subgenomic (sg) RNAs of different lengths. While ORF1ab is translated from the genomic RNA, all other genes are translated from their own sgRNAs. Each sgRNA contains a leader sequence, starts at the beginning of one of the accessory genes, and ends at the 3’ end of the genome. While SARS-CoV-2 genomes are packed into capsids and are inherited, sgRNAs are, to the best of current knowledge, not transferred from host to host. Moreover, sgRNAs have been shown to be sometimes a thousand times more abundant than the genomic RNA during infection, which means that during sequencing the 3’-terminal portion of the genome is mostly represented by sgRNAs. Yet although a mix of genomic RNA and sgRNA is routinely sequenced, the contribution of sgRNAs to the observed genomic variants (if any) has not been investigated.
By analyzing the available public data, we show that the presence of sgRNAs affects the low-frequency variants observed in patients. Furthermore, the allele frequency in sgRNAs can differ from that in the genome. We also found examples of high-frequency variants that enter the genome consensus but are present only in sgRNAs, not in the genome. Taken together, these findings show that sgRNAs affect variant calling for SARS-CoV-2 genome sequences and imply that viral RNA from only a small number of infected cells is typically sequenced.
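To make the arithmetic behind this effect concrete, the following is a minimal sketch and not the speaker's method. It assumes that reads from genomic RNA and sgRNA are pooled at a site in proportion to their abundance, so the observed allele frequency is an abundance-weighted average of the two. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def observed_allele_frequency(genomic_af, sgrna_af, sgrna_per_genomic):
    """Allele frequency seen in pooled reads at a site covered by both RNA species.

    genomic_af        -- variant allele frequency among genomic RNA reads (0..1)
    sgrna_af          -- variant allele frequency among sgRNA reads (0..1)
    sgrna_per_genomic -- assumed number of sgRNA reads per genomic read (>= 0)
    """
    if sgrna_per_genomic < 0:
        raise ValueError("sgrna_per_genomic must be non-negative")
    for af in (genomic_af, sgrna_af):
        if not 0.0 <= af <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # Abundance-weighted average: each RNA species contributes in proportion
    # to its share of the pooled reads.
    total = 1.0 + sgrna_per_genomic
    return (genomic_af + sgrna_per_genomic * sgrna_af) / total


# Hypothetical case: a variant absent from the genome but carried by 60% of
# sgRNA molecules, with sgRNA a thousand times more abundant than genomic RNA.
pooled_af = observed_allele_frequency(0.0, 0.6, 1000)
enters_consensus = pooled_af > 0.5  # simple majority rule, also an assumption
```

Under these assumed numbers the pooled frequency is 600/1001, just under 0.6, so a simple majority-rule consensus would report a variant that is not present in any packaged genome.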