NLM DIR Seminar Schedule
UPCOMING SEMINARS
- March 16, 2026 Janani Ravi, PhD
  A bug’s life: a data integration view of microbial genotypes, phenotypes, and diseases
- March 17, 2026 Roman Kogay
  Diversification vs Streamlining: Selection Landscapes of Prokaryotic Genome Evolution
- March 24, 2026 Myeongsang Lee
  TBD
- March 31, 2026 Yoshitaka Inoue
  TBD
- April 7, 2026 Henrry Secaira Morocho
  TBD
RECENT SEMINARS
- March 10, 2026 Zhizheng Wang
  Large Language Models for Gene Set Analysis
- March 5, 2026 Hasan Balci
  From Sketch to SBGN: An AI-Assisted and Interactive Workflow for Generating Pathway Maps
- March 3, 2026 Gianlucca Goncalves Nicastro
  Systematic identification of Salmonella T6SS effectors uncovers a lipid-targeting family.
- Feb. 24, 2026 Ajith Viswanathan Asari Pankajam
  Systematic Evaluation of Gene Markers in Single-Cell Tissue Atlases
- Feb. 19, 2026 Jean Thierry-Mieg
  On Magic2, an innovative hardware-friendly RNA-seq analyzer
Scheduled Seminars on Feb. 19, 2026
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Work in collaboration with Danielle Thierry-Mieg and Greg Boratyn
To update our AceView cDNA-supported gene models and to better integrate and summarize RNA-seq data from NCBI SRA, we need to quantify in detail the quality of public datasets and then extract the data to reconstruct transcriptomes in various experimental conditions. To do this at a large scale, we lacked a very fast and accurate RNA-seq analyser. Our proposed solution is Magic2. I will outline our algorithms and present some applications.
Using the Go channels paradigm for synchronization, we designed our code to saturate the many CPUs of modern computers and to respect their complex memory hierarchy, maximizing the use of the megabyte-sized fast caches and minimizing requests to the gigabyte-scale RAM, which is about 100 times slower. The data are systematically organized as sorted lists. Beyond the raw alignments, all other analyses are performed at essentially no additional cost while the relevant information still sits in the CPU cache, replacing post-processing with co-processing.
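As an illustration of the channel-based design described above, here is a minimal Go sketch. It is not Magic2 code: the names `Batch`, `Result`, `processBatch`, and `runPipeline` are hypothetical, and the GC count is only a stand-in for a real QC metric. It shows workers fed through a channel, each sorting a small batch and computing a statistic in the same pass over the data, so the extra analysis runs while the batch is still being worked on rather than in a later post-processing step.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Batch is a hypothetical unit of work: a small block of reads,
// assumed to be sized so that it fits in a fast CPU cache.
type Batch struct {
	ID    int
	Reads []string
}

// Result holds everything one worker derived from a batch.
type Result struct {
	ID     int
	Sorted []string // the reads as a sorted list
	Bases  int      // total number of bases in the batch
	GC     int      // number of G or C bases (stand-in QC metric)
}

// processBatch sorts the reads and gathers the statistics in the same
// function call, while the batch is still being worked on (co-processing).
func processBatch(b Batch) Result {
	sorted := append([]string(nil), b.Reads...) // copy, leave input untouched
	sort.Strings(sorted)
	r := Result{ID: b.ID, Sorted: sorted}
	for _, read := range sorted {
		r.Bases += len(read)
		for i := 0; i < len(read); i++ {
			if read[i] == 'G' || read[i] == 'C' {
				r.GC++
			}
		}
	}
	return r
}

// runPipeline fans batches out to a pool of workers over a channel and
// collects the results over a second channel, ordered by batch ID.
func runPipeline(batches []Batch, workers int) []Result {
	if workers < 1 {
		workers = 1
	}
	in := make(chan Batch)
	out := make(chan Result)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range in { // channels are the only synchronization point
				out <- processBatch(b)
			}
		}()
	}

	// Producer: feed the batches, then signal that no more work is coming.
	go func() {
		for _, b := range batches {
			in <- b
		}
		close(in)
	}()

	// Close the output channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	results := make([]Result, 0, len(batches))
	for r := range out {
		results = append(results, r)
	}
	sort.Slice(results, func(i, j int) bool { return results[i].ID < results[j].ID })
	return results
}

func main() {
	batches := []Batch{
		{ID: 0, Reads: []string{"GATTACA", "ACGT"}},
		{ID: 1, Reads: []string{"CCGG", "AATT"}},
	}
	for _, r := range runPipeline(batches, 4) {
		fmt.Printf("batch %d: %d bases, %d GC, sorted=%v\n", r.ID, r.Bases, r.GC, r.Sorted)
	}
}
```

In this sketch the worker count would typically be set to the number of available CPUs, and the batch size is the knob that decides whether a worker's data stays cache-resident. Both choices are assumptions of the example, not details taken from the abstract.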
Given an SRA identifier, Magic2 extracts in a single pass all the elements required for gene reconstruction and quantification, as well as comprehensive quality-control (QC) metrics, including genomic contamination, RNA degradation levels, strandedness, gene dynamics, coding/non-coding ratios, and mosaics. It outputs the precise alignments, several strand-specific coverage plots, gene expression, exon junction discovery and quantification, and recognition of the start and end of transcripts, polyA, chromosomal breakpoints, adaptors, and trans-spliced leaders. All this takes about 5 minutes on a single machine for a typical 5-gigabase RNA-seq SRA dataset.