Oral Presentation, 27th Annual Lorne Proteomics Symposium 2022

DIA-seq: A novel isoform-level multi-omics data analysis pipeline identifies brain-specific changes in mouse model of behaviour (#26)

Manika Singh 1,2, Selvam Paramasivan 3, Annette McGrath 2, Michelle Colgrave 4, Tony Parker 5, Kevin Dudley 1, Pawel Sadowski 1
  1. Central Analytical Research Facility, Queensland University of Technology, Brisbane, QLD, Australia
  2. Data61, CSIRO, Brisbane, QLD, Australia
  3. The University of Melbourne, Parkville, VIC, Australia
  4. Agriculture and Food, CSIRO, Brisbane, QLD, Australia
  5. School of Biomedical Science, Faculty of Health, Queensland University of Technology, Kelvin Grove, Brisbane, QLD, Australia

Traditionally, proteomics research has relied on publicly accessible sequence databases to interpret peptide fragmentation patterns and derive biological information. The limitation of this approach is that a database is only as good as the underlying genome annotation, which can be incomplete, erroneous, or unavailable for non-model organisms. Here we introduce DIA-seq, a reference-genome-independent proteomics pipeline that combines long-read RNA sequencing (Iso-seq) with deep learning neural network tools for isoform-specific prediction of peptide spectra and SWATH-based quantification. When applied to a known mouse model of behaviour, the pipeline detected a couple of hundred novel peptides. These corresponded to exon extensions, exon skipping, frameshift mutations and splice-junction variants, while the remainder mapped to 3′ and 5′ ends and to pseudogenes. A substantial proportion of the novel peptides was subsequently verified using RNA-seq and DDA datasets. Pathway analysis confirmed the association of differentially abundant isoforms with known behavioural phenotypes. We show that DIA-seq improves the annotation of a well-characterised model organism and suggest that it has the potential to fast-track discoveries in non-model organisms.
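The abstract does not describe how Iso-seq transcripts are turned into a searchable peptide space. As a rough illustration only, and not the DIA-seq implementation, the sketch below assumes full-length, forward-stranded isoform sequences, takes the longest Met-initiated open reading frame of each, digests it in silico with trypsin (cleavage after K/R except before P, no missed cleavages, 7 to 30 residues), and keeps peptides that map to exactly one isoform. The function names and parameters (`longest_orf`, `tryptic_digest`, `isoform_specific_peptides`, `min_len`, `max_len`) are hypothetical; deep-learning spectral prediction and SWATH quantification are not shown.

```python
import re
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(_CODONS, _AMINO_ACIDS))


def translate(nt: str) -> str:
    """Translate in frame 0; '*' marks stop codons, 'X' marks unknown codons."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def longest_orf(transcript: str) -> str:
    """Longest Met-initiated ORF over the three forward frames.

    Assumption: Iso-seq isoforms are already oriented 5'->3', so the
    reverse strand is not searched.
    """
    best = ""
    for frame in range(3):
        for segment in translate(transcript[frame:]).split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best


def tryptic_digest(protein: str, min_len: int = 7, max_len: int = 30) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if min_len <= len(p) <= max_len]


def isoform_specific_peptides(isoforms: dict[str, str]) -> dict[str, str]:
    """Map each peptide that occurs in exactly one isoform to that isoform's ID."""
    owners = defaultdict(set)
    for isoform_id, transcript in isoforms.items():
        for peptide in tryptic_digest(longest_orf(transcript)):
            owners[peptide].add(isoform_id)
    return {pep: next(iter(ids)) for pep, ids in owners.items() if len(ids) == 1}


if __name__ == "__main__":
    # Two toy isoforms sharing their first exon-derived peptide (MAAAAAAK).
    isoforms = {
        "isoA": "ATG" + "GCT" * 6 + "AAA" + "GGT" * 6 + "CGT" + "TAA",
        "isoB": "ATG" + "GCT" * 6 + "AAA" + "GTT" * 6 + "CGT" + "TAA",
    }
    print(isoform_specific_peptides(isoforms))
    # {'GGGGGGR': 'isoA', 'VVVVVVR': 'isoB'}
```

The shared peptide `MAAAAAAK` is discarded because it cannot distinguish the two isoforms; only the variant-spanning peptides remain as isoform-specific candidates. A practical pipeline would likely also need to allow missed cleavages, alternative start codons and unstranded input.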