Computational Tools for the Identification and Interpretation of Sequence Motifs in Immunopeptidomes.
Alvarez, B., Barra, C., Nielsen, M. and Andreatta, M.
Instituto de Investigaciones Biotecnologicas, Universidad Nacional de San Martin, Argentina.
Department of Bio and Health Informatics, Technical University of Denmark, Denmark.
Recent advances in proteomics and mass spectrometry have greatly expanded the detectable repertoire of peptides presented by major histocompatibility complex (MHC) molecules on the cell surface, collectively known as the immunopeptidome. Finely characterizing the immunopeptidome yields important basic insights into the mechanisms of antigen presentation, and can also reveal promising targets for vaccine development and cancer immunotherapy. This report describes a number of practical and efficient approaches to analyzing immunopeptidomics data, discussing the identification of meaningful sequence motifs in various scenarios and considering current limitations. Guidelines are provided for filtering false hits and contaminants, and for addressing the problem of motif deconvolution in cell lines expressing multiple MHC alleles, for both the MHC class I and class II systems. Finally, it is demonstrated how machine learning can be readily employed by non-expert users to generate accurate prediction models directly from mass spectrometry eluted ligand data sets.
Proteomics 18(12): e1700252 (2018)