Identifying microbial protease allergens through protein language model-guided homology

  • Cell Syst. 2026 Mar 18;17(3):101510. doi: 10.1016/j.cels.2025.101510.
Kumar Thurimella¹, Elena Wu², Chenhao Li³, Daniel B Graham³, Róisín M Owens⁴, Damian R Plichta⁵, Caroline L Sokol⁶, Ramnik J Xavier⁷, Sergio Bacallado⁸
Affiliations

  • 1 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Computational and Integrative Biology and Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA; Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, UK; School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.
  • 2 Center for Immunology and Inflammatory Diseases, Division of Rheumatology, Allergy and Immunology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.
  • 3 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Computational and Integrative Biology and Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.
  • 4 Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, UK.
  • 5 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Computational and Integrative Biology and Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.
  • 6 Center for Immunology and Inflammatory Diseases, Division of Rheumatology, Allergy and Immunology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA; Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02115, USA.
  • 7 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Computational and Integrative Biology and Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA; Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02115, USA.
  • 8 Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge CB3 0WB, UK.
Abstract

Emerging research links the gut, skin, and oral microbiomes to allergies, and serine proteases (SPs) have been identified as potential allergens. This study uses deep learning and pre-trained protein language models (pLMs) to uncover allergenic SPs in metagenomic data. First, we develop a model that identifies the catalytic serine residue in serine hydrolases, demonstrating that pLMs capture structural information. Next, we build a deep learning framework that detects candidate SP allergens across gene catalogs, using the conserved catalytic triad to identify homologs at gut and oral sites despite low sequence identity. The model predicts a putative SP allergen resembling V8 protease, a known activator of protease-activated receptor 1 (PAR1), and identifies a cysteine protease similar to the house dust mite allergen Der f 1. Immunization with these proteases induced allergic responses, experimentally validating their allergenic potential. This approach uncovers candidate allergens that traditional sequence-similarity methods may miss, offering new targets for allergy research. A record of this paper's transparent peer review process is included in the supplemental information.
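The abstract's detection step leans on conservation of the catalytic triad rather than on overall sequence identity. As a rough illustration only (this is not the authors' pLM-based framework), the Python sketch below takes a precomputed pairwise alignment between a reference serine protease and a query, reports identity over aligned columns, and checks whether the reference's His/Asp/Ser triad residues are preserved at the aligned positions. The function `check_triad`, the `TriadCheck` type, the toy sequences, and the triad positions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TriadCheck:
    identity: float         # fraction of identical residues over columns where neither sequence has a gap
    triad_conserved: bool   # True only if every reference triad residue is matched in the query


def check_triad(ref_aln: str, qry_aln: str, triad: dict) -> TriadCheck:
    """Hypothetical helper: compare a query to a reference protease.

    ref_aln, qry_aln: gapped sequences ('-' = gap) from an existing pairwise alignment.
    triad: 0-based ungapped reference position -> expected residue, e.g. {2: "H", 5: "D", 10: "S"}.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = matches = 0
    ref_pos = -1  # index into the ungapped reference sequence
    conserved = {pos: False for pos in triad}
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1
        if r == "-" or q == "-":
            continue  # gapped columns count toward neither identity nor triad conservation
        aligned += 1
        if r == q:
            matches += 1
        if ref_pos in triad and r == triad[ref_pos] and q == triad[ref_pos]:
            conserved[ref_pos] = True
    identity = matches / aligned if aligned else 0.0
    return TriadCheck(identity, all(conserved.values()))


# Toy example: low overall identity, but the His/Asp/Ser positions line up.
ref_aln = "MAHWVDK-LGDSGGPL"
qry_aln = "QSHYIDRTVGNSGGAI"
toy_triad = {2: "H", 5: "D", 10: "S"}
print(check_triad(ref_aln, qry_aln, toy_triad))  # TriadCheck(identity=0.4, triad_conserved=True)
```

Here the query shares only 40% identity with the reference over aligned columns, yet all three triad residues are retained, which is the kind of low-identity homolog the abstract says a triad-aware approach can recover. A real pipeline would also need a robust alignment step and curated triad annotations, both of which are outside this sketch.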

Keywords

catalytic triad; cysteine protease allergens; gut microbiome; metagenomics; oral microbiome; protein language models; serine protease allergens.
