Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker

  • Nat Biotechnol. 2020 Sep;38(9):1087-1096. doi: 10.1038/s41587-020-0502-7.
Miquel Duran-Frigola #1, Eduardo Pauls #2, Oriol Guitart-Pla 2, Martino Bertoni 2, Víctor Alcalde 2, David Amat 2, Teresa Juan-Blanco 2, Patrick Aloy 3,4
Affiliations

  • 1 Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain.
  • 2 Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain.
  • 3 Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain.
  • 4 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
  • # Contributed equally.
Abstract

Small molecules are usually compared by their chemical structure, but there is no unified analytic framework for representing and comparing their biological activity. We present the Chemical Checker (CC), which provides processed, harmonized and integrated bioactivity data on ~800,000 small molecules. The CC divides data into five levels of increasing complexity, from the chemical properties of compounds to their clinical outcomes. In between, it includes targets, off-targets, networks and cell-level information, such as omics data, growth inhibition and morphology. Bioactivity data are expressed in a vector format, extending the concept of chemical similarity to similarity between bioactivity signatures. We show how CC signatures can aid drug discovery tasks, including target identification and library characterization. We also demonstrate the discovery of compounds that reverse and mimic biological signatures of disease models and genetic perturbations in cases that could not be addressed using chemical information alone. Overall, the CC signatures facilitate the conversion of bioactivity data to a format that is readily amenable to machine learning methods.
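
The idea of comparing compounds by bioactivity signature rather than by structure can be illustrated with a minimal sketch. It assumes each compound is represented by a fixed-length numeric vector and uses cosine similarity as the comparison metric. The abstract does not specify the metric, and the compound names, vector values and function names below are hypothetical. This is not the Chemical Checker's own API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, signatures):
    """Return (compound, similarity) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query, sig))
        for name, sig in signatures.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 4-dimensional signatures; the values are made up for
# illustration and do not come from any real dataset.
signatures = {
    "compound_A": [0.9, 0.1, -0.3, 0.5],
    "compound_B": [0.8, 0.2, -0.2, 0.4],
    "compound_C": [-0.9, -0.1, 0.3, -0.5],
}

query = signatures["compound_A"]
ranking = rank_by_similarity(query, signatures)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

Under this assumed metric, the top of the ranking holds compounds whose signatures mimic the query and the bottom holds compounds whose signatures oppose it. This loosely mirrors the "mimic" and "reverse" searches mentioned in the abstract.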
