Overview of AI-Driven Virtual Screening
Basic Methods of Virtual Screening
Virtual Screening (VS) is a computational strategy for identifying active compounds from large small-molecule libraries. It is generally divided into two main categories: structure-based virtual screening (SBVS) and ligand-based virtual screening (LBVS)[1].
LBVS relies on known ligands of the target. It uses statistical models to identify structurally related lead compounds or to analyze structure–activity relationships (SAR), enabling efficient prediction of potentially active compounds.
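The simplest form of ligand-based screening ranks library compounds by fingerprint similarity to a known active. The sketch below assumes each molecule is already encoded as a set of "on" fingerprint bits (the bit indices are made-up placeholders, not real fingerprints) and uses Tanimoto similarity, a common choice; it is an illustration of the idea, not any specific tool's API.

```python
# Minimal ligand-based similarity search (illustrative sketch).
# Assumption: fingerprints are sets of "on" bit indices; the values are invented.


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query, library, top_k=3):
    """Score every library compound against the query ligand; return the top_k."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},  # shares 4 of 6 distinct bits with the query
    "cmpd_B": {2, 3, 5},         # no shared bits
    "cmpd_C": {1, 4, 7, 9, 12},  # identical bit pattern
}

for name, sim in rank_by_similarity(known_active, library):
    print(f"{name}: {sim:.2f}")
```

Because the ranking only rewards resemblance to known actives, it also illustrates the limitation noted below: compounds with genuinely new scaffolds score poorly by construction.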
SBVS, centered on molecular docking, depends on the three-dimensional structure of the target protein. It simulates ligand–target binding modes, affinities, and interaction features to identify compounds with potential binding activity.
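Many docking scoring functions are built from pairwise atom–atom terms evaluated over a candidate pose. The toy function below shows only that structure, assuming two invented terms (a reward for contacts and a penalty for steric clashes) with arbitrary cutoffs and weights; it is not the scoring function of any real docking program.

```python
import math

# Toy pairwise pose score (illustrative only).
# Assumption: atoms are bare 3D coordinates; cutoffs and weights are arbitrary.
CLASH_DIST = 2.5    # Å: closer than this counts as a steric clash
CONTACT_DIST = 4.5  # Å: up to this distance counts as a favorable contact


def toy_pose_score(ligand, protein, contact_reward=-1.0, clash_penalty=5.0):
    """Sum pairwise terms over ligand-protein atom pairs; lower is better."""
    score = 0.0
    for l_atom in ligand:
        for p_atom in protein:
            d = math.dist(l_atom, p_atom)
            if d < CLASH_DIST:
                score += clash_penalty
            elif d <= CONTACT_DIST:
                score += contact_reward
    return score


protein = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
pose_good = [(2.0, 3.0, 0.0)]   # about 3.6 Å from both atoms: two contacts
pose_clash = [(0.5, 0.0, 0.0)]  # clashes with atom 1, contacts atom 2

print(toy_pose_score(pose_good, protein))   # -2.0
print(toy_pose_score(pose_clash, protein))  # 4.0
```

Real scoring functions replace these step functions with smooth physics-based or empirical terms, and a docking program searches over many poses to minimize them, which is where the computational cost comes from.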
Although both approaches are widely used in early-stage drug discovery, each has inherent limitations. LBVS depends heavily on known active ligands, which makes it difficult to identify entirely new scaffolds or structurally novel molecules. SBVS, in contrast, requires high-quality 3D structures of the target protein; its conformational sampling and energy calculations are computationally demanding, and the accuracy of its scoring functions remains limited.
In recent years, artificial intelligence (AI), particularly deep learning (DL), has advanced rapidly. Using architectures such as Transformers and graph neural networks (GNNs), DL models can learn high-dimensional representations and uncover hidden relationships in large-scale biological and molecular data. This has substantially improved molecular representation, target prediction, and interaction modeling, opening a new technological pathway for overcoming the limitations of traditional VS. Notably, candidate molecules discovered with AI-driven approaches have shown Phase I clinical trial success rates far higher than those of traditionally discovered molecules, offering transformative opportunities to accelerate drug development and reduce R&D costs[2].
DL now supports five key areas of drug discovery: molecular dynamics simulations, molecular docking and virtual screening, end-to-end biomolecular structure modeling, structure-guided de novo drug design, and emerging sequence-based protein–ligand interaction modeling. Collectively, these advances mark a shift from physics-driven to data-driven modeling of protein–ligand interactions[3].
Figure 1. Key methodological dimensions of deep learning in protein–ligand interactions[3].
Methods in AI-Driven Virtual Screening
Traditional docking methods handle protein flexibility (the induced-fit effect) poorly and are too slow for screening very large libraries. Through end-to-end prediction, generative modeling (e.g., diffusion models), and lightweight surrogate models, DL has substantially improved both the efficiency and the accuracy of drug candidate identification.
Figure 2. Overview of deep learning integration with molecular docking and virtual screening[3].
AI-Assisted Binding Pose Prediction
DL increasingly offers new solutions to the challenge of protein flexibility, providing data-driven alternatives to traditional physics-based approaches and enabling more effective modeling of conformational variability.
DiffBindFR[4] is a flexible docking framework based on a full-atom SE(3)-equivariant diffusion model. It treats both the ligand binding pose and the torsional degrees of freedom of binding-pocket side chains as variables to be optimized. By iteratively denoising in the joint space of ligand rigid-body motion (translation and rotation), ligand torsional rotation about rotatable bonds, and protein side-chain torsions, the model lets side chains rearrange adaptively, improving interactions and reducing steric clashes.
Figure 3. Overall architecture of DiffBindFR[4].
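One detail of that joint space is easy to miss: translations are Euclidean, but torsion angles are periodic, so updates to them must wrap around the circle. The sketch below is a toy annealed stochastic refinement over a (translation, ligand torsions, side-chain torsions) state; it only mimics the shape of a reverse-diffusion loop. It is not DiffBindFR: the "score" is a hypothetical pull toward a known target pose, rigid-body rotation is omitted for brevity, and the step size and noise schedule are arbitrary.

```python
import math
import random


def wrap_angle(theta):
    """Map an angle onto [-pi, pi]; torsions are periodic, unlike translations."""
    return math.atan2(math.sin(theta), math.cos(theta))


def angle_diff(target, current):
    """Shortest signed angular step from current to target."""
    return wrap_angle(target - current)


def refine(state, target, steps=50, step_size=0.2, noise0=0.3, seed=0):
    """Annealed stochastic refinement of a joint pose state.

    The pull toward `target` is a hypothetical stand-in for a learned score model.
    """
    rng = random.Random(seed)
    trans, lig, side = state
    t_trans, t_lig, t_side = target
    for k in range(steps):
        sigma = noise0 * (1 - (k + 1) / steps)  # noise shrinks to 0 at the last step
        # Translation: ordinary Euclidean update.
        trans = [x + step_size * (tx - x) + sigma * rng.gauss(0, 1)
                 for x, tx in zip(trans, t_trans)]
        # Ligand and side-chain torsions: same update, wrapped on the circle.
        lig = [wrap_angle(a + step_size * angle_diff(ta, a) + sigma * rng.gauss(0, 1))
               for a, ta in zip(lig, t_lig)]
        side = [wrap_angle(a + step_size * angle_diff(ta, a) + sigma * rng.gauss(0, 1))
                for a, ta in zip(side, t_side)]
    return trans, lig, side


start = ([5.0, -3.0, 1.0], [3.0, 0.5], [-2.0])
goal = ([0.0, 0.0, 0.0], [-3.0, 1.5], [1.0])  # 3.0 -> -3.0 crosses +/-pi the short way
trans, lig, side = refine(start, goal)
print([round(x, 2) for x in trans], [round(a, 2) for a in lig], [round(a, 2) for a in side])
```

Without the wrapping, the first ligand torsion would travel almost a full turn (from 3.0 down to -3.0 rad) instead of the short 0.28 rad step across ±π.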
During pose ranking, DiffBindFR can use either the Smina scoring function or a Mixture Density Network (MDN) to assess the confidence of generated conformations. The model generates diverse binding modes and selects those predicted to be closest to the native state and most physically plausible. Across multiple benchmark datasets, DiffBindFR consistently outperforms traditional rigid and flexible docking approaches as well as existing DL-based docking models. Notably, it maintains high accuracy in reproducing both ligand binding poses and protein side-chain conformations even when receptor structures are in the apo state or are predicted by AlphaFold2.
This precise modeling capability for fine-grained atomic interactions highlights the strong potential of DiffBindFR for virtual screening and drug discovery, particularly in research scenarios where protein flexibility plays a critical role in molecular recognition.
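Pose-reproduction accuracy of this kind is commonly reported as the fraction of predicted ligand poses within 2 Å heavy-atom RMSD of the experimental pose. A minimal sketch, assuming both poses list the same atoms in the same order and are already in the receptor frame (symmetry-equivalent atom mappings are ignored here, although benchmarks usually account for them):

```python
import math


def rmsd(pred, ref):
    """Heavy-atom RMSD between two poses with identical atom ordering (no alignment)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


def success_rate(pose_pairs, threshold=2.0):
    """Fraction of (predicted, reference) pairs with RMSD below threshold (Å)."""
    hits = sum(1 for pred, ref in pose_pairs if rmsd(pred, ref) < threshold)
    return hits / len(pose_pairs)


ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pose_near = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]  # every atom shifted 1 Å
pose_far = [(0.0, 3.0, 0.0), (1.5, 3.0, 0.0)]   # every atom shifted 3 Å

print(rmsd(pose_near, ref), rmsd(pose_far, ref))              # 1.0 3.0
print(success_rate([(pose_near, ref), (pose_far, ref)]))      # 0.5
```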
AI-Assisted Affinity Prediction and Scoring
Recent advances in DL have enabled a new generation of scoring functions designed to address key limitations of traditional approaches, including poor generalization, weak docking power (difficulty distinguishing near-native poses from decoys), limited ranking power (incorrect ranking of ligands for a target), and insufficient screening power (inability to distinguish active compounds from decoys).
Deep Docking (DD) complements these efforts by using DL to approximate docking scores instead of computing them for every molecule. It trains a quantitative structure–activity relationship (QSAR)-style DL model on docking scores for a subset of a chemical library, uses the model to approximate docking outcomes for unscreened molecules, and iteratively eliminates unsuitable candidates. When integrated with the FRED docking program, DD rapidly estimated docking scores for 1.36 billion molecules in the ZINC15 library across 12 protein targets, achieving a 100-fold data reduction and a 6,000-fold enrichment of high-scoring compounds without losing a significant fraction of well-docked molecules.
Figure 4. Schematic illustration of Deep Docking[5].
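The iterative loop can be sketched as follows, under heavy simplifying assumptions: `dock` is a hypothetical cheap function standing in for a docking program such as FRED, each molecule is a single made-up numeric descriptor rather than a fingerprint, and a k-nearest-neighbour regressor replaces DD's deep neural network. Only the control flow (dock a sample, train a surrogate, discard the worst-predicted molecules, repeat) is meant to mirror the description above.

```python
import random
import statistics


def dock(x):
    """Hypothetical 'expensive' docking call; the best (lowest) score is at x = 0.3."""
    return 10 * (x - 0.3) ** 2 - 5


def knn_model(labeled, k=5):
    """Cheap surrogate: predict a score as the mean of the k nearest labeled points."""
    def predict(x):
        nearest = sorted(labeled, key=lambda pair: abs(pair[0] - x))[:k]
        return statistics.mean(score for _, score in nearest)
    return predict


def deep_docking_like(library, iterations=3, sample_size=50, keep_fraction=0.3, seed=0):
    rng = random.Random(seed)
    remaining = list(range(len(library)))  # indices still under consideration
    docked = {}                             # index -> docking score
    for _ in range(iterations):
        undocked = [i for i in remaining if i not in docked]
        for i in rng.sample(undocked, min(sample_size, len(undocked))):
            docked[i] = dock(library[i])    # expensive step, small subset only
        model = knn_model([(library[i], s) for i, s in docked.items()])
        remaining.sort(key=lambda i: model(library[i]))  # lower predicted = better
        remaining = remaining[:max(1, round(len(remaining) * keep_fraction))]
    return remaining, docked


rng = random.Random(42)
library = [rng.random() for _ in range(1000)]  # one invented descriptor per molecule
kept, docked = deep_docking_like(library)
print(len(kept), "molecules kept;", len(docked), "docked out of", len(library))
```

The saving comes from the fact that only the sampled molecules are ever passed to `dock`; every other molecule is judged by the cheap surrogate.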
Beyond improving docking scoring functions, researchers have also used docking-generated poses to train binding affinity prediction models, reducing dependence on experimentally determined complex structures while increasing the diversity of training data.
PBCNet is a representative example: a physics-informed Siamese graph attention network that uses docking poses to predict and rank the relative binding affinities of a congeneric ligand series for a given protein target. After minimal fine-tuning on only 2–10 ligands with known affinities per target, PBCNet achieves performance comparable to FEP+ with approximately 100,000-fold higher computational efficiency. Retrospective active-learning simulations further demonstrated its ability to accelerate lead optimization.
Figure 5. The framework of PBCNet[6].
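A pairwise model predicts affinity differences, not absolute values. One common way to turn such predictions into an absolute estimate is to anchor them on reference ligands with measured affinities and average over the references. The sketch below assumes exactly that scheme; the ligand names, pIC50 values, and `toy_pairwise_model` are hypothetical, and the toy model simply returns the true difference plus a constant error rather than reading any docked complex.

```python
import statistics

# Hidden "true" pIC50 values, used only to build the toy pairwise predictor.
_HIDDEN = {"L1": 6.0, "L2": 7.2, "L3": 5.5, "Q1": 7.8, "Q2": 6.4}


def toy_pairwise_model(a, b):
    """Hypothetical stand-in for a Siamese model: predicted pIC50(a) - pIC50(b),
    with a constant +0.1 error to mimic an imperfect prediction."""
    return _HIDDEN[a] - _HIDDEN[b] + 0.1


def predict_absolute(query, references, pairwise_model):
    """Anchor relative predictions on measured reference ligands, then average."""
    return statistics.mean(
        known + pairwise_model(query, ref) for ref, known in references.items()
    )


references = {"L1": 6.0, "L2": 7.2, "L3": 5.5}  # measured pIC50 of reference ligands
predictions = {q: predict_absolute(q, references, toy_pairwise_model) for q in ("Q1", "Q2")}
ranking = sorted(predictions, key=predictions.get, reverse=True)  # most potent first
print(predictions, ranking)
```

With a real model, errors differ from pair to pair, so averaging over several references tends to reduce the impact of any single poor pairwise prediction.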
AI-Assisted Ultra-Large-Scale Virtual Screening (ULVS)
A common strategy for accelerating VS is to train lightweight DL models to approximate and replace computationally intensive docking scoring procedures. These models typically use fixed-length ligand fingerprints as inputs and shallow neural network architectures, enabling rapid prefiltering of large compound libraries before detailed docking, thereby significantly reducing computational cost.
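The fixed input length is what lets a plain dense network accept any molecule. Production pipelines typically use established fingerprints such as ECFP/Morgan fingerprints from a cheminformatics toolkit; the sketch below instead shows only the underlying folding idea, hashing hypothetical fragment strings into a fixed number of bits with the standard library.

```python
import hashlib


def folded_fingerprint(fragments, n_bits=64):
    """Hash arbitrary fragment identifiers into a fixed-length 0/1 vector."""
    bits = [0] * n_bits
    for frag in fragments:
        digest = hashlib.sha256(frag.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1  # collisions are possible
    return bits


# Hypothetical fragment identifiers for two molecules of different size.
mol_a = ["c1ccccc1", "C(=O)O", "CN"]
mol_b = ["c1ccccc1", "C(=O)N"]
fp_a = folded_fingerprint(mol_a)
fp_b = folded_fingerprint(mol_b)
print(len(fp_a), len(fp_b), sum(fp_a), sum(fp_b))
```

The choice of `n_bits` trades memory and model size against bit collisions, where unrelated fragments map to the same position.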
Active learning provides another effective strategy. In each cycle, the model selects the most informative compounds for docking, so that the neural network's predictions agree increasingly well with docking scores while only a limited amount of labeled data is required. The OpenVS platform exemplifies this approach, enabling efficient screening of billions of compounds. Applied to two targets, a novel ubiquitin ligase target (KLHDC2) and the human voltage-gated sodium channel (Nav1.7), OpenVS identified lead compounds with single-digit micromolar affinity within seven days.
Figure 6. Overview of deep learning guided virtual screening[7].
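"Most informative" can be defined in several ways. One common acquisition strategy, assumed here and not necessarily the one OpenVS uses, is an optimistic upper-confidence-style score over an ensemble of surrogate models: prefer compounds that are predicted to dock well or whose prediction the ensemble disagrees on. The ensemble predictions below are invented numbers.

```python
import statistics


def select_batch(candidates, ensemble, batch_size=2, beta=1.0):
    """Choose compounds to dock next.

    Docking scores: lower is better, so the optimistic estimate is
    mean - beta * std across ensemble members.
    """
    def acquisition(c):
        preds = [model(c) for model in ensemble]
        return statistics.mean(preds) - beta * statistics.stdev(preds)
    return sorted(candidates, key=acquisition)[:batch_size]


# Hypothetical predictions of a three-member ensemble for four compounds.
preds = {
    "c1": [-8.0, -8.1, -7.9],  # good and confident
    "c2": [-6.0, -9.0, -7.5],  # similar mean but highly uncertain
    "c3": [-5.0, -5.1, -4.9],  # poor and confident
    "c4": [-7.0, -7.0, -7.0],  # moderate, no disagreement
}
ensemble = [lambda c, i=i: preds[c][i] for i in range(3)]

print(select_batch(preds, ensemble))           # ['c2', 'c1']
print(select_batch(preds, ensemble, beta=0))   # ['c1', 'c2']
```

Setting `beta=0` reduces the strategy to greedy selection by predicted score; larger values spend more of the docking budget on compounds the surrogate is unsure about, which is what improves the model in later cycles.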
Case Studies of AI-Driven Virtual Screening
DrugCLIP — Ultra-Large-Scale Virtual Screening at the Human Proteome Level
The human genome contains approximately 20,000 protein-coding genes, yet about 90% of disease-related targets still lack effective pharmacological interventions. Applying traditional docking across so many targets and such large chemical libraries is computationally prohibitive. To address this challenge, researchers from Tsinghua University developed DrugCLIP, a deep contrastive learning framework[8].
Inspired by multimodal models such as CLIP, DrugCLIP maps protein structural features and small-molecule chemical features into a shared embedding space, so that screening reduces to comparing precomputed embeddings instead of docking each compound. This accelerates screening by seven orders of magnitude, reducing the time required to screen hundreds of millions of compounds from years to hours.
Figure 7. Ultrafast genome-wide virtual screening with DrugCLIP[8].
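The two ingredients of a CLIP-style screening model can be sketched in a few lines: a symmetric contrastive (InfoNCE) loss that pulls matched pocket/ligand embeddings together during training, and cosine-similarity retrieval at screening time. The sketch assumes the encoders already exist and uses invented low-dimensional vectors in their place; it illustrates the general technique, not DrugCLIP's actual code or hyperparameters.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def screen(pocket_emb, library_embs, top_k=2):
    """Rank precomputed molecule embeddings by cosine similarity to one pocket."""
    q = normalize(pocket_emb)
    scored = [(name, dot(q, normalize(e))) for name, e in library_embs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def clip_loss(pockets, ligands, temperature=0.1):
    """Symmetric InfoNCE loss; pockets[i] is the true partner of ligands[i]."""
    P = [normalize(p) for p in pockets]
    L = [normalize(lig) for lig in ligands]
    logits = [[dot(p, lig) / temperature for lig in L] for p in P]

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]  # -log softmax probability of the true pair
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]  # ligand-to-pocket direction
    return (cross_entropy(logits) + cross_entropy(columns)) / 2


library_embs = {"m1": [0.9, 0.1, 0.0], "m2": [0.0, 1.0, 0.0], "m3": [0.7, 0.0, 0.7]}
print(screen([1.0, 0.0, 0.0], library_embs))

pockets = [[1.0, 0.0], [0.0, 1.0]]
print(clip_loss(pockets, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs: small loss
print(clip_loss(pockets, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs: large loss
```

The speed-up comes from the screening side: library embeddings are computed once, and each new target costs only one pocket encoding plus vector dot products.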
Experimental validation confirmed strong predictive performance. Screening against the norepinephrine transporter (NET) achieved a 15% hit rate, identifying inhibitors with stronger activity than existing drugs. For targets lacking known ligands or crystal structures (e.g., TRIP12), DrugCLIP achieved a 17.5% hit rate using AlphaFold2-predicted structures.
This work also produced GenomeScreenDB, a database containing docking results for approximately 10,000 targets and 500 million compounds, illustrating that proteome-scale virtual screening has become feasible in the post-AlphaFold era.
AI-Driven Discovery of Food-Derived α-Glucosidase Inhibitors and Rational Dietary Recommendations for Diabetes
α-Glucosidase is a key therapeutic target for controlling postprandial blood glucose in diabetes. However, traditional screening approaches are inefficient when applied to large libraries of food-derived compounds.
Researchers from Wuhan University of Science and Technology developed an ensemble DL framework integrating directed message passing neural networks (D-MPNN) and graph convolutional networks (GCN) to construct a screening pipeline covering activity prediction, safety assessment, and interaction analysis.
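Both D-MPNN and GCN belong to the message-passing family: each atom updates its representation from its neighbours, and a readout pools atom vectors into a molecule vector. The sketch below is a deliberately stripped-down, weight-free version (not D-MPNN, which passes messages along directed bonds, nor a GCN, which uses normalized aggregation with learned weights), followed by soft-voting over two hypothetical prediction heads as one simple way an ensemble can combine models. The atom features and head weights are invented.

```python
import math


def message_pass(features, edges, rounds=1):
    """Sum aggregation: each atom adds its neighbours' feature vectors to its own."""
    neighbours = {i: [] for i in range(len(features))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    h = [list(f) for f in features]
    for _ in range(rounds):
        h = [[h[v][d] + sum(h[u][d] for u in neighbours[v]) for d in range(len(h[v]))]
             for v in range(len(h))]
    return h


def readout(h):
    """Sum-pool atom vectors into one molecule-level vector."""
    return [sum(col) for col in zip(*h)]


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def ensemble_probability(graph_vec, heads):
    """Soft voting: average the probabilities of several (weights, bias) heads."""
    probs = [sigmoid(sum(w * x for w, x in zip(weights, graph_vec)) + bias)
             for weights, bias in heads]
    return sum(probs) / len(probs)


# Toy C-C-O chain with one-hot atom features [is_C, is_O].
atoms = [[1, 0], [1, 0], [0, 1]]
bonds = [(0, 1), (1, 2)]
h = message_pass(atoms, bonds)
mol_vec = readout(h)
heads = [([0.2, -0.1], 0.0), ([0.1, 0.3], -1.0)]  # hypothetical trained heads
print(h, mol_vec, round(ensemble_probability(mol_vec, heads), 3))
```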
Using this approach, the time required to screen 70,000 food-derived compounds was reduced from months to hours. The best-performing model achieved an AUC of 0.991, demonstrating state-of-the-art predictive accuracy.
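An AUC of 0.991 has a direct interpretation: it is the probability that a randomly chosen active compound is scored above a randomly chosen inactive one (ties counted as one half). The sketch below computes AUC from that pairwise definition; it is quadratic in the number of compounds, so sorting-based implementations are preferable for large sets. The scores and labels are invented.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (active, inactive) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # predicted probability of activity
labels = [1, 1, 0, 1, 0, 0]              # 1 = active, 0 = inactive
print(roc_auc(scores, labels))  # 8 of 9 pairs ordered correctly
```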
The framework identified 75 high-potential α-glucosidase inhibitors, 59 of which had not been previously reported, and predicted several synergistic hypoglycemic combinations that were validated experimentally. Several compounds showed stronger activity than the clinical drug acarbose, highlighting the potential of AI-driven strategies for dietary intervention and natural antidiabetic drug discovery.
Figure 8. AI-enabled screening of hypoglycemic drugs[9].
Summary
AI-driven virtual screening is systematically reshaping drug discovery, shifting it from physics-based simulation toward data-driven modeling. As AI-derived candidate molecules show higher success rates in early-stage clinical trials, VS is no longer merely an auxiliary tool; it has become a core engine for exploring uncharted chemical space and developing therapies for previously “undruggable” targets.
MCE Virtual Screening
MCE provides professional molecular docking and virtual screening services supported by extensive compound databases and high-performance computing resources. Optimized screening strategies can substantially reduce the number of compounds requiring experimental validation, increase the likelihood of identifying promising lead compounds, shorten screening timelines, and lower the risk of failure during subsequent lead optimization.
Customized, cost-effective service solutions can be designed around specific project requirements to support high-quality early-stage drug discovery for researchers and scientific clients.
| Services | Description |
| --- | --- |
| AI-Driven Drug Screening | Integrates AI with computational chemistry to enable high-throughput analysis of massive chemical databases. By applying machine learning (ML) algorithms to protein structure prediction and molecular design, this approach identifies key patterns and predictive scores that accelerate the discovery of potential drug candidates. |
| Virtual Screening | A high-efficiency computational technique that searches small-molecule libraries for the compounds most likely to bind a specific drug target. Compared with traditional high-throughput screening (HTS), virtual screening offers a cost-effective, target-focused alternative for early-stage drug discovery. |
| Molecular Dynamics | Uses Newtonian mechanics to simulate biomolecular motion and interactions at the atomic level, providing critical insights into structural stability, conformational changes, and the mechanistic pathways of protein–ligand interactions. |
| Surface Plasmon Resonance (SPR) | Uses the resonance between the evanescent wave of totally internally reflected light and surface plasmons in a thin metal film to build a label-free biosensing platform for detecting biomolecular interactions between ligands immobilized on a sensor chip and analytes. |