Development of a diagnostic model for ovarian cancer based on machine learning algorithms and functional analysis of key biomarker SOX17

  • J Ovarian Res. 2025 Nov 4;18(1):237. doi: 10.1186/s13048-025-01809-w.
Xueyan Geng 1, Maopeng Yin 1, Hongxi Zhao 1, Zeyu Zhang 1, Shichao Liu 1, Yingjie Liu 1, Shoucai Zhang 1, Yongyuan Liang 1, Li Song 2, Guixi Zheng 3 4
Affiliations

  • 1 Department of Clinical Laboratory, Qilu Hospital of Shandong University, Jinan, 250012, Shandong, P.R. China.
  • 2 Department of Obstetrics and Gynecology, Qilu Hospital of Shandong University, Jinan, 250012, Shandong, P.R. China.
  • 3 Department of Clinical Laboratory, Qilu Hospital of Shandong University, Jinan, 250012, Shandong, P.R. China.
  • 4 Shandong Engineering Research Center of Biomarker and Artificial Intelligence Application, Jinan, 250012, Shandong, P.R. China.
Abstract

Background: Ovarian cancer (OC) has the poorest prognosis among gynecological malignancies, with five-year survival rates below 45%, primarily due to late-stage diagnosis. To address this challenge, we systematically identified OC-specific differentially expressed genes (DEGs) and used them to develop a robust diagnostic model based on eleven machine learning algorithms. Furthermore, we explored the potential mechanism of a key DEG in OC.

Methods: We acquired RNA sequencing data for 426 tissues (352 OC tumor and 74 adjacent non-tumor) from the Gene Expression Omnibus (GEO) repository. Following batch effect correction and normalization, DEGs were screened between tumor and non-tumor specimens. The resulting DEGs then underwent functional characterization, including Gene Ontology (GO) enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, protein-protein interaction (PPI) network, and immune microenvironment analyses. To optimize diagnostic feature selection, we implemented a tiered analytical approach combining F-test, LASSO regression and Pearson correlation. The curated gene subset served as input for developing machine learning classifiers, with the cohort partitioned into stratified training (70%) and validation (30%) subsets. Eleven distinct algorithms were evaluated through iterative 10-fold cross-validation, with model performance quantified via receiver operating characteristic (ROC) analysis, precision-recall (PR) metrics, calibration curves, learning curves, and decision curve analysis (DCA). Finally, we investigated the biological functions of one key gene, SRY-box containing gene 17 (SOX17), in OC cell lines by in vitro experiments.
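
The stratified 70/30 partition and the 10-fold cross-validation described above can be illustrated with a minimal sketch. It uses only the cohort sizes given in the abstract (352 tumor, 74 non-tumor); the function names, the rounding rule, the round-robin fold assignment and the random seed are assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices so each class keeps roughly the same ratio in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, valid = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)  # assumed rounding rule
        train.extend(indices[:cut])
        valid.extend(indices[cut:])
    return sorted(train), sorted(valid)


def stratified_folds(indices, labels, n_folds=10, seed=0):
    """Deal the given indices into n_folds folds, class by class, round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for idx in members:
            folds[position % n_folds].append(idx)
            position += 1
    return folds


# Cohort sizes from the abstract: 352 tumor (label 1) and 74 non-tumor (label 0).
labels = [1] * 352 + [0] * 74
train_idx, valid_idx = stratified_split(labels)
folds = stratified_folds(train_idx, labels)

print("training / validation sizes:", len(train_idx), len(valid_idx))
print("fold sizes:", [len(fold) for fold in folds])
```

In each cross-validation round, one fold would be held out for scoring and the other nine used for fitting. Stratifying both steps keeps the minority non-tumor class represented in every subset, which matters with a class ratio of roughly 5:1.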

Results: We identified 27 DEGs with distinct expression patterns in OC, of which 16 were upregulated and 11 downregulated. GO enrichment analysis suggested that the DEGs were significantly enriched in response to folic acid, blood microparticle and alcohol dehydrogenase [NAD(P)+] activity. KEGG pathway analysis indicated that these DEGs were mainly involved in tyrosine metabolism, fatty acid degradation, ABC transporters and pyruvate metabolism. Immune microenvironment profiling revealed substantial M2 macrophage polarization and cytotoxic T-cell exhaustion in tumor tissues. The optimal diagnostic model was built on five key genes (CD24, CLEC4M, SOX17, ADH1C and CHRDL1), with logistic regression as the best-performing algorithm. The area under the receiver operating characteristic curve (AUC) and the accuracy of the model were 0.93 and 0.875, respectively. SOX17 was upregulated in OC tumor tissues, and knockdown of SOX17 markedly suppressed tumor cell proliferation and migration.
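
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how AUC and accuracy are defined for a binary classifier that outputs a probability, as a logistic regression model does. The labels and scores are invented for illustration and are not data from the study; the function names and the 0.5 decision threshold are likewise assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given probability threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(y_true, predictions) if y == p)
    return correct / len(y_true)


# Invented example: 1 = tumor, 0 = non-tumor; scores are predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]

print("AUC:", roc_auc(y_true, scores))
print("accuracy:", accuracy(y_true, scores))
```

AUC is threshold-independent and measures how well the scores rank tumor above non-tumor samples, whereas accuracy depends on the chosen cut-off and on the class balance of the evaluated subset.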

Conclusion: Our multivariable diagnostic model, built on five genes with logistic regression, demonstrated robust discriminative capacity for OC. SOX17 functions as a tumor promoter and is a potential therapeutic target for OC.

Keywords

Diagnostic model; Immune cell infiltration; Machine learning algorithm; Ovarian cancer; SOX17.
