PARTICLE SWARM OPTIMIZED RANDOM FOREST FOR SURVIVAL PREDICTION: A LEAKAGE-AWARE, NOISE ROBUSTNESS STUDY USING SEER PROSTATE CANCER DATA

Authors

  • BHUPESH KUMAR GUPTA
  • BHAVYA ALANKAR
  • HARLEEN KAUR
  • PARUL AGARWAL

Keywords:

Prostate Cancer, Random Forest, Particle Swarm Optimization, Noise Robustness

Abstract

Survival prediction in prostate cancer is both clinically vital and methodologically challenging because of the risk of data leakage and the impact of real-world noise. This study presents a novel leakage-aware, noise-robust machine learning pipeline in which all post-outcome and treatment-related features are rigorously excluded to ensure fair modeling. We employ Particle Swarm Optimization (PSO) to tune Random Forest (RF) hyperparameters for survival prediction on the SEER prostate cancer dataset. Data are label-encoded and MinMax-normalized, and the model is evaluated via stratified 5-fold cross-validation, repeated twice for reliability. To systematically assess robustness, Gaussian noise is injected into all numeric features at levels of 0%, 10%, 20%, 30%, and 40% standard deviation.
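
The two mechanical steps named above, PSO-based hyperparameter search and Gaussian noise injection, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The inertia and acceleration coefficients (w, c1, c2), the swarm size, the search bounds, and toy_objective are all assumptions chosen for illustration. In a real pipeline the objective would presumably be one minus the cross-validated score of a Random Forest trained with the candidate hyperparameters. The noise helper assumes that a level such as 10% means a noise standard deviation equal to 10% of each feature's own standard deviation, which is one plausible reading of the abstract.

```python
import random
import statistics


def add_gaussian_noise(column, level, rng):
    """Return a copy of a numeric column with zero-mean Gaussian noise added.

    Assumption: `level` (e.g. 0.10) scales the column's own standard deviation.
    """
    sd = statistics.pstdev(column)
    return [x + rng.gauss(0.0, level * sd) for x in column]


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_objective(params):
    """Hypothetical stand-in for (1 - cross-validated score) of an RF.

    The optimum at (300 trees, depth 12) is invented for the demo.
    """
    n_estimators, max_depth = params
    return ((n_estimators - 300) / 500) ** 2 + ((max_depth - 12) / 30) ** 2


# Hypothetical search ranges for (n_estimators, max_depth).
bounds = [(50, 500), (2, 30)]
best, best_val = pso_minimize(toy_objective, bounds)
print("best hyperparameters (continuous):", best, "objective:", best_val)

# Noise injection at the five levels named in the abstract.
rng = random.Random(0)
feature = [0.0, 0.25, 0.5, 0.75, 1.0]  # a MinMax-normalized toy column
for level in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(level, add_gaussian_noise(feature, level, rng))
```

Integer hyperparameters such as the number of trees would need rounding before being passed to a real model; the sketch leaves positions continuous for simplicity.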

Our framework achieves exceptional performance at 0% noise: accuracy 0.9915, precision 0.9979, recall 0.9656, F1-score 0.9814, and ROC-AUC 0.9984. Even as noise increases to 30%, the PSO-tuned RF maintains F1 > 0.92 and ROC-AUC > 0.95, evidencing high resilience. At 40% noise, performance declines only modestly (F1 > 0.89, ROC-AUC > 0.94). This explicit combination of leakage prevention and noise stress testing demonstrates that metaheuristic-optimized RF models deliver robust and trustworthy survival predictions, even under challenging data conditions. Our approach establishes a reproducible benchmark for future clinical AI and provides a blueprint for robust model development in other biomedical prediction domains where data integrity is essential.
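
As a quick arithmetic check on the 0% noise figures above, F1 is the harmonic mean of precision and recall, so the reported precision and recall imply an F1 value that can be compared with the reported one. The short sketch below does only that; the function name is illustrative, and the explanation offered for any small gap (metrics averaged per fold rather than computed from pooled predictions) is an assumption, since the abstract does not say how the averages were formed.

```python
def f1_from_precision_recall(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract at 0% noise.
precision, recall, reported_f1 = 0.9979, 0.9656, 0.9814

implied_f1 = f1_from_precision_recall(precision, recall)
print(f"implied F1 = {implied_f1:.4f}, reported F1 = {reported_f1:.4f}")
print(f"absolute gap = {abs(implied_f1 - reported_f1):.4f}")
```

The implied value is approximately 0.9815, which agrees with the reported 0.9814 to within rounding, so the three figures are mutually consistent.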

Published

2025-11-03

How to Cite

BHUPESH KUMAR GUPTA, BHAVYA ALANKAR, HARLEEN KAUR, & PARUL AGARWAL. (2025). PARTICLE SWARM OPTIMIZED RANDOM FOREST FOR SURVIVAL PREDICTION: A LEAKAGE-AWARE, NOISE ROBUSTNESS STUDY USING SEER PROSTATE CANCER DATA. The Bioscan, 20(4), 55–68. Retrieved from https://thebioscan.com/index.php/pub/article/view/4324